Looking for a Specific Action in a Video? This AI-Based Method Can Find It for You

A new technique teaches machine-learning models to identify specific actions in long videos without the need for human annotations.
MIT News Machine learning 11:02 am on May 29, 2024


MIT researchers developed a self-supervised learning approach for spatio-temporal grounding in videos that requires no manual annotation or trimming, learning both global representations (what happens across the whole video) and local ones (what happens in a specific region at a specific moment). They also built a benchmark of uncut videos to evaluate such models, and they plan to extend the framework to audio data, including automatically detecting misalignments between audio and text cues.

  • Self-supervised Learning: Utilizes videos without manual annotations, focusing on global understanding over time and local contexts.
  • Uncut Video Benchmark Creation: Addresses the lack of existing benchmarks for evaluating models on long, untrimmed videos.
  • Misalignment Detection: Plans to automatically detect inconsistencies between audio and text data in videos.
  • Audio Data Inclusion: Extends the model's framework to incorporate audio information for enhanced understanding.
  • MIT Researchers' Contributions: The team includes a visiting professor who leads the MIT Spoken Language Systems Group, bringing significant expertise to the project.
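The core idea behind this kind of grounding can be illustrated with a toy sketch: given per-frame video embeddings and a text-query embedding in a shared space (the article does not specify the model's actual architecture, so the embedding setup and `locate_action` function below are hypothetical), temporal localization reduces to scoring each frame against the query and picking the peak.

```python
import numpy as np

def locate_action(frame_embeddings, query_embedding):
    """Toy temporal grounding: return the index of the frame whose
    embedding is most similar (cosine) to the text-query embedding,
    along with the per-frame similarity scores."""
    frames = np.asarray(frame_embeddings, dtype=float)
    query = np.asarray(query_embedding, dtype=float)
    # Normalize so dot products become cosine similarities.
    frames = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = frames @ query
    return int(np.argmax(scores)), scores

# Hypothetical embeddings for a 3-frame clip; the last frame
# aligns best with the query direction.
frames = [[1.0, 0.0], [0.7, 0.7], [0.1, 1.0]]
query = [0.0, 1.0]
best, scores = locate_action(frames, query)
print(best)  # → 2
```

A real system would learn these embeddings self-supervised from video-text pairs and extend the scoring spatially (per region, not just per frame), but the match-and-peak step above is the essence of grounding a query in an untrimmed video.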


