Am I Done? Predicting Action Progress in Videos

In this article, we address the problem of predicting action progress in videos. We argue that this task is important because it can benefit a wide range of interaction applications. To this end, we introduce a novel approach, named ProgressNet, capable of predicting when an action takes place in a video, where it is located within the frames, and how far it has progressed during its execution. To provide a general definition of action progress, we ground our work in the linguistics literature, borrowing terms and concepts to understand which actions can be the subject of progress estimation. As a result, we define a categorization of actions and their phases. Motivated by the recent success of combining Convolutional and Recurrent Neural Networks, our model is based on a combination of the Faster R-CNN framework, to make framewise predictions, and LSTM networks, to estimate action progress through time. After introducing two evaluation protocols for the task at hand, we demonstrate the capability of our model to effectively predict action progress on the UCF-101 and J-HMDB datasets.
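As a rough illustration of what "how far it has progressed" could mean per frame, the sketch below assigns each frame of an action a value between 0 and 1. It assumes progress grows linearly from the first to the last frame of the action; that assumption, the function name `linear_progress`, and the frame indices are hypothetical and are not taken from this record.

```python
from typing import List


def linear_progress(start: int, end: int) -> List[float]:
    """Return one progress value per frame of an action.

    Assumption (illustrative only): progress is 0.0 at frame `start`,
    1.0 at frame `end`, and grows linearly in between.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    span = end - start
    # One value for every frame index from start to end, inclusive.
    return [(t - start) / span for t in range(start, end + 1)]


# Hypothetical action spanning frames 10 to 14 of a video.
labels = linear_progress(start=10, end=14)
print(labels)
```

Values like these could serve as per-frame regression targets for a model that predicts progress, although other definitions of progress are possible.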

Federico Becattini;Tiberio Uricchio;Lorenzo Seidenari;Lamberto Ballan;Alberto Del Bimbo

2021

Year: 2021
Journal: ACM Transactions on Multimedia Computing, Communications and Applications
DOI: https://dx.doi.org/10.1145/3402447
WOS ID: WOS:000614096700003
Scopus ID: 2-s2.0-85100298786
Type: 01.01 - Journal article

Files in this record:

File	Access	Type	License	Size	Format
2012.tomm-arxiv.pdf	Open access	Preprint (AM - Author's Manuscript - submitted)	Creative Commons	6.53 MB	Adobe PDF
2012.tomm-arxiv.pdf	Open access	Published (Publisher's Version of Record)	Other	6.53 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3355955
