SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos

Nada Osman, Guglielmo Camporese, Pasquale Coscia, Lamberto Ballan

2021

Abstract

Action anticipation in egocentric videos is a difficult task due to the inherently multi-modal nature of human actions. Additionally, some actions happen faster or slower than others depending on the actor or the surrounding context, which can vary each time and lead to different predictions. Based on this idea, we build upon the RULSTM architecture, which is specifically designed for anticipating human actions, and propose a novel attention-based technique to simultaneously evaluate slow and fast features extracted from three different modalities, namely RGB, optical flow, and extracted objects. Two branches process information at different time scales, i.e., frame rates, and several fusion schemes are considered to improve prediction accuracy. We perform extensive experiments on the EpicKitchens55 and EGTEA Gaze+ datasets and demonstrate that our technique systematically improves the results of the RULSTM architecture in terms of Top-5 accuracy at different anticipation times.
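The abstract describes branches that observe the same video at different frame rates, an attention mechanism that weighs their outputs, and evaluation by Top-5 accuracy. What follows is a minimal, hypothetical sketch in plain Python, not the authors' implementation: `subsample` picks frames at a coarse ("slow") or fine ("fast") stride, `attention_fuse` combines per-branch class scores using softmax-normalized attention weights (one plausible late-fusion scheme; the paper considers several, which are not reproduced here), and `top_k_accuracy` counts a prediction as correct when the ground-truth class is among the k highest-scoring classes. All function names, strides, and toy scores are assumptions. In a trained model the attention logits would be produced by a learned network rather than passed in by hand.

```python
import math
from typing import List, Sequence


def subsample(frames: Sequence[int], stride: int) -> List[int]:
    """Keep every `stride`-th frame, counting back from the last observed frame.

    A larger stride gives a lower frame rate (a "slow" view); stride 1 keeps every frame.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(frames[::-1][::stride][::-1])


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(branch_scores: Sequence[Sequence[float]],
                   attention_logits: Sequence[float]) -> List[float]:
    """Fuse per-branch class scores as a convex combination.

    Weights are softmax(attention_logits), one logit per branch
    (e.g., one branch per modality and time scale; hypothetical layout).
    """
    if len(branch_scores) != len(attention_logits):
        raise ValueError("need exactly one attention logit per branch")
    weights = softmax(attention_logits)
    num_classes = len(branch_scores[0])
    return [sum(w * scores[c] for w, scores in zip(weights, branch_scores))
            for c in range(num_classes)]


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int = 5) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if not labels:
        raise ValueError("labels must be non-empty")
    hits = 0
    for s, y in zip(scores, labels):
        top_k = sorted(range(len(s)), key=lambda c: s[c], reverse=True)[:k]
        hits += y in top_k
    return hits / len(labels)


if __name__ == "__main__":
    observed = list(range(16))        # toy frame indices of the observed segment
    fast = subsample(observed, 1)     # hypothetical fast branch: every frame
    slow = subsample(observed, 4)     # hypothetical slow branch: every 4th frame
    print("fast:", fast)
    print("slow:", slow)              # [3, 7, 11, 15]

    # Toy class scores from a slow and a fast branch over 6 action classes.
    slow_scores = [0.1, 0.9, 0.0, 0.0, 0.0, 0.0]
    fast_scores = [0.8, 0.1, 0.0, 0.0, 0.0, 0.0]
    fused = attention_fuse([slow_scores, fast_scores], attention_logits=[1.0, 0.0])
    print("fused:", [round(x, 3) for x in fused])
    print("top-5 accuracy:", top_k_accuracy([fused], labels=[0], k=5))
```

Normalizing the attention logits with a softmax keeps the fused scores a convex combination of the branch scores, so the weight given to each time scale can change from sample to sample without changing the overall score range.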

Year: 2021
Book title: Proc. of IEEE International Conference on Computer Vision Workshops (ICCV-W)
Series: PROCEEDINGS IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION
Conference: 18th IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2021
DOI: https://dx.doi.org/10.1109/ICCVW54120.2021.00383
WOS ID: WOS:000739651103059
Scopus ID: 2-s2.0-85123058854
ISBN: 9781665401913
Item type: 04.01 - Contribution in conference proceedings

Files in this item:

File	Size	Format
2021_iccvw_osman.pdf (open access, Published: Publisher's Version of Record, Creative Commons license)	971.78 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3402143
