Depthformer: Multimodal positional encodings and cross-input attention for transformer-based segmentation networks


Francesco Barbato; Giulia Rizzoli; Pietro Zanuttigh

2023

Abstract

Most semantic segmentation approaches rely only on color-camera information to parse scenes, yet recent advances show that depth data can further improve performance. In this work, we focus on transformer-based deep learning architectures, which have achieved state-of-the-art performance on the segmentation task, and we propose to exploit depth information by embedding it in the positional encoding. In effect, we extend the network to multimodal data without adding any parameters, in a natural way that exploits the strength of transformers' self-attention modules. We also investigate performing cross-modality operations inside the attention module by swapping the key inputs between the depth and color branches. Our approach consistently improves performance on the Cityscapes benchmark.
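The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no formulas, so the sinusoidal form of the encoding, the choice to sum index and depth encodings, the absence of learned projections, and every function name below are assumptions made purely for illustration.

```python
import math

def sinusoidal_encoding(value, dim):
    """Encode a scalar into `dim` features (dim even) with sine/cosine pairs."""
    enc = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        enc.append(math.sin(value * freq))
        enc.append(math.cos(value * freq))
    return enc

def depth_positional_encoding(positions, depths, dim):
    """Assumed form: token-index encoding plus depth-value encoding.
    No learnable parameters are involved."""
    return [
        [p + d for p, d in zip(sinusoidal_encoding(pos, dim),
                               sinusoidal_encoding(dep, dim))]
        for pos, dep in zip(positions, depths)
    ]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Plain scaled dot-product attention over lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def cross_input_attention(color_tokens, depth_tokens):
    """Assumed reading of 'swapping the key inputs': each branch keeps its
    own queries and values but takes its keys from the other branch."""
    color_out = attention(color_tokens, depth_tokens, color_tokens)
    depth_out = attention(depth_tokens, color_tokens, depth_tokens)
    return color_out, depth_out

# Toy data: three tokens with four features each (made-up values).
color_features = [[0.1, 0.2, 0.3, 0.4],
                  [0.5, 0.1, 0.0, 0.2],
                  [0.3, 0.3, 0.1, 0.9]]
depth_values = [2.0, 5.5, 9.0]  # one made-up depth value per token

pe = depth_positional_encoding(range(3), depth_values, 4)
color_tokens = [[c + p for c, p in zip(tok, enc)]
                for tok, enc in zip(color_features, pe)]
depth_tokens = pe  # stand-in for depth-branch features

color_out, depth_out = cross_input_attention(color_tokens, depth_tokens)
```

Because the depth signal enters only through the additive encoding, the sketch adds no weights to the color branch, which mirrors the "without adding any parameters" claim at a toy scale.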


Year: 2023
Book title: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Series: PROCEEDINGS OF THE ... IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING
Conference title: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
DOI: https://dx.doi.org/10.1109/ICASSP49357.2023.10096314
WOS ID: WOS:001595432200397
Scopus ID: 2-s2.0-85162079671
OpenAlex ID: W4372346764
Appears in types: 04.01 - Conference proceedings contribution


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3508084
