Robust Visual Representation across Modalities in Semantic Scene Understanding / Camuffo, Elena. - (2025 Mar 24).
Robust Visual Representation across Modalities in Semantic Scene Understanding
CAMUFFO, ELENA
2025
Abstract
In recent years, advances in deep learning have greatly improved our ability to understand and interpret three-dimensional scenes, especially in fields such as autonomous driving, robotics, and virtual reality. However, achieving a consistent, high-quality representation of visual data remains a significant challenge. The issue is particularly evident with three-dimensional data, which presents intrinsic problems such as sparsity and uneven data and label distributions. Tasks like semantic scene understanding are hindered by uneven label distributions in the training data, and label shifts during training can lead to catastrophic forgetting. Uneven data distributions, in turn, cause domain shifts or misalignment in models that handle different types of sensory information. Furthermore, 3D data is complex and challenging to label accurately; consequently, tasks such as scene understanding and 3D model reconstruction often suffer from data scarcity or struggle to achieve optimal visual representations.

This thesis tackles these challenges by proposing transfer learning-based techniques for obtaining robust visual representations for semantic scene understanding across multiple modalities, with a particular focus on three-dimensional scenes. The dissertation begins with an overview of 3D scene representations and semantic understanding methods. It then addresses class imbalance in 3D semantic segmentation, introducing coarse-to-fine self-regularizing strategies that improve the representation of infrequent classes in point cloud data. Next, it focuses on continual learning, proposing novel techniques that allow models to evolve over time by incorporating new information without retraining from scratch or forgetting previously learned tasks. Multimodal learning is another key focus, with methods that integrate sensory inputs such as LiDAR and RGB images to enhance scene interpretation in complex, real-world environments. The research further investigates domain adaptation to improve model robustness on corrupted or degraded input data, ensuring more reliable performance across varying conditions. Finally, the thesis explores the intersection between semantic scene understanding and 3D reconstruction, proposing few-shot learning techniques to improve the understanding of 3D models from scarce data, particularly for applications in architectural modelling. Through these contributions, this thesis advances the state of the art in visual representation for semantic scene understanding, offering new insights into the integration of transfer learning techniques into standard pipelines.
| File | Description | Type | Access | Size | Format |
|---|---|---|---|---|---|
| PhD_Thesis-final-pdfA.pdf | Final thesis | Doctoral thesis | Open access | 21.52 MB | Adobe PDF |