Robust Visual Representation across Modalities in Semantic Scene Understanding / Camuffo, Elena. - (2025 Mar 24).

Robust Visual Representation across Modalities in Semantic Scene Understanding

Camuffo, Elena
2025

Abstract

In recent years, advances in deep learning have greatly improved our ability to understand and interpret three-dimensional scenes, especially in fields such as autonomous driving, robotics, and virtual reality. However, achieving a consistent, high-quality representation of visual data remains a significant challenge. This issue is particularly evident with three-dimensional data, which often presents intrinsic problems such as sparsity and uneven data and label distributions. In particular, tasks like semantic scene understanding are hindered by uneven label distributions in the dataset, and label shifts during training can result in catastrophic forgetting. Uneven data distribution can also cause domain shifts or misalignment in models that handle different types of sensory information. Furthermore, 3D data is complex and challenging to label accurately; consequently, tasks such as scene understanding and 3D model reconstruction often suffer from data scarcity or fail to achieve optimal visual representations. This thesis tackles these challenges by proposing transfer-learning-based techniques for obtaining robust visual representations for semantic scene understanding across multiple modalities, with a particular focus on three-dimensional scenes. The dissertation begins by surveying 3D scene representations and semantic understanding methods. It then addresses class imbalance in 3D semantic segmentation, introducing coarse-to-fine self-regularizing strategies that improve the representation of infrequent classes in point cloud data. Next, it focuses on continual learning, proposing novel techniques that allow models to evolve over time by incorporating new information without retraining from scratch or forgetting previously learned tasks. Multimodal learning is another key focus, with methods that integrate sensory inputs such as LiDAR and RGB images to enhance scene interpretation in complex, real-world environments. The research further investigates domain adaptation to improve model robustness under corrupted or degraded input data, ensuring more reliable performance across varying conditions. Finally, the thesis explores the intersection between semantic scene understanding and 3D reconstruction, proposing few-shot learning techniques to improve the understanding of 3D models from scarce data, particularly for applications in architectural modelling. Through these contributions, this thesis advances the state of the art in visual representation for semantic scene understanding, offering new insights into the integration of transfer learning techniques into standard pipelines.
Files in this record:

PhD_Thesis-final-pdfA.pdf
Access: open access
Description: Final thesis
Type: Doctoral thesis
Size: 21.52 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3550471