DEVELOPMENT OF COMPUTATIONAL APPROACHES FOR ANALYZING TIME-SERIES OF HUMAN MULTI-OMICS DATA / Gesualdo, Alessia. - (2025 Jan 15).
DEVELOPMENT OF COMPUTATIONAL APPROACHES FOR ANALYZING TIME-SERIES OF HUMAN MULTI-OMICS DATA
GESUALDO, ALESSIA
2025
Abstract
All biological systems are inherently dynamic, and dynamic processes occur at every level of life. Changes in human life rely on the precise timing and sequencing of events, and disruption of this temporal regulation can lead to many diseases, so dynamics are fundamental to understanding life and to diagnosing and treating disease. For analyzing the time course of events, time-series omics datasets, including RNA-seq (bulk and single-cell) and proteomics, offer unprecedented insights into the molecular mechanisms underlying biological processes. These high-throughput data provide comprehensive snapshots of gene expression and protein abundance at each time point, but their use poses challenges: omics datasets are high-dimensional and require advanced computational and visualization methods, and their interpretation is hindered by biological complexity. In this context, this thesis presents a new computational framework for analyzing time series of human multi-omics data across multiple biological contexts from a dynamic point of view, with the aim of improving the accuracy of trend visualization and of the classification of each feature over time. We first focused on circadian rhythms, 24-hour periodic oscillations that have been extensively studied in the human body. One of the major challenges in biological time-series analysis lies in accurately identifying rhythmic patterns such as these. We propose a PCA-based method for visualizing and classifying the circadian expression profiles of different biological features from high-throughput bulk datasets. The methodology was applied to investigate the day and night metabolic entrainment of human liver cells using transcriptomic and proteomic data. Two synchronization protocols were tested: dexamethasone, the gold standard, and a physiological protocol.
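The abstract does not spell out how the PCA-based classification works, so the following Python sketch only illustrates why PCA is a natural fit for circadian data; it is not the method developed in the thesis. It relies on the fact that sinusoids sharing a 24-hour period but differing in phase all lie in the plane spanned by a cosine and a sine, so after z-scoring, rhythmic profiles fall near a circle in the PC1-PC2 plane while noise-dominated profiles fall closer to the origin. The synthetic data, the sampling scheme (every 4 h for 48 h), the function name `pca_scores`, and the use of the radius as a rhythmicity score are all assumptions made for this example.

```python
import numpy as np

def pca_scores(profiles, n_components=2):
    """Project row-wise z-scored profiles onto their leading principal components.

    profiles: array of shape (n_features, n_timepoints).
    Returns the PC scores and the fraction of variance each component explains.
    """
    # z-score each feature over time so that only the shape of the profile matters
    centered = profiles - profiles.mean(axis=1, keepdims=True)
    z = centered / profiles.std(axis=1, keepdims=True)
    # PCA across features: center each time point, then take the SVD
    z = z - z.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return u[:, :n_components] * s[:n_components], explained[:n_components]

# Synthetic data (hypothetical): 12 samples, one every 4 h over 48 h
rng = np.random.default_rng(0)
t = np.arange(0, 48, 4)
n = 100  # number of features in each group

# Rhythmic features: 24 h cosines with random phases plus measurement noise
phases = rng.uniform(0, 24, size=n)
rhythmic = np.cos(2 * np.pi * (t[None, :] - phases[:, None]) / 24)
rhythmic += rng.normal(scale=0.2, size=rhythmic.shape)

# Non-rhythmic features: noise only
flat = rng.normal(scale=0.2, size=(n, t.size))

profiles = np.vstack([rhythmic, flat])
scores, explained = pca_scores(profiles)

# Distance from the origin in the PC1-PC2 plane, used here as a toy rhythmicity score
radius = np.hypot(scores[:, 0], scores[:, 1])
print("median radius, rhythmic:", np.median(radius[:n]))
print("median radius, noise:   ", np.median(radius[n:]))
```

In this toy setting the angle of each point in the PC1-PC2 plane, `np.arctan2(scores[:, 1], scores[:, 0])`, tracks the phase of the oscillation up to a rotation and a possible reflection, which is one reason such plots are convenient for visualizing rhythmic features.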
The PCA-based method proved highly effective in distinguishing circadian from non-circadian features and in visualizing gene and protein expression profiles over time. Beyond periodic processes, this thesis also explores expression dynamics in non-periodic, longer-term biological responses, such as those involved in early embryonic development. To this end, we extended the PCA-based method to recognize monotonically increasing or decreasing profiles over time. We tested the methodology on a new bulk RNA-seq time-course dataset spanning 20 days that captures the dynamics of neurodifferentiation of naïve hiPSCs. By clustering genes according to their temporal trends, we identified eight distinct gene expression patterns; enrichment analysis of these clusters revealed the involvement of key developmental pathways such as Notch, Wnt, and TGF-β. This analysis provided the first transcriptomic characterization of neuroepithelial differentiation from naïve hiPSCs, shedding light on the molecular processes that govern early neural development. The thesis also includes the development of a pipeline for single-cell RNA-seq datasets, which are becoming increasingly important for capturing cell-to-cell variability in dynamic biological processes. We propose a clustering-based pipeline for analyzing single-cell temporal data, consisting of a pre-processing step for trajectory inference and reordering of cells along pseudotime, followed by k-means clustering and functional enrichment analysis. The pipeline was applied to a brain organoid differentiation dataset taken from Uzquiano et al., 2022. The results were consistent with previous studies and, despite its limitations, the pipeline can be considered a starting point for studying single-cell dynamics. In conclusion, this thesis provides significant advancements in the analysis of biological time-series data.
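As a rough illustration of the single-cell steps listed above (ordering cells along pseudotime, then k-means clustering of temporal profiles), the Python sketch below bins cells by pseudotime, averages and z-scores each gene across bins, and groups genes with a small NumPy k-means. It is a minimal sketch under stated assumptions, not the thesis pipeline: pseudotime is taken as given rather than inferred, the data are synthetic, and the helper names (`binned_profiles`, `nearest`, `farthest_first_init`, `kmeans`), the number of bins, the deterministic farthest-first initialization, and `k=2` are all choices made for this example.

```python
import numpy as np

def binned_profiles(expr, pseudotime, n_bins=10):
    """Average each gene over equal-sized groups of cells ordered by pseudotime.

    expr: array of shape (n_genes, n_cells); returns z-scored (n_genes, n_bins).
    """
    order = np.argsort(pseudotime)
    bins = np.array_split(order, n_bins)
    means = np.stack([expr[:, idx].mean(axis=1) for idx in bins], axis=1)
    centered = means - means.mean(axis=1, keepdims=True)
    return centered / means.std(axis=1, keepdims=True)

def nearest(x, centers):
    """Index of the closest center (squared Euclidean distance) for each row of x."""
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def farthest_first_init(x, k):
    """Deterministic start: first row, then repeatedly the row farthest from all chosen."""
    centers = [x[0]]
    for _ in range(k - 1):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    return np.array(centers)

def kmeans(x, k, n_iter=20):
    """Plain Lloyd iterations: assign rows to the nearest center, then update the means."""
    centers = farthest_first_init(x, k)
    for _ in range(n_iter):
        labels = nearest(x, centers)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = x[labels == j].mean(axis=0)
    return nearest(x, centers), centers

# Synthetic data (hypothetical): 300 cells with known pseudotime in [0, 1],
# 20 genes that increase and 20 genes that decrease along the trajectory
rng = np.random.default_rng(1)
n_cells = 300
pseudotime = rng.uniform(0, 1, size=n_cells)
up = pseudotime[None, :] + rng.normal(scale=0.3, size=(20, n_cells))
down = (1 - pseudotime)[None, :] + rng.normal(scale=0.3, size=(20, n_cells))
expr = np.vstack([up, down])

profiles = binned_profiles(expr, pseudotime)
labels, centers = kmeans(profiles, k=2)
print(labels)
```

Each resulting cluster is a list of genes sharing a temporal trend, which is the kind of gene set that would then be passed to a functional enrichment tool.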
Our approaches offer clear improvements in visualizing and understanding expression dynamics, for both circadian and non-circadian processes. The methodologies developed are broadly applicable to the study of a wide range of biological phenomena.

File | Size | Format | |
---|---|---|---|
tesi_definitiva_Alessia_Gesualdo.pdf (doctoral thesis; embargoed until 15/01/2028) | 25.52 MB | Adobe PDF | |