Sviluppo di metodi efficienti e scalabili per l’analisi di dati omici in biologia del cancro

Billato, Ilaria

High-throughput technologies have propelled biology into the big data era. Single-cell RNA sequencing now produces datasets with millions of cells, while digital pathology generates whole-slide images containing billions of pixels. These advances enable unprecedented discovery but create a computational paradox: data are generated faster than they can be processed, and standard workflows often fail to scale. Efficient algorithms and integrative strategies are therefore essential for analyzing massive, heterogeneous datasets. This PhD thesis addresses these challenges through two complementary aims. First, we benchmark different Singular Value Decomposition (SVD) algorithms for Principal Component Analysis (PCA), a key dimensionality-reduction step in single-cell transcriptomics. Classical PCA becomes prohibitively slow and memory-intensive as data size increases. To overcome these limitations, we evaluate state-of-the-art algorithms and out-of-memory data formats across complete single-cell workflows. The benchmark compares Seurat, OSCA/Bioconductor, and scrapper in R with Scanpy and GPU-enabled frameworks such as rapids_singlecell in Python, which use GPU acceleration to reduce runtime and memory usage on datasets with millions of cells. These analyses quantify performance trade-offs and provide reproducible guidance for selecting optimal pipelines for large-scale single-cell studies. Second, we focus on digital pathology, where histopathological images reveal tissue architecture, cellular morphology, and tumor spatial organization. We processed 11,765 H&E-stained images from 32 TCGA cancer types, using a deep learning model (HoVer-Net) to extract nuclei-level features and Prov-GigaPath to extract slide-level embeddings.
To bridge the gap between image analysis and the R/Bioconductor ecosystem, we released three packages: TCIAAPI, HistoImageR, and imageTCGA, the last a Shiny application for interactive exploration, filtering, and visualization of extracted features alongside the original images. By combining scalable computation with cross-modal integration, this work improves the efficiency of single-cell analysis and supports precision medicine through clinically relevant molecular-morphological associations.
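
As a rough illustration of why the choice of SVD algorithm matters for PCA, the sketch below contrasts an exact PCA based on a full SVD with an approximate PCA based on a randomized range finder. It is a minimal example under stated assumptions, not the thesis's benchmark code: the function names `pca_full_svd` and `pca_randomized_svd`, the synthetic matrix, and all parameter values are hypothetical, and the randomized method is shown only as one well-known member of the family of truncated SVD algorithms, not necessarily one of those evaluated in the thesis.

```python
import numpy as np


def pca_full_svd(X, k):
    """Exact PCA: center the columns of X (cells x genes), then full thin SVD."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    # PCA scores are the left singular vectors scaled by the singular values.
    return U[:, :k] * s[:k], s[:k]


def pca_randomized_svd(X, k, oversample=10, n_iter=4, seed=0):
    """Approximate PCA: project onto a small random subspace, then SVD there."""
    rng = np.random.default_rng(seed)
    # Simplification: explicit centering densifies the matrix. Implementations
    # aimed at large sparse data usually avoid materializing the centered copy.
    Xc = X - X.mean(axis=0)
    # Sketch the column space of Xc with k + oversample random directions.
    omega = rng.standard_normal((Xc.shape[1], k + oversample))
    Q, _ = np.linalg.qr(Xc @ omega)
    # Power iterations sharpen the subspace; QR keeps it numerically stable.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Q)
    # The SVD is now computed on a small (k + oversample) x genes matrix.
    B = Q.T @ Xc
    Ub, s, _ = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k] * s[:k], s[:k]


# Hypothetical toy data: 500 "cells" x 200 "genes", rank 5 plus small noise.
rng = np.random.default_rng(42)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 200))
X += 0.01 * rng.standard_normal(X.shape)

scores_full, s_full = pca_full_svd(X, k=5)
scores_rand, s_rand = pca_randomized_svd(X, k=5)

# Largest relative difference between exact and approximate singular values.
max_rel_err = float(np.max(np.abs(s_full - s_rand) / s_full))
print(f"max relative error in top-5 singular values: {max_rel_err:.2e}")
```

The full SVD factorizes the entire centered matrix, whereas the randomized variant only factorizes a matrix with `k + oversample` rows, which is the kind of trade-off between accuracy, runtime, and memory that a benchmark of this sort would quantify.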

Sviluppo di metodi efficienti e scalabili per l’analisi di dati omici in biologia del cancro / Billato, Ilaria. - (2026 Feb 20).



	English title
	
				Development of efficient and scalable methods for omic data analyses in cancer biology
			
	Year of defense
	
				20 Feb 2026
			
	Citation
	
				Sviluppo di metodi efficienti e scalabili per l’analisi di dati omici in biologia del cancro / Billato, Ilaria. - (2026 Feb 20).
			
	Appears in types:
	
				08.01 - UNIPD Doctoral Thesis (Legal Deposit)

Files in this item:

File	Description	Type	Access	Size	Format
tesi_definitiva_Ilaria_Billato.pdf	Final thesis document	Doctoral thesis	Open access	8.9 MB	Adobe PDF