Machine Learning models to create an expert system to predict major cardiac adverse events in monitoring heart transplant patients / Perazzolo, Diego. - (2026 Mar 26).
Machine Learning models to create an expert system to predict major cardiac adverse events in monitoring heart transplant patients
PERAZZOLO, DIEGO
2026
Abstract
Heart transplantation is the most effective treatment for patients with end-stage heart failure. Despite advances in surgical procedures and immunosuppressive therapies, long-term graft survival remains threatened by major complications such as acute cellular rejection (ACR) and viral infections, especially Cytomegalovirus (CMV). Endomyocardial biopsy (EMB) is currently the clinical standard for monitoring graft rejection, but complementary molecular and computational approaches are emerging to support and refine diagnostic assessment. Recent advances in biomarker discovery are paving the way for non-invasive, accurate, and early detection of adverse conditions alongside routine EMB. These developments are increasingly supported by computational approaches that integrate molecular signatures into predictive models for improved graft surveillance. In this thesis, I developed and validated a series of computational models based on machine learning (ML) and deep learning (DL) techniques to improve diagnostic precision and predictive capability in post-transplant follow-up, with a strong emphasis on interpretability and on data-scarce scenarios.
The first part of this thesis presents a systematic bootstrap analysis conducted on a large, publicly available microarray dataset (E-MTAB-8026) to evaluate an ML framework designed for data-scarce scenarios, with a focus on interpretability. The study investigates the impact of data augmentation and feature selection techniques on classification performance and model transparency in omics data. The proposed pipeline integrates synthetic data generation, L1-regularized logistic regression (LASSO), and kernel-based classifiers to jointly optimize predictive accuracy and feature selection. This framework was used to simulate clinically relevant low-sample conditions.
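As an illustration only, the three ingredients named above (synthetic data generation, L1-regularized logistic regression, and a kernel-based classifier) can be chained roughly as in the sketch below. It is not the thesis implementation: the simulated data, the Gaussian-jitter augmentation, the proximal-gradient (ISTA) solver, the RBF kernel ridge classifier standing in for the kernel-based classifiers, and every parameter value (`lam`, `n_copies`, `noise_scale`, `gamma`, the ridge term) are assumptions made for the example.

```python
import numpy as np

def augment(X, y, n_copies=20, noise_scale=0.1, seed=0):
    """Append jittered copies of every sample (Gaussian noise scaled per feature)."""
    rng = np.random.default_rng(seed)
    sd = X.std(axis=0, keepdims=True)
    copies = [X + noise_scale * sd * rng.normal(size=X.shape) for _ in range(n_copies)]
    return np.vstack([X] + copies), np.tile(y, n_copies + 1)

def l1_logistic(X, y, lam, n_iter=3000):
    """L1-penalized logistic regression via proximal gradient descent (ISTA).
    Assumes standardized columns and labels in {0, 1}; the intercept is unpenalized."""
    n, p = X.shape
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz bound of the loss gradient
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        resid = 1.0 / (1.0 + np.exp(-(X @ w + b))) - y
        w = w - step * (X.T @ resid) / n
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft-thresholding
        b -= step * resid.mean()
    return w, b

def rbf(A, B, gamma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

# Hypothetical stand-in for an omics matrix: only features 0 and 1 carry signal.
rng = np.random.default_rng(0)

def simulate(n, p=50, shift=3.0):
    y = np.arange(n) % 2
    X = rng.normal(size=(n, p))
    X[:, :2] += shift * (2 * y[:, None] - 1)
    return X, y

X_small, y_small = simulate(12)    # low-sample training set
X_test, y_test = simulate(200)     # held-out samples for evaluation

# 1) synthetic data generation, then standardization with training statistics
X_aug, y_aug = augment(X_small, y_small)
mu, sd = X_aug.mean(axis=0), X_aug.std(axis=0)
Z_aug, Z_test = (X_aug - mu) / sd, (X_test - mu) / sd

# 2) sparse feature selection: keep features with non-zero L1 coefficients
w, _ = l1_logistic(Z_aug, y_aug, lam=0.2)
selected = np.flatnonzero(w)

# 3) kernel classifier (RBF kernel ridge on +/-1 targets) on the selected features
gamma = 1.0 / len(selected)
K = rbf(Z_aug[:, selected], Z_aug[:, selected], gamma)
alpha = np.linalg.solve(K + np.eye(len(K)), 2.0 * y_aug - 1.0)
pred = (rbf(Z_test[:, selected], Z_aug[:, selected], gamma) @ alpha > 0).astype(int)
accuracy = (pred == y_test).mean()
print("selected features:", selected, "test accuracy:", round(accuracy, 3))
```

Because the L1 penalty sets most coefficients exactly to zero, the surviving features are the candidate signature that the downstream classifier uses, which is what makes this kind of pipeline easy to inspect.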
Building upon this foundation, the second part of the thesis introduces AugPred, an ML pipeline specifically designed for post-transplant patient follow-up. AugPred was applied to the classification of ACR versus CMV infection using miRNA expression profiles obtained from the EMBs of transplant patients. Despite the limited sample size (n=11), the pipeline demonstrates that targeted data augmentation and robust feature selection make it possible to train high-performing classifiers even in highly data-scarce scenarios. Comparative analyses confirmed its superior performance and interpretability over baseline approaches, while pathway enrichment analysis (PEA) validated the involvement of the selected miRNAs in immune and infection-related biological processes.
In the third part of the thesis, a multilayer network approach was employed to model miRNA-mRNA interactions derived from EMB samples of control patients and of patients with ACR or CMV infection. By applying an ensemble PageRank centrality strategy, we identified miRNAs with phenotype-specific regulatory roles that act as modulators between different conditions. These miRNAs were further validated through PEA to investigate their biological meaning. This network-based framework highlighted the potential of centrality-driven models to uncover molecular signatures associated with rejection and infection processes.
The fourth part is dedicated to time series modeling, initially developed in collaboration with an industrial partner in the hydrological domain. By comparing ARIMAX, LSTM recurrent neural network, and physically based models, I explored techniques for modeling sequential data, forecasting strategies, and the influence of external variables. The methodological insights gained were later transferred to the biomedical domain.
Specifically, I applied a Variational Autoencoder (VAE) deep learning architecture coupled with LASSO regression to identify clinical and

| File | Size | Format |
|---|---|---|
| tesi_Diego_Perazzolo.pdf (open access; description: tesi_Diego_Perazzolo; type: doctoral thesis) | 16.66 MB | Adobe PDF |