Dynamic Bayesian Networks and Transfer Learning Enable the Development of Deep Sequence-Based Models on Small-Sample Data

Longato E.; Tavazzi E.; Sparacino G.; Di Camillo B.
2023

Abstract

The development of prognostic models for rare diseases is often limited by the diseases' extremely low prevalence, which, in turn, restricts the range of modelling techniques that can be applied. Notably, a severe lack of training data may hinder or prevent the use of high-capacity models, such as deep neural networks, despite their known effectiveness in predicting adverse outcomes and their efficiency in exploiting longitudinal data. To address this issue, in the present work, we propose a novel methodological pipeline in which we first train a dynamic Bayesian network (DBN) on the limited available data to simulate an adequate number of virtual patients whose variables are linked over time by the same probabilistic relationships as in the original data; then, we train a deep learning model based on recurrent neural networks on the simulated data; and, finally, we apply fine-tuning, a transfer learning (TL) technique, to adapt the model to the real data. To demonstrate the potential usefulness of our approach, we apply it to the prediction of 3-year mortality in amyotrophic lateral sclerosis (ALS), a rare (<0.01% prevalence), fatal neurodegenerative disease, starting from a population of 985 patients from the Piemonte and Valle d'Aosta ALS (PARALS) register. We show that our DBN and TL pipeline effectively combines the simulated data it generates with the few available real data, leading to an 8.2% improvement in AUROC (area under the receiver operating characteristic curve) over a reference deep learning model trained only on the real data.
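The three-step pipeline described in the abstract (fit a DBN, pretrain on simulated patients, fine-tune on real patients) can be illustrated with a minimal, NumPy-only sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: all variables are binary, the last variable plays the role of the outcome, the DBN structure is fixed by hand (each variable at time t depends on its own value and on one hypothetical parent variable `PARENTS[i]` at time t-1, with structure learning omitted), the cohort sizes and learning rates are arbitrary, and a logistic regression on the flattened history stands in for the recurrent neural network so the example needs no deep learning framework. The names `fit_dbn`, `sample_dbn`, `features`, `labels`, and `train_logreg` are hypothetical.

```python
import numpy as np

# Hypothetical toy setting: binary variables, the last of which is the outcome.
# Assumed, fixed DBN structure (structure learning omitted): variable i at time t
# depends on its own value and on variable PARENTS[i] at time t-1.
PARENTS = {0: 1, 1: 2, 2: 0}


def fit_dbn(X, parents, alpha=1.0):
    """Estimate P(x_i[0]=1) and P(x_i[t]=1 | x_i[t-1], x_parent[t-1]) from binary
    data X of shape (patients, time steps, variables), with Laplace smoothing."""
    N, _, V = X.shape
    prior = (X[:, 0, :].sum(axis=0) + alpha) / (N + 2 * alpha)
    cpt = np.zeros((V, 2, 2))
    prev, curr = X[:, :-1, :], X[:, 1:, :]
    for i in range(V):
        j = parents[i]
        for a in (0, 1):
            for b in (0, 1):
                mask = (prev[:, :, i] == a) & (prev[:, :, j] == b)
                ones = curr[:, :, i][mask].sum()
                cpt[i, a, b] = (ones + alpha) / (mask.sum() + 2 * alpha)
    return prior, cpt


def sample_dbn(prior, cpt, parents, n, T, rng):
    """Simulate n virtual patients over T time steps by sampling forward in time."""
    V = prior.shape[0]
    X = np.zeros((n, T, V), dtype=int)
    X[:, 0, :] = rng.random((n, V)) < prior
    for t in range(1, T):
        for i in range(V):
            a = X[:, t - 1, i]
            b = X[:, t - 1, parents[i]]
            X[:, t, i] = rng.random(n) < cpt[i, a, b]
    return X


def features(X):
    # Inputs: the whole history except the last time step, flattened.
    return X[:, :-1, :].reshape(len(X), -1).astype(float)


def labels(X):
    # Target: the outcome variable (the last one) at the last time step.
    return X[:, -1, -1].astype(float)


def train_logreg(F, y, w=None, lr=0.1, epochs=200):
    """Full-batch gradient descent on the logistic loss. w=None trains from
    scratch; otherwise training continues from a copy of the given weights."""
    Fb = np.hstack([F, np.ones((len(F), 1))])  # append a bias column
    w = np.zeros(Fb.shape[1]) if w is None else w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Fb @ w))
        w -= lr * Fb.T @ (p - y) / len(y)
    return w


rng = np.random.default_rng(0)

# Stand-in for the small real cohort, drawn from a made-up "true" DBN.
true_prior = np.array([0.3, 0.5, 0.1])
true_cpt = rng.uniform(0.05, 0.95, size=(3, 2, 2))
X_real = sample_dbn(true_prior, true_cpt, PARENTS, n=60, T=4, rng=rng)

# Step 1: fit the DBN on the real patients and simulate many virtual patients.
prior, cpt = fit_dbn(X_real, PARENTS)
X_sim = sample_dbn(prior, cpt, PARENTS, n=5000, T=4, rng=rng)

# Step 2: pretrain the predictive model on the simulated patients only.
w_pre = train_logreg(features(X_sim), labels(X_sim))

# Step 3: fine-tune, i.e. start from the pretrained weights and continue
# training on the real patients with a smaller learning rate and fewer epochs.
w_ft = train_logreg(features(X_real), labels(X_real), w=w_pre, lr=0.01, epochs=50)
```

With an actual recurrent network (for example, an LSTM or GRU in a deep learning framework), step 3 would likewise start from the pretrained weights and continue training on the real cohort, usually with a smaller learning rate. Evaluation on held-out real patients, for instance by AUROC, is omitted here for brevity.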
Proceedings of the 8th National Congress of Bioengineering, GNB 2023
VIII Congress of the National Group of Bioengineering (GNB 2023)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3515934