Dynamic Bayesian Networks and Transfer Learning Enable the Development of Deep Sequence-Based Models on Small-Sample Data

Longato, E.; Tavazzi, E.; Chio, A.; Sparacino, G.; Di Camillo, B.

The development of prognostic models for rare diseases is often limited by the extremely low prevalence of these conditions, which restricts the range of modelling techniques that can be applied. Notably, a severe lack of training data may hinder or prevent the use of high-capacity models, such as deep neural networks, despite their known effectiveness in predicting adverse outcomes and their efficiency in exploiting longitudinal data. To address this issue, we propose a novel methodological pipeline in three steps: first, we train a dynamic Bayesian network (DBN) on the few available data and use it to simulate an adequate number of virtual patients whose variables are linked by the same probabilistic relationships over time as in the original data; then, we train a deep learning model based on recurrent neural networks on the simulated data; finally, we apply fine-tuning, a transfer learning (TL) technique, to adapt the model to the real data. To demonstrate the potential usefulness of our approach, we apply it to the prediction of 3-year mortality in amyotrophic lateral sclerosis (ALS), a rare (<0.01% prevalence), fatal neurodegenerative disease, starting from a population of 985 patients from the Piemonte and Valle d'Aosta ALS (PARALS) register. We show that our pipeline of DBN and TL effectively combines the simulated data it generates with the few available real data, leading to an 8.2% AUROC improvement over a reference deep learning model trained only on the real data.
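The three-step pipeline described above (simulate, pretrain, fine-tune) can be illustrated with a deliberately small sketch. It is not the authors' implementation, and several simplifications are assumptions made here for brevity: the DBN is reduced to a single discrete variable with first-order transitions, the recurrent network is replaced by a logistic regression on visit-frequency features, and the "real" cohort is toy data drawn from a made-up transition matrix (`TOY_TRUE`), not PARALS data. All names (`fit_transitions`, `simulate`, `to_example`, `sgd`) are hypothetical.

```python
import math
import random

N_STATES = 4   # hypothetical stages 0..2, plus DEATH (toy assumption)
DEATH = 3
N_OBS = 4      # visits visible to the predictor
LENGTH = 8     # total visits per trajectory

def fit_transitions(seqs, alpha=1.0):
    """Laplace-smoothed estimate of P(state_t | state_{t-1}).
    Stand-in for DBN learning: one variable, two time slices."""
    counts = [[alpha] * N_STATES for _ in range(N_STATES)]
    for seq in seqs:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    return [[c / sum(row) for c in row] for row in counts]

def simulate(trans, starts, rng):
    """Sample one trajectory of LENGTH visits from a transition table."""
    state = rng.choice(starts)
    seq = [state]
    for _ in range(LENGTH - 1):
        state = rng.choices(range(N_STATES), weights=trans[state])[0]
        seq.append(state)
    return seq

def to_example(seq):
    """Features: bias + state frequencies over the observed visits.
    Label: 1 if DEATH occurs anywhere in the trajectory."""
    obs = seq[:N_OBS]
    x = [1.0] + [obs.count(s) / N_OBS for s in range(N_STATES)]
    return x, int(DEATH in seq)

def sgd(weights, examples, lr, epochs, rng):
    """Logistic-regression SGD starting from the given weights.
    Stand-in for training (or fine-tuning) the recurrent model."""
    w = list(weights)
    examples = list(examples)
    for _ in range(epochs):
        rng.shuffle(examples)
        for x, y in examples:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

rng = random.Random(0)

# Toy "real" cohort: 30 trajectories from an invented ground truth.
TOY_TRUE = [[0.7, 0.3, 0.0, 0.0],
            [0.0, 0.6, 0.4, 0.0],
            [0.0, 0.0, 0.6, 0.4],
            [0.0, 0.0, 0.0, 1.0]]
real = [simulate(TOY_TRUE, [0, 1], rng) for _ in range(30)]

# Step 1: fit the simulator on the small cohort, then sample virtual patients.
trans = fit_transitions(real)
virtual = [simulate(trans, [s[0] for s in real], rng) for _ in range(2000)]

# Step 2: pretrain on the simulated data, starting from zero weights.
pretrained = sgd([0.0] * (N_STATES + 1), map(to_example, virtual),
                 lr=0.1, epochs=5, rng=rng)

# Step 3: fine-tune on the real data, starting from the pretrained
# weights and using a smaller learning rate.
fine_tuned = sgd(pretrained, map(to_example, real),
                 lr=0.01, epochs=5, rng=rng)
```

The only difference between the pretraining and fine-tuning calls is the starting point and the learning rate: fine-tuning resumes from `pretrained` rather than from zeros, so the real data adjusts the model instead of defining it from scratch. A faithful version would replace `fit_transitions` with a multi-variable DBN and `sgd` with a recurrent network.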