On the impact of mechanistic model quality and data availability in hybrid model development

Geremia, M.; Marella, T.; Ardalani, E.; Beyramysoltan, S.; Chattoraj, S.; Facco, P.; Barolo, M.; Bezzo, F.

doi:10.1016/j.compchemeng.2025.109536

This work presents a methodology for evaluating the effectiveness of hybrid modelling under varying mechanistic model quality and varying amounts of information available for model training, the latter typically expressed as the amount of available data. While hybrid models – which integrate mechanistic and data-driven components – have gained significant attention in process systems engineering, their advantages over purely mechanistic or purely data-driven alternatives remain inadequately quantified. We address this gap by investigating two critical factors: (i) the impact of mechanistic model fidelity on hybrid model performance, and (ii) the influence of calibration dataset size on prediction accuracy. The methodology is validated through an in-silico case study of baker's yeast cultivation and a real-world industrial application of ion-exchange chromatography in biopharmaceutical manufacturing. Results demonstrate that hybrid models consistently outperform purely mechanistic and purely data-driven approaches when the mechanistic component captures the fundamental process behaviours, even if it contains structural simplifications. Notably, hybrid models maintain superior predictive capability in extrapolative scenarios. However, when mechanistic knowledge is severely limited and insufficient information is available to compensate for it, the benefits of hybridisation diminish substantially. The work provides quantitative guidance to help practitioners determine when hybrid modelling is a justified investment of modelling resources in process engineering applications.