Beyond black-box models: Measuring physics-driven and data-driven contributions in hybrid process models
Facco, Pierantonio; Barolo, Massimiliano
2026
Abstract
In this work, we introduce a practical hybrid methodology that fuses first-principles knowledge with measured process data to improve model prediction and extrapolation. Using sequential-orthogonalized partial least squares (SO-PLS), we (i) remove variation already explained by physics so the data-driven component learns only from unexplained residual structures, and (ii) decompose each prediction into mechanistic and empirical contributions. This decomposition yields a simple diagnostic – the mechanistic-to-measured (M2M) ratio – that quantifies the relative influence of physics versus data and flags when the fit is dominated by empirical corrections, signaling potential degradation under extrapolation. We demonstrate the methodology on a simulated two-stage batch reactor with both well-specified and misspecified kinetics. Even with model misspecification, the mechanistic component adds useful information; however, the contribution analysis shows substantial empirical adjustment remains necessary. The M2M ratio enables practitioners to compare candidate models and select those more likely to remain reliable beyond the training domain.
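The two steps named in the abstract, removing variation already explained by physics and splitting each prediction into mechanistic and empirical parts, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single mechanistic prediction column and a single measured variable, so ordinary least squares stands in for the PLS regressions of SO-PLS. The function names, the toy data, and the `mechanistic_share` quantity are hypothetical; the abstract does not give the M2M ratio's definition, so the share computed here is only a stand-in for that kind of diagnostic.

```python
from statistics import fmean


def center(v):
    """Subtract the mean from every element."""
    m = fmean(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sequential_orthogonalized_fit(mech, meas, y):
    """Fit y on the mechanistic column first, then fit the residual on the
    part of the measured column that is orthogonal to the mechanistic one.
    Assumption: one column per block, so least squares replaces PLS."""
    m, x, yc = center(mech), center(meas), center(y)

    # Step 1: variation in y explained by the mechanistic block.
    b_mech = dot(m, yc) / dot(m, m)
    y_mech = [b_mech * v for v in m]
    resid = [a - b for a, b in zip(yc, y_mech)]

    # Step 2: remove from the measured block what the physics already spans.
    g = dot(m, x) / dot(m, m)
    x_orth = [a - g * b for a, b in zip(x, m)]

    # Step 3: the empirical part learns only from the unexplained residual.
    b_emp = dot(x_orth, resid) / dot(x_orth, x_orth)
    y_emp = [b_emp * v for v in x_orth]
    return y_mech, y_emp


def mechanistic_share(y_mech, y_emp):
    """Hypothetical diagnostic: fraction of the fitted sum of squares that
    comes from the mechanistic contribution (not the paper's M2M formula)."""
    ss_mech, ss_emp = dot(y_mech, y_mech), dot(y_emp, y_emp)
    return ss_mech / (ss_mech + ss_emp)


# Toy data (made up): the response depends mostly on the mechanistic column.
mech = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
meas = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [2.0 * a + 0.5 * b for a, b in zip(mech, meas)]

y_mech, y_emp = sequential_orthogonalized_fit(mech, meas, y)
share = mechanistic_share(y_mech, y_emp)
print(f"mechanistic share of fitted variation: {share:.3f}")
```

Because the measured column is orthogonalized before the second fit, the two contributions are uncorrelated and their sums of squares add up, which is what makes a share of this kind interpretable. A share close to 1 would indicate a fit driven by the mechanistic part, and a share close to 0 a fit dominated by empirical correction.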