Omitting continuous covariates in binary regression models: Implications for sensitivity and mediation analysis
Gasparin, Matteo; Scarpa, Bruno
2025
Abstract
In statistical analysis, Cochran's formula plays a crucial role in disentangling the relationships between marginal and conditional regression coefficients. However, its results and implications are valid only in the linear case. Despite this, because of its simplicity and interpretability, practitioners often continue to use Cochran's formula outside linear models as well. With reference to binary outcome models, we derive an approximate expression for the marginal regression coefficient when marginalization is performed over a continuous covariate and show that it mimics Cochran's formula under certain simplifying assumptions. We initially postulate a logistic link function and then show how the result can be generalized. We then explore the implications of this formula for sensitivity analysis and causal mediation analysis, thereby enlarging the set of circumstances in which explicit parametric formulations can be used to evaluate causal direct and indirect effects, which are otherwise computed via numerical integration. Simulations show that our proposed estimators perform as well as those based on numerical methods and that the additional interpretability of the explicit formulas does not compromise their precision.
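For readers unfamiliar with the linear-case result that the abstract takes as its starting point, the following is a minimal sketch of Cochran's formula for ordinary least squares: the marginal coefficient of a retained covariate equals its conditional coefficient plus the coefficient of the omitted covariate times the slope from regressing the omitted covariate on the retained one. The simulated data, the variable names (`x`, `w`, `y`) and the helper `ols` are illustrative assumptions and are not taken from the paper. The sketch does not cover the binary-outcome approximation that the paper derives.

```python
import numpy as np

# Simulated data (all values here are illustrative assumptions).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                      # retained covariate
w = 0.5 * x + rng.normal(size=n)            # continuous covariate to be omitted
y = 1.0 + 2.0 * x + 1.5 * w + rng.normal(size=n)


def ols(columns, response):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(response)), *columns])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


beta_cond = ols([x, w], y)   # conditional model: y on x and w
beta_marg = ols([x], y)      # marginal model: y on x only (w omitted)
delta = ols([x], w)          # auxiliary model: w on x

# Cochran's formula (linear case):
# marginal slope = conditional slope + (coefficient of w) * (slope of w on x)
cochran = beta_cond[1] + beta_cond[2] * delta[1]

print(f"marginal slope from direct fit: {beta_marg[1]:.6f}")
print(f"marginal slope via Cochran:     {cochran:.6f}")
```

With least-squares fits that include an intercept, the two printed values agree up to floating-point error, because the identity holds exactly in sample for linear models. No such exact identity holds for a logistic fit, which is the gap the abstract describes.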
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.