Omitting continuous covariates in binary regression models: Implications for sensitivity and mediation analysis

Gasparin, Matteo; Scarpa, Bruno
2025

Abstract

In statistical analysis, Cochran's formula plays a crucial role in disentangling the relationship between marginal and conditional regression coefficients. Its results and implications, however, are valid only in the linear case. Even so, because of its simplicity and interpretability, practitioners often continue to apply Cochran's formula outside linear models. For binary outcome models, we derive an approximate expression for the marginal regression coefficient when marginalization is performed over a continuous covariate, and we show that it mimics Cochran's formula under certain simplifying assumptions. We initially postulate a logistic link function and then show how the result can be generalized. We then explore the implications of this formula for sensitivity analysis and causal mediation analysis, thereby enlarging the set of circumstances in which explicit parametric formulations can be used to evaluate causal direct and indirect effects that would otherwise be computed via numerical integration. Simulations show that our proposed estimators perform as well as those based on numerical methods, and that the additional interpretability of the explicit formulas does not compromise their precision.
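As background for the linear case the abstract refers to, the following is a minimal sketch of Cochran's formula, not the approximation derived in the paper. It assumes a hypothetical simulated setting with one retained covariate `x` and one omitted continuous covariate `w`; the variable names, sample size, and coefficient values are illustrative assumptions. With ordinary least squares, the marginal coefficient of `x` equals the conditional coefficient plus the coefficient of `w` times the slope of `w` regressed on `x`.

```python
import numpy as np

# Hypothetical simulated data: all names and coefficient values are illustrative.
rng = np.random.default_rng(0)
n = 5_000
x = rng.normal(size=n)                              # retained covariate
w = 0.5 * x + rng.normal(size=n)                    # continuous covariate to be omitted
y = 1.0 + 2.0 * x - 1.5 * w + rng.normal(size=n)    # linear outcome model


def ols(design, response):
    """Return the least-squares coefficients of response on design."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


ones = np.ones(n)

# Conditional model: y on (1, x, w).
_, beta_cond, gamma = ols(np.column_stack([ones, x, w]), y)

# Auxiliary model: w on (1, x).
_, delta = ols(np.column_stack([ones, x]), w)

# Marginal model: y on (1, x), with w omitted.
_, beta_marg = ols(np.column_stack([ones, x]), y)

# Cochran's formula in the linear case: marginal = conditional + gamma * delta.
cochran = beta_cond + gamma * delta

print("marginal coefficient of x:", beta_marg)
print("Cochran decomposition:    ", cochran)
```

For least-squares fits the two printed values agree up to floating-point error, since the decomposition is an algebraic identity in the linear case. The same identity does not generally hold when a logistic model is fitted to a binary outcome, which is the setting the abstract addresses.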
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3552743