Comparison of Statistical Methods for Brain Age Prediction Using Neuroimaging Data

Pinamonti M.; Moretto M.; Veronese M.
2025

Abstract

The growing global aging population underscores the need for reliable biomarkers of brain aging to inform early interventions for age-related diseases. Brain age estimation, which relies on machine learning models trained on neuroimaging data, has emerged as a promising biomarker for assessing brain health. This study evaluates the performance of multiple machine learning models, both kernel-based (i.e. Support Vector Machines, Relevance Vector Machines, Gaussian Process Regression) and ensemble-based (i.e. Random Forest, Extreme Gradient Boosting), for brain age prediction using anatomical features derived from T1-weighted Magnetic Resonance Imaging (MRI) scans. A total of 25 models (ensemble models and kernel-based models with linear and non-linear kernels) were trained on the Cam-CAN dataset (n=627) using a robust cross-validation scheme and evaluated on the HCP-Aging dataset (n=607) to assess generalization. Results indicate that non-linear models, particularly the Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel, outperformed linear models, achieving a mean absolute error (MAE) of 5.89 years and an explained variance on unseen data (Prediction R2) of 0.84. Validation on the external HCP-Aging dataset revealed that Extreme Gradient Boosting (XGB) performed best on non-harmonized data, achieving an MAE of 7.45 years and a Prediction R2 of 0.64. However, after the data were harmonized across sites with the ComBat pipeline, the SVM with an RBF kernel achieved the lowest error, with an MAE of 7.05 years and a Prediction R2 of 0.63. These findings highlight the robustness of XGB to inter-dataset variability and the critical role of data harmonization for kernel-based models such as SVM. This study demonstrates the effectiveness of combining non-linear models and data harmonization techniques to improve the accuracy and generalizability of brain age prediction tools, enabling more reliable assessments of neurological health across heterogeneous datasets.
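For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how MAE and a prediction R2 are commonly computed from chronological and predicted ages. It assumes that Prediction R2 is the usual coefficient of determination evaluated on held-out subjects; the abstract does not state the exact formula the authors used. The function names and the age values are hypothetical and are not taken from the study.

```python
from statistics import fmean


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute gap, in years, between chronological and predicted age."""
    return fmean(abs(t - p) for t, p in zip(true_ages, predicted_ages))


def prediction_r2(true_ages, predicted_ages):
    """Coefficient of determination, 1 - SS_res / SS_tot, on held-out subjects.

    Assumption: this is the standard definition; the study may use a variant.
    """
    mean_age = fmean(true_ages)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_ages, predicted_ages))
    ss_tot = sum((t - mean_age) ** 2 for t in true_ages)
    return 1.0 - ss_res / ss_tot


# Made-up ages for illustration only; not data from the study.
chronological = [20.0, 40.0, 60.0, 80.0]
predicted = [25.0, 38.0, 55.0, 84.0]

mae = mean_absolute_error(chronological, predicted)
r2 = prediction_r2(chronological, predicted)
print(f"MAE = {mae:.2f} years, prediction R2 = {r2:.3f}")
# prints: MAE = 4.00 years, prediction R2 = 0.965
```

A lower MAE and a higher R2 both indicate better agreement, but they need not rank models identically. This is consistent with the external-validation results above, where XGB has the slightly higher R2 (0.64) and the harmonized SVM has the lower MAE (7.05 years).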
IEEE
2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3571335