Comparison of Statistical Methods for Brain Age Prediction Using Neuroimaging Data
Pinamonti M.; Moretto M.; Veronese M.
2025
Abstract
The growing global aging population underscores the need for reliable biomarkers of brain aging to inform early interventions for age-related diseases. Brain age estimation, which uses machine learning models trained on neuroimaging data, has emerged as a promising biomarker for assessing brain health. This study evaluates the performance of multiple machine learning models, both kernel-based (i.e., Support Vector Machines, Relevance Vector Machines, Gaussian Process Regression) and ensemble-based (i.e., Random Forest, Extreme Gradient Boosting), for brain age prediction using anatomical features derived from T1-weighted magnetic resonance imaging (MRI) scans. A total of 25 models (ensemble and kernel-based models, the latter with linear and non-linear kernels) were trained on the Cam-CAN dataset (n = 627) using a robust cross-validation scheme and evaluated on the HCP-Aging dataset (n = 607) to assess generalization. Within Cam-CAN, non-linear models, particularly the Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel, outperformed linear models; the RBF-kernel SVM achieved a mean absolute error (MAE) of 5.89 years and an explained variance on unseen data (Prediction R²) of 0.84. Validation on the external HCP-Aging dataset showed that Extreme Gradient Boosting (XGB) performed best on non-harmonized data, with an MAE of 7.45 years and a Prediction R² of 0.64. After the data were harmonized across sites with the ComBat pipeline, however, the RBF-kernel SVM achieved the lowest error, with an MAE of 7.05 years and a Prediction R² of 0.63. These findings highlight the robustness of XGB to inter-dataset variability and the critical role of data harmonization for kernel-based models such as SVM. This study demonstrates that combining non-linear models with data harmonization techniques improves the accuracy and generalizability of brain age prediction tools, enabling more reliable assessments of neurological health across heterogeneous datasets.
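The two accuracy measures reported in the abstract, MAE and Prediction R², can be made concrete with a short sketch. It assumes the conventional definitions: MAE is the mean absolute difference between chronological and predicted age, and Prediction R² is 1 - SS_res/SS_tot computed only on predictions for subjects the model was not trained on (held-out cross-validation folds, or an external set such as HCP-Aging). The exact formula used in the study is not given here, so treat this as an illustration under that assumption; the function names and the ages below are hypothetical and are not data from Cam-CAN or HCP-Aging.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Mean absolute difference (in years) between chronological and predicted age."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def prediction_r2(y_true, y_pred):
    """R^2 on held-out predictions: 1 - SS_res / SS_tot.

    Assumption: SS_tot is taken around the mean age of the evaluation set.
    Unlike in-sample R^2, the result can be negative when the model does
    worse than always predicting that mean age.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need at least two paired observations")
    mean_age = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_age) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("all true ages are identical; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out subjects (illustrative values, not study data).
ages = [25, 40, 55, 70, 85]
predicted = [30, 38, 60, 66, 80]

mae = mean_absolute_error(ages, predicted)
r2 = prediction_r2(ages, predicted)
print(f"MAE = {mae:.2f} years, Prediction R2 = {r2:.3f}")
# MAE = 4.20 years, Prediction R2 = 0.958
```

Under these definitions, R² is driven by squared errors while MAE weights every error linearly, so the two metrics need not rank models identically: on the same test set, a model with the lower MAE can still have the slightly lower R² if it makes a few larger errors. This is one possible reading of the HCP-Aging results, where the harmonized SVM has the lower MAE and XGB the marginally higher Prediction R².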