Data imputation for gait analysis of children with Fragile X Syndrome
Beghetti, Federica;Varagnolo, Damiano;Sawacha, Zimi
2025
Abstract
Missing values are a critical challenge in gait analysis datasets: because of the high variability across subjects, a large number of trials is needed to represent an individual gait pattern, yet this many trials is rarely available for pathological subjects. This study investigates whether imputation algorithms can be used to obtain a homogeneous number of trials across subjects. Specifically, Partial Least Squares (PLS) regression is applied to gait analysis data of children with Fragile X Syndrome and healthy controls and compared with traditional mean imputation strategies. For this purpose, missing values were introduced at varying percentages, and six different missingness scenarios were analyzed. The effectiveness of each imputation method was assessed by quantifying the Kullback-Leibler divergence between the imputed and original datasets. Results demonstrate that, on the available dataset, PLS regression consistently outperforms mean imputation strategies across all conditions, maintaining a lower divergence while remaining computationally efficient. These findings suggest that the more missing data a dataset exhibits, the more important it is to choose PLS regression over mean imputation approaches.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| xfragile.pdf | Open access | Published (Publisher's Version of Record) | Creative Commons | 855.16 kB | Adobe PDF |
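
The evaluation protocol summarized in the abstract (remove values, impute them, then compare the imputed and original distributions) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the data are synthetic correlated Gaussian columns rather than gait trials, values are removed completely at random from a single column, PLS is a plain single-response NIPALS fit on the complete rows, and the Kullback-Leibler divergence is estimated from smoothed histograms. The function names (`pls1_fit`, `impute_pls`, `impute_mean`, `kl_divergence`) are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components=2):
    """Single-response PLS regression via NIPALS (illustrative, not the paper's code)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                      # weight vector: covariance direction
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                         # scores
        tt = t @ t
        p = Xk.T @ t / tt                  # X loadings
        c = (yk @ t) / tt                  # y loading
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - c * t                    # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients in X space
    return x_mean, y_mean, coef

def impute_mean(data, col):
    """Replace missing entries of one column with the mean of its observed entries."""
    out = data.copy()
    missing = np.isnan(out[:, col])
    out[missing, col] = out[~missing, col].mean()
    return out

def impute_pls(data, col, n_components=2):
    """Predict missing entries of one column from the other columns with PLS."""
    out = data.copy()
    missing = np.isnan(out[:, col])
    predictors = np.delete(out, col, axis=1)
    x_mean, y_mean, coef = pls1_fit(predictors[~missing], out[~missing, col], n_components)
    out[missing, col] = y_mean + (predictors[missing] - x_mean) @ coef
    return out

def kl_divergence(original, imputed, bins=20, eps=1e-9):
    """Histogram estimate of KL(original || imputed) on shared bin edges."""
    lo = min(original.min(), imputed.min())
    hi = max(original.max(), imputed.max())
    p, _ = np.histogram(original, bins=bins, range=(lo, hi))
    q, _ = np.histogram(imputed, bins=bins, range=(lo, hi))
    p = (p + eps) / (p + eps).sum()        # smooth to avoid log(0)
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

# Synthetic stand-in for a trials-by-variables matrix (assumption: 4 correlated columns).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
complete = latent @ np.array([[1.0, 0.8, -0.6, 0.5]]) + 0.1 * rng.normal(size=(500, 4))

# Remove 40% of the values in column 0, completely at random.
with_gaps = complete.copy()
mask = rng.random(500) < 0.4
with_gaps[mask, 0] = np.nan

mean_filled = impute_mean(with_gaps, 0)
pls_filled = impute_pls(with_gaps, 0)
kl_mean = kl_divergence(complete[:, 0], mean_filled[:, 0])
kl_pls = kl_divergence(complete[:, 0], pls_filled[:, 0])
print(f"KL after mean imputation: {kl_mean:.4f}")
print(f"KL after PLS imputation:  {kl_pls:.4f}")
```

In this toy setting mean imputation piles all imputed values into a single histogram bin, which distorts the column's distribution, whereas the PLS prediction uses the correlated columns to spread the imputed values. The missingness scenarios, percentages, and divergence estimator used in the study itself are not specified here and may differ.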




