Data imputation for gait analysis of children with Fragile X Syndrome

Beghetti, Federica; Varagnolo, Damiano; Sawacha, Zimi
2025

Abstract

Missing values are a critical challenge in gait analysis datasets. Because variability across subjects is high, a large number of trials is needed to represent an individual gait pattern, yet that many trials are rarely available for pathological subjects. This study analyzes whether imputation algorithms can be used to obtain a homogeneous number of trials across subjects. Specifically, Partial Least Squares (PLS) regression is applied to gait analysis data of children with Fragile X Syndrome and healthy controls and compared with traditional mean imputation strategies. For this purpose, missing values were introduced at varying percentages, and six different missingness scenarios were analyzed. The effectiveness of each imputation method was assessed by quantifying the Kullback-Leibler divergence between the imputed and original datasets. On the available dataset, PLS regression consistently outperforms mean imputation strategies across all conditions, maintaining a lower divergence while remaining computationally efficient. The findings suggest that the more missing data a dataset exhibits, the more important it is to choose PLS regression over mean imputation approaches.
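As a rough illustration of the comparison the abstract describes, the following Python sketch imputes a partially missing variable once with the observed mean and once with a PLS regression fit, then scores each result by its Kullback-Leibler divergence from the original data. Everything specific here is an assumption of the example and not taken from the paper: the synthetic data, the 30% completely-at-random mask, the use of a single PLS component, the Gaussian approximation used to estimate the divergence, and all function names.

```python
import math
import random
import statistics


def mean_impute(y):
    """Replace each missing entry (None) with the mean of the observed entries."""
    mu = statistics.fmean(v for v in y if v is not None)
    return [mu if v is None else v for v in y]


def pls1_impute(X, y):
    """Fill missing entries of y using a one-component PLS1 fit on complete rows."""
    obs = [i for i, v in enumerate(y) if v is not None]
    n_feat = len(X[0])
    # Center predictors and target using the complete rows only.
    x_mean = [statistics.fmean(X[i][j] for i in obs) for j in range(n_feat)]
    y_mean = statistics.fmean(y[i] for i in obs)
    Xc = [[X[i][j] - x_mean[j] for j in range(n_feat)] for i in obs]
    yc = [y[i] - y_mean for i in obs]
    # Weight vector: unit direction in predictor space with maximal covariance with y.
    w = [sum(row[j] * t for row, t in zip(Xc, yc)) for j in range(n_feat)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores of the complete rows, then regression of y on that single score.
    scores = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    q = sum(s * t for s, t in zip(scores, yc)) / sum(s * s for s in scores)

    def predict(x):
        score = sum((x[j] - x_mean[j]) * w[j] for j in range(n_feat))
        return y_mean + q * score

    return [predict(X[i]) if v is None else v for i, v in enumerate(y)]


def gaussian_kl(p, q):
    """KL(P || Q) after fitting a univariate Gaussian to each sample (an approximation)."""
    mp, sp = statistics.fmean(p), statistics.pstdev(p)
    mq, sq = statistics.fmean(q), statistics.pstdev(q)
    return math.log(sq / sp) + (sp**2 + (mp - mq) ** 2) / (2 * sq**2) - 0.5


# Synthetic stand-in for gait features: two predictors and one correlated target.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y_full = [a + b + rng.gauss(0, 0.3) for a, b in X]

# Remove 30% of the target values completely at random.
missing = set(rng.sample(range(200), 60))
y_missing = [None if i in missing else v for i, v in enumerate(y_full)]

y_mean_imputed = mean_impute(y_missing)
y_pls_imputed = pls1_impute(X, y_missing)

kl_mean = gaussian_kl(y_full, y_mean_imputed)
kl_pls = gaussian_kl(y_full, y_pls_imputed)
print(f"KL after mean imputation: {kl_mean:.4f}")
print(f"KL after PLS imputation:  {kl_pls:.4f}")
```

Mean imputation fills every gap with the same value, which shrinks the variance of the imputed variable, whereas the regression-based fill retains the part of the variance explained by the predictors. On strongly correlated synthetic data like this, that difference is what the divergence picks up; how large the gap is on real gait data depends on the dataset and the missingness scenario.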
2025
IFAC-PapersOnLine
7th IFAC Conference on Intelligent Control and Automation Sciences, ICONS 2025
Files in this record:
xfragile.pdf (open access)
Type: Published (Publisher's Version of Record)
License: Creative Commons
Size: 855.16 kB
Format: Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3595341