Enhancing Robot Collaboration by Improving Human Motion Prediction Through Fine-Tuning
Casarin M.; Vanuzzo M.; Guidolin M.; Reggiani M.; Michieletto S.
2024
Abstract
In Collaborative Robotics, 3D Human Motion Prediction (HMP) is essential for proactive robot assistance. It uses observed past motion to anticipate future body trajectories, so that automation and humans can work together. Unfortunately, data collection for robotics is often expensive and time-consuming, and only limited data are available. In this work, we propose a fine-tuning approach to improve the prediction accuracy of HMP on context-specific datasets. A state-of-the-art Deep Learning model, the Position-Velocity Recurrent Encoder-Decoder (PVRED), is first pre-trained on the Human 3.6M dataset for HMP and then fine-tuned to suit specific motions. The experiments involved three smaller target datasets, considered in portions of increasing size, and two levels of PVRED architecture complexity. Compared with training from scratch, fine-tuning (i) reduced the number of training epochs, (ii) lowered the prediction error, and (iii) required a smaller dataset. Moreover, fine-tuning yielded greater benefits than increasing the PVRED complexity when training from scratch. The proposed approach successfully transferred knowledge from the source domain to the fine-tuned model, allowing it to predict human motion from a smaller target dataset. This demonstrates the significant potential of the proposed solution in practical Collaborative Robotics applications where training data are minimal.
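The pre-train-then-fine-tune workflow described in the abstract can be illustrated with a minimal sketch. This is not the PVRED model or the authors' code: it assumes a toy one-dimensional linear model trained by plain gradient descent, and the data, learning rate, tolerance, and all function names are hypothetical choices made only for illustration. It compares the epochs needed to reach a loss threshold on a small target set when starting from pre-trained weights versus from scratch.

```python
# Toy illustration only: a linear model y = w*x + b stands in for a real
# predictor such as PVRED. All data and hyperparameters are assumptions.

def mse(w, b, xs, ys):
    """Mean squared error of the linear model on (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(w, b, xs, ys, lr=0.05, tol=1e-4, max_epochs=10_000):
    """Full-batch gradient descent; returns weights and epochs used."""
    n = len(xs)
    for epoch in range(max_epochs):
        if mse(w, b, xs, ys) < tol:
            return w, b, epoch
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, max_epochs


# Large "source" dataset (assumed relation: y = 2.0*x + 1.0).
source_x = [i / 10 for i in range(31)]
source_y = [2.0 * x + 1.0 for x in source_x]

# Small "target" dataset with a related but different relation.
target_x = [0.0, 1.0, 2.0, 3.0]
target_y = [2.1 * x + 1.1 for x in target_x]

# Step 1: pre-train on the source data, starting from zero weights.
w_pre, b_pre, _ = train(0.0, 0.0, source_x, source_y, tol=1e-8)

# Step 2a: fine-tune on the target data from the pre-trained weights.
w_ft, b_ft, ft_epochs = train(w_pre, b_pre, target_x, target_y)

# Step 2b: train on the target data from scratch for comparison.
w_sc, b_sc, scratch_epochs = train(0.0, 0.0, target_x, target_y)

print(f"fine-tuning epochs: {ft_epochs}, scratch epochs: {scratch_epochs}")
```

In this toy setup the pre-trained weights start closer to the target solution, so fine-tuning reaches the threshold in fewer epochs. The target relation was chosen to be close to the source one; the sketch says nothing about the size of the effect on real motion data.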