
Long-term Human Motion Prediction in Human-Robot Collaborative Settings / Vanuzzo, Michael. - (2026 Mar 16).

Long-term Human Motion Prediction in Human-Robot Collaborative Settings

VANUZZO, MICHAEL
2026

Abstract

Anticipating human motion is a fundamental component of human intelligence and a key requirement for intelligent systems sharing space with people. In everyday interactions, especially in collaborative activities, humans implicitly predict how others will move by interpreting ongoing motion, inferring underlying goals, and reasoning about how actions will unfold over time. This capability enables coordination, collision avoidance, and adaptation to shared objectives. Human Motion Prediction (HMP), i.e., inferring the evolution of human pose from recent observations, is therefore crucial for human-machine interaction in applications such as collaborative industrial robotics. It enables safer, more efficient, and proactive behavior by allowing intelligent robotic agents to anticipate human intentions, adapt their motion plans, and mitigate hazardous or inefficient interactions in shared workspaces.

This thesis investigates long-term full-body HMP for Human-Robot Collaboration (HRC) by jointly addressing three tightly coupled dimensions: evaluation methodology, context-aware predictive modeling, and real-time system integration. Rather than treating HMP as an isolated algorithmic problem, these dimensions are framed as components of a single engineering objective: enabling reliable, context-aware, and deployable motion prediction in collaborative scenarios, with a focus on industrial settings where safety, efficiency, and proactivity are critical.

The first contribution is an evaluation framework for full-body HMP models that overcomes the limitations of widely used error metrics. The framework combines three complementary aspects of prediction quality to capture spatial accuracy, angular precision, and motion realism. These aspects are further summarized in a composite index, enabling compact yet multidimensional comparison of models.
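The abstract does not specify how the composite index is formed. As a rough illustration of the idea only, assuming three error terms already normalized to [0, 1] (the names and the uniform weights are hypothetical, not the thesis's actual formulation), a weighted aggregation might look like:

```python
# Hypothetical sketch of a composite quality index for HMP evaluation.
# Metric names, normalization, and weights are illustrative assumptions.

def composite_index(spatial_err, angular_err, realism_err,
                    weights=(1 / 3, 1 / 3, 1 / 3)):
    """Aggregate three normalized error terms (each in [0, 1],
    lower is better) into a single score in [0, 1]."""
    terms = (spatial_err, angular_err, realism_err)
    if not all(0.0 <= t <= 1.0 for t in terms):
        raise ValueError("each error term must be normalized to [0, 1]")
    return sum(w * t for w, t in zip(weights, terms))

# Example: a model with low spatial error but poor motion realism.
score = composite_index(0.10, 0.20, 0.60)
```

Any monotone aggregation would serve the same purpose; the point is that a single scalar preserves comparability while the three underlying terms retain the multidimensional picture.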
Applied to state-of-the-art architectures on different datasets and frame rates, the framework reveals how data characteristics and optimization criteria affect predictive performance, providing practical guidelines for model selection under application-specific requirements.

Building on these insights, the second contribution develops context-enriched HMP models that exploit semantic and spatial cues available in structured environments. The proposed STAHMP model conditions forecasts on both past motion and natural-language descriptions of the ongoing action, leveraging Large Language Models (LLMs) to obtain task embeddings that are fused with motion features in an encoder-decoder architecture. Spatial context is incorporated through the PADSO model, which describes human-object interactions via a graph-based representation of human joints and object bounding boxes. The PADEON model extends this approach to more complex environments by additionally encoding the semantic type of environmental elements while preserving real-time performance. Experiments on established datasets and on a newly collected industrial dataset show that semantic and spatial conditioning improve long-term accuracy and robustness to domain shifts.

The third contribution is the Real-Time Human Motion Prediction (RT-HMP) framework, a modular ROS 2-based architecture that integrates full-body HMP into robotic pipelines. RT-HMP standardizes heterogeneous sensing inputs into a common representation, exposes model-agnostic prediction interfaces, and aggregates overlapping forecasts into actionable information for downstream applications. Validation in an industrial collaborative assembly task demonstrates that real-time HMP improves event-detection reliability, reduces operator idle time, and keeps end-to-end latency within operational requirements, confirming the practical benefits of anticipatory motion prediction in HRC.
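The aggregation of overlapping forecasts mentioned above can be sketched in a minimal, framework-agnostic form. The element-wise averaging policy and the data layout below are illustrative assumptions, not RT-HMP's actual strategy:

```python
# Hypothetical sketch: merging overlapping prediction windows into one
# trajectory. The averaging policy is an illustrative assumption.
from collections import defaultdict


def aggregate_forecasts(forecasts):
    """Merge overlapping forecast windows into a single trajectory.

    `forecasts` is a list of (start_step, poses) pairs, where `poses`
    is a list of per-frame pose vectors. Frames predicted for the same
    absolute time step by several windows are averaged element-wise.
    """
    buckets = defaultdict(list)
    for start, poses in forecasts:
        for offset, pose in enumerate(poses):
            buckets[start + offset].append(pose)
    return {
        t: [sum(vals) / len(vals) for vals in zip(*pose_list)]
        for t, pose_list in sorted(buckets.items())
    }


# Two windows overlapping at step 1: their predictions are averaged.
merged = aggregate_forecasts([(0, [[0.0], [1.0]]), (1, [[3.0], [4.0]])])
```

In a deployed pipeline this logic would run on streaming messages (e.g. as a ROS 2 node) rather than on in-memory lists, but the merging step is the same.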
Overall, this thesis shows that combining multidimensional evaluation, context-aware predictive models, and a real-time integration framework enables the practical deployment of full-body HMP in industrial HRC scenarios.
Files in this record:

tesi_definitiva_Michael_Vanuzzo.pdf (Doctoral thesis, Adobe PDF, 30.08 MB; under embargo until 15/03/2029)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3594620