
Long-term Human Motion Prediction in Human-Robot Collaborative Settings / Vanuzzo, Michael. - (2026 Mar 16).

Long-term Human Motion Prediction in Human-Robot Collaborative Settings

VANUZZO, MICHAEL
2026

Abstract

Anticipating human motion is a fundamental component of human intelligence and a key requirement for intelligent systems sharing space with people. In everyday interactions, especially in collaborative activities, humans implicitly predict how others will move by interpreting ongoing motion, inferring underlying goals, and reasoning about how actions will unfold over time. This capability enables coordination, collision avoidance, and adaptation to shared objectives. Human Motion Prediction (HMP), i.e., inferring the evolution of human pose from recent observations, is therefore crucial for human-machine interaction in applications such as collaborative industrial robotics. It enables safer, more efficient, and proactive behavior by allowing intelligent robotic agents to anticipate human intentions, adapt their motion plans, and mitigate hazardous or inefficient interactions in shared workspaces.

This thesis investigates long-term full-body HMP for Human-Robot Collaboration (HRC) by jointly addressing three tightly coupled dimensions: evaluation methodology, context-aware predictive modeling, and real-time system integration. Rather than treating HMP as an isolated algorithmic problem, these dimensions are framed as components of a single engineering objective: enabling reliable, context-aware, and deployable motion prediction in collaborative scenarios, with a focus on industrial settings where safety, efficiency, and proactivity are critical.

The first contribution is an evaluation framework for full-body HMP models that overcomes the limitations of widely used error metrics. The framework combines three complementary aspects of prediction quality to capture spatial accuracy, angular precision, and motion realism. These aspects are further summarized in a composite index, enabling compact yet multidimensional comparison of models.
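The abstract does not specify how the composite index is formed. As a rough illustration of the idea only, assuming three error terms already normalized to [0, 1] (the names and the uniform weights are hypothetical, not the thesis's actual formulation), a weighted aggregation might look like:

```python
# Hypothetical sketch of a composite quality index for HMP evaluation.
# Metric names, normalization, and weights are illustrative assumptions.

def composite_index(spatial_err, angular_err, realism_err,
                    weights=(1 / 3, 1 / 3, 1 / 3)):
    """Aggregate three normalized error terms (each in [0, 1],
    lower is better) into a single score in [0, 1]."""
    terms = (spatial_err, angular_err, realism_err)
    if not all(0.0 <= t <= 1.0 for t in terms):
        raise ValueError("each error term must be normalized to [0, 1]")
    return sum(w * t for w, t in zip(weights, terms))

# Example: a model with low spatial error but poor motion realism.
score = composite_index(0.10, 0.20, 0.60)
```

Any monotone aggregation would serve the same purpose; the point is that a single scalar preserves comparability while the three underlying terms retain the multidimensional picture.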
Applied to state-of-the-art architectures on different datasets and frame rates, the framework reveals how data characteristics and optimization criteria affect predictive performance, providing practical guidelines for model selection under application-specific requirements.

Building on these insights, the second contribution develops context-enriched HMP models that exploit semantic and spatial cues available in structured environments. The proposed STAHMP model conditions forecasts on both past motion and natural-language descriptions of the ongoing action, leveraging Large Language Models (LLMs) to obtain task embeddings that are fused with motion features in an encoder-decoder architecture. Spatial context is incorporated through the PADSO model, which describes human-object interactions via a graph-based representation of human joints and object bounding boxes. The PADEON model extends this approach to more complex environments by additionally encoding the semantic type of environmental elements while preserving real-time performance. Experiments on established datasets and on a newly collected industrial dataset show that semantic and spatial conditioning improve long-term accuracy and robustness to domain shifts.

The third contribution is the Real-Time Human Motion Prediction (RT-HMP) framework, a modular ROS 2-based architecture that integrates full-body HMP into robotic pipelines. RT-HMP standardizes heterogeneous sensing inputs into a common representation, exposes model-agnostic prediction interfaces, and aggregates overlapping forecasts into actionable information for downstream applications. Validation in an industrial collaborative assembly task demonstrates that real-time HMP improves event-detection reliability, reduces operator idle time, and keeps end-to-end latency within operational requirements, confirming the practical benefits of anticipatory motion prediction in HRC.
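The aggregation of overlapping forecasts mentioned above can be sketched in a minimal, framework-agnostic form. The element-wise averaging policy and the data layout below are illustrative assumptions, not RT-HMP's actual strategy:

```python
# Hypothetical sketch: merging overlapping prediction windows into one
# trajectory. The averaging policy is an illustrative assumption.
from collections import defaultdict


def aggregate_forecasts(forecasts):
    """Merge overlapping forecast windows into a single trajectory.

    `forecasts` is a list of (start_step, poses) pairs, where `poses`
    is a list of per-frame pose vectors. Frames predicted for the same
    absolute time step by several windows are averaged element-wise.
    """
    buckets = defaultdict(list)
    for start, poses in forecasts:
        for offset, pose in enumerate(poses):
            buckets[start + offset].append(pose)
    return {
        t: [sum(vals) / len(vals) for vals in zip(*pose_list)]
        for t, pose_list in sorted(buckets.items())
    }


# Two windows overlapping at step 1: their predictions are averaged.
merged = aggregate_forecasts([(0, [[0.0], [1.0]]), (1, [[3.0], [4.0]])])
```

In a deployed pipeline this logic would run on streaming messages (e.g. as a ROS 2 node) rather than on in-memory lists, but the merging step is the same.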
Overall, this thesis shows that combining multidimensional evaluation, context-aware predictive models, and a real-time integration framework enables the practical deployment of full-body HMP in industrial HRC scenarios.
Files in this record:

tesi_definitiva_Michael_Vanuzzo.pdf (Doctoral thesis, Adobe PDF, 30.08 MB; under embargo until 15/03/2029)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3594620