Fast-EgoHOS: An Efficient Framework based on Hand-Object Segmentation for Object-Agnostic Human-to-Robot Handovers
Terreran M.; Ghidoni S.
2025
Abstract
Human-to-robot (H2R) object handovers are essential for effective human-robot collaboration in dynamic scenarios, but existing approaches often rely on large, object-specific models that limit generalization and compromise real-time performance. In this work, we present an object-agnostic framework for H2R handover composed of three main modules: hand/object segmentation, grasp generation, and grasp selection. For the segmentation module, we propose Fast-EgoHOS, a lighter and faster version of the state-of-the-art EgoHOS model, obtained by replacing parts of the network with targeted post-processing and filtering techniques. The grasp generation module is based on a pre-trained 6DOF grasp model that predicts a set of feasible grasps, from which the grasp selection module chooses the most suitable one to perform according to a set of criteria aimed at ensuring operator safety (e.g., hand-grasp distance). The framework has been validated in a real robotic setup across a variety of object types and grip configurations, confirming its generalization capability. Results demonstrate that our approach achieves performance comparable to larger models while reducing inference time to one third, enabling faster and safer operation in close-contact H2R scenarios.
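To make the grasp selection step concrete, the sketch below illustrates one way a hand-grasp distance criterion could be applied to the candidates produced by the grasp generation module. It is not the authors' implementation: the function name select_grasp, the input formats, and the 10 cm safety threshold are all hypothetical assumptions for illustration only.

import numpy as np

def select_grasp(grasp_positions, grasp_scores, hand_points, min_hand_dist=0.10):
    """Pick the highest-scoring grasp whose distance to the operator's hand
    exceeds a safety threshold (hypothetical criterion and threshold).

    grasp_positions: (N, 3) candidate grasp centers in the camera/robot frame
    grasp_scores:    (N,)   quality scores from the grasp generation model
    hand_points:     (M, 3) 3-D points of the segmented hand
    min_hand_dist:   minimum allowed hand-grasp distance in meters
    """
    grasp_positions = np.asarray(grasp_positions, dtype=float)
    grasp_scores = np.asarray(grasp_scores, dtype=float)
    hand_points = np.asarray(hand_points, dtype=float)

    # Distance from each candidate grasp to the closest hand point.
    dists = np.linalg.norm(
        grasp_positions[:, None, :] - hand_points[None, :, :], axis=-1
    ).min(axis=1)

    # Keep only grasps far enough from the hand; reject all if none are safe.
    safe = dists >= min_hand_dist
    if not np.any(safe):
        return None

    # Among the safe candidates, return the index of the best-scoring grasp.
    safe_idx = np.flatnonzero(safe)
    return safe_idx[np.argmax(grasp_scores[safe_idx])]

In practice the paper's selection module combines several criteria; this sketch shows only the distance-based safety filter as a single, simple example of how one such criterion can be evaluated over the candidate set.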




