Deep Reinforcement Learning for Motion Planning in Human Robot cooperative Scenarios

Giorgio Nicola; Stefano Ghidoni
2021

Abstract

In this paper we tackle motion planning in industrial human-robot cooperative scenarios, modeling it as a reinforcement learning problem solved in a simulated environment. The agent learns the most effective policy to reach the designated target position while avoiding collisions both with a human performing a pick-and-place task in the robot workspace and with fixed obstacles. The policy acts as a feedback (or reactive) motion planner: at each time-step it senses the surrounding environment and computes the action to be performed. We propose a novel formulation of the action that guarantees continuity of the trajectory derivatives, producing the smooth trajectories that are necessary for maximizing human trust in the robot. The action is defined as the sub-trajectory the agent must follow for the duration of one time-step, so the complete trajectory is the concatenation of the sub-trajectories computed at each time-step. The proposed method requires neither inferring the action the human is currently performing nor predicting the space the human will occupy: during training in the simulated environment the agent experiences how the human behaves in the specific scenario, and thus learns the policy that best adapts to the human's actions and movements. The proposed method is finally demonstrated in a human-robot cooperative pick-and-place scenario.
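To make the action formulation concrete, below is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the sub-trajectory parameterization, so the sketch uses a quintic polynomial per time-step whose initial position, velocity, and acceleration are matched to the end of the previous segment (one standard way to obtain the derivative continuity described above), and `policy` is a hypothetical stand-in for the learned RL policy.

import numpy as np


def quintic_segment(p0, v0, a0, p1, v1, a1, T):
    """Coefficients c[0..5] of p(t) = sum_k c_k t^k on [0, T] matching
    position/velocity/acceleration at both segment boundaries."""
    A = np.array([
        [1, 0, 0,    0,      0,       0],       # p(0)  = p0
        [0, 1, 0,    0,      0,       0],       # p'(0) = v0
        [0, 0, 2,    0,      0,       0],       # p''(0) = a0
        [1, T, T**2, T**3,   T**4,    T**5],    # p(T)  = p1
        [0, 1, 2*T,  3*T**2, 4*T**3,  5*T**4],  # p'(T) = v1
        [0, 0, 2,    6*T,    12*T**2, 20*T**3], # p''(T) = a1
    ])
    b = np.array([p0, v0, a0, p1, v1, a1])
    return np.linalg.solve(A, b)


def rollout(policy, state0, T=0.1, steps=5):
    """Concatenate per-time-step sub-trajectories into one trajectory.

    `policy` is a hypothetical stand-in for the learned policy: given the
    current (p, v, a) state it returns the target (p, v, a) at the end of
    the next time-step segment.
    """
    p, v, a = state0
    segments = []
    for _ in range(steps):
        p1, v1, a1 = policy((p, v, a))
        segments.append(quintic_segment(p, v, a, p1, v1, a1, T))
        # The next segment starts exactly where this one ends, so position,
        # velocity, and acceleration are continuous across the boundary.
        p, v, a = p1, v1, a1
    return segments


if __name__ == "__main__":
    # Toy policy: move 10% of the way toward p = 1 and come to rest.
    toy_policy = lambda s: (s[0] + 0.1 * (1.0 - s[0]), 0.0, 0.0)
    print(rollout(toy_policy, (0.0, 0.0, 0.0)))

Because three boundary derivatives are matched on each side of every segment, the concatenated trajectory is C²-continuous across segment boundaries, which is the smoothness property the abstract attributes to the proposed action formulation.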
Proceedings of the 2021 26th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA)
ISBN: 978-1-7281-2989-1
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3419265