Deep Reinforcement Learning for Motion Planning in Human Robot cooperative Scenarios
Giorgio Nicola; Stefano Ghidoni
2021
Abstract
In this paper we tackle motion planning in industrial human-robot cooperative scenarios, modeled as a reinforcement learning problem solved in a simulated environment. The agent learns the most effective policy to reach the designated target position while avoiding collisions with fixed obstacles and with a human performing a pick-and-place task in the robot workspace. The policy acts as a feedback (or reactive) motion planner: at each time-step it senses the surrounding environment and computes the action to be performed. We propose a novel formulation of the action that guarantees continuity of the trajectory derivatives, producing the smooth trajectories necessary to maximize human trust in the robot. The action is defined as the sub-trajectory the agent must follow for the duration of one time-step, so the complete trajectory is the concatenation of the sub-trajectories computed at each time-step. The proposed method does not require inferring the action the human is currently performing or predicting the space the human will occupy: during the training phase in the simulated environment, the agent experiences how the human behaves in the specific scenario, and therefore learns the policy that best adapts to the human's actions and movements. The proposed method is finally applied in a human-robot cooperative pick-and-place scenario.
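The sketch below is a minimal illustration, not the authors' implementation: it assumes a single joint and quintic polynomial sub-trajectories (the paper's exact action parameterization is not given here), and the names `plan`, `policy`, and `DT` are hypothetical. It shows the mechanism the abstract describes: matching position, velocity, and acceleration at each junction makes the concatenation of per-time-step sub-trajectories C²-continuous.

```python
# Minimal sketch (assumed quintic parameterization, single joint) of an
# action defined as a sub-trajectory whose boundary derivatives match the
# end of the previous segment, yielding a C^2-continuous full trajectory.
import numpy as np

DT = 0.2  # assumed duration of one planning time-step [s]

def quintic_segment(q0, v0, a0, q1, v1, a1, T=DT):
    """Coefficients c0..c5 of q(t) = sum_k c_k t^k on [0, T] matching the
    position/velocity/acceleration boundary conditions at both ends."""
    A = np.array([
        [1.0, 0.0, 0.0,  0.0,     0.0,      0.0],      # q(0)   = q0
        [0.0, 1.0, 0.0,  0.0,     0.0,      0.0],      # q'(0)  = v0
        [0.0, 0.0, 2.0,  0.0,     0.0,      0.0],      # q''(0) = a0
        [1.0, T,   T**2, T**3,    T**4,     T**5],     # q(T)   = q1
        [0.0, 1.0, 2*T,  3*T**2,  4*T**3,   5*T**4],   # q'(T)  = v1
        [0.0, 0.0, 2.0,  6*T,     12*T**2,  20*T**3],  # q''(T) = a1
    ])
    return np.linalg.solve(A, np.array([q0, v0, a0, q1, v1, a1]))

def eval_quintic(c, t):
    """Position, velocity and acceleration of the quintic at time t."""
    q = sum(c[k] * t**k for k in range(6))
    v = sum(k * c[k] * t**(k - 1) for k in range(1, 6))
    a = sum(k * (k - 1) * c[k] * t**(k - 2) for k in range(2, 6))
    return q, v, a

def plan(policy, q, v, a, n_steps):
    """Reactive loop: at each time-step the policy proposes the end state of
    the next sub-trajectory; the full trajectory is their concatenation."""
    segments = []
    for _ in range(n_steps):
        q1, v1, a1 = policy(q, v, a)           # hypothetical learned policy
        c = quintic_segment(q, v, a, q1, v1, a1)
        segments.append(c)
        q, v, a = eval_quintic(c, DT)          # next segment starts where this ends
    return segments

# Dummy stand-in for the learned policy: drift toward a fixed goal position.
goal = 1.0
segments = plan(lambda q, v, a: (q + 0.3 * (goal - q), 0.5 * (goal - q), 0.0),
                q=0.0, v=0.0, a=0.0, n_steps=10)
```

A quintic is the lowest-degree polynomial with enough free coefficients (six) to meet the six boundary conditions, which is why this parameterization is a natural choice for enforcing derivative continuity across segments.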