Deep Reinforcement Learning for Motion Planning in Human Robot cooperative Scenarios
Giorgio Nicola; Stefano Ghidoni
2021
Abstract
In this paper we tackle motion planning in industrial human-robot cooperative scenarios, modeled as a reinforcement learning problem solved in a simulated environment. The agent learns the most effective policy to reach the designated target position while avoiding collisions with fixed obstacles and with a human performing a pick-and-place task in the robot workspace. The policy acts as a feedback (or reactive) motion planner: at each time-step it senses the surrounding environment and computes the action to be performed. We propose a novel formulation of the action that guarantees continuity of the trajectory derivatives, producing the smooth trajectories necessary to maximize human trust in the robot. The action is defined as the sub-trajectory the agent must follow for the duration of one time-step, so the complete trajectory is the concatenation of the sub-trajectories computed at each time-step. The proposed method does not require inferring the action the human is currently performing or predicting the space the human will occupy: during the training phase in the simulated environment, the agent experiences how the human behaves in the specific scenario, and therefore learns the policy that best adapts to the human's actions and movements. The proposed method is finally applied in a human-robot cooperative pick-and-place scenario.
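The sketch below is a minimal illustration, not the authors' implementation: it assumes a single joint and quintic polynomial sub-trajectories (the paper's exact action parameterization is not given here), and the names `plan`, `policy`, and `DT` are hypothetical. It shows the mechanism the abstract describes: matching position, velocity, and acceleration at each junction makes the concatenation of per-time-step sub-trajectories C²-continuous.

```python
# Minimal sketch (assumed quintic parameterization, single joint) of an
# action defined as a sub-trajectory whose boundary derivatives match the
# end of the previous segment, yielding a C^2-continuous full trajectory.
import numpy as np

DT = 0.2  # assumed duration of one planning time-step [s]

def quintic_segment(q0, v0, a0, q1, v1, a1, T=DT):
    """Coefficients c0..c5 of q(t) = sum_k c_k t^k on [0, T] matching the
    position/velocity/acceleration boundary conditions at both ends."""
    A = np.array([
        [1.0, 0.0, 0.0,  0.0,     0.0,      0.0],      # q(0)   = q0
        [0.0, 1.0, 0.0,  0.0,     0.0,      0.0],      # q'(0)  = v0
        [0.0, 0.0, 2.0,  0.0,     0.0,      0.0],      # q''(0) = a0
        [1.0, T,   T**2, T**3,    T**4,     T**5],     # q(T)   = q1
        [0.0, 1.0, 2*T,  3*T**2,  4*T**3,   5*T**4],   # q'(T)  = v1
        [0.0, 0.0, 2.0,  6*T,     12*T**2,  20*T**3],  # q''(T) = a1
    ])
    return np.linalg.solve(A, np.array([q0, v0, a0, q1, v1, a1]))

def eval_quintic(c, t):
    """Position, velocity and acceleration of the quintic at time t."""
    q = sum(c[k] * t**k for k in range(6))
    v = sum(k * c[k] * t**(k - 1) for k in range(1, 6))
    a = sum(k * (k - 1) * c[k] * t**(k - 2) for k in range(2, 6))
    return q, v, a

def plan(policy, q, v, a, n_steps):
    """Reactive loop: at each time-step the policy proposes the end state of
    the next sub-trajectory; the full trajectory is their concatenation."""
    segments = []
    for _ in range(n_steps):
        q1, v1, a1 = policy(q, v, a)           # hypothetical learned policy
        c = quintic_segment(q, v, a, q1, v1, a1)
        segments.append(c)
        q, v, a = eval_quintic(c, DT)          # next segment starts where this ends
    return segments

# Dummy stand-in for the learned policy: drift toward a fixed goal position.
goal = 1.0
segments = plan(lambda q, v, a: (q + 0.3 * (goal - q), 0.5 * (goal - q), 0.0),
                q=0.0, v=0.0, a=0.0, n_steps=10)
```

A quintic is the lowest-degree polynomial with enough free coefficients (six) to meet the six boundary conditions, which is why this parameterization is a natural choice for enforcing derivative continuity across segments.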