A deep learning approach to visual servo control and grasp detection for autonomous robotic manipulation

Bibliographic details
Year of defense: 2020
Main author: Ribeiro, Eduardo Godinho
Advisor: Grassi Junior, Valdir
Defense committee: Not informed by the institution
Document type: Master's dissertation
Access type: Open access
Language: English (eng)
Defending institution: Universidade de São Paulo (USP)
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese: Aprendizagem profunda; Controle servo-visual; Preensão robótica
Keywords in English: Deep learning; Robotic grasping; Visual servoing
Access link: https://www.teses.usp.br/teses/disponiveis/18/18153/tde-25092020-134758/
Abstract: Advances in robotics and artificial intelligence have not yet enabled robots to execute, with dexterity, simple actions that humans perform. One of them is the grasping of objects by robotic manipulators. Exploring the use of deep learning algorithms, specifically convolutional neural networks, for the robotic grasping problem, this work addresses the visual perception phase of the task: processing visual data to obtain the location of the object to be grasped, its pose, and the points at which the robot's gripper must make contact to ensure a stable grasp. To this end, the Cornell Grasping dataset is used to train a convolutional neural network capable of handling these three stages simultaneously. In other words, given an image of the robot's workspace containing a certain object, the network predicts a grasp rectangle that represents the position, orientation, and opening of the robot's parallel gripper in the instant before it closes. In addition to this network, which processes images in real time, a second network is designed to handle situations in which the object moves in the environment. This second convolutional network is trained to perform visual servo control that keeps the object in the robot's field of view: it predicts proportional values of the linear and angular velocities that the camera must have so that the object always appears in the image processed by the grasp network. The training dataset was generated, with reduced human supervision, by a Kinova Gen3 robotic manipulator with seven degrees of freedom. The same robot is used to evaluate real-time applicability and to obtain practical results from the designed algorithms. In addition, the offline results obtained on validation sets are analyzed and discussed in terms of accuracy and processing speed. The grasping results exceed 90% accuracy with state-of-the-art prediction speed. Regarding visual servoing, one of the designed models achieves millimeter positioning accuracy for a previously unseen object. In a small evaluation, the complete system successfully tracked and grasped previously unseen dynamic objects in 85% of attempts. This work therefore presents a new system for autonomous robotic manipulation that generalizes to different objects and runs at high processing speed, enabling its application in real-time, real-world robotic systems.
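To make the abstract's two network outputs concrete, here is a minimal sketch assuming the standard five-parameter grasp-rectangle encoding used with the Cornell Grasping dataset and a simple proportional camera-velocity command. The class, the function, and the gain value are illustrative assumptions, not the author's implementation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class GraspRectangle:
    """Five-parameter grasp rectangle in image coordinates: the pose of
    the parallel gripper in the instant before it closes, as annotated
    in the Cornell Grasping dataset."""
    x: float       # rectangle center, horizontal pixel coordinate
    y: float       # rectangle center, vertical pixel coordinate
    theta: float   # rectangle orientation w.r.t. the image x-axis (radians)
    width: float   # gripper opening (pixels)
    height: float  # gripper jaw size (pixels)

def camera_twist(servo_prediction, gain=0.5):
    """Scale the servo network's proportional output into a camera
    velocity command [vx, vy, vz, wx, wy, wz] intended to keep the
    object in the field of view. `gain` is a hypothetical constant."""
    servo_prediction = np.asarray(servo_prediction, dtype=float)
    assert servo_prediction.shape == (6,), "expected 6 velocity components"
    return gain * servo_prediction

# Hypothetical usage: in the dissertation these values would come from
# the two trained CNNs rather than being hard-coded.
grasp = GraspRectangle(x=112.0, y=96.0, theta=0.35, width=40.0, height=18.0)
twist = camera_twist([0.10, -0.05, 0.0, 0.0, 0.0, 0.20])
print(grasp, twist)
```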
id USP_42bdfc4c1dfe1029f69fc09d23a48f80
oai_identifier_str oai:teses.usp.br:tde-25092020-134758
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
dc.title.none.fl_str_mv A deep learning approach to visual servo control and grasp detection for autonomous robotic manipulation
Uma abordagem baseada em aprendizagem profunda para controle servo-visual e detecção de pontos de preensão para manipulação robótica autônoma
dc.contributor.none.fl_str_mv Grassi Junior, Valdir
dc.contributor.author.fl_str_mv Ribeiro, Eduardo Godinho
dc.subject.por.fl_str_mv Aprendizagem profunda
Controle servo-visual
Deep learning
Preensão robótica
Robotic grasping
Visual servoing
publishDate 2020
dc.date.none.fl_str_mv 2020-04-16
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/18/18153/tde-25092020-134758/
dc.language.iso.fl_str_mv eng
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br || atendimento@aguia.usp.br