Graph to sequence syntactic pattern recognition for image classification problems

Bibliographic details
Year of defense: 2021
Main author: Gilberto Astolfi
Advisor: Hemerson Pistori
Defense committee: Not informed by the institution
Document type: Doctoral thesis
Access type: Open access
Language: Portuguese (por)
Institution: Fundação Universidade Federal de Mato Grosso do Sul
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Brazil
Keywords (Portuguese): Syntactic Pattern Recognition, Recurrent Neural Network, Visual Word, Computer Vision
Access link: https://repositorio.ufms.br/handle/123456789/3913
Abstract: A growing interest in applying Natural Language Processing (NLP) models to computer vision problems has recently emerged. This interest is motivated by the success of NLP models in tasks such as translation and text summarization. In this work, a new method for applying NLP to image classification problems is proposed. The aim is to represent the visual patterns of objects using a sequence of alphabet symbols and then train some form of Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), or Transformer on these sequences to classify objects. Representing the visual patterns of objects syntactically allows NLP models to be applied to image classification problems in a natural way, i.e., in the same way they are applied to natural language problems. Two approaches to representing the visual patterns of objects syntactically were investigated: representation using keypoints and representation using component parts of objects. In the keypoint-based approach, keypoints are identified in the images, associated with alphabet symbols, and then related using a graph to derive strings from the images. These strings are the inputs for training an LSTM encoder. Experiments showed evidence that the syntactic pattern representation can capture visual variations in superpixel images captured by Unmanned Aerial Vehicles, even when only a small set of images is available for training. In the approach that uses component parts of objects, the component parts are provided by means of bounding boxes in the images. The component parts are associated with alphabet symbols and related to each other to derive a sequence of symbols that represents the object's visual pattern. Then, some form of GRU, LSTM, or Transformer is trained to learn the spatial relation between the component parts of the objects contained in the sequences.
An extensive experimental evaluation using a limited number of training samples was conducted to compare our method with the ResNet-50 deep learning architecture. The results achieved by the proposed method outperform ResNet-50 in all test scenarios. In one test, the method achieves an average accuracy of 95.3% versus 89.9% for ResNet-50. Both experiments showed evidence that, from a finite set of primitive structures, it is possible to obtain many variations in the visual pattern of an object, even when there are few samples for training. Moreover, the experiments showed that NLP models can be applied in a natural way to image classification problems in computer vision.
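The keypoint-based pipeline described above (keypoints → alphabet symbols → graph → string) can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual algorithm: it assumes each keypoint already carries an alphabet symbol (e.g., obtained by clustering its descriptor) and reduces the graph step to a greedy nearest-neighbor chain; the function name `derive_string` and the left-most-start ordering heuristic are hypothetical choices for the sketch.

```python
import math

def derive_string(keypoints):
    """keypoints: list of (x, y, symbol) tuples. Returns one string per image.

    Sketch only: connects keypoints into a chain by greedy nearest-neighbor
    hops and reads off the symbols in visiting order.
    """
    if not keypoints:
        return ""
    remaining = list(keypoints)
    # Start from the left-most keypoint for a deterministic ordering.
    current = min(remaining, key=lambda p: (p[0], p[1]))
    remaining.remove(current)
    sequence = [current[2]]
    while remaining:
        # Greedily hop along a graph edge to the nearest unvisited keypoint.
        nxt = min(remaining, key=lambda p: math.dist(current[:2], p[:2]))
        remaining.remove(nxt)
        sequence.append(nxt[2])
        current = nxt
    return "".join(sequence)

kps = [(0, 0, "a"), (5, 1, "b"), (1, 1, "a"), (6, 0, "c")]
print(derive_string(kps))  # -> "aabc"
```

Strings derived this way would then serve as the token sequences fed to an LSTM encoder (or a GRU/Transformer), exactly as sentences are fed to an NLP model.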