Graph to sequence syntactic pattern recognition for image classification problems

Bibliographic details
Year of defense: 2021
Main author: Gilberto Astolfi
Advisor: Hemerson Pistori
Defense committee: Not informed by the institution
Document type: Doctoral thesis
Access type: Open access
Language: Portuguese (por)
Institution: Fundação Universidade Federal de Mato Grosso do Sul
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Brazil
Keywords (Portuguese): Syntactic Pattern Recognition, Recurrent Neural Network, Visual Word, Computer Vision
Access link: https://repositorio.ufms.br/handle/123456789/3913
Abstract: A growing interest in applying Natural Language Processing (NLP) models to computer vision problems has recently emerged. This interest is motivated by the success of NLP models in tasks such as translation and text summarization. In this work, a new method for applying NLP to image classification problems is proposed. The aim is to represent the visual patterns of objects using a sequence of alphabet symbols and then train some form of Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), or Transformer on these sequences to classify objects. Representing the visual patterns of objects syntactically allows NLP models to be applied to image classification problems in a natural way, i.e., in the same way they are applied to natural language problems. Two approaches to representing the visual patterns of objects syntactically were investigated: representation using keypoints and representation using component parts of objects. In the keypoint-based approach, keypoints are identified in the images, associated with alphabet symbols, and then related using a graph to derive strings from the images. These strings are the inputs for training an LSTM encoder. Experiments showed evidence that the syntactic pattern representation can capture visual variations in superpixel images captured by Unmanned Aerial Vehicles, even when only a small set of images is available for training. In the approach that uses component parts of objects, the component parts are provided by means of bounding boxes in the images. The component parts are associated with alphabet symbols and related to each other to derive a sequence of symbols that represents the object's visual pattern. Then, some form of GRU, LSTM, or Transformer is trained to learn the spatial relation between the component parts of the objects contained in the sequences.
An extensive experimental evaluation using a limited number of training samples was conducted to compare our method with the ResNet-50 deep learning architecture. The results achieved by the proposed method outperform ResNet-50 in all test scenarios. In one test, the method achieves an average accuracy of 95.3% versus 89.9% for ResNet-50. Both experiments showed evidence that, from a finite set of primitive structures, it is possible to obtain many variations in the visual pattern of an object, even when there are few samples for training. Moreover, the experiments showed that NLP models can be applied in a natural way to image classification problems in computer vision.
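The keypoint-based pipeline described above (keypoints → alphabet symbols → graph → string) can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual algorithm: it assumes each keypoint already carries an alphabet symbol (e.g., obtained by clustering its descriptor) and reduces the graph step to a greedy nearest-neighbor chain; the function name `derive_string` and the left-most-start ordering heuristic are hypothetical choices for the sketch.

```python
import math

def derive_string(keypoints):
    """keypoints: list of (x, y, symbol) tuples. Returns one string per image.

    Sketch only: connects keypoints into a chain by greedy nearest-neighbor
    hops and reads off the symbols in visiting order.
    """
    if not keypoints:
        return ""
    remaining = list(keypoints)
    # Start from the left-most keypoint for a deterministic ordering.
    current = min(remaining, key=lambda p: (p[0], p[1]))
    remaining.remove(current)
    sequence = [current[2]]
    while remaining:
        # Greedily hop along a graph edge to the nearest unvisited keypoint.
        nxt = min(remaining, key=lambda p: math.dist(current[:2], p[:2]))
        remaining.remove(nxt)
        sequence.append(nxt[2])
        current = nxt
    return "".join(sequence)

kps = [(0, 0, "a"), (5, 1, "b"), (1, 1, "a"), (6, 0, "c")]
print(derive_string(kps))  # -> "aabc"
```

Strings derived this way would then serve as the token sequences fed to an LSTM encoder (or a GRU/Transformer), exactly as sentences are fed to an NLP model.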