Multi-task Learning Applied to Computer Vision Problems

Bibliographic details
Year of defense: 2022
Main author: DIOGO NUNES GONCALVES
Advisor: Hemerson Pistori
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: por
Defending institution: Fundação Universidade Federal de Mato Grosso do Sul
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Brazil
Keywords in Portuguese:
Access link: https://repositorio.ufms.br/handle/123456789/4841
Abstract: Deep learning has been widely studied, mainly to solve problems considered complex. In general, these problems can be described and divided into a set of tasks. These tasks are intrinsic to the general problem: they are naturally defined because they are part of the essence of the problem. They can be learned in isolation, but they are all related to the solution of the general problem. Another important factor is that, for a larger computer vision problem, performing the distinct tasks individually becomes expensive in memory and inference time. To address these issues, an approach called Multi-task Learning (MTL) was proposed. The idea is to simulate human learning, in which people learn new tasks by drawing on experience gained from learning similar tasks. This approach allows the tasks to be learned simultaneously, building a relationship between them. Following these directions, this work, in the form of a collection of articles, presents MTL approaches for solving computer vision problems. Initially, two problems were addressed: detection of plantation lines in the first article and detection of fingerlings in the second. In the detection of plantation lines, the problem is divided into identifying the plants individually and detecting the plantation lines. In fingerling detection, the tasks are detecting the fingerling and identifying its direction in subsequent frames. For both problems, a method was proposed with a backbone that extracts the initial features for all tasks. Taking the initial features as input, independent branches learn the solution of each task. The exchange of information between tasks occurs through the concatenation of features extracted at specific points in each branch. The results showed that sharing between tasks is important for the solution, achieving results superior to the state of the art.
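The shared-backbone design described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the two-task setup, the fully connected layers, the random weights, and every name and dimension (`TwoTaskNet`, `shared_dim`, `branch_dim`, and so on) are assumptions chosen only to show the data flow, namely one backbone, one branch per task, and a single exchange point where each branch concatenates the other branch's features onto its own.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(in_dim, out_dim):
    """Random weights for a toy fully connected layer (illustrative only)."""
    return rng.standard_normal((in_dim, out_dim)) * 0.1


def relu(x):
    return np.maximum(x, 0.0)


class TwoTaskNet:
    """Hypothetical two-task network: shared backbone plus two branches."""

    def __init__(self, in_dim=16, shared_dim=8, branch_dim=4, out_a=1, out_b=2):
        # Backbone: extracts the initial features used by every task.
        self.backbone = dense(in_dim, shared_dim)
        # First stage of each task-specific branch.
        self.a1 = dense(shared_dim, branch_dim)
        self.b1 = dense(shared_dim, branch_dim)
        # Second stage: input is doubled because of the concatenation.
        self.a2 = dense(2 * branch_dim, out_a)
        self.b2 = dense(2 * branch_dim, out_b)

    def forward(self, x):
        shared = relu(x @ self.backbone)
        feat_a = relu(shared @ self.a1)
        feat_b = relu(shared @ self.b1)
        # Exchange point: each branch sees its own and the other's features.
        feat_a_cat = np.concatenate([feat_a, feat_b], axis=-1)
        feat_b_cat = np.concatenate([feat_b, feat_a], axis=-1)
        return feat_a_cat @ self.a2, feat_b_cat @ self.b2


net = TwoTaskNet()
x = rng.standard_normal((3, 16))  # batch of 3 toy input vectors
out_a, out_b = net.forward(x)
print(out_a.shape, out_b.shape)  # (3, 1) (3, 2)
```

A real model for the problems named in the abstract would presumably use convolutional layers and concatenate feature maps along the channel axis; the sketch keeps only the wiring.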
In addition to the two proposals, a new semantic segmentation method using MTL and an attention mechanism was proposed. The main advance was the use of weights learned by Transformers to indicate the importance of one task to the others. Thus, only image regions considered relevant influence other tasks. Results on two problems, segmentation of plantation lines and gaps, and segmentation of leaves and defoliation, showed the effectiveness of the approach compared to the state of the art.
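The abstract does not specify how the Transformer weights are computed, so the following is only a generic sketch of scaled dot-product cross-attention between two tasks, not the method of the thesis. The function name `cross_task_attention`, the projection matrices, and all dimensions are hypothetical. It shows how learned weights can decide which regions of one task's features influence another task: each region of task A receives a weighted mix of task B's region features, with weights that sum to 1 per region.

```python
import numpy as np


def softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_task_attention(feat_a, feat_b, w_q, w_k, w_v):
    """Let task A attend to task B's region features (generic sketch).

    feat_a, feat_b: (n_regions, channels) features, one row per image region.
    Returns the fused task-A features and the (n_regions, n_regions) weights.
    """
    q = feat_a @ w_q  # queries come from the receiving task
    k = feat_b @ w_k  # keys come from the contributing task
    v = feat_b @ w_v  # values carry the information to be shared
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    # Residual connection: task A keeps its own features and adds the
    # attention-weighted contribution from task B.
    return feat_a + weights @ v, weights


rng = np.random.default_rng(0)
n_regions, channels, d = 6, 4, 3  # toy sizes, chosen arbitrarily
feat_a = rng.standard_normal((n_regions, channels))
feat_b = rng.standard_normal((n_regions, channels))
w_q = rng.standard_normal((channels, d))
w_k = rng.standard_normal((channels, d))
w_v = rng.standard_normal((channels, channels))

fused, weights = cross_task_attention(feat_a, feat_b, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Regions of task B that receive near-zero weight contribute almost nothing to task A, which matches the abstract's statement that only relevant regions influence other tasks.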
id UFMS_9b3c9e7edde24ac4058b6836d7a5409c
oai_identifier_str oai:repositorio.ufms.br:123456789/4841
network_acronym_str UFMS
network_name_str Repositório Institucional da UFMS
repository_id_str
dc.title.pt_BR.fl_str_mv Multi-task Learning Applied to Computer Vision Problems
author DIOGO NUNES GONCALVES
author_facet DIOGO NUNES GONCALVES
author_role author
dc.contributor.advisor1.fl_str_mv Hemerson Pistori
dc.contributor.author.fl_str_mv DIOGO NUNES GONCALVES
contributor_str_mv Hemerson Pistori
dc.subject.por.fl_str_mv Deep learning
multi-task learning
identifying plantation lines
fingerling counting.
topic Deep learning
multi-task learning
identifying plantation lines
fingerling counting.
publishDate 2022
dc.date.accessioned.fl_str_mv 2022-06-10T13:16:52Z
dc.date.available.fl_str_mv 2022-06-10T13:16:52Z
dc.date.issued.fl_str_mv 2022
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://repositorio.ufms.br/handle/123456789/4841
url https://repositorio.ufms.br/handle/123456789/4841
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Fundação Universidade Federal de Mato Grosso do Sul
dc.publisher.initials.fl_str_mv UFMS
dc.publisher.country.fl_str_mv Brasil
publisher.none.fl_str_mv Fundação Universidade Federal de Mato Grosso do Sul
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFMS
instname:Universidade Federal de Mato Grosso do Sul (UFMS)
instacron:UFMS
instname_str Universidade Federal de Mato Grosso do Sul (UFMS)
instacron_str UFMS
institution UFMS
reponame_str Repositório Institucional da UFMS
collection Repositório Institucional da UFMS
bitstream.url.fl_str_mv https://repositorio.ufms.br/bitstream/123456789/4841/-1/modelotex_submetido.pdf
bitstream.checksum.fl_str_mv a3b420d0b69b42ccfdec5304320576f1
bitstream.checksumAlgorithm.fl_str_mv MD5
repository.name.fl_str_mv Repositório Institucional da UFMS - Universidade Federal de Mato Grosso do Sul (UFMS)
repository.mail.fl_str_mv ri.prograd@ufms.br
_version_ 1845881968024092672