Image classification with Fisher Vectors computed with multilevel features extracted from deep filter banks

Bibliographic details
Year of defense: 2024
Main author: Lyra, Lucas de Oliveira
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/45/45132/tde-18062025-191437/
Abstract: In this thesis, we investigate the use of Fisher Vectors (FV) for encoding multilevel features extracted from deep neural networks in image classification. Specifically, we employ Convolutional Neural Networks (CNNs) and hybrid CNN + Visual Transformer (ViT) models for feature extraction. While CNNs are effective at extracting generalist features, they exhibit a locality bias, which we address using hybrid architectures. FV encoding tackles issues associated with order-sensitive encoders, such as fully connected layers, in visual texture recognition and related fields (e.g., medical image classification). Our results demonstrate that the proposed approach significantly improves CNN accuracy in visual texture recognition. Although the approach is useful when data availability is limited, scaling it to larger datasets remains a challenge. To mitigate this, we propose a method for reducing the computational cost of FV encoding. We rigorously evaluate the robustness of this method and apply it to larger datasets in the context of medical image classification. Additionally, we explore the impact of fine-tuning on the model's performance. Overall, our approach proves suitable for both small and large datasets and is competitive with the existing literature.
id USP_186e988a64ccc3f94a649a7adf7d96da
oai_identifier_str oai:teses.usp.br:tde-18062025-191437
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
dc.title.none.fl_str_mv Image classification with Fisher Vectors computed with multilevel features extracted from deep filter banks
Classificação de imagens com Vetores de Fisher calculados com descritores de vários níveis de bancos de filtros profundos
title Image classification with Fisher Vectors computed with multilevel features extracted from deep filter banks
author Lyra, Lucas de Oliveira
author_facet Lyra, Lucas de Oliveira
author_role author
dc.contributor.none.fl_str_mv Fabris, Antonio Elias
Florindo, João Batista
dc.contributor.author.fl_str_mv Lyra, Lucas de Oliveira
dc.subject.por.fl_str_mv Classificação de imagens médicas
Convolutional Neural Networks
Fisher Vectors
Medical image classification
Reconhecimento de textura
Redes Neurais Convolucionais
Texture recognition
Transformadores visuais
Vetores de Fisher
Visual transformers
publishDate 2024
dc.date.none.fl_str_mv 2024-08-23
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/45/45132/tde-18062025-191437/
url https://www.teses.usp.br/teses/disponiveis/45/45132/tde-18062025-191437/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1865492290706866176