Image classification with Fisher Vectors computed with multilevel features extracted from deep filter banks
| Defense year: | 2024 |
|---|---|
| Main author: | Lyra, Lucas de Oliveira |
| Advisor: | |
| Examination committee: | |
| Document type: | Thesis |
| Access type: | Open access |
| Language: | eng |
| Defending institution: |
Biblioteca Digital de Teses e Dissertações da USP
|
| Graduate program: |
Not provided by the institution
|
| Department: |
Not provided by the institution
|
| Country: |
Not provided by the institution
|
| Keywords in Portuguese: | |
| Access link: | https://www.teses.usp.br/teses/disponiveis/45/45132/tde-18062025-191437/ |
Abstract: | In this thesis, we investigate the use of Fisher Vectors (FV) for encoding multilevel features extracted from deep neural networks in image classification. Specifically, we employ Convolutional Neural Networks (CNNs) and hybrid CNN + Visual Transformer (ViT) models for feature extraction. While CNNs are effective at extracting generalist features, they exhibit a locality bias, which we address using hybrid architectures. The FV encoding method tackles issues related to order-sensitive encoders, such as fully connected layers, in visual texture recognition and related fields (e.g., medical image classification). Our results demonstrate that the proposed approach significantly improves CNN accuracy in visual texture recognition. Although FV encoding is useful when data availability is limited, its scalability to larger datasets remains a challenge. To mitigate this, we propose a method for reducing the computational cost of FV encoding. We rigorously evaluate the robustness of this method and apply it to larger datasets within the context of medical image classification. Additionally, we explore the impact of fine-tuning on the model's performance. Finally, our approach proves suitable for both small and large datasets and is competitive with the existing literature. |
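
The abstract contrasts FV encoding with order-sensitive encoders such as fully connected layers. The sketch below illustrates that property under stated assumptions; it is not the thesis's implementation. It uses the standard improved Fisher Vector formulation (gradients with respect to the means and variances of a diagonal-covariance GMM, followed by power and L2 normalization). The function name `fisher_vector`, the random GMM parameters, and the toy 8×5×5 feature map are hypothetical and chosen only for illustration; in practice the GMM would be fitted to descriptors from training images.

```python
import numpy as np


def fisher_vector(x, weights, means, variances):
    """Improved Fisher Vector of local descriptors under a diagonal GMM.

    x: (N, D) local descriptors; weights: (K,); means, variances: (K, D).
    Returns a vector of length 2 * K * D.
    """
    n = x.shape[0]
    diff = x[:, None, :] - means[None, :, :]  # (N, K, D)

    # Log-density of each descriptor under each Gaussian component.
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )
    # Posterior responsibilities gamma[n, k], computed stably in log space.
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Normalized gradients with respect to means and variances.
    z = diff / np.sqrt(variances)
    g_mu = np.einsum("nk,nkd->kd", gamma, z)
    g_mu /= n * np.sqrt(weights)[:, None]
    g_var = np.einsum("nk,nkd->kd", gamma, z ** 2 - 1.0)
    g_var /= n * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_var.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))  # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv  # L2 normalization


# Toy example: a (channels, height, width) feature map becomes
# height * width local descriptors of dimension `channels`.
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 5, 5))        # hypothetical CNN activations
descriptors = feature_map.reshape(8, -1).T      # (25, 8)

# Hypothetical GMM parameters (normally fitted on training descriptors).
weights = np.full(3, 1.0 / 3.0)
means = rng.normal(size=(3, 8))
variances = np.ones((3, 8))

fv = fisher_vector(descriptors, weights, means, variances)

# Shuffling the spatial positions leaves the encoding unchanged.
shuffled = descriptors[rng.permutation(descriptors.shape[0])]
fv_shuffled = fisher_vector(shuffled, weights, means, variances)
```

Because the encoding only sums per-descriptor statistics, `fv` and `fv_shuffled` agree up to floating-point error, whereas a fully connected layer applied to the flattened feature map would depend on the spatial order.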
| id |
USP_186e988a64ccc3f94a649a7adf7d96da |
|---|---|
| oai_identifier_str |
oai:teses.usp.br:tde-18062025-191437 |
| network_acronym_str |
USP |
| network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
| repository_id_str |
|
| dc.title.none.fl_str_mv |
Image classification with Fisher Vectors computed with multilevel features extracted from deep filter banks / Classificação de imagens com Vetores de Fisher calculados com descritores de vários níveis de bancos de filtros profundos |
| title |
Image classification with Fisher Vectors computed with multilevel features extracted from deep filter banks |
| author |
Lyra, Lucas de Oliveira |
| author_facet |
Lyra, Lucas de Oliveira |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Fabris, Antonio Elias; Florindo, João Batista |
| dc.contributor.author.fl_str_mv |
Lyra, Lucas de Oliveira |
| dc.subject.por.fl_str_mv |
Classificação de imagens médicas; Convolutional Neural Networks; Fisher Vectors; Medical image classification; Reconhecimento de textura; Redes Neurais Convolucionais; Texture recognition; Transformadores visuais; Vetores de Fisher; Visual transformers |
| topic |
Classificação de imagens médicas; Convolutional Neural Networks; Fisher Vectors; Medical image classification; Reconhecimento de textura; Redes Neurais Convolucionais; Texture recognition; Transformadores visuais; Vetores de Fisher; Visual transformers |
| description |
In this thesis, we investigate the use of Fisher Vectors (FV) for encoding multilevel features extracted from deep neural networks in image classification. Specifically, we employ Convolutional Neural Networks (CNNs) and hybrid CNN + Visual Transformer (ViT) models for feature extraction. While CNNs are effective at extracting generalist features, they exhibit a locality bias, which we address using hybrid architectures. The FV encoding method tackles issues related to order-sensitive encoders, such as fully connected layers, in visual texture recognition and related fields (e.g., medical image classification). Our results demonstrate that the proposed approach significantly improves CNN accuracy in visual texture recognition. Although FV encoding is useful when data availability is limited, its scalability to larger datasets remains a challenge. To mitigate this, we propose a method for reducing the computational cost of FV encoding. We rigorously evaluate the robustness of this method and apply it to larger datasets within the context of medical image classification. Additionally, we explore the impact of fine-tuning on the model's performance. Finally, our approach proves suitable for both small and large datasets and is competitive with the existing literature. |
| publishDate |
2024 |
| dc.date.none.fl_str_mv |
2024-08-23 |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
| format |
doctoralThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
https://www.teses.usp.br/teses/disponiveis/45/45132/tde-18062025-191437/ |
| url |
https://www.teses.usp.br/teses/disponiveis/45/45132/tde-18062025-191437/ |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.relation.none.fl_str_mv |
|
| dc.rights.driver.fl_str_mv |
Release the content for public access. info:eu-repo/semantics/openAccess |
| rights_invalid_str_mv |
Release the content for public access. |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.coverage.none.fl_str_mv |
|
| dc.publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
| publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
| dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
| instname_str |
Universidade de São Paulo (USP) |
| instacron_str |
USP |
| institution |
USP |
| reponame_str |
Biblioteca Digital de Teses e Dissertações da USP |
| collection |
Biblioteca Digital de Teses e Dissertações da USP |
| repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
| repository.mail.fl_str_mv |
virginia@if.usp.br; atendimento@aguia.usp.br |
| _version_ |
1865492290706866176 |