Uma Comparação De Arquiteturas Baseadas Em U-net Na Estimativa De Profundidade Monocular

Bibliographic details
Defense year: 2024
Main author: Silva, Antônio Carlos Durães da
Advisor: Gazolli, Kelly Assis de Souza
Defense committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: por
Defending institution: Serra
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://repositorio.ifes.edu.br/handle/123456789/5718
Abstract: Monocular depth estimation is a fundamental task in computer vision, with applications in areas such as augmented reality, autonomous navigation, and medical procedures. This work investigates the reuse of architectures originally developed for semantic segmentation, such as U-Net and its variants, in the task of depth estimation. Combinations of U-Net and UNet++ architectures with different encoders were implemented and evaluated, covering convolutional networks, Transformer networks, and hybrid architectures, including VGG-19 (BN), Xception, Inception-ResNet-v2, Mixed Transformer (B2), CoaT, CoAtNet, and TransUnet. The experiments were conducted on the NYU Depth V2 dataset, using multiple input sizes to investigate the impact of resolution on the results. The evaluated metrics include RMSE, rel, log10, and the accuracy thresholds δ_1, δ_2, and δ_3. The results reveal that the combination of U-Net with the CoaT-Lite (M) encoder outperforms all other evaluated approaches and network combinations. The implementation is available at: https://github.com/duraes-antonio/seg_depth.
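The abstract names the evaluation metrics without defining them. The sketch below shows how they are conventionally computed in the monocular depth literature, assuming rel is the mean absolute relative error, log10 is the mean absolute difference of base-10 logarithms, and δ_k is the fraction of pixels whose ratio max(pred/gt, gt/pred) falls below 1.25^k. These definitions are an assumption, not taken from the thesis, and the function name `depth_metrics` is hypothetical; the code in the linked repository may differ.

```python
import math
from typing import Dict, Sequence


def depth_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """Conventional depth metrics over paired, strictly positive depth values.

    Assumed definitions (not confirmed by the thesis record):
      rmse    = sqrt(mean((pred - gt)^2))
      rel     = mean(|pred - gt| / gt)
      log10   = mean(|log10(pred) - log10(gt)|)
      delta_k = fraction of pixels with max(pred/gt, gt/pred) < 1.25^k
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")

    n = len(gt)
    sq_err = 0.0
    abs_rel = 0.0
    log_err = 0.0
    hits = [0, 0, 0]  # pixel counts for delta_1, delta_2, delta_3

    for p, g in zip(pred, gt):
        if p <= 0 or g <= 0:
            raise ValueError("depths must be strictly positive")
        sq_err += (p - g) ** 2
        abs_rel += abs(p - g) / g
        log_err += abs(math.log10(p) - math.log10(g))
        ratio = max(p / g, g / p)
        for k in range(3):
            if ratio < 1.25 ** (k + 1):
                hits[k] += 1

    return {
        "rmse": math.sqrt(sq_err / n),
        "rel": abs_rel / n,
        "log10": log_err / n,
        "delta_1": hits[0] / n,
        "delta_2": hits[1] / n,
        "delta_3": hits[2] / n,
    }
```

Lower is better for rmse, rel, and log10; higher is better for the δ thresholds. In practice the inputs would be the flattened valid pixels of a predicted and a ground-truth depth map, with invalid (zero) ground-truth pixels masked out beforehand.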
id IFES-2_42db7de1aeee2c24741469f9aac9fedd
oai_identifier_str oai:repositorio.ifes.edu.br:123456789/5718
network_acronym_str IFES-2
network_name_str Repositório Institucional do IFES
repository_id_str
dc.title.pt_BR.fl_str_mv Uma Comparação De Arquiteturas Baseadas Em U-net Na Estimativa De Profundidade Monocular
author Silva, Antônio Carlos Durães da
author_facet Silva, Antônio Carlos Durães da
author_role author
dc.contributor.institution.pt_BR.fl_str_mv Instituto Federal do Espírito Santo
dc.contributor.member.none.fl_str_mv Campana, Vitor Faiçal
Komati, Karin Satie
dc.contributor.author.fl_str_mv Silva, Antônio Carlos Durães da
dc.contributor.advisor1.fl_str_mv Gazolli, Kelly Assis de Souza
contributor_str_mv Gazolli, Kelly Assis de Souza
dc.subject.por.fl_str_mv U-Net
UNet++
Monocular depth
Computer vision
Transformer networks
Convolutional neural network
publishDate 2024
dc.date.issued.fl_str_mv 2024-11-19
dc.date.accessioned.fl_str_mv 2025-02-14T22:08:45Z
dc.date.available.fl_str_mv 2025-02-14T22:08:45Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv Silva, Antonio Carlos Duraes da. Uma Comparação De Arquiteturas Baseadas Em U-Net Na Estimativa De Profundidade Monocular. 2024. 69 f. Dissertação (Mestrado em Computação Aplicada) - Instituto Federal do Espírito Santo, Campus Serra, Serra, 2024.
dc.identifier.uri.fl_str_mv https://repositorio.ifes.edu.br/handle/123456789/5718
dc.identifier.capes.pt_BR.fl_str_mv 30004012075P4
url https://repositorio.ifes.edu.br/handle/123456789/5718
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 65 f.
dc.publisher.none.fl_str_mv Serra
publisher.none.fl_str_mv Serra
dc.source.none.fl_str_mv reponame:Repositório Institucional do IFES
instname:Instituto Federal de Educação, Ciência e Tecnologia do Espírito Santo (IFES)
instacron:IFES
instname_str Instituto Federal de Educação, Ciência e Tecnologia do Espírito Santo (IFES)
instacron_str IFES
institution IFES
reponame_str Repositório Institucional do IFES
collection Repositório Institucional do IFES
bitstream.url.fl_str_mv https://repositorio.ifes.edu.br/bitstreams/2330426b-0027-4971-84fb-7d0f4ab23dba/download
https://repositorio.ifes.edu.br/bitstreams/ec7a26bf-0b84-47cd-bd1c-77eef8afcc90/download
https://repositorio.ifes.edu.br/bitstreams/22a6b194-a340-4206-b517-3b78679b2f90/download
https://repositorio.ifes.edu.br/bitstreams/4c225af8-8c3d-461e-9855-e03fe6b8cb1c/download
bitstream.checksum.fl_str_mv 9b20ddc8a5cee2b76c2c7507270fd2c5
93f2c6720a5b32d73850a875f8088184
5f08d2bbf9bc5cb6b3ad6258bd3e4a8f
ac7cb971050ed632be934da23d966924
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional do IFES - Instituto Federal de Educação, Ciência e Tecnologia do Espírito Santo (IFES)
repository.mail.fl_str_mv repositorio@ifes.edu.br
_version_ 1864451011147464704