Multi-sensor fusion with attention mechanisms for visual perception in autonomous vehicles
| Year of defense: | 2025 |
|---|---|
| Main author: | Honorato, Eduardo Sperle |
| Advisor: | Wolf, Denis Fernando |
| Defense committee: | |
| Document type: | Master's thesis (dissertação) |
| Access type: | Open access |
| Language: | eng |
| Defending institution: | Biblioteca Digital de Teses e Dissertações da USP |
| Graduate program: | Not informed by the institution |
| Department: | Not informed by the institution |
| Country: | Not informed by the institution |
| Keywords (Portuguese): | Detecção de objetos 3D; Fusão multi-sensorial; Mecanismo de atenção; Segmentação; Veículos autônomos |
| Access link: | https://www.teses.usp.br/teses/disponiveis/55/55134/tde-22092025-151759/ |
| Abstract: | Autonomous vehicles are increasingly becoming a part of urban life. However, significant challenges still need to be overcome to ensure these vehicles are both safe and efficient. One of the main challenges lies in perception, particularly in 3D object segmentation and detection, which relies on multiple sensors to enhance accuracy and operate under adverse conditions. The efficient fusion of these sensors is a crucial factor, as it directly impacts detection quality and computational cost. Modern multi-sensor fusion methods leverage Deep Learning techniques, and an emerging approach in this field is the use of attention mechanisms. These mechanisms enable more informative representations of the feature maps extracted from the sensors, highlighting the most relevant information while suppressing less significant data. In this context, this study explores the use of attention mechanisms to optimise the BEVFusion model, which has achieved state-of-the-art performance by employing a unified Camera-LiDAR fusion in a Bird's Eye View (BEV) representation. The key advantage of BEVFusion is its highly efficient transformation of camera-perspective views into the BEV representation. However, its sensor fusion approach is limited to a simple concatenation of the extracted features, which may not be the most efficient solution. Another critical aspect of BEVFusion is its high computational cost: it relies on deep learning models that demand powerful hardware, posing a challenge for deployment in autonomous vehicles. This is particularly relevant given the need for embedded hardware in such vehicles to be both cost-effective and energy-efficient. To address these issues, this study investigates and implements attention mechanisms to enhance the sensor fusion process in BEVFusion for 3D object detection and segmentation, while also improving computational efficiency. Modifications were made to reduce VRAM consumption and processing time, achieving performance comparable to the original model with lower computational demands. The results are promising, showing a 14.12% increase in IoU for the segmentation task and a 0.732% improvement in mAP for 3D object detection. Additionally, training time was reduced by a factor of 3.3, and VRAM consumption was nearly halved. |
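The abstract contrasts BEVFusion's plain channel concatenation with attention-based fusion. As a minimal illustration of the general idea only (not the thesis's actual architecture), the sketch below applies SE-style channel attention to concatenated camera and LiDAR BEV feature maps; all shapes, the squeeze ratio, and the gate-MLP weights are hypothetical and chosen for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_fusion(cam_bev, lidar_bev, w1, w2):
    """Fuse camera and LiDAR BEV feature maps with SE-style channel attention.

    cam_bev:   (C1, H, W) camera features projected into BEV space
    lidar_bev: (C2, H, W) LiDAR features in BEV space
    w1, w2:    weights of a small two-layer MLP producing per-channel gates
    """
    fused = np.concatenate([cam_bev, lidar_bev], axis=0)  # (C1+C2, H, W)
    squeeze = fused.mean(axis=(1, 2))                     # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeeze)                # ReLU bottleneck
    gates = sigmoid(w2 @ hidden)                          # per-channel weights in (0, 1)
    return fused * gates[:, None, None]                   # emphasise or suppress channels

# Toy shapes; real BEV grids and channel counts are much larger.
rng = np.random.default_rng(0)
C1, C2, H, W = 4, 4, 8, 8
cam = rng.standard_normal((C1, H, W))
lidar = rng.standard_normal((C2, H, W))
C = C1 + C2
w1 = rng.standard_normal((C // 2, C))  # squeeze ratio of 2 (hypothetical choice)
w2 = rng.standard_normal((C, C // 2))
out = channel_attention_fusion(cam, lidar, w1, w2)
print(out.shape)  # (8, 8, 8)
```

In a trained network the gate weights would be learned end to end; here they are random, so the example only demonstrates the data flow that distinguishes attention-weighted fusion from plain concatenation: concatenate, pool, gate, re-weight.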
| id | USP_d7e13dd5d7e6db7812a5e412c5b20364 |
|---|---|
| oai_identifier_str | oai:teses.usp.br:tde-22092025-151759 |
| network_acronym_str | USP |
| network_name_str | Biblioteca Digital de Teses e Dissertações da USP |
| repository_id_str | |
| dc.title.none.fl_str_mv | Multi-sensor fusion with attention mechanisms for visual perception in autonomous vehicles / Fusão multi-sensorial com mecanismos de atenção para percepção visual em veículos autônomos |
| title | Multi-sensor fusion with attention mechanisms for visual perception in autonomous vehicles |
| author | Honorato, Eduardo Sperle |
| author_role | author |
| dc.contributor.none.fl_str_mv | Wolf, Denis Fernando |
| dc.contributor.author.fl_str_mv | Honorato, Eduardo Sperle |
| dc.subject.por.fl_str_mv | 3D object detection; Attention mechanism; Autonomous vehicles; Detecção de objetos 3D; Fusão multi-sensorial; Mecanismo de atenção; Multi-sensor fusion; Segmentação; Segmentation; Veículos autônomos |
| publishDate | 2025 |
| dc.date.none.fl_str_mv | 2025-02-18 |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
| format | masterThesis |
| status_str | publishedVersion |
| dc.identifier.uri.fl_str_mv | https://www.teses.usp.br/teses/disponiveis/55/55134/tde-22092025-151759/ |
| url | https://www.teses.usp.br/teses/disponiveis/55/55134/tde-22092025-151759/ |
| dc.language.iso.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | |
| dc.rights.driver.fl_str_mv | Release the content for public access. info:eu-repo/semantics/openAccess |
| rights_invalid_str_mv | Release the content for public access. |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.coverage.none.fl_str_mv | |
| dc.publisher.none.fl_str_mv | Biblioteca Digital de Teses e Dissertações da USP |
| publisher.none.fl_str_mv | Biblioteca Digital de Teses e Dissertações da USP |
| dc.source.none.fl_str_mv | reponame:Biblioteca Digital de Teses e Dissertações da USP; instname:Universidade de São Paulo (USP); instacron:USP |
| instname_str | Universidade de São Paulo (USP) |
| instacron_str | USP |
| institution | USP |
| reponame_str | Biblioteca Digital de Teses e Dissertações da USP |
| collection | Biblioteca Digital de Teses e Dissertações da USP |
| repository.name.fl_str_mv | Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
| repository.mail.fl_str_mv | virginia@if.usp.br \|\| atendimento@aguia.usp.br \|\| virginia@if.usp.br |
| _version_ | 1848370488859951104 |