Object detection from captive balloon imagery using deep learning

Bibliographic details
Year of defense: 2020
Main author: Victória Maria Gomes Velame
Advisor(s): José Claudio Mura, Leonardo Sant'Anna Bins
Examination committee: Ney Rafael Sêcco
Document type: Master's dissertation
Access type: Open access
Language: English (eng)
Degree-granting institution: Instituto Nacional de Pesquisas Espaciais (INPE)
Graduate program: Programa de Pós-Graduação do INPE em Sensoriamento Remoto
Department: Not provided by the institution
Country: BR
Access link: http://urlib.net/sid.inpe.br/mtc-m21c/2020/03.30.16.14
Abstract: The combination of remote sensing and computer vision technologies has been used to monitor large areas in order to ensure their local security. This monitoring requires sensors with high temporal and spatial resolution. Captive balloons with infrared and visible sensors, such as the Altave system, can perform long-term day-and-night surveillance at a viable cost compared with other aerial vehicles. The Altave captive balloon system provides security for large areas by continuously monitoring people and vehicles, a task that is exhausting for human operators because of the large amount of data. To make this monitoring more efficient and less arduous, this work developed a technology based on deep learning (DL), more specifically Faster R-CNN (R-CNN: Region-based Convolutional Neural Network), capable of detecting people and vehicles in images from the infrared and visible sensors of captive balloons. The advantage of CNN object detectors is their ability to generalize, which makes them better suited to some characteristics of captive balloon images, such as objects seen from different points of view, positions, and scales. This work used videos provided by the company Altave (from its captive balloon system) to manually build two databases of about 700 images each, one for the infrared data and the other for the visible data. Since training a large CNN from scratch requires a large database and high computational power, two networks were fine-tuned from a Faster R-CNN pre-trained on RGB (red, green, blue) images. The accuracy, mean average precision (mAP), and average recall (AR) reached on the test datasets indicate the networks' high performance. The accuracy was 87.1% for the infrared network and 86.1% for the visible network.
These high accuracies demonstrate that a Faster R-CNN pre-trained only on ordinary RGB images can be fine-tuned to work satisfactorily on 3-band RGB visible remote sensing images and even on 1-band infrared images, as long as the latter are properly converted to 3-band images by repeating the infrared band on the three channels. The networks satisfactorily detected people and vehicles in images from the Altave captive balloon system. They could detect multiple objects in an image across a variety of angles, positions, types (for vehicles), and scales, even with some noise and overlap. They also produced some mistaken detections, caused by splitting one object into two or by merging two objects of the same class into one large object. These mistakes are not a significant problem for surveillance, where detecting the objects matters much more than locating or counting them.
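The band-repetition step mentioned in the abstract can be illustrated with a minimal sketch. It is not the dissertation's code: the function name replicate_band, the nested-list image layout, and the pixel values are assumptions made for illustration only. It simply copies each 1-band infrared value onto three channels so that the frame has the 3-band shape an RGB-pretrained network expects.

```python
def replicate_band(band):
    """Turn a 1-band image (rows of pixel values) into a 3-band image.

    Each output pixel is a (v, v, v) tuple: the single infrared value
    repeated on the three channels. Hypothetical helper, not from the source.
    """
    return [[(v, v, v) for v in row] for row in band]


# Made-up 2x3 infrared frame with 8-bit intensities (illustrative values).
ir_frame = [
    [0, 128, 255],
    [10, 20, 30],
]

rgb_like = replicate_band(ir_frame)
print(rgb_like[0][1])  # (128, 128, 128)
```

With an array library the same conversion is usually a single stacking call along a new channel axis; plain lists are used here only to keep the sketch dependency-free.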
id INPE_81129c2a16c26db6c16361103f8daffd
oai_identifier_str oai:urlib.net:sid.inpe.br/mtc-m21c/2020/03.30.16.14.10-0
network_acronym_str INPE
network_name_str Biblioteca Digital de Teses e Dissertações do INPE
Keywords: object detection; deep learning; convolutional neural network; remote sensing; captive balloons
dc.title.en.fl_str_mv Object detection from captive balloon imagery using deep learning
dc.title.alternative.pt.fl_str_mv Detecção de objetos em imagens de balão cativo utilizando deep learning
dc.date.issued.fl_str_mv 2020-04-02
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis