Object detection from captive balloon imagery using deep learning

Bibliographic details
Defense year: 2020
Main author: Victória Maria Gomes Velame
Advisors: José Claudio Mura, Leonardo Sant'Anna Bins
Defense committee: Ney Rafael Sêcco
Document type: Master's dissertation
Access type: Open access
Language: English (eng)
Defense institution: Instituto Nacional de Pesquisas Espaciais (INPE)
Graduate program: INPE Graduate Program in Remote Sensing
Department: Not informed by the institution
Country: BR
Access link: http://urlib.net/sid.inpe.br/mtc-m21c/2020/03.30.16.14
Abstract: The combination of remote sensing and computer vision technologies has been used to monitor large areas in order to ensure their local security. This monitoring requires sensors with high temporal and spatial resolution. Captive balloons carrying infrared and visible sensors, such as the Altave system, can perform long-term, day-and-night surveillance at a viable cost compared with other aerial vehicles. The Altave captive balloon system provides security over large areas by continuously monitoring people and vehicles, a task that is exhausting for human operators due to the large amount of data. To provide more efficient and less arduous monitoring, this work developed a technology based on deep learning, more specifically the Faster R-CNN (Faster Region-based Convolutional Neural Network), capable of detecting people and vehicles in images from the captive balloon's infrared and visible sensors. The advantage of CNN-based object detectors is their ability to generalize, which makes them well suited to characteristics of captive balloon imagery such as objects appearing at different viewpoints, positions, and scales. This work used videos provided by the Altave company (from their captive balloon system) to manually build two databases of about 700 images each, one for the infrared data and the other for the visible data. Since training a large CNN from scratch requires a large database and high computational power, two networks were fine-tuned from a Faster R-CNN pre-trained on RGB (red, green, blue) images. The accuracy, mAP, and AR metrics reached on the test datasets indicate the networks' high performance. The accuracy was 87.1% for the infrared network and 86.1% for the visible network. These high accuracies demonstrate that a Faster R-CNN pre-trained only on ordinary RGB images can be fine-tuned to work satisfactorily on 3-band RGB visible remote sensing images and even on 1-band infrared images, as long as the latter are converted to 3-band images by repeating the infrared band across the three channels. The networks satisfactorily detected people and vehicles in images from the Altave captive balloon system. They could detect multiple objects in an image across a variety of angles, positions, types (for vehicles), and scales, even in the presence of some noise and overlap. They also produced some erroneous detections, caused by splitting one object into two detections or by merging two objects of the same class into one large detection. These types of mistakes are not a relevant problem for surveillance, because detecting the objects is far more important than precisely locating or counting them.
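Illustrative sketch: the abstract describes two concrete steps, replicating the single infrared band into three channels and fine-tuning a Faster R-CNN pre-trained on ordinary RGB images for the two target classes (people and vehicles). The sketch below shows one way to do this, assuming PyTorch/torchvision; the dissertation does not state which framework was used, and all names, hyperparameters, and helper functions here are hypothetical, not the author's code.

# Minimal sketch (assumption: PyTorch/torchvision; not necessarily the
# framework used in the dissertation).
import numpy as np
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def infrared_to_3band(ir_band: np.ndarray) -> np.ndarray:
    # Replicate a 1-band infrared image across three channels so it can be
    # fed to a network pre-trained on 3-band RGB images.
    return np.stack([ir_band, ir_band, ir_band], axis=-1)

# Load a Faster R-CNN pre-trained on everyday RGB images (COCO weights).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box-predictor head for the task's classes:
# background + person + vehicle (class count is illustrative).
num_classes = 3
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# One fine-tuning step: `images` is a list of CHW float tensors in [0, 1],
# `targets` a list of dicts with "boxes" and "labels" for each image.
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
model.train()

def train_step(images, targets):
    loss_dict = model(images, targets)   # Faster R-CNN returns its losses in train mode
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

In this standard transfer-learning recipe only the detection head is re-initialized, while the pre-trained backbone weights are kept as the starting point for fine-tuning on the smaller infrared and visible databases.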