Detecção e destaque em vídeo de objetos utilizando YOLO

Detalhes bibliográficos
Ano de defesa: 2022
Autor(a) principal: Araújo, Aline Moura
Orientador(a): Não Informado pela instituição
Banca de defesa: Não Informado pela instituição
Tipo de documento: Dissertação
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Universidade Federal da Paraíba
Brasil
Informática
Programa de Pós-Graduação em Informática
UFPB
Programa de Pós-Graduação: Não Informado pela instituição
Departamento: Não Informado pela instituição
País: Não Informado pela instituição
Palavras-chave em Português:
Link de acesso: https://repositorio.ufpb.br/jspui/handle/123456789/26374
Resumo: In mobile devices and cameras, the image magnification (zoom) functionality is increasingly present, however, it is still an innovation to use object detection to automate this task. Detection is a classic problem related to computer vision that deals with the location of instances of semantic objects in a specific class. In this sense, the objective of this work was to implement a system capable of performing the detection of objects automatically in video, evaluating its operation with acoustic guitar, electric guitar and microphone. After this detection, a new video was generated emphasizing the detected instrument, in order to facilitate the observation of its execution. For that, a system was developed that, using a YOLOV4 model, is able to identify objects and perform a procedure similar to a zoom in on the video. A pipeline was implemented, where the frames are first extracted, and then the detection of a parameterized object in an interval of 12 frames. After detection, the clipping is made following an interpolation methodology to deal with the fluidity of the video, and, finally, a new video is generated from these clippings. Tests were made with different parameters for extracting the frames, using videos retrieved from Youtube, evaluating 4 scenarios for extracting the images. In these tests, the performance of the detection model, the time taken for extraction and the percentage of information excluded in the output video in each scenario were evaluated. For the validation of this pipeline, a methodology was adopted, assuming that detection would work efficiently, to validate the heuristic implemented in the interpolation, the confidence of the model for the 4 extraction scenarios and the behavior of the system when dealing with occlusion problems in the video.. In the validation, a pre-trained model of the YOLOV4 open source neural network with 80 classes was used, performing the detection of arbitrary objects, with cat and dog being the chosen classes. In addition, a customized YOLOV4 model was also trained to be able to perform specific detection of musical instruments with the Imagenet database. Regarding the network results, although not the focus of this work, the average accuracy achieved by the model in the guitar, acoustic guitar and microphone classes was 61.90%, 87.94% and 62.27%, respectively. In the zoom in system, it was possible to notice that the better the extraction parameter, the greater the number of objects detected by the model, as well as the greater precision and quality of detection. There was a small loss of quality from the resolution of the original video, and there was no significant loss of video content due to frame gap in detection. Concluding the analysis of the results obtained, it is possible to affirm that the proposal of the work was successful, therefore, all the presented objectives were reached. For future work, we aim to test new detection models, implement new output video evaluation criteria and parallelize the pipeline steps.
id UFPB-2_0cbc5871ba6523fc09e4896370291422
oai_identifier_str oai:repositorio.ufpb.br:123456789/26374
network_acronym_str UFPB-2
network_name_str Repositório Institucional da UFPB
repository_id_str
spelling Detecção e destaque em vídeo de objetos utilizando YOLOSistemas de computaçãoYOLOInstrumentos musicaisDetecção de objetosDetecção em vídeoComputing systemsMusical instrumentsObject detectionVideo detectionCNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAOIn mobile devices and cameras, the image magnification (zoom) functionality is increasingly present, however, it is still an innovation to use object detection to automate this task. Detection is a classic problem related to computer vision that deals with the location of instances of semantic objects in a specific class. In this sense, the objective of this work was to implement a system capable of performing the detection of objects automatically in video, evaluating its operation with acoustic guitar, electric guitar and microphone. After this detection, a new video was generated emphasizing the detected instrument, in order to facilitate the observation of its execution. For that, a system was developed that, using a YOLOV4 model, is able to identify objects and perform a procedure similar to a zoom in on the video. A pipeline was implemented, where the frames are first extracted, and then the detection of a parameterized object in an interval of 12 frames. After detection, the clipping is made following an interpolation methodology to deal with the fluidity of the video, and, finally, a new video is generated from these clippings. Tests were made with different parameters for extracting the frames, using videos retrieved from Youtube, evaluating 4 scenarios for extracting the images. In these tests, the performance of the detection model, the time taken for extraction and the percentage of information excluded in the output video in each scenario were evaluated. For the validation of this pipeline, a methodology was adopted, assuming that detection would work efficiently, to validate the heuristic implemented in the interpolation, the confidence of the model for the 4 extraction scenarios and the behavior of the system when dealing with occlusion problems in the video.. In the validation, a pre-trained model of the YOLOV4 open source neural network with 80 classes was used, performing the detection of arbitrary objects, with cat and dog being the chosen classes. In addition, a customized YOLOV4 model was also trained to be able to perform specific detection of musical instruments with the Imagenet database. Regarding the network results, although not the focus of this work, the average accuracy achieved by the model in the guitar, acoustic guitar and microphone classes was 61.90%, 87.94% and 62.27%, respectively. In the zoom in system, it was possible to notice that the better the extraction parameter, the greater the number of objects detected by the model, as well as the greater precision and quality of detection. There was a small loss of quality from the resolution of the original video, and there was no significant loss of video content due to frame gap in detection. Concluding the analysis of the results obtained, it is possible to affirm that the proposal of the work was successful, therefore, all the presented objectives were reached. For future work, we aim to test new detection models, implement new output video evaluation criteria and parallelize the pipeline steps.Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPESEm dispositivos celulares e câmeras, está crescentemente presente a funcionalidade de ampliação de imagem (do inglês, zoom), no entanto, ainda é uma inovação utilizar detecção de objetos para automatizar esta tarefa. A detecção é um problema clássico relacionado à visão computacional que trata da localização de instâncias de objetos semânticos em uma classe específica. Nesse sentido, o objetivo deste trabalho foi implementar um sistema capaz de realizar a detecção de objetos de forma automática em vídeo, avaliando seu funcionamento com guitarra, violão e microfone. Após essa detecção, um novo vídeo foi gerado dando ênfase ao instrumento detectado, a fim de facilitar a observação de sua execução. Para tanto, foi desenvolvido um sistema que, utilizando um modelo YOLOv4, é capaz de identificar objetos e fazer um procedimento semelhante a um zoom in no vídeo. Foi implementado um pipeline, onde é feita primeiramente uma extração dos quadros dos vídeos, e, em seguida, a detecção de um objeto parametrizado em um intervalo de 12 quadros. Após a detecção, é feito o recorte seguindo uma metodologia de estabilização para tratar a fluidez do vídeo, e, por fim, um novo vídeo é gerado a partir desses recortes. Foram feitos testes com diferentes parâmetros de extração dos quadros, utilizando vídeos recuperados do Youtube, avaliando 4 cenários para a extração das imagens. Nesses testes, foram avaliados a performance do modelo de detecção, o tempo levado para a extração e o percentual de informação excluída no vídeo de saída em cada cenário. Para a validação desse pipeline, foi adotada uma metodologia, assumindo que detecção funcionaria de forma eficiente, para validar a heurística implementada na estabilização, a confiança do modelo para os 4 cenários de extração e o comportamento do sistema ao lidar com problemas de oclusão no vídeo. Na validação, foi utilizado um modelo pré-treinado da rede neural de código aberto YOLOv4 com 80 classes, realizando a detecção de objetos arbitrários, sendo gato e cachorro as classes escolhidas. Além disso, também foi treinado um modelo personalizado YOLOv4 para que seja capaz de fazer a detecção específica de instrumentos musicais, utilizando a base de dados da Imagenet. Em relação aos resultados da rede, apesar de não ser o foco deste trabalho, a precisão média alcançada pelo modelo nas classes guitarra, violão e microfone foi 61,90%, 87,94% e 62,27%, respectivamente. No sistema de zoom in, foi possível perceber que, quanto melhor o parâmetro de extração, maior é a quantidade de objetos detectados pelo modelo, como, também, a detecção tem maior precisão e qualidade. Houve uma pequena perda de qualidade em relação à resolução do vídeo original, e não houve perda significativa de conteúdo do vídeo devido ao intervalo dos quadros na detecção. Concluindo as análises dos resultados obtidos, é possível afirmar que a proposta do trabalho obteve êxito, pois, todos os objetivos apresentados foram alcançados. Para trabalhos futuros, almeja-se testar novos modelos de detecção, implementação de novos critérios de avaliação do vídeo de saída e paralelização das etapas do pipeline.Universidade Federal da ParaíbaBrasilInformáticaPrograma de Pós-Graduação em InformáticaUFPBRêgo, Thaís Gaudencio dohttp://lattes.cnpq.br/3166390632199101Silva, Lincoln David Nery ehttp://lattes.cnpq.br/0721450925602821Araújo, Aline Moura2023-03-06T14:15:45Z2022-12-202023-03-06T14:15:45Z2022-10-31info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesishttps://repositorio.ufpb.br/jspui/handle/123456789/26374porhttp://creativecommons.org/licenses/by-nd/3.0/br/info:eu-repo/semantics/openAccessreponame:Repositório Institucional da UFPBinstname:Universidade Federal da Paraíba (UFPB)instacron:UFPB2023-05-22T12:36:18Zoai:repositorio.ufpb.br:123456789/26374Repositório InstitucionalPUBhttps://repositorio.ufpb.br/oai/requestdiretoria@ufpb.br||bdtd@biblioteca.ufpb.bropendoar:25462023-05-22T12:36:18Repositório Institucional da UFPB - Universidade Federal da Paraíba (UFPB)false
dc.title.none.fl_str_mv Detecção e destaque em vídeo de objetos utilizando YOLO
title Detecção e destaque em vídeo de objetos utilizando YOLO
spellingShingle Detecção e destaque em vídeo de objetos utilizando YOLO
Araújo, Aline Moura
Sistemas de computação
YOLO
Instrumentos musicais
Detecção de objetos
Detecção em vídeo
Computing systems
Musical instruments
Object detection
Video detection
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
title_short Detecção e destaque em vídeo de objetos utilizando YOLO
title_full Detecção e destaque em vídeo de objetos utilizando YOLO
title_fullStr Detecção e destaque em vídeo de objetos utilizando YOLO
title_full_unstemmed Detecção e destaque em vídeo de objetos utilizando YOLO
title_sort Detecção e destaque em vídeo de objetos utilizando YOLO
author Araújo, Aline Moura
author_facet Araújo, Aline Moura
author_role author
dc.contributor.none.fl_str_mv Rêgo, Thaís Gaudencio do
http://lattes.cnpq.br/3166390632199101
Silva, Lincoln David Nery e
http://lattes.cnpq.br/0721450925602821
dc.contributor.author.fl_str_mv Araújo, Aline Moura
dc.subject.por.fl_str_mv Sistemas de computação
YOLO
Instrumentos musicais
Detecção de objetos
Detecção em vídeo
Computing systems
Musical instruments
Object detection
Video detection
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
topic Sistemas de computação
YOLO
Instrumentos musicais
Detecção de objetos
Detecção em vídeo
Computing systems
Musical instruments
Object detection
Video detection
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
description In mobile devices and cameras, the image magnification (zoom) functionality is increasingly present, however, it is still an innovation to use object detection to automate this task. Detection is a classic problem related to computer vision that deals with the location of instances of semantic objects in a specific class. In this sense, the objective of this work was to implement a system capable of performing the detection of objects automatically in video, evaluating its operation with acoustic guitar, electric guitar and microphone. After this detection, a new video was generated emphasizing the detected instrument, in order to facilitate the observation of its execution. For that, a system was developed that, using a YOLOV4 model, is able to identify objects and perform a procedure similar to a zoom in on the video. A pipeline was implemented, where the frames are first extracted, and then the detection of a parameterized object in an interval of 12 frames. After detection, the clipping is made following an interpolation methodology to deal with the fluidity of the video, and, finally, a new video is generated from these clippings. Tests were made with different parameters for extracting the frames, using videos retrieved from Youtube, evaluating 4 scenarios for extracting the images. In these tests, the performance of the detection model, the time taken for extraction and the percentage of information excluded in the output video in each scenario were evaluated. For the validation of this pipeline, a methodology was adopted, assuming that detection would work efficiently, to validate the heuristic implemented in the interpolation, the confidence of the model for the 4 extraction scenarios and the behavior of the system when dealing with occlusion problems in the video.. In the validation, a pre-trained model of the YOLOV4 open source neural network with 80 classes was used, performing the detection of arbitrary objects, with cat and dog being the chosen classes. In addition, a customized YOLOV4 model was also trained to be able to perform specific detection of musical instruments with the Imagenet database. Regarding the network results, although not the focus of this work, the average accuracy achieved by the model in the guitar, acoustic guitar and microphone classes was 61.90%, 87.94% and 62.27%, respectively. In the zoom in system, it was possible to notice that the better the extraction parameter, the greater the number of objects detected by the model, as well as the greater precision and quality of detection. There was a small loss of quality from the resolution of the original video, and there was no significant loss of video content due to frame gap in detection. Concluding the analysis of the results obtained, it is possible to affirm that the proposal of the work was successful, therefore, all the presented objectives were reached. For future work, we aim to test new detection models, implement new output video evaluation criteria and parallelize the pipeline steps.
publishDate 2022
dc.date.none.fl_str_mv 2022-12-20
2022-10-31
2023-03-06T14:15:45Z
2023-03-06T14:15:45Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://repositorio.ufpb.br/jspui/handle/123456789/26374
url https://repositorio.ufpb.br/jspui/handle/123456789/26374
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv http://creativecommons.org/licenses/by-nd/3.0/br/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nd/3.0/br/
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal da Paraíba
Brasil
Informática
Programa de Pós-Graduação em Informática
UFPB
publisher.none.fl_str_mv Universidade Federal da Paraíba
Brasil
Informática
Programa de Pós-Graduação em Informática
UFPB
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFPB
instname:Universidade Federal da Paraíba (UFPB)
instacron:UFPB
instname_str Universidade Federal da Paraíba (UFPB)
instacron_str UFPB
institution UFPB
reponame_str Repositório Institucional da UFPB
collection Repositório Institucional da UFPB
repository.name.fl_str_mv Repositório Institucional da UFPB - Universidade Federal da Paraíba (UFPB)
repository.mail.fl_str_mv diretoria@ufpb.br||bdtd@biblioteca.ufpb.br
_version_ 1863379059021447168