Detecção e destaque em vídeo de objetos utilizando YOLO (Object detection and highlighting in video using YOLO)

Araújo, Aline Moura

Bibliographic details
Year of defense:	2022
Main author:	Araújo, Aline Moura
Advisor:	Not provided by the institution
Examination committee:	Not provided by the institution
Document type:	Master's thesis
Access type:	Open access
Language:	Portuguese (por)
Institution of defense:	Universidade Federal da Paraíba; Brasil; Informática; Programa de Pós-Graduação em Informática; UFPB
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	Sistemas de computação; YOLO; Instrumentos musicais; Detecção de objetos; Detecção em vídeo; Computing systems; Musical instruments; Object detection; Video detection; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Access link:	https://repositorio.ufpb.br/jspui/handle/123456789/26374
Abstract:	Image magnification (zoom) is increasingly common in mobile devices and cameras, but using object detection to automate it is still a novelty. Object detection is a classic computer vision problem: locating instances of semantic objects of a given class. The objective of this work was to implement a system that automatically detects objects in video, evaluated with acoustic guitar, electric guitar and microphone. After detection, a new video is generated that emphasizes the detected instrument, making its performance easier to observe. To that end, a system was developed that uses a YOLOv4 model to identify objects and apply a procedure similar to a zoom-in on the video. The pipeline first extracts the frames and then detects a parameterized object at an interval of 12 frames. After detection, each frame is cropped following an interpolation methodology that keeps the video fluid, and a new video is generated from the crops. Tests used videos retrieved from YouTube and evaluated 4 frame-extraction scenarios with different extraction parameters. In each scenario, the performance of the detection model, the time taken for extraction and the percentage of information excluded from the output video were evaluated. The pipeline was validated under the assumption that detection works efficiently, in order to assess the heuristic implemented in the interpolation, the confidence of the model in the 4 extraction scenarios and the behavior of the system under occlusion in the video.
The validation used a pre-trained open-source YOLOv4 model with 80 classes to detect arbitrary objects, with cat and dog as the chosen classes. A custom YOLOv4 model was also trained on the ImageNet database to detect musical instruments specifically. Although the network results are not the focus of this work, the average precision achieved by the model for the electric guitar, acoustic guitar and microphone classes was 61.90%, 87.94% and 62.27%, respectively. In the zoom-in system, the better the extraction parameter, the greater the number of objects detected by the model and the higher the precision and quality of detection. There was a small loss of quality relative to the resolution of the original video, and no significant loss of video content due to the frame gap in detection. The analysis of the results indicates that the proposal was successful, since all the stated objectives were reached. Future work aims to test new detection models, implement new evaluation criteria for the output video and parallelize the pipeline steps.
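
The abstract describes detecting the object only at an interval of 12 frames and cropping every frame following an interpolation methodology. The record does not specify the heuristic the thesis uses, so the sketch below is only an illustration under stated assumptions: it uses plain linear interpolation between consecutive detections, holds the nearest detection outside the detected range, and measures the "information excluded" as the fraction of frame area left outside the crop. The names `Box`, `lerp_box`, `interpolate_boxes` and `excluded_fraction` are hypothetical and do not come from the thesis.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned crop box in pixels: top-left corner plus size."""
    x: float
    y: float
    w: float
    h: float


def lerp_box(a: Box, b: Box, t: float) -> Box:
    """Blend two boxes linearly; t=0 gives a, t=1 gives b."""
    return Box(
        a.x + (b.x - a.x) * t,
        a.y + (b.y - a.y) * t,
        a.w + (b.w - a.w) * t,
        a.h + (b.h - a.h) * t,
    )


def interpolate_boxes(detections: Dict[int, Box], n_frames: int) -> List[Box]:
    """Return one crop box per frame from sparse detections.

    `detections` maps a frame index (e.g. 0, 12, 24, ...) to the box found
    there. Linear interpolation is an assumption, not the thesis's method.
    """
    if not detections:
        raise ValueError("at least one detection is required")
    keys = sorted(detections)
    boxes: List[Box] = []
    for frame in range(n_frames):
        pos = bisect_left(keys, frame)
        if pos == 0:
            # At or before the first detection: hold the first box.
            boxes.append(detections[keys[0]])
        elif pos == len(keys):
            # After the last detection: hold the last box.
            boxes.append(detections[keys[-1]])
        elif keys[pos] == frame:
            # A detection exists for this exact frame.
            boxes.append(detections[frame])
        else:
            # Between two detections: interpolate by relative position.
            lo, hi = keys[pos - 1], keys[pos]
            t = (frame - lo) / (hi - lo)
            boxes.append(lerp_box(detections[lo], detections[hi], t))
    return boxes


def excluded_fraction(box: Box, frame_w: int, frame_h: int) -> float:
    """Fraction of the frame area that falls outside the crop box."""
    return 1.0 - (box.w * box.h) / (frame_w * frame_h)


if __name__ == "__main__":
    # Detections 12 frames apart, matching the interval in the abstract.
    found = {0: Box(0, 0, 100, 100), 12: Box(120, 60, 100, 100)}
    crops = interpolate_boxes(found, n_frames=13)
    print(crops[6])
    print(excluded_fraction(Box(0, 0, 640, 360), 1280, 720))
```

In a full pipeline each interpolated box would be used to crop the corresponding extracted frame before the output video is assembled; that step depends on the video library and is omitted here.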

Item metadata

id	UFPB-2_0cbc5871ba6523fc09e4896370291422
oai_identifier_str	oai:repositorio.ufpb.br:123456789/26374
network_acronym_str	UFPB-2
network_name_str	Repositório Institucional da UFPB
repository_id_str
spelling	Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
dc.title.none.fl_str_mv	Detecção e destaque em vídeo de objetos utilizando YOLO
title	Detecção e destaque em vídeo de objetos utilizando YOLO
title_short	Detecção e destaque em vídeo de objetos utilizando YOLO
title_full	Detecção e destaque em vídeo de objetos utilizando YOLO
title_fullStr	Detecção e destaque em vídeo de objetos utilizando YOLO
title_full_unstemmed	Detecção e destaque em vídeo de objetos utilizando YOLO
title_sort	Detecção e destaque em vídeo de objetos utilizando YOLO
author	Araújo, Aline Moura
author_facet	Araújo, Aline Moura
author_role	author
dc.contributor.none.fl_str_mv	Rêgo, Thaís Gaudencio do (http://lattes.cnpq.br/3166390632199101); Silva, Lincoln David Nery e (http://lattes.cnpq.br/0721450925602821)
dc.contributor.author.fl_str_mv	Araújo, Aline Moura
dc.subject.por.fl_str_mv	Sistemas de computação; YOLO; Instrumentos musicais; Detecção de objetos; Detecção em vídeo; Computing systems; Musical instruments; Object detection; Video detection; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
topic	Sistemas de computação; YOLO; Instrumentos musicais; Detecção de objetos; Detecção em vídeo; Computing systems; Musical instruments; Object detection; Video detection; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
publishDate	2022
dc.date.none.fl_str_mv	2022-12-20 2022-10-31 2023-03-06T14:15:45Z 2023-03-06T14:15:45Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://repositorio.ufpb.br/jspui/handle/123456789/26374
url	https://repositorio.ufpb.br/jspui/handle/123456789/26374
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	http://creativecommons.org/licenses/by-nd/3.0/br/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-nd/3.0/br/
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal da Paraíba; Brasil; Informática; Programa de Pós-Graduação em Informática; UFPB
publisher.none.fl_str_mv	Universidade Federal da Paraíba; Brasil; Informática; Programa de Pós-Graduação em Informática; UFPB
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFPB instname:Universidade Federal da Paraíba (UFPB) instacron:UFPB
instname_str	Universidade Federal da Paraíba (UFPB)
instacron_str	UFPB
institution	UFPB
reponame_str	Repositório Institucional da UFPB
collection	Repositório Institucional da UFPB
repository.name.fl_str_mv	Repositório Institucional da UFPB - Universidade Federal da Paraíba (UFPB)
repository.mail.fl_str_mv	diretoria@ufpb.br; bdtd@biblioteca.ufpb.br
_version_	1863379059021447168
