Face image inpainting based on Generative Adversarial Networks

Bibliographic details
Year of defense: 2023
Main author: Ivamoto, Victor Soares
Advisor: Lima, Clodoaldo Aparecido de Moraes
Defense committee: Not informed by the institution
Document type: Master's thesis (Dissertação)
Access type: Open access
Language: eng
Defending institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese: Biometria; Oclusão facial; Processamento de imagem; Rede Adversária Generativa; Restauração facial; Visão computacional
Access link: https://www.teses.usp.br/teses/disponiveis/100/100131/tde-06022024-184049/
Abstract: Face image inpainting is a challenging problem in computer vision with several practical uses, and it is employed in many image preprocessing applications. In the past few years, deep learning has made great breakthroughs in image inpainting. The impressive results achieved by Generative Adversarial Networks in image processing have drawn increasing attention from the scientific community to facial inpainting. Recent network architecture developments include two-stage networks using a coarse-to-fine approach, as well as landmarks, semantic segmentation maps, and edge maps that guide the inpainting process. Moreover, improved convolutions enlarge the receptive field and filter the values passed to the next layer, and attention layers create relationships between local and distant information. The objective of this project is to evaluate and compare the efficacy of the baseline models identified in the literature on face datasets with three occlusion types. To this end, a comparative study was performed among the baseline models to identify the advantages and disadvantages of each model on three types of facial occlusion. A literature review gathered the baseline models, face datasets, occlusions, and evaluation metrics. The baseline models were DF1, DF2, EdgeConnect, GLCI, GMCNN, PConv, and SIIGM. The datasets were CelebA and CelebA-HQ. The occlusions were a square mask centered in the image, an irregular-shape mask, and a facial mask (MTF). The evaluation metrics were PSNR, SSIM, and LPIPS. The comparative study consisted of two experiments, one training the baseline models from scratch and the other using pretrained models. Both experiments followed the same testing procedure. GMCNN achieved the best overall results with the square mask, DF2 was the best model with the irregular mask, and both models were the best with the MTF mask. PConv performed poorly in all experiments, except for the irregular mask with the pretrained model.
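The evaluation protocol described in the abstract — occluding a face image with a centered square mask and scoring the restored output with PSNR — can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the thesis itself; the image size, mask ratio, and helper names are assumptions.

```python
import numpy as np

def center_square_mask(h, w, ratio=0.5):
    """Binary mask with a centered square hole covering `ratio` of each side."""
    mask = np.zeros((h, w), dtype=bool)
    mh, mw = int(h * ratio), int(w * ratio)
    top, left = (h - mh) // 2, (w - mw) // 2
    mask[top:top + mh, left:left + mw] = True
    return mask

def psnr(original, restored, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images; higher is better."""
    mse = np.mean((original.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy usage: occlude a flat 128x128 "face" and score a fake restoration
# that is off by 10 gray levels inside the hole (1/4 of the pixels).
img = np.full((128, 128), 100.0)
mask = center_square_mask(128, 128, ratio=0.5)
restored = img.copy()
restored[mask] += 10.0
print(round(psnr(img, restored), 2))  # → 34.15
```

SSIM and LPIPS follow the same pattern (compare ground truth against the inpainted result) but measure structural and perceptual similarity rather than per-pixel error, which is why the thesis reports all three.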
id USP_6f2f4676af9f728d26c4b1f0e66e623b
oai_identifier_str oai:teses.usp.br:tde-06022024-184049
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
dc.title.none.fl_str_mv Face image inpainting based on Generative Adversarial Networks
Restauração de imagem facial baseada em Redes Adversárias Generativas
author Ivamoto, Victor Soares
dc.contributor.none.fl_str_mv Lima, Clodoaldo Aparecido de Moraes
dc.contributor.author.fl_str_mv Ivamoto, Victor Soares
dc.subject.por.fl_str_mv Biometria
Biometry
Computer vision
Facial inpainting
Facial occlusion
Generative Adversarial Network
Image processing
Oclusão facial
Processamento de imagem
Rede Adversária Generativa
Restauração facial
Visão computacional
dc.date.none.fl_str_mv 2023-11-23
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.rights.driver.fl_str_mv Liberar o conteúdo para acesso público.
info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
repository.mail.fl_str_mv virginia@if.usp.br | atendimento@aguia.usp.br
_version_ 1818598502180061184