Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python

Bibliographic details
Year of defense: 2025
Main author: Souza, Pedro Carneiro de
Advisor: Bastos, Larissa Rocha Soares
Defense committee: Bastos, Larissa Rocha; Machado, Ivan do Carmo; Uchôa, Anderson Gonçalves
Document type: Master's thesis (Dissertação)
Access type: Open access
Language: por
Defending institution: Universidade Estadual de Feira de Santana
Graduate program: Programa de Pós-Graduação em Ciência da Computação
Department: DEPARTAMENTO DE CIÊNCIAS EXATAS
Country: Brasil
Keywords (Portuguese):
Keywords (English):
CNPq knowledge area:
Access link: http://tede2.uefs.br:8080/handle/tede/1969
Abstract: Code refactoring is an essential practice to ensure the quality and continuous evolution of software systems, especially in languages like Python, which demand high maintainability. Although static analysis tools such as SonarQube help identify issues, the refactoring process still presents challenges, such as preserving functionality and improving code readability. In this context, Large Language Models (LLMs), such as GPT-4, DeepSeek, and Claude AI, emerge as promising tools by combining advanced contextual analysis with automated code generation. This study evaluates the effectiveness of LLMs in refactoring Python code, focusing on correcting maintainability issues, identifying limitations, and proposing improvements. To this end, we conducted an empirical study with four widely used models: Copilot Chat 4o, LLaMA 3.3 70B Instruct, DeepSeek V3, and Gemini 2.5 Pro. We also investigated whether these models perform better with more refined prompting techniques, such as few-shot learning. For this purpose, each LLM was tested with two distinct prompting styles, zero-shot and few-shot, allowing a comparative analysis of the impact of these approaches on the quality of the generated refactorings. We evaluated 150 methods with maintainability problems per LLM and per prompting technique. The results indicate that, although the models achieved considerable effectiveness rates in the few-shot scenario (Gemini, 64.67%; DeepSeek, 64.00%; Copilot, 63.33%; LLaMA 3.3 70B, 55.33%), all faced significant limitations. The main challenges observed were the introduction of new maintainability problems, runtime errors, failures in automated tests, and, in some cases, failure to correct the originally identified issue. Additionally, we conducted an evaluation with human participants to analyze the readability of the code refactored by the models.
The results indicate that 81.25% of the solutions were perceived as improvements, especially in structural aspects. However, there were also cases where readability was impaired, either due to the introduction of unnecessary complexity or lack of standardization in code style. These findings reinforce the need for caution when automatically adopting suggestions generated by LLMs, and highlight the importance of validation by developers in the final code review. This work contributes a comparative analysis of the capabilities of LLMs, pointing out their limitations and proposing practical methodologies for integrating AI into the code refactoring process. The results of this study aim to pave the way for new research, especially in the development of more efficient prompting techniques and in the evaluation of models yet to come. We hope these contributions help developers and researchers find more practical, reliable, and long-lasting solutions to improve software maintainability in everyday practice.
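The zero-shot/few-shot comparison described in the abstract can be sketched as follows. This is a minimal illustration of the general technique, not the dissertation's actual prompts: the templates, the example issue, and all function names are assumptions made for the sake of the example.

```python
# Illustrative sketch: a zero-shot prompt sends only the flagged method,
# while a few-shot prompt prepends worked refactoring examples.
# Templates and names below are hypothetical, not taken from the study.

ZERO_SHOT_TEMPLATE = (
    "Refactor the following Python method to fix this maintainability "
    "issue reported by SonarQube: {issue}\n\n{code}"
)

# One worked example; real few-shot prompts would typically include several.
FEW_SHOT_PREFIX = (
    "Example 1:\n"
    "Issue: Cognitive Complexity too high\n"
    "Before:\n"
    "def f(x):\n"
    "    if x:\n"
    "        if x > 0:\n"
    "            return 1\n"
    "    return 0\n"
    "After:\n"
    "def f(x):\n"
    "    return 1 if x and x > 0 else 0\n\n"
)

def build_prompt(code: str, issue: str, style: str = "zero-shot") -> str:
    """Assemble a refactoring prompt in the requested prompting style."""
    body = ZERO_SHOT_TEMPLATE.format(issue=issue, code=code)
    if style == "few-shot":
        # Few-shot: same request, preceded by worked before/after examples.
        return FEW_SHOT_PREFIX + body
    return body

method = "def g(a, b, c, d, e, f):\n    return a + b + c + d + e + f"
zero_shot = build_prompt(method, "Too many parameters")
few_shot = build_prompt(method, "Too many parameters", style="few-shot")
print(few_shot.startswith("Example 1:"))  # few-shot prepends the examples
```

Under this setup, the two styles differ only in the prepended examples, which is what lets the study attribute quality differences to the prompting technique rather than to the request itself.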
id UEFS_a5160f16e1176d1acd3d84fcf21400e2
oai_identifier_str oai:tede2.uefs.br:8080:tede/1969
network_acronym_str UEFS
network_name_str Biblioteca Digital de Teses e Dissertações da UEFS
repository_id_str
spelling Submitted by Daniela Costa (dmscosta@uefs.br) on 2025-12-04T18:35:04Z. Made available in DSpace on 2025-12-04T18:35:04Z (GMT). No. of bitstreams: 1; Pedro_Carneiro_de_Souza - Dissertação.pdf: 2148468 bytes, checksum: 53a3720de92fe7600a5b6bfc1c1cbabf (MD5). Previous issue date: 2025-08-19.
dc.title.por.fl_str_mv Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
title Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
spellingShingle Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
Souza, Pedro Carneiro de
Refatoração de código
Manutenibilidade
Modelos de Linguagem de Larga Escala (LLMs)
Python
Code Refactoring
Maintainability
Large-Scale Language Models (LLMs)
Python
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
title_short Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
title_full Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
title_fullStr Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
title_full_unstemmed Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
title_sort Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
author Souza, Pedro Carneiro de
author_facet Souza, Pedro Carneiro de
author_role author
dc.contributor.advisor1.fl_str_mv Bastos, Larissa Rocha Soares
dc.contributor.advisor1ID.fl_str_mv https://orcid.org/0000-0002-8069-5249
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/5750570352089990
dc.contributor.advisor-co1.fl_str_mv Figueiredo, Eduardo
dc.contributor.referee1.fl_str_mv Bastos, Larissa Rocha
dc.contributor.referee2.fl_str_mv Machado, Ivan do Carmo
dc.contributor.referee3.fl_str_mv Uchôa, Anderson Gonçalves
dc.contributor.authorID.fl_str_mv 7915054932849802
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/7915054932849802
dc.contributor.author.fl_str_mv Souza, Pedro Carneiro de
contributor_str_mv Bastos, Larissa Rocha Soares
Figueiredo, Eduardo
Bastos, Larissa Rocha
Machado, Ivan do Carmo
Uchôa, Anderson Gonçalves
dc.subject.por.fl_str_mv Refatoração de código
Manutenibilidade
Modelos de Linguagem de Larga Escala (LLMs)
Python
topic Refatoração de código
Manutenibilidade
Modelos de Linguagem de Larga Escala (LLMs)
Python
Code Refactoring
Maintainability
Large-Scale Language Models (LLMs)
Python
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.eng.fl_str_mv Code Refactoring
Maintainability
Large-Scale Language Models (LLMs)
Python
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
description Code refactoring is an essential practice to ensure the quality and continuous evolution of software systems, especially in languages like Python, which demand high maintainability. Although static analysis tools such as SonarQube help identify issues, the refactoring process still presents challenges, such as preserving functionality and improving code readability. In this context, Large Language Models (LLMs), such as GPT-4, DeepSeek, and Claude AI, emerge as promising tools by combining advanced contextual analysis with automated code generation. This study evaluates the effectiveness of LLMs in refactoring Python code, focusing on correcting maintainability issues, identifying limitations, and proposing improvements. To this end, we conducted an empirical study with four widely used models: Copilot Chat 4o, LLaMA 3.3 70B Instruct, DeepSeek V3, and Gemini 2.5 Pro. We also investigated whether these models perform better with more refined prompting techniques, such as few-shot learning. For this purpose, each LLM was tested with two distinct prompting styles, zero-shot and few-shot, allowing a comparative analysis of the impact of these approaches on the quality of the generated refactorings. We evaluated 150 methods with maintainability problems per LLM and per prompting technique. The results indicate that, although the models achieved considerable effectiveness rates in the few-shot scenario (Gemini, 64.67%; DeepSeek, 64.00%; Copilot, 63.33%; LLaMA 3.3 70B, 55.33%), all faced significant limitations. The main challenges observed were the introduction of new maintainability problems, runtime errors, failures in automated tests, and, in some cases, failure to correct the originally identified issue. Additionally, we conducted an evaluation with human participants to analyze the readability of the code refactored by the models.
The results indicate that 81.25% of the solutions were perceived as improvements, especially in structural aspects. However, there were also cases where readability was impaired, either due to the introduction of unnecessary complexity or lack of standardization in code style. These findings reinforce the need for caution when automatically adopting suggestions generated by LLMs, and highlight the importance of validation by developers in the final code review. This work contributes a comparative analysis of the capabilities of LLMs, pointing out their limitations and proposing practical methodologies for integrating AI into the code refactoring process. The results of this study aim to pave the way for new research, especially in the development of more efficient prompting techniques and in the evaluation of models yet to come. We hope these contributions help developers and researchers find more practical, reliable, and long-lasting solutions to improve software maintainability in everyday practice.
publishDate 2025
dc.date.accessioned.fl_str_mv 2025-12-04T18:35:04Z
dc.date.issued.fl_str_mv 2025-08-19
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv SOUZA, Pedro Carneiro de. Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python, 2025, 129f., Dissertação (mestrado) - Programa de Pós-Graduação em Ciência da Computação, Universidade Estadual de Feira de Santana, Feira de Santana.
dc.identifier.uri.fl_str_mv http://tede2.uefs.br:8080/handle/tede/1969
identifier_str_mv SOUZA, Pedro Carneiro de. Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python, 2025, 129f., Dissertação (mestrado) - Programa de Pós-Graduação em Ciência da Computação, Universidade Estadual de Feira de Santana, Feira de Santana.
url http://tede2.uefs.br:8080/handle/tede/1969
dc.language.iso.fl_str_mv por
language por
dc.relation.program.fl_str_mv 1974996533081274470
dc.relation.confidence.fl_str_mv 600
600
600
dc.relation.department.fl_str_mv 7994740082289590807
dc.relation.cnpq.fl_str_mv 3671711205811204509
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Estadual de Feira de Santana
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv UEFS
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv DEPARTAMENTO DE CIÊNCIAS EXATAS
publisher.none.fl_str_mv Universidade Estadual de Feira de Santana
dc.source.none.fl_str_mv reponame:Biblioteca Digital de Teses e Dissertações da UEFS
instname:Universidade Estadual de Feira de Santana (UEFS)
instacron:UEFS
instname_str Universidade Estadual de Feira de Santana (UEFS)
instacron_str UEFS
institution UEFS
reponame_str Biblioteca Digital de Teses e Dissertações da UEFS
collection Biblioteca Digital de Teses e Dissertações da UEFS
bitstream.url.fl_str_mv http://tede2.uefs.br:8080/bitstream/tede/1969/4/Pedro_Carneiro_de_Souza+-+Disserta%C3%A7%C3%A3o.pdf.jpg
http://tede2.uefs.br:8080/bitstream/tede/1969/3/Pedro_Carneiro_de_Souza+-+Disserta%C3%A7%C3%A3o.pdf.txt
http://tede2.uefs.br:8080/bitstream/tede/1969/2/Pedro_Carneiro_de_Souza+-+Disserta%C3%A7%C3%A3o.pdf
http://tede2.uefs.br:8080/bitstream/tede/1969/1/license.txt
bitstream.checksum.fl_str_mv 553456a27b3c9660cb80337d9462580e
ee78d8dc97e8526bc18d818b5163e583
53a3720de92fe7600a5b6bfc1c1cbabf
bd3efa91386c1718a7f26a329fdcb468
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da UEFS - Universidade Estadual de Feira de Santana (UEFS)
repository.mail.fl_str_mv bcuefs@uefs.br|| bcref@uefs.br||bcuefs@uefs.br
_version_ 1865375263554011136