Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python

Bibliographic details
Year of defense: 2025
Main author: Souza, Pedro Carneiro de
Advisor: Bastos, Larissa Rocha Soares
Defense committee: Bastos, Larissa Rocha; Machado, Ivan do Carmo; Uchôa, Anderson Gonçalves
Document type: Master's thesis (Dissertação)
Access type: Open access
Language: por
Defending institution: Universidade Estadual de Feira de Santana
Graduate program: Programa de Pós-Graduação em Ciência da Computação
Department: DEPARTAMENTO DE CIÊNCIAS EXATAS
Country: Brasil
Keywords (Portuguese):
Keywords (English):
CNPq knowledge area:
Access link: http://tede2.uefs.br:8080/handle/tede/1969
Abstract: Code refactoring is an essential practice to ensure the quality and continuous evolution of software systems, especially in languages like Python, which demand high maintainability. Although static analysis tools such as SonarQube help identify issues, the refactoring process still presents challenges, such as preserving functionality and improving code readability. In this context, Large Language Models (LLMs), such as GPT-4, DeepSeek, and Claude AI, emerge as promising tools by combining advanced contextual analysis with automated code generation. This study evaluates the effectiveness of LLMs in refactoring Python code, focusing on correcting maintainability issues, identifying limitations, and proposing improvements. To this end, we conducted an empirical study with four widely used models: Copilot Chat 4o, LLaMA 3.3 70B Instruct, DeepSeek V3, and Gemini 2.5 Pro. We also investigated whether these models perform better with more refined prompting techniques, such as few-shot learning. For this purpose, each LLM was tested with two distinct prompting styles, zero-shot and few-shot, allowing a comparative analysis of the impact of these approaches on the quality of the generated refactorings. We evaluated 150 methods with maintainability problems per LLM and per prompting technique. The results indicate that, although the models achieved considerable effectiveness rates in the few-shot scenario (Gemini, 64.67%; DeepSeek, 64.00%; Copilot, 63.33%; LLaMA 3.3 70B, 55.33%), all faced significant limitations. The main challenges observed were the introduction of new maintainability problems, runtime errors, failures in automated tests, and, in some cases, failure to correct the originally identified issue. Additionally, we conducted an evaluation with human participants to analyze the readability of the code refactored by the models.
The results indicate that 81.25% of the solutions were perceived as improvements, especially in structural aspects. However, there were also cases where readability was impaired, either due to the introduction of unnecessary complexity or lack of standardization in code style. These findings reinforce the need for caution when automatically adopting suggestions generated by LLMs, and highlight the importance of validation by developers in the final code review. This work contributes a comparative analysis of the capabilities of LLMs, pointing out their limitations and proposing practical methodologies for integrating AI into the code refactoring process. The results of this study aim to pave the way for new research, especially in the development of more efficient prompting techniques and in the evaluation of models yet to come. We hope these contributions help developers and researchers find more practical, reliable, and long-lasting solutions to improve software maintainability in everyday practice.
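The zero-shot/few-shot comparison described in the abstract can be sketched as follows. This is a minimal illustration of the general technique, not the dissertation's actual prompts: the templates, the example issue, and all function names are assumptions made for the sake of the example.

```python
# Illustrative sketch: a zero-shot prompt sends only the flagged method,
# while a few-shot prompt prepends worked refactoring examples.
# Templates and names below are hypothetical, not taken from the study.

ZERO_SHOT_TEMPLATE = (
    "Refactor the following Python method to fix this maintainability "
    "issue reported by SonarQube: {issue}\n\n{code}"
)

# One worked example; real few-shot prompts would typically include several.
FEW_SHOT_PREFIX = (
    "Example 1:\n"
    "Issue: Cognitive Complexity too high\n"
    "Before:\n"
    "def f(x):\n"
    "    if x:\n"
    "        if x > 0:\n"
    "            return 1\n"
    "    return 0\n"
    "After:\n"
    "def f(x):\n"
    "    return 1 if x and x > 0 else 0\n\n"
)

def build_prompt(code: str, issue: str, style: str = "zero-shot") -> str:
    """Assemble a refactoring prompt in the requested prompting style."""
    body = ZERO_SHOT_TEMPLATE.format(issue=issue, code=code)
    if style == "few-shot":
        # Few-shot: same request, preceded by worked before/after examples.
        return FEW_SHOT_PREFIX + body
    return body

method = "def g(a, b, c, d, e, f):\n    return a + b + c + d + e + f"
zero_shot = build_prompt(method, "Too many parameters")
few_shot = build_prompt(method, "Too many parameters", style="few-shot")
print(few_shot.startswith("Example 1:"))  # few-shot prepends the examples
```

Under this setup, the two styles differ only in the prepended examples, which is what lets the study attribute quality differences to the prompting technique rather than to the request itself.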
id UEFS_a5160f16e1176d1acd3d84fcf21400e2
oai_identifier_str oai:tede2.uefs.br:8080:tede/1969
network_acronym_str UEFS
network_name_str Biblioteca Digital de Teses e Dissertações da UEFS
repository_id_str
spelling Submitted by Daniela Costa (dmscosta@uefs.br) on 2025-12-04T18:35:04Z. Made available in DSpace on 2025-12-04T18:35:04Z (GMT). No. of bitstreams: 1; Pedro_Carneiro_de_Souza - Dissertação.pdf: 2148468 bytes, checksum: 53a3720de92fe7600a5b6bfc1c1cbabf (MD5). Previous issue date: 2025-08-19.
dc.title.por.fl_str_mv Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
title Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
spellingShingle Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
Souza, Pedro Carneiro de
Refatoração de código
Manutenibilidade
Modelos de Linguagem de Larga Escala (LLMs)
Python
Code Refactoring
Maintainability
Large-Scale Language Models (LLMs)
Python
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
title_short Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
title_full Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
title_fullStr Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
title_full_unstemmed Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
title_sort Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python
author Souza, Pedro Carneiro de
author_facet Souza, Pedro Carneiro de
author_role author
dc.contributor.advisor1.fl_str_mv Bastos, Larissa Rocha Soares
dc.contributor.advisor1ID.fl_str_mv https://orcid.org/0000-0002-8069-5249
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/5750570352089990
dc.contributor.advisor-co1.fl_str_mv Figueiredo, Eduardo
dc.contributor.referee1.fl_str_mv Bastos, Larissa Rocha
dc.contributor.referee2.fl_str_mv Machado, Ivan do Carmo
dc.contributor.referee3.fl_str_mv Uchôa, Anderson Gonçalves
dc.contributor.authorID.fl_str_mv 7915054932849802
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/7915054932849802
dc.contributor.author.fl_str_mv Souza, Pedro Carneiro de
contributor_str_mv Bastos, Larissa Rocha Soares
Figueiredo, Eduardo
Bastos, Larissa Rocha
Machado, Ivan do Carmo
Uchôa, Anderson Gonçalves
dc.subject.por.fl_str_mv Refatoração de código
Manutenibilidade
Modelos de Linguagem de Larga Escala (LLMs)
Python
topic Refatoração de código
Manutenibilidade
Modelos de Linguagem de Larga Escala (LLMs)
Python
Code Refactoring
Maintainability
Large-Scale Language Models (LLMs)
Python
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.eng.fl_str_mv Code Refactoring
Maintainability
Large-Scale Language Models (LLMs)
Python
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
description Code refactoring is an essential practice to ensure the quality and continuous evolution of software systems, especially in languages like Python, which demand high maintainability. Although static analysis tools such as SonarQube help identify issues, the refactoring process still presents challenges, such as preserving functionality and improving code readability. In this context, Large Language Models (LLMs), such as GPT-4, DeepSeek, and Claude AI, emerge as promising tools by combining advanced contextual analysis with automated code generation. This study evaluates the effectiveness of LLMs in refactoring Python code, focusing on correcting maintainability issues, identifying limitations, and proposing improvements. To this end, we conducted an empirical study with four widely used models: Copilot Chat 4o, LLaMA 3.3 70B Instruct, DeepSeek V3, and Gemini 2.5 Pro. We also investigated whether these models perform better with more refined prompting techniques, such as few-shot learning. For this purpose, each LLM was tested with two distinct prompting styles, zero-shot and few-shot, allowing a comparative analysis of the impact of these approaches on the quality of the generated refactorings. We evaluated 150 methods with maintainability problems per LLM and per prompting technique. The results indicate that, although the models achieved considerable effectiveness rates in the few-shot scenario (Gemini, 64.67%; DeepSeek, 64.00%; Copilot, 63.33%; LLaMA 3.3 70B, 55.33%), all faced significant limitations. The main challenges observed were the introduction of new maintainability problems, runtime errors, failures in automated tests, and, in some cases, failure to correct the originally identified issue. Additionally, we conducted an evaluation with human participants to analyze the readability of the code refactored by the models.
The results indicate that 81.25% of the solutions were perceived as improvements, especially in structural aspects. However, there were also cases where readability was impaired, either due to the introduction of unnecessary complexity or lack of standardization in code style. These findings reinforce the need for caution when automatically adopting suggestions generated by LLMs, and highlight the importance of validation by developers in the final code review. This work contributes a comparative analysis of the capabilities of LLMs, pointing out their limitations and proposing practical methodologies for integrating AI into the code refactoring process. The results of this study aim to pave the way for new research, especially in the development of more efficient prompting techniques and in the evaluation of models yet to come. We hope these contributions help developers and researchers find more practical, reliable, and long-lasting solutions to improve software maintainability in everyday practice.
publishDate 2025
dc.date.accessioned.fl_str_mv 2025-12-04T18:35:04Z
dc.date.issued.fl_str_mv 2025-08-19
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv SOUZA, Pedro Carneiro de. Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python, 2025, 129f., Dissertação (mestrado) - Programa de Pós-Graduação em Ciência da Computação, Universidade Estadual de Feira de Santana, Feira de Santana.
dc.identifier.uri.fl_str_mv http://tede2.uefs.br:8080/handle/tede/1969
identifier_str_mv SOUZA, Pedro Carneiro de. Avaliação empírica da eficácia de Modelos de Linguagem Grande (LLMs) na refatoração de Projetos Python, 2025, 129f., Dissertação (mestrado) - Programa de Pós-Graduação em Ciência da Computação, Universidade Estadual de Feira de Santana, Feira de Santana.
url http://tede2.uefs.br:8080/handle/tede/1969
dc.language.iso.fl_str_mv por
language por
dc.relation.program.fl_str_mv 1974996533081274470
dc.relation.confidence.fl_str_mv 600
600
600
dc.relation.department.fl_str_mv 7994740082289590807
dc.relation.cnpq.fl_str_mv 3671711205811204509
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Estadual de Feira de Santana
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv UEFS
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv DEPARTAMENTO DE CIÊNCIAS EXATAS
publisher.none.fl_str_mv Universidade Estadual de Feira de Santana
dc.source.none.fl_str_mv reponame:Biblioteca Digital de Teses e Dissertações da UEFS
instname:Universidade Estadual de Feira de Santana (UEFS)
instacron:UEFS
instname_str Universidade Estadual de Feira de Santana (UEFS)
instacron_str UEFS
institution UEFS
reponame_str Biblioteca Digital de Teses e Dissertações da UEFS
collection Biblioteca Digital de Teses e Dissertações da UEFS
bitstream.url.fl_str_mv http://tede2.uefs.br:8080/bitstream/tede/1969/4/Pedro_Carneiro_de_Souza+-+Disserta%C3%A7%C3%A3o.pdf.jpg
http://tede2.uefs.br:8080/bitstream/tede/1969/3/Pedro_Carneiro_de_Souza+-+Disserta%C3%A7%C3%A3o.pdf.txt
http://tede2.uefs.br:8080/bitstream/tede/1969/2/Pedro_Carneiro_de_Souza+-+Disserta%C3%A7%C3%A3o.pdf
http://tede2.uefs.br:8080/bitstream/tede/1969/1/license.txt
bitstream.checksum.fl_str_mv 553456a27b3c9660cb80337d9462580e
ee78d8dc97e8526bc18d818b5163e583
53a3720de92fe7600a5b6bfc1c1cbabf
bd3efa91386c1718a7f26a329fdcb468
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da UEFS - Universidade Estadual de Feira de Santana (UEFS)
repository.mail.fl_str_mv bcuefs@uefs.br|| bcref@uefs.br||bcuefs@uefs.br
_version_ 1865375263554011136