Measuring trivial and non-trivial refactorings: a predictive analysis and index proposal

Bibliographic details
Year of defense: 2024
Main author: Pinheiro, Darwin de Oliveira
Advisor: Bezerra, Carla Ilane Moreira
Examination committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: eng
Defending institution: Not provided by the institution
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
CNPq knowledge area:
Access link: http://repositorio.ufc.br/handle/riufc/78984
Abstract: Refactoring changes the internal structure of code without changing its external behavior, improving quality, maintainability, and readability while reducing technical debt. Studies indicate the need to improve the detection and correction of refactorings, and recommend machine learning to investigate motivations, difficulties, and improvements in software. This Master’s dissertation aims to identify the relationship between trivial and non-trivial refactorings and to propose a metric that evaluates how trivial a refactoring is to implement. First, we use supervised learning classifiers to examine the impact of trivial refactorings on the prediction of non-trivial ones. We analyzed three datasets covering 1,291 open-source projects and approximately 1.9M refactoring operations, using 45 code metrics and five classification models in different dataset configurations. Second, we propose an ML-based metric to evaluate the triviality of a refactoring, considering complexity, speed, and risk. We examined how the prioritization of 58 features, identified by 15 developers, affected the effectiveness of seven regression and ensemble models. We also verified the alignment between the perceptions of 16 experienced developers and the results of the models.
Our results are promising: (i) algorithms such as Random Forest, Decision Tree, and Neural Network performed better when using code metrics to identify refactoring opportunities; (ii) separating trivial and non-trivial refactorings improves the efficiency of the models, even on different datasets; (iii) using all available features outperforms the prioritization made by developers in predictive models; (iv) ensemble models, such as Random Forest and Gradient Boosting, outperform linear models regardless of feature prioritization; and (v) there is strong alignment between the perceptions of experts and the results of the models. In summary, this Master’s dissertation contributes to the refactoring process by offering developers support that can inform the decision of whether to apply a refactoring. It also highlights insights, challenges, and opportunities for future work.
id UFC-7_d37140d1afa0976fedb0d6260743956d
oai_identifier_str oai:repositorio.ufc.br:riufc/78984
network_acronym_str UFC-7
network_name_str Repositório Institucional da Universidade Federal do Ceará (UFC)
repository_id_str
dc.title.pt_BR.fl_str_mv Measuring trivial and non-trivial refactorings: a predictive analysis and index proposal
title Measuring trivial and non-trivial refactorings: a predictive analysis and index proposal
spellingShingle Measuring trivial and non-trivial refactorings: a predictive analysis and index proposal
Pinheiro, Darwin de Oliveira
CNPQ: CIÊNCIAS EXATAS E DA TERRA: CIÊNCIA DA COMPUTAÇÃO
refactoring
software maintenance
software quality
supervised learning
machine learning
title_short Measuring trivial and non-trivial refactorings: a predictive analysis and index proposal
title_full Measuring trivial and non-trivial refactorings: a predictive analysis and index proposal
title_fullStr Measuring trivial and non-trivial refactorings: a predictive analysis and index proposal
title_full_unstemmed Measuring trivial and non-trivial refactorings: a predictive analysis and index proposal
title_sort Measuring trivial and non-trivial refactorings: a predictive analysis and index proposal
author Pinheiro, Darwin de Oliveira
author_facet Pinheiro, Darwin de Oliveira
author_role author
dc.contributor.co-advisor.none.fl_str_mv Uchôa, Anderson Gonçalves
dc.contributor.author.fl_str_mv Pinheiro, Darwin de Oliveira
dc.contributor.advisor1.fl_str_mv Bezerra, Carla Ilane Moreira
contributor_str_mv Bezerra, Carla Ilane Moreira
dc.subject.cnpq.fl_str_mv CNPQ: CIÊNCIAS EXATAS E DA TERRA: CIÊNCIA DA COMPUTAÇÃO
topic CNPQ: CIÊNCIAS EXATAS E DA TERRA: CIÊNCIA DA COMPUTAÇÃO
refactoring
software maintenance
software quality
supervised learning
machine learning
dc.subject.en.pt_BR.fl_str_mv refactoring
software maintenance
software quality
supervised learning
machine learning
description Refactoring changes the internal structure of code without changing its external behavior, improving quality, maintainability, and readability while reducing technical debt. Studies indicate the need to improve the detection and correction of refactorings, and recommend machine learning to investigate motivations, difficulties, and improvements in software. This Master’s dissertation aims to identify the relationship between trivial and non-trivial refactorings and to propose a metric that evaluates how trivial a refactoring is to implement. First, we use supervised learning classifiers to examine the impact of trivial refactorings on the prediction of non-trivial ones. We analyzed three datasets covering 1,291 open-source projects and approximately 1.9M refactoring operations, using 45 code metrics and five classification models in different dataset configurations. Second, we propose an ML-based metric to evaluate the triviality of a refactoring, considering complexity, speed, and risk. We examined how the prioritization of 58 features, identified by 15 developers, affected the effectiveness of seven regression and ensemble models. We also verified the alignment between the perceptions of 16 experienced developers and the results of the models.
Our results are promising: (i) algorithms such as Random Forest, Decision Tree, and Neural Network performed better when using code metrics to identify refactoring opportunities; (ii) separating trivial and non-trivial refactorings improves the efficiency of the models, even on different datasets; (iii) using all available features outperforms the prioritization made by developers in predictive models; (iv) ensemble models, such as Random Forest and Gradient Boosting, outperform linear models regardless of feature prioritization; and (v) there is strong alignment between the perceptions of experts and the results of the models. In summary, this Master’s dissertation contributes to the refactoring process by offering developers support that can inform the decision of whether to apply a refactoring. It also highlights insights, challenges, and opportunities for future work.
publishDate 2024
dc.date.accessioned.fl_str_mv 2024-11-26T14:13:53Z
dc.date.available.fl_str_mv 2024-11-26T14:13:53Z
dc.date.issued.fl_str_mv 2024
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv PINHEIRO, Darwin de Oliveira. Measuring trivial and non-trivial refactorings: a predictive analysis and index proposal. 2024. 204 f. Dissertação (mestrado) – Universidade Federal do Ceará, Campus de Quixadá, Programa de Pós-Graduação em Computação, Quixadá, 2024.
dc.identifier.uri.fl_str_mv http://repositorio.ufc.br/handle/riufc/78984
identifier_str_mv PINHEIRO, Darwin de Oliveira. Measuring trivial and non-trivial refactorings: a predictive analysis and index proposal. 2024. 204 f. Dissertação (mestrado) – Universidade Federal do Ceará, Campus de Quixadá, Programa de Pós-Graduação em Computação, Quixadá, 2024.
url http://repositorio.ufc.br/handle/riufc/78984
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional da Universidade Federal do Ceará (UFC)
instname:Universidade Federal do Ceará (UFC)
instacron:UFC
instname_str Universidade Federal do Ceará (UFC)
instacron_str UFC
institution UFC
reponame_str Repositório Institucional da Universidade Federal do Ceará (UFC)
collection Repositório Institucional da Universidade Federal do Ceará (UFC)
bitstream.url.fl_str_mv http://repositorio.ufc.br/bitstream/riufc/78984/1/2024_dis_dopinheiro.pdf
http://repositorio.ufc.br/bitstream/riufc/78984/2/license.txt
bitstream.checksum.fl_str_mv 10c45c29df53bde0835f7a94dc60ccad
8a4605be74aa9ea9d79846c1fba20a33
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC)
repository.mail.fl_str_mv bu@ufc.br || repositorio@ufc.br
_version_ 1847793195797184512