Enhancing the SZZ Algorithm to Deal with Refactoring Changes

Bibliographic details
Year of defense: 2018
Main author: Campos Neto, Edmilson Barbalho
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: por
Defending institution: Brazil
UFRN
PROGRAMA DE PÓS-GRADUAÇÃO EM SISTEMAS E COMPUTAÇÃO
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://repositorio.ufrn.br/jspui/handle/123456789/26204
Abstract: SZZ was proposed by Śliwerski, Zimmermann, and Zeller (hence the abbreviation) to identify fix-inducing changes, i.e., changes that are likely to induce bugs. Despite its wide adoption, SZZ still has limitations, which have been reported recently. No existing work broadly surveys how SZZ has been used, extended, and evaluated by the software engineering community, and only a few studies have investigated improvements to SZZ. This thesis therefore explores the limitations documented in the SZZ literature and advances the state of the art of SZZ by proposing solutions to some of them. First, we perform a systematic mapping study to determine the current state of the art of the SZZ algorithm. Our results show that the majority of the analyzed studies use SZZ as a foundation for their empirical studies (79%), while only a few propose direct improvements to SZZ (6%) or evaluate it (4%). We further observe that SZZ has several unaddressed limitations, such as the bias related to refactoring changes. Second, we conduct an empirical study to investigate the relationship between refactoring changes and SZZ results. We use RefDiff, a tool that detects code refactoring with high precision, to analyze an extensive dataset comprising 31,518 issues from ten systems, with 64,855 bug fixes and 20,298 fix-inducing changes. We run RefDiff on both the bug-fix and the fix-inducing changes generated by a recent SZZ implementation, meta-change aware SZZ (MA-SZZ). The results indicate a refactoring ratio of 6.5% for fix-inducing changes and 19.9% for bug-fix changes. We then incorporate RefDiff into MA-SZZ and propose a refactoring-aware SZZ implementation (RA-SZZ). RA-SZZ reduces the number of lines flagged as fix-inducing changes by MA-SZZ by 20.8%. These results indicate that refactoring can affect SZZ results.
Using an evaluation framework, we observe that RA-SZZ reduces the disagreement ratio compared to previous implementations; however, our results suggest that the accuracy of SZZ must still be improved. Finally, we evaluate the accuracy of RA-SZZ using a well-accepted dataset, which we validated for evaluating SZZ implementations. We also revisit the known limitations of RA-SZZ to improve the accuracy of the algorithm by integrating a novel refactoring-detection tool, RMiner. After refining RA-SZZ, a median of 44% of the lines flagged as fix-inducing are accurate, whereas only 29% of the lines flagged by MA-SZZ are accurate. We also manually analyze the RA-SZZ results and observe that refactoring changes (31.17%) and equivalent changes (13.64%) remain to be recognized by SZZ. This result reiterates that detecting refactoring increases the accuracy of SZZ. Our results contribute to the maturation of SZZ and indicate that the impact of refactoring upon SZZ may be even higher if further improvements can be made in future studies.
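The refactoring-aware idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's RA-SZZ implementation: the `LineChange` record, the function names, and the toy data are all hypothetical, and the set of refactored lines is assumed to come from an external refactoring detector such as RefDiff or RMiner. The sketch only shows the general shape of the technique: discard candidate lines that a detector attributes to refactoring, then measure how many flagged lines were removed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class LineChange:
    """Hypothetical record: a line flagged by an SZZ-style blame step."""
    path: str           # file containing the flagged line
    line_no: int        # line number in that file
    introduced_by: str  # commit that last touched the line (e.g. from blame)


def refactoring_aware_filter(
    candidates: Iterable[LineChange],
    refactored_lines: Set[Tuple[str, int]],
) -> List[LineChange]:
    """Keep only candidates NOT reported as refactoring by the detector.

    `refactored_lines` is assumed to hold (path, line_no) pairs produced
    by a refactoring-detection tool; producing it is out of scope here.
    """
    return [c for c in candidates
            if (c.path, c.line_no) not in refactored_lines]


def fix_inducing_commits(candidates: Iterable[LineChange]) -> Set[str]:
    """Collect the commits that introduced the remaining flagged lines."""
    return {c.introduced_by for c in candidates}


def reduction_ratio(before: int, after: int) -> float:
    """Fraction of flagged lines removed by the filter (0.0 if none flagged)."""
    if before == 0:
        return 0.0
    return (before - after) / before


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    flagged = [
        LineChange("A.java", 10, "c1"),
        LineChange("A.java", 11, "c1"),
        LineChange("B.java", 5, "c2"),
        LineChange("B.java", 6, "c3"),
        LineChange("C.java", 1, "c4"),
    ]
    refactored = {("C.java", 1)}  # e.g. a line moved by a refactoring
    kept = refactoring_aware_filter(flagged, refactored)
    print(sorted(fix_inducing_commits(kept)))
    print(f"{reduction_ratio(len(flagged), len(kept)):.1%}")
```

A ratio computed this way is the same kind of quantity as the 20.8% reduction reported in the abstract, although the thesis's exact counting procedure is not described in this record.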
id UFRN_1780fa89bb0c1ea4482342d66d2ab2df
oai_identifier_str oai:repositorio.ufrn.br:123456789/26204
network_acronym_str UFRN
network_name_str Repositório Institucional da UFRN
repository_id_str
dc.title.none.fl_str_mv Enhancing the SZZ Algorithm to Deal with Refactoring Changes
author Campos Neto, Edmilson Barbalho
author_facet Campos Neto, Edmilson Barbalho
author_role author
dc.contributor.none.fl_str_mv Kulesza, Uira


Costa, Daniel Alencar da

Aranha, Eduardo Henrique da Silva

Nunes, Ingrid Oliveira de

Maia, Marcelo de Almeida

Coelho, Roberta de Souza

dc.contributor.author.fl_str_mv Campos Neto, Edmilson Barbalho
dc.subject.por.fl_str_mv SZZ algorithm
Fix-inducing changes
Bug-fix changes
Refactoring changes
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::SISTEMAS DE COMPUTACAO
publishDate 2018
dc.date.none.fl_str_mv 2018-11-27T20:42:18Z
2018-11-27T20:42:18Z
2018-07-20
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv CAMPOS NETO, Edmilson Barbalho. Enhancing the SZZ Algorithm to Deal with Refactoring Changes. 2018. 133f. Tese (Doutorado em Ciência da Computação) - Centro de Ciências Exatas e da Terra, Universidade Federal do Rio Grande do Norte, Natal, 2018.
https://repositorio.ufrn.br/jspui/handle/123456789/26204
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Brasil
UFRN
PROGRAMA DE PÓS-GRADUAÇÃO EM SISTEMAS E COMPUTAÇÃO
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFRN
instname:Universidade Federal do Rio Grande do Norte (UFRN)
instacron:UFRN
instname_str Universidade Federal do Rio Grande do Norte (UFRN)
instacron_str UFRN
institution UFRN
reponame_str Repositório Institucional da UFRN
collection Repositório Institucional da UFRN
repository.name.fl_str_mv Repositório Institucional da UFRN - Universidade Federal do Rio Grande do Norte (UFRN)
repository.mail.fl_str_mv repositorio@bczm.ufrn.br
_version_ 1855758908115648512