Vocabulário de testes instáveis entre linguagens de programação (Vocabulary of flaky tests across programming languages)

Soratto, Rafael Rampim

Bibliographic details
Year of defense:	2025
Main author:	Soratto, Rafael Rampim
Advisor:	Not provided by the institution
Examining committee:	Not provided by the institution
Document type:	Master's dissertation
Access type:	Open access
Language:	Portuguese (por)
Defense institution:	Universidade Tecnológica Federal do Paraná (UTFPR), Campo Mourão, Brazil, Programa de Pós-Graduação em Ciência da Computação (Graduate Program in Computer Science)
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	Technological innovations; Software engineering; JavaScript (Computer program language); Python (Computer program language); Java (Computer program language); CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO::ENGENHARIA DE SOFTWARE; Computer Science
Access link:	http://repositorio.utfpr.edu.br/jspui/handle/1/38992
Abstract:	Context: Regression testing is a verification and validation activity in modern software engineering. During regression testing, tests may fail without any change to the implementation; such tests are called flaky (unstable) tests. Flakiness can delay software releases and reduce confidence in the test suite. Flaky tests can be identified by re-executing tests, but this has a high computational cost. An alternative to re-execution is static analysis of the test code to identify patterns associated with flakiness. So far, vocabulary-based studies have addressed only single-language applications (Java, Python, or JavaScript). To understand the limits and possibilities of this approach, it is worth examining the intersection of flaky-test vocabularies across languages.

Objective: This work evaluated the vocabulary technique for predicting flaky tests in JavaScript, Java, and Python applications.

Method: A dataset of flaky test cases was built from open-source JavaScript projects on GitHub, and datasets from previous work were reused for Python and Java. Classification models were then trained on the flakiness vocabulary, both across projects and across languages. After evaluating the models' performance, we examined the terms with the highest information gain for predicting flakiness across languages.

Results: We created the ShakerJS tool, which identified 102 flaky tests in 36% of the 36 relevant JavaScript projects by applying CPU and memory stress. The terms with the highest information gain for predicting flaky tests in this set were await, async, if, and path. In JavaScript, as in Java, the Random Forest (RF) model had the best precision (0.96), while the Decision Tree (DT) model outperformed RF in recall (0.77 vs. 0.63). Results are better when a model is trained and tested within the same project and language; some projects reach 100% precision and recall. Model quality is low for completely unknown flaky tests, and it is likewise low across programming languages (JavaScript, Java, and Python): the best cross-language result, for a model trained on JavaScript and tested on Java, reached an F1-score of 0.63. The cross-language flaky vocabulary, however, yields positive results. Python and JavaScript share terms related to the following root causes of flaky tests: asynchronous communication (await, 400, data), concurrency, graphical user interface events (plot, page, button, click, and width), time dependency (time), and resource leaks (source, data, args). Java and JavaScript share terms related to concurrency (manager), asynchronous waiting (await), and resource leaks (new, class). Python and Java share terms related to asynchronous communication (waitFor), concurrency (get, select), time dependency (date), and resource leaks (evaluate, catch).

Conclusions: This work presents relevant results toward more efficient identification of flaky tests in JavaScript, Java, and Python projects. Contexts and root causes of flakiness overlap across projects and languages. However, relying only on the raw content of the source code may reduce the performance of prediction models when projects and languages differ.
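Vocabulary-based prediction treats each test case's source code as a bag of terms (identifiers, keywords, literals) from which a classifier learns. A minimal Python sketch of that feature extraction follows. The tokenization rule (identifier-like words and integer literals, case preserved, comments and string contents not stripped) and the sample test are assumptions for illustration, not the dissertation's exact preprocessing.

```python
import re
from collections import Counter

# Assumed token rule: JavaScript/Java/Python-style identifiers and integer literals.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+")


def vocabulary(test_source: str) -> Counter:
    """Return the term frequencies of one test case's source code."""
    return Counter(TOKEN_RE.findall(test_source))


# Hypothetical JavaScript test case used only as input text.
js_test = """
it('loads page', async () => {
  const res = await fetch(url);
  if (res.status === 400) { throw new Error('bad'); }
});
"""

terms = vocabulary(js_test)
print(terms["await"], terms["400"], terms["res"])  # 1 1 2
```

Because the rule is purely lexical, the same function can be applied unchanged to JavaScript, Java, and Python test code, which is what makes a cross-language vocabulary comparison possible at all.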
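Information gain measures how much knowing whether a term occurs in a test reduces uncertainty about its flaky/non-flaky label: IG(t) = H(Y) - [P(t) H(Y | t) + P(not t) H(Y | not t)], with H the binary entropy in bits. A small pure-Python sketch over binary term presence follows; the toy documents and labels are made up for illustration and are not data from the dissertation.

```python
import math


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a label distribution with pos/neg counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(docs: list[set[str]], labels: list[bool], term: str) -> float:
    """IG of a term's presence/absence with respect to the flaky label."""
    n = len(docs)
    flaky = sum(labels)
    base = entropy(flaky, n - flaky)
    conditional = 0.0
    for has_term in (True, False):
        subset = [y for d, y in zip(docs, labels) if (term in d) == has_term]
        if subset:
            pos = sum(subset)
            conditional += len(subset) / n * entropy(pos, len(subset) - pos)
    return base - conditional


# Toy data: each test is a set of terms; True marks a flaky test.
docs = [{"await", "async"}, {"await", "path"}, {"assert"}, {"assert", "path"}]
labels = [True, True, False, False]
print(information_gain(docs, labels, "await"))  # 1.0
print(information_gain(docs, labels, "path"))   # 0.0
```

In this toy set, "await" separates flaky from non-flaky tests perfectly (gain of 1 bit), while "path" occurs equally in both classes and carries no information.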
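The results above are expressed with standard binary-classification metrics. For reference, here is a pure-Python sketch of accuracy, precision, recall, and F1-score (the harmonic mean of precision and recall) for predictions where True means flaky. The function name and the toy labels are illustrative, not data or code from the dissertation.

```python
def classification_metrics(y_true: list[bool], y_pred: list[bool]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for the positive (flaky) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # flaky, predicted flaky
    tn = sum(1 for t, p in pairs if not t and not p)  # stable, predicted stable
    fp = sum(1 for t, p in pairs if not t and p)      # stable, predicted flaky
    fn = sum(1 for t, p in pairs if t and not p)      # flaky, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [True, True, True, False, False]
y_pred = [True, True, False, True, False]
print(classification_metrics(y_true, y_pred))
```

Here 2 true positives, 1 false positive, 1 false negative, and 1 true negative give an accuracy of 0.6 and precision, recall, and F1 of 2/3 each. Precision and recall can diverge sharply on imbalanced data, which is why a model can score high on one and low on the other.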
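The cross-language analysis compares the most informative terms of each language. One simple way to express that comparison, assuming each language's terms have already been scored (for example by information gain) and taking the top k of each, is a set intersection. The function names, k, and the scores below are hypothetical placeholders, not values reported in the dissertation.

```python
def top_terms(scores: dict[str, float], k: int) -> set[str]:
    """The k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return set(ranked[:k])


def shared_vocabulary(
    scores_a: dict[str, float], scores_b: dict[str, float], k: int
) -> set[str]:
    """Terms that rank in the top k for both languages."""
    return top_terms(scores_a, k) & top_terms(scores_b, k)


# Placeholder scores for illustration only.
lang_a = {"await": 0.9, "async": 0.8, "if": 0.7, "path": 0.2}
lang_b = {"await": 0.6, "time": 0.5, "date": 0.4, "if": 0.1}
print(shared_vocabulary(lang_a, lang_b, k=3))  # {'await'}
```

The choice of k controls the trade-off: a small k keeps only strongly predictive shared terms, while a larger k surfaces more overlap at the cost of weaker terms.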

Item metadata

id	UTFPR-12_db721fb5d5e7321b931197280726aa63
oai_identifier_str	oai:repositorio.utfpr.edu.br:1/38992
network_acronym_str	UTFPR-12
network_name_str	Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT))
repository_id_str
sponsorship	Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
dc.title.none.fl_str_mv	Vocabulário de testes instáveis entre linguagens de programação Vocabulary of flaky tests across programming languages
author	Soratto, Rafael Rampim
author_facet	Soratto, Rafael Rampim
author_role	author
dc.contributor.none.fl_str_mv	Silva, Marco Aurélio Graciotto https://orcid.org/0000-0002-1737-8240 https://lattes.cnpq.br/9383290036853173 Schwerz, André Luís https://orcid.org/0000-0002-8328-7144 https://lattes.cnpq.br/4954414332524750 Endo, André Takeshi https://orcid.org/0000-0002-8737-1749 https://lattes.cnpq.br/4221336619791961 Wiese, Igor Scaliante https://orcid.org/0000-0001-9943-5570 https://lattes.cnpq.br/0447444423694007 Silva, Marco Aurélio Graciotto https://orcid.org/0000-0002-1737-8240 https://lattes.cnpq.br/9383290036853173
dc.contributor.author.fl_str_mv	Soratto, Rafael Rampim
publishDate	2025
dc.date.none.fl_str_mv	2025-12-01T11:44:06Z 2025-12-01T11:44:06Z 2025-04-04
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	SORATTO, Rafael Rampim. Vocabulário de testes instáveis entre linguagens de programação. 2025. Master's dissertation (Computer Science) - Universidade Tecnológica Federal do Paraná, Campo Mourão, 2025. http://repositorio.utfpr.edu.br/jspui/handle/1/38992
url	http://repositorio.utfpr.edu.br/jspui/handle/1/38992
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	http://creativecommons.org/licenses/by/4.0/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by/4.0/
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Tecnológica Federal do Paraná, Campo Mourão, Brazil, Programa de Pós-Graduação em Ciência da Computação, UTFPR
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT)) instname:Universidade Tecnológica Federal do Paraná (UTFPR) instacron:UTFPR
instname_str	Universidade Tecnológica Federal do Paraná (UTFPR)
instacron_str	UTFPR
institution	UTFPR
repository.name.fl_str_mv	Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT)) - Universidade Tecnológica Federal do Paraná (UTFPR)
repository.mail.fl_str_mv	riut@utfpr.edu.br; sibi@utfpr.edu.br
_version_	1850498321307664384
