Study on unsupervised machine learning applied to software testing based on resource usage

Bibliographic details
Year of defense: 2024
Main author: Ruiz, Kevin Gerardo Polo
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: eng
Defending institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-11022025-170631/
Abstract: Software systems are becoming increasingly complex, which also makes them more prone to failure. Ensuring software quality through rigorous testing is crucial, but testing remains costly and hard to manage, and it still consumes a significant part of the project budget. Automation is a critical factor in improving testing efficiency, with new methodologies and tools being developed continuously. Machine Learning has attracted considerable interest in several areas over the last ten years, driven by advances in computing power and the ability to manage large volumes of data. Tricorder is a testing methodology designed to detect potential software faults by analyzing changes in the resource consumption behavior of the computing system under test. That behavior is characterized by the unsupervised machine learning provided by DAMICORE. DAMICORE is based on a pipeline of three main steps: Normalized Compression Distance (NCD) generates a distance matrix, the Neighbor-Joining algorithm builds a phylogenetic tree, and the Fast Newman method performs community detection, which is essential for data clustering. DAMICORE monitors and identifies anomalies in resource usage patterns, such as CPU, memory, and I/O, to indicate the possible presence of software faults. This project studies the impact of DAMICORE on the software fault detection provided by Tricorder, in the context of agglomerative clustering and phylogenetic trees, testing several techniques adapted to DAMICORE. The literature review on Machine Learning applied to Software Testing showed that the most widely studied techniques for agglomerative clustering are based on Single, Complete, and Average (UPGMA) linkage, and that Neighbor-Joining and UPGMA are prominent in the construction of phylogenetic trees. To achieve the project objectives, we consolidated the theoretical foundations and then reviewed in depth the state of the art and the specific methodologies of Tricorder and DAMICORE. We then systematically designed and executed a series of experiments and analyzed the results. The project demonstrated the potential of using Levenshtein distance, leveraging the network topology for community detection, and incorporating all system metrics into a single analysis. These approaches yielded better results in our context than other techniques found in the literature. NCD and Neighbor-Joining have significant limitations, especially their high computational demands, which hinder their practical application in larger and more complex projects. The improvements introduced in this project should improve DAMICORE's accuracy in detecting software faults. The results contribute to the state of the art in applying Machine Learning to Software Testing and reinforce Tricorder's position in the Software Testing community.
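The two distance measures named in the abstract can be illustrated with a minimal Python sketch. It is not DAMICORE's or Tricorder's implementation: the record does not say which compressor DAMICORE uses, so zlib is an assumption here, and the function names (compressed_size, ncd, levenshtein, distance_matrix) and the trace format are hypothetical, chosen only for illustration.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed text (assumed stand-in for C(x))."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with a two-row dynamic programming table."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb
            ))
        previous = current
    return previous[-1]


def distance_matrix(items, dist):
    """Symmetric pairwise distance matrix with a zero diagonal."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = dist(items[i], items[j])
    return matrix


if __name__ == "__main__":
    # Hypothetical resource-usage traces, serialized as text.
    traces = [
        "cpu=12 mem=300 io=4\n" * 40,
        "cpu=12 mem=301 io=4\n" * 40,
        "cpu=97 mem=912 io=55\n" * 40,
    ]
    for name, dist in (("NCD", ncd), ("Levenshtein", levenshtein)):
        print(name)
        for row in distance_matrix(traces, dist):
            print(row)
```

Either matrix could then feed a tree-building or agglomerative clustering step. Both measures compare every pair of traces, so the cost grows quadratically with the number of traces, which is consistent with the computational limitations the abstract reports for NCD.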
id USP_a9ff268051fb0b2a7bea67db389a16af
oai_identifier_str oai:teses.usp.br:tde-11022025-170631
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
dc.title.none.fl_str_mv Study on unsupervised machine learning applied to software testing based on resource usage
Estudo sobre aprendizado de máquina não supervisionado aplicado ao teste de software baseado em uso de recursos
title Study on unsupervised machine learning applied to software testing based on resource usage
author Ruiz, Kevin Gerardo Polo
author_facet Ruiz, Kevin Gerardo Polo
author_role author
dc.contributor.none.fl_str_mv Delbem, Alexandre Cláudio Botazzo
Souza, Paulo Sergio Lopes de
dc.contributor.author.fl_str_mv Ruiz, Kevin Gerardo Polo
dc.subject.por.fl_str_mv Agglomerative clustering
Agrupamento aglomerativo
Anomaly detection
Aprendizado de máquina
Detecção de anomalias
Detecção de defeitos
Fault detection
Machine learning
Software testing
Teste de software
publishDate 2024
dc.date.none.fl_str_mv 2024-10-23
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/55/55134/tde-11022025-170631/
url https://www.teses.usp.br/teses/disponiveis/55/55134/tde-11022025-170631/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1839839143484456960