Study on unsupervised machine learning applied to software testing based on resource usage

Ruiz, Kevin Gerardo Polo

Bibliographic details
Year of defense:	2024
Main author:	Ruiz, Kevin Gerardo Polo
Advisor:	Not provided by the institution
Examination committee:	Not provided by the institution
Document type:	Master's dissertation
Access type:	Open access
Language:	eng
Defending institution:	Biblioteca Digital de Teses e Dissertações da USP
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords (English and Portuguese):	Agglomerative clustering (Agrupamento aglomerativo); Anomaly detection (Detecção de anomalias); Fault detection (Detecção de defeitos); Machine learning (Aprendizado de máquina); Software testing (Teste de software)
Access link:	https://www.teses.usp.br/teses/disponiveis/55/55134/tde-11022025-170631/
Abstract:	Software systems are becoming increasingly complex, which also makes them more prone to failure. Ensuring software quality through rigorous testing is crucial, but the task remains costly and difficult, consuming a significant part of project budgets. Automation is a key factor in improving testing efficiency, and new methodologies and tools are developed continuously. Machine Learning has attracted considerable interest in several areas over the last ten years, driven by advances in computing power and the ability to manage large volumes of data. Tricorder is a testing methodology designed to detect potential software faults by analyzing changes in the resource consumption behavior of the computing system under test. That behavior is characterized by the unsupervised machine learning provided by DAMICORE, whose methodology is a pipeline of three main steps: Normalized Compression Distance (NCD) generates a distance matrix, the Neighbor-Joining algorithm builds a phylogenetic tree, and the Fast Newman method performs community detection, which is essential for data clustering. DAMICORE monitors and identifies anomalies in resource usage patterns, such as CPU, memory, and I/O, to indicate the possible presence of software faults. This project studies the impact of DAMICORE on the software fault detection provided by Tricorder, in the context of agglomerative clustering and phylogenetic trees, by testing several techniques adapted to DAMICORE. The literature review on Machine Learning applied to Software Testing highlighted that the most widely studied agglomerative clustering techniques are based on Single, Complete, and Average (UPGMA) linkage. Neighbor-Joining and UPGMA are prominent in the construction of phylogenetic trees.
To achieve the project objectives, we consolidated the theoretical foundations and then conducted an in-depth review of the state of the art and of the specific methodologies of Tricorder and DAMICORE. We then systematically designed and executed a series of experiments and analyzed the results. The project demonstrated the potential of using Levenshtein distance, leveraging the network topology for community detection, and incorporating all system metrics into a single analysis. These approaches yielded better results in our context than other techniques found in the literature. NCD and Neighbor-Joining have significant limitations, especially their high computational demands, which hinder their practical application in larger and more complex projects. The improvements introduced in this project should improve DAMICORE's accuracy in detecting software faults. The results contribute to the state of the art in the application of Machine Learning to Software Testing and reinforce Tricorder's position in the Software Testing community.
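
The first pipeline step named in the abstract, building a distance matrix, can be illustrated with a minimal Python sketch. It uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed size, and a textbook Levenshtein edit distance as the alternative measure the abstract mentions. The choice of zlib as compressor, the function names, and the idea of treating each resource usage trace as a byte or text string are assumptions made for illustration; none of this is taken from the DAMICORE or Tricorder implementations.

```python
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    With a real compressor the value is only approximately in [0, 1]."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def distance_matrix(items, dist):
    """Symmetric pairwise distance matrix with a zero diagonal."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = float(dist(items[i], items[j]))
    return matrix
```

For example, `distance_matrix(traces, ncd)` with a list of hypothetical byte-string traces yields the kind of matrix a tree-building step such as Neighbor-Joining would consume, and passing `levenshtein` with text traces swaps in the edit distance. NCD compresses every pair of inputs, so its cost grows quadratically with the number of traces, which is consistent with the computational limitation the abstract reports.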

Item metadata

id	USP_a9ff268051fb0b2a7bea67db389a16af
oai_identifier_str	oai:teses.usp.br:tde-11022025-170631
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
dc.title.none.fl_str_mv	Study on unsupervised machine learning applied to software testing based on resource usage / Estudo sobre aprendizado de máquina não supervisionado aplicado ao teste de software baseado em uso de recursos
title	Study on unsupervised machine learning applied to software testing based on resource usage
author	Ruiz, Kevin Gerardo Polo
author_facet	Ruiz, Kevin Gerardo Polo
author_role	author
dc.contributor.none.fl_str_mv	Delbem, Alexandre Cláudio Botazzo; Souza, Paulo Sergio Lopes de
dc.contributor.author.fl_str_mv	Ruiz, Kevin Gerardo Polo
dc.subject.por.fl_str_mv	Agglomerative clustering; Agrupamento aglomerativo; Anomaly detection; Aprendizado de máquina; Detecção de anomalias; Detecção de defeitos; Fault detection; Machine learning; Software testing; Teste de software
topic	Agglomerative clustering; Agrupamento aglomerativo; Anomaly detection; Aprendizado de máquina; Detecção de anomalias; Detecção de defeitos; Fault detection; Machine learning; Software testing; Teste de software
publishDate	2024
dc.date.none.fl_str_mv	2024-10-23
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://www.teses.usp.br/teses/disponiveis/55/55134/tde-11022025-170631/
url	https://www.teses.usp.br/teses/disponiveis/55/55134/tde-11022025-170631/
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv	Release the content for public access; info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Release the content for public access.
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP; instname:Universidade de São Paulo (USP); instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br; atendimento@aguia.usp.br
_version_	1839839143484456960
