Study on unsupervised machine learning applied to software testing based on resource usage
| Year of defense: | 2024 |
|---|---|
| Main author: | |
| Advisor: | |
| Defense committee: | |
| Document type: | Master's thesis |
| Access type: | Open access |
| Language: | eng |
| Defending institution: | Biblioteca Digital de Teses e Dissertações da USP |
| Graduate program: | Not provided by the institution |
| Department: | Not provided by the institution |
| Country: | Not provided by the institution |
| Keywords in Portuguese: | |
| Access link: | https://www.teses.usp.br/teses/disponiveis/55/55134/tde-11022025-170631/ |
| Abstract: | Systems in the software industry are becoming increasingly complex, which also makes them more prone to failure. Ensuring software quality through rigorous testing is crucial, but this task remains costly and difficult, consuming a significant part of project budgets. Automation is a critical factor in improving testing efficiency, with new methodologies and tools being developed continuously. Machine Learning has attracted considerable interest in several areas over the last ten years, driven by advances in computing power and in the ability to manage large volumes of data. Tricorder is a testing methodology designed to detect potential software faults by analyzing changes in the resource consumption behavior of the computing system under test. This behavior is characterized by the unsupervised machine learning provided by DAMICORE. The methodology used by DAMICORE is based on a pipeline of three main steps: Normalized Compression Distance (NCD) to generate a distance matrix, the Neighbor-Joining algorithm to build a phylogenetic tree, and the Fast Newman method for community detection, which is essential for data clustering. DAMICORE monitors and identifies anomalies in resource usage patterns, such as CPU, memory, and I/O, to indicate the possible presence of software faults. This project studies the impact of DAMICORE on the software fault detection provided by Tricorder, in the context of agglomerative clustering and phylogenetic trees, by testing several techniques adapted to DAMICORE. The literature review on Machine Learning applied to Software Testing highlighted that the most widely studied techniques for agglomerative clustering are based on Single, Complete, and Average (UPGMA) Linkages. Neighbor-Joining and UPGMA are prominent in the construction of phylogenetic trees. To achieve the project objectives, we consolidated the theoretical foundations and then reviewed in depth the state of the art and the specific methodologies of Tricorder and DAMICORE. We then systematically designed and executed a series of experiments and analyzed the results. The project demonstrated the potential of using Levenshtein distance, leveraging the network topology for community detection, and incorporating all system metrics into a single analysis. These approaches yielded better results in our context than other techniques found in the literature. NCD and Neighbor-Joining have significant limitations, especially their high computational demands, which hinder their practical application in larger and more complex projects. The improvements introduced in this project are expected to increase DAMICORE's accuracy in detecting software faults. The results of this project contribute to the state of the art in applying Machine Learning to Software Testing and reinforce Tricorder's position in the Software Testing community. |
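The distance step named in the abstract can be illustrated with a short sketch. This is not DAMICORE's or Tricorder's implementation: it assumes `zlib` as the compressor for NCD, uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), and all function names (`ncd`, `levenshtein`, `distance_matrix`) and the sample resource traces are hypothetical.

```python
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute if different
            ))
        previous = current
    return previous[-1]


def distance_matrix(items, dist):
    """Symmetric pairwise distance matrix with a zero diagonal."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = dist(items[i], items[j])
    return matrix


if __name__ == "__main__":
    # Hypothetical resource-usage traces serialized as text.
    traces = [
        b"cpu=10 mem=200 io=5\n" * 50,
        b"cpu=10 mem=200 io=5\n" * 49 + b"cpu=95 mem=900 io=70\n",
        bytes((i * 37 + 11) % 251 for i in range(1000)),
    ]
    for row in distance_matrix(traces, ncd):
        print(["%.3f" % value for value in row])
```

A matrix like this would feed the tree-building step (Neighbor-Joining or a linkage method such as UPGMA). Passing `levenshtein` instead of `ncd` to `distance_matrix` swaps the distance measure without changing the rest of the sketch, provided the traces are given as strings.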
| id |
USP_a9ff268051fb0b2a7bea67db389a16af |
|---|---|
| oai_identifier_str |
oai:teses.usp.br:tde-11022025-170631 |
| network_acronym_str |
USP |
| network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
| repository_id_str |
|
| dc.title.none.fl_str_mv |
Study on unsupervised machine learning applied to software testing based on resource usage / Estudo sobre aprendizado de máquina não supervisionado aplicado ao teste de software baseado em uso de recursos |
| title |
Study on unsupervised machine learning applied to software testing based on resource usage |
| author |
Ruiz, Kevin Gerardo Polo |
| author_facet |
Ruiz, Kevin Gerardo Polo |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Delbem, Alexandre Cláudio Botazzo; Souza, Paulo Sergio Lopes de |
| dc.contributor.author.fl_str_mv |
Ruiz, Kevin Gerardo Polo |
| dc.subject.por.fl_str_mv |
Agglomerative clustering; Agrupamento aglomerativo; Anomaly detection; Aprendizado de máquina; Detecção de anomalias; Detecção de defeitos; Fault detection; Machine learning; Software testing; Teste de software |
| topic |
Agglomerative clustering; Agrupamento aglomerativo; Anomaly detection; Aprendizado de máquina; Detecção de anomalias; Detecção de defeitos; Fault detection; Machine learning; Software testing; Teste de software |
| publishDate |
2024 |
| dc.date.none.fl_str_mv |
2024-10-23 |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
| format |
masterThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
https://www.teses.usp.br/teses/disponiveis/55/55134/tde-11022025-170631/ |
| url |
https://www.teses.usp.br/teses/disponiveis/55/55134/tde-11022025-170631/ |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.relation.none.fl_str_mv |
|
| dc.rights.driver.fl_str_mv |
Release the content for public access. info:eu-repo/semantics/openAccess |
| rights_invalid_str_mv |
Release the content for public access. |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.coverage.none.fl_str_mv |
|
| dc.publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
| publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
| dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
| instname_str |
Universidade de São Paulo (USP) |
| instacron_str |
USP |
| institution |
USP |
| reponame_str |
Biblioteca Digital de Teses e Dissertações da USP |
| collection |
Biblioteca Digital de Teses e Dissertações da USP |
| repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
| repository.mail.fl_str_mv |
virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br |
| _version_ |
1839839143484456960 |