Evaluating LLMs for multimodal GUI test generation in Android applications

Bibliographic details
Year of defense: 2025
Main author: FAGUNDES, Nayse da Silva
Advisor: TEIXEIRA, Leopoldo Motta
Examination committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: eng
Degree-granting institution: Universidade Federal de Pernambuco
Graduate program: Programa de Pos Graduacao em Ciencia da Computacao
Department: Not provided by the institution
Country: Brazil
Keywords in Portuguese:
GUI
Access link: https://repositorio.ufpe.br/handle/123456789/68466
Abstract: Graphical User Interface (GUI) testing is a fundamental task in mobile application development, as it ensures that the user interface of any mobile application behaves correctly and meets user expectations. However, when performed manually, GUI testing remains time-consuming. With the rise of Large Language Models (LLMs), there is increasing interest in exploring their potential to automate software development tasks, including the generation of GUI tests. This study investigates how LLMs can generate GUI test intentions and scripts for Android applications using multimodal inputs, such as screenshots and structured UI data, which provide both visual and semantic information about the interface. This work presents an approach that combines these inputs from open-source Android apps and evaluates the performance of four LLMs: three proprietary models and one open-source model. The results show significant differences among the models: Claude 3 Sonnet produced the most complete results, GPT-4o generated smaller tests focusing on essential flows, and Gemini 2.5 Pro and the open-source Gemma 3 model presented similar results, limiting themselves to basic interactions. Overall, the results demonstrate that LLMs show potential to reduce manual effort and increase productivity in GUI test creation, offering distinct benefits according to the model used.
id UFPE_fef6096641fb1c69f9f15413d88fec29
oai_identifier_str oai:repositorio.ufpe.br:123456789/68466
network_acronym_str UFPE
network_name_str Repositório Institucional da UFPE
repository_id_str
dc.title.pt_BR.fl_str_mv Evaluating LLMs for multimodal GUI test generation in Android applications
title Evaluating LLMs for multimodal GUI test generation in Android applications
spellingShingle Evaluating LLMs for multimodal GUI test generation in Android applications
FAGUNDES, Nayse da Silva
GUI
Testes
LLMs
title_short Evaluating LLMs for multimodal GUI test generation in Android applications
title_full Evaluating LLMs for multimodal GUI test generation in Android applications
title_fullStr Evaluating LLMs for multimodal GUI test generation in Android applications
title_full_unstemmed Evaluating LLMs for multimodal GUI test generation in Android applications
title_sort Evaluating LLMs for multimodal GUI test generation in Android applications
author FAGUNDES, Nayse da Silva
author_facet FAGUNDES, Nayse da Silva
author_role author
dc.contributor.authorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/1720903997040537
dc.contributor.advisorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/2117651910340729
dc.contributor.authorORCID.pt_BR.fl_str_mv https://orcid.org/0000-0002-3915-3245
dc.contributor.advisorORCID.pt_BR.fl_str_mv https://orcid.org/0000-0002-6154-1666
dc.contributor.author.fl_str_mv FAGUNDES, Nayse da Silva
dc.contributor.advisor1.fl_str_mv TEIXEIRA, Leopoldo Motta
contributor_str_mv TEIXEIRA, Leopoldo Motta
dc.subject.por.fl_str_mv GUI
Testes
LLMs
topic GUI
Testes
LLMs
description Graphical User Interface (GUI) testing is a fundamental task in mobile application development, as it ensures that the user interface of any mobile application behaves correctly and meets user expectations. However, when performed manually, GUI testing remains time-consuming. With the rise of Large Language Models (LLMs), there is increasing interest in exploring their potential to automate software development tasks, including the generation of GUI tests. This study investigates how LLMs can generate GUI test intentions and scripts for Android applications using multimodal inputs, such as screenshots and structured UI data, which provide both visual and semantic information about the interface. This work presents an approach that combines these inputs from open-source Android apps and evaluates the performance of four LLMs: three proprietary models and one open-source model. The results show significant differences among the models: Claude 3 Sonnet produced the most complete results, GPT-4o generated smaller tests focusing on essential flows, and Gemini 2.5 Pro and the open-source Gemma 3 model presented similar results, limiting themselves to basic interactions. Overall, the results demonstrate that LLMs show potential to reduce manual effort and increase productivity in GUI test creation, offering distinct benefits according to the model used.
publishDate 2025
dc.date.issued.fl_str_mv 2025-12-10
dc.date.accessioned.fl_str_mv 2026-02-19T18:32:45Z
dc.date.available.fl_str_mv 2026-02-19T18:32:45Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv FAGUNDES, Nayse da Silva. Evaluating LLMs for multimodal GUI test generation in Android applications. 2025. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Pernambuco, Recife, 2025.
dc.identifier.uri.fl_str_mv https://repositorio.ufpe.br/handle/123456789/68466
identifier_str_mv FAGUNDES, Nayse da Silva. Evaluating LLMs for multimodal GUI test generation in Android applications. 2025. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Pernambuco, Recife, 2025.
url https://repositorio.ufpe.br/handle/123456789/68466
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv https://creativecommons.org/licenses/by-nc-nd/4.0/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-nd/4.0/
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de Pernambuco
dc.publisher.program.fl_str_mv Programa de Pos Graduacao em Ciencia da Computacao
dc.publisher.initials.fl_str_mv UFPE
dc.publisher.country.fl_str_mv Brasil
publisher.none.fl_str_mv Universidade Federal de Pernambuco
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFPE
instname:Universidade Federal de Pernambuco (UFPE)
instacron:UFPE
instname_str Universidade Federal de Pernambuco (UFPE)
instacron_str UFPE
institution UFPE
reponame_str Repositório Institucional da UFPE
collection Repositório Institucional da UFPE
bitstream.url.fl_str_mv https://repositorio.ufpe.br/bitstream/123456789/68466/2/license.txt
https://repositorio.ufpe.br/bitstream/123456789/68466/3/DISSERTA%c3%87%c3%83O%20Nayse%20da%20Silva%20Fagundes.pdf.txt
https://repositorio.ufpe.br/bitstream/123456789/68466/4/DISSERTA%c3%87%c3%83O%20Nayse%20da%20Silva%20Fagundes.pdf.jpg
https://repositorio.ufpe.br/bitstream/123456789/68466/1/DISSERTA%c3%87%c3%83O%20Nayse%20da%20Silva%20Fagundes.pdf
bitstream.checksum.fl_str_mv 5e89a1613ddc8510c6576f4b23a78973
bcc291424303c51b06d5b088a86b0920
ecd5408704b34bbad0f4e6b97195324e
4c3a2b43db69bcf6be07a437c850575a
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
repository.mail.fl_str_mv attena@ufpe.br
_version_ 1862741571607199744