Evaluating LLMs for multimodal GUI test generation in Android applications
| Year of defense: | 2025 |
|---|---|
| Main author: | FAGUNDES, Nayse da Silva |
| Advisor: | TEIXEIRA, Leopoldo Motta |
| Examining committee: | |
| Document type: | Master's dissertation |
| Access type: | Open access |
| Language: | English (eng) |
| Defense institution: | Universidade Federal de Pernambuco |
| Graduate program: | Programa de Pós-Graduação em Ciência da Computação |
| Department: | Not provided by the institution |
| Country: | Brazil |
| Keywords (Portuguese): | GUI; Testes (tests); LLMs |
| Access link: | https://repositorio.ufpe.br/handle/123456789/68466 |
| Abstract: | Graphical User Interface (GUI) testing is a fundamental task in mobile application development, as it verifies that an application's user interface behaves correctly and meets user expectations. However, when performed manually, GUI testing remains time-consuming. With the rise of Large Language Models (LLMs), there is increasing interest in exploring their potential to automate software development tasks, including the generation of GUI tests. This study investigates how LLMs can generate GUI test intentions and scripts for Android applications from multimodal inputs, such as screenshots and structured UI data, which together provide visual and semantic information about the interface. This work presents an approach that combines these inputs from open-source Android apps and evaluates the performance of four LLMs: three proprietary models and one open-source model. The results show significant differences among the models. Claude 3 Sonnet produced the most complete results; GPT-4o generated smaller tests focused on essential flows; and Gemini 2.5 Pro and the open-source Gemma 3 produced similar results, limited to basic interactions. Overall, the results indicate that LLMs have the potential to reduce manual effort and increase productivity in GUI test creation, with distinct benefits depending on the model used. |

| id | UFPE_fef6096641fb1c69f9f15413d88fec29 |
|---|---|
| oai_identifier_str | oai:repositorio.ufpe.br:123456789/68466 |
| network_acronym_str | UFPE |
| network_name_str | Repositório Institucional da UFPE |
| repository_id_str | |
| dc.title.pt_BR.fl_str_mv | Evaluating LLMs for multimodal GUI test generation in Android applications |
| dc.contributor.authorLattes.pt_BR.fl_str_mv | http://lattes.cnpq.br/1720903997040537 |
| dc.contributor.advisorLattes.pt_BR.fl_str_mv | http://lattes.cnpq.br/2117651910340729 |
| dc.contributor.authorORCID.pt_BR.fl_str_mv | https://orcid.org/0000-0002-3915-3245 |
| dc.contributor.advisorORCID.pt_BR.fl_str_mv | https://orcid.org/0000-0002-6154-1666 |
| dc.contributor.author.fl_str_mv | FAGUNDES, Nayse da Silva |
| dc.contributor.advisor1.fl_str_mv | TEIXEIRA, Leopoldo Motta |
| dc.subject.por.fl_str_mv | GUI; Testes (tests); LLMs |
| dc.date.issued.fl_str_mv | 2025-12-10 |
| dc.date.accessioned.fl_str_mv | 2026-02-19T18:32:45Z |
| dc.date.available.fl_str_mv | 2026-02-19T18:32:45Z |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
| dc.identifier.citation.fl_str_mv | FAGUNDES, Nayse da Silva. Evaluating LLMs for multimodal GUI test generation in Android applications. 2025. Master's dissertation (Computer Science), Universidade Federal de Pernambuco, Recife, 2025. |
| dc.identifier.uri.fl_str_mv | https://repositorio.ufpe.br/handle/123456789/68466 |
| dc.language.iso.fl_str_mv | eng |
| dc.rights.driver.fl_str_mv | https://creativecommons.org/licenses/by-nc-nd/4.0/; info:eu-repo/semantics/openAccess |
| dc.publisher.none.fl_str_mv | Universidade Federal de Pernambuco |
| dc.publisher.program.fl_str_mv | Programa de Pos Graduacao em Ciencia da Computacao |
| dc.publisher.initials.fl_str_mv | UFPE |
| dc.publisher.country.fl_str_mv | Brasil |
| dc.source.none.fl_str_mv | reponame:Repositório Institucional da UFPE; instname:Universidade Federal de Pernambuco (UFPE); instacron:UFPE |
| bitstream.url.fl_str_mv | https://repositorio.ufpe.br/bitstream/123456789/68466/2/license.txt; https://repositorio.ufpe.br/bitstream/123456789/68466/3/DISSERTA%c3%87%c3%83O%20Nayse%20da%20Silva%20Fagundes.pdf.txt; https://repositorio.ufpe.br/bitstream/123456789/68466/4/DISSERTA%c3%87%c3%83O%20Nayse%20da%20Silva%20Fagundes.pdf.jpg; https://repositorio.ufpe.br/bitstream/123456789/68466/1/DISSERTA%c3%87%c3%83O%20Nayse%20da%20Silva%20Fagundes.pdf |
| bitstream.checksum.fl_str_mv | 5e89a1613ddc8510c6576f4b23a78973; bcc291424303c51b06d5b088a86b0920; ecd5408704b34bbad0f4e6b97195324e; 4c3a2b43db69bcf6be07a437c850575a |
| bitstream.checksumAlgorithm.fl_str_mv | MD5; MD5; MD5; MD5 |
| repository.name.fl_str_mv | Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE) |
| repository.mail.fl_str_mv | attena@ufpe.br |