Evaluating LLMs for multimodal GUI test generation in Android applications

FAGUNDES, Nayse da Silva

Bibliographic details
Year of defense:	2025
Main author:	FAGUNDES, Nayse da Silva
Advisor:	TEIXEIRA, Leopoldo Motta
Examination committee:	Not informed by the institution
Document type:	Master's thesis
Access type:	Open access
Language:	eng
Defending institution:	Universidade Federal de Pernambuco
Graduate program:	Programa de Pos Graduacao em Ciencia da Computacao
Department:	Not informed by the institution
Country:	Brazil
Keywords (in Portuguese):	GUI Testes LLMs
Access link:	https://repositorio.ufpe.br/handle/123456789/68466
Abstract:	Graphical User Interface (GUI) testing is a fundamental task in mobile application development, as it ensures that the user interface of a mobile application behaves correctly and meets user expectations. However, when performed manually, GUI testing remains time-consuming. With the rise of Large Language Models (LLMs), there is increasing interest in exploring their potential to automate software development tasks, including the generation of GUI tests. This study investigates how LLMs can generate GUI test intentions and scripts for Android applications using multimodal inputs, such as screenshots and structured UI data, which provide both visual and semantic information about the interface. This work presents an approach that combines these inputs from open-source Android apps and evaluates the performance of four LLMs: three proprietary models and one open-source model. The results show significant differences among the models: Claude 3 Sonnet produced the most complete results, GPT-4o generated smaller tests focused on essential flows, and Gemini 2.5 Pro and the open-source Gemma 3 model produced similar results, limited to basic interactions. Overall, the results demonstrate that LLMs have the potential to reduce manual effort and increase productivity in GUI test creation, with distinct benefits depending on the model used.

Item metadata

id	UFPE_fef6096641fb1c69f9f15413d88fec29
oai_identifier_str	oai:repositorio.ufpe.br:123456789/68466
network_acronym_str	UFPE
network_name_str	Repositório Institucional da UFPE
repository_id_str
dc.title.pt_BR.fl_str_mv	Evaluating LLMs for multimodal GUI test generation in Android applications
title	Evaluating LLMs for multimodal GUI test generation in Android applications
spellingShingle	Evaluating LLMs for multimodal GUI test generation in Android applications FAGUNDES, Nayse da Silva GUI Testes LLMs
title_short	Evaluating LLMs for multimodal GUI test generation in Android applications
title_full	Evaluating LLMs for multimodal GUI test generation in Android applications
title_fullStr	Evaluating LLMs for multimodal GUI test generation in Android applications
title_full_unstemmed	Evaluating LLMs for multimodal GUI test generation in Android applications
title_sort	Evaluating LLMs for multimodal GUI test generation in Android applications
author	FAGUNDES, Nayse da Silva
author_facet	FAGUNDES, Nayse da Silva
author_role	author
dc.contributor.authorLattes.pt_BR.fl_str_mv	http://lattes.cnpq.br/1720903997040537
dc.contributor.advisorLattes.pt_BR.fl_str_mv	http://lattes.cnpq.br/2117651910340729
dc.contributor.authorORCID.pt_BR.fl_str_mv	https://orcid.org/0000-0002-3915-3245
dc.contributor.advisorORCID.pt_BR.fl_str_mv	https://orcid.org/0000-0002-6154-1666
dc.contributor.author.fl_str_mv	FAGUNDES, Nayse da Silva
dc.contributor.advisor1.fl_str_mv	TEIXEIRA, Leopoldo Motta
contributor_str_mv	TEIXEIRA, Leopoldo Motta
dc.subject.por.fl_str_mv	GUI Testes LLMs
topic	GUI Testes LLMs
description	Graphical User Interface (GUI) testing is a fundamental task in mobile application development, as it ensures that the user interface of a mobile application behaves correctly and meets user expectations. However, when performed manually, GUI testing remains time-consuming. With the rise of Large Language Models (LLMs), there is increasing interest in exploring their potential to automate software development tasks, including the generation of GUI tests. This study investigates how LLMs can generate GUI test intentions and scripts for Android applications using multimodal inputs, such as screenshots and structured UI data, which provide both visual and semantic information about the interface. This work presents an approach that combines these inputs from open-source Android apps and evaluates the performance of four LLMs: three proprietary models and one open-source model. The results show significant differences among the models: Claude 3 Sonnet produced the most complete results, GPT-4o generated smaller tests focused on essential flows, and Gemini 2.5 Pro and the open-source Gemma 3 model produced similar results, limited to basic interactions. Overall, the results demonstrate that LLMs have the potential to reduce manual effort and increase productivity in GUI test creation, with distinct benefits depending on the model used.
publishDate	2025
dc.date.issued.fl_str_mv	2025-12-10
dc.date.accessioned.fl_str_mv	2026-02-19T18:32:45Z
dc.date.available.fl_str_mv	2026-02-19T18:32:45Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	FAGUNDES, Nayse da Silva. Evaluating LLMs for multimodal GUI test generation in Android applications. 2025. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Pernambuco, Recife, 2025.
dc.identifier.uri.fl_str_mv	https://repositorio.ufpe.br/handle/123456789/68466
identifier_str_mv	FAGUNDES, Nayse da Silva. Evaluating LLMs for multimodal GUI test generation in Android applications. 2025. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Pernambuco, Recife, 2025.
url	https://repositorio.ufpe.br/handle/123456789/68466
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	https://creativecommons.org/licenses/by-nc-nd/4.0/ info:eu-repo/semantics/openAccess
rights_invalid_str_mv	https://creativecommons.org/licenses/by-nc-nd/4.0/
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal de Pernambuco
dc.publisher.program.fl_str_mv	Programa de Pos Graduacao em Ciencia da Computacao
dc.publisher.initials.fl_str_mv	UFPE
dc.publisher.country.fl_str_mv	Brasil
publisher.none.fl_str_mv	Universidade Federal de Pernambuco
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFPE instname:Universidade Federal de Pernambuco (UFPE) instacron:UFPE
instname_str	Universidade Federal de Pernambuco (UFPE)
instacron_str	UFPE
institution	UFPE
reponame_str	Repositório Institucional da UFPE
collection	Repositório Institucional da UFPE
bitstream.url.fl_str_mv	https://repositorio.ufpe.br/bitstream/123456789/68466/2/license.txt https://repositorio.ufpe.br/bitstream/123456789/68466/3/DISSERTA%c3%87%c3%83O%20Nayse%20da%20Silva%20Fagundes.pdf.txt https://repositorio.ufpe.br/bitstream/123456789/68466/4/DISSERTA%c3%87%c3%83O%20Nayse%20da%20Silva%20Fagundes.pdf.jpg https://repositorio.ufpe.br/bitstream/123456789/68466/1/DISSERTA%c3%87%c3%83O%20Nayse%20da%20Silva%20Fagundes.pdf
bitstream.checksum.fl_str_mv	5e89a1613ddc8510c6576f4b23a78973 bcc291424303c51b06d5b088a86b0920 ecd5408704b34bbad0f4e6b97195324e 4c3a2b43db69bcf6be07a437c850575a
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
repository.mail.fl_str_mv	attena@ufpe.br
_version_	1862741571607199744
