Um framework independente de domínio para knowledge graph question answering baseado em large language models
| Year of defense: | 2024 |
|---|---|
| Main author: | Ávila, Caio Viktor da Silva |
| Advisor: | Vidal, Vânia Maria Ponte |
| Defense committee: | |
| Document type: | Thesis |
| Access type: | Open access |
| Language: | por |
| Defense institution: | Not informed by the institution |
| Graduate program: | Not informed by the institution |
| Department: | Not informed by the institution |
| Country: | Not informed by the institution |
| CNPq knowledge area: | |
| Access link: | http://repositorio.ufc.br/handle/riufc/78251 |
| Abstract: | Knowledge graph question answering (KGQA) systems are computational systems that answer natural-language questions using a knowledge graph (KG) as the source of knowledge to be consulted. These systems stand out for their curated, in-depth answers. Over time, several architectures and approaches have been proposed for KGQA systems, with systems based on pre-trained end-to-end deep learning models becoming popular in recent years. Currently, large language models (LLMs) are the state of the art among pre-trained language models, which opens the opportunity to develop KGQA systems based on LLMs. With this in mind, as its main contribution, this thesis presents Auto-KGQA, a domain-independent autonomous framework based on LLMs for KGQA. The framework automatically selects fragments of the KG that are relevant to the question, which the LLM uses as context to translate the natural-language question into a SPARQL query over the KG. The framework is accessible through its HTTP API or through a Chat Messenger Web interface. In addition, the framework is integrated with the RDF browser LiRB, allowing iterative navigation of the resources returned by queries. Preliminary experiments with Auto-KGQA and ChatGPT indicate that the framework substantially reduced the number of tokens passed to the LLM without sacrificing performance. Finally, an evaluation of Auto-KGQA on a benchmark with enterprise queries in the insurance domain showed that the framework is competitive, achieving a 13.2% improvement in accuracy over the state of the art and a 51.12% reduction in the number of tokens passed to the LLM. Experiments revealed that few-shot learning strategies, combined with the subgraph selected by Auto-KGQA, yield robust and generalizable KGQA systems, outperforming their competitors in zero-shot scenarios and matching them in few-shot scenarios. |
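The abstract describes a two-step pipeline: select the KG fragments relevant to the question, then pass them to an LLM as context for translating the question into SPARQL. The sketch below illustrates that selection-plus-prompting idea only; it is a hypothetical toy, not the thesis's actual implementation — the lexical-overlap scoring, the example triples, and the prompt wording are all assumptions.

```python
# Toy sketch of the selection-plus-prompting idea from the abstract:
# 1) score KG triples by word overlap with the question,
# 2) keep the top-k scoring triples as the fragment,
# 3) build an NL->SPARQL prompt that an LLM would complete.

def select_fragment(triples, question, k=2):
    """Rank (subject, predicate, object) triples by word overlap with the question."""
    q_words = set(question.lower().split())

    def score(triple):
        words = set(" ".join(triple).replace("_", " ").lower().split())
        return len(words & q_words)

    ranked = sorted(triples, key=score, reverse=True)
    return [t for t in ranked[:k] if score(t) > 0]

def build_prompt(fragment, question):
    """Assemble the context an LLM would use to emit a SPARQL query."""
    ctx = "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in fragment)
    return (f"Given this KG fragment:\n{ctx}\n"
            f"Translate to SPARQL: {question}")

# Hypothetical insurance-domain triples, echoing the benchmark's domain.
kg = [
    ("Policy_42", "insured_by", "Acme_Insurance"),
    ("Acme_Insurance", "located_in", "Fortaleza"),
    ("Policy_42", "covers", "flood_damage"),
]
question = "Which policies are insured by Acme Insurance?"
frag = select_fragment(kg, question)
prompt = build_prompt(frag, question)
```

Passing only the selected fragment, rather than the whole KG, is what the abstract credits for the reduction in tokens sent to the LLM.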
| id |
UFC-7_4dbb82071b2c519d5f2986fd2571d733 |
|---|---|
| oai_identifier_str |
oai:repositorio.ufc.br:riufc/78251 |
| network_acronym_str |
UFC-7 |
| network_name_str |
Repositório Institucional da Universidade Federal do Ceará (UFC) |
| repository_id_str |
|
| spelling |
Ávila, Caio Viktor da Silva; Casanova, Marco Antonio; Vidal, Vânia Maria Ponte
2024-09-20T16:58:58Z (accessioned); 2024-09-20T16:58:58Z (available); 2024 (issued)
Citation: ÁVILA, Caio Viktor da Silva. Um framework independente de domínio para knowledge graph question answering baseado em large language models. 2024. 134 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2024.
http://repositorio.ufc.br/handle/riufc/78251
Titles: Um framework independente de domínio para knowledge graph question answering baseado em large language models / A domain-independent framework for knowledge graph question answering based on large language models
Types: info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/doctoralThesis
Subjects: Question answering; Knowledge graph; Large language model; RDF browser
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Rights: info:eu-repo/semantics/openAccess; language: por
Source: reponame:Repositório Institucional da Universidade Federal do Ceará (UFC); instname:Universidade Federal do Ceará (UFC); instacron:UFC
Lattes: http://lattes.cnpq.br/0449925605343817; http://lattes.cnpq.br/9431229866203038; http://lattes.cnpq.br/0400232298849115
Files: ORIGINAL 2024_tese_cvsavila.pdf (application/pdf, 2049459 bytes, MD5 14907fac1baad1d44c5bf2da445a3ca9); LICENSE license.txt (text/plain; charset=utf-8, 1748 bytes, MD5 8a4605be74aa9ea9d79846c1fba20a33)
Record: riufc/78251, 2024-09-20 13:58:59.255; oai:repositorio.ufc.br:riufc/78251
Repository: Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC); PUB http://www.repositorio.ufc.br/ri-oai/request; bu@ufc.br || repositorio@ufc.br |
| dc.title.pt_BR.fl_str_mv |
Um framework independente de domínio para knowledge graph question answering baseado em large language models |
| dc.title.en.pt_BR.fl_str_mv |
A domain-independent framework for knowledge graph question answering based on large language models |
| title |
Um framework independente de domínio para knowledge graph question answering baseado em large language models |
| spellingShingle |
Um framework independente de domínio para knowledge graph question answering baseado em large language models Ávila, Caio Viktor da Silva CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO Question answering Knowledge graph Large language model RDF browser Question answering Knowledge graph Large language model RDF browser |
| title_short |
Um framework independente de domínio para knowledge graph question answering baseado em large language models |
| title_full |
Um framework independente de domínio para knowledge graph question answering baseado em large language models |
| title_fullStr |
Um framework independente de domínio para knowledge graph question answering baseado em large language models |
| title_full_unstemmed |
Um framework independente de domínio para knowledge graph question answering baseado em large language models |
| title_sort |
Um framework independente de domínio para knowledge graph question answering baseado em large language models |
| author |
Ávila, Caio Viktor da Silva |
| author_facet |
Ávila, Caio Viktor da Silva |
| author_role |
author |
| dc.contributor.co-advisor.none.fl_str_mv |
Casanova, Marco Antonio |
| dc.contributor.author.fl_str_mv |
Ávila, Caio Viktor da Silva |
| dc.contributor.advisor1.fl_str_mv |
Vidal, Vânia Maria Ponte |
| contributor_str_mv |
Vidal, Vânia Maria Ponte |
| dc.subject.cnpq.fl_str_mv |
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO |
| topic |
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO Question answering Knowledge graph Large language model RDF browser Question answering Knowledge graph Large language model RDF browser |
| dc.subject.ptbr.pt_BR.fl_str_mv |
Question answering Knowledge graph Large language model RDF browser |
| dc.subject.en.pt_BR.fl_str_mv |
Question answering Knowledge graph Large language model RDF browser |
| description |
Knowledge graph question answering (KGQA) systems are computational systems that answer natural-language questions using a knowledge graph (KG) as the source of knowledge to be consulted. These systems stand out for their curated, in-depth answers. Over time, several architectures and approaches have been proposed for KGQA systems, with systems based on pre-trained end-to-end deep learning models becoming popular in recent years. Currently, large language models (LLMs) are the state of the art among pre-trained language models, which opens the opportunity to develop KGQA systems based on LLMs. With this in mind, as its main contribution, this thesis presents Auto-KGQA, a domain-independent autonomous framework based on LLMs for KGQA. The framework automatically selects fragments of the KG that are relevant to the question, which the LLM uses as context to translate the natural-language question into a SPARQL query over the KG. The framework is accessible through its HTTP API or through a Chat Messenger Web interface. In addition, the framework is integrated with the RDF browser LiRB, allowing iterative navigation of the resources returned by queries. Preliminary experiments with Auto-KGQA and ChatGPT indicate that the framework substantially reduced the number of tokens passed to the LLM without sacrificing performance. Finally, an evaluation of Auto-KGQA on a benchmark with enterprise queries in the insurance domain showed that the framework is competitive, achieving a 13.2% improvement in accuracy over the state of the art and a 51.12% reduction in the number of tokens passed to the LLM. Experiments revealed that few-shot learning strategies, combined with the subgraph selected by Auto-KGQA, yield robust and generalizable KGQA systems, outperforming their competitors in zero-shot scenarios and matching them in few-shot scenarios. |
| publishDate |
2024 |
| dc.date.accessioned.fl_str_mv |
2024-09-20T16:58:58Z |
| dc.date.available.fl_str_mv |
2024-09-20T16:58:58Z |
| dc.date.issued.fl_str_mv |
2024 |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
| format |
doctoralThesis |
| status_str |
publishedVersion |
| dc.identifier.citation.fl_str_mv |
ÁVILA, Caio Viktor da Silva. Um framework independente de domínio para knowledge graph question answering baseado em large language models. 2024. 134 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2024. |
| dc.identifier.uri.fl_str_mv |
http://repositorio.ufc.br/handle/riufc/78251 |
| identifier_str_mv |
ÁVILA, Caio Viktor da Silva. Um framework independente de domínio para knowledge graph question answering baseado em large language models. 2024. 134 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2024. |
| url |
http://repositorio.ufc.br/handle/riufc/78251 |
| dc.language.iso.fl_str_mv |
por |
| language |
por |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.source.none.fl_str_mv |
reponame:Repositório Institucional da Universidade Federal do Ceará (UFC) instname:Universidade Federal do Ceará (UFC) instacron:UFC |
| instname_str |
Universidade Federal do Ceará (UFC) |
| instacron_str |
UFC |
| institution |
UFC |
| reponame_str |
Repositório Institucional da Universidade Federal do Ceará (UFC) |
| collection |
Repositório Institucional da Universidade Federal do Ceará (UFC) |
| bitstream.url.fl_str_mv |
http://repositorio.ufc.br/bitstream/riufc/78251/3/2024_tese_cvsavila.pdf http://repositorio.ufc.br/bitstream/riufc/78251/4/license.txt |
| bitstream.checksum.fl_str_mv |
14907fac1baad1d44c5bf2da445a3ca9 8a4605be74aa9ea9d79846c1fba20a33 |
| bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
| repository.name.fl_str_mv |
Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC) |
| repository.mail.fl_str_mv |
bu@ufc.br || repositorio@ufc.br |
| _version_ |
1847793344877428736 |