Named Entity Recognition based queries over linked data sources (Consultas sobre fontes de dados ligados baseadas em reconhecimento de entidades nomeadas)

Bibliographic details
Defense year: 2019
Main author: Gaspar, Lucas Peres
Advisor: Macêdo, José Antonio Fernandes de
Examination committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: por
Defending institution: Not provided by the institution
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese: Geração de SPARQL; Esquema RDF; Named Entity Recognition; Query by Example
Access link: http://www.repositorio.ufc.br/handle/riufc/53750
Abstract: The Web has evolved from a network of linked documents into one where both documents and data are linked, resulting in what is commonly known as the Web of Data, which includes a large variety of data from multiple domains, usually published in RDF. Intuitive ways of accessing RDF data are becoming increasingly important, since the standard approach is to run SPARQL queries, which can be extremely difficult for non-expert users. In this work, we address the problem of question answering over RDF. Given a natural language question or a keyword search string, our goal is to translate it into a formal SPARQL query that captures the information needed. We propose two schema-based approaches for querying RDF data without any previous knowledge of the ontology entities and schema: Von-QBE and Von-QBNER. This differs from the state of the art, whose approaches are instance-based. Such approaches can be infeasible in big data scenarios, where the knowledge base is huge and keeping it in memory demands a large amount of computational resources. Moreover, most of these solutions need the knowledge base to be triplified, which can be a hard task for legacy databases. For this reason, Von-QBE uses only the RDF schema to answer the user's question. However, the user query may contain information about data instances that does not syntactically match any concept or property in the ontology schema. Consider, for instance, the query "Movies with Angelina Jolie", and suppose the ontology schema contains only the concepts Movie and Actress and a property starring that relates them. If we use only the ontology schema, just the concept Movie matches the user query. Von-QBNER addresses this limitation by using Named Entity Recognition (NER) models to identify the instances involved in the query and their corresponding concept or property in the ontology schema. The results are promising for some of the real datasets evaluated, considering that only the ontology schema is used to generate the SPARQL queries.
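The "Movies with Angelina Jolie" example in the abstract can be illustrated with a minimal sketch. It is not the Von-QBE or Von-QBNER implementation, which this record does not describe in detail. The toy schema, the `TOY_NER` lookup table (a stand-in for a trained NER model), the `ex:` namespace, the use of rdfs:label to bind the recognized name, and all function names are assumptions made for illustration only.

```python
# Toy ontology schema taken from the abstract's example: two concepts and
# one property relating them. No instance data is loaded.
SCHEMA_CONCEPTS = {"movie": "Movie", "actress": "Actress"}
SCHEMA_PROPERTIES = [("Movie", "starring", "Actress")]

# Hypothetical stand-in for an NER model: maps a surface form to the schema
# concept it is an instance of. A real system would use a trained model.
TOY_NER = {"angelina jolie": "Actress"}


def match_concepts(question):
    """Schema-only step: concepts whose label (or simple plural) occurs in the question."""
    words = question.lower().split()
    found = []
    for label, concept in SCHEMA_CONCEPTS.items():
        if any(w in (label, label + "s", label + "es") for w in words):
            found.append(concept)
    return found


def recognize_entities(question):
    """NER step: (surface form, concept) pairs for instances mentioned in the question."""
    text = question.lower()
    return [(s, c) for s, c in TOY_NER.items() if s in text]


def build_sparql(question, namespace="http://example.org/"):
    """Combine both steps into a SPARQL query string, or None if no concept matches."""
    concepts = match_concepts(question)
    if not concepts:
        return None
    target = concepts[0]
    lines = [
        f"PREFIX ex: <{namespace}>",
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
        "SELECT DISTINCT ?x WHERE {",
        f"  ?x a ex:{target} .",
    ]
    for i, (surface, concept) in enumerate(recognize_entities(question)):
        # Use the schema to find a property linking the target concept
        # to the concept of the recognized instance.
        for subj, prop, obj in SCHEMA_PROPERTIES:
            if subj == target and obj == concept:
                lines.append(f"  ?x ex:{prop} ?e{i} .")
                lines.append(f"  ?e{i} a ex:{concept} ;")
                lines.append(f"      rdfs:label ?l{i} .")
                lines.append(f'  FILTER(LCASE(STR(?l{i})) = "{surface}")')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_sparql("Movies with Angelina Jolie"))
```

With the schema alone, only Movie is matched. The lookup step supplies the missing link by tagging "Angelina Jolie" as an Actress, which lets the sketch select the starring property from the schema.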
id UFC-7_8f19385c564beb92bb50dc77beccab19
oai_identifier_str oai:repositorio.ufc.br:riufc/53750
network_acronym_str UFC-7
network_name_str Repositório Institucional da Universidade Federal do Ceará (UFC)
repository_id_str
dc.title.pt_BR.fl_str_mv Consultas sobre fontes de dados ligados baseadas em reconhecimento de entidades nomeadas
dc.title.en.pt_BR.fl_str_mv Named Entity Recognition based queries over linked data sources
title Consultas sobre fontes de dados ligados baseadas em reconhecimento de entidades nomeadas
spellingShingle Consultas sobre fontes de dados ligados baseadas em reconhecimento de entidades nomeadas
Gaspar, Lucas Peres
Geração de SPARQL
Esquema RDF
Named Entity Recognition
Query by Example
title_short Consultas sobre fontes de dados ligados baseadas em reconhecimento de entidades nomeadas
title_full Consultas sobre fontes de dados ligados baseadas em reconhecimento de entidades nomeadas
title_fullStr Consultas sobre fontes de dados ligados baseadas em reconhecimento de entidades nomeadas
title_full_unstemmed Consultas sobre fontes de dados ligados baseadas em reconhecimento de entidades nomeadas
title_sort Consultas sobre fontes de dados ligados baseadas em reconhecimento de entidades nomeadas
author Gaspar, Lucas Peres
author_facet Gaspar, Lucas Peres
author_role author
dc.contributor.author.fl_str_mv Gaspar, Lucas Peres
dc.contributor.advisor1.fl_str_mv Macêdo, José Antonio Fernandes de
contributor_str_mv Macêdo, José Antonio Fernandes de
dc.subject.por.fl_str_mv Geração de SPARQL
Esquema RDF
Named Entity Recognition
Query by Example
topic Geração de SPARQL
Esquema RDF
Named Entity Recognition
Query by Example
description The Web has evolved from a network of linked documents into one where both documents and data are linked, resulting in what is commonly known as the Web of Data, which includes a large variety of data from multiple domains, usually published in RDF. Intuitive ways of accessing RDF data are becoming increasingly important, since the standard approach is to run SPARQL queries, which can be extremely difficult for non-expert users. In this work, we address the problem of question answering over RDF. Given a natural language question or a keyword search string, our goal is to translate it into a formal SPARQL query that captures the information needed. We propose two schema-based approaches for querying RDF data without any previous knowledge of the ontology entities and schema: Von-QBE and Von-QBNER. This differs from the state of the art, whose approaches are instance-based. Such approaches can be infeasible in big data scenarios, where the knowledge base is huge and keeping it in memory demands a large amount of computational resources. Moreover, most of these solutions need the knowledge base to be triplified, which can be a hard task for legacy databases. For this reason, Von-QBE uses only the RDF schema to answer the user's question. However, the user query may contain information about data instances that does not syntactically match any concept or property in the ontology schema. Consider, for instance, the query "Movies with Angelina Jolie", and suppose the ontology schema contains only the concepts Movie and Actress and a property starring that relates them. If we use only the ontology schema, just the concept Movie matches the user query. Von-QBNER addresses this limitation by using Named Entity Recognition (NER) models to identify the instances involved in the query and their corresponding concept or property in the ontology schema. The results are promising for some of the real datasets evaluated, considering that only the ontology schema is used to generate the SPARQL queries.
publishDate 2019
dc.date.issued.fl_str_mv 2019
dc.date.accessioned.fl_str_mv 2020-09-01T12:18:44Z
dc.date.available.fl_str_mv 2020-09-01T12:18:44Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv GASPAR, Lucas Peres. Consultas sobre fontes de dados ligados baseadas em reconhecimento de entidades nomeadas. 2019. 66 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2019.
dc.identifier.uri.fl_str_mv http://www.repositorio.ufc.br/handle/riufc/53750
identifier_str_mv GASPAR, Lucas Peres. Consultas sobre fontes de dados ligados baseadas em reconhecimento de entidades nomeadas. 2019. 66 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2019.
url http://www.repositorio.ufc.br/handle/riufc/53750
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional da Universidade Federal do Ceará (UFC)
instname:Universidade Federal do Ceará (UFC)
instacron:UFC
instname_str Universidade Federal do Ceará (UFC)
instacron_str UFC
institution UFC
reponame_str Repositório Institucional da Universidade Federal do Ceará (UFC)
collection Repositório Institucional da Universidade Federal do Ceará (UFC)
bitstream.url.fl_str_mv http://repositorio.ufc.br/bitstream/riufc/53750/3/2019_dis_lpgaspar.pdf
http://repositorio.ufc.br/bitstream/riufc/53750/4/license.txt
bitstream.checksum.fl_str_mv b65233c138cb088d31dcb800e6209788
8a4605be74aa9ea9d79846c1fba20a33
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC)
repository.mail.fl_str_mv bu@ufc.br || repositorio@ufc.br
_version_ 1847793043765198848