Uso de linguagem natural para consulta de informações dos microdados do Censo Escolar brasileiro

Antoni, Marco

Bibliographic details
Year of defense:	2021
Main author:	Antoni, Marco
Advisor:	Not provided by the institution
Examination committee:	Not provided by the institution
Document type:	Master's thesis
Access type:	Open access
dARK ID:	ark:/26339/0013000011t2g
Language:	Portuguese (por)
Defending institution:	Universidade Federal de Santa Maria; Brasil; Ciência da Computação; UFSM; Programa de Pós-Graduação em Ciência da Computação; Centro de Tecnologia
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords in Portuguese:	Sistemas de perguntas e respostas; Processamento de linguagem natural; Dados abertos; Censo escolar; Question answering systems; Natural language processing; Open data; Brazilian educational census; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Access link:	http://repositorio.ufsm.br/handle/1/23051
Abstract:	The accelerated growth of produced and stored data has been observed for many years, motivating a growing search for new forms of querying that offer other ways to find useful information in several knowledge domains. In this sense, Question Answering (QA) is a specialized area of Information Retrieval whose objective is to obtain precise and direct answers that satisfy the user’s need for information, given a question expressed in Natural Language (NL). For this task, a set of Natural Language Processing (NLP) techniques is applied to understand human language. Although NLP is mature for some languages (such as English), this research area presents numerous challenges because NL is difficult to understand when words have similar meanings, slang or regional terms are used, spelling is incorrect, or the text is ambiguous. Moreover, there is still a research gap for the Portuguese language, possibly motivated by the complexity that Portuguese presents in comparison to other languages. Thus, this research presents an exploratory study of NLP applied to QA systems. To that end, a QA system was designed and developed for querying information from the open data of the Brazilian Educational Census, the largest and most important statistical survey conducted by the Anísio Teixeira National Institute of Educational Studies and Research. The presented system applies a hybrid approach to understand the meaning of the question: it combines the linguistic and rule-based approaches, with the rules constructed manually from the data dictionary and current educational legislation. The results of the evaluation carried out with education professionals suggest that the QA system is easy to use and that the tool is important for querying information in this data set. However, many difficulties remain, related both to NLP itself and to particularities of the educational data set used.

Item metadata

id	UFSM_68ff2c64a0c37c921038b4a3b169ba87
oai_identifier_str	oai:repositorio.ufsm.br:1/23051
network_acronym_str	UFSM
network_name_str	Manancial - Repositório Digital da UFSM
repository_id_str
dc.title.none.fl_str_mv	Uso de linguagem natural para consulta de informações dos microdados do Censo Escolar brasileiro; Use of natural language for querying microdata from brazilian Educational Census
title	Uso de linguagem natural para consulta de informações dos microdados do Censo Escolar brasileiro
title_short	Uso de linguagem natural para consulta de informações dos microdados do Censo Escolar brasileiro
title_full	Uso de linguagem natural para consulta de informações dos microdados do Censo Escolar brasileiro
title_fullStr	Uso de linguagem natural para consulta de informações dos microdados do Censo Escolar brasileiro
title_full_unstemmed	Uso de linguagem natural para consulta de informações dos microdados do Censo Escolar brasileiro
title_sort	Uso de linguagem natural para consulta de informações dos microdados do Censo Escolar brasileiro
author	Antoni, Marco
author_facet	Antoni, Marco
author_role	author
dc.contributor.none.fl_str_mv	Charao, Andrea Schwertner; http://lattes.cnpq.br/8251676116103188; Lima, Joao Carlos Damasceno; Del Fabro, Marcos Didonet
dc.contributor.author.fl_str_mv	Antoni, Marco
dc.subject.por.fl_str_mv	Sistemas de perguntas e respostas; Processamento de linguagem natural; Dados abertos; Censo escolar; Question answering systems; Natural language processing; Open data; Brazilian educational census; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
topic	Sistemas de perguntas e respostas; Processamento de linguagem natural; Dados abertos; Censo escolar; Question answering systems; Natural language processing; Open data; Brazilian educational census; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
description	The accelerated growth of produced and stored data has been observed for many years, motivating a growing search for new forms of querying that offer other ways to find useful information in several knowledge domains. In this sense, Question Answering (QA) is a specialized area of Information Retrieval whose objective is to obtain precise and direct answers that satisfy the user’s need for information, given a question expressed in Natural Language (NL). For this task, a set of Natural Language Processing (NLP) techniques is applied to understand human language. Although NLP is mature for some languages (such as English), this research area presents numerous challenges because NL is difficult to understand when words have similar meanings, slang or regional terms are used, spelling is incorrect, or the text is ambiguous. Moreover, there is still a research gap for the Portuguese language, possibly motivated by the complexity that Portuguese presents in comparison to other languages. Thus, this research presents an exploratory study of NLP applied to QA systems. To that end, a QA system was designed and developed for querying information from the open data of the Brazilian Educational Census, the largest and most important statistical survey conducted by the Anísio Teixeira National Institute of Educational Studies and Research. The presented system applies a hybrid approach to understand the meaning of the question: it combines the linguistic and rule-based approaches, with the rules constructed manually from the data dictionary and current educational legislation. The results of the evaluation carried out with education professionals suggest that the QA system is easy to use and that the tool is important for querying information in this data set. However, many difficulties remain, related both to NLP itself and to particularities of the educational data set used.
publishDate	2021
dc.date.none.fl_str_mv	2021-11-29T17:20:35Z 2021-11-29T17:20:35Z 2021-03-31
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://repositorio.ufsm.br/handle/1/23051
dc.identifier.dark.fl_str_mv	ark:/26339/0013000011t2g
url	http://repositorio.ufsm.br/handle/1/23051
identifier_str_mv	ark:/26339/0013000011t2g
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	Attribution-NonCommercial-NoDerivatives 4.0 International info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Attribution-NonCommercial-NoDerivatives 4.0 International
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Santa Maria Brasil Ciência da Computação UFSM Programa de Pós-Graduação em Ciência da Computação Centro de Tecnologia
publisher.none.fl_str_mv	Universidade Federal de Santa Maria Brasil Ciência da Computação UFSM Programa de Pós-Graduação em Ciência da Computação Centro de Tecnologia
dc.source.none.fl_str_mv	reponame:Manancial - Repositório Digital da UFSM instname:Universidade Federal de Santa Maria (UFSM) instacron:UFSM
instname_str	Universidade Federal de Santa Maria (UFSM)
instacron_str	UFSM
institution	UFSM
reponame_str	Manancial - Repositório Digital da UFSM
collection	Manancial - Repositório Digital da UFSM
repository.name.fl_str_mv	Manancial - Repositório Digital da UFSM - Universidade Federal de Santa Maria (UFSM)
repository.mail.fl_str_mv	atendimento.sib@ufsm.br||tedebc@gmail.com||manancial@ufsm.br
_version_	1847153479388233728
