Resolução de correferência nominal usando semântica em língua portuguesa

Bibliographic details
Year of defense: 2018
Main author: Fonseca, Evandro Brasil
Advisor: Vieira, Renata
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: por
Defending institution: Pontifícia Universidade Católica do Rio Grande do Sul
Graduate program: Programa de Pós-Graduação em Ciência da Computação
Department: Escola Politécnica
Country: Brazil
Keywords in Portuguese: Resolução de Correferência; Extração de Informação
Keywords in English: Coreference Resolution; Information Extraction
CNPq knowledge area: CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO
Access link: http://tede2.pucrs.br/tede2/handle/tede/8169
Abstract: Coreference resolution is a challenging task in Natural Language Processing, given the linguistic knowledge it requires and the sophistication of the language processing techniques involved. Although the task is demanding, a motivating factor for studying this phenomenon is its usefulness: several Natural Language Processing tasks may benefit from its results, such as named entity recognition, relation extraction between named entities, summarization, and sentiment analysis, among others. Coreference resolution consists of identifying terms and expressions that refer to the same entity. For example, in the passage "France is resisting. The country is one of the first in the ranking..." we can say that [the country] is coreferent with [France]. By grouping these referential terms, we form groups of coreferent mentions, more commonly known as coreference chains. This thesis proposes a process for coreference resolution between noun phrases in Portuguese, focusing on the use of semantic knowledge. The proposed approach is based on syntactic-semantic linguistic rules; that is, it combines different levels of linguistic processing, using semantic relations as support, to infer referential relations between mentions. Models based on linguistic rules have been applied effectively to other languages, such as English, Spanish, and Galician. These models prove more effective than machine learning approaches for less-resourced languages, since the lack of sample-rich corpora can impair training. The proposed model is the first for Portuguese coreference resolution that uses semantic knowledge, which we consider the main contribution of this thesis.
id P_RS_343e7a6d1062ba023a79e689e27a50b4
oai_identifier_str oai:tede2.pucrs.br:tede/8169
network_acronym_str P_RS
network_name_str Biblioteca Digital de Teses e Dissertações da PUC_RS
repository_id_str
spelling Submitted by PPG Ciência da Computação (ppgcc@pucrs.br) on 2018-06-19T11:37:24Z. No. of bitstreams: 1. EVANDRO BRASIL FONSECA_TES.pdf: 1972824 bytes, checksum: 9fca0c499753cd9d2822c59040e826bf (MD5)
Approved for entry into archive by Sheila Dias (sheila.dias@pucrs.br) on 2018-06-26T14:40:39Z (GMT). No. of bitstreams: 1. EVANDRO BRASIL FONSECA_TES.pdf: 1972824 bytes, checksum: 9fca0c499753cd9d2822c59040e826bf (MD5)
Made available in DSpace on 2018-06-26T14:48:46Z (GMT). No. of bitstreams: 1. EVANDRO BRASIL FONSECA_TES.pdf: 1972824 bytes, checksum: 9fca0c499753cd9d2822c59040e826bf (MD5). Previous issue date: 2018-03-19
http://tede2.pucrs.br:80/tede2/retrieve/172616/EVANDRO%20BRASIL%20FONSECA_TES.pdf.jpg
The work has no publication restrictions.
THUMBNAIL: EVANDRO BRASIL FONSECA_TES.pdf.jpg, image/jpeg, 4899 bytes, MD5 d7fa51000ab126c04f3d0dea38dd68f4
TEXT: EVANDRO BRASIL FONSECA_TES.pdf.txt, text/plain, 208449 bytes, MD5 0da35164ce29c1637605f29c70d29c6b
ORIGINAL: EVANDRO BRASIL FONSECA_TES.pdf, application/pdf, 1972824 bytes, MD5 9fca0c499753cd9d2822c59040e826bf
LICENSE: license.txt, text/plain; charset=utf-8, 610 bytes, MD5 5a9d6006225b368ef605ba16b4f6d1be
tede/8169, 2018-06-26 12:00:58.995
Biblioteca Digital de Teses e Dissertações, http://tede2.pucrs.br/tede2/, PRI, https://tede2.pucrs.br/oai/request, opendoar:2018-06-26T15:00:58
dc.title.por.fl_str_mv Resolução de correferência nominal usando semântica em língua portuguesa
title Resolução de correferência nominal usando semântica em língua portuguesa
spellingShingle Resolução de correferência nominal usando semântica em língua portuguesa
Fonseca, Evandro Brasil
Resolução de Correferência
Extração de Informação
Coreference Resolution
Information Extraction
CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO
title_short Resolução de correferência nominal usando semântica em língua portuguesa
title_full Resolução de correferência nominal usando semântica em língua portuguesa
title_fullStr Resolução de correferência nominal usando semântica em língua portuguesa
title_full_unstemmed Resolução de correferência nominal usando semântica em língua portuguesa
title_sort Resolução de correferência nominal usando semântica em língua portuguesa
author Fonseca, Evandro Brasil
author_facet Fonseca, Evandro Brasil
author_role author
dc.contributor.advisor1.fl_str_mv Vieira, Renata
dc.contributor.advisor-co1.fl_str_mv Vanin, Aline Aver
dc.contributor.advisor-co1Lattes.fl_str_mv http://lattes.cnpq.br/7639784707152839
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/3229974637891253
dc.contributor.author.fl_str_mv Fonseca, Evandro Brasil
contributor_str_mv Vieira, Renata
Vanin, Aline Aver
dc.subject.por.fl_str_mv Resolução de Correferência
Extração de Informação
topic Resolução de Correferência
Extração de Informação
Coreference Resolution
Information Extraction
CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO
dc.subject.eng.fl_str_mv Coreference Resolution
Information Extraction
dc.subject.cnpq.fl_str_mv CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO
description Coreference resolution is a challenging task in Natural Language Processing, given the linguistic knowledge it requires and the sophistication of the language processing techniques involved. Although the task is demanding, a motivating factor for studying this phenomenon is its usefulness: several Natural Language Processing tasks may benefit from its results, such as named entity recognition, relation extraction between named entities, summarization, and sentiment analysis, among others. Coreference resolution consists of identifying terms and expressions that refer to the same entity. For example, in the passage "France is resisting. The country is one of the first in the ranking..." we can say that [the country] is coreferent with [France]. By grouping these referential terms, we form groups of coreferent mentions, more commonly known as coreference chains. This thesis proposes a process for coreference resolution between noun phrases in Portuguese, focusing on the use of semantic knowledge. The proposed approach is based on syntactic-semantic linguistic rules; that is, it combines different levels of linguistic processing, using semantic relations as support, to infer referential relations between mentions. Models based on linguistic rules have been applied effectively to other languages, such as English, Spanish, and Galician. These models prove more effective than machine learning approaches for less-resourced languages, since the lack of sample-rich corpora can impair training. The proposed model is the first for Portuguese coreference resolution that uses semantic knowledge, which we consider the main contribution of this thesis.
publishDate 2018
dc.date.accessioned.fl_str_mv 2018-06-26T14:48:46Z
dc.date.issued.fl_str_mv 2018-03-19
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://tede2.pucrs.br/tede2/handle/tede/8169
url http://tede2.pucrs.br/tede2/handle/tede/8169
dc.language.iso.fl_str_mv por
language por
dc.relation.program.fl_str_mv 1974996533081274470
dc.relation.confidence.fl_str_mv 500
500
dc.relation.cnpq.fl_str_mv -862078257083325301
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Pontifícia Universidade Católica do Rio Grande do Sul
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv PUCRS
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv Escola Politécnica
publisher.none.fl_str_mv Pontifícia Universidade Católica do Rio Grande do Sul
dc.source.none.fl_str_mv reponame:Biblioteca Digital de Teses e Dissertações da PUC_RS
instname:Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
instacron:PUC_RS
instname_str Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
instacron_str PUC_RS
institution PUC_RS
reponame_str Biblioteca Digital de Teses e Dissertações da PUC_RS
collection Biblioteca Digital de Teses e Dissertações da PUC_RS
bitstream.url.fl_str_mv http://tede2.pucrs.br/tede2/bitstream/tede/8169/4/EVANDRO+BRASIL+FONSECA_TES.pdf.jpg
http://tede2.pucrs.br/tede2/bitstream/tede/8169/3/EVANDRO+BRASIL+FONSECA_TES.pdf.txt
http://tede2.pucrs.br/tede2/bitstream/tede/8169/2/EVANDRO+BRASIL+FONSECA_TES.pdf
http://tede2.pucrs.br/tede2/bitstream/tede/8169/1/license.txt
bitstream.checksum.fl_str_mv d7fa51000ab126c04f3d0dea38dd68f4
0da35164ce29c1637605f29c70d29c6b
9fca0c499753cd9d2822c59040e826bf
5a9d6006225b368ef605ba16b4f6d1be
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da PUC_RS - Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
repository.mail.fl_str_mv biblioteca.central@pucrs.br
_version_ 1796793234544918528