Processos de construção automática de tesauro

Granada, Roger Leitzke

Bibliographic details
Year of defense:	2011
Main author:	Granada, Roger Leitzke
Advisor:	Lima, Vera Lúcia Strube de
Defense committee:	Not provided by the institution
Document type:	Master's dissertation
Access type:	Open access
Language:	por
Defending institution:	Pontifícia Universidade Católica do Rio Grande do Sul
Graduate program:	Programa de Pós-Graduação em Ciência da Computação
Department:	Faculdade de Informática
Country:	BR
Keywords in Portuguese:	INFORMÁTICA; TESAUROS - ELABORAÇÃO; INDEXAÇÃO DE ASSUNTOS
CNPq knowledge area:	CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Access link:	http://tede2.pucrs.br/tede2/handle/tede/5158
Abstract:	Advances in technology have rapidly increased the amount of information available in digital format. This growth makes efficient Information Retrieval (IR) systems, which return the right information when users request it, increasingly important. Thesauri can be associated with IR systems, allowing the system to query not only by the key term but also by related terms, and thus to retrieve related documents that were previously missed. Manual construction, the long and costly process that gave rise to the first thesauri, can now be replaced by automatic construction using the different methods and processes available today. With this motivation, this dissertation studies three processes for automatic thesaurus construction. The first method uses statistical techniques to identify the best related terms. The second uses syntactic knowledge, which requires extracting, besides the grammatical category of each term, the relations that a verb has with its subject or object. The third uses both syntactic and semantic knowledge of the terms to identify non-apparent relations; for this, it uses an adaptation of the Latent Semantic Analysis technique. The three methods were implemented and used to generate thesauri from documents in the data privacy domain. The results were applied to an IR system, allowing evaluation by domain experts. In conclusion, we observed that, in certain cases, it is better to apply techniques that do not use semantic knowledge of the terms: methods that use only syntactic knowledge obtained better results.

Item metadata

id	P_RS_e48b25f1e1ead45f217eaf9e7cd5ad24
oai_identifier_str	oai:tede2.pucrs.br:tede/5158
network_acronym_str	P_RS
network_name_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
repository_id_str
spelling	Lima, Vera Lúcia Strube de | CPF:26551519091 | http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4781127A8 | CPF:96188430097 | http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4282547J6 | Granada, Roger Leitzke | 2015-04-14T14:49:42Z | 2012-03-07 | 2011-03-29 | GRANADA, Roger Leitzke. Processos de construção automática de tesauro. 2011. 114 f. Dissertação (Mestrado em Ciência da Computação) - Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, 2011. | http://tede2.pucrs.br/tede2/handle/tede/5158 | Advances in technology have rapidly increased the amount of information available in digital format. This growth makes efficient Information Retrieval (IR) systems, which return the right information when users request it, increasingly important. Thesauri can be associated with IR systems, allowing the system to query not only by the key term but also by related terms, and thus to retrieve related documents that were previously missed. Manual construction, the long and costly process that gave rise to the first thesauri, can now be replaced by automatic construction using the different methods and processes available today. With this motivation, this dissertation studies three processes for automatic thesaurus construction. The first method uses statistical techniques to identify the best related terms. The second uses syntactic knowledge, which requires extracting, besides the grammatical category of each term, the relations that a verb has with its subject or object. The third uses both syntactic and semantic knowledge of the terms to identify non-apparent relations; for this, it uses an adaptation of the Latent Semantic Analysis technique. The three methods were implemented and used to generate thesauri from documents in the data privacy domain. The results were applied to an IR system, allowing evaluation by domain experts. In conclusion, we observed that, in certain cases, it is better to apply techniques that do not use semantic knowledge of the terms: methods that use only syntactic knowledge obtained better results. | Made available in DSpace on 2015-04-14T14:49:42Z (GMT). No. of bitstreams: 1 437178.pdf: 938995 bytes, checksum: 7f4e4a024eb9af218b4ff88670a9ca88 (MD5) Previous issue date: 2011-03-29 | application/pdf | http://tede2.pucrs.br:80/tede2/retrieve/15927/437178.pdf.jpg | por | Pontifícia Universidade Católica do Rio Grande do Sul | Programa de Pós-Graduação em Ciência da Computação | PUCRS | BR | Faculdade de Informática | INFORMÁTICA | TESAUROS - ELABORAÇÃO | INDEXAÇÃO DE ASSUNTOS | CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO | Processos de construção automática de tesauro | info:eu-repo/semantics/publishedVersion | info:eu-repo/semantics/masterThesis | 1974996533081274470 | 500 | 600 | 1946639708616176246 | info:eu-repo/semantics/openAccess | reponame:Biblioteca Digital de Teses e Dissertações da PUC_RS | instname:Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS) | instacron:PUC_RS | THUMBNAIL | 437178.pdf.jpg | 437178.pdf.jpg | image/jpeg | 3552 | http://tede2.pucrs.br/tede2/bitstream/tede/5158/3/437178.pdf.jpg | 65e668b8a48cf9a7ded4c8d6c434a32d | MD5 | 3 | TEXT | 437178.pdf.txt | 437178.pdf.txt | text/plain | 203379 | http://tede2.pucrs.br/tede2/bitstream/tede/5158/2/437178.pdf.txt | 7d4d041293750b072733f5641adfe9a3 | MD5 | 2 | ORIGINAL | 437178.pdf | application/pdf | 938995 | http://tede2.pucrs.br/tede2/bitstream/tede/5158/1/437178.pdf | 7f4e4a024eb9af218b4ff88670a9ca88 | MD5 | 1 | tede/5158 | 2015-04-17 11:57:32.43 | oai:tede2.pucrs.br:tede/5158 | Biblioteca Digital de Teses e Dissertações | http://tede2.pucrs.br/tede2/ | PRI | https://tede2.pucrs.br/oai/request | biblioteca.central@pucrs.br\|\| | opendoar: | 2015-04-17T14:57:32 | Biblioteca Digital de Teses e Dissertações da PUC_RS - Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS) | false
dc.title.por.fl_str_mv	Processos de construção automática de tesauro
title	Processos de construção automática de tesauro
spellingShingle	Processos de construção automática de tesauro Granada, Roger Leitzke INFORMÁTICA TESAUROS - ELABORAÇÃO INDEXAÇÃO DE ASSUNTOS CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
title_short	Processos de construção automática de tesauro
title_full	Processos de construção automática de tesauro
title_fullStr	Processos de construção automática de tesauro
title_full_unstemmed	Processos de construção automática de tesauro
title_sort	Processos de construção automática de tesauro
author	Granada, Roger Leitzke
author_facet	Granada, Roger Leitzke
author_role	author
dc.contributor.advisor1.fl_str_mv	Lima, Vera Lúcia Strube de
dc.contributor.advisor1ID.fl_str_mv	CPF:26551519091
dc.contributor.advisor1Lattes.fl_str_mv	http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4781127A8
dc.contributor.authorID.fl_str_mv	CPF:96188430097
dc.contributor.authorLattes.fl_str_mv	http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4282547J6
dc.contributor.author.fl_str_mv	Granada, Roger Leitzke
contributor_str_mv	Lima, Vera Lúcia Strube de
dc.subject.por.fl_str_mv	INFORMÁTICA TESAUROS - ELABORAÇÃO INDEXAÇÃO DE ASSUNTOS
topic	INFORMÁTICA TESAUROS - ELABORAÇÃO INDEXAÇÃO DE ASSUNTOS CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.cnpq.fl_str_mv	CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
description	Advances in technology have rapidly increased the amount of information available in digital format. This growth makes efficient Information Retrieval (IR) systems, which return the right information when users request it, increasingly important. Thesauri can be associated with IR systems, allowing the system to query not only by the key term but also by related terms, and thus to retrieve related documents that were previously missed. Manual construction, the long and costly process that gave rise to the first thesauri, can now be replaced by automatic construction using the different methods and processes available today. With this motivation, this dissertation studies three processes for automatic thesaurus construction. The first method uses statistical techniques to identify the best related terms. The second uses syntactic knowledge, which requires extracting, besides the grammatical category of each term, the relations that a verb has with its subject or object. The third uses both syntactic and semantic knowledge of the terms to identify non-apparent relations; for this, it uses an adaptation of the Latent Semantic Analysis technique. The three methods were implemented and used to generate thesauri from documents in the data privacy domain. The results were applied to an IR system, allowing evaluation by domain experts. In conclusion, we observed that, in certain cases, it is better to apply techniques that do not use semantic knowledge of the terms: methods that use only syntactic knowledge obtained better results.
publishDate	2011
dc.date.issued.fl_str_mv	2011-03-29
dc.date.available.fl_str_mv	2012-03-07
dc.date.accessioned.fl_str_mv	2015-04-14T14:49:42Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	GRANADA, Roger Leitzke. Processos de construção automática de tesauro. 2011. 114 f. Dissertação (Mestrado em Ciência da Computação) - Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, 2011.
dc.identifier.uri.fl_str_mv	http://tede2.pucrs.br/tede2/handle/tede/5158
identifier_str_mv	GRANADA, Roger Leitzke. Processos de construção automática de tesauro. 2011. 114 f. Dissertação (Mestrado em Ciência da Computação) - Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, 2011.
url	http://tede2.pucrs.br/tede2/handle/tede/5158
dc.language.iso.fl_str_mv	por
language	por
dc.relation.program.fl_str_mv	1974996533081274470
dc.relation.confidence.fl_str_mv	500 600
dc.relation.department.fl_str_mv	1946639708616176246
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Pontifícia Universidade Católica do Rio Grande do Sul
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv	PUCRS
dc.publisher.country.fl_str_mv	BR
dc.publisher.department.fl_str_mv	Faculdade de Informática
publisher.none.fl_str_mv	Pontifícia Universidade Católica do Rio Grande do Sul
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da PUC_RS instname:Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS) instacron:PUC_RS
instname_str	Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
instacron_str	PUC_RS
institution	PUC_RS
reponame_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
collection	Biblioteca Digital de Teses e Dissertações da PUC_RS
bitstream.url.fl_str_mv	http://tede2.pucrs.br/tede2/bitstream/tede/5158/3/437178.pdf.jpg http://tede2.pucrs.br/tede2/bitstream/tede/5158/2/437178.pdf.txt http://tede2.pucrs.br/tede2/bitstream/tede/5158/1/437178.pdf
bitstream.checksum.fl_str_mv	65e668b8a48cf9a7ded4c8d6c434a32d 7d4d041293750b072733f5641adfe9a3 7f4e4a024eb9af218b4ff88670a9ca88
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da PUC_RS - Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
repository.mail.fl_str_mv	biblioteca.central@pucrs.br\|\|
_version_	1796793207816716288
