Uma proposta de metodologia para escolha automática de descritores utilizando sintagmas nominais

Renato Rocha Souza

Bibliographic details
Year of defense:	2005
Main author:	Renato Rocha Souza
Advisor:	Not provided by the institution
Examination committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	por
Defending institution:	Universidade Federal de Minas Gerais
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords in Portuguese:	Ciência da informação; Sistemas de recuperação da informação; Indexação automática
Access link:	https://hdl.handle.net/1843/RRSA-6GGGUF
Abstract:	Because manual indexing has become unfeasible in some document-processing contexts, researchers seek alternatives that represent the subjects of documents automatically. The most common approaches try to determine a document's subjects by analyzing word frequencies. In search of a better indexing process, one that analyzes words and expressions within their linguistic contexts, three assumptions are made: (1) noun phrases are better descriptors than keywords; (2) extracting noun phrases from digitized text documents is possible and viable with the software tools available; and (3) it is possible to establish an automated, functional process for choosing good descriptors for documents using noun phrases. The aim of this research was to develop a methodology for indexing digitized documents by extracting their noun phrases and analyzing characteristics such as: (1) the frequency of each noun phrase in the text of the document; (2) its frequency in the whole set of documents; (3) the structure of the noun phrase; (4) the level of the noun phrase; and (5) the occurrence of the noun phrase in a thesaurus of the subject field. To reach this goal, two corpora were analyzed: (a) a corpus of 15 documents from which the noun phrases were extracted manually, used to test the automatic extraction; and (b) a corpus of 60 documents from the field of information science. The proposed methodology was first applied to part of the corpus for validation and calibration, and then applied again, with some changes, to the whole corpus. The results showed that the descriptors assigned to the documents were highly pertinent, which led to the conclusion that the methodology is unequivocally successful under the conditions studied.
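
The five characteristics listed in the abstract can be read as inputs to a ranking function over candidate noun phrases. The sketch below is only an illustration of that idea, not the algorithm of the thesis: the weights, the linear combination, the word-count proxy for "structure", the convention that level 1 is the outermost noun phrase, and the names `WEIGHTS` and `choose_descriptors` are all assumptions, since the record does not state how the factors are combined.

```python
import math
from collections import Counter

# Hypothetical weights: the abstract does not say how the factors are combined.
WEIGHTS = {"tf": 1.0, "idf": 1.0, "structure": 0.5, "level": 0.5, "thesaurus": 2.0}


def choose_descriptors(doc_phrases, corpus, thesaurus, top_k=5, weights=WEIGHTS):
    """Rank the noun phrases of one document and return the top_k.

    doc_phrases: list of (phrase, level) pairs, one per occurrence in the document.
    corpus: list of sets, each holding the noun phrases of one document.
    thesaurus: set of terms from the subject field.
    """
    counts = Counter(phrase for phrase, _ in doc_phrases)
    if not counts:
        return []
    max_count = max(counts.values())

    # Keep the lowest level seen per phrase (assumed encoding: 1 = outermost).
    levels = {}
    for phrase, level in doc_phrases:
        levels[phrase] = min(level, levels.get(phrase, level))

    # Number of corpus documents in which each phrase occurs.
    doc_freq = Counter(p for phrases in corpus for p in phrases)

    scores = {}
    for phrase, count in counts.items():
        tf = count / max_count                                      # factor (1)
        idf = math.log((1 + len(corpus)) / (1 + doc_freq[phrase]))  # factor (2)
        structure = min(len(phrase.split()), 4) / 4                 # factor (3), crude proxy
        level_score = 1 / levels[phrase]                            # factor (4)
        in_thesaurus = 1.0 if phrase in thesaurus else 0.0          # factor (5)
        scores[phrase] = (
            weights["tf"] * tf
            + weights["idf"] * idf
            + weights["structure"] * structure
            + weights["level"] * level_score
            + weights["thesaurus"] * in_thesaurus
        )
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]
```

A phrase that is frequent in the document, rare in the corpus, and present in the thesaurus rises to the top under these assumed weights; calibrating the weights on part of the corpus, as the abstract describes, would be the step that replaces the placeholder values.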

Item metadata

id	UFMG_778feed4f5840bb30613b27ed2b45f9a
oai_identifier_str	oai:repositorio.ufmg.br:1843/RRSA-6GGGUF
network_acronym_str	UFMG
network_name_str	Repositório Institucional da UFMG
repository_id_str
spelling	Lidia Alvarenga; Beatriz Valadares Cendon; Maria Eugenia Albino Andrade; Hélio Kuramoto; Renata Vieira
	Since manual document-indexing processes have become unfeasible in some contexts, effective alternatives are sought that allow the main subjects of those documents to be represented automatically. The most common automatic indexing processes describe documents through a simplistic logic derived from analyzing the frequency of the words that occur in them. Seeking to propose a more effective indexing process, one that analyzes words and expressions within their linguistic contexts, three assumptions are defined: (1) using noun phrases as descriptors has advantages over using keywords; (2) extracting noun phrases from the texts of digitized documents is possible and viable with the technological tools currently available; and (3) it is possible to establish an automated, effective process for choosing meaningful descriptors for digitized documents using noun phrases. The aim of this research is to present a methodology that makes it feasible to assign descriptors to digitized texts (indexing) by extracting noun phrases and analyzing factors such as the frequency of occurrence of those noun phrases in the texts of the documents and in the set of documents; the structure of the noun phrases; the level of the noun phrases; and their occurrence in a thesaurus of a specific field of knowledge. To reach this aim, two corpora are analyzed: (a) a corpus of 15 documents from which the noun phrases were extracted manually, to test the automatic extraction process; and (b) a corpus of 60 documents from electronic publications in the field of information science. The proposed methodology was first applied to part of the corpus to validate and parameterize the variables of the algorithm, and then applied again, with changes, to the whole corpus. The results showed that the descriptors assigned to the documents were highly pertinent and led to the conclusion that the methodology is unequivocally successful under the conditions studied.
	ORIGINAL: doutorado___renato_rocha_souza.pdf; application/pdf; 3754259
	TEXT: doutorado___renato_rocha_souza.pdf.txt; Extracted text; text/plain; 103803
	THUMBNAIL: doutorado___renato_rocha_souza.pdf.jpg; Generated Thumbnail; image/jpeg; 3539
	2025-09-09 14:56:31.102; open.access; https://repositorio.ufmg.br/oai; opendoar:2025-09-09T17:56:31
dc.title.none.fl_str_mv	Uma proposta de metodologia para escolha automática de descritores utilizando sintagmas nominais
author	Renato Rocha Souza
author_facet	Renato Rocha Souza
author_role	author
dc.contributor.author.fl_str_mv	Renato Rocha Souza
dc.subject.por.fl_str_mv	Ciência da informação; Sistemas de recuperação da informação; Indexação automática
topic	Ciência da informação; Sistemas de recuperação da informação; Indexação automática; Sistemas de Recuperação de Informações; Sintagmas Nominais; Indexação Automática
dc.subject.other.none.fl_str_mv	Sistemas de Recuperação de Informações; Sintagmas Nominais; Indexação Automática
description	Because manual indexing has become unfeasible in some document-processing contexts, researchers seek alternatives that represent the subjects of documents automatically. The most common approaches try to determine a document's subjects by analyzing word frequencies. In search of a better indexing process, one that analyzes words and expressions within their linguistic contexts, three assumptions are made: (1) noun phrases are better descriptors than keywords; (2) extracting noun phrases from digitized text documents is possible and viable with the software tools available; and (3) it is possible to establish an automated, functional process for choosing good descriptors for documents using noun phrases. The aim of this research was to develop a methodology for indexing digitized documents by extracting their noun phrases and analyzing characteristics such as: (1) the frequency of each noun phrase in the text of the document; (2) its frequency in the whole set of documents; (3) the structure of the noun phrase; (4) the level of the noun phrase; and (5) the occurrence of the noun phrase in a thesaurus of the subject field. To reach this goal, two corpora were analyzed: (a) a corpus of 15 documents from which the noun phrases were extracted manually, used to test the automatic extraction; and (b) a corpus of 60 documents from the field of information science. The proposed methodology was first applied to part of the corpus for validation and calibration, and then applied again, with some changes, to the whole corpus. The results showed that the descriptors assigned to the documents were highly pertinent, which led to the conclusion that the methodology is unequivocally successful under the conditions studied.
publishDate	2005
dc.date.issued.fl_str_mv	2005-05-04
dc.date.accessioned.fl_str_mv	2019-08-14T05:20:57Z 2025-09-08T23:26:07Z
dc.date.available.fl_str_mv	2019-08-14T05:20:57Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/1843/RRSA-6GGGUF
url	https://hdl.handle.net/1843/RRSA-6GGGUF
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal de Minas Gerais
publisher.none.fl_str_mv	Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG
instname_str	Universidade Federal de Minas Gerais (UFMG)
instacron_str	UFMG
institution	UFMG
reponame_str	Repositório Institucional da UFMG
collection	Repositório Institucional da UFMG
bitstream.url.fl_str_mv	https://repositorio.ufmg.br//bitstreams/aead82ec-8fff-4450-b5e9-695dcaf72955/download https://repositorio.ufmg.br//bitstreams/7bf9fc64-5a55-475b-a477-13465939ce87/download https://repositorio.ufmg.br//bitstreams/3eabda08-fbf0-4a90-9cc3-5156bdb8b534/download
bitstream.checksum.fl_str_mv	0bc3259c22ebc5f8487be722bab513ff f2b9e63b19d22f37cdd375eeb9c93d00 3bd296feba7287b412af9a98361932cc
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv	repositorio@ufmg.br
_version_	1862105547848810496
