Um modelo temporal-relacional para classificação de documentos (A temporal-relational model for document classification)

Bibliographic details
Year of defense: 2009
Main author: Fernando Henrique de Jesus Mourao
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: Portuguese (por)
Degree-granting institution: Universidade Federal de Minas Gerais
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords (translated from Portuguese): Computing; Information retrieval; Data mining; Classification and clustering; Complex network modeling; Text mining; Temporal analysis
Access link: https://hdl.handle.net/1843/SLSS-7Z8MWL
Abstract: Automatic Document Classification (ADC) is one of the most relevant and challenging research problems in Information Retrieval. Despite the large number of ADC techniques already proposed, few of them take characteristics of human language into consideration. As discussed in recent studies [Montejo-Raez et al., 2008; Chen, 1995], understanding and considering such characteristics may benefit ADC. Therefore, in this work we propose a new network-based representation for textual documents that is based on fundamental concepts of Linguistics, in particular those associated with relationships between terms. Using the proposed model, we also introduce a relational algorithm for ADC that exploits such relationships. Experimental evaluation shows that this algorithm achieves results comparable to SVM on four real datasets. In addition, its simplicity, execution efficiency, and simple parameter tuning make it an interesting alternative to SVM. A deeper analysis also shows that there are several dimensions along which relational algorithms may be enhanced. Due to its relevance, particular attention is given to the temporal dimension. In fact, changes occur spontaneously at every moment, affecting settings and observations previously made on the term network. Considering this evolving behavior may be very useful in the area of Information Retrieval [Alonso et al., 2007]. In order to incorporate the temporal dimension into our algorithm, we attach to every relationship in our network information about the moment of its construction. The evaluation of simple temporal versions of the proposed algorithm showed that considering temporal evolution improved the performance of our relational classifier by providing more accurate information about the behavior of each term. A preliminary assessment of other dimensions of analysis, such as information scarcity and the use of relationship attributes, also showed that more elaborate techniques for addressing such dimensions may benefit the proposed algorithm. Further, considering the generality of the linguistic concepts incorporated in this work, we believe that our proposal may be equally successful in various ADC application domains.
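The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea it describes: a term network whose relationships carry the moment of their construction, plus a simple vote over those relationships. The class name `TemporalTermNetwork`, the use of co-occurrence as the relationship, the `decay` weighting, and the toy documents are all assumptions made for illustration, not the method evaluated in the thesis.

```python
from collections import defaultdict
from itertools import combinations


class TemporalTermNetwork:
    """Toy term network (hypothetical, for illustration only).

    Nodes are terms; an edge links two terms that co-occur in a document.
    Each edge keeps (label, year) observations, i.e. the class of the
    document and the moment at which the relationship was observed.
    """

    def __init__(self):
        # (term_a, term_b) -> list of (label, year) observations
        self.edges = defaultdict(list)

    @staticmethod
    def _pairs(text):
        # Unordered pairs of distinct terms, in a canonical sorted order.
        terms = sorted(set(text.lower().split()))
        return combinations(terms, 2)

    def add_document(self, text, label, year):
        for pair in self._pairs(text):
            self.edges[pair].append((label, year))

    def classify(self, text, year, decay=0.5):
        """Vote over the known edges that appear in `text`.

        An observation made d years away from `year` weighs decay ** d,
        so decay=1.0 ignores the temporal dimension entirely.
        """
        scores = defaultdict(float)
        for pair in self._pairs(text):
            for label, seen in self.edges.get(pair, ()):
                scores[label] += decay ** abs(year - seen)
        return max(scores, key=scores.get) if scores else None


net = TemporalTermNetwork()
net.add_document("apple store harvest", "agriculture", 1990)
net.add_document("apple store orchard", "agriculture", 1991)
net.add_document("apple store laptop", "technology", 2008)

# Without time, the older majority wins; with decay, recent evidence wins.
print(net.classify("apple store", 2009, decay=1.0))  # agriculture
print(net.classify("apple store", 2009, decay=0.5))  # technology
```

In this toy setting, the same pair of terms changes its dominant class over time, which is the kind of evolving behavior the abstract argues a time-aware classifier can capture.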
id UFMG_f5bfa72e538262c23b4607ebc600004e
oai_identifier_str oai:repositorio.ufmg.br:1843/SLSS-7Z8MWL
network_acronym_str UFMG
network_name_str Repositório Institucional da UFMG
repository_id_str
dc.title.none.fl_str_mv Um modelo temporal-relacional para classificação de documentos
dc.contributor.author.fl_str_mv Fernando Henrique de Jesus Mourao
dc.subject.por.fl_str_mv Computação
Recuperação de informação
Mineração de dados
Classificação e agrupamento
Modelagem de redes complexas
Mineração de texto
Análise temporal
publishDate 2009
dc.date.none.fl_str_mv 2009-11-23
2019-08-10T02:06:04Z
2019-08-10T02:06:04Z
2025-09-09T00:55:21Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/1843/SLSS-7Z8MWL
url https://hdl.handle.net/1843/SLSS-7Z8MWL
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Minas Gerais
publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFMG
instname:Universidade Federal de Minas Gerais (UFMG)
instacron:UFMG
instname_str Universidade Federal de Minas Gerais (UFMG)
instacron_str UFMG
institution UFMG
reponame_str Repositório Institucional da UFMG
collection Repositório Institucional da UFMG
repository.name.fl_str_mv Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv repositorio@ufmg.br
_version_ 1856413914508558336