Explorando estratégias bayesianas eficientes e eficazes para classificação de texto (Exploring efficient and effective Bayesian strategies for text classification)

Bibliographic details
Year of defense: 2015
Main author: Felipe Augusto Resende Viegas
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: eng
Defending institution: Universidade Federal de Minas Gerais
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
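Keywords in Portuguese: Indexação automatica; Computação; Teoria bayesiana de decisão estatistica; Paralelização; Naive Bayes; Ponderação de atributos; Classificação automática de documentos; Semi-Naive Bayes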
Access link: https://hdl.handle.net/1843/ESBF-9WXR5Q
Abstract: Automatic Document Classification (ADC) is the basis of many important applications, such as spam filtering, opinion mining, content organizers and authorship identification. Naive Bayes (NB) approaches are widely used as a classification paradigm due to their simplicity, efficiency and effectiveness in several scenarios. However, NB solutions are not competitive in effectiveness on ADC tasks when compared to other modern statistical learning methods. In this dissertation, we investigate whether combining alternative NB learning models with different feature weighting techniques can improve the effectiveness of NB in ADC. We also investigate the relaxation of the NB feature independence assumption (i.e., Semi-Naive approaches) in large text collections. Given the high computational cost of these investigations, we present a massively parallel, GPU-based version of NB. Moreover, supported by the parallel implementations, we propose four novel Lazy Semi-NB approaches. In our experiments, our solutions not only outperform existing Semi-NB approaches but also surpass our improved NB solutions, which had themselves already outperformed SVMs in terms of effectiveness.
id UFMG_9b7447074884f439155f50d9bcc407a2
oai_identifier_str oai:repositorio.ufmg.br:1843/ESBF-9WXR5Q
network_acronym_str UFMG
network_name_str Repositório Institucional da UFMG
dc.title.none.fl_str_mv Explorando estratégias bayesianas eficientes e eficazes para classificação de texto
author Felipe Augusto Resende Viegas
dc.subject.por.fl_str_mv Indexação automatica
Computação
Teoria bayesiana de decisão estatistica
Paralelização
Naive Bayes
Ponderação de atributos
Classificação automática de documentos
Semi-Naive Bayes
publishDate 2015
dc.date.none.fl_str_mv 2015-05-22
2019-08-12T14:39:36Z
2019-08-12T14:39:36Z
2025-09-09T01:04:58Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.identifier.uri.fl_str_mv https://hdl.handle.net/1843/ESBF-9WXR5Q
dc.language.iso.fl_str_mv eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFMG
instname:Universidade Federal de Minas Gerais (UFMG)
instacron:UFMG
repository.name.fl_str_mv Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv repositorio@ufmg.br
_version_ 1856414091545935872