Explorando estratégias bayesianas eficientes e eficazes para classificação de texto
| Year of defense: | 2015 |
|---|---|
| Main author: | Felipe Augusto Resende Viegas |
| Advisor: | |
| Defense committee: | |
| Document type: | Master's dissertation |
| Access type: | Open access |
| Language: | eng |
| Defending institution: | Universidade Federal de Minas Gerais |
| Graduate program: | Not provided by the institution |
| Department: | Not provided by the institution |
| Country: | Not provided by the institution |
| Keywords in Portuguese: | Indexação automatica; Computação; Teoria bayesiana de decisão estatistica; Paralelização; Naive Bayes; Ponderação de atributos; Classificação automática de documentos; Semi-Naive Bayes |
| Access link: | https://hdl.handle.net/1843/ESBF-9WXR5Q |
| Abstract: | Automatic Document Classification (ADC) is the basis of many important applications, such as spam filtering, opinion mining, content organizers and authorship identification. Naive Bayes (NB) approaches are widely used as a classification paradigm due to their simplicity, efficiency and effectiveness in several scenarios. However, NB solutions are not competitive in effectiveness on ADC tasks when compared to other modern statistical learning methods. In this dissertation, we investigate whether combining alternative NB learning models with different feature weighting techniques can improve NB effectiveness in ADC. We also investigate the relaxation of the NB feature independence assumption (also known as Semi-Naive approaches) in large text collections. Given the high computational cost of these investigations, we present a massively parallel GPU-based version of NB. Moreover, supported by the parallel implementations, we propose four novel Lazy Semi-NB approaches. In our experiments, our solutions not only outperform existing Semi-NB approaches but also surpass, in terms of effectiveness, our improved NB solutions, which had already outperformed SVMs. |
| id | UFMG_9b7447074884f439155f50d9bcc407a2 |
|---|---|
| oai_identifier_str | oai:repositorio.ufmg.br:1843/ESBF-9WXR5Q |
| network_acronym_str | UFMG |
| network_name_str | Repositório Institucional da UFMG |
| repository_id_str | |
| spelling | Explorando estratégias bayesianas eficientes e eficazes para classificação de texto; Indexação automatica; Computação; Teoria bayesiana de decisão estatistica; Paralelização; Naive Bayes; Ponderação de atributos; Classificação automática de documentos; Semi-Naive Bayes; Automatic Document Classification (ADC) is the basis of many important applications, such as spam filtering, opinion mining, content organizers and authorship identification. Naive Bayes (NB) approaches are widely used as a classification paradigm due to their simplicity, efficiency and effectiveness in several scenarios. However, NB solutions are not competitive in effectiveness on ADC tasks when compared to other modern statistical learning methods. In this dissertation, we investigate whether combining alternative NB learning models with different feature weighting techniques can improve NB effectiveness in ADC. We also investigate the relaxation of the NB feature independence assumption (also known as Semi-Naive approaches) in large text collections. Given the high computational cost of these investigations, we present a massively parallel GPU-based version of NB. Moreover, supported by the parallel implementations, we propose four novel Lazy Semi-NB approaches. In our experiments, our solutions not only outperform existing Semi-NB approaches but also surpass, in terms of effectiveness, our improved NB solutions, which had already outperformed SVMs.; Universidade Federal de Minas Gerais; 2019-08-12T14:39:36Z; 2025-09-09T01:04:58Z; 2019-08-12T14:39:36Z; 2015-05-22; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; https://hdl.handle.net/1843/ESBF-9WXR5Q; Felipe Augusto Resende Viegas; info:eu-repo/semantics/openAccess; eng; reponame:Repositório Institucional da UFMG; instname:Universidade Federal de Minas Gerais (UFMG); instacron:UFMG; 2025-09-09T01:04:58Z; oai:repositorio.ufmg.br:1843/ESBF-9WXR5Q; Repositório Institucional; PUB; https://repositorio.ufmg.br/oai; repositorio@ufmg.br; opendoar:2025-09-09T01:04:58; Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG); false |
| dc.title.none.fl_str_mv | Explorando estratégias bayesianas eficientes e eficazes para classificação de texto |
| title | Explorando estratégias bayesianas eficientes e eficazes para classificação de texto |
| spellingShingle | Explorando estratégias bayesianas eficientes e eficazes para classificação de texto; Felipe Augusto Resende Viegas; Indexação automatica; Computação; Teoria bayesiana de decisão estatistica; Paralelização; Naive Bayes; Ponderação de atributos; Classificação automática de documentos; Semi-Naive Bayes |
| title_short | Explorando estratégias bayesianas eficientes e eficazes para classificação de texto |
| title_full | Explorando estratégias bayesianas eficientes e eficazes para classificação de texto |
| title_fullStr | Explorando estratégias bayesianas eficientes e eficazes para classificação de texto |
| title_full_unstemmed | Explorando estratégias bayesianas eficientes e eficazes para classificação de texto |
| title_sort | Explorando estratégias bayesianas eficientes e eficazes para classificação de texto |
| author | Felipe Augusto Resende Viegas |
| author_facet | Felipe Augusto Resende Viegas |
| author_role | author |
| dc.contributor.author.fl_str_mv | Felipe Augusto Resende Viegas |
| dc.subject.por.fl_str_mv | Indexação automatica; Computação; Teoria bayesiana de decisão estatistica; Paralelização; Naive Bayes; Ponderação de atributos; Classificação automática de documentos; Semi-Naive Bayes |
| topic | Indexação automatica; Computação; Teoria bayesiana de decisão estatistica; Paralelização; Naive Bayes; Ponderação de atributos; Classificação automática de documentos; Semi-Naive Bayes |
| description | Automatic Document Classification (ADC) is the basis of many important applications, such as spam filtering, opinion mining, content organizers and authorship identification. Naive Bayes (NB) approaches are widely used as a classification paradigm due to their simplicity, efficiency and effectiveness in several scenarios. However, NB solutions are not competitive in effectiveness on ADC tasks when compared to other modern statistical learning methods. In this dissertation, we investigate whether combining alternative NB learning models with different feature weighting techniques can improve NB effectiveness in ADC. We also investigate the relaxation of the NB feature independence assumption (also known as Semi-Naive approaches) in large text collections. Given the high computational cost of these investigations, we present a massively parallel GPU-based version of NB. Moreover, supported by the parallel implementations, we propose four novel Lazy Semi-NB approaches. In our experiments, our solutions not only outperform existing Semi-NB approaches but also surpass, in terms of effectiveness, our improved NB solutions, which had already outperformed SVMs. |
| publishDate | 2015 |
| dc.date.none.fl_str_mv | 2015-05-22 2019-08-12T14:39:36Z 2019-08-12T14:39:36Z 2025-09-09T01:04:58Z |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
| format | masterThesis |
| status_str | publishedVersion |
| dc.identifier.uri.fl_str_mv | https://hdl.handle.net/1843/ESBF-9WXR5Q |
| url | https://hdl.handle.net/1843/ESBF-9WXR5Q |
| dc.language.iso.fl_str_mv | eng |
| language | eng |
| dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.publisher.none.fl_str_mv | Universidade Federal de Minas Gerais |
| publisher.none.fl_str_mv | Universidade Federal de Minas Gerais |
| dc.source.none.fl_str_mv | reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG |
| instname_str | Universidade Federal de Minas Gerais (UFMG) |
| instacron_str | UFMG |
| institution | UFMG |
| reponame_str | Repositório Institucional da UFMG |
| collection | Repositório Institucional da UFMG |
| repository.name.fl_str_mv | Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG) |
| repository.mail.fl_str_mv | repositorio@ufmg.br |
| _version_ | 1856414091545935872 |