Swarm optimization clustering methods for opinion mining
| Ano de defesa: | 2017 |
|---|---|
| Autor(a) principal: | |
| Orientador(a): | |
| Banca de defesa: | |
| Tipo de documento: | Tese |
| Tipo de acesso: | Acesso aberto |
| Idioma: | eng |
| Instituição de defesa: |
Universidade Federal de Pernambuco
UFPE Brasil Programa de Pos Graduacao em Ciencia da Computacao |
| Programa de Pós-Graduação: |
Não Informado pela instituição
|
| Departamento: |
Não Informado pela instituição
|
| País: |
Não Informado pela instituição
|
| Palavras-chave em Português: | |
| Link de acesso: | https://repositorio.ufpe.br/handle/123456789/25227 |
Resumo: | Opinion Mining (OM), also known as sentiment analysis, is the field of study that analyzes people’s sentiments, evaluations, attitudes, and emotions about different entities expressed in textual input. This is accomplished through the classification of an opinion into categories, such as positive, negative, or neutral. Supervised machine learning (ML) and lexicon-based are the most frequent approaches for OM. However, these approaches require considerable effort for preparing training data and to build the opinion lexicon, respectively. In order to address the drawbacks of these approaches, this Thesis proposes the use of unsupervised clustering approach for the OM task which is able to produce accurate results for several domains without manually labeled data for the training step or tools which are language dependent. Three swarm algorithms based on Particle Swarm Optimization (PSO) and Cuckoo Search (CS) are proposed: the DPSOMUT which is based on a discrete PSO binary version, the IDPSOMUT that is based on an Improved Self-Adaptive PSO algorithm with detection function, and the IDPSOMUT/CS that is a hybrid version of IDPSOMUT and CS. Several experiments were conducted with different corpora types, domains, text language, class balancing, fitness function, and pre-processing techniques. The effectiveness of the clustering algorithms was evaluated with external measures such as accuracy, precision, recall, and F-score. From the statistical analysis, it was possible to observe that the swarm-based algorithms, especially the PSO ones, were able to find better solutions than conventional grouping techniques, such as K-means and Agglomerative. The PSO-based algorithms achieved better accuracy using a word bigram pre-processing and the Global Silhouette as fitness function. The OBCC corpus is also another contribution of this Thesis and contains a gold collection with 2,940 tweets in Brazilian Portuguese with opinions of consumers about products and services. |
| id |
UFPE_493feafc6aa4e8966e2cf4dc53da9194 |
|---|---|
| oai_identifier_str |
oai:repositorio.ufpe.br:123456789/25227 |
| network_acronym_str |
UFPE |
| network_name_str |
Repositório Institucional da UFPE |
| repository_id_str |
|
| spelling |
Swarm optimization clustering methods for opinion miningInteligência artificialMineração de opiniãoAgrupamento de opiniãoOtimização de enxameOpinion Mining (OM), also known as sentiment analysis, is the field of study that analyzes people’s sentiments, evaluations, attitudes, and emotions about different entities expressed in textual input. This is accomplished through the classification of an opinion into categories, such as positive, negative, or neutral. Supervised machine learning (ML) and lexicon-based are the most frequent approaches for OM. However, these approaches require considerable effort for preparing training data and to build the opinion lexicon, respectively. In order to address the drawbacks of these approaches, this Thesis proposes the use of unsupervised clustering approach for the OM task which is able to produce accurate results for several domains without manually labeled data for the training step or tools which are language dependent. Three swarm algorithms based on Particle Swarm Optimization (PSO) and Cuckoo Search (CS) are proposed: the DPSOMUT which is based on a discrete PSO binary version, the IDPSOMUT that is based on an Improved Self-Adaptive PSO algorithm with detection function, and the IDPSOMUT/CS that is a hybrid version of IDPSOMUT and CS. Several experiments were conducted with different corpora types, domains, text language, class balancing, fitness function, and pre-processing techniques. The effectiveness of the clustering algorithms was evaluated with external measures such as accuracy, precision, recall, and F-score. From the statistical analysis, it was possible to observe that the swarm-based algorithms, especially the PSO ones, were able to find better solutions than conventional grouping techniques, such as K-means and Agglomerative. The PSO-based algorithms achieved better accuracy using a word bigram pre-processing and the Global Silhouette as fitness function. The OBCC corpus is also another contribution of this Thesis and contains a gold collection with 2,940 tweets in Brazilian Portuguese with opinions of consumers about products and services.A mineração de opinião, também conhecida como análise de sentimento, é um campo de estudo que analisa os sentimentos, opiniões, atitudes e emoções das pessoas sobre diferentes entidades, expressos de forma textual. Tal análise é obtida através da classificação das opiniões em categorias, tais como positiva, negativa ou neutra. As abordagens de aprendizado supervisionado e baseadas em léxico são mais comumente utilizadas na mineração de opinião. No entanto, tais abordagens requerem um esforço considerável para preparação da base de dados de treinamento e para construção dos léxicos de opinião, respectivamente. A fim de minimizar as desvantagens das abordagens apresentadas, esta Tese propõe o uso de uma abordagem de agrupamento não supervisionada para a tarefa de mineração de opinião, a qual é capaz de produzir resultados precisos para diversos domínios sem a necessidade de dados rotulados manualmente para a etapa treinamento e sem fazer uso de ferramentas dependentes de língua. Três algoritmos de agrupamento não-supervisionado baseados em otimização de partícula de enxame (Particle Swarm Optimization - PSO) são propostos: o DPSOMUT, que é baseado em versão discreta do PSO; o IDPSOMUT, que é baseado em uma versão melhorada e autoadaptativa do PSO com função de detecção; e o IDPSOMUT/CS, que é uma versão híbrida do IDPSOMUT com o Cuckoo Search (CS). Diversos experimentos foram conduzidos com diferentes tipos de corpora, domínios, idioma do texto, balanceamento de classes, função de otimização e técnicas de pré-processamento. A eficácia dos algoritmos de agrupamento foi avaliada com medidas externas como acurácia, precisão, revocação e f-medida. A partir das análises estatísticas, os algortimos baseados em inteligência coletiva, especialmente os baseado em PSO, obtiveram melhores resultados que os algortimos que utilizam técnicas convencionais de agrupamento como o K-means e o Agglomerative. Os algoritmos propostos obtiveram um melhor desempenho utilizando o pré-processamento baseado em n-grama e utilizando a Global Silhouete como função de otimização. O corpus OBCC é também uma contribuição desta Tese e contem uma coleção dourada com 2.940 tweets com opiniões de consumidores sobre produtos e serviços em Português brasileiro.Universidade Federal de PernambucoUFPEBrasilPrograma de Pos Graduacao em Ciencia da ComputacaoOLIVEIRA, Adriano Lorena Inacio dehttp://lattes.cnpq.br/6593918610781356http://lattes.cnpq.br/5194381227316437SOUZA, Ellen Polliana Ramos2018-07-26T21:58:03Z2018-07-26T21:58:03Z2017-02-22info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/doctoralThesisapplication/pdfhttps://repositorio.ufpe.br/handle/123456789/25227engAttribution-NonCommercial-NoDerivs 3.0 Brazilhttp://creativecommons.org/licenses/by-nc-nd/3.0/br/info:eu-repo/semantics/openAccessreponame:Repositório Institucional da UFPEinstname:Universidade Federal de Pernambuco (UFPE)instacron:UFPE2019-10-26T04:04:54Zoai:repositorio.ufpe.br:123456789/25227Repositório InstitucionalPUBhttps://repositorio.ufpe.br/oai/requestattena@ufpe.bropendoar:22212019-10-26T04:04:54Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)false |
| dc.title.none.fl_str_mv |
Swarm optimization clustering methods for opinion mining |
| title |
Swarm optimization clustering methods for opinion mining |
| spellingShingle |
Swarm optimization clustering methods for opinion mining SOUZA, Ellen Polliana Ramos Inteligência artificial Mineração de opinião Agrupamento de opinião Otimização de enxame |
| title_short |
Swarm optimization clustering methods for opinion mining |
| title_full |
Swarm optimization clustering methods for opinion mining |
| title_fullStr |
Swarm optimization clustering methods for opinion mining |
| title_full_unstemmed |
Swarm optimization clustering methods for opinion mining |
| title_sort |
Swarm optimization clustering methods for opinion mining |
| author |
SOUZA, Ellen Polliana Ramos |
| author_facet |
SOUZA, Ellen Polliana Ramos |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
OLIVEIRA, Adriano Lorena Inacio de http://lattes.cnpq.br/6593918610781356 http://lattes.cnpq.br/5194381227316437 |
| dc.contributor.author.fl_str_mv |
SOUZA, Ellen Polliana Ramos |
| dc.subject.por.fl_str_mv |
Inteligência artificial Mineração de opinião Agrupamento de opinião Otimização de enxame |
| topic |
Inteligência artificial Mineração de opinião Agrupamento de opinião Otimização de enxame |
| description |
Opinion Mining (OM), also known as sentiment analysis, is the field of study that analyzes people’s sentiments, evaluations, attitudes, and emotions about different entities expressed in textual input. This is accomplished through the classification of an opinion into categories, such as positive, negative, or neutral. Supervised machine learning (ML) and lexicon-based are the most frequent approaches for OM. However, these approaches require considerable effort for preparing training data and to build the opinion lexicon, respectively. In order to address the drawbacks of these approaches, this Thesis proposes the use of unsupervised clustering approach for the OM task which is able to produce accurate results for several domains without manually labeled data for the training step or tools which are language dependent. Three swarm algorithms based on Particle Swarm Optimization (PSO) and Cuckoo Search (CS) are proposed: the DPSOMUT which is based on a discrete PSO binary version, the IDPSOMUT that is based on an Improved Self-Adaptive PSO algorithm with detection function, and the IDPSOMUT/CS that is a hybrid version of IDPSOMUT and CS. Several experiments were conducted with different corpora types, domains, text language, class balancing, fitness function, and pre-processing techniques. The effectiveness of the clustering algorithms was evaluated with external measures such as accuracy, precision, recall, and F-score. From the statistical analysis, it was possible to observe that the swarm-based algorithms, especially the PSO ones, were able to find better solutions than conventional grouping techniques, such as K-means and Agglomerative. The PSO-based algorithms achieved better accuracy using a word bigram pre-processing and the Global Silhouette as fitness function. The OBCC corpus is also another contribution of this Thesis and contains a gold collection with 2,940 tweets in Brazilian Portuguese with opinions of consumers about products and services. |
| publishDate |
2017 |
| dc.date.none.fl_str_mv |
2017-02-22 2018-07-26T21:58:03Z 2018-07-26T21:58:03Z |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
| format |
doctoralThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
https://repositorio.ufpe.br/handle/123456789/25227 |
| url |
https://repositorio.ufpe.br/handle/123456789/25227 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.rights.driver.fl_str_mv |
Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ info:eu-repo/semantics/openAccess |
| rights_invalid_str_mv |
Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.publisher.none.fl_str_mv |
Universidade Federal de Pernambuco UFPE Brasil Programa de Pos Graduacao em Ciencia da Computacao |
| publisher.none.fl_str_mv |
Universidade Federal de Pernambuco UFPE Brasil Programa de Pos Graduacao em Ciencia da Computacao |
| dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFPE instname:Universidade Federal de Pernambuco (UFPE) instacron:UFPE |
| instname_str |
Universidade Federal de Pernambuco (UFPE) |
| instacron_str |
UFPE |
| institution |
UFPE |
| reponame_str |
Repositório Institucional da UFPE |
| collection |
Repositório Institucional da UFPE |
| repository.name.fl_str_mv |
Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE) |
| repository.mail.fl_str_mv |
attena@ufpe.br |
| _version_ |
1856042007370137600 |