Swarm optimization clustering methods for opinion mining

Bibliographic details
Year of defense: 2017
Main author: SOUZA, Ellen Polliana Ramos
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Universidade Federal de Pernambuco
UFPE
Brasil
Programa de Pos Graduacao em Ciencia da Computacao
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://repositorio.ufpe.br/handle/123456789/25227
Abstract: Opinion Mining (OM), also known as sentiment analysis, is the field of study that analyzes people’s sentiments, evaluations, attitudes, and emotions about different entities as expressed in text. This is accomplished by classifying an opinion into categories such as positive, negative, or neutral. Supervised machine learning (ML) and lexicon-based methods are the most common approaches to OM. However, they require considerable effort to prepare training data and to build the opinion lexicon, respectively. To address these drawbacks, this thesis proposes an unsupervised clustering approach to the OM task that can produce accurate results for several domains without manually labeled training data or language-dependent tools. Three swarm algorithms based on Particle Swarm Optimization (PSO) and Cuckoo Search (CS) are proposed: DPSOMUT, which is based on a discrete binary version of PSO; IDPSOMUT, which is based on an Improved Self-Adaptive PSO algorithm with a detection function; and IDPSOMUT/CS, a hybrid of IDPSOMUT and CS. Experiments were conducted with varying corpus types, domains, text languages, class balance, fitness functions, and pre-processing techniques. The effectiveness of the clustering algorithms was evaluated with external measures such as accuracy, precision, recall, and F-score. The statistical analysis showed that the swarm-based algorithms, especially the PSO-based ones, found better solutions than conventional clustering techniques such as K-means and agglomerative clustering. The PSO-based algorithms achieved better accuracy with word-bigram pre-processing and the Global Silhouette as the fitness function. A further contribution of this thesis is the OBCC corpus, a gold collection of 2,940 tweets in Brazilian Portuguese containing consumer opinions about products and services.
id UFPE_493feafc6aa4e8966e2cf4dc53da9194
oai_identifier_str oai:repositorio.ufpe.br:123456789/25227
network_acronym_str UFPE
network_name_str Repositório Institucional da UFPE
repository_id_str
dc.title.none.fl_str_mv Swarm optimization clustering methods for opinion mining
title Swarm optimization clustering methods for opinion mining
author SOUZA, Ellen Polliana Ramos
author_facet SOUZA, Ellen Polliana Ramos
author_role author
dc.contributor.none.fl_str_mv OLIVEIRA, Adriano Lorena Inacio de
http://lattes.cnpq.br/6593918610781356
http://lattes.cnpq.br/5194381227316437
dc.contributor.author.fl_str_mv SOUZA, Ellen Polliana Ramos
dc.subject.por.fl_str_mv Inteligência artificial
Mineração de opinião
Agrupamento de opinião
Otimização de enxame
topic Inteligência artificial
Mineração de opinião
Agrupamento de opinião
Otimização de enxame
publishDate 2017
dc.date.none.fl_str_mv 2017-02-22
2018-07-26T21:58:03Z
2018-07-26T21:58:03Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://repositorio.ufpe.br/handle/123456789/25227
url https://repositorio.ufpe.br/handle/123456789/25227
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Pernambuco
UFPE
Brasil
Programa de Pos Graduacao em Ciencia da Computacao
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFPE
instname:Universidade Federal de Pernambuco (UFPE)
instacron:UFPE
instname_str Universidade Federal de Pernambuco (UFPE)
instacron_str UFPE
institution UFPE
reponame_str Repositório Institucional da UFPE
collection Repositório Institucional da UFPE
repository.name.fl_str_mv Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
repository.mail.fl_str_mv attena@ufpe.br
_version_ 1856042007370137600