Filtragem automática de opiniões falsas: comparação compreensiva dos métodos baseados em conteúdo

Bibliographic details
Year of defense: 2017
Lead author: Cardoso, Emerson Freitas
Advisor: Almeida, Tiago Agostinho de
Examination committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Federal de São Carlos
Câmpus Sorocaba
Graduate program: Programa de Pós-Graduação em Ciência da Computação - PPGCC-So
Department: Not provided by the institution
Country: Not provided by the institution
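Keywords in Portuguese: Spam (Mensagens eletrônicas); Processamento de linguagem natural (Computação); Opiniões falsas; Classificação; Processamento de linguagem natural; Aprendizado de máquina
Keywords in English: Natural language processing (Computer science); Spam (Electronic mail); Spam reviews; Classification; Natural language processing; Machine learning
CNPq knowledge area: CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Access link: https://repositorio.ufscar.br/handle/20.500.14289/9141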
Abstract: Before buying a product or choosing a travel destination, people often seek out other people's opinions to gauge the quality of what they intend to acquire. Opinions have therefore always had a strong influence on purchase decisions. With the growth of the Internet and the sharp increase in data traffic, social networks emerged that let users post and view all kinds of information, and people began to search for opinions on the Web as well. Sites such as TripAdvisor and Yelp make it easy to share online reviews: users can post their opinions from anywhere via smartphone, and product manufacturers and service providers can obtain relevant feedback quickly and in a centralized way. As a result, studies indicate that most users now trust online reviews as much as personal recommendations. However, competition between service providers and product manufacturers has also increased on social media, leading to the first cases of spam reviews: deceptive opinions published by people hired to promote or defame products or businesses. These reviews are carefully written to look authentic, which makes them difficult for humans or automatic methods to detect. They are thus used to manipulate general opinion in a misleading way, and can cause financial harm to businesses and users. Several approaches to spam review detection have been proposed, most of them based on machine learning and natural language processing. Despite the progress made, relevant questions remain open and require careful analysis to be answered properly. For instance, there is no consensus on whether the performance of traditional classification methods is affected by incremental learning or by changes in review features over time, nor on whether there is a statistically significant difference between the performances of content-based classification methods.
In this scenario, this work offers a comprehensive comparison of traditional machine learning methods applied to spam review detection. The comparison is carried out in multiple setups, employing different types of learning and different data sets. The experiments, together with a statistical analysis of the results, help provide appropriate answers to these open questions. In addition, the results obtained can serve as a baseline for future comparisons.
id SCAR_3ce6474f395e59787930d69e41774d5f
oai_identifier_str oai:repositorio.ufscar.br:20.500.14289/9141
network_acronym_str SCAR
network_name_str Repositório Institucional da UFSCAR
repository_id_str
funding None received (Não recebi financiamento)
license Non-exclusive distribution license (LICENÇA DE DISTRIBUIÇÃO NÃO-EXCLUSIVA), granted to the Universidade Federal de São Carlos
bitstream.files CARDOSO_Emerson_2017.pdf (ORIGINAL, application/pdf, 3299853 bytes)
license.txt (LICENSE, text/plain; charset=utf-8, 1957 bytes)
CARDOSO_Emerson_2017.pdf.txt (TEXT, extracted text, text/plain, 126579 bytes)
CARDOSO_Emerson_2017.pdf.jpg (THUMBNAIL, image/jpeg, 3063 bytes)
record.lastModified 2025-02-05 17:41:36.775
repository.oai https://repositorio.ufscar.br/oai/request
repository.opendoar opendoar:4322
dc.title.por.fl_str_mv Filtragem automática de opiniões falsas: comparação compreensiva dos métodos baseados em conteúdo
dc.title.alternative.eng.fl_str_mv Automatic filtering of false opinions: comprehensive comparison of content-based methods
dc.contributor.authorlattes.por.fl_str_mv http://lattes.cnpq.br/7714557242920689
dc.contributor.author.fl_str_mv Cardoso, Emerson Freitas
dc.contributor.advisor1.fl_str_mv Almeida, Tiago Agostinho de
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/5368680512020633
dc.contributor.authorID.fl_str_mv 2c5182ea-9c26-42ba-bb43-8b986becaafd
dc.subject.por.fl_str_mv Spam (Mensagens eletrônicas)
Processamento de linguagem natural (Computação)
Opiniões falsas
Classificação
Processamento de linguagem natural
Aprendizado de máquina
dc.subject.eng.fl_str_mv Natural language processing (Computer science)
Spam (Electronic mail)
Spam reviews
Classification
Natural language processing
Machine learning
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
publishDate 2017
dc.date.accessioned.fl_str_mv 2017-10-09T17:32:49Z
dc.date.available.fl_str_mv 2017-10-09T17:32:49Z
dc.date.issued.fl_str_mv 2017-08-04
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv CARDOSO, Emerson Freitas. Filtragem automática de opiniões falsas: comparação compreensiva dos métodos baseados em conteúdo. 2017. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, Sorocaba, 2017. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/9141.
dc.identifier.uri.fl_str_mv https://repositorio.ufscar.br/handle/20.500.14289/9141
dc.language.iso.fl_str_mv por
language por
dc.relation.confidence.fl_str_mv 600
600
dc.relation.authority.fl_str_mv 5de967ad-743c-4f36-972b-79dd683c0e9d
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de São Carlos
Câmpus Sorocaba
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Ciência da Computação - PPGCC-So
dc.publisher.initials.fl_str_mv UFSCar
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFSCAR
instname:Universidade Federal de São Carlos (UFSCAR)
instacron:UFSCAR
bitstream.url.fl_str_mv https://repositorio.ufscar.br/bitstreams/290cdd18-5bde-43aa-ab8f-451e57ef9e5e/download
https://repositorio.ufscar.br/bitstreams/dd858d26-222e-494b-804f-959b7ac57ac0/download
https://repositorio.ufscar.br/bitstreams/1e14a45d-e14e-418e-966b-80eb89f6d7c1/download
https://repositorio.ufscar.br/bitstreams/95d3a73c-f8ce-46e8-92b5-83c73b113a7d/download
bitstream.checksum.fl_str_mv bda5605a1fb8e64f503215e839d2a9a6
ae0398b6f8b235e40ad82cba6c50031d
d85dfbf0f8730ea5ece51c90a4dd8ff7
661f4f4dcbee4d77d3ac4089f4695590
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
repository.mail.fl_str_mv repositorio.sibi@ufscar.br
_version_ 1851688933531844608