SOM4SImD : um método semântico baseado em ontologia para detectar similaridade entre documentos

Arruda, Claudineia Gonçalves de

SOM4SImD : um método semântico baseado em ontologia para detectar similaridade entre documentos

Detalhes bibliográficos
Ano de defesa:	2017
Autor(a) principal:	Arruda, Claudineia Gonçalves de
Orientador(a):	Santos, Marilde Terezinha Prado
Banca de defesa:	Não Informado pela instituição
Tipo de documento:	Dissertação
Tipo de acesso:	Acesso aberto
Idioma:	por
Instituição de defesa:	Universidade Federal de São Carlos Câmpus São Carlos
Programa de Pós-Graduação:	Programa de Pós-Graduação em Ciência da Computação - PPGCC
Departamento:	Não Informado pela instituição
País:	Não Informado pela instituição
Palavras-chave em Português:	Método Similaridade semântica Documentos Ontologia Abordagem Análise de documentos Análise qualitativa
Palavras-chave em Inglês:	Method Semantic similarity Documents Ontology Approach Document analysis Qualitative analysis
Área do conhecimento CNPq:	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Link de acesso:	https://repositorio.ufscar.br/handle/ufscar/8961
Resumo:	In several research areas, interviews are a means of obtaining data widely used by researchers. These interviews are arranged, in most cases, in several documents and have an informal language, because they are conversations between several people at the same time. Analyzing such documents is an arduous and time-consuming task, bringing fatigue and difficulties to a correct analysis. One solution for analyzing this type of interview is to group documents according to the similarity between them, so that experts can analyze documents of similar subjects more quickly. In this way, this work presents the method SOM4SImD, created to detect the semantic similarity between the documents composed by interviews with an informal language written in Brazilian Portuguese. In order to create this method, an ontology of the same document domain was used, which allowed the use of the formal terms of the ontology, along with its synonyms and variants, to perform the semantic annotation in the documents and to calculate the similarity between the interview pairs. Through the created method, a SimIGroup approach was developed that assists the researchers in the qualitative analysis of the documents, using Coding technique. The results show that the SOM4SImD method and the SimIGroup approach reduce the difficulties and fatigue in the analysis of the documents made by the annotators, helping to increase the number of documents analyzed. In addition, the SOM4SImD method was more advantageous in obtaining similarity between documents than the others found in the literature, reaching significant values for the performance measures, with 0.96 accuracy, 0.93 of recall and 0.94 of F-Mensure.

Metadados do item

id	SCAR_a2acd1e6799caddf4854d3a20edb8b59
oai_identifier_str	oai:repositorio.ufscar.br:ufscar/8961
network_acronym_str	SCAR
network_name_str	Repositório Institucional da UFSCAR
repository_id_str
spelling	Arruda, Claudineia Gonçalves deSantos, Marilde Terezinha Pradohttp://lattes.cnpq.br/9826026025118073http://lattes.cnpq.br/467926051375247235ff767b-f4aa-4bc2-b2e9-a02c50dd1ae12017-08-09T14:17:28Z2017-08-09T14:17:28Z2017-02-13ARRUDA, Claudineia Gonçalves de. SOM4SImD : um método semântico baseado em ontologia para detectar similaridade entre documentos. 2017. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2017. Disponível em: https://repositorio.ufscar.br/handle/ufscar/8961.https://repositorio.ufscar.br/handle/ufscar/8961In several research areas, interviews are a means of obtaining data widely used by researchers. These interviews are arranged, in most cases, in several documents and have an informal language, because they are conversations between several people at the same time. Analyzing such documents is an arduous and time-consuming task, bringing fatigue and difficulties to a correct analysis. One solution for analyzing this type of interview is to group documents according to the similarity between them, so that experts can analyze documents of similar subjects more quickly. In this way, this work presents the method SOM4SImD, created to detect the semantic similarity between the documents composed by interviews with an informal language written in Brazilian Portuguese. In order to create this method, an ontology of the same document domain was used, which allowed the use of the formal terms of the ontology, along with its synonyms and variants, to perform the semantic annotation in the documents and to calculate the similarity between the interview pairs. Through the created method, a SimIGroup approach was developed that assists the researchers in the qualitative analysis of the documents, using Coding technique. The results show that the SOM4SImD method and the SimIGroup approach reduce the difficulties and fatigue in the analysis of the documents made by the annotators, helping to increase the number of documents analyzed. In addition, the SOM4SImD method was more advantageous in obtaining similarity between documents than the others found in the literature, reaching significant values for the performance measures, with 0.96 accuracy, 0.93 of recall and 0.94 of F-Mensure.Em diversas áreas de pesquisas, as entrevistas são um meio de obtenção de dados muito utilizadas por pesquisadores. Essas entrevistas são dispostas, na maioria das vezes, em diversos documentos e têm uma linguagem informal, por se tratar de conversas entre várias pessoas ao mesmo tempo. Analisar tais documentos é uma tarefa árdua e demorada, trazendo cansaço e dificuldades para uma análise correta. Uma solução para análise desse tipo de entrevistas é agrupar os documentos de acordo com a similaridade que existem entre eles, pois assim os especialistas conseguem analisar os documentos de assuntos parecidos de forma mais rápida. Desta forma, este trabalho apresenta o método SOM4SImD, criado para detectar a similaridade semântica entre os documentos compostos por entrevistas com uma linguagem informal escritas no português brasileiro. Para criar este método, foi utilizado uma ontologia de mesmo domínio dos documentos, que permitiu o uso dos termos formais da ontologia, juntamente com seus sinônimos e variantes para realizar a anotação semântica nos documentos e para realizar o cálculo da similaridade entre os pares de entrevistas. Através do método criado, foi desenvolvida uma abordagem SimIGroup que auxilia os pesquisadores na análise qualitativa dos documentos, utilizando a técnica Coding. Os resultados mostram que o método SOM4SImD e a abordagem SimIGroup diminuem as dificuldades e cansaço na análise dos documentos realizadas pelos anotadores, auxiliando no aumento da quantidade de documentos analisados. Além disso, o método SOM4SImD se mostrou mais vantajoso na obtenção de similaridade entre documentos do que os demais encontrados na literatura, alcançando valores significantes para as medidas de desempenho, com 0,96 de precisão, 0,93 de revocação e 0,94 de F-Mensure.Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)porUniversidade Federal de São CarlosCâmpus São CarlosPrograma de Pós-Graduação em Ciência da Computação - PPGCCUFSCarMétodoSimilaridade semânticaDocumentosOntologiaAbordagemAnálise de documentosAnálise qualitativaMethodSemantic similarityDocumentsOntologyApproachDocument analysisQualitative analysisCIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAOSOM4SImD : um método semântico baseado em ontologia para detectar similaridade entre documentosinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisOnline6006001bdb200e-99c1-45c7-8e62-ff292489211einfo:eu-repo/semantics/openAccessreponame:Repositório Institucional da UFSCARinstname:Universidade Federal de São Carlos (UFSCAR)instacron:UFSCARORIGINALDissCGA.pdfDissCGA.pdfapplication/pdf1377116https://repositorio.ufscar.br/bitstream/ufscar/8961/1/DissCGA.pdfeeaa4d5429ed9fe1aeac6a215d0acc52MD51LICENSElicense.txtlicense.txttext/plain; charset=utf-81957https://repositorio.ufscar.br/bitstream/ufscar/8961/2/license.txtae0398b6f8b235e40ad82cba6c50031dMD52TEXTDissCGA.pdf.txtDissCGA.pdf.txtExtracted texttext/plain113542https://repositorio.ufscar.br/bitstream/ufscar/8961/3/DissCGA.pdf.txtad90b074f5760637df16ff091bacf4e3MD53THUMBNAILDissCGA.pdf.jpgDissCGA.pdf.jpgIM Thumbnailimage/jpeg8735https://repositorio.ufscar.br/bitstream/ufscar/8961/4/DissCGA.pdf.jpg010884560dbfe45db71acf8d3115818aMD54ufscar/89612023-09-18 18:31:25.565oai:repositorio.ufscar.br:ufscar/8961TElDRU7Dh0EgREUgRElTVFJJQlVJw4fDg08gTsODTy1FWENMVVNJVkEKCkNvbSBhIGFwcmVzZW50YcOnw6NvIGRlc3RhIGxpY2Vuw6dhLCB2b2PDqiAobyBhdXRvciAoZXMpIG91IG8gdGl0dWxhciBkb3MgZGlyZWl0b3MgZGUgYXV0b3IpIGNvbmNlZGUgw6AgVW5pdmVyc2lkYWRlCkZlZGVyYWwgZGUgU8OjbyBDYXJsb3MgbyBkaXJlaXRvIG7Do28tZXhjbHVzaXZvIGRlIHJlcHJvZHV6aXIsICB0cmFkdXppciAoY29uZm9ybWUgZGVmaW5pZG8gYWJhaXhvKSwgZS9vdQpkaXN0cmlidWlyIGEgc3VhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyAoaW5jbHVpbmRvIG8gcmVzdW1vKSBwb3IgdG9kbyBvIG11bmRvIG5vIGZvcm1hdG8gaW1wcmVzc28gZSBlbGV0csO0bmljbyBlCmVtIHF1YWxxdWVyIG1laW8sIGluY2x1aW5kbyBvcyBmb3JtYXRvcyDDoXVkaW8gb3UgdsOtZGVvLgoKVm9jw6ogY29uY29yZGEgcXVlIGEgVUZTQ2FyIHBvZGUsIHNlbSBhbHRlcmFyIG8gY29udGXDumRvLCB0cmFuc3BvciBhIHN1YSB0ZXNlIG91IGRpc3NlcnRhw6fDo28KcGFyYSBxdWFscXVlciBtZWlvIG91IGZvcm1hdG8gcGFyYSBmaW5zIGRlIHByZXNlcnZhw6fDo28uCgpWb2PDqiB0YW1iw6ltIGNvbmNvcmRhIHF1ZSBhIFVGU0NhciBwb2RlIG1hbnRlciBtYWlzIGRlIHVtYSBjw7NwaWEgYSBzdWEgdGVzZSBvdQpkaXNzZXJ0YcOnw6NvIHBhcmEgZmlucyBkZSBzZWd1cmFuw6dhLCBiYWNrLXVwIGUgcHJlc2VydmHDp8Ojby4KClZvY8OqIGRlY2xhcmEgcXVlIGEgc3VhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyDDqSBvcmlnaW5hbCBlIHF1ZSB2b2PDqiB0ZW0gbyBwb2RlciBkZSBjb25jZWRlciBvcyBkaXJlaXRvcyBjb250aWRvcwpuZXN0YSBsaWNlbsOnYS4gVm9jw6ogdGFtYsOpbSBkZWNsYXJhIHF1ZSBvIGRlcMOzc2l0byBkYSBzdWEgdGVzZSBvdSBkaXNzZXJ0YcOnw6NvIG7Do28sIHF1ZSBzZWphIGRlIHNldQpjb25oZWNpbWVudG8sIGluZnJpbmdlIGRpcmVpdG9zIGF1dG9yYWlzIGRlIG5pbmd1w6ltLgoKQ2FzbyBhIHN1YSB0ZXNlIG91IGRpc3NlcnRhw6fDo28gY29udGVuaGEgbWF0ZXJpYWwgcXVlIHZvY8OqIG7Do28gcG9zc3VpIGEgdGl0dWxhcmlkYWRlIGRvcyBkaXJlaXRvcyBhdXRvcmFpcywgdm9jw6oKZGVjbGFyYSBxdWUgb2J0ZXZlIGEgcGVybWlzc8OjbyBpcnJlc3RyaXRhIGRvIGRldGVudG9yIGRvcyBkaXJlaXRvcyBhdXRvcmFpcyBwYXJhIGNvbmNlZGVyIMOgIFVGU0NhcgpvcyBkaXJlaXRvcyBhcHJlc2VudGFkb3MgbmVzdGEgbGljZW7Dp2EsIGUgcXVlIGVzc2UgbWF0ZXJpYWwgZGUgcHJvcHJpZWRhZGUgZGUgdGVyY2Vpcm9zIGVzdMOhIGNsYXJhbWVudGUKaWRlbnRpZmljYWRvIGUgcmVjb25oZWNpZG8gbm8gdGV4dG8gb3Ugbm8gY29udGXDumRvIGRhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyBvcmEgZGVwb3NpdGFkYS4KCkNBU08gQSBURVNFIE9VIERJU1NFUlRBw4fDg08gT1JBIERFUE9TSVRBREEgVEVOSEEgU0lETyBSRVNVTFRBRE8gREUgVU0gUEFUUk9Dw41OSU8gT1UKQVBPSU8gREUgVU1BIEFHw4pOQ0lBIERFIEZPTUVOVE8gT1UgT1VUUk8gT1JHQU5JU01PIFFVRSBOw4NPIFNFSkEgQSBVRlNDYXIsClZPQ8OKIERFQ0xBUkEgUVVFIFJFU1BFSVRPVSBUT0RPUyBFIFFVQUlTUVVFUiBESVJFSVRPUyBERSBSRVZJU8ODTyBDT01PClRBTULDiU0gQVMgREVNQUlTIE9CUklHQcOHw5VFUyBFWElHSURBUyBQT1IgQ09OVFJBVE8gT1UgQUNPUkRPLgoKQSBVRlNDYXIgc2UgY29tcHJvbWV0ZSBhIGlkZW50aWZpY2FyIGNsYXJhbWVudGUgbyBzZXUgbm9tZSAocykgb3UgbyhzKSBub21lKHMpIGRvKHMpCmRldGVudG9yKGVzKSBkb3MgZGlyZWl0b3MgYXV0b3JhaXMgZGEgdGVzZSBvdSBkaXNzZXJ0YcOnw6NvLCBlIG7Do28gZmFyw6EgcXVhbHF1ZXIgYWx0ZXJhw6fDo28sIGFsw6ltIGRhcXVlbGFzCmNvbmNlZGlkYXMgcG9yIGVzdGEgbGljZW7Dp2EuCg==Repositório InstitucionalPUBhttps://repositorio.ufscar.br/oai/requestopendoar:43222023-09-18T18:31:25Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)false
dc.title.por.fl_str_mv	SOM4SImD : um método semântico baseado em ontologia para detectar similaridade entre documentos
title	SOM4SImD : um método semântico baseado em ontologia para detectar similaridade entre documentos
spellingShingle	SOM4SImD : um método semântico baseado em ontologia para detectar similaridade entre documentos Arruda, Claudineia Gonçalves de Método Similaridade semântica Documentos Ontologia Abordagem Análise de documentos Análise qualitativa Method Semantic similarity Documents Ontology Approach Document analysis Qualitative analysis CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
title_short	SOM4SImD : um método semântico baseado em ontologia para detectar similaridade entre documentos
title_full	SOM4SImD : um método semântico baseado em ontologia para detectar similaridade entre documentos
title_fullStr	SOM4SImD : um método semântico baseado em ontologia para detectar similaridade entre documentos
title_full_unstemmed	SOM4SImD : um método semântico baseado em ontologia para detectar similaridade entre documentos
title_sort	SOM4SImD : um método semântico baseado em ontologia para detectar similaridade entre documentos
author	Arruda, Claudineia Gonçalves de
author_facet	Arruda, Claudineia Gonçalves de
author_role	author
dc.contributor.authorlattes.por.fl_str_mv	http://lattes.cnpq.br/4679260513752472
dc.contributor.author.fl_str_mv	Arruda, Claudineia Gonçalves de
dc.contributor.advisor1.fl_str_mv	Santos, Marilde Terezinha Prado
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/9826026025118073
dc.contributor.authorID.fl_str_mv	35ff767b-f4aa-4bc2-b2e9-a02c50dd1ae1
contributor_str_mv	Santos, Marilde Terezinha Prado
dc.subject.por.fl_str_mv	Método Similaridade semântica Documentos Ontologia Abordagem Análise de documentos Análise qualitativa
topic	Método Similaridade semântica Documentos Ontologia Abordagem Análise de documentos Análise qualitativa Method Semantic similarity Documents Ontology Approach Document analysis Qualitative analysis CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.eng.fl_str_mv	Method Semantic similarity Documents Ontology Approach Document analysis Qualitative analysis
dc.subject.cnpq.fl_str_mv	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
description	In several research areas, interviews are a means of obtaining data widely used by researchers. These interviews are arranged, in most cases, in several documents and have an informal language, because they are conversations between several people at the same time. Analyzing such documents is an arduous and time-consuming task, bringing fatigue and difficulties to a correct analysis. One solution for analyzing this type of interview is to group documents according to the similarity between them, so that experts can analyze documents of similar subjects more quickly. In this way, this work presents the method SOM4SImD, created to detect the semantic similarity between the documents composed by interviews with an informal language written in Brazilian Portuguese. In order to create this method, an ontology of the same document domain was used, which allowed the use of the formal terms of the ontology, along with its synonyms and variants, to perform the semantic annotation in the documents and to calculate the similarity between the interview pairs. Through the created method, a SimIGroup approach was developed that assists the researchers in the qualitative analysis of the documents, using Coding technique. The results show that the SOM4SImD method and the SimIGroup approach reduce the difficulties and fatigue in the analysis of the documents made by the annotators, helping to increase the number of documents analyzed. In addition, the SOM4SImD method was more advantageous in obtaining similarity between documents than the others found in the literature, reaching significant values for the performance measures, with 0.96 accuracy, 0.93 of recall and 0.94 of F-Mensure.
publishDate	2017
dc.date.accessioned.fl_str_mv	2017-08-09T14:17:28Z
dc.date.available.fl_str_mv	2017-08-09T14:17:28Z
dc.date.issued.fl_str_mv	2017-02-13
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	ARRUDA, Claudineia Gonçalves de. SOM4SImD : um método semântico baseado em ontologia para detectar similaridade entre documentos. 2017. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2017. Disponível em: https://repositorio.ufscar.br/handle/ufscar/8961.
dc.identifier.uri.fl_str_mv	https://repositorio.ufscar.br/handle/ufscar/8961
identifier_str_mv	ARRUDA, Claudineia Gonçalves de. SOM4SImD : um método semântico baseado em ontologia para detectar similaridade entre documentos. 2017. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2017. Disponível em: https://repositorio.ufscar.br/handle/ufscar/8961.
url	https://repositorio.ufscar.br/handle/ufscar/8961
dc.language.iso.fl_str_mv	por
language	por
dc.relation.confidence.fl_str_mv	600 600
dc.relation.authority.fl_str_mv	1bdb200e-99c1-45c7-8e62-ff292489211e
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal de São Carlos Câmpus São Carlos
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Ciência da Computação - PPGCC
dc.publisher.initials.fl_str_mv	UFSCar
publisher.none.fl_str_mv	Universidade Federal de São Carlos Câmpus São Carlos
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR
instname_str	Universidade Federal de São Carlos (UFSCAR)
instacron_str	UFSCAR
institution	UFSCAR
reponame_str	Repositório Institucional da UFSCAR
collection	Repositório Institucional da UFSCAR
bitstream.url.fl_str_mv	https://repositorio.ufscar.br/bitstream/ufscar/8961/1/DissCGA.pdf https://repositorio.ufscar.br/bitstream/ufscar/8961/2/license.txt https://repositorio.ufscar.br/bitstream/ufscar/8961/3/DissCGA.pdf.txt https://repositorio.ufscar.br/bitstream/ufscar/8961/4/DissCGA.pdf.jpg
bitstream.checksum.fl_str_mv	eeaa4d5429ed9fe1aeac6a215d0acc52 ae0398b6f8b235e40ad82cba6c50031d ad90b074f5760637df16ff091bacf4e3 010884560dbfe45db71acf8d3115818a
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
repository.mail.fl_str_mv
_version_	1802136532739424256

SOM4SImD : um método semântico baseado em ontologia para detectar similaridade entre documentos

Registros relacionados