Similarity algorithms for Heterogeneous Information Networks

Detalhes bibliográficos
Ano de defesa: 2019
Autor(a) principal: Ribeiro, Angélica Abadia Paulista
Orientador(a): Não Informado pela instituição
Banca de defesa: Não Informado pela instituição
Tipo de documento: Dissertação
Tipo de acesso: Acesso aberto
Idioma: eng
Instituição de defesa: Biblioteca Digitais de Teses e Dissertações da USP
Programa de Pós-Graduação: Não Informado pela instituição
Departamento: Não Informado pela instituição
País: Não Informado pela instituição
Palavras-chave em Português:
Link de acesso: http://www.teses.usp.br/teses/disponiveis/59/59143/tde-27022019-092802/
Resumo: Most real systems can be represented as a graph of multi-typed components with a large number of interactions. Heterogeneous Information Networks (HIN) are interconnected structures with data of multiple types which support the rich semantic meaning of structural types of nodes and edges. In HIN, different information can be presented using different types and forms of data, but may have the same or complementary information. So there is knowledge to be discovered. Terminology Knowledge Structures (TKS) como terminology products can be sources of linguistic representations and knowledge to be used for enrich the HIN and create a measure of similarity to extract the documents similar to each other, even if these documents are of different types (for example, finding medical articles that are in some way related to medical records). In this sense, this work presents the creation of a Heterogeneous Information Network using classical similarity measures, terminology products and the attributes of documents by an algorithm called NetworkCreator. As a contribution, an algorithm called NetworkCreator was created that from medical records and scientific articles builds an HIN with related documents, was also created. The algorithm HeteSimTKSQuery to calculate similarity measures between documents of different types which are in HIN. Terminology products with meta-paths were also explored. The results were efficient, reaching on average 89\\% accuracy in some cases. However, it is important to note that all HIN presented in the researched literature were constructed only by one type of data coming from a single source. The results show that the algorithms are feasible to solve the problems of HIN construction and search for similarity. But it still needs improvement. In the future one can work on detection in the detection of node granularity of these networks and try to reduce the network construction runtime
id USP_6e63496ac2d28b97614804171c35cf25
oai_identifier_str oai:teses.usp.br:tde-27022019-092802
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
spelling Similarity algorithms for Heterogeneous Information NetworksAlgoritmos de similaridade para Redes de Informações HeterogêneasHeterogeneous Information NetworkMedidas de SimilaridadeMeta-caminhoMeta-pathProdutos terminológicosRedes de Informação HeterogêneasSimilarity measuresTerminology productsMost real systems can be represented as a graph of multi-typed components with a large number of interactions. Heterogeneous Information Networks (HIN) are interconnected structures with data of multiple types which support the rich semantic meaning of structural types of nodes and edges. In HIN, different information can be presented using different types and forms of data, but may have the same or complementary information. So there is knowledge to be discovered. Terminology Knowledge Structures (TKS) como terminology products can be sources of linguistic representations and knowledge to be used for enrich the HIN and create a measure of similarity to extract the documents similar to each other, even if these documents are of different types (for example, finding medical articles that are in some way related to medical records). In this sense, this work presents the creation of a Heterogeneous Information Network using classical similarity measures, terminology products and the attributes of documents by an algorithm called NetworkCreator. As a contribution, an algorithm called NetworkCreator was created that from medical records and scientific articles builds an HIN with related documents, was also created. The algorithm HeteSimTKSQuery to calculate similarity measures between documents of different types which are in HIN. Terminology products with meta-paths were also explored. The results were efficient, reaching on average 89\\% accuracy in some cases. However, it is important to note that all HIN presented in the researched literature were constructed only by one type of data coming from a single source. The results show that the algorithms are feasible to solve the problems of HIN construction and search for similarity. But it still needs improvement. In the future one can work on detection in the detection of node granularity of these networks and try to reduce the network construction runtimeA maioria dos sistemas reais pode ser representada como um grafo de componentes multi-tipados com um grande número de interações. Redes de Informação Heterogênea (HIN) são estruturas interconectadas com dados de múltiplos tipos que suportam o rico significado semântico de tipos estruturais de nós e arestas. Nas HIN, diferentes informações podem ser apresentadas usando diferentes tipos e formas de dados, mas podem ter informações iguais ou complementares. Então, há conhecimento a ser descoberto. Estruturas de Conhecimento Terminológicos (TKS) como produtos terminológicos podem ser fontes de representações linguísticas e de conhecimento a ser usado para enriquecer a HIN e criar uma medida de similaridade para extrair os documentos similares entre si, mesmo que esses documentos sejam de tipos diferentes (por exemplo, encontrar os artigos médicos que de alguma forma estão relacionados com registros médicos). Nesse sentido, este trabalho apresenta o algoritmo NetworkCreator que cria uma Rede de Informações Heterogêneas utilizando medidas de similaridade clássicas, produtos de terminológicos e os atributos dos documentos. Nos experimentos, foram utilizados prontuários médicos e artigos científicos para construir a HIN e relacionar seus conteúdos. O algoritmo HeteSimTKSQuery também foi criado para calcular medidas de similaridade entre os documentos de diferentes tipos que se encontram na HIN. Produtos terminológicos com meta-caminhos também foram explorados. Os resultados se mostraram eficientes, alcançando em média 89\\% de acurácia, em alguns casos. No entanto, é importante notar que todas as HIN apresentadas na literatura pesquisada foram construídas apenas por um tipo de dados proveniente de uma única fonte. Os resultados mostram que os algoritmos são viáveis para resolver os problemas de construção de HIN e busca de similaridade. Porém, eles ainda precisam de aperfeiçoamentos. Futuramente, pode-se trabalhar na detecção da granularidade dos nós destas redes e tentar reduzir o tempo de construção da redeBiblioteca Digitais de Teses e Dissertações da USPMacedo, Alessandra AlanizRibeiro, Angélica Abadia Paulista2019-01-28info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisapplication/pdfhttp://www.teses.usp.br/teses/disponiveis/59/59143/tde-27022019-092802/reponame:Biblioteca Digital de Teses e Dissertações da USPinstname:Universidade de São Paulo (USP)instacron:USPLiberar o conteúdo para acesso público.info:eu-repo/semantics/openAccesseng2019-06-07T17:53:58Zoai:teses.usp.br:tde-27022019-092802Biblioteca Digital de Teses e Dissertaçõeshttp://www.teses.usp.br/PUBhttp://www.teses.usp.br/cgi-bin/mtd2br.plvirginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.bropendoar:27212019-06-07T17:53:58Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)false
dc.title.none.fl_str_mv Similarity algorithms for Heterogeneous Information Networks
Algoritmos de similaridade para Redes de Informações Heterogêneas
title Similarity algorithms for Heterogeneous Information Networks
spellingShingle Similarity algorithms for Heterogeneous Information Networks
Ribeiro, Angélica Abadia Paulista
Heterogeneous Information Network
Medidas de Similaridade
Meta-caminho
Meta-path
Produtos terminológicos
Redes de Informação Heterogêneas
Similarity measures
Terminology products
title_short Similarity algorithms for Heterogeneous Information Networks
title_full Similarity algorithms for Heterogeneous Information Networks
title_fullStr Similarity algorithms for Heterogeneous Information Networks
title_full_unstemmed Similarity algorithms for Heterogeneous Information Networks
title_sort Similarity algorithms for Heterogeneous Information Networks
author Ribeiro, Angélica Abadia Paulista
author_facet Ribeiro, Angélica Abadia Paulista
author_role author
dc.contributor.none.fl_str_mv Macedo, Alessandra Alaniz
dc.contributor.author.fl_str_mv Ribeiro, Angélica Abadia Paulista
dc.subject.por.fl_str_mv Heterogeneous Information Network
Medidas de Similaridade
Meta-caminho
Meta-path
Produtos terminológicos
Redes de Informação Heterogêneas
Similarity measures
Terminology products
topic Heterogeneous Information Network
Medidas de Similaridade
Meta-caminho
Meta-path
Produtos terminológicos
Redes de Informação Heterogêneas
Similarity measures
Terminology products
description Most real systems can be represented as a graph of multi-typed components with a large number of interactions. Heterogeneous Information Networks (HIN) are interconnected structures with data of multiple types which support the rich semantic meaning of structural types of nodes and edges. In HIN, different information can be presented using different types and forms of data, but may have the same or complementary information. So there is knowledge to be discovered. Terminology Knowledge Structures (TKS) como terminology products can be sources of linguistic representations and knowledge to be used for enrich the HIN and create a measure of similarity to extract the documents similar to each other, even if these documents are of different types (for example, finding medical articles that are in some way related to medical records). In this sense, this work presents the creation of a Heterogeneous Information Network using classical similarity measures, terminology products and the attributes of documents by an algorithm called NetworkCreator. As a contribution, an algorithm called NetworkCreator was created that from medical records and scientific articles builds an HIN with related documents, was also created. The algorithm HeteSimTKSQuery to calculate similarity measures between documents of different types which are in HIN. Terminology products with meta-paths were also explored. The results were efficient, reaching on average 89\\% accuracy in some cases. However, it is important to note that all HIN presented in the researched literature were constructed only by one type of data coming from a single source. The results show that the algorithms are feasible to solve the problems of HIN construction and search for similarity. But it still needs improvement. In the future one can work on detection in the detection of node granularity of these networks and try to reduce the network construction runtime
publishDate 2019
dc.date.none.fl_str_mv 2019-01-28
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://www.teses.usp.br/teses/disponiveis/59/59143/tde-27022019-092802/
url http://www.teses.usp.br/teses/disponiveis/59/59143/tde-27022019-092802/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Liberar o conteúdo para acesso público.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Liberar o conteúdo para acesso público.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digitais de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digitais de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1865491413389541376