Similarity algorithms for Heterogeneous Information Networks

Angélica Abadia Paulista Ribeiro

Similarity algorithms for Heterogeneous Information Networks

Detalhes bibliográficos
Ano de defesa:	2019
Autor(a) principal:	Angélica Abadia Paulista Ribeiro
Orientador(a):	Alessandra Alaniz Macedo
Banca de defesa:	Renata Pontin de Mattos Fortes, Alexandre Souto Martinez, Marinalva Dias Soares
Tipo de documento:	Dissertação
Tipo de acesso:	Acesso aberto
Idioma:	eng
Instituição de defesa:	Universidade de São Paulo
Programa de Pós-Graduação:	Computação Aplicada
Departamento:	Não Informado pela instituição
País:	BR
Link de acesso:	https://doi.org/10.11606/D.59.2019.tde-27022019-092802
Resumo:	Most real systems can be represented as a graph of multi-typed components with a large number of interactions. Heterogeneous Information Networks (HIN) are interconnected structures with data of multiple types which support the rich semantic meaning of structural types of nodes and edges. In HIN, different information can be presented using different types and forms of data, but may have the same or complementary information. So there is knowledge to be discovered. Terminology Knowledge Structures (TKS) como terminology products can be sources of linguistic representations and knowledge to be used for enrich the HIN and create a measure of similarity to extract the documents similar to each other, even if these documents are of different types (for example, finding medical articles that are in some way related to medical records). In this sense, this work presents the creation of a Heterogeneous Information Network using classical similarity measures, terminology products and the attributes of documents by an algorithm called NetworkCreator. As a contribution, an algorithm called NetworkCreator was created that from medical records and scientific articles builds an HIN with related documents, was also created. The algorithm HeteSimTKSQuery to calculate similarity measures between documents of different types which are in HIN. Terminology products with meta-paths were also explored. The results were efficient, reaching on average 89\\% accuracy in some cases. However, it is important to note that all HIN presented in the researched literature were constructed only by one type of data coming from a single source. The results show that the algorithms are feasible to solve the problems of HIN construction and search for similarity. But it still needs improvement. In the future one can work on detection in the detection of node granularity of these networks and try to reduce the network construction runtime

Metadados do item

id	USP_6e63496ac2d28b97614804171c35cf25
oai_identifier_str	oai:teses.usp.br:tde-27022019-092802
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
spelling	info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesis Similarity algorithms for Heterogeneous Information Networks Algoritmos de similaridade para Redes de Informações Heterogêneas 2019-01-28Alessandra Alaniz MacedoRenata Pontin de Mattos FortesAlexandre Souto MartinezMarinalva Dias SoaresAngélica Abadia Paulista RibeiroUniversidade de São PauloComputação AplicadaUSPBR Heterogeneous Information Network Medidas de Similaridade Meta-caminho Meta-path Produtos terminológicos Redes de Informação Heterogêneas Similarity measures Terminology products Most real systems can be represented as a graph of multi-typed components with a large number of interactions. Heterogeneous Information Networks (HIN) are interconnected structures with data of multiple types which support the rich semantic meaning of structural types of nodes and edges. In HIN, different information can be presented using different types and forms of data, but may have the same or complementary information. So there is knowledge to be discovered. Terminology Knowledge Structures (TKS) como terminology products can be sources of linguistic representations and knowledge to be used for enrich the HIN and create a measure of similarity to extract the documents similar to each other, even if these documents are of different types (for example, finding medical articles that are in some way related to medical records). In this sense, this work presents the creation of a Heterogeneous Information Network using classical similarity measures, terminology products and the attributes of documents by an algorithm called NetworkCreator. As a contribution, an algorithm called NetworkCreator was created that from medical records and scientific articles builds an HIN with related documents, was also created. The algorithm HeteSimTKSQuery to calculate similarity measures between documents of different types which are in HIN. Terminology products with meta-paths were also explored. The results were efficient, reaching on average 89\\% accuracy in some cases. However, it is important to note that all HIN presented in the researched literature were constructed only by one type of data coming from a single source. The results show that the algorithms are feasible to solve the problems of HIN construction and search for similarity. But it still needs improvement. In the future one can work on detection in the detection of node granularity of these networks and try to reduce the network construction runtime A maioria dos sistemas reais pode ser representada como um grafo de componentes multi-tipados com um grande número de interações. Redes de Informação Heterogênea (HIN) são estruturas interconectadas com dados de múltiplos tipos que suportam o rico significado semântico de tipos estruturais de nós e arestas. Nas HIN, diferentes informações podem ser apresentadas usando diferentes tipos e formas de dados, mas podem ter informações iguais ou complementares. Então, há conhecimento a ser descoberto. Estruturas de Conhecimento Terminológicos (TKS) como produtos terminológicos podem ser fontes de representações linguísticas e de conhecimento a ser usado para enriquecer a HIN e criar uma medida de similaridade para extrair os documentos similares entre si, mesmo que esses documentos sejam de tipos diferentes (por exemplo, encontrar os artigos médicos que de alguma forma estão relacionados com registros médicos). Nesse sentido, este trabalho apresenta o algoritmo NetworkCreator que cria uma Rede de Informações Heterogêneas utilizando medidas de similaridade clássicas, produtos de terminológicos e os atributos dos documentos. Nos experimentos, foram utilizados prontuários médicos e artigos científicos para construir a HIN e relacionar seus conteúdos. O algoritmo HeteSimTKSQuery também foi criado para calcular medidas de similaridade entre os documentos de diferentes tipos que se encontram na HIN. Produtos terminológicos com meta-caminhos também foram explorados. Os resultados se mostraram eficientes, alcançando em média 89\\% de acurácia, em alguns casos. No entanto, é importante notar que todas as HIN apresentadas na literatura pesquisada foram construídas apenas por um tipo de dados proveniente de uma única fonte. Os resultados mostram que os algoritmos são viáveis para resolver os problemas de construção de HIN e busca de similaridade. Porém, eles ainda precisam de aperfeiçoamentos. Futuramente, pode-se trabalhar na detecção da granularidade dos nós destas redes e tentar reduzir o tempo de construção da rede https://doi.org/10.11606/D.59.2019.tde-27022019-092802info:eu-repo/semantics/openAccessengreponame:Biblioteca Digital de Teses e Dissertações da USPinstname:Universidade de São Paulo (USP)instacron:USP2023-12-21T20:15:47Zoai:teses.usp.br:tde-27022019-092802Biblioteca Digital de Teses e Dissertaçõeshttp://www.teses.usp.br/PUBhttp://www.teses.usp.br/cgi-bin/mtd2br.plvirginia@if.usp.br\|\| atendimento@aguia.usp.br\|\|virginia@if.usp.bropendoar:27212019-06-07T17:53:58Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)false
dc.title.en.fl_str_mv	Similarity algorithms for Heterogeneous Information Networks
dc.title.alternative.pt.fl_str_mv	Algoritmos de similaridade para Redes de Informações Heterogêneas
title	Similarity algorithms for Heterogeneous Information Networks
spellingShingle	Similarity algorithms for Heterogeneous Information Networks Angélica Abadia Paulista Ribeiro
title_short	Similarity algorithms for Heterogeneous Information Networks
title_full	Similarity algorithms for Heterogeneous Information Networks
title_fullStr	Similarity algorithms for Heterogeneous Information Networks
title_full_unstemmed	Similarity algorithms for Heterogeneous Information Networks
title_sort	Similarity algorithms for Heterogeneous Information Networks
author	Angélica Abadia Paulista Ribeiro
author_facet	Angélica Abadia Paulista Ribeiro
author_role	author
dc.contributor.advisor1.fl_str_mv	Alessandra Alaniz Macedo
dc.contributor.referee1.fl_str_mv	Renata Pontin de Mattos Fortes
dc.contributor.referee2.fl_str_mv	Alexandre Souto Martinez
dc.contributor.referee3.fl_str_mv	Marinalva Dias Soares
dc.contributor.author.fl_str_mv	Angélica Abadia Paulista Ribeiro
contributor_str_mv	Alessandra Alaniz Macedo Renata Pontin de Mattos Fortes Alexandre Souto Martinez Marinalva Dias Soares
description	Most real systems can be represented as a graph of multi-typed components with a large number of interactions. Heterogeneous Information Networks (HIN) are interconnected structures with data of multiple types which support the rich semantic meaning of structural types of nodes and edges. In HIN, different information can be presented using different types and forms of data, but may have the same or complementary information. So there is knowledge to be discovered. Terminology Knowledge Structures (TKS) como terminology products can be sources of linguistic representations and knowledge to be used for enrich the HIN and create a measure of similarity to extract the documents similar to each other, even if these documents are of different types (for example, finding medical articles that are in some way related to medical records). In this sense, this work presents the creation of a Heterogeneous Information Network using classical similarity measures, terminology products and the attributes of documents by an algorithm called NetworkCreator. As a contribution, an algorithm called NetworkCreator was created that from medical records and scientific articles builds an HIN with related documents, was also created. The algorithm HeteSimTKSQuery to calculate similarity measures between documents of different types which are in HIN. Terminology products with meta-paths were also explored. The results were efficient, reaching on average 89\\% accuracy in some cases. However, it is important to note that all HIN presented in the researched literature were constructed only by one type of data coming from a single source. The results show that the algorithms are feasible to solve the problems of HIN construction and search for similarity. But it still needs improvement. In the future one can work on detection in the detection of node granularity of these networks and try to reduce the network construction runtime
publishDate	2019
dc.date.issued.fl_str_mv	2019-01-28
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://doi.org/10.11606/D.59.2019.tde-27022019-092802
url	https://doi.org/10.11606/D.59.2019.tde-27022019-092802
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade de São Paulo
dc.publisher.program.fl_str_mv	Computação Aplicada
dc.publisher.initials.fl_str_mv	USP
dc.publisher.country.fl_str_mv	BR
publisher.none.fl_str_mv	Universidade de São Paulo
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br\|\| atendimento@aguia.usp.br\|\|virginia@if.usp.br
_version_	1786377176033001472

Similarity algorithms for Heterogeneous Information Networks

Registros relacionados