Similaridade de grafos via hashing

Carlos Henrique de Carvalho Teixeira

Bibliographic details
Year of defense:	2011
Main author:	Carlos Henrique de Carvalho Teixeira
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Master's thesis
Access type:	Open access
Language:	por
Institution of defense:	Universidade Federal de Minas Gerais
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords in Portuguese:	Teoria dos grafos; Computação; Mineração de dados (Computação)
Access link:	https://hdl.handle.net/1843/SLSS-8HTML5
Abstract:	A graph is a universal data structure, useful for representing a wide range of objects and concepts. In recent decades, interest in graphs has been driven by the large amount of data available. Examples include XML repositories, social networks, biological networks, and chemical graphs. Therefore, it is necessary to manage, query, and analyze such large graph data efficiently. The central problem of this thesis is computing the similarity between graphs efficiently and effectively. The proposed approach may be divided into two parts: (1) a transformation function, and (2) a signature function. The transformation function decomposes the input graph into approximate paths, which are substructures introduced by this work. Approximate paths differ from simple paths by allowing gaps between nodes. Such flexible substructures are able to describe both direct and indirect relationships in graphs. The similarity between two graphs is computed through a kernel function based on the number of substructures they share. Since the number of substructures that represent a graph may be large, the signature function applies a hashing technique to provide a short descriptor for a set of substructures. The signatures are short enough to fit into main memory and can estimate the similarity between the sets efficiently, with theoretically guaranteed effectiveness. We evaluated the proposed method using several real and synthetic datasets from different application scenarios, such as information retrieval and classification. The results show that approximate paths can be used efficiently and achieve gains with respect to techniques from the literature.
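
The two-part pipeline described in the abstract (decompose each graph into substructures, then compare short hashed signatures of the substructure sets) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the record does not define approximate paths formally, so `approximate_paths` simply emits label pairs for nodes connected by a path with at most `max_gap` intermediate nodes, and the signature uses min-wise hashing (MinHash) to estimate Jaccard similarity, since the record says only that "a hashing technique" is applied. All function and parameter names are hypothetical.

```python
import hashlib
from collections import deque


def approximate_paths(adj, labels, max_gap=1):
    """Illustrative stand-in for the thesis's approximate paths (assumption).

    Emits (label_u, label_v) for every ordered pair of distinct nodes joined
    by a shortest path with at most `max_gap` intermediate nodes.
    """
    feats = set()
    for start in adj:
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, dist = queue.popleft()
            if dist > 0:
                feats.add((labels[start], labels[node]))
            # Nodes at distance <= max_gap may still be extended by one hop.
            if dist <= max_gap:
                for nxt in adj[node]:
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, dist + 1))
    return feats


def _hash(item, seed):
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item!r}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=64):
    """MinHash signature (assumed technique): one minimum per seeded hash."""
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(item, seed) for item in items) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching positions; estimates the Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


# Two small labeled path graphs: A-B-C and A-B-D (hypothetical data).
adj = {0: [1], 1: [0, 2], 2: [1]}
paths_1 = approximate_paths(adj, {0: "A", 1: "B", 2: "C"})
paths_2 = approximate_paths(adj, {0: "A", 1: "B", 2: "D"})

print("exact:", jaccard(paths_1, paths_2))
print("estimated:", estimate_similarity(minhash_signature(paths_1),
                                        minhash_signature(paths_2)))
```

In this sketch the signature length `num_hashes` trades memory for accuracy: each signature has a fixed size regardless of how many substructures the graph produces, which is the property the abstract attributes to the signature function.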

Item metadata

id	UFMG_f1d251a0c4308c5111763377d2e1473d
oai_identifier_str	oai:repositorio.ufmg.br:1843/SLSS-8HTML5
network_acronym_str	UFMG
network_name_str	Repositório Institucional da UFMG
repository_id_str
spelling	2019-08-14T08:56:19Z; 2025-09-08T23:51:30Z; 2019-08-14T08:56:19Z; 2011-05-13; https://hdl.handle.net/1843/SLSS-8HTML5; Universidade Federal de Minas Gerais; Ciência da Computação; Teoria dos grafos; Computação; Mineração de dados (Computação); Similaridade de grafos via hashing; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; Carlos Henrique de Carvalho Teixeira; info:eu-repo/semantics/openAccess; por; reponame:Repositório Institucional da UFMG; instname:Universidade Federal de Minas Gerais (UFMG); instacron:UFMG; Wagner Meira Junior; Adriano Alonso Veloso; Sebastián Alberto Urrutia; Alexandre Plastino de Carvalho; Graphs are universal data structures capable of representing objects and concepts. In recent decades, interest in this structure has been driven by the large amount of data naturally modeled as graphs. The goal of this work is to compare any two graphs efficiently and effectively, facilitating the analysis of large databases. First, the graphs are decomposed into substructures called approximate paths. The similarity between two graphs is then computed as a function of the number of substructures they share. Since the set of substructures generated to represent a graph may be large, we use hashing techniques to reduce it to a small, fixed-size amount of information. Besides making analysis in main memory possible, the signatures estimate the similarity between the sets efficiently, with assured quality. Experiments conducted in real scenarios show the effectiveness of the proposed method.; UFMG; ORIGINAL; carloshenriquedecarvalhoteixeira.pdf; application/pdf; 4813412; https://repositorio.ufmg.br//bitstreams/2d65e827-feae-44ea-91d4-e9b2859087ec/download; 115b3cff8c6661f40816c5c7c3d1a9ab; MD5; 1; true; Anonymous; READ; TEXT; carloshenriquedecarvalhoteixeira.pdf.txt; text/plain; 181494; https://repositorio.ufmg.br//bitstreams/49c29525-39ba-488f-a87d-f162db9cdc80/download; 1a8483e0cf887d60cf6ec7cec3a6e66e; MD5; 2; false; Anonymous; READ; 1843/SLSS-8HTML5; 2025-09-08 20:51:30.812; open.access; oai:repositorio.ufmg.br:1843/SLSS-8HTML5; https://repositorio.ufmg.br/; Repositório Institucional; PUB; https://repositorio.ufmg.br/oai; repositorio@ufmg.br; opendoar:2025-09-08T23:51:30; Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG); false
dc.title.none.fl_str_mv	Similaridade de grafos via hashing
title	Similaridade de grafos via hashing
spellingShingle	Similaridade de grafos via hashing Carlos Henrique de Carvalho Teixeira Teoria dos grafos Computação Mineração de dados (Computação) Ciência da Computação
title_short	Similaridade de grafos via hashing
title_full	Similaridade de grafos via hashing
title_fullStr	Similaridade de grafos via hashing
title_full_unstemmed	Similaridade de grafos via hashing
title_sort	Similaridade de grafos via hashing
author	Carlos Henrique de Carvalho Teixeira
author_facet	Carlos Henrique de Carvalho Teixeira
author_role	author
dc.contributor.author.fl_str_mv	Carlos Henrique de Carvalho Teixeira
dc.subject.por.fl_str_mv	Teoria dos grafos Computação Mineração de dados (Computação)
topic	Teoria dos grafos Computação Mineração de dados (Computação) Ciência da Computação
dc.subject.other.none.fl_str_mv	Ciência da Computação
description	A graph is a universal data structure, useful for representing a wide range of objects and concepts. In recent decades, interest in graphs has been driven by the large amount of data available. Examples include XML repositories, social networks, biological networks, and chemical graphs. Therefore, it is necessary to manage, query, and analyze such large graph data efficiently. The central problem of this thesis is computing the similarity between graphs efficiently and effectively. The proposed approach may be divided into two parts: (1) a transformation function, and (2) a signature function. The transformation function decomposes the input graph into approximate paths, which are substructures introduced by this work. Approximate paths differ from simple paths by allowing gaps between nodes. Such flexible substructures are able to describe both direct and indirect relationships in graphs. The similarity between two graphs is computed through a kernel function based on the number of substructures they share. Since the number of substructures that represent a graph may be large, the signature function applies a hashing technique to provide a short descriptor for a set of substructures. The signatures are short enough to fit into main memory and can estimate the similarity between the sets efficiently, with theoretically guaranteed effectiveness. We evaluated the proposed method using several real and synthetic datasets from different application scenarios, such as information retrieval and classification. The results show that approximate paths can be used efficiently and achieve gains with respect to techniques from the literature.
publishDate	2011
dc.date.issued.fl_str_mv	2011-05-13
dc.date.accessioned.fl_str_mv	2019-08-14T08:56:19Z 2025-09-08T23:51:30Z
dc.date.available.fl_str_mv	2019-08-14T08:56:19Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/1843/SLSS-8HTML5
url	https://hdl.handle.net/1843/SLSS-8HTML5
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal de Minas Gerais
publisher.none.fl_str_mv	Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG
instname_str	Universidade Federal de Minas Gerais (UFMG)
instacron_str	UFMG
institution	UFMG
reponame_str	Repositório Institucional da UFMG
collection	Repositório Institucional da UFMG
bitstream.url.fl_str_mv	https://repositorio.ufmg.br//bitstreams/2d65e827-feae-44ea-91d4-e9b2859087ec/download https://repositorio.ufmg.br//bitstreams/49c29525-39ba-488f-a87d-f162db9cdc80/download
bitstream.checksum.fl_str_mv	115b3cff8c6661f40816c5c7c3d1a9ab 1a8483e0cf887d60cf6ec7cec3a6e66e
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv	repositorio@ufmg.br
_version_	1862106067237863424
