Graph pattern mining: consolidating models, systems, and abstractions
| Year of defense: | 2023 |
|---|---|
| Main author: | Vinícius Vitor dos Santos Dias |
| Advisor: | |
| Examining committee: | |
| Document type: | Doctoral thesis |
| Access type: | Open access |
| Language: | eng |
| Degree-granting institution: | Universidade Federal de Minas Gerais |
| Graduate program: | Not provided by the institution |
| Department: | Not provided by the institution |
| Country: | Not provided by the institution |
| Keywords in Portuguese: | |
| Access link: | https://hdl.handle.net/1843/51806 |
| Abstract: | Graph Pattern Mining (GPM) refers to a class of problems that involve processing subgraphs extracted from larger graphs. Applications of GPM algorithms include querying subgraphs with given properties of interest, identifying motif structures in biological networks, and characterizing social media, among others. GPM algorithms are challenging to develop because they rely on inherently complex subroutines that involve non-trivial graph-theoretic concepts and methods, such as isomorphism checking. General-purpose GPM systems have emerged as a way to improve the user experience with such algorithms. However, these systems fail to provide a consistent model that is simple to understand and expressive enough to represent alternative algorithms for the same problem through different subgraph-enumeration paradigms, which limits their integration with modern data analytics pipelines. Furthermore, because GPM systems are so heterogeneous in terms of supported paradigms and computing architectures, existing experimental evaluations cannot distinguish whether performance differences are best explained by algorithmic strategies or by implementation details. In this work, we propose a primitive-based model for GPM, a proof-of-concept distributed implementation of that model, and an extensive experimental analysis of popular algorithmic paradigms used in GPM systems. We demonstrate empirically the effectiveness of our model by showing competitive performance against state-of-the-art systems without sacrificing the expressiveness of algorithms or the composability of operators. Our experimental results also show that no single paradigm is best for every application scenario, and we believe that our findings may guide practitioners toward more optimized GPM systems in the future. |
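
The abstract refers to expressing alternative algorithms for the same problem through different subgraph-enumeration paradigms. As an illustrative sketch only (it is not the primitive-based model proposed in the thesis, and every function name below is hypothetical), the Python code counts triangles in a small hypothetical graph in two ways. The first follows what is commonly called a pattern-oblivious strategy: it enumerates every connected 3-vertex subgraph by extending edges and then classifies each one by its edge count, which for three vertices amounts to a trivial isomorphism check. The second is pattern-aware: it matches the triangle directly and uses the vertex order v < u < w so that each triangle is counted exactly once.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def classify_3(adj, vertices):
    """Classify a connected 3-vertex induced subgraph by its edge count.
    For three vertices this is a trivial isomorphism check:
    3 edges -> triangle, 2 edges -> path (wedge)."""
    edges = sum(1 for a, b in combinations(vertices, 2) if b in adj[a])
    return "triangle" if edges == 3 else "path"


def census_pattern_oblivious(adj):
    """Hypothetical paradigm 1 (pattern-oblivious): grow every connected
    3-vertex set by extending edges, then label each set by pattern class."""
    subgraphs = set()
    for v in adj:
        for u in adj[v]:
            # Extend edge {v, u} with any neighbor of either endpoint;
            # the frozenset removes duplicates reached via different edges.
            for w in (adj[v] | adj[u]) - {v, u}:
                subgraphs.add(frozenset((v, u, w)))
    census = {"triangle": 0, "path": 0}
    for s in subgraphs:
        census[classify_3(adj, tuple(s))] += 1
    return census


def triangles_pattern_aware(adj):
    """Hypothetical paradigm 2 (pattern-aware): match the triangle pattern
    directly, enforcing v < u < w so each triangle is counted once."""
    count = 0
    for v in adj:
        for u in adj[v]:
            if u > v:
                count += sum(1 for w in adj[v] & adj[u] if w > u)
    return count


if __name__ == "__main__":
    # Hypothetical toy graph with two triangles: {0, 1, 2} and {1, 2, 3}.
    g = build_adjacency([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)])
    print(census_pattern_oblivious(g))  # {'triangle': 2, 'path': 4}
    print(triangles_pattern_aware(g))   # 2
```

Both functions agree on the triangle count. The oblivious variant also yields the full 3-vertex motif census as a by-product, at the cost of materializing every connected subgraph, while the pattern-aware variant only explores candidate triangles.
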

| id | UFMG_149cea87d3ef258ba5ffba7192a96eaf |
|---|---|
| oai_identifier_str | oai:repositorio.ufmg.br:1843/51806 |
| funding agencies | CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico (National Council for Scientific and Technological Development); CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Coordination for the Improvement of Higher Education Personnel) |
| dc.contributor.author.fl_str_mv | Vinícius Vitor dos Santos Dias |
| dc.subject.por.fl_str_mv | Computing – Theses; Graph pattern mining – Theses; Distributed systems – Theses; Graph pattern mining; Distributed systems |
| dc.date.none.fl_str_mv | 2023-04-11T17:20:09Z; 2023-04-11T17:20:09Z; 2023-03-24; 2025-09-08T22:53:55Z |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/doctoralThesis |
| dc.identifier.uri.fl_str_mv | https://hdl.handle.net/1843/51806 |
| dc.language.iso.fl_str_mv | eng |
| dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.publisher.none.fl_str_mv | Universidade Federal de Minas Gerais |
| dc.source.none.fl_str_mv | reponame:Repositório Institucional da UFMG; instname:Universidade Federal de Minas Gerais (UFMG); instacron:UFMG |
| repository.mail.fl_str_mv | repositorio@ufmg.br |