Data analysis over large-scale graphs using vertex-centric asynchronous parallel processing
| Year of defense: | 2020 |
|---|---|
| Lead author: | Gimenes, Gabriel Perri |
| Advisor: | |
| Examination committee: | |
| Document type: | Thesis |
| Access type: | Open access |
| Language: | eng |
| Defending institution: | Biblioteca Digital de Teses e Dissertações da USP |
| Graduate program: | Not provided by the institution |
| Department: | Not provided by the institution |
| Country: | Not provided by the institution |
| Keywords in Portuguese: | |
| Access link: | https://www.teses.usp.br/teses/disponiveis/55/55134/tde-08062020-100828/ |
| Abstract: | Since the birth of Web 2.0, users no longer just consume content but actively create content that other users consume. This new dynamic took data generation to a whole new scale, called planetary-scale or web-scale. Often, this data represents relationships between its elements, as in social networks, recommendation systems, online boards, email networks, scientific citation networks, and others. Analyzing how information flows and how nodes influence each other in such networks is a widely studied problem. Belief Propagation, the fundamental algorithm for this type of inference, is widely used, but it historically lacked convergence guarantees for real-world networks. Recent alternative methods such as LinBP solve the convergence problems of the original algorithm, yet their scalability on large-scale problems remains a challenge. In addition, several of the works proposed to solve this issue rely on specific infrastructure such as supercomputers and computational clusters. Motivated by these challenges, we propose a new algorithm, called VCBP, that aims to provide a scalable framework for belief propagation on large-scale problems, such as when graphs do not fit in main memory. We do so by combining state-of-the-art asynchronous vertex-centric parallel processing with a state-of-the-art belief propagation algorithm. Our algorithm maintains the same accuracy rate while achieving performance orders of magnitude higher than the former LinBP implementation. Due to the asynchronous nature of our algorithm, VCBP demands fewer iterations before convergence than any previous algorithm. Additionally, we analyze our algorithm in the task of node classification, achieving significant results over real-world datasets. Our findings indicate that there is unexplored potential in today's widely available modern hardware, specifically concerning parallelism, sparking a shift towards a more cost-efficient and ubiquitous data mining scenario. |
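As a rough illustration of the abstract's point that asynchronous updates can converge in fewer iterations, the sketch below runs a simplified two-class linearized propagation on a four-node path graph. It is not VCBP and not the thesis's LinBP implementation: the scalar update `belief[v] = prior[v] + coupling * sum(neighbor beliefs)`, the function name `propagate`, the toy graph, and the `coupling` value are all assumptions made for this example. The "asynchronous" mode is only simulated in a single thread by letting each vertex read the freshest values of its neighbors.

```python
def propagate(neighbors, priors, coupling, asynchronous, tol=1e-9, max_sweeps=1000):
    """Iterate belief[v] = prior[v] + coupling * sum(beliefs of v's neighbors).

    Hypothetical, simplified stand-in for a linearized belief propagation
    update with two classes (the sign of a belief is the predicted class).
    Returns the converged beliefs and the number of sweeps that were needed.
    """
    beliefs = list(priors)
    for sweep in range(1, max_sweeps + 1):
        # Synchronous mode reads a frozen snapshot of the previous sweep;
        # asynchronous mode reads the live list, so values updated earlier
        # in the same sweep are visible immediately.
        source = beliefs if asynchronous else list(beliefs)
        largest_change = 0.0
        for v, nbrs in enumerate(neighbors):
            new_value = priors[v] + coupling * sum(source[u] for u in nbrs)
            largest_change = max(largest_change, abs(new_value - beliefs[v]))
            beliefs[v] = new_value
        if largest_change < tol:
            return beliefs, sweep
    raise RuntimeError("propagation did not converge")


# Toy path graph 0 - 1 - 2 - 3 given as adjacency lists (assumed data).
neighbors = [[1], [0, 2], [1, 3], [2]]
# Node 0 is labeled positive, node 3 negative, nodes 1 and 2 are unlabeled.
priors = [0.1, 0.0, 0.0, -0.1]
# A small coupling keeps the iteration contractive on this graph.
coupling = 0.3

sync_beliefs, sync_sweeps = propagate(neighbors, priors, coupling, asynchronous=False)
async_beliefs, async_sweeps = propagate(neighbors, priors, coupling, asynchronous=True)

print("synchronous sweeps: ", sync_sweeps)
print("asynchronous sweeps:", async_sweeps)
print("predicted classes:  ", ["+" if b > 0 else "-" for b in async_beliefs])
```

Both modes reach the same fixed point on this toy graph. The in-place variant reuses fresh neighbor values within a sweep, which is the intuition behind the lower iteration counts reported for asynchronous processing. The thesis's actual speedups come from parallel, out-of-memory processing that this sketch does not attempt to model.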
| id |
USP_a1f2c84b6223e013af644be5875106c8 |
|---|---|
| oai_identifier_str |
oai:teses.usp.br:tde-08062020-100828 |
| network_acronym_str |
USP |
| network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
| repository_id_str |
|
| dc.title.none.fl_str_mv |
Data analysis over large-scale graphs using vertex-centric asynchronous parallel processing Análise de dados sobre grafos em larga escala por meio de processamento paralelo assíncrono centrado em vértices |
| title |
Data analysis over large-scale graphs using vertex-centric asynchronous parallel processing |
| author |
Gimenes, Gabriel Perri |
| author_facet |
Gimenes, Gabriel Perri |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Rodrigues Junior, José Fernando |
| dc.contributor.author.fl_str_mv |
Gimenes, Gabriel Perri |
| dc.subject.por.fl_str_mv |
Belief propagation; Big data; Graph processing; Large scale; Parallel processing; Vertex-centric processing |
| publishDate |
2020 |
| dc.date.none.fl_str_mv |
2020-02-10 |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
| format |
doctoralThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
https://www.teses.usp.br/teses/disponiveis/55/55134/tde-08062020-100828/ |
| url |
https://www.teses.usp.br/teses/disponiveis/55/55134/tde-08062020-100828/ |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.relation.none.fl_str_mv |
|
| dc.rights.driver.fl_str_mv |
Release the content for public access. info:eu-repo/semantics/openAccess |
| rights_invalid_str_mv |
Release the content for public access. |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.coverage.none.fl_str_mv |
|
| dc.publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
| publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
| dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
| instname_str |
Universidade de São Paulo (USP) |
| instacron_str |
USP |
| institution |
USP |
| reponame_str |
Biblioteca Digital de Teses e Dissertações da USP |
| collection |
Biblioteca Digital de Teses e Dissertações da USP |
| repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
| repository.mail.fl_str_mv |
virginia@if.usp.br; atendimento@aguia.usp.br |
| _version_ |
1815257976882069504 |