High-level multi-GPU support for multi-core stream parallelism

Fim, Gabriel Rustick

Bibliographic details
Year of defense:	2025
Main author:	Fim, Gabriel Rustick
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Master's thesis
Access type:	Open access
Language:	eng
Defending institution:	Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Escola Politécnica, Programa de Pós-Graduação em Ciência da Computação, Brazil
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	Parallel Programming; Data Parallelism; Stream Processing; Structured Parallel Programming; GPU Programming; Multi-GPU Programming; Domain-Specific Language; Algorithmic Skeletons; High-Performance Computing; C; C++; Computer Science::Theory of Computation
Access link:	https://tede2.pucrs.br/tede2/handle/tede/11668
Abstract:	Computer architectures now often rely on graphics processing units (GPUs) to exploit massive parallelism at a lower cost. This parallelism can be particularly advantageous in stream processing, a domain of applications that continuously process a data flow of often unknown size. Nonetheless, the programmer must employ parallel programming to exploit the underlying GPU hardware efficiently. This can be challenging, since it involves refactoring algorithms, applying parallelism techniques, and knowing the environment's hardware, especially when writing portable code, because GPU vendors and generations offer different capabilities. The challenge becomes even more complex in multi-GPU environments: the programmer must choose how to partition data, how to schedule tasks onto the GPUs, how to handle communication between tasks, and how to perform asynchronous GPU operations. To address these challenges, researchers have investigated efficient programming techniques for GPUs and developed abstractions that simplify the programming process. One such abstraction is SPar, a domain-specific language (DSL) that enables the expression of stream parallelism without sacrificing performance. A recent extension to SPar allows parallel code generation for GPUs in streaming applications. To achieve this, SPar performs source-to-source code transformations and generates GPU code using an intermediate library named GSParLib. However, SPar supports code generation for single-GPU environments only. In this work, we investigate how to enable multi-GPU code generation for stream processing, along with state-of-the-art optimizations and techniques for multi-GPU programming targeting multi-core systems. Our contributions are a set of data stream scheduling algorithms for multi-GPU systems, which were integrated into SPar's code generation, transparently supporting multi-GPU usage in multi-core systems. The experimental results demonstrate that it is possible to simplify multi-GPU exploitation for stream applications without sacrificing performance by using multi-GPU scheduling policies through code annotations like those provided by SPar, achieving results similar to manual multi-GPU implementations with close to half the number of lines of code.

Item metadata

id	P_RS_33f5fc4eed0ff0c427258f1ef82190e2
oai_identifier_str	oai:tede2.pucrs.br:tede/11668
network_acronym_str	P_RS
network_name_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
repository_id_str
dc.title.none.fl_str_mv	High-level multi-GPU support for multi-core stream parallelism; Paralelismo de stream em multi-GPU para multi-cores
title	High-level multi-GPU support for multi-core stream parallelism
author	Fim, Gabriel Rustick
author_facet	Fim, Gabriel Rustick
author_role	author
dc.contributor.none.fl_str_mv	Griebler, Dalvan Jair
dc.contributor.author.fl_str_mv	Fim, Gabriel Rustick
dc.subject.por.fl_str_mv	Parallel Programming; Data Parallelism; Stream Processing; Structured Parallel Programming; GPU Programming; Multi-GPU Programming; Domain-Specific Language; Algorithmic Skeletons; High-Performance Computing; C; C++; Computer Science::Theory of Computation
publishDate	2025
dc.date.none.fl_str_mv	2025-06-04T22:10:07Z 2025-03-28
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://tede2.pucrs.br/tede2/handle/tede/11668
url	https://tede2.pucrs.br/tede2/handle/tede/11668
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Pontifícia Universidade Católica do Rio Grande do Sul Escola Politécnica Brasil PUCRS Programa de Pós-Graduação em Ciência da Computação
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da PUC_RS instname:Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS) instacron:PUC_RS
instname_str	Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
instacron_str	PUC_RS
institution	PUC_RS
reponame_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
collection	Biblioteca Digital de Teses e Dissertações da PUC_RS
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da PUC_RS - Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
repository.mail.fl_str_mv	biblioteca.central@pucrs.br
_version_	1850041319447068672
