Colocação Automática de Computação em Hardware Heterogêneo (Automatic Placement of Computation on Heterogeneous Hardware)

Bibliographic details
Year of defense: 2016 (defended June 24, 2016)
Main author: Kezia Correa Andrade
Advisor: Not informed by the institution
Defense committee: Not informed by the institution
Document type: Master's thesis (Dissertação)
Access type: Open access
Language: Portuguese (por)
Defense institution: Universidade Federal de Minas Gerais
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords (in Portuguese): Linguagem de programação (Computação); Compiladores (Computadores); Computação; Análise estática; Linguagens de Programação; Paralelismo; GPGPU
Access link: https://hdl.handle.net/1843/ESBF-AE2GAF
Abstract: Graphics Processing Units (GPUs) have revolutionized high-performance programming. They have reduced the cost of parallel hardware; however, programming these devices is still a challenge: programmers are not (yet) able to write code that coordinates the simultaneous execution of thousands of threads. To deal with this problem, industry and academia have introduced annotation systems. Examples of such systems are OpenMP 4.0, OpenSS, and OpenACC, which let developers indicate which parts of a C or Fortran program should run on the GPU or the CPU. This approach has two advantages. First, it lets programmers obtain the benefits of parallel hardware while coding in their preferred programming languages. Second, the annotations shield programmers from the details of the parallel hardware, since they delegate the task of parallelizing programs to the code generator. The inclusion of pragmas in the code to hide hardware details does not solve every problem that developers face when programming GPUs: they still need to identify when it is advantageous to run a given piece of code on the GPU. In this context, the objective of this dissertation is to present a solution to that problem. We designed, implemented, and tested techniques to automatically identify which portions of the code should run on the GPU. To this end, we used dependence information, memory layout, and control flow. We created a set of static analyses that performs three tasks: (i) identify which loops are parallelizable; (ii) insert annotations to copy data between the CPU and the GPU; and (iii) estimate which loops, once tagged as parallel, are most likely to lead to performance gains. These tasks are fully automatic: they are carried out without any user intervention. The platform presented here has been implemented on two compilers: the analyses were built on top of the infrastructure available in LLVM, and the parallel code generation, from annotated programs, is performed by PGCC.
The approach that we developed is completely static: we decide where each function must run during the compilation of the program. This decision does not rely on any runtime system, such as middleware, or on special computer-architecture hooks. Another benefit of this framework is that it is completely automatic, i.e., it does not require any intervention from the programmer. As a result, the programs that we produce automatically can be up to 121x faster than their original versions.