Escalonador adaptativo para laços paralelos em processadores multinúcleo assimétricos

Trindade, Rafael Gauna

Bibliographic details
Year of defense:	2020
Main author:	Trindade, Rafael Gauna
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Dissertation
Access type:	Open access
dARK ID:	ark:/26339/001300000x3bz
Language:	por (Portuguese)
Defending institution:	Universidade Federal de Santa Maria; Brasil; Ciência da Computação; UFSM; Programa de Pós-Graduação em Ciência da Computação; Centro de Tecnologia
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords (Portuguese and English):	Algoritmos adaptativos; Escalonadores; Laços paralelos; Processadores multinúcleo assimétricos; Computação paralela; Computação heterogênea; Computação de alto desempenho; Roubo de trabalho; Intel TBB; OpenMP; Adaptive algorithms; Schedulers; Parallel loops; Asymmetric multicore processors; Parallel computing; Heterogeneous computing; High performance computing; Work stealing; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Access link:	http://repositorio.ufsm.br/handle/1/22265
Abstract:	The growing demand for computing power and energy efficiency in mobile computing has driven the development of heterogeneous processors with cores specialized for different kinds of computational tasks, such as ARM big.LITTLE processors, which pair distinct core types to combine performance with low energy consumption. This difference in core composition induces an asymmetry in the computational performance of these systems, making it difficult to predict how parallel applications will perform when they use all available cores. The asymmetry is visible in applications that use parallel loops, a parallel programming feature that divides the workload of an iterative routine among the cores of a processor. Parallel loop schedulers that are not designed to prevent performance loss on asymmetric multicore processors (AMPs) can compromise software solutions built for this type of architecture. This dissertation proposes and implements a scheduler for parallel loops that uses an adaptive algorithm to distribute the workload among threads, aiming to extract performance more efficiently on AMPs. To compete with existing solutions, the scheduler combines parallel work stealing with sequential work extraction that is as lock-free as possible. It was implemented in C++ and could be ported to C. To evaluate its performance, the scheduler was compared against two existing solutions (OpenMP and Intel TBB) on the NAS benchmark suite and on four distinct scientific applications that are well established in the related literature, running on two real asymmetric embedded platforms. The analysis shows that the scheduler extracts more performance in certain cases and comes very close to the best solution in most of the remaining cases, with theoretically greater scalability potential in cases where scheduling overhead becomes an obstacle for the other solutions.

Item metadata

id	UFSM_03ea07707ec9ecf9f20f83ae9013a129
oai_identifier_str	oai:repositorio.ufsm.br:1/22265
network_acronym_str	UFSM
network_name_str	Manancial - Repositório Digital da UFSM
repository_id_str
dc.title.none.fl_str_mv	Escalonador adaptativo para laços paralelos em processadores multinúcleo assimétricos; An adaptive scheduler for parallel loops on asymmetric multicore processors
author_role	author
dc.contributor.none.fl_str_mv	Lima, João Vicente Ferreira; http://lattes.cnpq.br/6266546896929217; Charão, Andrea Schwertner; Queiroz, Leonardo Fialho de
dc.contributor.author.fl_str_mv	Trindade, Rafael Gauna
publishDate	2020
dc.date.none.fl_str_mv	2020-03-30 2021-09-22T19:36:12Z 2021-09-22T19:36:12Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://repositorio.ufsm.br/handle/1/22265
dc.identifier.dark.fl_str_mv	ark:/26339/001300000x3bz
url	http://repositorio.ufsm.br/handle/1/22265
identifier_str_mv	ark:/26339/001300000x3bz
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	Attribution-NonCommercial-NoDerivatives 4.0 International info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Attribution-NonCommercial-NoDerivatives 4.0 International
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Santa Maria Brasil Ciência da Computação UFSM Programa de Pós-Graduação em Ciência da Computação Centro de Tecnologia
dc.source.none.fl_str_mv	reponame:Manancial - Repositório Digital da UFSM instname:Universidade Federal de Santa Maria (UFSM) instacron:UFSM
instname_str	Universidade Federal de Santa Maria (UFSM)
instacron_str	UFSM
institution	UFSM
reponame_str	Manancial - Repositório Digital da UFSM
collection	Manancial - Repositório Digital da UFSM
repository.name.fl_str_mv	Manancial - Repositório Digital da UFSM - Universidade Federal de Santa Maria (UFSM)
repository.mail.fl_str_mv	atendimento.sib@ufsm.br; tedebc@gmail.com; manancial@ufsm.br
_version_	1847153459701219328
