Granularity load scheduling auto-tuning for multi-core processors applied to reverse-time migration
| Year of defense: | 2020 |
|---|---|
| Lead author: | Fernandes, João Batista |
| Advisor: | |
| Examination committee: | |
| Document type: | Master's dissertation |
| Access type: | Open access |
| Language: | por |
| Defending institution: | Universidade Federal do Rio Grande do Norte; Brasil; UFRN; PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA ELÉTRICA E DE COMPUTAÇÃO |
| Graduate program: | Not provided by the institution |
| Department: | Not provided by the institution |
| Country: | Not provided by the institution |
| Keywords in Portuguese: | Auto ajuste; RTM; CSA; HPC; Chunk Size; Memória compartilhada |
| Access link: | https://repositorio.ufrn.br/handle/123456789/30251 |
Abstract: | Reverse-time migration (RTM) is an algorithm widely used in the oil and gas industry to process seismic data. It is a computationally intensive task that can be designed to run on parallel computers. Because the workload is massive and regular, it is often distributed equally and statically among the available parallel processors. However, this strategy is often not optimal. When the processors are heterogeneous, and even when most have similar processing power, many of them may still sit idle waiting for the slower ones. In this work, we show that even among homogeneous cores there might be load imbalance that can considerably affect the overall performance of a 3D RTM application. We show that dynamic load distribution has a significant advantage over the conventional static distribution and over other default OpenMP schedules, such as auto and guided. However, the granularity of the dynamically distributed chunks of work plays a key role in realizing this advantage. To find the optimal granularity, we propose an auto-tuning strategy based on coupled simulated annealing (CSA) that adjusts, during the initialization of a 3D RTM application, the chunk size that OpenMP parallel loops assign dynamically to worker threads. Experiments on computational systems with different processor and memory specifications, and with different input sizes, show that the proposed method is consistently faster than the default OpenMP loop schedulers. |
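
The trade-off described in the abstract can be illustrated with a small simulation. The sketch below is not the dissertation's implementation: it does not run OpenMP, it replaces coupled simulated annealing with a plain sweep over candidate chunk sizes, and the per-iteration costs and per-chunk overhead are invented numbers. All function names (`dynamic_makespan`, `static_makespan`, `tune_chunk`) are hypothetical. It only models the behavior of a `schedule(dynamic, chunk)` loop, in which an idle thread grabs the next `chunk` iterations, to show why very small chunks pay too much scheduling overhead and very large chunks leave threads idle.

```python
import heapq


def dynamic_makespan(costs, n_threads, chunk, overhead=0.0):
    """Simulated finish time when each idle thread grabs the next `chunk` iterations.

    `overhead` is an assumed fixed scheduling cost paid once per chunk.
    """
    # Min-heap holding the time at which each thread becomes free.
    free_at = [0.0] * n_threads
    heapq.heapify(free_at)
    for start in range(0, len(costs), chunk):
        t = heapq.heappop(free_at)  # the earliest idle thread takes this chunk
        work = sum(costs[start:start + chunk])
        heapq.heappush(free_at, t + overhead + work)
    return max(free_at)


def static_makespan(costs, n_threads):
    """Simulated finish time with one contiguous block per thread (static-like split)."""
    block = -(-len(costs) // n_threads)  # ceiling division
    return max(sum(costs[i:i + block]) for i in range(0, len(costs), block))


def tune_chunk(costs, n_threads, overhead, candidates):
    """Return the candidate chunk size with the smallest simulated makespan.

    Exhaustive sweep used here as a stand-in for the CSA search; assumption only.
    """
    return min(candidates,
               key=lambda c: dynamic_makespan(costs, n_threads, c, overhead))


if __name__ == "__main__":
    # Toy workload (assumption): the last quarter of the iterations is 3x slower.
    costs = [1.0] * 48 + [3.0] * 16
    candidates = [1, 2, 4, 8, 16, 32]
    print("static :", static_makespan(costs, 4))
    for c in candidates:
        print("dynamic, chunk", c, ":", dynamic_makespan(costs, 4, c, 0.5))
    print("best chunk:", tune_chunk(costs, 4, 0.5, candidates))
```

In this toy model with four threads and an overhead of 0.5 per chunk, the static split finishes at 48.0 because one thread receives all the slow iterations, while the dynamic schedule with a chunk of 4 finishes at 26.0. Chunks of 1 or 32 are both worse, which mirrors the abstract's point that the gain from dynamic scheduling depends on granularity. Real timings depend on the machine and on the kernel, so the numbers here carry no weight beyond the illustration.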
| id |
UFRN_b7072aa8a0190fa55a0f22b78e776fde |
|---|---|
| oai_identifier_str |
oai:repositorio.ufrn.br:123456789/30251 |
| network_acronym_str |
UFRN |
| network_name_str |
Repositório Institucional da UFRN |
| repository_id_str |
|
| dc.title.none.fl_str_mv |
Granularity load scheduling auto-tuning for multi-core processors applied to reverse-time migration |
| title |
Granularity load scheduling auto-tuning for multi-core processors applied to reverse-time migration |
| spellingShingle |
Granularity load scheduling auto-tuning for multi-core processors applied to reverse-time migration Fernandes, João Batista Auto ajuste RTM CSA HPC Chunk Size Memória compartilhada |
| title_short |
Granularity load scheduling auto-tuning for multi-core processors applied to reverse-time migration |
| title_full |
Granularity load scheduling auto-tuning for multi-core processors applied to reverse-time migration |
| title_fullStr |
Granularity load scheduling auto-tuning for multi-core processors applied to reverse-time migration |
| title_full_unstemmed |
Granularity load scheduling auto-tuning for multi-core processors applied to reverse-time migration |
| title_sort |
Granularity load scheduling auto-tuning for multi-core processors applied to reverse-time migration |
| author |
Fernandes, João Batista |
| author_facet |
Fernandes, João Batista |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Souza, Samuel Xavier de; Barros, Tiago Tavares Leite; Nascimento, Hugo Alexandre Dantas do; Martins, Idalmis Milian Sardina |
| dc.contributor.author.fl_str_mv |
Fernandes, João Batista |
| dc.subject.por.fl_str_mv |
Auto ajuste; RTM; CSA; HPC; Chunk Size; Memória compartilhada |
| topic |
Auto ajuste; RTM; CSA; HPC; Chunk Size; Memória compartilhada |
| description |
Reverse-time migration (RTM) is an algorithm widely used in the oil and gas industry to process seismic data. It is a computationally intensive task that can be designed to run on parallel computers. Because the workload is massive and regular, it is often distributed equally and statically among the available parallel processors. However, this strategy is often not optimal. When the processors are heterogeneous, and even when most have similar processing power, many of them may still sit idle waiting for the slower ones. In this work, we show that even among homogeneous cores there might be load imbalance that can considerably affect the overall performance of a 3D RTM application. We show that dynamic load distribution has a significant advantage over the conventional static distribution and over other default OpenMP schedules, such as auto and guided. However, the granularity of the dynamically distributed chunks of work plays a key role in realizing this advantage. To find the optimal granularity, we propose an auto-tuning strategy based on coupled simulated annealing (CSA) that adjusts, during the initialization of a 3D RTM application, the chunk size that OpenMP parallel loops assign dynamically to worker threads. Experiments on computational systems with different processor and memory specifications, and with different input sizes, show that the proposed method is consistently faster than the default OpenMP loop schedulers. |
| publishDate |
2020 |
| dc.date.none.fl_str_mv |
2020-10-05T17:46:05Z; 2020-10-05T17:46:05Z; 2020-01-14 |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
| format |
masterThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
FERNANDES, João Batista. Granularity load scheduling auto-tuning for multi-core processors applied to reverse-time migration. 2020. 75f. Dissertação (Mestrado em Engenharia Elétrica e de Computação) - Centro de Tecnologia, Universidade Federal do Rio Grande do Norte, Natal, 2020. https://repositorio.ufrn.br/handle/123456789/30251 |
| identifier_str_mv |
FERNANDES, João Batista. Granularity load scheduling auto-tuning for multi-core processors applied to reverse-time migration. 2020. 75f. Dissertação (Mestrado em Engenharia Elétrica e de Computação) - Centro de Tecnologia, Universidade Federal do Rio Grande do Norte, Natal, 2020. |
| url |
https://repositorio.ufrn.br/handle/123456789/30251 |
| dc.language.iso.fl_str_mv |
por |
| language |
por |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.publisher.none.fl_str_mv |
Universidade Federal do Rio Grande do Norte Brasil UFRN PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA ELÉTRICA E DE COMPUTAÇÃO |
| publisher.none.fl_str_mv |
Universidade Federal do Rio Grande do Norte Brasil UFRN PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA ELÉTRICA E DE COMPUTAÇÃO |
| dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFRN instname:Universidade Federal do Rio Grande do Norte (UFRN) instacron:UFRN |
| instname_str |
Universidade Federal do Rio Grande do Norte (UFRN) |
| instacron_str |
UFRN |
| institution |
UFRN |
| reponame_str |
Repositório Institucional da UFRN |
| collection |
Repositório Institucional da UFRN |
| repository.name.fl_str_mv |
Repositório Institucional da UFRN - Universidade Federal do Rio Grande do Norte (UFRN) |
| repository.mail.fl_str_mv |
repositorio@bczm.ufrn.br |
| _version_ |
1855758753150795776 |