ASLI schemes as a kernel convolved way to optimize stencil computation.

Januário, Guilherme Carvalho

Bibliographic details
Year of defense:	2021
Main author:	Januário, Guilherme Carvalho
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	eng
Defending institution:	Biblioteca Digital de Teses e Dissertações da USP
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	Aggregate stencil-loop iteration; Performance analysis; Parallel architectures; ASLI; Combinatorics; Kernel convolution; Mathematical optimization; Stencil computation; Supercomputers; Supercomputing
Access link:	https://www.teses.usp.br/teses/disponiveis/3/3141/tde-18052021-154710/
Abstract:	Stencil computation is notorious for having its performance limited by main memory access. On current computers, this implies underutilization of the central processing units. To cope with this limitation, multiple approaches that rely on reordering the computation have been proposed, most notably variations of space-blocking and time-blocking. This work introduces a technique to speed up stencil computation that is not based on space-blocking or time-blocking. Stencil computation implies multiple traversals of the domain, with each iteration updating every point based on the previous values of its neighboring points. The technique introduced, named Aggregate Stencil-Loop Iteration (ASLI), works by updating the value of each domain point using the original stencil operator convolved with itself one or more times. The approach traverses the data domain fewer times than a straightforward iterative stencil implementation would, with each traversal performing more computation per data item fetched into registers. This more complex operator creates new opportunities for in-register data reuse and increases the FLOPs-to-load ratio. Computation and data reuse schemes are developed for its application to 1-, 2-, and 3-dimensional stencils. The Influence Table is presented to assist in the calculation of convolved coefficients, and a related integer sequence is derived. For 2D and 3D star-shaped stencils, the total number of FLOPs increases, but better interaction with the memory makes the approach beneficial even when compared with optimized non-ASLI implementations. ASLI is relatively easy to implement, allowing more scientists to productively extract better performance from supercomputing clusters.
Performance results are shown for a variety of platforms, demonstrating the soundness of the approach and exemplifying how it can be straightforwardly combined with existing techniques and solutions, helping to increase the performance of existing optimization methods. To better express ASLI and to enable comparison with other approaches, a methodology is outlined and new metrics are set forth for evaluating stencil implementations, and possibly the scalability of memory access in a machine. ASLI can be regarded as the application of a broader principle, namely Kernel Convolution, to the particular case of stencil computation. From this perspective, the Influence Table could promote the use of Kernel Convolution in other applications.
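
The core idea stated in the abstract, replacing several stencil iterations with one application of the operator convolved with itself, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`convolve`, `apply_stencil`, `iterate`), the 3-point coefficients, and the periodic boundary handling are all assumptions chosen to keep the example short.

```python
def convolve(p, q):
    """Full discrete convolution of two coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pv in enumerate(p):
        for j, qv in enumerate(q):
            out[i + j] += pv * qv
    return out


def apply_stencil(u, coeffs):
    """One traversal of a 1D domain with a centered stencil.

    Periodic boundaries are an assumption made for this sketch.
    """
    n, r = len(u), len(coeffs) // 2
    return [
        sum(c * u[(i + k - r) % n] for k, c in enumerate(coeffs))
        for i in range(n)
    ]


def iterate(u, coeffs, steps):
    """Apply the stencil `steps` times, one full traversal per step."""
    for _ in range(steps):
        u = apply_stencil(u, coeffs)
    return u


# Hypothetical 3-point stencil and its self-convolution (a 5-point stencil).
stencil = [0.25, 0.5, 0.25]
stencil2 = convolve(stencil, stencil)

u0 = [float(i * i % 7) for i in range(10)]
plain = iterate(u0, stencil, 4)    # four traversals, original operator
asli = iterate(u0, stencil2, 2)    # two traversals, convolved operator
```

Under these assumptions, `plain` and `asli` agree up to floating-point rounding, while the second variant sweeps the domain half as many times. In 2D, convolving a 5-point star stencil with itself yields a 13-point operator, which is consistent with the abstract's remark that the total number of FLOPs increases for star-shaped stencils.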

Item metadata

id	USP_44d64df83480bd249d906e09948825c0
oai_identifier_str	oai:teses.usp.br:tde-18052021-154710
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
dc.title.none.fl_str_mv	ASLI schemes as a kernel convolved way to optimize stencil computation. Esquemas ASLI para optimização de computação stencil através de convolução do núcleo computacional.
author	Januário, Guilherme Carvalho
author_facet	Januário, Guilherme Carvalho
author_role	author
dc.contributor.none.fl_str_mv	Carvalho, Tereza Cristina Melo de Brito
dc.contributor.author.fl_str_mv	Januário, Guilherme Carvalho
dc.subject.por.fl_str_mv	Aggregate stencil-loop iteration; Performance analysis; Parallel architectures; ASLI; Combinatorics; Kernel convolution; Mathematical optimization; Stencil computation; Supercomputers; Supercomputing
publishDate	2021
dc.date.none.fl_str_mv	2021-03-05
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://www.teses.usp.br/teses/disponiveis/3/3141/tde-18052021-154710/
url	https://www.teses.usp.br/teses/disponiveis/3/3141/tde-18052021-154710/
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv	Release the content for public access. info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Release the content for public access.
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br || atendimento@aguia.usp.br || virginia@if.usp.br
_version_	1818279208266235904
