Accelerating machine learning using RISC-V vector extension in a manycore platform

Nunes, Willian Analdo

Bibliographic details
Year of defense:	2025
Main author:	Nunes, Willian Analdo
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Dissertation
Access type:	Open access
Language:	eng
Defense institution:	Pontifícia Universidade Católica do Rio Grande do Sul; Escola Politécnica; Brasil; PUCRS; Programa de Pós-Graduação em Ciência da Computação
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	RISC-V; Vector Processing; Hardware Acceleration; Manycores; Convolutional Neural Networks; CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO
Access link:	https://tede2.pucrs.br/tede2/handle/tede/11675
Abstract:	The increasing computational demands of Machine Learning (ML) workloads, particularly Convolutional Neural Networks (CNNs), require efficient hardware acceleration solutions. This dissertation investigates the RISC-V Vector Extension (RVV) as a means of accelerating CNN inference in single-core and manycore architectures. The research presents the RS5 processor, an RTL implementation of a RISC-V-based core enhanced with a subset of RVV instructions designed for efficient data parallelism. This processor was also integrated into the Memphis-V manycore platform, enabling further performance scaling through parallel execution. A comprehensive evaluation was conducted to analyze the impact of RVV-based acceleration on performance, energy consumption, memory footprint, and hardware area costs. The results show that, compared to a scalar baseline, the vectorized implementation of CNN operations on the RS5 processor achieves a speedup of up to 7.68× (1-D CNN layer) in single-core execution, reduces energy consumption by up to 61%, and achieves speedups of up to 16× in a dot-product application. In the manycore environment, additional performance gains were observed: the first layer of AlexNet achieved up to 5.7× acceleration over the scalar single-core implementation, and code size was reduced by up to 87% in the second layer. The combination of auto-vectorization and manually optimized vector assembly further highlighted the effectiveness of RVV in accelerating ML workloads. Experimental results demonstrate that integrating RVV significantly enhances CNN inference speed. The manycore implementation further amplifies these benefits, highlighting the potential of RISC-V-based vector architectures for efficient ML acceleration. This work contributes to hardware acceleration by showcasing a scalable, open-source solution for CNN applications.

Item metadata

id	P_RS_98e22a68b266b5caa5ef6892ed81b102
oai_identifier_str	oai:tede2.pucrs.br:tede/11675
network_acronym_str	P_RS
network_name_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
repository_id_str
dc.title.none.fl_str_mv	Accelerating machine learning using risc-v vector extension in a manycore platform Aceleração de aprendizado de máquina usando extensão vetorial risc-v em uma plataforma manycore
title	Accelerating machine learning using risc-v vector extension in a manycore platform
author	Nunes, Willian Analdo
author_facet	Nunes, Willian Analdo
author_role	author
dc.contributor.none.fl_str_mv	Moraes, Fernando Gehm http://lattes.cnpq.br/2509301929350826
dc.contributor.author.fl_str_mv	Nunes, Willian Analdo
dc.subject.por.fl_str_mv	RISC-V Vector Processing Hardware Acceleration Manycores Convolutional Neural Networks RISC-V Processamento Vetorial Aceleração de Hardware Manycores Redes Neurais Convolucionais CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO
topic	RISC-V Vector Processing Hardware Acceleration Manycores Convolutional Neural Networks RISC-V Processamento Vetorial Aceleração de Hardware Manycores Redes Neurais Convolucionais CIENCIA DA COMPUTACAO::TEORIA DA COMPUTACAO
publishDate	2025
dc.date.none.fl_str_mv	2025-06-10T21:41:32Z 2025-03-11
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://tede2.pucrs.br/tede2/handle/tede/11675
url	https://tede2.pucrs.br/tede2/handle/tede/11675
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Pontifícia Universidade Católica do Rio Grande do Sul Escola Politécnica Brasil PUCRS Programa de Pós-Graduação em Ciência da Computação
publisher.none.fl_str_mv	Pontifícia Universidade Católica do Rio Grande do Sul Escola Politécnica Brasil PUCRS Programa de Pós-Graduação em Ciência da Computação
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da PUC_RS instname:Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS) instacron:PUC_RS
instname_str	Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
instacron_str	PUC_RS
institution	PUC_RS
reponame_str	Biblioteca Digital de Teses e Dissertações da PUC_RS
collection	Biblioteca Digital de Teses e Dissertações da PUC_RS
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da PUC_RS - Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
repository.mail.fl_str_mv	biblioteca.central@pucrs.br
_version_	1850041319477477376
