Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms

Bibliographic details
Defense year: 2024
Main author: Oliveira, Fábio Fonseca de
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: por
Defense institution: Universidade Federal do Rio Grande do Norte
Brasil
UFRN
PROGRAMA DE PÓS-GRADUAÇÃO EM BIOINFORMÁTICA
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://repositorio.ufrn.br/handle/123456789/58813
Abstract: This work addresses the growing challenge of efficiently processing the vast and continuously expanding volume of data in biological databases. Fast and accurate sequence analysis is increasingly important because identifying similarities between biological sequences underpins applications in genomics, taxonomy, and related fields. Central to this effort is the optimization of two techniques: the Smith-Waterman (SW) algorithm, a high-precision sequence alignment method based on dynamic programming, and K-Mers, a subsequence-counting technique fundamental to genomic analysis. We propose a parallel hardware architecture for the SW algorithm built on a systolic array that accelerates both the forward and backward phases of alignment. The architecture pre-organizes the alignment during the forward stage, which reduces the complexity of the subsequent backtracking that starts from the maximum-score position. Validated on a Field-Programmable Gate Array (FPGA), the architecture achieved up to 79.5 Giga Cell Updates per Second (GCUPS). Additionally, we developed a K-Mers-based algorithm for the exact extraction of short subsequences, characterized by low memory consumption, practical execution time, high parallelization capability, and energy efficiency. Although intended primarily for FPGAs, the algorithm can be adapted to other hardware platforms. These contributions improve the speed and efficiency of biological data processing and support further advances in genomic and taxonomic research, among other areas of bioinformatics.
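Illustrative sketch (not taken from the thesis): the abstract names three ideas that a short software reference can make concrete: the Smith-Waterman forward phase followed by backtracking from the maximum-score cell, K-Mers counting, and the cell-updates-per-second throughput metric. The Python below is a plain sequential version under stated assumptions. The scoring values (match=2, mismatch=-1, linear gap=-1) and the function names smith_waterman, count_kmers, and gcups are hypothetical choices made for illustration; the record does not describe the thesis's scoring scheme or its hardware design, and this code does not model the systolic array.

```python
from collections import Counter


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment with a linear gap penalty (scores are assumed values)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # score matrix
    T = [[0] * cols for _ in range(rows)]  # direction: 0 stop, 1 diag, 2 up, 3 left
    best, best_pos = 0, (0, 0)

    # Forward phase: fill the matrix and store where each score came from,
    # so the backward phase only has to follow the stored directions.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap
            left = H[i][j - 1] + gap
            score = max(0, diag, up, left)
            H[i][j] = score
            if score == 0:
                T[i][j] = 0
            elif score == diag:
                T[i][j] = 1
            elif score == up:
                T[i][j] = 2
            else:
                T[i][j] = 3
            if score > best:
                best, best_pos = score, (i, j)

    # Backward phase: start at the maximum-score cell and walk until a stop.
    i, j = best_pos
    out_a, out_b = [], []
    while T[i][j] != 0:
        if T[i][j] == 1:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif T[i][j] == 2:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def count_kmers(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def gcups(query_len, target_len, seconds):
    """Giga cell updates per second for one query-by-target score matrix."""
    return query_len * target_len / seconds / 1e9


if __name__ == "__main__":
    print(smith_waterman("ACGT", "ACGT"))
    print(count_kmers("ACGTACG", 3))
    print(gcups(1000, 1000, 1.0))
```

Storing a direction per cell during the forward pass is one simple way to make the backward pass a pointer walk; whether the thesis's pre-organization works this way is not stated in this record.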
id UFRN_3e8ec5bf6fc16fb7bc7838086f35f226
oai_identifier_str oai:repositorio.ufrn.br:123456789/58813
network_acronym_str UFRN
network_name_str Repositório Institucional da UFRN
repository_id_str
spelling Funding agency: Fundação Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES; OAI URL: http://repositorio.ufrn.br/oai/
dc.title.none.fl_str_mv Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms
dc.contributor.none.fl_str_mv Fernandes, Marcelo Augusto Costa
https://orcid.org/0000-0001-7536-2506
http://lattes.cnpq.br/3475337353676349
Moioli, Renan Cipriano
Araújo, Daniel Sabino Amorim de
Sakuyama, Carlos Alberto Valderrama
Silva, Lucileide Medeiros Dantas da
dc.contributor.author.fl_str_mv Oliveira, Fábio Fonseca de
dc.subject.por.fl_str_mv Smith-Waterman
K-Mers
FPGA
Systolic array
High throughput
Low memory usage
CNPQ::CIENCIAS BIOLOGICAS
publishDate 2024
dc.date.none.fl_str_mv 2024-07-17T20:14:13Z
2024-07-17T20:14:13Z
2024-04-05
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv OLIVEIRA, Fábio Fonseca de. Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms. Orientador: Dr. Marcelo Augusto Costa Fernandes. 2024. 88f. Tese (Doutorado em Bioinformática) - Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, 2024.
https://repositorio.ufrn.br/handle/123456789/58813
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal do Rio Grande do Norte
Brasil
UFRN
PROGRAMA DE PÓS-GRADUAÇÃO EM BIOINFORMÁTICA
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFRN
instname:Universidade Federal do Rio Grande do Norte (UFRN)
instacron:UFRN
repository.name.fl_str_mv Repositório Institucional da UFRN - Universidade Federal do Rio Grande do Norte (UFRN)
repository.mail.fl_str_mv repositorio@bczm.ufrn.br