Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms
| Year of defense: | 2024 |
|---|---|
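| Main author: | Oliveira, Fábio Fonseca de |
| Advisor: | Fernandes, Marcelo Augusto Costa |
| Examination committee: | Moioli, Renan Cipriano; Araújo, Daniel Sabino Amorim de; Sakuyama, Carlos Alberto Valderrama; Silva, Lucileide Medeiros Dantas da |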
| Document type: | Thesis |
| Access type: | Open access |
| Language: | por |
| Degree-granting institution: |
Universidade Federal do Rio Grande do Norte
Brasil UFRN PROGRAMA DE PÓS-GRADUAÇÃO EM BIOINFORMÁTICA |
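| Graduate program: |
Not provided by the institution
|
| Department: |
Not provided by the institution
|
| Country: |
Not provided by the institution
|
| Keywords in Portuguese: | Smith-Waterman; K-Mers; FPGA; Array sistólico (systolic array); Alta taxa de transferência (high throughput); Baixo uso de memória (low memory usage) |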
| Link de acesso: | https://repositorio.ufrn.br/handle/123456789/58813 |
Abstract: | In this work, we address the growing challenge of efficiently processing the vast and continuously expanding volume of data in biological databases. The need for fast and accurate sequence analysis techniques is more pressing than ever, given the importance of identifying similarities between biological sequences for applications in genomics, taxonomy, and beyond. Central to this effort is the optimization of two techniques: the Smith-Waterman (SW) algorithm, a high-precision sequence alignment method based on dynamic programming, and K-Mers, a technique for counting subsequences that is fundamental in genomic analysis. We propose a parallel hardware architecture for the SW algorithm that incorporates a systolic array structure to accelerate both the forward and backward phases of alignment. The architecture pre-organizes the alignment in the forward stage, reducing the complexity of the subsequent backtracking, which starts from the maximum-score position. Validated on a Field-Programmable Gate Array (FPGA), the architecture achieved a rate of up to 79.5 Giga Cell Updates per Second (GCUPS). Additionally, we developed a K-Mers-based algorithm focused on the exact extraction of short subsequences, characterized by low memory consumption, feasible execution time, high parallelization capability, and energy efficiency. Although primarily intended for FPGAs, the algorithm is also adaptable to other hardware platforms. These contributions improve the speed and efficiency of biological data processing and pave the way for advances in genomic and taxonomic research, among other areas of bioinformatics. |
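
For readers unfamiliar with the two phases named in the abstract, the following is a minimal software sketch of Smith-Waterman local alignment: a forward phase that fills the score matrix and records the maximum, and a backward phase that traces back from that maximum. It is a plain sequential reference, not the systolic-array architecture of the thesis. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`), the linear gap penalty, and the function names `smith_waterman` and `gcups` are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment with a linear gap penalty (illustrative scoring values)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Forward phase: fill the score matrix and remember the maximum cell.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # match or mismatch
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Backward phase: trace back from the maximum until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def gcups(len_a, len_b, seconds):
    """Giga cell updates per second for one len_a x len_b score matrix."""
    return len_a * len_b / seconds / 1e9


if __name__ == "__main__":
    print(smith_waterman("TTACGTT", "GGACGGG"))
    print(gcups(40_000, 100_000, 2.0))
```

The `gcups` helper only restates the throughput metric quoted in the abstract: the number of score-matrix cells updated, divided by the elapsed time, in units of 10^9. A systolic array reaches high values of this metric by updating many cells of an anti-diagonal in the same clock cycle, which the nested loops above do one at a time.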
| id |
UFRN_3e8ec5bf6fc16fb7bc7838086f35f226 |
|---|---|
| oai_identifier_str |
oai:repositorio.ufrn.br:123456789/58813 |
| network_acronym_str |
UFRN |
| network_name_str |
Repositório Institucional da UFRN |
| repository_id_str |
|
| spelling |
Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms
Smith-Waterman; K-Mers; FPGA; Array sistólico (systolic array); Alta taxa de transferência (high throughput); Baixo uso de memória (low memory usage); CNPQ::CIENCIAS BIOLOGICAS
In this work, we address the growing challenge of efficiently processing the vast and continuously expanding volume of data in biological databases. The need for fast and accurate sequence analysis techniques is more pressing than ever, given the importance of identifying similarities between biological sequences for applications in genomics, taxonomy, and beyond. Central to this effort is the optimization of two techniques: the Smith-Waterman (SW) algorithm, a high-precision sequence alignment method based on dynamic programming, and K-Mers, a technique for counting subsequences that is fundamental in genomic analysis. We propose a parallel hardware architecture for the SW algorithm that incorporates a systolic array structure to accelerate both the forward and backward phases of alignment. The architecture pre-organizes the alignment in the forward stage, reducing the complexity of the subsequent backtracking, which starts from the maximum-score position. Validated on a Field-Programmable Gate Array (FPGA), the architecture achieved a rate of up to 79.5 Giga Cell Updates per Second (GCUPS). Additionally, we developed a K-Mers-based algorithm focused on the exact extraction of short subsequences, characterized by low memory consumption, feasible execution time, high parallelization capability, and energy efficiency. Although primarily intended for FPGAs, the algorithm is also adaptable to other hardware platforms. These contributions improve the speed and efficiency of biological data processing and pave the way for advances in genomic and taxonomic research, among other areas of bioinformatics.
Fundação Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
Universidade Federal do Rio Grande do Norte; Brasil; UFRN; PROGRAMA DE PÓS-GRADUAÇÃO EM BIOINFORMÁTICA
Fernandes, Marcelo Augusto Costa; https://orcid.org/0000-0001-7536-2506; http://lattes.cnpq.br/3475337353676349; Moioli, Renan Cipriano; Araújo, Daniel Sabino Amorim de; Sakuyama, Carlos Alberto Valderrama; Silva, Lucileide Medeiros Dantas da; Oliveira, Fábio Fonseca de
2024-07-17T20:14:13Z; 2024-07-17T20:14:13Z; 2024-04-05
info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/doctoralThesis; application/pdf
OLIVEIRA, Fábio Fonseca de. Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms. Orientador: Dr. Marcelo Augusto Costa Fernandes. 2024. 88f. Tese (Doutorado em Bioinformática) - Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, 2024.
https://repositorio.ufrn.br/handle/123456789/58813
info:eu-repo/semantics/openAccess; por
reponame:Repositório Institucional da UFRN; instname:Universidade Federal do Rio Grande do Norte (UFRN); instacron:UFRN
2024-07-17T20:14:38Z; oai:repositorio.ufrn.br:123456789/58813; Repositório Institucional; PUB; http://repositorio.ufrn.br/oai/; repositorio@bczm.ufrn.br; opendoar:2024-07-17T20:14:38
Repositório Institucional da UFRN - Universidade Federal do Rio Grande do Norte (UFRN); false |
| dc.title.none.fl_str_mv |
Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms |
| title |
Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms |
| spellingShingle |
Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms; Oliveira, Fábio Fonseca de; Smith-Waterman; K-Mers; FPGA; Array sistólico (systolic array); Alta taxa de transferência (high throughput); Baixo uso de memória (low memory usage); CNPQ::CIENCIAS BIOLOGICAS |
| title_short |
Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms |
| title_full |
Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms |
| title_fullStr |
Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms |
| title_full_unstemmed |
Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms |
| title_sort |
Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms |
| author |
Oliveira, Fábio Fonseca de |
| author_facet |
Oliveira, Fábio Fonseca de |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Fernandes, Marcelo Augusto Costa; https://orcid.org/0000-0001-7536-2506; http://lattes.cnpq.br/3475337353676349; Moioli, Renan Cipriano; Araújo, Daniel Sabino Amorim de; Sakuyama, Carlos Alberto Valderrama; Silva, Lucileide Medeiros Dantas da |
| dc.contributor.author.fl_str_mv |
Oliveira, Fábio Fonseca de |
| dc.subject.por.fl_str_mv |
Smith-Waterman; K-Mers; FPGA; Array sistólico (systolic array); Alta taxa de transferência (high throughput); Baixo uso de memória (low memory usage); CNPQ::CIENCIAS BIOLOGICAS |
| topic |
Smith-Waterman; K-Mers; FPGA; Array sistólico (systolic array); Alta taxa de transferência (high throughput); Baixo uso de memória (low memory usage); CNPQ::CIENCIAS BIOLOGICAS |
| description |
In this work, we address the growing challenge of efficiently processing the vast and continuously expanding volume of data in biological databases. The need for fast and accurate sequence analysis techniques is more pressing than ever, given the importance of identifying similarities between biological sequences for applications in genomics, taxonomy, and beyond. Central to this effort is the optimization of two techniques: the Smith-Waterman (SW) algorithm, a high-precision sequence alignment method based on dynamic programming, and K-Mers, a technique for counting subsequences that is fundamental in genomic analysis. We propose a parallel hardware architecture for the SW algorithm that incorporates a systolic array structure to accelerate both the forward and backward phases of alignment. The architecture pre-organizes the alignment in the forward stage, reducing the complexity of the subsequent backtracking, which starts from the maximum-score position. Validated on a Field-Programmable Gate Array (FPGA), the architecture achieved a rate of up to 79.5 Giga Cell Updates per Second (GCUPS). Additionally, we developed a K-Mers-based algorithm focused on the exact extraction of short subsequences, characterized by low memory consumption, feasible execution time, high parallelization capability, and energy efficiency. Although primarily intended for FPGAs, the algorithm is also adaptable to other hardware platforms. These contributions improve the speed and efficiency of biological data processing and pave the way for advances in genomic and taxonomic research, among other areas of bioinformatics. |
| publishDate |
2024 |
| dc.date.none.fl_str_mv |
2024-07-17T20:14:13Z 2024-07-17T20:14:13Z 2024-04-05 |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
| format |
doctoralThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
OLIVEIRA, Fábio Fonseca de. Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms. Advisor: Dr. Marcelo Augusto Costa Fernandes. 2024. 88 leaves. Thesis (Doctorate in Bioinformatics) - Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, 2024. https://repositorio.ufrn.br/handle/123456789/58813 |
| identifier_str_mv |
OLIVEIRA, Fábio Fonseca de. Proposed FPGA-Based hardware architectures for acceleration of Smith-Waterman and K-Mers algorithms. Advisor: Dr. Marcelo Augusto Costa Fernandes. 2024. 88 leaves. Thesis (Doctorate in Bioinformatics) - Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, 2024. |
| url |
https://repositorio.ufrn.br/handle/123456789/58813 |
| dc.language.iso.fl_str_mv |
por |
| language |
por |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.publisher.none.fl_str_mv |
Universidade Federal do Rio Grande do Norte Brasil UFRN PROGRAMA DE PÓS-GRADUAÇÃO EM BIOINFORMÁTICA |
| publisher.none.fl_str_mv |
Universidade Federal do Rio Grande do Norte Brasil UFRN PROGRAMA DE PÓS-GRADUAÇÃO EM BIOINFORMÁTICA |
| dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFRN instname:Universidade Federal do Rio Grande do Norte (UFRN) instacron:UFRN |
| instname_str |
Universidade Federal do Rio Grande do Norte (UFRN) |
| instacron_str |
UFRN |
| institution |
UFRN |
| reponame_str |
Repositório Institucional da UFRN |
| collection |
Repositório Institucional da UFRN |
| repository.name.fl_str_mv |
Repositório Institucional da UFRN - Universidade Federal do Rio Grande do Norte (UFRN) |
| repository.mail.fl_str_mv |
repositorio@bczm.ufrn.br |
| _version_ |
1855758759486291968 |
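
The record's description refers to K-Mers as a technique for counting short subsequences with low memory use. As a rough illustration of that idea only, the sketch below counts exact k-mers in a DNA string using a rolling 2-bit encoding, so each k-mer is held as a small integer rather than a string. This is not the algorithm developed in the thesis; the 2-bit packing, the reset on non-ACGT symbols, and the names `count_kmers`, `decode`, and `ENCODE` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical 2-bit code per nucleotide (an assumption for this sketch).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def count_kmers(seq, k):
    """Count exact k-mers, keyed by their packed 2-bit integer code."""
    if k <= 0:
        raise ValueError("k must be positive")
    mask = (1 << (2 * k)) - 1      # keeps only the last k bases
    counts = Counter()
    code, valid = 0, 0             # rolling code and number of valid bases seen
    for base in seq.upper():
        if base not in ENCODE:     # ambiguous symbol (e.g. N): restart the window
            code, valid = 0, 0
            continue
        code = ((code << 2) | ENCODE[base]) & mask
        valid += 1
        if valid >= k:
            counts[code] += 1
    return counts


def decode(code, k):
    """Turn a packed integer back into its k-letter string."""
    return "".join("ACGT"[(code >> (2 * (k - 1 - i))) & 3] for i in range(k))


if __name__ == "__main__":
    k = 3
    for code, n in sorted(count_kmers("ACGTACGT", k).items()):
        print(decode(code, k), n)
```

Each step touches only a shift, an OR, and a mask, which is one reason this style of encoding maps naturally onto fixed-width hardware registers; a Python `Counter` stands in here for whatever counting memory a real design would use.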