Feature extraction and selection analysis in biological sequence: a case study with metaheuristics and mathematical models
| Year of defense: | 2020 |
|---|---|
| Main author: | Bonidia, Robson Parmezan |
| Advisor: | |
| Examination committee: | |
| Document type: | Dissertation |
| Access type: | Open access |
| Language: | eng |
| Defense institution: |
Universidade Tecnológica Federal do Paraná
Cornelio Procopio Brasil Programa de Pós-Graduação em Bioinformática UTFPR |
| Graduate program: |
Not provided by the institution
|
| Department: |
Not provided by the institution
|
| Country: |
Not provided by the institution
|
| Keywords in Portuguese: | |
| Access link: | http://repositorio.utfpr.edu.br/jspui/handle/1/5301 |
Abstract: | The number of available biological sequences has increased substantially in recent years, due to various genomic sequencing projects, creating a huge volume of data. Consequently, new computational methods are needed for the analysis of these sequences and the extraction of information from them. Machine learning methods have shown broad applicability in computational biology and bioinformatics, and their application has helped to extract relevant information from various biological datasets. However, there are still several challenging problems that motivate new algorithms and pipeline proposals. Therefore, this work proposes a generic machine learning pipeline for biological sequence analysis, following two main steps: (1) feature extraction and (2) feature selection. Essentially, we focus our work on the study of dimensionality reduction and feature extraction techniques, using metaheuristics and mathematical models. As a case study, we analyze long non-coding RNA sequences. We divided this dissertation into two parts, namely Experimental Test I (feature selection) and Experimental Test II (feature extraction). The experimental results indicated four main contributions: (1) a pipeline with five distinct metaheuristics, using a voting scheme and execution rounds, for the feature selection problem in biological sequences; (2) the efficiency of the metaheuristics, which provide competitive classification performance; (3) a feature extraction pipeline using nine mathematical models; and (4) its generalization and robustness for the classification of distinct biological sequences. |
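The abstract mentions a voting scheme over several metaheuristics but does not spell out the rule. As a rough illustration only, the sketch below assumes a simple threshold vote: each selector proposes a feature subset, and a feature is kept when at least `min_votes` selectors chose it. The function name `vote_features`, the feature names, and the threshold are hypothetical and are not taken from the dissertation.

```python
from collections import Counter
from typing import Iterable, List, Set


def vote_features(selections: Iterable[Set[str]], min_votes: int) -> List[str]:
    """Keep every feature chosen by at least `min_votes` selectors."""
    votes: Counter = Counter()
    for subset in selections:
        votes.update(set(subset))  # each selector votes once per feature
    return sorted(name for name, count in votes.items() if count >= min_votes)


# Hypothetical outputs of five feature selectors (feature names are made up).
runs = [
    {"gc_content", "orf_length", "kmer_AA"},
    {"gc_content", "orf_length"},
    {"gc_content", "kmer_AA", "kmer_CG"},
    {"orf_length", "gc_content"},
    {"kmer_CG", "gc_content"},
]

selected = vote_features(runs, min_votes=3)
print(selected)  # ['gc_content', 'orf_length']
```

Raising `min_votes` makes the selection stricter and shrinks the final feature set; the actual aggregation used in the work may differ.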
| id |
UTFPR-12_7ed96a2219f0b96754112234d4d17f03 |
|---|---|
| oai_identifier_str |
oai:repositorio.utfpr.edu.br:1/5301 |
| network_acronym_str |
UTFPR-12 |
| network_name_str |
Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT)) |
| funding |
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) |
| dc.title.none.fl_str_mv |
Feature extraction and selection analysis in biological sequence: a case study with metaheuristics and mathematical models |
| author |
Bonidia, Robson Parmezan |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Sanches, Danilo Sipoli (http://lattes.cnpq.br/6377657274398145); Paschoal, Alexandre Rossi (0000-0002-8887-0582, http://lattes.cnpq.br/5834088144837137); Carvalho, Andre Carlos Ponce de Leon Ferreira de (http://lattes.cnpq.br/9674541381385819); Sanches, Danilo Sipoli (http://lattes.cnpq.br/6377657274398145); Lopes, Fabricio Martins (http://lattes.cnpq.br/1660070580824436) |
| dc.subject.por.fl_str_mv |
Machine learning; Mathematical models; Heuristic programming; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO; Engineering/Technology/Management |
| publishDate |
2020 |
| dc.date.none.fl_str_mv |
2020-10-27T00:30:26Z 2020-10-27T00:30:26Z 2020-02-12 |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
| dc.identifier.uri.fl_str_mv |
BONIDIA, Robson Parmezan. Feature extraction and selection analysis in biological sequence: a case study with metaheuristics and mathematical models. 2020. Dissertação (Mestrado em Bioinformática) - Universidade Tecnológica Federal do Paraná, Cornélio Procópio, 2020. http://repositorio.utfpr.edu.br/jspui/handle/1/5301 |
| dc.language.iso.fl_str_mv |
eng |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.publisher.none.fl_str_mv |
Universidade Tecnológica Federal do Paraná Cornelio Procopio Brasil Programa de Pós-Graduação em Bioinformática UTFPR |
| dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT)) instname:Universidade Tecnológica Federal do Paraná (UTFPR) instacron:UTFPR |
| repository.name.fl_str_mv |
Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT)) - Universidade Tecnológica Federal do Paraná (UTFPR) |
| repository.mail.fl_str_mv |
riut@utfpr.edu.br || sibi@utfpr.edu.br |