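Desenvolvimento de algoritmos evolucionistas para aprimorar a metodologia PSP de novo do programa Rosetta [Development of evolutionary algorithms to improve the de novo PSP methodology of the Rosetta program]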

Bibliographic details
Year of defense: 2021
Lead author: Werdt, Paulo Roberto Teixeira (Lattes: http://lattes.cnpq.br/5138848008740842)
Advisor: Dardenne, Laurent Emmanuel
Examination committee: Dardenne, Laurent Emmanuel; Ocaña Gautherot, Kary Ann del Carmen; Barbosa, Hélio José Corrêa; Pascutti, Pedro Geraldo
Document type: Thesis
Access type: Open access
Language: por
Defending institution: Laboratório Nacional de Computação Científica
Graduate program: Programa de Pós-Graduação em Modelagem Computacional
Department: Coordenação de Pós-Graduação e Aperfeiçoamento (COPGA)
Country: Brazil
Keywords in Portuguese: Biologia molecular - Processamento eletrônico de dados; Algoritmos genéticos; Bioinformática; Proteínas
CNPq knowledge area: CNPQ::CIENCIAS BIOLOGICAS::GENETICA
Access link: https://tede.lncc.br/handle/tede/343
Abstract: One of the main challenges in computational molecular biology has been the development of methods capable of predicting the native structure of a protein from its amino acid sequence, a task known as protein structure prediction (PSP). The Critical Assessment of Protein Structure Prediction (CASP) has been documenting the progress and the state of the art of PSP. PSP techniques are currently divided into two main groups: template-based modeling, in which experimentally solved proteins are used as starting points for generating new models; and Free Modeling (FM), in which new models are obtained strictly from physicochemical principles (ab initio) or with the aid of some experimental information (de novo). Among the main problems that a PSP FM method has to face are: (i) the deceptiveness of the evaluation functions, caused by the theoretical simplifications introduced to make the model computationally viable; and (ii) the size, complexity, and multimodality of the search space, which result from the possible conformations associated with a given protein sequence. In many cases the native structure does not even belong to the global minimum basin of the investigated energy surface.

The CASP evaluations indicate that the Rosetta program is one of the most successful de novo PSP methods. Its algorithm uses fragments of experimentally determined structures to guide the search in structural space. The fragments are inserted by a Monte Carlo algorithm that assembles them into the structure using Simulated Annealing, evaluated by mixed score functions based both on molecular mechanics and on knowledge from databases of known structures. Its optimization strategy relies on a large number of independent executions of the algorithm, in which each trajectory seeks to converge quickly to a minimum of the energy surface; this yields a large set of predictions, from which a subset of the most promising models is chosen. Results in the literature indicate that the Rosetta energy functions are the strong point of its methodology and are widely used in other successful PSP FM methods, whereas its conformational search strategy can be improved. A survey of related work shows that the Rosetta algorithm cannot efficiently explore the multimodal fragment space, which reinforces the idea of using a different method to improve its conformational search.

The objective of this work is to improve the conformational search of Rosetta through the development of evolutionary metaheuristics, in order to map the search space better and obtain gains both in the quality of the predicted structures and in the reduction of the high computational cost that Rosetta requires to produce good-quality models. The approach uses a population-based algorithm with parental replacement by phenotypic crowding, in which offspring replace members of the parental population according to a structural similarity criterion. Variants of the algorithms were also developed with different population regimes and selection mechanisms (tournament-based and based on structural clustering of the individuals in the population), together with a crossover operator that takes secondary structure predictions into account. These techniques were incorporated into the Rosetta de novo methodology and tested incrementally in experiments with a diverse group of protein targets. The results obtained with the most successful algorithms were compared with the corresponding results from the original Rosetta and with two other similar approaches found in the literature. They achieved a favorable balance between exploration and exploitation of the energy hypersurface generated by the different scores used during fragment assembly. Two of the algorithms proposed in this work achieved significant improvements, not only in the diversity and structural quality of the final predicted models, but also in the reduction of the computational cost needed to obtain good-quality structures.
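The replacement-by-crowding idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation and does not call Rosetta: the torsion-angle encoding, `toy_energy`, `angular_distance`, `mutate`, and `evolve` are all hypothetical stand-ins chosen for illustration. The only point shown is the rule that an offspring competes with the most structurally similar member of the parental population and replaces it only if it scores better.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical encoding: one torsion angle (in degrees) per position.
Conformation = List[float]


def angular_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute angular difference; a toy stand-in for structural similarity."""
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y) % 360.0
        total += min(d, 360.0 - d)
    return total / len(a)


def toy_energy(conf: Sequence[float]) -> float:
    """Toy multimodal score (lower is better); NOT a Rosetta score function."""
    return sum(1.0 - math.cos(math.radians(3.0 * angle)) for angle in conf)


def crowding_replace(
    population: List[Conformation],
    offspring: Conformation,
    energy: Callable[[Sequence[float]], float],
    distance: Callable[[Sequence[float], Sequence[float]], float],
) -> bool:
    """Replace the member most similar to `offspring` if the offspring scores lower."""
    nearest = min(
        range(len(population)),
        key=lambda i: distance(population[i], offspring),
    )
    if energy(offspring) < energy(population[nearest]):
        population[nearest] = list(offspring)
        return True
    return False


def mutate(conf: Conformation, rng: random.Random, step: float = 20.0) -> Conformation:
    """Perturb one randomly chosen angle; a placeholder for a fragment insertion."""
    child = list(conf)
    i = rng.randrange(len(child))
    child[i] = (child[i] + rng.uniform(-step, step)) % 360.0
    return child


def evolve(
    population: List[Conformation], generations: int = 200, seed: int = 0
) -> List[Conformation]:
    """Run a steady-state loop in place: pick a parent, mutate, apply crowding."""
    rng = random.Random(seed)
    for _ in range(generations):
        parent = rng.choice(population)
        child = mutate(parent, rng)
        crowding_replace(population, child, toy_energy, angular_distance)
    return population


if __name__ == "__main__":
    rng = random.Random(42)
    pop = [[rng.uniform(0.0, 360.0) for _ in range(8)] for _ in range(20)]
    evolve(pop, generations=500, seed=42)
    print(min(toy_energy(c) for c in pop))
```

Because a member is only ever displaced by a more similar, better-scoring offspring, each population slot's score never gets worse, and distinct regions of the search space can keep their own representatives. Whether the thesis uses this exact acceptance rule is not stated in the abstract.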
id LNCC_45961fb434fbc81e716b7084bdf8d8b4
oai_identifier_str oai:tede-server.lncc.br:tede/343
network_acronym_str LNCC
network_name_str Biblioteca Digital de Teses e Dissertações do LNCC
repository_id_str
spelling Submitted by Patrícia Vieira Silva (library@lncc.br) on 2023-03-30T18:29:53Z. Approved for entry into archive by Patrícia Vieira Silva (library@lncc.br) on 2023-03-30T18:31:08Z (GMT). Made available in DSpace on 2023-03-30T18:31:27Z (GMT). No. of bitstreams: 2 (license_rdf: 0 bytes; Tese_final_Paulo_Werdt.pdf: 61469755 bytes). Previous issue date: 2021-09-24. Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. http://tede-server.lncc.br:8080/retrieve/1395/Tese_final_Paulo_Werdt.pdf.jpg. Record timestamps: 2023-06-02 12:08:47.991; opendoar:2023-06-02T15:08:47.
dc.title.por.fl_str_mv Desenvolvimento de algoritmos evolucionistas para aprimorar a metodologia PSP de novo do programa Rosetta
author Werdt, Paulo Roberto Teixeira
author_facet Werdt, Paulo Roberto Teixeira
author_role author
dc.contributor.advisor1.fl_str_mv Dardenne, Laurent Emmanuel
dc.contributor.referee1.fl_str_mv Dardenne, Laurent Emmanuel
dc.contributor.referee2.fl_str_mv Ocaña Gautherot, Kary Ann del Carmen
dc.contributor.referee3.fl_str_mv Barbosa, Hélio José Corrêa
dc.contributor.referee4.fl_str_mv Pascutti, Pedro Geraldo
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/5138848008740842
dc.contributor.author.fl_str_mv Werdt, Paulo Roberto Teixeira
contributor_str_mv Dardenne, Laurent Emmanuel
Dardenne, Laurent Emmanuel
Ocaña Gautherot, Kary Ann del Carmen
Barbosa, Hélio José Corrêa
Pascutti, Pedro Geraldo
dc.subject.por.fl_str_mv Biologia molecular - Processamento eletrônico de dados
Algoritmos genéticos
Bioinformática
Proteínas
topic Biologia molecular - Processamento eletrônico de dados
Algoritmos genéticos
Bioinformática
Proteínas
CNPQ::CIENCIAS BIOLOGICAS::GENETICA
dc.subject.cnpq.fl_str_mv CNPQ::CIENCIAS BIOLOGICAS::GENETICA
publishDate 2021
dc.date.issued.fl_str_mv 2021-09-24
dc.date.accessioned.fl_str_mv 2023-03-30T18:31:27Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv WERDT, P. R. T. Desenvolvimento de algoritmos evolucionistas para aprimorar a metodologia PSP de novo do programa Rosetta. 2021. 199 f. Tese (Programa de Pós-Graduação em Modelagem Computacional) - Laboratório Nacional de Computação Científica, Petrópolis, 2021.
dc.identifier.uri.fl_str_mv https://tede.lncc.br/handle/tede/343
identifier_str_mv WERDT, P. R. T. Desenvolvimento de algoritmos evolucionistas para aprimorar a metodologia PSP de novo do programa Rosetta. 2021. 199 f. Tese (Programa de Pós-Graduação em Modelagem Computacional) - Laboratório Nacional de Computação Científica, Petrópolis, 2021.
url https://tede.lncc.br/handle/tede/343
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv http://creativecommons.org/licenses/by-nc-nd/4.0/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-nd/4.0/
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Laboratório Nacional de Computação Científica
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Modelagem Computacional
dc.publisher.initials.fl_str_mv LNCC
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv Coordenação de Pós-Graduação e Aperfeiçoamento (COPGA)
publisher.none.fl_str_mv Laboratório Nacional de Computação Científica
dc.source.none.fl_str_mv reponame:Biblioteca Digital de Teses e Dissertações do LNCC
instname:Laboratório Nacional de Computação Científica (LNCC)
instacron:LNCC
instname_str Laboratório Nacional de Computação Científica (LNCC)
instacron_str LNCC
institution LNCC
reponame_str Biblioteca Digital de Teses e Dissertações do LNCC
collection Biblioteca Digital de Teses e Dissertações do LNCC
bitstream.url.fl_str_mv http://tede-server.lncc.br:8080/tede/bitstream/tede/343/1/license.txt
http://tede-server.lncc.br:8080/tede/bitstream/tede/343/2/license_url
http://tede-server.lncc.br:8080/tede/bitstream/tede/343/3/license_text
http://tede-server.lncc.br:8080/tede/bitstream/tede/343/4/license_rdf
http://tede-server.lncc.br:8080/tede/bitstream/tede/343/5/Tese_final_Paulo_Werdt.pdf
http://tede-server.lncc.br:8080/tede/bitstream/tede/343/6/Tese_final_Paulo_Werdt.pdf.txt
http://tede-server.lncc.br:8080/tede/bitstream/tede/343/7/Tese_final_Paulo_Werdt.pdf.jpg
bitstream.checksum.fl_str_mv bd3efa91386c1718a7f26a329fdcb468
4afdbb8c545fd630ea7db775da747b2f
d41d8cd98f00b204e9800998ecf8427e
d41d8cd98f00b204e9800998ecf8427e
1a92b7d6dfd395894907ed528ef06d21
d7065b3df6e5425f6028743a9d4ae859
da181512fb2b0c020718e96422d92ff3
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
MD5
MD5
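The MD5 values listed above can be used to check a downloaded bitstream. The sketch below assumes only Python's standard `hashlib`; the helper name `md5_of_file` and the local file path in the usage note are hypothetical. Note that `d41d8cd98f00b204e9800998ecf8427e` is the MD5 digest of empty input, which is consistent with the 0-byte license files in this record.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's MD5 digest with an expected value, ignoring case."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, after saving the thesis PDF locally (the path is an assumption), `matches_checksum("Tese_final_Paulo_Werdt.pdf", "1a92b7d6dfd395894907ed528ef06d21")` should return `True` if the download is intact. MD5 is suitable here only as an integrity check against accidental corruption, not as protection against deliberate tampering.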
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações do LNCC - Laboratório Nacional de Computação Científica (LNCC)
repository.mail.fl_str_mv library@lncc.br||library@lncc.br
_version_ 1790266131219480576