Efficient bayesian methods for mixture models with genetic applications

Bibliographic details
Year of defense: 2016
Main author: Zuanetti, Daiane Aparecida
Advisor: Milan, Luis Aparecido
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: por
Defending institution: Universidade Federal de São Carlos
Câmpus São Carlos
Graduate program: Programa Interinstitucional de Pós-Graduação em Estatística - PIPGEs
Department: Not provided by the institution
Country: Not provided by the institution
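Keywords in English: Mixture models; Data-driven bayesian methods; Nonparametric bayesian methods; QTL mapping; Clustering distributions
CNPq knowledge area: CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA::ESTATISTICA::ANALISE DE DADOS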
Access link: https://repositorio.ufscar.br/handle/20.500.14289/8426
Abstract: We propose Bayesian methods for selecting and estimating different types of mixture models which are widely used in Genetics and Molecular Biology. We specifically propose data-driven selection and estimation methods for a generalized mixture model, which accommodates the usual (independent) and the first-order (dependent) models in one framework, and QTL (quantitative trait locus) mapping models for independent and pedigree data. For clustering genes through a mixture model, we propose three nonparametric Bayesian methods: a marginal nested Dirichlet process (NDP), which is able to cluster distributions, and, for clustering big data, a predictive recursion clustering scheme (PRC) and a subset nonparametric Bayesian (SNOB) clustering algorithm. We analyze and compare the performance of the proposed methods and traditional procedures of selection, estimation and clustering in simulated and real data sets. The proposed methods are more flexible, improve the convergence of the algorithms and provide more accurate estimates in many situations. In addition, we propose methods for predicting nonobservable QTL genotypes and missing parents and for improving the Mendelian probability of inheritance of nonfounder genotypes using conditional independence structures. We also suggest applying diagnostic measures to check the goodness of fit of QTL mapping models.
id SCAR_7cf69a8392ba350b7ad6181ef2154f38
oai_identifier_str oai:repositorio.ufscar.br:20.500.14289/8426
network_acronym_str SCAR
network_name_str Repositório Institucional da UFSCAR
repository_id_str
spelling Funding agency: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). Files: TeseDAZ.pdf (application/pdf, 20535130 bytes); license.txt (text/plain; charset=utf-8, 1957 bytes); TeseDAZ.pdf.txt (extracted text, text/plain, 460345 bytes); TeseDAZ.pdf.jpg (thumbnail, image/jpeg, 2266 bytes). Record timestamp: 2025-02-05T21:56:04. OAI endpoint: https://repositorio.ufscar.br/oai/request. OpenDOAR: 4322.
dc.title.eng.fl_str_mv Efficient bayesian methods for mixture models with genetic applications
dc.title.alternative.eng.fl_str_mv Métodos bayesianos eficientes para modelos de mistura com aplicações em genética
title Efficient bayesian methods for mixture models with genetic applications
spellingShingle Efficient bayesian methods for mixture models with genetic applications
Zuanetti, Daiane Aparecida
Mixture models
Data-driven bayesian methods
Nonparametric bayesian methods
QTL mapping
Clustering distributions
CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA::ESTATISTICA::ANALISE DE DADOS
title_short Efficient bayesian methods for mixture models with genetic applications
title_full Efficient bayesian methods for mixture models with genetic applications
title_fullStr Efficient bayesian methods for mixture models with genetic applications
title_full_unstemmed Efficient bayesian methods for mixture models with genetic applications
title_sort Efficient bayesian methods for mixture models with genetic applications
author Zuanetti, Daiane Aparecida
author_facet Zuanetti, Daiane Aparecida
author_role author
dc.contributor.authorlattes.por.fl_str_mv http://lattes.cnpq.br/8352484284929824
dc.contributor.author.fl_str_mv Zuanetti, Daiane Aparecida
dc.contributor.advisor1.fl_str_mv Milan, Luis Aparecido
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/7435391829973844
dc.contributor.authorID.fl_str_mv b32a2fc3-5d19-41db-9bab-08a95238ddf5
contributor_str_mv Milan, Luis Aparecido
dc.subject.eng.fl_str_mv Mixture models
Data-driven bayesian methods
Nonparametric bayesian methods
QTL mapping
Clustering distributions
topic Mixture models
Data-driven bayesian methods
Nonparametric bayesian methods
QTL mapping
Clustering distributions
CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA::ESTATISTICA::ANALISE DE DADOS
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA::ESTATISTICA::ANALISE DE DADOS
description We propose Bayesian methods for selecting and estimating different types of mixture models which are widely used in Genetics and Molecular Biology. We specifically propose data-driven selection and estimation methods for a generalized mixture model, which accommodates the usual (independent) and the first-order (dependent) models in one framework, and QTL (quantitative trait locus) mapping models for independent and pedigree data. For clustering genes through a mixture model, we propose three nonparametric Bayesian methods: a marginal nested Dirichlet process (NDP), which is able to cluster distributions, and, for clustering big data, a predictive recursion clustering scheme (PRC) and a subset nonparametric Bayesian (SNOB) clustering algorithm. We analyze and compare the performance of the proposed methods and traditional procedures of selection, estimation and clustering in simulated and real data sets. The proposed methods are more flexible, improve the convergence of the algorithms and provide more accurate estimates in many situations. In addition, we propose methods for predicting nonobservable QTL genotypes and missing parents and for improving the Mendelian probability of inheritance of nonfounder genotypes using conditional independence structures. We also suggest applying diagnostic measures to check the goodness of fit of QTL mapping models.
publishDate 2016
dc.date.issued.fl_str_mv 2016-12-14
dc.date.accessioned.fl_str_mv 2017-01-17T11:47:50Z
dc.date.available.fl_str_mv 2017-01-17T11:47:50Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv ZUANETTI, Daiane Aparecida. Efficient bayesian methods for mixture models with genetic applications. 2016. Tese (Doutorado em Estatística) – Universidade Federal de São Carlos, São Carlos, 2016. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/8426.
dc.identifier.uri.fl_str_mv https://repositorio.ufscar.br/handle/20.500.14289/8426
identifier_str_mv ZUANETTI, Daiane Aparecida. Efficient bayesian methods for mixture models with genetic applications. 2016. Tese (Doutorado em Estatística) – Universidade Federal de São Carlos, São Carlos, 2016. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/8426.
url https://repositorio.ufscar.br/handle/20.500.14289/8426
dc.language.iso.fl_str_mv por
language por
dc.relation.confidence.fl_str_mv 600
600
dc.relation.authority.fl_str_mv 01874dfd-bd1b-409c-81e8-3185c83eacf2
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de São Carlos
Câmpus São Carlos
dc.publisher.program.fl_str_mv Programa Interinstitucional de Pós-Graduação em Estatística - PIPGEs
dc.publisher.initials.fl_str_mv UFSCar
publisher.none.fl_str_mv Universidade Federal de São Carlos
Câmpus São Carlos
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFSCAR
instname:Universidade Federal de São Carlos (UFSCAR)
instacron:UFSCAR
instname_str Universidade Federal de São Carlos (UFSCAR)
instacron_str UFSCAR
institution UFSCAR
reponame_str Repositório Institucional da UFSCAR
collection Repositório Institucional da UFSCAR
bitstream.url.fl_str_mv https://repositorio.ufscar.br/bitstreams/6ce9831b-1476-4bbc-b728-f3cc5093e46c/download
https://repositorio.ufscar.br/bitstreams/fff1ba2e-f6df-4c80-bf6f-47e39d4ded0c/download
https://repositorio.ufscar.br/bitstreams/b1635b78-27f8-404b-819a-30cb04c0868b/download
https://repositorio.ufscar.br/bitstreams/5a2c7bcd-601c-45fc-a77a-7b914732b32d/download
bitstream.checksum.fl_str_mv 82585444ba6f0568a20adac88fdfc626
ae0398b6f8b235e40ad82cba6c50031d
30bbdda77557fea53dcc1ee86ae35b96
ee6b66ddbee349433340c027bf5650d0
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
repository.mail.fl_str_mv repositorio.sibi@ufscar.br
_version_ 1851688942984757248