Comparing two populations using Bayesian Fourier series density estimation
| Year of defense: | 2017 |
|---|---|
| Main author: | Inacio, Marco Henrique de Almeida |
| Advisor: | Izbicki, Rafael |
| Examination committee: | |
| Document type: | Master's thesis |
| Access type: | Open access |
| Language: | eng |
| Defending institution: |
Universidade Federal de São Carlos
Câmpus São Carlos |
| Graduate program: |
Programa Interinstitucional de Pós-Graduação em Estatística - PIPGEs
|
| Department: |
Not provided by the institution
|
| Country: |
Not provided by the institution
|
| Keywords in Portuguese: | Séries de Fourier; Séries ortogonais; Estimação de densidade; Amostragem discreta |
| Keywords in English: | Fourier series; Orthogonal series; Density estimation; Discrete sampling |
| CNPq knowledge area: | CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA |
| Access link: | https://repositorio.ufscar.br/handle/20.500.14289/8920 |
Abstract: | Given two samples from two populations, one could ask how similar the populations are, that is, how close their probability distributions are. For absolutely continuous distributions, one way to measure the proximity of such populations is to use a distance (metric) between their probability density functions, which are unknown because only samples are observed. In this work, we use the integrated squared distance as the metric. To quantify the uncertainty about the integrated squared distance, we first model the uncertainty about each probability density function using a nonparametric Bayesian method. The method consists of estimating the probability density function f (or its logarithm) using a Fourier series {f_0, f_1, ..., f_I}. Assigning a prior distribution to f is then equivalent to assigning a prior distribution to the coefficients of this series. We use the sieve prior suggested by Scricciolo (2006), which places a prior not only on these coefficients but also on I itself, so that in practice we work with a Bayesian mixture of finite-dimensional models. To obtain posterior samples from this mixture, we marginalize out the discrete model index parameter I and use the statistical software Stan. We conclude that the Bayesian Fourier series method performs well when compared with kernel density estimation, although both methods often have problems estimating the probability density function near the boundaries. Lastly, we show how the Fourier series methodology can be used to assess the uncertainty regarding the similarity of two samples. In particular, we apply this method to a dataset of patients with Alzheimer's disease. |
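
The abstract's core idea, comparing two densities through their series coefficients, can be illustrated with a minimal sketch. This is not the thesis's Bayesian method: it swaps in a simple plug-in (sample-mean) estimate of each coefficient, fixes the number of terms instead of placing a prior on I, and assumes data supported on [0, 1] with an orthonormal cosine basis. The function names, the Beta-distributed synthetic samples, and `n_terms = 8` are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_basis(x, n_terms):
    """Evaluate an orthonormal cosine basis on [0, 1] at the points x.

    phi_0(x) = 1 and phi_k(x) = sqrt(2) * cos(pi * k * x) for k >= 1.
    Returns an array of shape (len(x), n_terms + 1).
    """
    x = np.asarray(x, dtype=float)
    k = np.arange(n_terms + 1)
    basis = np.sqrt(2.0) * np.cos(np.pi * k * x[:, None])
    basis[:, 0] = 1.0  # the constant term is not scaled by sqrt(2)
    return basis


def series_coefficients(sample, n_terms):
    """Plug-in estimate of each coefficient: the sample mean of phi_k(X)."""
    return cosine_basis(sample, n_terms).mean(axis=0)


def integrated_squared_distance(coef_f, coef_g):
    """Integral of (f - g)^2 over [0, 1] for two truncated series.

    Because the basis is orthonormal, the integral reduces to the sum of
    squared coefficient differences.
    """
    return float(np.sum((coef_f - coef_g) ** 2))


# Synthetic data (assumption): two samples from the same Beta
# distribution and one sample from a different Beta distribution.
rng = np.random.default_rng(0)
sample_a = rng.beta(2.0, 5.0, size=2000)
sample_b = rng.beta(5.0, 2.0, size=2000)
sample_c = rng.beta(2.0, 5.0, size=2000)

n_terms = 8  # fixed truncation level, chosen arbitrarily for the sketch
coef_a = series_coefficients(sample_a, n_terms)
coef_b = series_coefficients(sample_b, n_terms)
coef_c = series_coefficients(sample_c, n_terms)

dist_different = integrated_squared_distance(coef_a, coef_b)
dist_same = integrated_squared_distance(coef_a, coef_c)
print("different populations:", dist_different)
print("same population:", dist_same)
```

In a Bayesian treatment such as the one the abstract describes, the same distance would be computed for each posterior draw of the coefficients, which yields a posterior distribution for the distance rather than a single number.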
| id |
SCAR_38f61f2a228412278a93bd22d7cb0082 |
|---|---|
| oai_identifier_str |
oai:repositorio.ufscar.br:20.500.14289/8920 |
| network_acronym_str |
SCAR |
| network_name_str |
Repositório Institucional da UFSCAR |
| repository_id_str |
|
| dc.title.eng.fl_str_mv |
Comparing two populations using Bayesian Fourier series density estimation |
| dc.title.alternative.por.fl_str_mv |
Comparação de duas populações utilizando estimação bayesiana de densidades por séries de Fourier |
| title |
Comparing two populations using Bayesian Fourier series density estimation |
| author |
Inacio, Marco Henrique de Almeida |
| author_facet |
Inacio, Marco Henrique de Almeida |
| author_role |
author |
| dc.contributor.authorlattes.por.fl_str_mv |
http://lattes.cnpq.br/1931901020027887 |
| dc.contributor.author.fl_str_mv |
Inacio, Marco Henrique de Almeida |
| dc.contributor.advisor1.fl_str_mv |
Izbicki, Rafael |
| dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/9991192137633896 |
| dc.contributor.authorID.fl_str_mv |
2088c08d-706d-46ee-b538-1353de75519d |
| contributor_str_mv |
Izbicki, Rafael |
| dc.subject.por.fl_str_mv |
Séries de Fourier Séries ortogonais Estimação de densidade Amostragem discreta |
| topic |
Séries de Fourier Séries ortogonais Estimação de densidade Amostragem discreta Fourier series Orthogonal series Density estimation Discrete sampling CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA |
| dc.subject.eng.fl_str_mv |
Fourier series Orthogonal series Density estimation Discrete sampling |
| dc.subject.cnpq.fl_str_mv |
CIENCIAS EXATAS E DA TERRA::PROBABILIDADE E ESTATISTICA |
| publishDate |
2017 |
| dc.date.accessioned.fl_str_mv |
2017-08-07T17:57:44Z |
| dc.date.available.fl_str_mv |
2017-08-07T17:57:44Z |
| dc.date.issued.fl_str_mv |
2017-04-12 |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
| format |
masterThesis |
| status_str |
publishedVersion |
| dc.identifier.citation.fl_str_mv |
INACIO, Marco Henrique de Almeida. Comparing two populations using Bayesian Fourier series density estimation. 2017. Dissertação (Mestrado em Estatística) – Universidade Federal de São Carlos, São Carlos, 2017. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/8920. |
| dc.identifier.uri.fl_str_mv |
https://repositorio.ufscar.br/handle/20.500.14289/8920 |
| identifier_str_mv |
INACIO, Marco Henrique de Almeida. Comparing two populations using Bayesian Fourier series density estimation. 2017. Dissertação (Mestrado em Estatística) – Universidade Federal de São Carlos, São Carlos, 2017. Disponível em: https://repositorio.ufscar.br/handle/20.500.14289/8920. |
| url |
https://repositorio.ufscar.br/handle/20.500.14289/8920 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.relation.confidence.fl_str_mv |
600 600 |
| dc.relation.authority.fl_str_mv |
3e57f161-19fe-4345-9e87-bc60eb7be98f |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.publisher.none.fl_str_mv |
Universidade Federal de São Carlos Câmpus São Carlos |
| dc.publisher.program.fl_str_mv |
Programa Interinstitucional de Pós-Graduação em Estatística - PIPGEs |
| dc.publisher.initials.fl_str_mv |
UFSCar |
| publisher.none.fl_str_mv |
Universidade Federal de São Carlos Câmpus São Carlos |
| dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR |
| instname_str |
Universidade Federal de São Carlos (UFSCAR) |
| instacron_str |
UFSCAR |
| institution |
UFSCAR |
| reponame_str |
Repositório Institucional da UFSCAR |
| collection |
Repositório Institucional da UFSCAR |
| bitstream.url.fl_str_mv |
https://repositorio.ufscar.br/bitstreams/672f946b-f3c1-4b69-bb7d-b30b366b6fee/download https://repositorio.ufscar.br/bitstreams/12c7be87-41c9-4d46-aa47-80daab1a3df8/download https://repositorio.ufscar.br/bitstreams/8b463d4b-bd70-4a91-8976-6affc863be30/download https://repositorio.ufscar.br/bitstreams/4a8268bc-8eb3-49dd-9c81-77827fa27e73/download |
| bitstream.checksum.fl_str_mv |
1bb98ae57371ab00d2c86311b02054cb ae0398b6f8b235e40ad82cba6c50031d 7748449ca25c6067ea4d6bdca216faaf 50beff09febd4b86309df5c31df11e73 |
| bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 |
| repository.name.fl_str_mv |
Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR) |
| repository.mail.fl_str_mv |
repositorio.sibi@ufscar.br |
| _version_ |
1851688795765735424 |