Archetypal analysis as an imputation method and multivariate data augmentation
| Year of defense: | 2021 |
|---|---|
| Main author: | Cavalcanti, Pórtya Piscitelli |
| Advisor: | Dias, Carlos Tadeu dos Santos |
| Defense committee: | |
| Document type: | Thesis |
| Access type: | Open access |
| Language: | eng |
| Defense institution: | Biblioteca Digital de Teses e Dissertações da USP |
| Graduate program: | Not provided by the institution |
| Department: | Not provided by the institution |
| Country: | Not provided by the institution |
| Keywords in Portuguese: | Dados faltantes; Estatística multivariada; Estudo de simulação; Método não supervisionado |
| Access link: | https://www.teses.usp.br/teses/disponiveis/11/11134/tde-12112021-114459/ |
| Abstract: | Multivariate statistics studies the relationships among a set of random variables and how to analyze them simultaneously. In multivariate statistics, archetypes are extreme elements capable of expressing all observations of a sample, or population, by means of linear combinations. Through Archetypal Analysis (AA), a multivariate technique that aims to reduce the dimensionality of observations, it is possible to find and select the archetypes, which are themselves convex combinations of the data. AA can be applied in several areas of knowledge, and archetypes can be used in different ways. In this thesis, we propose two uses of AA in multivariate contexts: as a sample augmentation method and as an imputation method. The first approach was studied on samples of correlated bivariate normal random variables with different covariance structures, and a simulation study was carried out to evaluate three proposed algorithms and compare them with traditional methods. It was observed that, regardless of the correlation structure between the variables, the sample size can be increased by up to 20%. The second approach evaluated the use of archetypes to impute values by single and multiple imputation in a multivariate dataset with simulated missing data. A simulation study was also conducted to evaluate the proposed methods and compare them with traditional ones. The results were promising, and the imputed values were very similar to the original ones. Therefore, in both approaches discussed in this work, the results point to the ability of archetypes to represent the dataset, and thus to express it as new data or to fill in possible missing values satisfactorily. |
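
The abstract describes archetypes as convex combinations of the data that, in turn, represent each observation as a convex combination of archetypes. The following is a minimal NumPy sketch under our own assumptions; it is not the thesis's code, nor any of its three proposed augmentation algorithms or its imputation procedure. It fits `k` archetypes by minimizing the residual sum of squares of `X ≈ A @ B @ X`, with every row of `A` and `B` constrained to the probability simplex, using plain alternating projected-gradient steps (a solver chosen here for brevity). The helpers `augment` and `impute_row` are hypothetical illustrations of the two uses named in the abstract: forming new points from Dirichlet-weighted mixtures of the archetypes, and filling missing coordinates from simplex weights fitted on the observed ones.

```python
import numpy as np


def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex
    (non-negative entries summing to 1), using the sort-based rule."""
    n, k = V.shape
    U = np.sort(V, axis=1)[:, ::-1]                 # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    cond = U - css / np.arange(1, k + 1) > 0        # True on a leading block of each row
    rho = k - 1 - np.argmax(cond[:, ::-1], axis=1)  # last index where cond holds
    theta = css[np.arange(n), rho] / (rho + 1)
    return np.maximum(V - theta[:, None], 0.0)


def archetypal_analysis(X, k, n_iter=500, seed=0):
    """Fit k archetypes Z = B @ X so that X is approximated by A @ Z,
    with every row of A (n x k) and B (k x n) on the simplex.
    Solver: alternating projected-gradient steps (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    B = np.zeros((k, n))
    B[np.arange(k), rng.choice(n, size=k, replace=False)] = 1.0  # start at k data points
    A = np.full((n, k), 1.0 / k)
    xxt_norm = np.linalg.norm(X @ X.T, 2)           # spectral norm, reused for B's step size
    for _ in range(n_iter):
        Z = B @ X
        # A-step: gradient of ||A Z - X||^2, step = 1 / Lipschitz constant
        L_A = 2.0 * np.linalg.norm(Z @ Z.T, 2) + 1e-12
        A = project_rows_to_simplex(A - 2.0 * (A @ Z - X) @ Z.T / L_A)
        # B-step: gradient of ||A B X - X||^2, step = 1 / Lipschitz constant
        L_B = 2.0 * np.linalg.norm(A.T @ A, 2) * xxt_norm + 1e-12
        grad_B = 2.0 * A.T @ (A @ B @ X - X) @ X.T
        B = project_rows_to_simplex(B - grad_B / L_B)
    Z = B @ X
    rss = float(np.sum((X - A @ Z) ** 2))
    return Z, A, B, rss


def augment(Z, m, seed=0):
    """Hypothetical augmentation: m new points as random convex
    combinations (Dirichlet weights) of the archetypes."""
    rng = np.random.default_rng(seed)
    W = rng.dirichlet(np.ones(Z.shape[0]), size=m)  # m x k, rows sum to 1
    return W @ Z


def impute_row(x, Z, n_iter=500):
    """Hypothetical single imputation: fit simplex weights on the observed
    coordinates of x, then fill the NaN coordinates from the same mixture."""
    obs = ~np.isnan(x)
    Zo = Z[:, obs]
    a = np.full((1, Z.shape[0]), 1.0 / Z.shape[0])
    L = 2.0 * np.linalg.norm(Zo @ Zo.T, 2) + 1e-12
    for _ in range(n_iter):
        a = project_rows_to_simplex(a - 2.0 * (a @ Zo - x[obs]) @ Zo.T / L)
    filled = x.copy()
    filled[~obs] = (a @ Z)[0, ~obs]
    return filled
```

For instance, after fitting `k=3` archetypes to 200 draws from `np.random.default_rng().multivariate_normal` with a correlated covariance matrix, `augment(Z, m=40)` would add 40 points, a 20% increase in sample size. Because every generated or imputed value is a convex combination of archetypes, it can never fall outside the archetypes' convex hull.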

| id | USP_600823bcf57d71d19c0d8dc0b80bd511 |
|---|---|
| oai_identifier_str | oai:teses.usp.br:tde-12112021-114459 |
| network_acronym_str | USP |
| network_name_str | Biblioteca Digital de Teses e Dissertações da USP |
| repository_id_str | |
| dc.title.none.fl_str_mv | Archetypal analysis as an imputation method and multivariate data augmentation; Análise de Arquétipos como método de imputação e aumento de dados multivariados |
| author | Cavalcanti, Pórtya Piscitelli |
| author_role | author |
| dc.contributor.none.fl_str_mv | Dias, Carlos Tadeu dos Santos |
| dc.subject.por.fl_str_mv | Dados faltantes; Estatística multivariada; Estudo de simulação; Método não supervisionado; Missing data; Multivariate statistics; Simulation study; Unsupervised method |
| publishDate | 2021 |
| dc.date.none.fl_str_mv | 2021-09-08 |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/doctoralThesis |
| format | doctoralThesis |
| status_str | publishedVersion |
| dc.identifier.uri.fl_str_mv | https://www.teses.usp.br/teses/disponiveis/11/11134/tde-12112021-114459/ |
| dc.language.iso.fl_str_mv | eng |
| dc.relation.none.fl_str_mv | |
| dc.rights.driver.fl_str_mv | Release the content for public access. info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.coverage.none.fl_str_mv | |
| dc.publisher.none.fl_str_mv | Biblioteca Digital de Teses e Dissertações da USP |
| dc.source.none.fl_str_mv | reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
| instname_str | Universidade de São Paulo (USP) |
| instacron_str | USP |
| reponame_str | Biblioteca Digital de Teses e Dissertações da USP |
| repository.name.fl_str_mv | Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
| repository.mail.fl_str_mv | virginia@if.usp.br; atendimento@aguia.usp.br |
| _version_ | 1815258149292081152 |