Essays on misspecification detection in double bounded random variables modeling

Bibliographic details
Year of defense: 2023
Main author: SILVA, José Jairo de Santana e
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Thesis
Access type: Embargoed access
Language: eng
Degree-granting institution: Universidade Federal de Pernambuco
UFPE
Brasil
Programa de Pos Graduacao em Estatistica
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese: Estatística matemática; Bootstrap; Distribuição beta
Access link: https://repositorio.ufpe.br/handle/123456789/52017
Abstract: The beta distribution is routinely used to model variables that assume values in the standard unit interval. Several alternative laws have nonetheless been proposed in the literature, such as the Kumaraswamy and simplex distributions. A natural and empirically motivated question is: does the beta law provide an adequate representation for a given dataset? We test the null hypothesis that the beta model is correctly specified against the alternative hypothesis that it does not provide an adequate fit to the data. Our tests are based on the information matrix equality, which holds only when the model is correctly specified. They are thus sensitive to model misspecification. Simulation evidence shows that the tests perform well, especially when coupled with bootstrap resampling. We model state and county Covid-19 mortality rates in the United States. The misspecification tests indicate that the beta law successfully represents Covid-19 death rates when they are computed using either data from before the start of the vaccination campaign or data collected while the campaign was under way. In the latter case, the beta law is accepted only when the effect of vaccination coverage in lowering death rates is moderate. The beta model is rejected under data heterogeneity, i.e., when mortality rates are computed using information gathered during both time periods.

The beta regression model is tailored for responses that assume values in the standard unit interval. In its more general formulation, it comprises two submodels, one for the mean response and another for the precision parameter. We develop tests of correct specification for this model. The tests are based on the information matrix equality, which fails to hold when the model is incorrectly specified. We establish the validity of the tests in the class of varying precision beta regressions, provide closed-form expressions for the quantities used in the test statistics, and present simulation evidence on the tests’ null and non-null behavior. We show that very good control of the type I error probability can be achieved when data resampling is employed and that the tests reliably detect incorrect model specification, especially when the sample size is not small. Two empirical applications are presented and discussed.

Diagnostic analyses in regression modeling are usually based on residuals or local influence measures and are used for detecting atypical observations. We develop a new approach for detecting such observations when the parameters of the model are estimated by maximum likelihood. It is based on the information matrix equality, which holds when the model is correctly specified. We consider different measures of the distance between two symmetric matrices and apply them to the sample counterparts of the matrices in the information matrix equality in such a way that zero distance corresponds to correct model specification. The distance measures thus quantify the degree of model adequacy. We use them to identify observations that are atypical because they disproportionately alter the degree of model adequacy. We also introduce a modified generalized Cook’s distance and a new criterion that uses the two generalized Cook’s distances (modified and unmodified). Empirical applications involving Gaussian and beta models are presented and discussed.
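As a rough illustration of the idea in the abstract, the information matrix equality says that, under correct specification, the expected Hessian of the log-density and the expected outer product of the score sum to the zero matrix. The sketch below evaluates the sample counterpart of that sum for a Gaussian model (chosen only because its derivatives have simple closed forms; the thesis focuses on beta models) and measures its size with the Frobenius norm. The choice of the Frobenius norm, the leave-one-out comparison, and all function names are assumptions made for illustration; the record does not state which distance measures or deletion scheme the thesis uses.

```python
import math
import random


def gaussian_mle(ys):
    """Return the maximum likelihood estimates (mean, variance) of a Gaussian sample."""
    n = len(ys)
    mu = sum(ys) / n
    var = sum((y - mu) ** 2 for y in ys) / n  # the ML estimate divides by n, not n - 1
    return mu, var


def im_discrepancy(ys):
    """Return D = A_n + B_n evaluated at the MLE, as a 2x2 nested list.

    A_n is the average Hessian of the log-density and B_n the average outer
    product of the score, both with respect to theta = (mean, variance).
    """
    mu, var = gaussian_mle(ys)
    n = len(ys)
    d = [[0.0, 0.0], [0.0, 0.0]]
    for y in ys:
        e = y - mu
        score = (e / var, -0.5 / var + 0.5 * e * e / var ** 2)
        hess = (
            (-1.0 / var, -e / var ** 2),
            (-e / var ** 2, 0.5 / var ** 2 - e * e / var ** 3),
        )
        for r in range(2):
            for c in range(2):
                d[r][c] += (hess[r][c] + score[r] * score[c]) / n
    return d


def frobenius_distance(ys):
    """Frobenius norm of A_n + B_n (assumed distance; zero means the equality holds in-sample)."""
    d = im_discrepancy(ys)
    return math.sqrt(sum(v * v for row in d for v in row))


def leave_one_out_changes(ys):
    """Change in the distance when each observation is deleted in turn."""
    full = frobenius_distance(ys)
    return [full - frobenius_distance(ys[:i] + ys[i + 1:]) for i in range(len(ys))]


# Simulated data: 200 standard normal draws plus one artificial outlier.
rng = random.Random(2023)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
contaminated = sample + [8.0]

changes = leave_one_out_changes(contaminated)
flagged = max(range(len(changes)), key=lambda i: abs(changes[i]))
```

In this toy setting, `flagged` is the index of the observation whose deletion moves the distance the most, which is one simple way to read "observations that disproportionately alter the degree of model adequacy". For the Gaussian model the off-diagonal and variance entries of the discrepancy reduce to functions of the sample third and fourth central moments, so the distance reacts to skewness and excess kurtosis.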
id UFPE_a1da376d4817ed5f2bbb5d19cc6089fa
oai_identifier_str oai:repositorio.ufpe.br:123456789/52017
network_acronym_str UFPE
network_name_str Repositório Institucional da UFPE
repository_id_str
spelling Essays on misspecification detection in double bounded random variables modeling
Estatística matemática
Bootstrap
Distribuição beta
CAPES
The beta distribution is routinely used to model variables that assume values in the standard unit interval. Several alternative laws have, however, been proposed in the literature, such as the Kumaraswamy and simplex distributions. A natural and empirically motivated question is: does the beta law provide an adequate representation for the data under analysis? We test the null hypothesis that the beta model is correctly specified against the alternative hypothesis that it does not provide an adequate fit to the data. Our tests are based on the information matrix equality, which holds only when the model is correctly specified. The tests are therefore sensitive to any form of model misspecification. Simulation results show that the tests perform well, especially when used with bootstrap resampling. We model state and county Covid-19 mortality rates in the United States. Our misspecification tests indicate that the beta law adequately represents Covid-19 mortality rates when they are computed from data prior to the start of the Covid-19 vaccination campaign or from data collected when that campaign was already under way. In the latter case, the beta law is accepted only when the impact of vaccination on mortality rates is moderate. The beta model is rejected under data heterogeneity, that is, when mortality rates are computed using information collected during both time periods. The misspecification tests are extended to cover the varying precision beta regression model. The beta regression model is used with dependent variables that assume values in the standard unit interval, (0,1). In its most general formulation, it contains two submodels, one for the mean and another for the precision parameter. We present closed-form expressions for information matrix test statistics in this class of models. Bootstrap resampling is used to achieve better control of the type I error frequency. Monte Carlo simulation results on the behavior of the tests are presented, both under the null hypothesis and under the alternative hypothesis. The results indicate that the tests are typically able to detect incorrect model specification, especially when the sample size is not small. Diagnostic analysis in regression modeling is usually carried out on the basis of residual analysis or local influence. We develop a new approach for detecting atypical data points in models whose parameters are estimated by maximum likelihood. The new approach uses the information matrix equality, which holds when the model is correctly specified. We consider different measures of the distance between two symmetric matrices and use them with the sample counterparts of the matrices in the information matrix equality in such a way that zero distance corresponds to correct model specification. The distance measures we use thus quantify the degree of model adequacy. We show that they can be used to identify observations that contribute disproportionately to altering the degree of model adequacy. We also introduce a modified generalized Cook's distance and a new criterion that uses the two generalized Cook's distances (modified and unmodified). Empirical applications involving Gaussian and beta regression models are presented and discussed.
Universidade Federal de Pernambuco
UFPE
Brasil
Programa de Pos Graduacao em Estatistica
CRIBARI-NETO, Francisco
VASCONCELLOS, Klaus Leite Pinto
http://lattes.cnpq.br/6362072625585117
http://lattes.cnpq.br/2225977664095899
http://lattes.cnpq.br/4556088473868411
SILVA, José Jairo de Santana e
2023-08-22T13:08:02Z
2023-08-22T13:08:02Z
2023-07-27
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/doctoralThesis
application/pdf
SILVA, José Jairo de Santana e. Essays on misspecification detection in double bounded random variables modeling. 2023. Tese (Doutorado em Estatística) – Universidade Federal de Pernambuco, Recife, 2023.
https://repositorio.ufpe.br/handle/123456789/52017
eng
Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
info:eu-repo/semantics/embargoedAccess
reponame:Repositório Institucional da UFPE
instname:Universidade Federal de Pernambuco (UFPE)
instacron:UFPE
2023-08-23T05:17:44Z
oai:repositorio.ufpe.br:123456789/52017
Repositório Institucional
PUB
https://repositorio.ufpe.br/oai/request
attena@ufpe.br
opendoar:2221
2023-08-23T05:17:44
Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
false
dc.title.none.fl_str_mv Essays on misspecification detection in double bounded random variables modeling
title Essays on misspecification detection in double bounded random variables modeling
spellingShingle Essays on misspecification detection in double bounded random variables modeling
SILVA, José Jairo de Santana e
Estatística matemática
Bootstrap
Distribuição beta
author SILVA, José Jairo de Santana e
author_facet SILVA, José Jairo de Santana e
author_role author
dc.contributor.none.fl_str_mv CRIBARI-NETO, Francisco
VASCONCELLOS, Klaus Leite Pinto
http://lattes.cnpq.br/6362072625585117
http://lattes.cnpq.br/2225977664095899
http://lattes.cnpq.br/4556088473868411
dc.contributor.author.fl_str_mv SILVA, José Jairo de Santana e
dc.subject.por.fl_str_mv Estatística matemática
Bootstrap
Distribuição beta
topic Estatística matemática
Bootstrap
Distribuição beta
publishDate 2023
dc.date.none.fl_str_mv 2023-08-22T13:08:02Z
2023-08-22T13:08:02Z
2023-07-27
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv SILVA, José Jairo de Santana e. Essays on misspecification detection in double bounded random variables modeling. 2023. Tese (Doutorado em Estatística) – Universidade Federal de Pernambuco, Recife, 2023.
https://repositorio.ufpe.br/handle/123456789/52017
identifier_str_mv SILVA, José Jairo de Santana e. Essays on misspecification detection in double bounded random variables modeling. 2023. Tese (Doutorado em Estatística) – Universidade Federal de Pernambuco, Recife, 2023.
url https://repositorio.ufpe.br/handle/123456789/52017
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
info:eu-repo/semantics/embargoedAccess
rights_invalid_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
eu_rights_str_mv embargoedAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Pernambuco
UFPE
Brasil
Programa de Pos Graduacao em Estatistica
publisher.none.fl_str_mv Universidade Federal de Pernambuco
UFPE
Brasil
Programa de Pos Graduacao em Estatistica
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFPE
instname:Universidade Federal de Pernambuco (UFPE)
instacron:UFPE
instname_str Universidade Federal de Pernambuco (UFPE)
instacron_str UFPE
institution UFPE
reponame_str Repositório Institucional da UFPE
collection Repositório Institucional da UFPE
repository.name.fl_str_mv Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
repository.mail.fl_str_mv attena@ufpe.br
_version_ 1856042063368290304