Métodos para detecção de outliers em séries de preços do índice de preços ao consumidor

Bibliographic details
Year of defense: 2014
Main author: Lyra, Taíse Ferraz
Advisor: Carvalho, Paulo Cezar P.
Examination committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: por
Defending institution: Not provided by the institution
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese: Intervalo de tolerância; Proporção de acerto; Falsos positivos
Keywords in English: Tolerance interval; Hit rate; False positives
Link de acesso: https://hdl.handle.net/10438/11780
Abstract: Outliers are observations that appear to be inconsistent with the rest of the data. Also called atypical, extreme, or aberrant values, they can be caused, for instance, by policy changes or economic crises, unexpected cold or heat waves, and measurement or typing errors. Outliers are not necessarily incorrect values, but when they stem from measurement or typing errors they can distort the results of an analysis and lead researchers to erroneous conclusions. The objective of this research is to study and compare different methods for detecting anomalies in the price series of the Consumer Price Index (Índice de Preços ao Consumidor - IPC), calculated by the Brazilian Institute of Economics (Instituto Brasileiro de Economia - IBRE) of the Getulio Vargas Foundation (Fundação Getulio Vargas - FGV). The IPC measures the price variation of a fixed set of goods and services that make up the customary expenses of families with income between 1 and 33 monthly minimum wages, and it is mainly used as a reference index for evaluating consumers' purchasing power. In addition to the method currently used by price analysts at IBRE, the study considered variations of the IBRE Method, the Boxplot Method, the SIQR Boxplot Method, the Adjusted Boxplot Method, the Resistant Fences Method, the Quartile Method, the Modified Quartile Method, the Median Absolute Deviation Method, and the Tukey Algorithm. These methods were applied to data from the municipalities of Rio de Janeiro and São Paulo.
To analyze the performance of each method, the true extreme values must be known in advance. In this study, therefore, prices that analysts discarded or changed during the price review process were assumed to be the true outliers. The IBRE Method is strongly correlated with the prices altered or discarded by analysts, so this assumption can influence the results and favor the IBRE Method over the others. Nevertheless, the assumption makes it possible to compute two measures by which the methods are evaluated. The first is the method's hit rate, the proportion of true outliers that it detects. The second is the number of false positives produced by the method, which indicates how many values had to be flagged for one true outlier to be detected. The higher the hit rate and the lower the number of false positives, the better the method performs. On this basis, the methods were ranked by performance and the best among them was identified. For the municipality of Rio de Janeiro, some variations of the IBRE Method performed as well as or better than the original method. For the municipality of São Paulo, the IBRE Method performed best. A method is considered to detect an outlier correctly when it flags a true outlier as an extreme value. The method with the highest hit rate and the smallest number of false positives was the IBRE Method. In future work, we hope to test the methods on simulated data and on data sets widely used in the literature, so that the assumption about the prices discarded or changed during the review process does not affect the results.
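The two evaluation measures described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the sample data, the 1.5 fence multiplier, and the 0.6745 / 3.5 constants of the MAD rule are assumptions taken from common textbook forms of the boxplot and median absolute deviation rules, and the record does not state the parameters the author actually used.

```python
from statistics import median, quantiles


def boxplot_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (textbook boxplot rule; k is an assumption)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


def mad_flags(values, threshold=3.5):
    """Flag values whose MAD-based modified z-score exceeds the threshold (constants are assumptions)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: nothing can be scored, so flag nothing.
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


def evaluate(flags, truth):
    """Return (hit rate, values flagged per true outlier detected)."""
    hits = sum(f and t for f, t in zip(flags, truth))
    flagged = sum(flags)
    real = sum(truth)
    hit_rate = hits / real if real else float("nan")
    flagged_per_hit = flagged / hits if hits else float("inf")
    return hit_rate, flagged_per_hit


# Hypothetical price relatives; the eighth value plays the role of a price
# that an analyst changed or discarded, i.e. the assumed "true" outlier.
prices = [1.00, 1.02, 0.98, 1.01, 0.99, 1.03, 1.00, 1.50, 0.97, 1.01]
truth = [False] * 7 + [True] + [False] * 2

for name, rule in (("boxplot", boxplot_flags), ("MAD", mad_flags)):
    hit_rate, flagged_per_hit = evaluate(rule(prices), truth)
    print(name, hit_rate, flagged_per_hit)
```

A method ranks higher when the first number is larger and the second is smaller, which mirrors the ranking criterion stated in the abstract.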
id FGV_3e047d9e2857d90e8ef8dc6b958a14d9
oai_identifier_str oai:repositorio.fgv.br:10438/11780
network_acronym_str FGV
network_name_str Repositório Institucional do FGV (FGV Repositório Digital)
repository_id_str
spelling Lyra, Taíse Ferraz
Escolas::EMAp
Paixão, Crysttian Arantes
Silva, Moacyr Alvim Horta Barbosa da
Silva, Salomão Lipcovitch Quadros da
Zani, Sheila Cristina
Carvalho, Paulo Cezar P.
2014-05-26T19:28:52Z
2014-05-26T19:28:52Z
2014-02-24
LYRA, Taíse Ferraz. Métodos para detecção de outliers em séries de preços do índice de preços ao consumidor. Dissertação (Mestrado em Matemática Aplicada) - Escola de Matemática Aplicada, Fundação Getúlio Vargas - FGV, Rio de Janeiro, 2014.
https://hdl.handle.net/10438/11780
por
Tolerance interval
Hit rate
False positives
Intervalo de tolerância
Proporção de acerto
Falsos positivos
Matemática
Valores estranhos (Estatística)
Índices de preços ao consumidor
Índices de preços
Métodos para detecção de outliers em séries de preços do índice de preços ao consumidor
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
info:eu-repo/semantics/openAccess
reponame:Repositório Institucional do FGV (FGV Repositório Digital)
instname:Fundação Getulio Vargas (FGV)
instacron:FGV
ORIGINAL
Dissertação - Taíse Ferraz Lyra (Versão Final).pdf
Dissertação - Taíse Ferraz Lyra (Versão Final).pdf
Artigo Principal
application/pdf
1069993
https://repositorio.fgv.br/bitstreams/9ad29d3c-6721-481e-93f2-668073257608/download
3407689a27bfac06aff01d4fda05f6f2
MD5
1
LICENSE
license.txt
license.txt
text/plain; charset=utf-8
4707
https://repositorio.fgv.br/bitstreams/28751bd1-8f1a-416e-9942-070767e5e571/download
dfb340242cced38a6cca06c627998fa1
MD5
2
TEXT
Dissertação - Taíse Ferraz Lyra (Versão Final).pdf.txt
Dissertação - Taíse Ferraz Lyra (Versão Final).pdf.txt
Extracted text
text/plain
103187
https://repositorio.fgv.br/bitstreams/0f7ef27e-a690-47a1-bab6-44fb6152e74d/download
02ec2f82ae133b3086e7d3987da1ccaf
MD5
5
THUMBNAIL
Dissertação - Taíse Ferraz Lyra (Versão Final).pdf.jpg
Dissertação - Taíse Ferraz Lyra (Versão Final).pdf.jpg
Generated Thumbnail
image/jpeg
3061
https://repositorio.fgv.br/bitstreams/13ee6058-d2fb-413e-81ee-59000d5fae87/download
10965f6db8554a77f5b8cbf40e8310a3
MD5
6
10438/11780
2024-02-22 19:20:39.416
restricted
oai:repositorio.fgv.br:10438/11780
https://repositorio.fgv.br
Repositório Institucional
PRI
http://bibliotecadigital.fgv.br/dspace-oai/request
opendoar:3974
2024-02-22T19:20:39
Repositório Institucional do FGV (FGV Repositório Digital) - Fundação Getulio Vargas (FGV)
false
dc.title.por.fl_str_mv Métodos para detecção de outliers em séries de preços do índice de preços ao consumidor
title Métodos para detecção de outliers em séries de preços do índice de preços ao consumidor
spellingShingle Métodos para detecção de outliers em séries de preços do índice de preços ao consumidor
Lyra, Taíse Ferraz
Tolerance interval
Hit rate
False positives
Intervalo de tolerância
Proporção de acerto
Falsos positivos
Matemática
Valores estranhos (Estatística)
Índices de preços ao consumidor
Índices de preços
title_short Métodos para detecção de outliers em séries de preços do índice de preços ao consumidor
title_full Métodos para detecção de outliers em séries de preços do índice de preços ao consumidor
title_fullStr Métodos para detecção de outliers em séries de preços do índice de preços ao consumidor
title_full_unstemmed Métodos para detecção de outliers em séries de preços do índice de preços ao consumidor
title_sort Métodos para detecção de outliers em séries de preços do índice de preços ao consumidor
author Lyra, Taíse Ferraz
author_facet Lyra, Taíse Ferraz
author_role author
dc.contributor.unidadefgv.por.fl_str_mv Escolas::EMAp
dc.contributor.member.none.fl_str_mv Paixão, Crysttian Arantes
Silva, Moacyr Alvim Horta Barbosa da
Silva, Salomão Lipcovitch Quadros da
Zani, Sheila Cristina
dc.contributor.author.fl_str_mv Lyra, Taíse Ferraz
dc.contributor.advisor1.fl_str_mv Carvalho, Paulo Cezar P.
contributor_str_mv Carvalho, Paulo Cezar P.
dc.subject.eng.fl_str_mv Tolerance interval
topic Tolerance interval
Hit rate
False positives
Intervalo de tolerância
Proporção de acerto
Falsos positivos
Matemática
Valores estranhos (Estatística)
Índices de preços ao consumidor
Índices de preços
dc.subject.por.fl_str_mv Hit rate
False positives
Intervalo de tolerância
Proporção de acerto
Falsos positivos
dc.subject.area.por.fl_str_mv Matemática
dc.subject.bibliodata.por.fl_str_mv Valores estranhos (Estatística)
Índices de preços ao consumidor
Índices de preços
description Outliers are observations that appear to be inconsistent with the rest of the data. Also called atypical, extreme, or aberrant values, they can be caused, for instance, by policy changes or economic crises, unexpected cold or heat waves, and measurement or typing errors. Outliers are not necessarily incorrect values, but when they stem from measurement or typing errors they can distort the results of an analysis and lead researchers to erroneous conclusions. The objective of this research is to study and compare different methods for detecting anomalies in the price series of the Consumer Price Index (Índice de Preços ao Consumidor - IPC), calculated by the Brazilian Institute of Economics (Instituto Brasileiro de Economia - IBRE) of the Getulio Vargas Foundation (Fundação Getulio Vargas - FGV). The IPC measures the price variation of a fixed set of goods and services that make up the customary expenses of families with income between 1 and 33 monthly minimum wages, and it is mainly used as a reference index for evaluating consumers' purchasing power. In addition to the method currently used by price analysts at IBRE, the study considered variations of the IBRE Method, the Boxplot Method, the SIQR Boxplot Method, the Adjusted Boxplot Method, the Resistant Fences Method, the Quartile Method, the Modified Quartile Method, the Median Absolute Deviation Method, and the Tukey Algorithm. These methods were applied to data from the municipalities of Rio de Janeiro and São Paulo.
To analyze the performance of each method, the true extreme values must be known in advance. In this study, therefore, prices that analysts discarded or changed during the price review process were assumed to be the true outliers. The IBRE Method is strongly correlated with the prices altered or discarded by analysts, so this assumption can influence the results and favor the IBRE Method over the others. Nevertheless, the assumption makes it possible to compute two measures by which the methods are evaluated. The first is the method's hit rate, the proportion of true outliers that it detects. The second is the number of false positives produced by the method, which indicates how many values had to be flagged for one true outlier to be detected. The higher the hit rate and the lower the number of false positives, the better the method performs. On this basis, the methods were ranked by performance and the best among them was identified. For the municipality of Rio de Janeiro, some variations of the IBRE Method performed as well as or better than the original method. For the municipality of São Paulo, the IBRE Method performed best. A method is considered to detect an outlier correctly when it flags a true outlier as an extreme value. The method with the highest hit rate and the smallest number of false positives was the IBRE Method. In future work, we hope to test the methods on simulated data and on data sets widely used in the literature, so that the assumption about the prices discarded or changed during the review process does not affect the results.
publishDate 2014
dc.date.accessioned.fl_str_mv 2014-05-26T19:28:52Z
dc.date.available.fl_str_mv 2014-05-26T19:28:52Z
dc.date.issued.fl_str_mv 2014-02-24
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv LYRA, Taíse Ferraz. Métodos para detecção de outliers em séries de preços do índice de preços ao consumidor. Dissertação (Mestrado em Matemática Aplicada) - Escola de Matemática Aplicada, Fundação Getúlio Vargas - FGV, Rio de Janeiro, 2014.
dc.identifier.uri.fl_str_mv https://hdl.handle.net/10438/11780
identifier_str_mv LYRA, Taíse Ferraz. Métodos para detecção de outliers em séries de preços do índice de preços ao consumidor. Dissertação (Mestrado em Matemática Aplicada) - Escola de Matemática Aplicada, Fundação Getúlio Vargas - FGV, Rio de Janeiro, 2014.
url https://hdl.handle.net/10438/11780
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional do FGV (FGV Repositório Digital)
instname:Fundação Getulio Vargas (FGV)
instacron:FGV
instname_str Fundação Getulio Vargas (FGV)
instacron_str FGV
institution FGV
reponame_str Repositório Institucional do FGV (FGV Repositório Digital)
collection Repositório Institucional do FGV (FGV Repositório Digital)
bitstream.url.fl_str_mv https://repositorio.fgv.br/bitstreams/9ad29d3c-6721-481e-93f2-668073257608/download
https://repositorio.fgv.br/bitstreams/28751bd1-8f1a-416e-9942-070767e5e571/download
https://repositorio.fgv.br/bitstreams/0f7ef27e-a690-47a1-bab6-44fb6152e74d/download
https://repositorio.fgv.br/bitstreams/13ee6058-d2fb-413e-81ee-59000d5fae87/download
bitstream.checksum.fl_str_mv 3407689a27bfac06aff01d4fda05f6f2
dfb340242cced38a6cca06c627998fa1
02ec2f82ae133b3086e7d3987da1ccaf
10965f6db8554a77f5b8cbf40e8310a3
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional do FGV (FGV Repositório Digital) - Fundação Getulio Vargas (FGV)
repository.mail.fl_str_mv
_version_ 1827842521614516224