Imputation by decomposition and by time series nature : novel imputation methods for missing data in time series

Bibliographic details
Year of defense: 2021
Main author: Silvana Mara Ribeiro
Advisor: Cristiano Leite de Castro
Examining committee: Luis Antonio Aguirre, Frederico Gadelha Guimarães
Document type: Master's thesis
Access type: Open access
Language: eng
Defense institution: Universidade Federal de Minas Gerais
Graduate program: Programa de Pós-Graduação em Engenharia Elétrica
Department: ENG - DEPARTAMENTO DE ENGENHARIA ELÉTRICA
Country: Brazil
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/46099
https://orcid.org/0000-0002-6754-4374
Abstract: Dealing with missingness in time series data is an important but often overlooked step in data analysis. In this dissertation, time series patterns and missingness mechanisms are described to help identify which imputation method should be used to impute missing data, along with a review of imputation methods and how they work. Methods recommended in the literature are used to impute synthetic data with different patterns, and the results are discussed. Two new methods for imputing missing time steps are presented and compared with classical imputation methods as well as state-of-the-art methods. The first is Imputation by Pattern. It is based on the premise that imputing each time series with the method the literature recommends for its pattern will achieve the best results. Heuristics are proposed to separate the time series by pattern. The second is Imputation by Decomposition. It consists of decomposing the time series into its components and then imputing each component using the literature-recommended methods. Combinations of these methods with the Kalman filter are also tested. The discussed imputation methods are applied to a data set of financial indexes and instability trackers, a COVID-19 data set, and a dengue data set; predictions are then made and the results are presented. The Imputation by Pattern method combined with the Kalman filter achieved consistently satisfactory results, although not always the best results. The Imputation by Decomposition method achieved good results, especially when some time was spent investigating which variation worked better with each data set. Overall, both imputation methods achieved results similar to, and in some cases better than, those of the classical imputation methods.
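As a rough illustration of the decompose-then-impute idea described in the abstract, the following Python sketch splits a series into trend, seasonal and residual parts and fills each one with a simple rule. It is not the dissertation's algorithm: the function names (`interpolate_linear`, `impute_by_decomposition`), the moving-average trend, the per-phase seasonal mean and the zero residual are all assumptions chosen to keep the example small, and the seasonal period is assumed to be known in advance.

```python
import math


def interpolate_linear(values):
    """Fill NaNs by linear interpolation; gaps at the edges take the nearest known value."""
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not known:
        raise ValueError("series has no observed values")
    out = list(values)
    for i in range(len(out)):
        if not math.isnan(out[i]):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = values[left] + weight * (values[right] - values[left])
    return out


def impute_by_decomposition(series, period):
    """Hypothetical sketch: impute trend, seasonal and residual parts separately."""
    n = len(series)

    # Trend (assumed rule): interpolate the gaps, then smooth with a centered
    # moving average spanning roughly one seasonal period.
    filled = interpolate_linear(series)
    half = period // 2
    trend = []
    for i in range(n):
        window = filled[max(0, i - half): i + half + 1]
        trend.append(sum(window) / len(window))

    # Seasonal (assumed rule): mean of the observed detrended values per phase.
    seasonal = []
    for phase in range(period):
        observed = [series[i] - trend[i]
                    for i in range(phase, n, period)
                    if not math.isnan(series[i])]
        seasonal.append(sum(observed) / len(observed) if observed else 0.0)

    # Residual (assumed rule): imputed as zero, so a missing point becomes
    # trend + seasonal. Observed points are returned unchanged.
    return [v if not math.isnan(v) else trend[i] + seasonal[i % period]
            for i, v in enumerate(series)]
```

Calling `impute_by_decomposition(series, period=4)` on a list of floats with `math.nan` at the missing positions returns a list of the same length with only the gaps replaced. Each component rule could be swapped for a method better suited to that component, which is the kind of per-data-set variation the abstract mentions.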
id UFMG_c52974862cd94c9d1220dbf9f2f7ee37
oai_identifier_str oai:repositorio.ufmg.br:1843/46099
network_acronym_str UFMG
network_name_str Repositório Institucional da UFMG
repository_id_str
dc.title.pt_BR.fl_str_mv Imputation by decomposition and by time series nature : novel imputation methods for missing data in time series
dc.title.alternative.pt_BR.fl_str_mv Imputação por decomposição e pela natureza da série temporal : novos métodos de imputação para dados ausentes em séries temporais
title Imputation by decomposition and by time series nature : novel imputation methods for missing data in time series
spellingShingle Imputation by decomposition and by time series nature : novel imputation methods for missing data in time series
Silvana Mara Ribeiro
Missing data
Time series
Imputation methods
Decomposition
Pattern
Electrical engineering
Time series analysis
Missing data (Statistics)
Social sciences - Statistical methods
author Silvana Mara Ribeiro
author_facet Silvana Mara Ribeiro
author_role author
dc.contributor.advisor1.fl_str_mv Cristiano Leite de Castro
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/7892966809901738
dc.contributor.referee1.fl_str_mv Luis Antonio Aguirre
dc.contributor.referee2.fl_str_mv Frederico Gadelha Guimarães
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/8320252058050644
dc.contributor.author.fl_str_mv Silvana Mara Ribeiro
contributor_str_mv Cristiano Leite de Castro
Luis Antonio Aguirre
Frederico Gadelha Guimarães
dc.subject.por.fl_str_mv Missing data
Time series
Imputation methods
Decomposition
Pattern
topic Missing data
Time series
Imputation methods
Decomposition
Pattern
Electrical engineering
Time series analysis
Missing data (Statistics)
Social sciences - Statistical methods
dc.subject.other.pt_BR.fl_str_mv Electrical engineering
Time series analysis
Missing data (Statistics)
Social sciences - Statistical methods
publishDate 2021
dc.date.issued.fl_str_mv 2021-07-28
dc.date.accessioned.fl_str_mv 2022-10-07T17:37:19Z
dc.date.available.fl_str_mv 2022-10-07T17:37:19Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1843/46099
dc.identifier.orcid.pt_BR.fl_str_mv https://orcid.org/0000-0002-6754-4374
url http://hdl.handle.net/1843/46099
https://orcid.org/0000-0002-6754-4374
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv http://creativecommons.org/licenses/by-nc-nd/3.0/pt/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-nd/3.0/pt/
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Engenharia Elétrica
dc.publisher.initials.fl_str_mv UFMG
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv ENG - DEPARTAMENTO DE ENGENHARIA ELÉTRICA
publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFMG
instname:Universidade Federal de Minas Gerais (UFMG)
instacron:UFMG
instname_str Universidade Federal de Minas Gerais (UFMG)
instacron_str UFMG
institution UFMG
reponame_str Repositório Institucional da UFMG
collection Repositório Institucional da UFMG
bitstream.url.fl_str_mv https://repositorio.ufmg.br/bitstream/1843/46099/1/Imputation%20by%20decomposition%20and%20by%20time%20series%20nature-%20novel%20imputation%20methods%20for%20missing%20data%20in%20time%20series.pdf
https://repositorio.ufmg.br/bitstream/1843/46099/2/license_rdf
https://repositorio.ufmg.br/bitstream/1843/46099/3/license.txt
bitstream.checksum.fl_str_mv bc7257b396996fe1bc85b5af10bfc960
cfd6801dba008cb6adbd9838b81582ab
cda590c95a0b51b4d15f60c9642ca272
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv
_version_ 1793891071879544832