Methodologies to improve one-class classifier performance applied to multivariate time series

Bibliographic details
Year of defense: 2024
Main author: Machado, André Paulo Ferreira
Advisor: Not provided by the institution
Examining committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: Portuguese (por), English (eng)
Defense institution: Universidade Federal do Espírito Santo (UFES), BR; Doctorate in Electrical Engineering; Centro Tecnológico; Programa de Pós-Graduação em Engenharia Elétrica (Graduate Program in Electrical Engineering)
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Access link: http://repositorio.ufes.br/handle/10/17724
Abstract: This work proposes novel methodologies to improve the performance of one-class classifiers applied to multivariate time series data. The main method is based on clustering the multivariate time series. Datasets from real processes are collected by the available sensors and are affected by many factors, such as process aging, changes in the operating region, and equipment malfunction. Despite this, the classes represented by such diverse data are expected to be identifiable by trained classifiers. This work hypothesizes that overall performance can be improved by training sets of one-class classifiers on subsets of data clustered by similarity, where DTW Barycenter Averaging (DBA), based on Dynamic Time Warping (DTW), is used to measure the similarity between each time series and each cluster. The proposed method is applied to one-class classifiers because they are trained only on the target class, which is clustered by time series similarity using DTW and k-means. A second approach, called time-shift of labels, is also proposed to improve the distinction between normal and faulty data. It is applied during the training phase and targets the transition from normal to faulty data, where class boundaries are difficult to distinguish (overlapping data). The results show that the time shift mitigates the effect of overlapping data. The advantages of the techniques are illustrated on two public datasets: one from the oil industry, with instances characterizing eight classes of data represented by five time series (3W dataset), and another from a hydraulic system used to study typical hydraulic system failures, with five classes and seventeen time series (Condition monitoring of hydraulic systems, ICM dataset).
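As a rough illustration of clustering multivariate time series with DTW, k-means and DBA, the following Python/NumPy sketch implements a simplified DBA k-means. It is not the author's code: the function names (`dtw_path`, `dba_update`, `dba_kmeans`), the squared Euclidean local cost, the random initialization and the iteration counts are assumptions chosen for illustration. Each series is assumed to be a NumPy array of shape (time steps, variables).

```python
import numpy as np


def dtw_path(a: np.ndarray, b: np.ndarray):
    """DTW between multivariate series a (n, d) and b (m, d).

    Returns the accumulated cost (sum of squared Euclidean distances along the
    optimal warping path) and the path as a list of (i, j) index pairs.
    """
    n, m = len(a), len(b)
    # Squared Euclidean cost between every pair of time steps.
    cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from (n, m) to (1, 1) to recover the optimal warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    path.reverse()
    return float(acc[n, m]), path


def dba_update(center: np.ndarray, members: list) -> np.ndarray:
    """One DBA iteration: align each member to the current center with DTW and
    replace every center time step by the mean of the points aligned to it."""
    sums = np.zeros(center.shape, dtype=float)
    counts = np.zeros(len(center))
    for s in members:
        _, path = dtw_path(center, s)
        for i, j in path:
            sums[i] += s[j]
            counts[i] += 1
    # Every center index lies on the warping path, so counts are never zero.
    return sums / counts[:, None]


def dba_kmeans(series: list, k: int, n_iter: int = 10, dba_iter: int = 3, seed: int = 0):
    """Simplified k-means: DTW-based assignments and DBA-refined centers."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct series picked at random.
    centers = [series[i].astype(float) for i in rng.choice(len(series), k, replace=False)]
    labels = None
    for _it in range(n_iter):
        # Assignment step: each series joins the center with the lowest DTW cost.
        new_labels = np.array(
            [int(np.argmin([dtw_path(c, s)[0] for c in centers])) for s in series]
        )
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: refine each center with a few DBA iterations.
        for c in range(k):
            members = [s for s, lab in zip(series, labels) if lab == c]
            if not members:
                continue  # keep the previous center if the cluster is empty
            for _step in range(dba_iter):
                centers[c] = dba_update(centers[c], members)
    return labels, centers
```

In the setting described above, each resulting cluster of target-class instances would then train its own one-class classifier, and the DTW cost between a series and its cluster's DBA center is one possible similarity measure; the exact measure and classifier combination used in the thesis are not given in this record.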
For the 3W dataset, seven classes are selected to train Long Short-Term Memory (LSTM) classifiers on variables and instances grouped by time series clustering algorithms. The results show that increasing the similarity of the training data tends to improve LSTM classifier performance, yielding a 10% increase in overall performance on the 3W dataset. In one case, where the clustering model raised the similarity by 84%, classification performance improved by 21%. On the hydraulic system condition monitoring data, the proposed method improved performance by over 40% compared with the baseline model. For the leakage fault in particular, the improvement in classification performance reaches 64%.
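The abstract describes the time-shift of labels only at a high level: it is applied during training and targets the normal-to-faulty transition, where data overlap. One plausible reading, used here purely as a hypothetical illustration and not taken from the thesis, is to move the labelled transition point by a fixed number of samples before training, so that ambiguous samples near the fault onset change class. The function name `time_shift_labels`, the per-time-step label layout and the edge-padding behaviour are assumptions.

```python
import numpy as np


def time_shift_labels(labels: np.ndarray, shift: int) -> np.ndarray:
    """Hypothetical sketch: shift a per-time-step label sequence by `shift` samples.

    A positive shift delays every label change (samples just after the original
    transition keep the previous label); a negative shift anticipates it.
    Positions left uncovered at the edges repeat the first or last label.
    """
    labels = np.asarray(labels)
    n = len(labels)
    if shift == 0 or n == 0:
        return labels.copy()
    # For each position, the index of the original label it takes after the
    # shift, clipped to the valid range so the edges are padded, not wrapped.
    src = np.clip(np.arange(n) - shift, 0, n - 1)
    return labels[src]
```

For example, with `shift=2` the sequence `[0, 0, 0, 1, 1, 1]` becomes `[0, 0, 0, 0, 0, 1]`, so the first two samples after the labelled onset are treated as normal during training; with `shift=-1` it becomes `[0, 0, 1, 1, 1, 1]`. Clipping instead of `np.roll` avoids wrapping faulty labels around to the start of the series.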
Contributors:
Ciarelli, Patrick Marques (ORCID https://orcid.org/0000-0003-3177-4028; Lattes http://lattes.cnpq.br/1267950518719423)
Munaro, Celso José (ORCID https://orcid.org/0000-0002-2297-7395; Lattes http://lattes.cnpq.br/5929530967371970)
ORCID listed without a name: https://orcid.org/0009-0009-2472-4373
Vargas, Ricardo Emanuel Vaz (ORCID https://orcid.org/0000-0001-6243-4590; Lattes http://lattes.cnpq.br/1658300192778908)
Serra, Ginalber Luiz de Oliveira (ORCID https://orcid.org/0000-0002-4424-618X; Lattes http://lattes.cnpq.br/0831092299374520)
Coelho, Leandro dos Santos (ORCID https://orcid.org/0000-0001-5728-943X; Lattes http://lattes.cnpq.br/3483667901818921)
Lima Netto, Sergio (ORCID https://orcid.org/0000-0001-7389-1463; Lattes http://lattes.cnpq.br/3566465649283245)
Keywords: Time series clustering; One-Class Classifier; Dynamic Time Warping; Long Short-Term Memory; Multivariate time series classification; Electrical Engineering
Record dates: 2024-04-05; 2024-09-11T19:48:17Z
Document type: doctoral thesis, published version (info:eu-repo/semantics/doctoralThesis, info:eu-repo/semantics/publishedVersion)
License: Creative Commons BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/), open access
Format: Text (application/pdf)
Repository: Repositório Institucional da Universidade Federal do Espírito Santo (riUfes), Universidade Federal do Espírito Santo (UFES)
Repository contact: riufes@ufes.br