
Machine learning and information retrieval techniques for time series analysis

Bibliographic details
Year of defense: 2024
Main author: Rozin, Bionda [UNESP]
Advisor: Not provided by the institution
Examining committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: eng
Degree-granting institution: Universidade Estadual Paulista (Unesp)
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://hdl.handle.net/11449/257737
Abstract: Time series arise in many fields, such as medicine, agriculture, economics, and science, so the demand for analyzing and processing this kind of data keeps growing. Tasks such as information retrieval, classification, and clustering are crucial for analyzing time series in different contexts and for different objectives. Information retrieval on time series data can identify patterns and rank items by similarity. Classification labels time series based on a training set, and clustering groups time series according to their similarities. Semi-supervised classification uses both labeled and unlabeled data to perform classification. In general, machine learning and information retrieval tasks depend heavily on a good computational representation of the data, which yields more effective results and more reliable conclusions about the task performed. In this scenario, one of the main challenges is to extract good features from time series. Moreover, similarity measures usually consider only pairwise relations and ignore important information in the neighborhood of the analyzed items in the dataset. The objective of this research is to apply machine learning and information retrieval techniques to obtain effective results in time series analysis. Four different methods are employed, and different feature extractors are evaluated in all tasks. First, a comparative study of univariate time series representation and ranking through contextual rank-based distance learning is conducted on 10 different datasets, leading to mAP gains of up to 31.78%. Continuing this line of research, we propose a multivariate time series analysis that processes each dimension of the series individually and uses contextual rank aggregation methods to merge the results into a similarity representation for retrieval and classification, achieving results competitive with two state-of-the-art (SOTA) methods. A clustering-based framework for data analysis based on temporal graph encoding is also proposed, in which the data is split using time segmentation criteria. Applied to ball possession analysis in football matches, this framework yields highly interpretable results. Last, semi-supervised classification of univariate time series using imaging methods and label propagation is proposed, achieving results similar to those of supervised classification.
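The abstract reports retrieval effectiveness as mean average precision (mAP). The minimal Python sketch below shows how that metric is usually computed. It assumes that items sharing the query's class label are relevant. It also assumes that each ranked list covers the whole collection, so the hits found equal the number of relevant items, unless `num_relevant` is passed. The function names are illustrative and are not taken from the dissertation.

```python
from typing import Optional, Sequence


def average_precision(ranked_labels: Sequence[str], query_label: str,
                      num_relevant: Optional[int] = None) -> float:
    """Average precision of one ranked list.

    ranked_labels: class labels of the retrieved items, best match first
    (query itself excluded). An item is relevant when its label equals
    query_label. num_relevant defaults to the number of hits in the list,
    which is correct when the list ranks the whole collection.
    """
    hits = 0
    precision_sum = 0.0
    for position, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / position  # precision at each relevant position
    total = num_relevant if num_relevant is not None else hits
    return precision_sum / total if total else 0.0


def mean_average_precision(rankings: Sequence[Sequence[str]],
                           query_labels: Sequence[str]) -> float:
    """mAP: mean of the per-query average precision values."""
    scores = [average_precision(r, q) for r, q in zip(rankings, query_labels)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    rankings = [["a", "b", "a", "a"], ["b", "b", "a"]]
    print(mean_average_precision(rankings, ["a", "b"]))
```

For the two example queries, the average precision values are (1 + 2/3 + 3/4)/3 ≈ 0.806 and 1.0, so the mAP is about 0.903.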
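The abstract contrasts pairwise similarity with measures that also use the neighborhood of each item. The minimal sketch below illustrates that idea, assuming a precomputed pairwise distance matrix. Two series count as similar when their top-k neighbor sets overlap, measured with the Jaccard index, and every ranked list is rebuilt from that contextual score. This is a deliberately simple stand-in, not one of the contextual rank-based distance learning methods evaluated in the dissertation. The value of `k` and the helper names are hypothetical.

```python
from typing import List, Sequence, Set


def knn_sets(distances: Sequence[Sequence[float]], k: int) -> List[Set[int]]:
    """Top-k neighbour set of every item, taken from its distance-sorted row.

    With a zero self-distance, each item is normally its own first neighbour.
    Requires k >= 1.
    """
    sets = []
    for row in distances:
        order = sorted(range(len(row)), key=lambda j: row[j])
        sets.append(set(order[:k]))
    return sets


def contextual_similarity(distances: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Jaccard overlap of top-k neighbour sets: similar items share neighbours."""
    neigh = knn_sets(distances, k)
    n = len(neigh)
    return [[len(neigh[i] & neigh[j]) / len(neigh[i] | neigh[j]) for j in range(n)]
            for i in range(n)]


def rerank(distances: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """New ranked list per query: highest contextual similarity first,
    ties broken by the original distance."""
    sim = contextual_similarity(distances, k)
    return [sorted(range(len(row)), key=lambda j: (-row[j], distances[i][j]))
            for i, row in enumerate(sim)]
```

For points at 0, 1, 10 and 11 with absolute-difference distances and `k = 2`, the third item's re-ranked list is `[2, 3, 1, 0]`. Items in the same tight group share neighbor sets and therefore rise together.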
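For the multivariate case, the abstract describes ranking each dimension separately and then merging the per-dimension results. The sketch below assumes that each multivariate series is stored as a list of dimensions compared with Euclidean distance on raw values. It fuses the ranked lists with a plain Borda count. The dissertation uses contextual rank aggregation methods and evaluates several feature extractors instead, so Borda here only shows where the aggregation step fits. `rank_by_dimension`, `borda_aggregate` and `multivariate_ranking` are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

# Assumed layout: a multivariate series is a list of dimensions, each a list of floats.
Series = Sequence[Sequence[float]]


def rank_by_dimension(query: Series, collection: Sequence[Series], dim: int) -> List[int]:
    """Rank the collection by Euclidean distance to the query on one dimension only."""
    dists = [math.dist(query[dim], series[dim]) for series in collection]
    return sorted(range(len(collection)), key=lambda i: dists[i])


def borda_aggregate(rankings: Sequence[Sequence[int]]) -> List[int]:
    """Fuse ranked lists of the same item ids with a Borda count."""
    scores: Dict[int, float] = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position  # first place earns n points, last earns 1
    # highest total first; ties broken by item id so the output is deterministic
    return sorted(scores, key=lambda item: (-scores[item], item))


def multivariate_ranking(query: Series, collection: Sequence[Series]) -> List[int]:
    """Rank each dimension independently, then merge the per-dimension lists."""
    per_dim = [rank_by_dimension(query, collection, d) for d in range(len(query))]
    return borda_aggregate(per_dim)
```

Replacing `borda_aggregate` with a neighborhood-aware fusion method would not change the surrounding pipeline.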
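The abstract does not name the imaging methods used for semi-supervised classification. The Gramian Angular Summation Field (GASF) is one widely used way to turn a univariate series into an image, and it is sketched below purely as an example. The series values are min-max rescaled into [-1, 1] and mapped to angles φᵢ = arccos(xᵢ). Pixel (i, j) is then cos(φᵢ + φⱼ). The function name `gasf` is hypothetical. The resulting matrix would typically be passed to an image feature extractor, which is outside this sketch.

```python
import math
from typing import List, Sequence


def gasf(series: Sequence[float]) -> List[List[float]]:
    """Gramian Angular Summation Field of a univariate series (n x n image)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    # min-max rescale into [-1, 1] so every value is a valid cosine
    scaled = [(2 * x - hi - lo) / (hi - lo) for x in series]
    # polar encoding: phi_i = arccos(x_i); clamping guards against round-off outside [-1, 1]
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    # pixel (i, j) = cos(phi_i + phi_j); the image is symmetric by construction
    return [[math.cos(a + b) for b in phi] for a in phi]
```

For the series `[0.0, 1.0, 2.0]`, the rescaled values are -1, 0 and 1. The diagonal of the image is therefore approximately 1, -1 and 1.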
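Label propagation spreads a few known labels through a similarity graph. The minimal sketch below assumes a precomputed distance matrix, for instance between image-based features of the series, and builds a symmetric kNN graph from it. It then applies the classic iterative scheme in which unlabeled nodes repeatedly take the average class scores of their neighbors while labeled nodes stay clamped. The graph construction, `k`, the iteration count and the function name are assumptions, since the abstract does not specify the exact propagation variant.

```python
from typing import Dict, List, Sequence


def label_propagation(distances: Sequence[Sequence[float]], labels: Dict[int, int],
                      num_classes: int, k: int = 3, iterations: int = 100) -> List[int]:
    """Propagate known labels over a symmetric kNN graph.

    distances: n x n pairwise distances between (already encoded) series.
    labels: item index -> class id for the labeled subset; the rest are unlabeled.
    Returns a predicted class id for every item.
    """
    n = len(distances)
    # symmetric kNN adjacency: link each item to its k nearest others, both directions
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: distances[i][j])
        for j in order[:k]:
            neighbours[i].add(j)
            neighbours[j].add(i)
    # class scores: one-hot rows for labeled items, uniform rows for unlabeled ones
    scores = [[1.0 / num_classes] * num_classes for _ in range(n)]
    for i, c in labels.items():
        scores[i] = [1.0 if cls == c else 0.0 for cls in range(num_classes)]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in labels:  # clamp known labels
                updated.append(scores[i])
                continue
            # each unlabeled node takes the mean class scores of its neighbours
            row = [sum(scores[j][c] for j in neighbours[i]) / len(neighbours[i])
                   for c in range(num_classes)]
            updated.append(row)
        scores = updated
    return [max(range(num_classes), key=lambda c: scores[i][c]) for i in range(n)]
```

With points at 0, 1, 2, 10, 11 and 12, one labeled item in each group and `k = 2`, every unlabeled item inherits the label of its own group.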
id UNSP_940a0d53c54c448c60037d7484af010e
oai_identifier_str oai:repositorio.unesp.br:11449/257737
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str
spelling Machine learning and information retrieval techniques for time series analysis
Técnicas de aprendizado de máquina e recuperação da informação para análise de séries temporais
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
FAPESP: 2022/01359-1
FAPESP: 2023/08087-0
Universidade Estadual Paulista (Unesp)
Pedronette, Daniel Carlos Guimarães [UNESP]
Universidade Estadual Paulista (Unesp)
Rozin, Bionda [UNESP]
2024-10-14T11:53:28Z
2024-10-14T11:53:28Z
2024-09-04
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
application/pdf
https://hdl.handle.net/11449/257737
33004153073P2
9359733144863106
0000-0002-5993-6570
eng
info:eu-repo/semantics/openAccess
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
2024-10-14T15:18:53Z
oai:repositorio.unesp.br:11449/257737
Repositório Institucional
PUB
http://repositorio.unesp.br/oai/request
repositoriounesp@unesp.br
opendoar:2946
2024-10-14T15:18:53
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
false
dc.title.none.fl_str_mv Machine learning and information retrieval techniques for time series analysis
Técnicas de aprendizado de máquina e recuperação da informação para análise de séries temporais
author Rozin, Bionda [UNESP]
author_role author
dc.contributor.none.fl_str_mv Pedronette, Daniel Carlos Guimarães [UNESP]
Universidade Estadual Paulista (Unesp)
dc.contributor.author.fl_str_mv Rozin, Bionda [UNESP]
dc.subject.por.fl_str_mv Classification
Information retrieval
Clustering
Semi-supervised learning
Machine learning
Feature extraction
Ranking
Time series analysis
Classificação
Recuperação da informação
Agrupamento
Aprendizado semi-supervisionado
Aprendizado de máquina
Extração de características
Ranqueamento
Análise de séries temporais
publishDate 2024
dc.date.none.fl_str_mv 2024-10-14T11:53:28Z
2024-10-14T11:53:28Z
2024-09-04
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/11449/257737
33004153073P2
9359733144863106
0000-0002-5993-6570
dc.language.iso.fl_str_mv eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Estadual Paulista (Unesp)
dc.source.none.fl_str_mv reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
repository.mail.fl_str_mv repositoriounesp@unesp.br