BASWE: Balanced Accuracy-based Sliding Window Ensemble for classification in imbalanced data streams with concept drift

Bibliographic details
Year of defense: 2023
Main author: Oliveira, Douglas Amorim de
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: eng
Defending institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/100/100131/tde-01012024-172026/
Abstract: Following the exponential growth in data generation in recent decades, this thesis addresses the challenges of classification in data streams, which are characterized by a continuous, real-time flow and a dynamic nature. Focusing on the compounding problems of concept drift and class imbalance, the thesis introduces the Balanced Accuracy-based Sliding Window Ensemble (BASWE), a novel ensemble method. BASWE combines Balanced Accuracy, sliding windows, and resampling techniques to handle imbalanced classes and concept drift, keeping performance robust as data patterns evolve. In experiments on 40 datasets (16 real-world datasets and 24 synthetic datasets generated under three configurations: no drift, gradual drift, and sudden drift) with varying imbalance ratios, BASWE outperformed seven other state-of-the-art algorithms (CALMID, CSARF, KUE, OOB, ROSE, SMOTE-OB, and UOB) in terms of F1 Score and the Kappa statistic.
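Illustrative note: the abstract names Balanced Accuracy and sliding windows as building blocks but does not describe how BASWE combines them. The sketch below is therefore not the BASWE algorithm; it only shows, under that caveat, how balanced accuracy (the unweighted mean of per-class recall) might be tracked over the most recent predictions of a stream. The class name `WindowedBalancedAccuracy` and the `window_size` parameter are hypothetical names chosen for this example.

```python
from collections import deque


class WindowedBalancedAccuracy:
    """Balanced accuracy over the most recent `window_size` predictions.

    Hypothetical helper for illustration only; not taken from the thesis.
    """

    def __init__(self, window_size):
        # A deque with maxlen drops the oldest (y_true, y_pred) pair automatically.
        self.window = deque(maxlen=window_size)

    def update(self, y_true, y_pred):
        self.window.append((y_true, y_pred))

    def score(self):
        # Count, per true class, how often it occurred and how often it was predicted correctly.
        seen, correct = {}, {}
        for y_true, y_pred in self.window:
            seen[y_true] = seen.get(y_true, 0) + 1
            if y_true == y_pred:
                correct[y_true] = correct.get(y_true, 0) + 1
        if not seen:
            return 0.0
        recalls = [correct.get(label, 0) / count for label, count in seen.items()]
        # Balanced accuracy is the unweighted mean of the per-class recalls.
        return sum(recalls) / len(recalls)


# A classifier that always predicts the majority class 0 on a 9:1 stream.
metric = WindowedBalancedAccuracy(window_size=10)
for label in [0] * 9 + [1]:
    metric.update(y_true=label, y_pred=0)
majority_only_score = metric.score()
```

In this toy stream plain accuracy would be 0.9, while the balanced accuracy is 0.5, because the minority class is never recognized. That gap is the reason balanced accuracy is often preferred over plain accuracy on imbalanced data.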
id USP_53f1b9f033adaec64700225e5f46c859
oai_identifier_str oai:teses.usp.br:tde-01012024-172026
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
dc.title.none.fl_str_mv BASWE: Balanced Accuracy-based Sliding Window Ensemble for classification in imbalanced data streams with concept drift
BASWE: Balanced Accuracy-based Sliding Window Ensemble para classificação em fluxos de dados desbalanceados e com concept drift
title BASWE: Balanced Accuracy-based Sliding Window Ensemble for classification in imbalanced data streams with concept drift
author Oliveira, Douglas Amorim de
author_facet Oliveira, Douglas Amorim de
author_role author
dc.contributor.none.fl_str_mv Delgado, Karina Valdivia
dc.contributor.author.fl_str_mv Oliveira, Douglas Amorim de
dc.subject.por.fl_str_mv Concept Drift
Ensemble
Machine Learning
Imbalanced Data
Imbalanced Datasets
publishDate 2023
dc.date.none.fl_str_mv 2023-11-29
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/100/100131/tde-01012024-172026/
url https://www.teses.usp.br/teses/disponiveis/100/100131/tde-01012024-172026/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br; atendimento@aguia.usp.br
_version_ 1844786324666580992