BASWE: Balanced Accuracy-based Sliding Window Ensemble for classification in imbalanced data streams with concept drift
| Year of defense: | 2023 |
|---|---|
| Main author: | Oliveira, Douglas Amorim de |
| Advisor: | |
| Defense committee: | |
| Document type: | Master's thesis |
| Access type: | Open access |
| Language: | eng |
| Defending institution: | Biblioteca Digital de Teses e Dissertações da USP |
| Graduate program: | Not provided by the institution |
| Department: | Not provided by the institution |
| Country: | Not provided by the institution |
| Keywords in Portuguese: | Aprendizado de Máquina (Machine Learning); Concept Drift; Dados Desbalanceados (Imbalanced Datasets); Ensemble |
| Access link: | https://www.teses.usp.br/teses/disponiveis/100/100131/tde-01012024-172026/ |
| Abstract: | In the wake of the exponential growth in data generation witnessed in recent decades, this thesis addresses the inherent challenges of classification tasks within data streams, characterized by their continuous, real-time flow and inherent dynamic nature. With a focus on the compounding issues of concept drift and class imbalance, the thesis introduces the Balanced Accuracy-based Sliding Window Ensemble (BASWE), a novel ensemble method to deal with these challenges. BASWE leverages Balanced Accuracy, sliding windows, and resampling techniques to effectively handle imbalanced classes and concept drifts, ensuring robust performance even as data patterns evolve. In experiments conducted on 40 datasets, comprising 16 real-world datasets and 24 synthetic datasets generated under three configurations - no drift, gradual drift, and sudden drift - and with varying imbalance ratios, BASWE demonstrated superior performance compared to seven other state-of-the-art algorithms (CALMID, CSARF, KUE, OOB, ROSE, SMOTE-OB, and UOB) in terms of F1 Score and the Kappa statistic. |
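
The abstract names Balanced Accuracy and sliding windows as building blocks but does not say how BASWE combines them. As a rough illustration of the metric only, the sketch below computes balanced accuracy (the mean of per-class recall) over the most recent predictions in a stream. The class name `WindowedBalancedAccuracy` and the `window_size` parameter are hypothetical and are not taken from the thesis.

```python
from collections import deque


class WindowedBalancedAccuracy:
    """Balanced accuracy over the most recent (label, prediction) pairs.

    Hypothetical helper for illustration; not the BASWE algorithm itself.
    """

    def __init__(self, window_size):
        # A bounded deque drops the oldest pair once the window is full.
        self.window = deque(maxlen=window_size)

    def update(self, y_true, y_pred):
        """Record one labelled prediction from the stream."""
        self.window.append((y_true, y_pred))

    def score(self):
        """Return the mean per-class recall over the current window."""
        total = {}    # instances seen per true class
        correct = {}  # correctly predicted instances per true class
        for y_true, y_pred in self.window:
            total[y_true] = total.get(y_true, 0) + 1
            if y_true == y_pred:
                correct[y_true] = correct.get(y_true, 0) + 1
        if not total:
            return 0.0
        recalls = [correct.get(c, 0) / n for c, n in total.items()]
        return sum(recalls) / len(recalls)


# A classifier that always predicts the majority class on a 9:1 stream
# is right 90% of the time, yet its balanced accuracy is only 0.5.
metric = WindowedBalancedAccuracy(window_size=10)
for label in [0] * 9 + [1]:
    metric.update(label, 0)
print(metric.score())  # 0.5
```

Because each class contributes equally to the average, the minority class cannot be ignored without the score dropping, which is why the metric suits imbalanced streams; restricting it to a window lets older predictions stop counting after a drift.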
| id | USP_53f1b9f033adaec64700225e5f46c859 |
|---|---|
| oai_identifier_str | oai:teses.usp.br:tde-01012024-172026 |
| network_acronym_str | USP |
| network_name_str | Biblioteca Digital de Teses e Dissertações da USP |
| repository_id_str | |
| dc.title.none.fl_str_mv | BASWE: Balanced Accuracy-based Sliding Window Ensemble for classification in imbalanced data streams with concept drift; BASWE: Balanced Accuracy-based Sliding Window Ensemble para classificação em fluxos de dados desbalanceados e com concept drift |
| title | BASWE: Balanced Accuracy-based Sliding Window Ensemble for classification in imbalanced data streams with concept drift |
| author | Oliveira, Douglas Amorim de |
| author_facet | Oliveira, Douglas Amorim de |
| author_role | author |
| dc.contributor.none.fl_str_mv | Delgado, Karina Valdivia |
| dc.contributor.author.fl_str_mv | Oliveira, Douglas Amorim de |
| dc.subject.por.fl_str_mv | Concept Drift; Ensemble; Imbalanced Datasets (Dados Desbalanceados); Machine Learning (Aprendizado de Máquina) |
| topic | Concept Drift; Ensemble; Imbalanced Datasets (Dados Desbalanceados); Machine Learning (Aprendizado de Máquina) |
| publishDate | 2023 |
| dc.date.none.fl_str_mv | 2023-11-29 |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
| format | masterThesis |
| status_str | publishedVersion |
| dc.identifier.uri.fl_str_mv | https://www.teses.usp.br/teses/disponiveis/100/100131/tde-01012024-172026/ |
| url | https://www.teses.usp.br/teses/disponiveis/100/100131/tde-01012024-172026/ |
| dc.language.iso.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | |
| dc.rights.driver.fl_str_mv | Release the content for public access; info:eu-repo/semantics/openAccess |
| rights_invalid_str_mv | Release the content for public access. |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.coverage.none.fl_str_mv | |
| dc.publisher.none.fl_str_mv | Biblioteca Digital de Teses e Dissertações da USP |
| publisher.none.fl_str_mv | Biblioteca Digital de Teses e Dissertações da USP |
| dc.source.none.fl_str_mv | reponame:Biblioteca Digital de Teses e Dissertações da USP; instname:Universidade de São Paulo (USP); instacron:USP |
| instname_str | Universidade de São Paulo (USP) |
| instacron_str | USP |
| institution | USP |
| reponame_str | Biblioteca Digital de Teses e Dissertações da USP |
| collection | Biblioteca Digital de Teses e Dissertações da USP |
| repository.name.fl_str_mv | Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
| repository.mail.fl_str_mv | virginia@if.usp.br; atendimento@aguia.usp.br |
| _version_ | 1844786324666580992 |