Classificação de data streams utilizando árvore de decisão estatística e a teoria dos fractais na análise evolutiva dos dados

Cazzolato, Mirela Teixeira

Bibliographic details
Year of defense:	2014
Main author:	Cazzolato, Mirela Teixeira
Advisor:	Ribeiro, Marcela Xavier
Examination committee:	Not provided by the institution
Document type:	Master's thesis
Access type:	Open access
Language:	por
Defending institution:	Universidade Federal de São Carlos
Graduate program:	Programa de Pós-Graduação em Ciência da Computação - PPGCC
Department:	Not provided by the institution
Country:	BR
Keywords in Portuguese:	Ciência da computação; Banco de dados; Fluxo de dados; Classificação; Data mining (Mineração de dados); Fractais; Árvore de decisão; Algoritmo Incremental
Keywords in English:	Data streams; Classification; Data mining; Decision tree; Incremental algorithm; StARMiner Tree; FDDM; Fractal theory
CNPq knowledge area:	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Access link:	https://repositorio.ufscar.br/handle/20.500.14289/565
Abstract:	A data stream is generated quickly, continuously, in order, and in large volumes. Processing data streams requires accounting for, among other factors, limited memory, the need for real-time processing, the accuracy of the results, and concept drift (which occurs when the concept underlying the data being analyzed changes). The decision tree is a popular classifier representation that is intuitive and fast to build, and it generally achieves high accuracy. The incremental decision tree techniques in the literature generally have high computational costs for building and updating the model, especially in the calculation that decides when to split a decision node. Existing methods behave conservatively when little data is available and tend to improve their results as the number of examples grows. Another problem is that many real-world applications generate noisy data, and existing techniques have low tolerance to noise. This work aims to develop decision tree methods for data streams that address these deficiencies in the current state of the art. A further objective is to develop a technique that detects concept drift using fractal theory. This technique should indicate when the model needs to be corrected, so that it adequately describes the most recent events. To achieve these objectives, three decision tree algorithms were developed: StARMiner Tree, Automatic StARMiner Tree, and Information Gain StARMiner Tree. These algorithms use a statistical method as the node-splitting heuristic, which is fast and does not depend on the number of examples read. In the experiments the algorithms achieved high accuracy and were tolerant when classifying noisy data. Finally, a drift detection method based on fractal theory was proposed to detect changes in the data distribution. The method, called Fractal Drift Detection Method (FDDM), detects significant changes in the data distribution and causes the model to be updated whenever it no longer describes the current data (that is, when it becomes obsolete). The method achieved good results in classifying data containing concept drift, which indicates that it is suitable for the evolutionary analysis of data.
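
The abstract describes drift detection as spotting significant changes in the data distribution by means of fractal theory. As a rough illustration only (this is not the FDDM implementation from the thesis, whose details are not given in this record), the sketch below estimates the correlation fractal dimension of two windows of points by box counting and flags drift when the two estimates differ by more than a threshold. The function names, the grid levels, and the `threshold=0.3` default are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def correlation_dimension(points, levels=(1, 2, 3, 4, 5)):
    """Estimate the correlation fractal dimension D2 by box counting.

    points: sequence of equal-length tuples with coordinates in [0, 1].
    For each grid with cell side r = 1 / 2**level, S(r) is the sum of the
    squared cell occupancies. D2 is the least-squares slope of log S(r)
    plotted against log r.
    """
    log_r, log_s = [], []
    for level in levels:
        cells_per_axis = 2 ** level
        # Map every point to its grid cell; clamp so that 1.0 stays in range.
        counts = Counter(
            tuple(min(int(c * cells_per_axis), cells_per_axis - 1) for c in p)
            for p in points
        )
        log_r.append(math.log(1.0 / cells_per_axis))
        log_s.append(math.log(sum(n * n for n in counts.values())))
    mean_x = sum(log_r) / len(log_r)
    mean_y = sum(log_s) / len(log_s)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_r, log_s))
    den = sum((x - mean_x) ** 2 for x in log_r)
    return num / den


def drift_detected(reference, current, threshold=0.3):
    """Flag drift when the two windows' D2 estimates differ by > threshold.

    The threshold is a hypothetical tuning parameter, not a value taken
    from the thesis.
    """
    gap = abs(correlation_dimension(reference) - correlation_dimension(current))
    return gap > threshold


if __name__ == "__main__":
    random.seed(0)
    # Reference window: points spread over the whole unit square.
    square = [(random.random(), random.random()) for _ in range(4000)]
    # Current window: points collapsed onto a horizontal line.
    line = [(random.random(), 0.5) for _ in range(4000)]
    print(correlation_dimension(square), correlation_dimension(line))
    print(drift_detected(square, line))
```

A classifier wrapped around such a check could rebuild or update its model whenever `drift_detected` returns true for the most recent window. How the thesis actually chooses windows and thresholds is not stated in this record.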

Item metadata

id	SCAR_d260d657a6e9514fc46f4d3c58a9b182
oai_identifier_str	oai:repositorio.ufscar.br:20.500.14289/565
network_acronym_str	SCAR
network_name_str	Repositório Institucional da UFSCAR
repository_id_str
spelling	Cazzolato, Mirela Teixeira; Ribeiro, Marcela Xavier; Financiadora de Estudos e Projetos; 5984.pdf (application/pdf, 1962060 bytes); 5984.pdf.txt (text/plain, 0 bytes); 5984.pdf.jpg (image/jpeg, 8829 bytes); 2025-02-06T07:40:43; opendoar:4322
dc.title.por.fl_str_mv	Classificação de data streams utilizando árvore de decisão estatística e a teoria dos fractais na análise evolutiva dos dados
author	Cazzolato, Mirela Teixeira
author_facet	Cazzolato, Mirela Teixeira
author_role	author
dc.contributor.authorlattes.por.fl_str_mv	http://lattes.cnpq.br/5404143204431052
dc.contributor.author.fl_str_mv	Cazzolato, Mirela Teixeira
dc.contributor.advisor1.fl_str_mv	Ribeiro, Marcela Xavier
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/0300141044144026
dc.contributor.authorID.fl_str_mv	ea6530f1-0d8a-4e1d-999b-271af6fa764d
contributor_str_mv	Ribeiro, Marcela Xavier
dc.subject.por.fl_str_mv	Ciência da computação; Banco de dados; Fluxo de dados; Classificação; Data mining (Mineração de dados); Fractais; Árvore de decisão; Algoritmo Incremental
topic	Ciência da computação; Banco de dados; Fluxo de dados; Classificação; Data mining (Mineração de dados); Fractais; Árvore de decisão; Algoritmo Incremental; Data streams; Classification; Data mining; Decision tree; Incremental algorithm; StARMiner Tree; FDDM; Fractal theory; CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
dc.subject.eng.fl_str_mv	Data streams; Classification; Data mining; Decision tree; Incremental algorithm; StARMiner Tree; FDDM; Fractal theory
dc.subject.cnpq.fl_str_mv	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
publishDate	2014
dc.date.available.fl_str_mv	2014-07-28 2016-06-02T19:06:13Z
dc.date.issued.fl_str_mv	2014-03-24
dc.date.accessioned.fl_str_mv	2016-06-02T19:06:13Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	CAZZOLATO, Mirela Teixeira. Classificação de data streams utilizando árvore de decisão estatística e a teoria dos fractais na análise evolutiva dos dados. 2014. 75 f. Dissertação (Mestrado em Ciências Exatas e da Terra) - Universidade Federal de São Carlos, São Carlos, 2014.
dc.identifier.uri.fl_str_mv	https://repositorio.ufscar.br/handle/20.500.14289/565
identifier_str_mv	CAZZOLATO, Mirela Teixeira. Classificação de data streams utilizando árvore de decisão estatística e a teoria dos fractais na análise evolutiva dos dados. 2014. 75 f. Dissertação (Mestrado em Ciências Exatas e da Terra) - Universidade Federal de São Carlos, São Carlos, 2014.
url	https://repositorio.ufscar.br/handle/20.500.14289/565
dc.language.iso.fl_str_mv	por
language	por
dc.relation.authority.fl_str_mv	04d8be23-7330-4147-baf0-14545dd9cbdf
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de São Carlos
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Ciência da Computação - PPGCC
dc.publisher.initials.fl_str_mv	UFSCar
dc.publisher.country.fl_str_mv	BR
publisher.none.fl_str_mv	Universidade Federal de São Carlos
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR
instname_str	Universidade Federal de São Carlos (UFSCAR)
instacron_str	UFSCAR
institution	UFSCAR
reponame_str	Repositório Institucional da UFSCAR
collection	Repositório Institucional da UFSCAR
bitstream.url.fl_str_mv	https://repositorio.ufscar.br/bitstreams/082f6c8c-b3dd-4720-8d8e-2dbf8ec7a4a4/download https://repositorio.ufscar.br/bitstreams/8c626ea8-7610-41f6-8da1-9edd3a5c4abf/download https://repositorio.ufscar.br/bitstreams/81ece515-06cb-4cc7-9599-c18af6fa1521/download
bitstream.checksum.fl_str_mv	d943b973e9dd5f12ab87985f7388cb80 d41d8cd98f00b204e9800998ecf8427e 50bf0d5808d65b02705984bb420895c9
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
repository.mail.fl_str_mv	repositorio.sibi@ufscar.br
_version_	1851688924884238336
