Contributions to the mixed data clustering problem: from a conceptual codification and classification proposal to the usage of optimization methods

Bibliographic details
Year of defense: 2021
Main author: Fróes, Nádia Junqueira Martarelli
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Institution of defense: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/18/18156/tde-16112021-121737/
Abstract: Data mining techniques have gained prominence in recent years due to their wide range of applications. Among these techniques is data clustering, which identifies groups in unlabeled datasets. Because describing a phenomenon in more detail often requires both numerical and categorical attributes, research on data clustering has come to include the clustering of mixed datasets. Although promising, this branch of study is still recent in the literature. This thesis therefore aims to contribute to the advancement of the mixed data clustering problem through four objectives: to propose a standard representation for documents published in the knowledge-discovery field; to carry out a systematic literature review that provides both a comprehensive view of the topic and a detailed understanding of the selected documents; to model the Biased Random-Key Genetic Algorithm (BRKGA) meta-heuristic as a solution to the feature balancing problem in distance-based mixed data clustering algorithms; and to model and hybridize the meta-heuristics Evolutionary Clustering Search, Iterated Local Search, and BRKGA as a solution to the feature weighting problem in a model-based mixed data clustering algorithm. As a result, an initial idea for a standard representation was proposed, and a systematic literature review was produced, with a bibliometric and individual analysis of the 160 documents resulting from the selection step of the designed methodological procedure.
In addition, the proposed meta-heuristics achieved the best performance on most of the 476 simulated datasets, which covered several characteristics, such as data generated from normal and lognormal distributions, balanced and unbalanced numbers of numerical and categorical attributes, and different levels of attribute overlap, among others. The thesis therefore reached its objectives, contributing to the advancement of the mixed data clustering technique.
id USP_1d29653ec137c4b7de1ccd90a85f843a
oai_identifier_str oai:teses.usp.br:tde-16112021-121737
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
dc.title.none.fl_str_mv Contributions to the mixed data clustering problem: from a conceptual codification and classification proposal to the usage of optimization methods
Contribuições para o problema de agrupamento de dados mistos: de uma proposta conceitual de codificação e classificação ao uso de métodos de otimização
author Fróes, Nádia Junqueira Martarelli
author_facet Fróes, Nádia Junqueira Martarelli
author_role author
dc.contributor.none.fl_str_mv Nagano, Marcelo Seido
dc.contributor.author.fl_str_mv Fróes, Nádia Junqueira Martarelli
dc.subject.por.fl_str_mv Agrupamento de Dados Mistos
Balanceamento de Atributos
Feature Balancing
Feature Weighting
Mixed Data Clustering
Ponderação de Atributos
Representação Padrão
Revisão Sistemática da Literatura
Standard Representation
Systematic Literature Review
publishDate 2021
dc.date.none.fl_str_mv 2021-07-16
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/18/18156/tde-16112021-121737/
url https://www.teses.usp.br/teses/disponiveis/18/18156/tde-16112021-121737/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1815258229124366336