A fuzzy partitional clustering algorithm with adaptative euclidean distance and entropy regularization
| Year of defense: | 2018 |
|---|---|
| Main author: | RIZO RODRÍGUEZ, Sara Inés |
| Advisor: | CARVALHO, Francisco de Assis Tenório de |
| Examining committee: | |
| Document type: | Master's dissertation |
| Access type: | Open access |
| Language: | English (eng) |
| Defense institution: | Universidade Federal de Pernambuco |
| Graduate program: | Programa de Pos Graduacao em Ciencia da Computacao |
| Department: | Not provided by the institution |
| Country: | Brazil |
| Keywords (Portuguese): | Mineração de dados (data mining); Agrupamento difuso (fuzzy clustering) |
| Access link: | https://repositorio.ufpe.br/handle/123456789/30523 |
| Abstract: | Data clustering is one of the most important problems in data mining and machine learning. Clustering is the task of discovering homogeneous groups among the objects under study, and the development of clustering algorithms has recently attracted considerable research interest. The main difficulty in clustering is that no prior knowledge about the given dataset is available. Traditional clustering approaches search for clusters in the entire feature space. However, high-dimensional real-world datasets usually contain many dimensions that are irrelevant for clustering, and on such data traditional methods often perform poorly. Subspace clustering is an extension of traditional clustering that finds clusters only in the relevant dimensions of a dataset. However, most subspace clustering methods require complicated parameter settings that are hard to determine, which makes them difficult to use in practical applications. This work proposes a fuzzy partitional clustering algorithm with entropy regularization and automatic variable selection through an adaptive distance, in which the dissimilarity between an object and a prototype is the sum of the Euclidean distances computed separately for each variable. The main advantage of the proposed approach over conventional clustering methods is its use of adaptive distances, which change at each iteration of the algorithm. This type of dissimilarity measure makes it possible to learn the weights of the variables dynamically during clustering, which improves clustering performance. Another advantage is the entropy regularization term, which acts as a regulating factor during minimization. The proposed method is an iterative three-step algorithm that provides a fuzzy partition, a representative (prototype) for each fuzzy cluster, and a relevance weight for each variable in each cluster. To this end, it minimizes an objective function that combines a multidimensional distance function, as the dissimilarity measure, with entropy as the regularization term. Experiments on simulated, real-world and image data corroborate the usefulness of the proposed algorithm. |
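
The three-step procedure summarized in the abstract can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the dissertation's implementation: it assumes the objective J = Σ_i Σ_k u_ik Σ_j λ_kj (x_ij - g_kj)² + T Σ_i Σ_k u_ik ln u_ik, with each object's memberships summing to 1 and, as an assumption, each cluster's variable weights λ_kj constrained to have product 1. The function name `entropy_fuzzy_adaptive`, the temperature parameter `T`, the farthest-first initialization and the order of the steps (prototypes, then weights, then memberships) are hypothetical choices made for illustration.

```python
import numpy as np


def entropy_fuzzy_adaptive(X, n_clusters, T=1.0, max_iter=100, tol=1e-8, seed=0):
    """Hypothetical sketch: entropy-regularized fuzzy clustering with a
    per-cluster adaptive (weighted) squared Euclidean distance.

    Minimizes J = sum_ik u_ik * sum_j L_kj (x_ij - g_kj)^2 + T * sum_ik u_ik ln u_ik
    subject to sum_k u_ik = 1 and (assumption) prod_j L_kj = 1 for every cluster k.
    Returns memberships U (n, K), prototypes G (K, p), weights L (K, p)
    and the objective value recorded after each iteration.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    eps = 1e-12  # guards against zero dispersions, empty clusters and log(0)

    # Farthest-first initialization of prototypes (an assumption), unit weights.
    idx = [int(rng.integers(n))]
    for _ in range(1, n_clusters):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    G = X[idx].copy()
    L = np.ones((n_clusters, p))

    def sq_diffs(G):
        # (x_ij - g_kj)^2 for every object i, cluster k, variable j: shape (n, K, p)
        return (X[:, None, :] - G[None, :, :]) ** 2

    def memberships(D):
        # u_ik = exp(-d_ik / T) / sum_h exp(-d_ih / T), shifted for numerical stability
        Z = -D / T
        Z -= Z.max(axis=1, keepdims=True)
        E = np.exp(Z)
        return E / E.sum(axis=1, keepdims=True)

    def objective(U, D):
        return float(np.sum(U * D) + T * np.sum(U * np.log(U + eps)))

    # Adaptive distance: d_ik = sum_j L_kj (x_ij - g_kj)^2
    D = np.einsum("ikj,kj->ik", sq_diffs(G), L)
    U = memberships(D)
    history = [objective(U, D)]

    for _ in range(max_iter):
        # Step 1 (prototypes): g_kj = sum_i u_ik x_ij / sum_i u_ik
        G = (U.T @ X) / (U.sum(axis=0)[:, None] + eps)

        # Step 2 (weights): with S_kj = sum_i u_ik (x_ij - g_kj)^2, the minimizer
        # under prod_j L_kj = 1 is L_kj = geometric_mean_h(S_kh) / S_kj
        diff2 = sq_diffs(G)
        S = np.einsum("ik,ikj->kj", U, diff2) + eps
        L = np.exp(np.log(S).mean(axis=1, keepdims=True)) / S

        # Step 3 (memberships): closed-form entropy-regularized update
        D = np.einsum("ikj,kj->ik", diff2, L)
        U = memberships(D)

        history.append(objective(U, D))
        if abs(history[-2] - history[-1]) < tol:
            break

    return U, G, L, history
```

Because each step minimizes the objective exactly with respect to one block of unknowns while the other two are held fixed, the recorded objective values should not increase from one iteration to the next. On two well-separated Gaussian groups plus a third, pure-noise variable, the weights learned for the noise variable should come out smaller than those of the informative variables, which is the automatic variable-selection effect the abstract describes.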

| Field | Value |
|---|---|
| id | UFPE_dc752dbacfeedad0c95497dd8ffb36a1 |
| oai_identifier_str | oai:repositorio.ufpe.br:123456789/30523 |
| network_acronym_str | UFPE |
| network_name_str | Repositório Institucional da UFPE |
| repository_id_str | |
| spelling | Concatenation of the record's other fields. Content not repeated elsewhere in this record: funding agency CNPq; a Portuguese version of the abstract, which matches the English abstract and additionally states that the algorithm learns a relevance weight for each variable in each cluster; main file DISSERTAÇÃO Sara Inés Rizo Rodríguez.pdf (application/pdf, 7,245,472 bytes); the non-exclusive distribution license granted to UFPE (Licença de Distribuição Não Exclusiva, stored as license.txt); record datestamp 2019-10-25 10:32:03.528; OAI-PMH endpoint https://repositorio.ufpe.br/oai/request (OpenDOAR 2221). |
| author | RIZO RODRÍGUEZ, Sara Inés |
| author_role | author |
| dc.contributor.authorLattes.pt_BR.fl_str_mv | http://lattes.cnpq.br/5082535257923332 |
| dc.contributor.advisorLattes.pt_BR.fl_str_mv | http://lattes.cnpq.br/3909162572623711 |
| dc.contributor.advisor1.fl_str_mv | CARVALHO, Francisco de Assis Tenório de |
| dc.subject.por.fl_str_mv | Mineração de dados; Agrupamento difuso (data mining; fuzzy clustering) |
| dc.date.issued.fl_str_mv | 2018-02-21 |
| dc.date.accessioned.fl_str_mv | 2019-05-07T20:44:09Z |
| dc.date.available.fl_str_mv | 2019-05-07T20:44:09Z |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
| dc.identifier.uri.fl_str_mv | https://repositorio.ufpe.br/handle/123456789/30523 |
| dc.language.iso.fl_str_mv | eng |
| dc.rights.driver.fl_str_mv | Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ info:eu-repo/semantics/openAccess |
| dc.publisher.none.fl_str_mv | Universidade Federal de Pernambuco |
| dc.publisher.program.fl_str_mv | Programa de Pos Graduacao em Ciencia da Computacao |
| dc.publisher.initials.fl_str_mv | UFPE |
| dc.publisher.country.fl_str_mv | Brasil |
| dc.source.none.fl_str_mv | reponame:Repositório Institucional da UFPE instname:Universidade Federal de Pernambuco (UFPE) instacron:UFPE |
| bitstream: generated thumbnail | https://repositorio.ufpe.br/bitstream/123456789/30523/6/DISSERTA%c3%87%c3%83O%20Sara%20In%c3%a9s%20Rizo%20Rodr%c3%adguez.pdf.jpg (MD5 3fac4ae31557d8237c4cd39cb69f6eba) |
| bitstream: dissertation PDF | https://repositorio.ufpe.br/bitstream/123456789/30523/1/DISSERTA%c3%87%c3%83O%20Sara%20In%c3%a9s%20Rizo%20Rodr%c3%adguez.pdf (MD5 7ff6007cf4da5683d5dd24dc8434f7aa) |
| bitstream: license.txt | https://repositorio.ufpe.br/bitstream/123456789/30523/3/license.txt (MD5 4b8a02c7f2818eaf00dcf2260dd5eb08) |
| bitstream: license_rdf | https://repositorio.ufpe.br/bitstream/123456789/30523/4/license_rdf (MD5 e39d27027a6cc9cb039ad269a5db8e34) |
| bitstream: extracted text | https://repositorio.ufpe.br/bitstream/123456789/30523/5/DISSERTA%c3%87%c3%83O%20Sara%20In%c3%a9s%20Rizo%20Rodr%c3%adguez.pdf.txt (MD5 237fa147b99aac7e77f3ef0fa9f764db) |
| repository.name.fl_str_mv | Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE) |
| repository.mail.fl_str_mv | attena@ufpe.br |