Semantic description and internal validation of clusters for applications in categorical data sets
| Ano de defesa: | 2024 |
|---|---|
| Autor(a) principal: | |
| Orientador(a): | |
| Banca de defesa: | |
| Tipo de documento: | Tese |
| Tipo de acesso: | Acesso aberto |
| dARK ID: | ark:/48912/001300001jjpk |
| Idioma: | eng |
| Instituição de defesa: |
Universidade Federal de São Paulo
|
| Programa de Pós-Graduação: |
Não Informado pela instituição
|
| Departamento: |
Não Informado pela instituição
|
| País: |
Não Informado pela instituição
|
| Palavras-chave em Português: | |
| Link de acesso: | https://hdl.handle.net/11600/71444 |
Resumo: | In clustering problems whose objective is not based specifically on spatial proximity but rather on feature patterns, traditional cluster validation indices may not be appropriate. This work proposes a tool that performs the description of clusters and can be used as an internal validation index to suggest the most appropriate number of clusters for applications in categorical data sets. To evaluate our index, we also propose a categorical synthetic data generator specifically designed for this application. We tested synthetic and real data sets with different configurations to evaluate the performance of the proposed index in comparison with well-known indexes in the literature. Thus, we demonstrate that the index has great potential to describe clusters and discover the number of most suitable clusters. The synthetic data generator is capable of producing relevant data sets for the internal validation process. |
| id |
UFSP_82bfcc8520b4a57f5f86f2d3ef6b8be0 |
|---|---|
| oai_identifier_str |
oai:repositorio.unifesp.br:11600/71444 |
| network_acronym_str |
UFSP |
| network_name_str |
Repositório Institucional da UNIFESP |
| repository_id_str |
|
| spelling |
http://lattes.cnpq.br/0145582312635382http://lattes.cnpq.br/1785341067396776Aquino, Roberto Douglas Guimarães de [UNIFESP]http://lattes.cnpq.br/2373005809061037Curtis, Vitor VenceslauVerri, Filipe Alves NetoInstituto Tecnológico de Aeronáutica2024-07-23T10:53:01Z2024-07-23T10:53:01Z2024-06-19In clustering problems whose objective is not based specifically on spatial proximity but rather on feature patterns, traditional cluster validation indices may not be appropriate. This work proposes a tool that performs the description of clusters and can be used as an internal validation index to suggest the most appropriate number of clusters for applications in categorical data sets. To evaluate our index, we also propose a categorical synthetic data generator specifically designed for this application. We tested synthetic and real data sets with different configurations to evaluate the performance of the proposed index in comparison with well-known indexes in the literature. Thus, we demonstrate that the index has great potential to describe clusters and discover the number of most suitable clusters. The synthetic data generator is capable of producing relevant data sets for the internal validation process.Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)curtis@ita.br76 f.https://hdl.handle.net/11600/71444ark:/48912/001300001jjpkengUniversidade Federal de São Pauloinfo:eu-repo/semantics/openAccesscluster analysissemantic descriptioninternal clustering validation indexsynthetic dataSemantic description and internal validation of clusters for applications in categorical data setsinfo:eu-repo/semantics/doctoralThesisinfo:eu-repo/semantics/publishedVersionreponame:Repositório Institucional da UNIFESPinstname:Universidade Federal de São Paulo (UNIFESP)instacron:UNIFESPInstituto de Ciência e Tecnologia (ICT)Pesquisa OperacionalCiência de dadosCiência de dadosLICENSElicense.txtlicense.txttext/plain; charset=utf-85679https://repositorio.unifesp.br/bitstreams/4d898c10-366f-4698-a39b-09dbe82e40a6/download859ba7aac438f424e54bd364c2aecf3cMD52ORIGINALPhD Thesis vITA.pdfPhD Thesis vITA.pdfapplication/pdf3196700https://repositorio.unifesp.br/bitstreams/a78d9ebe-e6cb-4104-974b-2a6c600e492f/download4ba0a148142553cc9902c8f1a476f5b1MD53TEXTPhD Thesis vITA.pdf.txtPhD Thesis vITA.pdf.txtExtracted texttext/plain100828https://repositorio.unifesp.br/bitstreams/d24ca402-4e49-45c9-955a-bfe72b2fcb92/download882c7c94c369cea11f5d0ea39a33059eMD56THUMBNAILPhD Thesis vITA.pdf.jpgPhD Thesis vITA.pdf.jpgGenerated Thumbnailimage/jpeg3536https://repositorio.unifesp.br/bitstreams/ab76a045-eb2e-4cdd-a038-ef28d8c4afc8/downloade128fa76232876ba0cae62fe38b119b4MD5711600/714442024-08-22 09:15:33.473oai:repositorio.unifesp.br:11600/71444https://repositorio.unifesp.brRepositório InstitucionalPUBhttp://www.repositorio.unifesp.br/oai/requestbiblioteca.csp@unifesp.bropendoar:34652024-08-22T09:15:33Repositório Institucional da UNIFESP - Universidade Federal de São Paulo (UNIFESP)falseClRFUk1PUyBFIENPTkRJw4fDlUVTIFBBUkEgTyBMSUNFTkNJQU1FTlRPIERPIEFSUVVJVkFNRU5UTywgUkVQUk9EVcOHw4NPIEUgRElWVUxHQcOHw4NPIFDDmkJMSUNBIERFIENPTlRFw5pETyBOTyBSRVBPU0lUw5NSSU8gSU5TVElUVUNJT05BTCBVTklGRVNQIDxicj48YnI+CgoxLiBEZWNsYXJvLW1lIHJlc3BvbnPDoXZlbCBwZWxvIHRyYWJhbGhvICBlL291IHVzdcOhcmlvLWRlcG9zaXRhbnRlIG5vIFJlcG9zaXTDs3JpbyBJbnN0aXR1Y2lvbmFsIFVOSUZFU1AsYXNzZWd1cm8gbm8gcHJlc2VudGUgYXRvIHF1ZSBzb3UgdGl0dWxhciBkb3MgZGlyZWl0b3MgYXV0b3JhaXMgcGF0cmltb25pYWlzIGUvb3UgZGlyZWl0b3MgY29uZXhvcyByZWZlcmVudGVzIMOgIHRvdGFsaWRhZGUgZGEgT2JyYSBvcmEgZGVwb3NpdGFkYSBlbSBmb3JtYXRvIGRpZ2l0YWwsIGJlbSBjb21vIGRlIHNldXMgY29tcG9uZW50ZXMgbWVub3JlcywgZW0gc2UgdHJhdGFuZG8gZGUgb2JyYSBjb2xldGl2YSwgY29uZm9ybWUgbyBwcmVjZWl0dWFkbyBwZWxhIExlaSA5LjYxMC85OCBlL291IExlaSA5LjYwOS85OC4gTsOjbyBzZW5kbyBlc3RlIG8gY2FzbywgYXNzZWd1cm8gdGVyIG9idGlkbyBkaXJldGFtZW50ZSBkb3MgZGV2aWRvcyB0aXR1bGFyZXMgYXV0b3JpemHDp8OjbyBwcsOpdmlhIGUgZXhwcmVzc2EgcGFyYSBvIGRlcMOzc2l0byBlIHBhcmEgYSBkaXZ1bGdhw6fDo28gZGEgT2JyYSwgYWJyYW5nZW5kbyB0b2RvcyBvcyBkaXJlaXRvcyBhdXRvcmFpcyBlIGNvbmV4b3MgYWZldGFkb3MgcGVsYSBhc3NpbmF0dXJhIGRvIHByZXNlbnRlIHRlcm1vIGRlIGxpY2VuY2lhbWVudG8sIGRlIG1vZG8gYSBlZmV0aXZhbWVudGUgaXNlbnRhciBhIFVuaXZlcnNpZGFkZSBGZWRlcmFsIGRlIFPDo28gUGF1bG8gKFVOSUZFU1ApIGUgc2V1cyBmdW5jaW9uw6FyaW9zIGRlIHF1YWxxdWVyIHJlc3BvbnNhYmlsaWRhZGUgcGVsbyB1c28gbsOjby1hdXRvcml6YWRvIGRvIG1hdGVyaWFsIGRlcG9zaXRhZG8sIHNlamEgZW0gdmluY3VsYcOnw6NvIGFvIFJlcG9zaXTDs3JpbyBJbnN0aXR1Y2lvbmFsIFVOSUZFU1AsIHNlamEgZW0gdmluY3VsYcOnw6NvIGEgcXVhaXNxdWVyIHNlcnZpw6dvcyBkZSBidXNjYSBlIGRlIGRpc3RyaWJ1acOnw6NvIGRlIGNvbnRlw7pkbyBxdWUgZmHDp2FtIHVzbyBkYXMgaW50ZXJmYWNlcyBlIGVzcGHDp28gZGUgYXJtYXplbmFtZW50byBwcm92aWRlbmNpYWRvcyBwZWxhIFVuaXZlcnNpZGFkZSBGZWRlcmFsIGRlIFPDo28gUGF1bG8gKFVOSUZFU1ApIHBvciBtZWlvIGRlIHNldXMgc2lzdGVtYXMgaW5mb3JtYXRpemFkb3MuCgoyLiBBIGNvbmNvcmTDom5jaWEgY29tIGVzdGEgbGljZW7Dp2EgdGVtIGNvbW8gY29uc2VxdcOqbmNpYSBhIHRyYW5zZmVyw6puY2lhLCBhIHTDrXR1bG8gbsOjby1leGNsdXNpdm8gZSBuw6NvLW9uZXJvc28sIGlzZW50YSBkbyBwYWdhbWVudG8gZGUgcm95YWx0aWVzIG91IHF1YWxxdWVyIG91dHJhIGNvbnRyYXByZXN0YcOnw6NvLCBwZWN1bmnDoXJpYSBvdSBuw6NvLCDDoCBVbml2ZXJzaWRhZGUgRmVkZXJhbCBkZSBTw6NvIFBhdWxvIChVTklGRVNQKSBkb3MgZGlyZWl0b3MgZGUgYXJtYXplbmFyIGRpZ2l0YWxtZW50ZSwgZGUgcmVwcm9kdXppciBlIGRlIGRpc3RyaWJ1aXIgbmFjaW9uYWwgZSBpbnRlcm5hY2lvbmFsbWVudGUgYSBPYnJhLCBpbmNsdWluZG8tc2UgbyBzZXUgcmVzdW1vL2Fic3RyYWN0LCBwb3IgbWVpb3MgZWxldHLDtG5pY29zIGFvIHDDumJsaWNvIGVtIGdlcmFsLCBlbSByZWdpbWUgZGUgYWNlc3NvIGFiZXJ0by4KCjMuIEEgcHJlc2VudGUgbGljZW7Dp2EgdGFtYsOpbSBhYnJhbmdlLCBub3MgbWVzbW9zIHRlcm1vcyBlc3RhYmVsZWNpZG9zIG5vIGl0ZW0gMiwgc3VwcmEsIHF1YWxxdWVyIGRpcmVpdG8gZGUgY29tdW5pY2HDp8OjbyBhbyBww7pibGljbyBjYWLDrXZlbCBlbSByZWxhw6fDo28gw6AgT2JyYSBvcmEgZGVwb3NpdGFkYSwgaW5jbHVpbmRvLXNlIG9zIHVzb3MgcmVmZXJlbnRlcyDDoCByZXByZXNlbnRhw6fDo28gcMO6YmxpY2EgZS9vdSBleGVjdcOnw6NvIHDDumJsaWNhLCBiZW0gY29tbyBxdWFscXVlciBvdXRyYSBtb2RhbGlkYWRlIGRlIGNvbXVuaWNhw6fDo28gYW8gcMO6YmxpY28gcXVlIGV4aXN0YSBvdSB2ZW5oYSBhIGV4aXN0aXIsIG5vcyB0ZXJtb3MgZG8gYXJ0aWdvIDY4IGUgc2VndWludGVzIGRhIExlaSA5LjYxMC85OCwgbmEgZXh0ZW5zw6NvIHF1ZSBmb3IgYXBsaWPDoXZlbCBhb3Mgc2VydmnDp29zIHByZXN0YWRvcyBhbyBww7pibGljbyBwZWxhIFVuaXZlcnNpZGFkZSBGZWRlcmFsIGRlIFPDo28gUGF1bG8gKFVOSUZFU1ApLgoKNC4gRXN0YSBsaWNlbsOnYSBhYnJhbmdlLCBhaW5kYSwgbm9zIG1lc21vcyB0ZXJtb3MgZXN0YWJlbGVjaWRvcyBubyBpdGVtIDIsIHN1cHJhLCB0b2RvcyBvcyBkaXJlaXRvcyBjb25leG9zIGRlIGFydGlzdGFzIGludMOpcnByZXRlcyBvdSBleGVjdXRhbnRlcywgcHJvZHV0b3JlcyBmb25vZ3LDoWZpY29zIG91IGVtcHJlc2FzIGRlIHJhZGlvZGlmdXPDo28gcXVlIGV2ZW50dWFsbWVudGUgc2VqYW0gYXBsaWPDoXZlaXMgZW0gcmVsYcOnw6NvIMOgIG9icmEgZGVwb3NpdGFkYSwgZW0gY29uZm9ybWlkYWRlIGNvbSBvIHJlZ2ltZSBmaXhhZG8gbm8gVMOtdHVsbyBWIGRhIExlaSA5LjYxMC85OC4KCjUuIFNlIGEgT2JyYSBkZXBvc2l0YWRhIGZvaSBvdSDDqSBvYmpldG8gZGUgZmluYW5jaWFtZW50byBwb3IgaW5zdGl0dWnDp8O1ZXMgZGUgZm9tZW50byDDoCBwZXNxdWlzYSBvdSBxdWFscXVlciBvdXRyYSBzZW1lbGhhbnRlLCB2b2PDqiBvdSBvIHRpdHVsYXIgYXNzZWd1cmEgcXVlIGN1bXByaXUgdG9kYXMgYXMgb2JyaWdhw6fDtWVzIHF1ZSBsaGUgZm9yYW0gaW1wb3N0YXMgcGVsYSBpbnN0aXR1acOnw6NvIGZpbmFuY2lhZG9yYSBlbSByYXrDo28gZG8gZmluYW5jaWFtZW50bywgZSBxdWUgbsOjbyBlc3TDoSBjb250cmFyaWFuZG8gcXVhbHF1ZXIgZGlzcG9zacOnw6NvIGNvbnRyYXR1YWwgcmVmZXJlbnRlIMOgIHB1YmxpY2HDp8OjbyBkbyBjb250ZcO6ZG8gb3JhIHN1Ym1ldGlkbyBhbyBSZXBvc2l0w7NyaW8gSW5zdGl0dWNpb25hbCBVTklGRVNQLgogCjYuIEF1dG9yaXphIGEgVW5pdmVyc2lkYWRlIEZlZGVyYWwgZGUgU8OjbyBQYXVsbyBhIGRpc3BvbmliaWxpemFyIGEgb2JyYSBubyBSZXBvc2l0w7NyaW8gSW5zdGl0dWNpb25hbCBVTklGRVNQIGRlIGZvcm1hIGdyYXR1aXRhLCBkZSBhY29yZG8gY29tIGEgbGljZW7Dp2EgcMO6YmxpY2EgQ3JlYXRpdmUgQ29tbW9uczogQXRyaWJ1acOnw6NvLVNlbSBEZXJpdmHDp8O1ZXMtU2VtIERlcml2YWRvcyA0LjAgSW50ZXJuYWNpb25hbCAoQ0MgQlktTkMtTkQpLCBwZXJtaXRpbmRvIHNldSBsaXZyZSBhY2Vzc28sIHVzbyBlIGNvbXBhcnRpbGhhbWVudG8sIGRlc2RlIHF1ZSBjaXRhZGEgYSBmb250ZS4gQSBvYnJhIGNvbnRpbnVhIHByb3RlZ2lkYSBwb3IgRGlyZWl0b3MgQXV0b3JhaXMgZS9vdSBwb3Igb3V0cmFzIGxlaXMgYXBsaWPDoXZlaXMuIFF1YWxxdWVyIHVzbyBkYSBvYnJhLCBxdWUgbsOjbyBvIGF1dG9yaXphZG8gc29iIGVzdGEgbGljZW7Dp2Egb3UgcGVsYSBsZWdpc2xhw6fDo28gYXV0b3JhbCwgw6kgcHJvaWJpZG8uICAKCjcuIEF0ZXN0YSBxdWUgYSBPYnJhIHN1Ym1ldGlkYSBuw6NvIGNvbnTDqW0gcXVhbHF1ZXIgaW5mb3JtYcOnw6NvIGNvbmZpZGVuY2lhbCBzdWEgb3UgZGUgdGVyY2Vpcm9zLgoKOC4gQXRlc3RhIHF1ZSBvIHRyYWJhbGhvIHN1Ym1ldGlkbyDDqSBvcmlnaW5hbCBlIGZvaSBlbGFib3JhZG8gcmVzcGVpdGFuZG8gb3MgcHJpbmPDrXBpb3MgZGEgbW9yYWwgZSBkYSDDqXRpY2EgZSBuw6NvIHZpb2xvdSBxdWFscXVlciBkaXJlaXRvIGRlIHByb3ByaWVkYWRlIGludGVsZWN0dWFsLCBzb2IgcGVuYSBkZSByZXNwb25kZXIgY2l2aWwsIGNyaW1pbmFsLCDDqXRpY2EgZSBwcm9maXNzaW9uYWxtZW50ZSBwb3IgbWV1cyBhdG9zOwoKOS4gQXRlc3RhIHF1ZSBhIHZlcnPDo28gZG8gdHJhYmFsaG8gcHJlc2VudGUgbm8gYXJxdWl2byBzdWJtZXRpZG8gw6kgYSB2ZXJzw6NvIGRlZmluaXRpdmEgcXVlIGluY2x1aSBhcyBhbHRlcmHDp8O1ZXMgZGVjb3JyZW50ZXMgZGEgZGVmZXNhLCBzb2xpY2l0YWRhcyBwZWxhIGJhbmNhLCBzZSBob3V2ZSBhbGd1bWEsIG91IHNvbGljaXRhZGFzIHBvciBwYXJ0ZSBkZSBvcmllbnRhw6fDo28gZG9jZW50ZSByZXNwb25zw6F2ZWw7CgoxMC4gQ29uY2VkZSDDoCBVbml2ZXJzaWRhZGUgRmVkZXJhbCBkZSBTw6NvIFBhdWxvIChVTklGRVNQKSBvIGRpcmVpdG8gbsOjbyBleGNsdXNpdm8gZGUgcmVhbGl6YXIgcXVhaXNxdWVyIGFsdGVyYcOnw7VlcyBuYSBtw61kaWEgb3Ugbm8gZm9ybWF0byBkbyBhcnF1aXZvIHBhcmEgcHJvcMOzc2l0b3MgZGUgcHJlc2VydmHDp8OjbyBkaWdpdGFsLCBkZSBhY2Vzc2liaWxpZGFkZSBlIGRlIG1lbGhvciBpZGVudGlmaWNhw6fDo28gZG8gdHJhYmFsaG8gc3VibWV0aWRvLCBkZXNkZSBxdWUgbsOjbyBzZWphIGFsdGVyYWRvIHNldSBjb250ZcO6ZG8gaW50ZWxlY3R1YWwuCgpBbyBjb25jbHVpciBhcyBldGFwYXMgZG8gcHJvY2Vzc28gZGUgc3VibWlzc8OjbyBkZSBhcnF1aXZvcyBubyBSZXBvc2l0w7NyaW8gSW5zdGl0dWNpb25hbCBVTklGRVNQLCBhdGVzdG8gcXVlIGxpIGUgY29uY29yZGVpIGludGVncmFsbWVudGUgY29tIG9zIHRlcm1vcyBhY2ltYSBkZWxpbWl0YWRvcywgc2VtIGZhemVyIHF1YWxxdWVyIHJlc2VydmEgZSBub3ZhbWVudGUgY29uZmlybWFuZG8gcXVlIGN1bXBybyBvcyByZXF1aXNpdG9zIGluZGljYWRvcyBub3MgaXRlbnMgbWVuY2lvbmFkb3MgYW50ZXJpb3JtZW50ZS4KCkhhdmVuZG8gcXVhbHF1ZXIgZGlzY29yZMOibmNpYSBlbSByZWxhw6fDo28gYSBwcmVzZW50ZSBsaWNlbsOnYSBvdSBuw6NvIHNlIHZlcmlmaWNhbmRvIG8gZXhpZ2lkbyBub3MgaXRlbnMgYW50ZXJpb3Jlcywgdm9jw6ogZGV2ZSBpbnRlcnJvbXBlciBpbWVkaWF0YW1lbnRlIG8gcHJvY2Vzc28gZGUgc3VibWlzc8Ojby4gQSBjb250aW51aWRhZGUgZG8gcHJvY2Vzc28gZXF1aXZhbGUgw6AgY29uY29yZMOibmNpYSBlIMOgIGFzc2luYXR1cmEgZGVzdGUgZG9jdW1lbnRvLCBjb20gdG9kYXMgYXMgY29uc2VxdcOqbmNpYXMgbmVsZSBwcmV2aXN0YXMsIHN1amVpdGFuZG8tc2UgbyBzaWduYXTDoXJpbyBhIHNhbsOnw7VlcyBjaXZpcyBlIGNyaW1pbmFpcyBjYXNvIG7Do28gc2VqYSB0aXR1bGFyIGRvcyBkaXJlaXRvcyBhdXRvcmFpcyBwYXRyaW1vbmlhaXMgZS9vdSBjb25leG9zIGFwbGljw6F2ZWlzIMOgIE9icmEgZGVwb3NpdGFkYSBkdXJhbnRlIGVzdGUgcHJvY2Vzc28sIG91IGNhc28gbsOjbyB0ZW5oYSBvYnRpZG8gcHLDqXZpYSBlIGV4cHJlc3NhIGF1dG9yaXphw6fDo28gZG8gdGl0dWxhciBwYXJhIG8gZGVww7NzaXRvIGUgdG9kb3Mgb3MgdXNvcyBkYSBPYnJhIGVudm9sdmlkb3MuCgpTZSB0aXZlciBxdWFscXVlciBkw7p2aWRhIHF1YW50byBhb3MgdGVybW9zIGRlIGxpY2VuY2lhbWVudG8gZSBxdWFudG8gYW8gcHJvY2Vzc28gZGUgc3VibWlzc8OjbywgZW50cmUgZW0gY29udGF0byBjb20gYSBiaWJsaW90ZWNhIGRvIHNldSBjYW1wdXMgKGNvbnN1bHRlIGVtOiBodHRwczovL2JpYmxpb3RlY2FzLnVuaWZlc3AuYnIvYmlibGlvdGVjYXMtZGEtcmVkZSkuIAoK |
| dc.title.none.fl_str_mv |
Semantic description and internal validation of clusters for applications in categorical data sets |
| title |
Semantic description and internal validation of clusters for applications in categorical data sets |
| spellingShingle |
Semantic description and internal validation of clusters for applications in categorical data sets Aquino, Roberto Douglas Guimarães de [UNIFESP] cluster analysis semantic description internal clustering validation index synthetic data |
| title_short |
Semantic description and internal validation of clusters for applications in categorical data sets |
| title_full |
Semantic description and internal validation of clusters for applications in categorical data sets |
| title_fullStr |
Semantic description and internal validation of clusters for applications in categorical data sets |
| title_full_unstemmed |
Semantic description and internal validation of clusters for applications in categorical data sets |
| title_sort |
Semantic description and internal validation of clusters for applications in categorical data sets |
| author |
Aquino, Roberto Douglas Guimarães de [UNIFESP] |
| author_facet |
Aquino, Roberto Douglas Guimarães de [UNIFESP] |
| author_role |
author |
| dc.contributor.advisor-coLattes.none.fl_str_mv |
http://lattes.cnpq.br/0145582312635382 |
| dc.contributor.advisorLattes.none.fl_str_mv |
http://lattes.cnpq.br/1785341067396776 |
| dc.contributor.authorLattes.none.fl_str_mv |
http://lattes.cnpq.br/2373005809061037 |
| dc.contributor.author.fl_str_mv |
Aquino, Roberto Douglas Guimarães de [UNIFESP] |
| dc.contributor.advisor1.fl_str_mv |
Curtis, Vitor Venceslau |
| dc.contributor.advisor-co1.fl_str_mv |
Verri, Filipe Alves Neto |
| contributor_str_mv |
Curtis, Vitor Venceslau Verri, Filipe Alves Neto |
| dc.subject.por.fl_str_mv |
cluster analysis semantic description internal clustering validation index synthetic data |
| topic |
cluster analysis semantic description internal clustering validation index synthetic data |
| description |
In clustering problems whose objective is not based specifically on spatial proximity but rather on feature patterns, traditional cluster validation indices may not be appropriate. This work proposes a tool that performs the description of clusters and can be used as an internal validation index to suggest the most appropriate number of clusters for applications in categorical data sets. To evaluate our index, we also propose a categorical synthetic data generator specifically designed for this application. We tested synthetic and real data sets with different configurations to evaluate the performance of the proposed index in comparison with well-known indexes in the literature. Thus, we demonstrate that the index has great potential to describe clusters and discover the number of most suitable clusters. The synthetic data generator is capable of producing relevant data sets for the internal validation process. |
| publishDate |
2024 |
| dc.date.accessioned.fl_str_mv |
2024-07-23T10:53:01Z |
| dc.date.available.fl_str_mv |
2024-07-23T10:53:01Z |
| dc.date.issued.fl_str_mv |
2024-06-19 |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| format |
doctoralThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
https://hdl.handle.net/11600/71444 |
| dc.identifier.dark.fl_str_mv |
ark:/48912/001300001jjpk |
| url |
https://hdl.handle.net/11600/71444 |
| identifier_str_mv |
ark:/48912/001300001jjpk |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
76 f. |
| dc.coverage.spatial.none.fl_str_mv |
Instituto Tecnológico de Aeronáutica |
| dc.publisher.none.fl_str_mv |
Universidade Federal de São Paulo |
| publisher.none.fl_str_mv |
Universidade Federal de São Paulo |
| dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UNIFESP instname:Universidade Federal de São Paulo (UNIFESP) instacron:UNIFESP |
| instname_str |
Universidade Federal de São Paulo (UNIFESP) |
| instacron_str |
UNIFESP |
| institution |
UNIFESP |
| reponame_str |
Repositório Institucional da UNIFESP |
| collection |
Repositório Institucional da UNIFESP |
| bitstream.url.fl_str_mv |
https://repositorio.unifesp.br/bitstreams/4d898c10-366f-4698-a39b-09dbe82e40a6/download https://repositorio.unifesp.br/bitstreams/a78d9ebe-e6cb-4104-974b-2a6c600e492f/download https://repositorio.unifesp.br/bitstreams/d24ca402-4e49-45c9-955a-bfe72b2fcb92/download https://repositorio.unifesp.br/bitstreams/ab76a045-eb2e-4cdd-a038-ef28d8c4afc8/download |
| bitstream.checksum.fl_str_mv |
859ba7aac438f424e54bd364c2aecf3c 4ba0a148142553cc9902c8f1a476f5b1 882c7c94c369cea11f5d0ea39a33059e e128fa76232876ba0cae62fe38b119b4 |
| bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 |
| repository.name.fl_str_mv |
Repositório Institucional da UNIFESP - Universidade Federal de São Paulo (UNIFESP) |
| repository.mail.fl_str_mv |
biblioteca.csp@unifesp.br |
| _version_ |
1866180447575212032 |