Unsupervised Feature Selection and Deep Subspace Clustering for Exploratory High-Dimensional Cluster Analysis

Bibliographic details
Year of defense: 2025
Main author: OLIVEIRA, Marcos de Souza
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Thesis
Access type: Embargoed access
Language: eng
Defense institution: Universidade Federal de Pernambuco
UFPE
Brasil
Programa de Pos Graduacao em Ciencia da Computacao
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://repositorio.ufpe.br/handle/123456789/62451
Abstract: With the advancement of information technology, data volume is rapidly increasing, posing significant challenges for storage and processing. This growth occurs both in the number of samples and in the number of features, making initial exploratory small data analysis crucial to reducing computational demands and improving data quality for machine learning (ML) training. However, simply reducing the number of samples can intensify the “curse of dimensionality,” complicating analysis when a small dataset contains many features. Dimensionality reduction techniques are therefore essential for enabling more efficient and interpretable analyses. Unlike methods such as PCA, which transform the original data, unsupervised feature selection techniques identify the most relevant variables without requiring labels, enhancing the interpretability of natural data patterns. However, patterns may emerge only within specific feature subsets, known as subspaces. In some cases, the original features may not be sufficient, requiring the generation of new ones to identify these subspaces. This research explores two strategies for handling high-dimensional data with few samples: (i) a novel unsupervised feature selection method and (ii) a clustering approach based on subspaces. Experiments on real and synthetic datasets showed that the proposed methods outperform state-of-the-art approaches, as evidenced by clustering evaluation metrics and statistical tests.
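The contrast the abstract draws between PCA and unsupervised feature selection can be illustrated with a minimal sketch. This is not the method proposed in the thesis: the variance-ranking selector, the function names `select_by_variance` and `pca_project`, and the synthetic dataset (40 samples, 200 features, two informative columns) are all assumptions chosen only for illustration. The point it shows is that selection returns indices of original, interpretable features, while each PCA component mixes every feature.

```python
import numpy as np


def select_by_variance(X, k):
    """Return the sorted indices of the k highest-variance columns.

    A deliberately simple unsupervised baseline (no labels used);
    it stands in for a real feature selection method.
    """
    variances = X.var(axis=0)
    top = np.argsort(variances)[::-1][:k]
    return np.sort(top)


def pca_project(X, k):
    """Project X onto its top-k principal components using the SVD."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:k]                          # shape (k, n_features)
    return Xc @ components.T, components


# Hypothetical "small data" setting: few samples, many features.
rng = np.random.default_rng(0)
n_samples, n_features = 40, 200
X = rng.normal(0.0, 1.0, size=(n_samples, n_features))

# Hidden two-group structure that lives only in two columns (a subspace).
hidden_groups = np.repeat([0, 1], n_samples // 2)
informative = [3, 17]
X[:, informative] += 6.0 * hidden_groups[:, None]

selected = select_by_variance(X, k=2)    # indices of original features
Z, components = pca_project(X, k=2)      # new, transformed features

print("selected original features:", selected)
print("PCA embedding shape:", Z.shape)
print("features with nonzero weight in first component:",
      int(np.count_nonzero(np.abs(components[0]) > 1e-12)))
```

In this toy setting the selected indices point directly at the columns that carry the group structure, whereas the PCA embedding has to be interpreted through loadings spread over all 200 features.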
id UFPE_e3aed5f6351bbb0b5c020687f4596456
oai_identifier_str oai:repositorio.ufpe.br:123456789/62451
network_acronym_str UFPE
network_name_str Repositório Institucional da UFPE
repository_id_str
dc.title.none.fl_str_mv Unsupervised Feature Selection and Deep Subspace Clustering for Exploratory High-Dimensional Cluster Analysis
title Unsupervised Feature Selection and Deep Subspace Clustering for Exploratory High-Dimensional Cluster Analysis
spellingShingle Unsupervised Feature Selection and Deep Subspace Clustering for Exploratory High-Dimensional Cluster Analysis
OLIVEIRA, Marcos de Souza
Small Data Analysis
Unsupervised feature selection
Subspace clustering.
author OLIVEIRA, Marcos de Souza
author_facet OLIVEIRA, Marcos de Souza
author_role author
dc.contributor.none.fl_str_mv QUEIROZ, Sergio Ricardo de Melo
CARVALHO, Francisco de Assis Tenório de
http://lattes.cnpq.br/6137784444858483
http://lattes.cnpq.br/9263224550858823
http://lattes.cnpq.br/3909162572623711
dc.contributor.author.fl_str_mv OLIVEIRA, Marcos de Souza
dc.subject.por.fl_str_mv Small Data Analysis
Unsupervised feature selection
Subspace clustering.
topic Small Data Analysis
Unsupervised feature selection
Subspace clustering.
publishDate 2025
dc.date.none.fl_str_mv 2025-04-22T17:43:45Z
2025-04-22T17:43:45Z
2025-02-03
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv OLIVEIRA, Marcos de Souza. Unsupervised Feature Selection and Deep Subspace Clustering for Exploratory High-Dimensional Cluster Analysis. 2024. Tese (Doutorado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2024.
https://repositorio.ufpe.br/handle/123456789/62451
identifier_str_mv OLIVEIRA, Marcos de Souza. Unsupervised Feature Selection and Deep Subspace Clustering for Exploratory High-Dimensional Cluster Analysis. 2024. Tese (Doutorado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2024.
url https://repositorio.ufpe.br/handle/123456789/62451
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv http://creativecommons.org/licenses/by-nc-nd/3.0/br/
info:eu-repo/semantics/embargoedAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-nd/3.0/br/
eu_rights_str_mv embargoedAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Pernambuco
UFPE
Brasil
Programa de Pos Graduacao em Ciencia da Computacao
publisher.none.fl_str_mv Universidade Federal de Pernambuco
UFPE
Brasil
Programa de Pos Graduacao em Ciencia da Computacao
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFPE
instname:Universidade Federal de Pernambuco (UFPE)
instacron:UFPE
instname_str Universidade Federal de Pernambuco (UFPE)
instacron_str UFPE
institution UFPE
reponame_str Repositório Institucional da UFPE
collection Repositório Institucional da UFPE
repository.name.fl_str_mv Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
repository.mail.fl_str_mv attena@ufpe.br
_version_ 1856041925040144384