On Multi-Label Meta-Learning for automated pipeline recommendation

Bibliographic details
Year of defense: 2025
Main author: MAIA, Cynthia Moreira
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Universidade Federal de Pernambuco
UFPE
Brasil
Programa de Pos Graduacao em Ciencia da Computacao
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://repositorio.ufpe.br/handle/123456789/67046
Abstract: Automated Machine Learning (AutoML) aims to automate stages of the machine learning process, such as algorithm selection, data preprocessing, and hyperparameter tuning. One of its main challenges is designing a search space that can handle different problems while ensuring the best trade-off between performance and computational cost. Traditional AutoML approaches primarily explore the search space online, using optimization strategies such as Bayesian Optimization to identify the best configuration within a specified time budget. Although effective, such methods often incur high computational costs. In contrast, our proposal seeks to avoid online search strategies by employing meta-learning to address these challenges. This approach leverages the meta-features of problems to recommend solutions appropriate to their nature, thereby eliminating the need for exhaustive search at runtime. Accordingly, we propose MetaML, the first study of this thesis, a meta-learning approach based on multi-label algorithms for pipeline recommendation in AutoML. To this end, we present a curated search space design that automatically reduces the number of candidate pipelines, based on historical data from online repositories, including only the most frequently used pipelines with the best performance across a significant number of datasets. Additionally, we propose chained recommendations using multi-label algorithms that take into account the interdependencies between pipeline stages. Experiments conducted on different datasets demonstrate the effectiveness of the approach, with MetaML achieving satisfactory results and, in some cases, superior outcomes at a lower computational cost compared to current AutoML methods. However, the pipelines derived from the repository experiments showed limited representativeness with respect to preprocessing techniques.
As an alternative, we propose the PIPES meta-dataset, the second study of this thesis, which consists of a collection of experiments involving multiple pipelines, designed to represent all selected combinations of techniques, including different preprocessing blocks and a classification block. After constructing PIPES, we employed this meta-dataset in the third study of the thesis, MetaML 2.0, to investigate whether broader pipeline representativeness could yield even better results. The experiments demonstrated that this approach indeed achieved improved performance on specific datasets.
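The chained recommendation idea in the abstract, where the choice made for one pipeline stage informs the choice for the next, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration and is not taken from the thesis: the toy meta-features, the pipeline labels, the `META_DATASET` table, the function names, and the plain k-nearest-neighbour vote, which stands in for the multi-label algorithms the thesis actually uses.

```python
from collections import Counter
from math import dist

# Hypothetical meta-dataset: (meta-features, best preprocessing, best classifier).
# The two toy meta-features could be read as (log10 of #instances, minority-class ratio).
META_DATASET = [
    ((2.0, 0.50), "none", "knn"),
    ((2.2, 0.45), "none", "knn"),
    ((3.9, 0.10), "smote", "random_forest"),
    ((4.1, 0.12), "smote", "random_forest"),
    ((4.0, 0.48), "scaler", "svm"),
    ((3.8, 0.52), "scaler", "svm"),
]


def knn_vote(query, rows, k=3):
    """Return the majority label among the k rows closest to the query."""
    nearest = sorted(rows, key=lambda row: dist(query, row[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def recommend_pipeline(meta_features, k=3):
    """Chained recommendation: the classifier stage sees the chosen preprocessing."""
    # Stage 1: predict the preprocessing block from the meta-features alone.
    stage1_rows = [(mf, prep) for mf, prep, _ in META_DATASET]
    prep = knn_vote(meta_features, stage1_rows, k)

    # Stage 2: append the stage-1 decision as an extra feature (1.0 on mismatch),
    # so past datasets that used the same preprocessing count as closer.
    stage2_rows = [
        (mf + (0.0 if p == prep else 1.0,), clf) for mf, p, clf in META_DATASET
    ]
    clf = knn_vote(meta_features + (0.0,), stage2_rows, k)
    return prep, clf


if __name__ == "__main__":
    print(recommend_pipeline((4.0, 0.11)))
```

No search over candidate pipelines happens at recommendation time in this sketch: the cost is a pair of nearest-neighbour lookups over previously recorded results, which is the kind of saving over online search that the abstract argues for.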
id UFPE_ffdb8226b7c9f9fb199cb7053363a171
oai_identifier_str oai:repositorio.ufpe.br:123456789/67046
network_acronym_str UFPE
network_name_str Repositório Institucional da UFPE
repository_id_str
dc.title.none.fl_str_mv On Multi-Label Meta-Learning for automated pipeline recommendation
title On Multi-Label Meta-Learning for automated pipeline recommendation
author MAIA, Cynthia Moreira
author_facet MAIA, Cynthia Moreira
author_role author
dc.contributor.none.fl_str_mv CAVALCANTI, George Darmiton da Cunha
CRUZ, Rafael Menelau Oliveira e
http://lattes.cnpq.br/7914454797013089
http://lattes.cnpq.br/8577312109146354
http://lattes.cnpq.br/1143656271684404
dc.contributor.author.fl_str_mv MAIA, Cynthia Moreira
dc.subject.por.fl_str_mv Workflows
Meta-learning
Multi-label
publishDate 2025
dc.date.none.fl_str_mv 2025-12-03T16:11:33Z
2025-12-03T16:11:33Z
2025-10-15
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv MAIA, Cynthia Moreira. On Multi-Label Meta-Learning for automated pipeline recommendation. 2025. Tese (Doutorado em Ciência da Computação) - Universidade Federal de Pernambuco, Recife, 2025.
https://repositorio.ufpe.br/handle/123456789/67046
identifier_str_mv MAIA, Cynthia Moreira. On Multi-Label Meta-Learning for automated pipeline recommendation. 2025. Tese (Doutorado em Ciência da Computação) - Universidade Federal de Pernambuco, Recife, 2025.
url https://repositorio.ufpe.br/handle/123456789/67046
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv https://creativecommons.org/licenses/by-nc-nd/4.0/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-nd/4.0/
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Pernambuco
UFPE
Brasil
Programa de Pos Graduacao em Ciencia da Computacao
publisher.none.fl_str_mv Universidade Federal de Pernambuco
UFPE
Brasil
Programa de Pos Graduacao em Ciencia da Computacao
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFPE
instname:Universidade Federal de Pernambuco (UFPE)
instacron:UFPE
instname_str Universidade Federal de Pernambuco (UFPE)
instacron_str UFPE
institution UFPE
reponame_str Repositório Institucional da UFPE
collection Repositório Institucional da UFPE
repository.name.fl_str_mv Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
repository.mail.fl_str_mv attena@ufpe.br
_version_ 1856042092178964480