Label operation for multi-label learning
Year of defense: | 2020 |
---|---|
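Lead author: | Adriano Rivolli da Silva |
Advisor: | André Carlos Ponce de Leon Ferreira de Carvalho |
Examination committee: | Alexandre Plastino de Carvalho, Ricardo Bastos Cavalcante Prudêncio, Solange Oliveira Rezende |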
Document type: | Thesis |
Access type: | Open access |
Language: | eng |
Defending institution: | Universidade de São Paulo |
Graduate program: | Ciências da Computação e Matemática Computacional |
Department: | Not provided by the institution |
Country: | BR |
Access link: | https://doi.org/10.11606/T.55.2020.tde-18082020-161950 |
Abstract: | Classification tasks in which instances are associated with multiple concepts are known as multi-label classification. They have attracted growing attention in the machine-learning community, given the large number of applications and the amount of multi-labeled data now available. Consequently, many strategies have been proposed that exploit different particularities of the task, such as label imbalance, dimensionality reduction and label dependence. Despite that, some aspects that may affect strategies as a whole have been overlooked. For instance, some strategies transform the original multi-labeled data into single-labeled data to which a base algorithm can be applied. However, the impact of choosing one base algorithm over another is unknown and usually ignored. Moreover, it was observed that many labels are never correctly predicted, regardless of the strategy used. So far, very little attention has been paid to these issues, which may produce misleading results. Therefore, this thesis investigates multi-label strategies with these particularities in mind. To that end, an extensive comparative study is performed, focusing on the influence of the base algorithms on the results. Moreover, label operation is proposed as an optimization procedure able to reduce the number of labels that are never predicted. In an empirical evaluation, label expansion and reduction improved different evaluation measures and mitigated the label prediction problem, although they did not eliminate it. Additionally, meta-learning is used to reduce the complexity of the operations and to provide some understanding of the studied issue. In this context, characterization measures for meta-learning were systematically investigated, which resulted in a new taxonomy to organize them. In summary, the findings and contributions presented here are relevant to the multi-label and meta-learning research fields. They potentially have an impact on the methodology and raise new open questions concerning unnoticed aspects of these areas. |
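The two issues raised in the abstract, the influence of the base algorithm and labels that are never correctly predicted, can be illustrated with a small sketch. It is not the thesis's method: it assumes binary relevance (one binary model per label) as the problem-transformation strategy, uses two hypothetical pure-Python base learners (`majority_learner` and `one_nn_learner`), invents a toy dataset, and assumes that a label counts as "never correctly predicted" when it occurs in the test set but has no true positive.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], int]
BaseLearner = Callable[[List[Vector], List[int]], Predictor]


def majority_learner(X: List[Vector], y: List[int]) -> Predictor:
    """Hypothetical base learner: always predicts the most frequent class."""
    majority = 1 if 2 * sum(y) > len(y) else 0
    return lambda x: majority


def one_nn_learner(X: List[Vector], y: List[int]) -> Predictor:
    """Hypothetical base learner: 1-nearest neighbour (squared Euclidean)."""
    def predict(x: Vector) -> int:
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X]
        return y[dists.index(min(dists))]
    return predict


def binary_relevance(X: List[Vector], Y: List[List[int]],
                     learner: BaseLearner) -> List[Predictor]:
    """Problem transformation: train one binary model per label column."""
    n_labels = len(Y[0])
    return [learner(X, [row[j] for row in Y]) for j in range(n_labels)]


def predict_all(models: List[Predictor], X: List[Vector]) -> List[List[int]]:
    """Predict the full label vector for every instance."""
    return [[model(x) for model in models] for x in X]


def never_predicted_labels(Y_true: List[List[int]],
                           Y_pred: List[List[int]]) -> List[int]:
    """Indices of labels that occur in Y_true but have no true positive."""
    missing = []
    for j in range(len(Y_true[0])):
        occurs = any(row[j] == 1 for row in Y_true)
        hit = any(t[j] == 1 and p[j] == 1 for t, p in zip(Y_true, Y_pred))
        if occurs and not hit:
            missing.append(j)
    return missing


# Toy data (invented for illustration): 2 features, 3 labels.
X_train = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [0.5, 0.5]]
Y_train = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 0], [0, 0, 1]]
X_test = [[0.0, 0.1], [0.95, 1.0], [0.5, 0.45]]
Y_test = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]

majority_missing = never_predicted_labels(
    Y_test, predict_all(binary_relevance(X_train, Y_train, majority_learner), X_test))
one_nn_missing = never_predicted_labels(
    Y_test, predict_all(binary_relevance(X_train, Y_train, one_nn_learner), X_test))

print("majority base learner, labels never predicted:", majority_missing)
print("1-NN base learner, labels never predicted:", one_nn_missing)
```

On this toy data the same transformation misses the two infrequent labels with the majority learner and none with the nearest-neighbour learner, which is the kind of base-algorithm effect the abstract describes. The label expansion and reduction operations themselves are not specified in this record and are therefore not sketched here.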
id |
USP_8bd43376e9c6386dccda8410520bf21f |
---|---|
oai_identifier_str |
oai:teses.usp.br:tde-18082020-161950 |
network_acronym_str |
USP |
network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
repository_id_str |
|
spelling |
info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/doctoralThesis; Label operation for multi-label learning; Operação de rótulo para o aprendizado multirrótulo; 2020-04-09; André Carlos Ponce de Leon Ferreira de Carvalho; Alexandre Plastino de Carvalho; Ricardo Bastos Cavalcante Prudêncio; Solange Oliveira Rezende; Adriano Rivolli da Silva; Universidade de São Paulo; Ciências da Computação e Matemática Computacional; USP; BR; keywords: label operation, meta-features, meta-learning, multi-label, problem transformation; https://doi.org/10.11606/T.55.2020.tde-18082020-161950; info:eu-repo/semantics/openAccess; eng; reponame:Biblioteca Digital de Teses e Dissertações da USP; instname:Universidade de São Paulo (USP); instacron:USP; 2023-12-21T19:19:08Z; oai:teses.usp.br:tde-18082020-161950; Biblioteca Digital de Teses e Dissertações; http://www.teses.usp.br/; PUB; http://www.teses.usp.br/cgi-bin/mtd2br.pl; virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br; opendoar:2721; 2020-08-18T22:28:02; Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP); false |
dc.title.en.fl_str_mv |
Label operation for multi-label learning |
dc.title.alternative.pt.fl_str_mv |
Operação de rótulo para o aprendizado multirrótulo |
title |
Label operation for multi-label learning |
author |
Adriano Rivolli da Silva |
author_facet |
Adriano Rivolli da Silva |
author_role |
author |
dc.contributor.advisor1.fl_str_mv |
André Carlos Ponce de Leon Ferreira de Carvalho |
dc.contributor.referee1.fl_str_mv |
Alexandre Plastino de Carvalho |
dc.contributor.referee2.fl_str_mv |
Ricardo Bastos Cavalcante Prudêncio |
dc.contributor.referee3.fl_str_mv |
Solange Oliveira Rezende |
dc.contributor.author.fl_str_mv |
Adriano Rivolli da Silva |
contributor_str_mv |
André Carlos Ponce de Leon Ferreira de Carvalho; Alexandre Plastino de Carvalho; Ricardo Bastos Cavalcante Prudêncio; Solange Oliveira Rezende |
publishDate |
2020 |
dc.date.issued.fl_str_mv |
2020-04-09 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
format |
doctoralThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://doi.org/10.11606/T.55.2020.tde-18082020-161950 |
url |
https://doi.org/10.11606/T.55.2020.tde-18082020-161950 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade de São Paulo |
dc.publisher.program.fl_str_mv |
Ciências da Computação e Matemática Computacional |
dc.publisher.initials.fl_str_mv |
USP |
dc.publisher.country.fl_str_mv |
BR |
publisher.none.fl_str_mv |
Universidade de São Paulo |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
instname_str |
Universidade de São Paulo (USP) |
instacron_str |
USP |
institution |
USP |
reponame_str |
Biblioteca Digital de Teses e Dissertações da USP |
collection |
Biblioteca Digital de Teses e Dissertações da USP |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
repository.mail.fl_str_mv |
virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br |
_version_ |
1786376928735789056 |