Music representation learning based on heterogeneous graph

Bibliographic details
Year of defense: 2024
Main author: Silva, Angelo Cesar Mendes da
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Biblioteca Digitais de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-24012025-150432/
Abstract: Music has been present at different historical moments since the formation of society, accompanying both survival and leisure tasks. Today, the emergence and popularization of media for storing and reproducing music have further increased the presence of music in people's daily lives. In addition to artistic content, songs have also come to generate volumes of data and new interested markets. Accordingly, several data mining methods have been proposed in recent decades to extract information that supports decision-making. Defining a representation for such data is essential for its use in data mining algorithms. Musical data is intrinsically multimodal and heterogeneous, so representing it requires a unified structure that supports features with different semantic compositions, organized in different spaces. Some approaches that explore variations of feature fusion have been proposed with this objective. Because of this multimodal and heterogeneous structure, fusion-based representations are limited in scenarios where features are absent and can be reduced to unimodal representations, which reduces the diversity of musical content in their formation. Musical representations are fed into algorithms for tasks modeled as machine learning tasks, which produce knowledge that supports decision-making. The challenges in defining musical representations are associated with the lack of information in the data, due to access restrictions or incomplete modalities, and with the construction of a method that aggregates heterogeneous information into a unified space. This thesis focuses on developing data structures that support the natural composition of musical data and representation learning methods capable of handling tasks related to music information retrieval.
In particular, the contributions of this thesis relate to the tasks of automatic annotation, music emotion recognition, artist similarity prediction, and a multitask application. We summarize our contributions as: (i) a methodology for modeling musical data in heterogeneous networks; (ii) an algorithm based on information propagation to deal with the challenge of missing features in data; (iii) methods based on graph neural networks to deal with music information retrieval tasks; (iv) analyses related to the complementarity of information between multiple musical features and also related tasks.
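The abstract mentions modeling musical data as a heterogeneous network and propagating information to cope with missing features. The following is only a minimal illustrative sketch of that general idea, not the thesis's algorithm: it assumes nodes are hypothetical `(type, name)` pairs (tracks, artists, tags), that some nodes carry observed feature vectors, and that missing vectors can be estimated by repeatedly averaging neighbor features while observed vectors stay fixed. The function name `propagate_features` and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def propagate_features(edges, known, dim, iterations=20):
    """Estimate missing node features by repeated neighbor averaging.

    edges: iterable of (node_a, node_b) undirected links; nodes may be of any type.
    known: dict mapping node -> list of floats (observed features, kept fixed).
    dim:   length of every feature vector.
    Returns a dict mapping every node to a feature vector.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = set(neighbors) | set(known)

    # Nodes without observed features start at zero.
    feats = {n: list(known.get(n, [0.0] * dim)) for n in nodes}

    for _ in range(iterations):
        updated = {}
        for n in nodes:
            nbrs = neighbors[n]
            if n in known or not nbrs:
                # Observed (or isolated) nodes are left unchanged.
                updated[n] = feats[n]
            else:
                # Missing features become the mean of the neighbors' current features.
                updated[n] = [
                    sum(feats[m][i] for m in nbrs) / len(nbrs) for i in range(dim)
                ]
        feats = updated
    return feats


# Toy heterogeneous network (hypothetical data): tracks linked to an artist and a tag.
edges = [
    (("track", "t1"), ("artist", "a1")),
    (("track", "t2"), ("artist", "a1")),
    (("track", "t2"), ("tag", "rock")),
    (("track", "t3"), ("tag", "rock")),
]
# Only t1 and t3 have observed features; t2, a1, and the tag are missing.
known = {("track", "t1"): [1.0, 0.0], ("track", "t3"): [0.0, 1.0]}

filled = propagate_features(edges, known, dim=2, iterations=100)
print(filled[("track", "t2")])
```

In this toy network, track `t2` sits midway between the two observed tracks, so its estimated vector settles between theirs. A graph neural network, as named in contribution (iii), would replace the plain average with learned, type-aware aggregation.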
id USP_cb54201213cc15356b340314437260f3
oai_identifier_str oai:teses.usp.br:tde-24012025-150432
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
dc.title.none.fl_str_mv Music representation learning based on heterogeneous graph
Aprendizado de representação musical baseado em grafos heterogêneos
author Silva, Angelo Cesar Mendes da
author_facet Silva, Angelo Cesar Mendes da
author_role author
dc.contributor.none.fl_str_mv Marcacini, Ricardo Marcondes
Silva, Diego Furtado
dc.contributor.author.fl_str_mv Silva, Angelo Cesar Mendes da
dc.subject.por.fl_str_mv Aprendizado de representação
Graph neural networks
Heterogeneous network
Music information retrieval
Recuperação de informação musical
Redes heterogêneas
Redes neurais para grafos
Representation learning
publishDate 2024
dc.date.none.fl_str_mv 2024-09-30
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/55/55134/tde-24012025-150432/
url https://www.teses.usp.br/teses/disponiveis/55/55134/tde-24012025-150432/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digitais de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digitais de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1839839156186906624