Music representation learning based on heterogeneous graph
| Year of defense: | 2024 |
|---|---|
| Main author: | Silva, Angelo Cesar Mendes da |
| Advisor: | |
| Examination committee: | |
| Document type: | Thesis |
| Access type: | Open access |
| Language: | eng |
| Defending institution: | Biblioteca Digital de Teses e Dissertações da USP |
| Graduate program: | Not provided by the institution |
| Department: | Not provided by the institution |
| Country: | Not provided by the institution |
| Keywords in Portuguese: | |
| Access link: | https://www.teses.usp.br/teses/disponiveis/55/55134/tde-24012025-150432/ |
| Abstract: | Music has been present throughout history since the formation of society, accompanying both survival and leisure activities. Today, the emergence and popularization of media for storing and playing back music have further increased the presence of music in people's daily lives. Beyond their artistic content, songs have also begun to generate large volumes of data and to attract new markets. Accordingly, several data mining methods have been proposed in recent decades to extract information that supports decision-making. Defining a representation for such data is essential for its use in data mining algorithms. Musical data is intrinsically multimodal and heterogeneous, so representing it requires a unified structure that supports features with different semantic compositions, organized in different spaces. Some approaches that explore variations of feature fusion have been proposed with this objective. However, because of this multimodal and heterogeneous structure, fusion-based representations are limited in scenarios where features are absent: they can collapse to unimodal representations, which reduces the diversity of musical content they capture. Musical representations are fed into algorithms for tasks modeled as machine learning problems, which produce knowledge that supports decision-making. The challenges in defining musical representations are therefore twofold: the lack of information in the data, due to access restrictions or incomplete modalities, and the construction of a method that aggregates heterogeneous information into a unified space. This thesis focuses on developing data structures that support the natural composition of musical data and representation learning methods capable of handling tasks related to music information retrieval. In particular, the contributions of this thesis relate to the tasks of automatic annotation, music emotion recognition, artist similarity prediction, and a multitask application. We summarize our contributions as: (i) a methodology for modeling musical data in heterogeneous networks; (ii) an algorithm based on information propagation to deal with the challenge of missing features in data; (iii) methods based on graph neural networks to deal with music information retrieval tasks; (iv) analyses of the complementarity of information among multiple musical features and among related tasks. |
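Contributions (i) and (ii) in the abstract can be made concrete with a small sketch. The code below is not the thesis's algorithm, which this record does not describe in detail; it is a generic neighbor-averaging propagation under illustrative assumptions. The node types (`track`, `artist`, `tag`), the function name `propagate_features`, and the toy feature values are all hypothetical. Observed features are held fixed, and nodes without features are filled in from their neighbors in the heterogeneous network.

```python
from collections import defaultdict


def propagate_features(edges, features, dim, iterations=20):
    """Fill missing node features by repeated neighbor averaging.

    edges: iterable of (node_a, node_b) pairs; a node is a (type, name) tuple.
    features: dict mapping some nodes to lists of floats of length `dim`.
    Nodes with known features stay fixed; all other nodes start at zero.
    This is an illustrative assumption, not the method from the thesis.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Initial state: observed vectors where available, zeros elsewhere.
    state = {node: list(features.get(node, [0.0] * dim)) for node in neighbors}

    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in features:
                # Clamp observed features so they are never overwritten.
                updated[node] = state[node]
            else:
                # Average the current vectors of all neighbors, per dimension.
                updated[node] = [
                    sum(state[n][i] for n in adjacent) / len(adjacent)
                    for i in range(dim)
                ]
        state = updated
    return state


# Toy heterogeneous network: tracks linked to an artist and a tag.
edges = [
    (("track", "t1"), ("artist", "a1")),
    (("track", "t2"), ("artist", "a1")),
    (("track", "t2"), ("tag", "rock")),
    (("track", "t3"), ("tag", "rock")),
]
# Track t2 has no features of its own (a missing modality).
features = {("track", "t1"): [1.0, 0.0], ("track", "t3"): [0.0, 1.0]}

state = propagate_features(edges, features, dim=2)
print(state[("track", "t2")])
```

In this toy network, track `t2` ends up close to the midpoint of `t1` and `t3`, because it reaches both through the shared artist and tag nodes. A graph neural network, as in contribution (iii), would replace the fixed averaging with learned, type-aware aggregation.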
| id | USP_cb54201213cc15356b340314437260f3 |
|---|---|
| oai_identifier_str | oai:teses.usp.br:tde-24012025-150432 |
| network_acronym_str | USP |
| network_name_str | Biblioteca Digital de Teses e Dissertações da USP |
| repository_id_str | |
| dc.title.none.fl_str_mv | Music representation learning based on heterogeneous graph; Aprendizado de representação musical baseado em grafos heterogêneos |
| title | Music representation learning based on heterogeneous graph |
| spellingShingle | Music representation learning based on heterogeneous graph; Silva, Angelo Cesar Mendes da; Aprendizado de representação; Graph neural networks; Heterogeneous network; Music information retrieval; Recuperação de informação musical; Redes heterogêneas; Redes neurais para grafos; Representation learning |
| title_short | Music representation learning based on heterogeneous graph |
| title_full | Music representation learning based on heterogeneous graph |
| title_fullStr | Music representation learning based on heterogeneous graph |
| title_full_unstemmed | Music representation learning based on heterogeneous graph |
| title_sort | Music representation learning based on heterogeneous graph |
| author | Silva, Angelo Cesar Mendes da |
| author_facet | Silva, Angelo Cesar Mendes da |
| author_role | author |
| dc.contributor.none.fl_str_mv | Marcacini, Ricardo Marcondes; Silva, Diego Furtado |
| dc.contributor.author.fl_str_mv | Silva, Angelo Cesar Mendes da |
| dc.subject.por.fl_str_mv | Aprendizado de representação; Graph neural networks; Heterogeneous network; Music information retrieval; Recuperação de informação musical; Redes heterogêneas; Redes neurais para grafos; Representation learning |
| topic | Aprendizado de representação; Graph neural networks; Heterogeneous network; Music information retrieval; Recuperação de informação musical; Redes heterogêneas; Redes neurais para grafos; Representation learning |
| publishDate | 2024 |
| dc.date.none.fl_str_mv | 2024-09-30 |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/doctoralThesis |
| format | doctoralThesis |
| status_str | publishedVersion |
| dc.identifier.uri.fl_str_mv | https://www.teses.usp.br/teses/disponiveis/55/55134/tde-24012025-150432/ |
| url | https://www.teses.usp.br/teses/disponiveis/55/55134/tde-24012025-150432/ |
| dc.language.iso.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | |
| dc.rights.driver.fl_str_mv | Release the content for public access.; info:eu-repo/semantics/openAccess |
| rights_invalid_str_mv | Release the content for public access. |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.coverage.none.fl_str_mv | |
| dc.publisher.none.fl_str_mv | Biblioteca Digital de Teses e Dissertações da USP |
| publisher.none.fl_str_mv | Biblioteca Digital de Teses e Dissertações da USP |
| dc.source.none.fl_str_mv | reponame:Biblioteca Digital de Teses e Dissertações da USP; instname:Universidade de São Paulo (USP); instacron:USP |
| instname_str | Universidade de São Paulo (USP) |
| instacron_str | USP |
| institution | USP |
| reponame_str | Biblioteca Digital de Teses e Dissertações da USP |
| collection | Biblioteca Digital de Teses e Dissertações da USP |
| repository.name.fl_str_mv | Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
| repository.mail.fl_str_mv | virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br |
| _version_ | 1839839156186906624 |