IVS: interpretative variable selection via perfect bipartite matching
| Year of defense: | 2023 |
|---|---|
| Main author: | |
| Advisor: | |
| Defense committee: | |
| Document type: | Thesis |
| Access type: | Open access |
| Language: | eng |
| Defending institution: | Not provided by the institution |
| Graduate program: | Not provided by the institution |
| Department: | Not provided by the institution |
| Country: | Not provided by the institution |
| CNPq knowledge area: | |
| Access link: | http://repositorio.ufc.br/handle/riufc/74683 |
| Abstract: | Feature selection is a fundamental process in machine learning to identify the most relevant subset of features for a given problem. Among the various feature selection approaches, filter methods stand out for their simplicity and efficiency. However, these methods lack interpretability regarding the relationships between the selected and unselected features. To address this challenge, we propose a novel pairwise feature selection method based on Perfect Bipartite Matching, which establishes optimized linear relationships between features, thus facilitating the interpretation of feature connections. We also demonstrate how to incorporate domain knowledge, allowing users to exclude/include desirable patterns (e.g., pre-select specific features). Empirical evaluations using 17 datasets demonstrate the effectiveness of our approach compared to baseline methods. Furthermore, we present a case study on Chagas disease, showcasing detailed interpretation results and the significance of selected features in sudden cardiac death prevention. |
| id | UFC-7_71fa6691d4fc29d9538a762841ca76fd |
|---|---|
| oai_identifier_str | oai:repositorio.ufc.br:riufc/74683 |
| network_acronym_str | UFC-7 |
| network_name_str | Repositório Institucional da Universidade Federal do Ceará (UFC) |
| repository_id_str | |
| spelling | http://lattes.cnpq.br/3450623955098872 http://lattes.cnpq.br/9553770402705512 http://lattes.cnpq.br/4328159466506074 2023_tese_wlcaldas.pdf application/pdf 856904 license.txt text/plain; charset=utf-8 1748 http://www.repositorio.ufc.br/ri-oai/request |
| dc.title.pt_BR.fl_str_mv | IVS: interpretative variable selection via perfect bipartite matching |
| title | IVS: interpretative variable selection via perfect bipartite matching |
| spellingShingle | IVS: interpretative variable selection via perfect bipartite matching; Caldas, Weslley Lioba; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO; Doença de Chagas; Interpretabilidade; Seleção de atributos; Aprendizagem de máquina; Chagas disease; Interpretability; Feature selection; Machine learning |
| title_short | IVS: interpretative variable selection via perfect bipartite matching |
| title_full | IVS: interpretative variable selection via perfect bipartite matching |
| title_fullStr | IVS: interpretative variable selection via perfect bipartite matching |
| title_full_unstemmed | IVS: interpretative variable selection via perfect bipartite matching |
| title_sort | IVS: interpretative variable selection via perfect bipartite matching |
| author | Caldas, Weslley Lioba |
| author_facet | Caldas, Weslley Lioba |
| author_role | author |
| dc.contributor.co-advisor.none.fl_str_mv | Madeiro, João Paulo do Vale |
| dc.contributor.author.fl_str_mv | Caldas, Weslley Lioba |
| dc.contributor.advisor1.fl_str_mv | Gomes, João Paulo Pordeus |
| contributor_str_mv | Gomes, João Paulo Pordeus |
| dc.subject.cnpq.fl_str_mv | CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO |
| topic | CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO; Doença de Chagas; Interpretabilidade; Seleção de atributos; Aprendizagem de máquina; Chagas disease; Interpretability; Feature selection; Machine learning |
| dc.subject.ptbr.pt_BR.fl_str_mv | Doença de Chagas; Interpretabilidade; Seleção de atributos; Aprendizagem de máquina |
| dc.subject.en.pt_BR.fl_str_mv | Chagas disease; Interpretability; Feature selection; Machine learning |
| description | Feature selection is a fundamental process in machine learning to identify the most relevant subset of features for a given problem. Among the various feature selection approaches, filter methods stand out for their simplicity and efficiency. However, these methods lack interpretability regarding the relationships between the selected and unselected features. To address this challenge, we propose a novel pairwise feature selection method based on Perfect Bipartite Matching, which establishes optimized linear relationships between features, thus facilitating the interpretation of feature connections. We also demonstrate how to incorporate domain knowledge, allowing users to exclude/include desirable patterns (e.g., pre-select specific features). Empirical evaluations using 17 datasets demonstrate the effectiveness of our approach compared to baseline methods. Furthermore, we present a case study on Chagas disease, showcasing detailed interpretation results and the significance of selected features in sudden cardiac death prevention. |
| publishDate | 2023 |
| dc.date.accessioned.fl_str_mv | 2023-10-18T16:24:22Z |
| dc.date.available.fl_str_mv | 2023-10-18T16:24:22Z |
| dc.date.issued.fl_str_mv | 2023 |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/doctoralThesis |
| format | doctoralThesis |
| status_str | publishedVersion |
| dc.identifier.citation.fl_str_mv | CALDAS, Weslley Lioba. IVS: interpretative variable selection via perfect bipartite matching. 2023. 65 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2023. |
| dc.identifier.uri.fl_str_mv | http://repositorio.ufc.br/handle/riufc/74683 |
| identifier_str_mv | CALDAS, Weslley Lioba. IVS: interpretative variable selection via perfect bipartite matching. 2023. 65 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2023. |
| url | http://repositorio.ufc.br/handle/riufc/74683 |
| dc.language.iso.fl_str_mv | eng |
| language | eng |
| dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.source.none.fl_str_mv | reponame:Repositório Institucional da Universidade Federal do Ceará (UFC) instname:Universidade Federal do Ceará (UFC) instacron:UFC |
| instname_str | Universidade Federal do Ceará (UFC) |
| instacron_str | UFC |
| institution | UFC |
| reponame_str | Repositório Institucional da Universidade Federal do Ceará (UFC) |
| collection | Repositório Institucional da Universidade Federal do Ceará (UFC) |
| bitstream.url.fl_str_mv | http://repositorio.ufc.br/bitstream/riufc/74683/3/2023_tese_wlcaldas.pdf http://repositorio.ufc.br/bitstream/riufc/74683/4/license.txt |
| bitstream.checksum.fl_str_mv | 28dc1d3447d8de74933270dbc2b00752 8a4605be74aa9ea9d79846c1fba20a33 |
| bitstream.checksumAlgorithm.fl_str_mv | MD5 MD5 |
| repository.name.fl_str_mv | Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC) |
| repository.mail.fl_str_mv | bu@ufc.br \|\| repositorio@ufc.br |
| _version_ | 1847793215512510464 |