Processos t-student em classificação (Student-t processes in classification)
| Year of defense: | 2021 |
|---|---|
| Main author: | Assunção, Alan da Silva |
| Advisor: | Andrade, José Aílton Alencar |
| Defense committee: | |
| Document type: | Master's thesis |
| Access type: | Open access |
| Language: | por |
| Defense institution: | Not provided by the institution |
| Graduate program: | Not provided by the institution |
| Department: | Not provided by the institution |
| Country: | Not provided by the institution |
| Keywords in Portuguese: | Classificador de processo gaussiano; Robustez; Classificador de processo t-student; Modelagem não-paramétrica |
| Access link: | http://www.repositorio.ufc.br/handle/riufc/60533 |
| Abstract: | Gaussian Process regression (GPR) models are excellent non-parametric alternatives for modeling complex problems. Their advantages include good predictive performance, non-parametric flexibility, interpretability, and easy computational implementation. GP classification models are therefore useful for a wide range of classification problems. However, Gaussian Process models are not robust to outliers, owing to the light-tailed nature of the Gaussian distribution. In this work, we propose a new t-Student Process classifier (TPC) as an alternative to Gaussian Processes. The TPC approach is intended to deal adequately with classification problems in which the input data x are contaminated by outliers. The proposed classifier was evaluated against the traditional Gaussian Process classifier (GPC) on real data sets from the biomedical area, in which the outliers were generated artificially. For binary classification, vertebral column diagnosis data and breast cancer diagnosis data were used. For the multiclass case, the multiclass version of the vertebral column data set was considered. Inference for the models covered in this work was carried out with the NUTS method, an MCMC technique that is a variant of Hamiltonian Monte Carlo. In these applications, the TPC classifier achieved very promising results, mainly in the multiclass classification task, in which the goal of robustness to data contaminated by outliers was well met. |
| id | UFC-7_602b21e93f42e2247f00c1f58eb27503 |
|---|---|
| oai_identifier_str | oai:repositorio.ufc.br:riufc/60533 |
| network_acronym_str | UFC-7 |
| network_name_str | Repositório Institucional da Universidade Federal do Ceará (UFC) |
| repository_id_str | |
| dc.title.pt_BR.fl_str_mv | Processos t-student em classificação |
| title | Processos t-student em classificação |
| author | Assunção, Alan da Silva |
| author_facet | Assunção, Alan da Silva |
| author_role | author |
| dc.contributor.author.fl_str_mv | Assunção, Alan da Silva |
| dc.contributor.advisor1.fl_str_mv | Andrade, José Aílton Alencar |
| contributor_str_mv | Andrade, José Aílton Alencar |
| dc.subject.por.fl_str_mv | Classificador de processo gaussiano; Robustez; Classificador de processo t-student; Modelagem não-paramétrica; Gaussian process classifier; Robustness; T-student process classifier; Non-parametric modeling |
| topic | Classificador de processo gaussiano; Robustez; Classificador de processo t-student; Modelagem não-paramétrica; Gaussian process classifier; Robustness; T-student process classifier; Non-parametric modeling |
| publishDate | 2021 |
| dc.date.accessioned.fl_str_mv | 2021-09-20T11:38:01Z |
| dc.date.available.fl_str_mv | 2021-09-20T11:38:01Z |
| dc.date.issued.fl_str_mv | 2021 |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
| format | masterThesis |
| status_str | publishedVersion |
| dc.identifier.citation.fl_str_mv | ASSUNÇÃO, Alan da Silva. Processos t-student em classificação. 2021. 117 f. Dissertação (Mestrado em Modelagem e Métodos Quantitativos) - Departamento de Estatística e Matemática Aplicada, Centro de Ciências, Universidade Federal do Ceará, Fortaleza, 2021. |
| dc.identifier.uri.fl_str_mv | http://www.repositorio.ufc.br/handle/riufc/60533 |
| identifier_str_mv | ASSUNÇÃO, Alan da Silva. Processos t-student em classificação. 2021. 117 f. Dissertação (Mestrado em Modelagem e Métodos Quantitativos) - Departamento de Estatística e Matemática Aplicada, Centro de Ciências, Universidade Federal do Ceará, Fortaleza, 2021. |
| url | http://www.repositorio.ufc.br/handle/riufc/60533 |
| dc.language.iso.fl_str_mv | por |
| language | por |
| dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.source.none.fl_str_mv | reponame:Repositório Institucional da Universidade Federal do Ceará (UFC) instname:Universidade Federal do Ceará (UFC) instacron:UFC |
| instname_str | Universidade Federal do Ceará (UFC) |
| instacron_str | UFC |
| institution | UFC |
| reponame_str | Repositório Institucional da Universidade Federal do Ceará (UFC) |
| collection | Repositório Institucional da Universidade Federal do Ceará (UFC) |
| bitstream.url.fl_str_mv | http://repositorio.ufc.br/bitstream/riufc/60533/8/license.txt http://repositorio.ufc.br/bitstream/riufc/60533/9/2021_dis_asassun%c3%a7%c3%a3o.pdf |
| bitstream.checksum.fl_str_mv | 8a4605be74aa9ea9d79846c1fba20a33 d2e77ef0d3a51f22daaef8233b35b030 |
| bitstream.checksumAlgorithm.fl_str_mv | MD5 MD5 |
| repository.name.fl_str_mv | Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC) |
| repository.mail.fl_str_mv | bu@ufc.br \|\| repositorio@ufc.br |
| _version_ | 1847793305419513856 |