Processos t-student em classificação

Bibliographic details
Year of defense: 2021
Main author: Assunção, Alan da Silva
Advisor: Andrade, José Aílton Alencar
Examination committee: Not provided by the institution
Document type: Master's thesis (Dissertação)
Access type: Open access
Language: por
Defending institution: Not provided by the institution
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://www.repositorio.ufc.br/handle/riufc/60533
Abstract: Gaussian Process regression (GPR) models are excellent non-parametric alternatives for modeling complex problems. Their advantages include good predictive performance, non-parametric flexibility, interpretability, and easy computational implementation. Gaussian Process (GP) classification models are therefore useful for a wide range of classification problems. However, Gaussian Process models are not robust to outliers because of the light-tailed nature of the Gaussian distribution. In this work, we propose a new t-Student Process classifier (TPC), which uses a t-Student Process as the prior distribution, as an alternative to Gaussian Processes. The TPC approach is designed to deal more adequately with classification problems in which the input data x are contaminated by outliers. The proposed classifier was evaluated against the traditional Gaussian Process classifier (GPC) on real data sets from the biomedical area, in which the outliers were generated artificially. For binary classification, spinal diagnostic data and breast cancer diagnosis data were used. For the multiclass case, the vertebral column data set in its multiclass version was considered. Inference for the models covered in this work was carried out with the NUTS method, an MCMC technique that is a variant of Hamiltonian Monte Carlo. In these applications, the TPC classifier achieved very promising results, mainly in multiclass classification, where the goal of robustness to data contaminated by outliers was well met.
id UFC-7_602b21e93f42e2247f00c1f58eb27503
oai_identifier_str oai:repositorio.ufc.br:riufc/60533
network_acronym_str UFC-7
network_name_str Repositório Institucional da Universidade Federal do Ceará (UFC)
repository_id_str
spelling Assunção, Alan da Silva; Andrade, José Aílton Alencar
2021-09-20T11:38:01Z
LICENSE license.txt text/plain; charset=utf-8 1748
ORIGINAL 2021_dis_asassunção.pdf application/pdf 2148307
riufc/60533 2021-12-21 16:19:05.583
Repositório Institucional PUB http://www.repositorio.ufc.br/ri-oai/request opendoar:2021-12-21T19:19:05 false
dc.title.pt_BR.fl_str_mv Processos t-student em classificação
title Processos t-student em classificação
title_short Processos t-student em classificação
title_full Processos t-student em classificação
title_fullStr Processos t-student em classificação
title_full_unstemmed Processos t-student em classificação
title_sort Processos t-student em classificação
author Assunção, Alan da Silva
author_facet Assunção, Alan da Silva
author_role author
dc.contributor.author.fl_str_mv Assunção, Alan da Silva
dc.contributor.advisor1.fl_str_mv Andrade, José Aílton Alencar
contributor_str_mv Andrade, José Aílton Alencar
dc.subject.por.fl_str_mv Classificador de processo gaussiano
Robustez
Classificador de processo t-student
Modelagem não-paramétrica
Gaussian process classifier
Robustness
T-student process classifier
Non-parametric modeling
topic Classificador de processo gaussiano
Robustez
Classificador de processo t-student
Modelagem não-paramétrica
Gaussian process classifier
Robustness
T-student process classifier
Non-parametric modeling
publishDate 2021
dc.date.accessioned.fl_str_mv 2021-09-20T11:38:01Z
dc.date.available.fl_str_mv 2021-09-20T11:38:01Z
dc.date.issued.fl_str_mv 2021
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv ASSUNÇÃO, Alan da Silva. Processos t-student em classificação. 2021. 117 f. Dissertação (Mestrado em Modelagem e Métodos Quantitativos) - Departamento de Estatística e Matemática Aplicada, Centro de Ciências, Universidade Federal do Ceará, Fortaleza, 2021.
dc.identifier.uri.fl_str_mv http://www.repositorio.ufc.br/handle/riufc/60533
identifier_str_mv ASSUNÇÃO, Alan da Silva. Processos t-student em classificação. 2021. 117 f. Dissertação (Mestrado em Modelagem e Métodos Quantitativos) - Departamento de Estatística e Matemática Aplicada, Centro de Ciências, Universidade Federal do Ceará, Fortaleza, 2021.
url http://www.repositorio.ufc.br/handle/riufc/60533
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional da Universidade Federal do Ceará (UFC)
instname:Universidade Federal do Ceará (UFC)
instacron:UFC
instname_str Universidade Federal do Ceará (UFC)
instacron_str UFC
institution UFC
reponame_str Repositório Institucional da Universidade Federal do Ceará (UFC)
collection Repositório Institucional da Universidade Federal do Ceará (UFC)
bitstream.url.fl_str_mv http://repositorio.ufc.br/bitstream/riufc/60533/8/license.txt
http://repositorio.ufc.br/bitstream/riufc/60533/9/2021_dis_asassun%c3%a7%c3%a3o.pdf
bitstream.checksum.fl_str_mv 8a4605be74aa9ea9d79846c1fba20a33
d2e77ef0d3a51f22daaef8233b35b030
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC)
repository.mail.fl_str_mv bu@ufc.br || repositorio@ufc.br
_version_ 1847793305419513856