Robust algorithms for linear regression and locally linear embedding
| Ano de defesa: | 2017 |
|---|---|
| Autor(a) principal: | |
| Orientador(a): | |
| Banca de defesa: | |
| Tipo de documento: | Dissertação |
| Tipo de acesso: | Acesso aberto |
| Idioma: | por |
| Instituição de defesa: |
Não Informado pela instituição
|
| Programa de Pós-Graduação: |
Não Informado pela instituição
|
| Departamento: |
Não Informado pela instituição
|
| País: |
Não Informado pela instituição
|
| Palavras-chave em Português: | |
| Link de acesso: | http://www.repositorio.ufc.br/handle/riufc/22445 |
Resumo: | Nowadays a very large quantity of data is flowing around our digital society. There is a growing interest in converting this large amount of data into valuable and useful information. Machine learning plays an essential role in the transformation of data into knowledge. However, the probability of outliers inside the data is too high to marginalize the importance of robust algorithms. To understand that, various models of outliers are studied. In this work, several robust estimators within the generalized linear model for regression framework are discussed and analyzed: namely, the M-Estimator, the S-Estimator, the MM-Estimator, the RANSAC and the Theil-Sen estimator. This choice is motivated by the necessity of examining algorithms with different working principles. In particular, the M-, S-, MM-Estimator are based on a modification of the least squares criterion, whereas the RANSAC is based on finding the smallest subset of points that guarantees a predefined model accuracy. The Theil Sen, on the other hand, uses the median of least square models to estimate. The performance of the estimators under a wide range of experimental conditions is compared and analyzed. In addition to the linear regression problem, the dimensionality reduction problem is considered. More specifically, the locally linear embedding, the principal component analysis and some robust approaches of them are treated. Motivated by giving some robustness to the LLE algorithm, the RALLE algorithm is proposed. Its main idea is to use different sizes of neighborhoods to construct the weights of the points; to achieve this, the RAPCA is executed in each set of neighbors and the risky points are discarded from the corresponding neighborhood. The performance of the LLE, the RLLE and the RALLE over some datasets is evaluated. |
| id |
UFC-7_85f216200dc153079a99df7d70efdd6e |
|---|---|
| oai_identifier_str |
oai:repositorio.ufc.br:riufc/22445 |
| network_acronym_str |
UFC-7 |
| network_name_str |
Repositório Institucional da Universidade Federal do Ceará (UFC) |
| repository_id_str |
|
| spelling |
Rettes, Julio Alberto SibajaCorona, FrancescoAlcântara, João Fernando Lima2017-04-04T11:10:44Z2017-04-04T11:10:44Z2017RETTES, Julio Alberto Sibaja. Robust algorithms for linear regression and locally linear embedding. 2017. 105 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2017.http://www.repositorio.ufc.br/handle/riufc/22445Nowadays a very large quantity of data is flowing around our digital society. There is a growing interest in converting this large amount of data into valuable and useful information. Machine learning plays an essential role in the transformation of data into knowledge. However, the probability of outliers inside the data is too high to marginalize the importance of robust algorithms. To understand that, various models of outliers are studied. In this work, several robust estimators within the generalized linear model for regression framework are discussed and analyzed: namely, the M-Estimator, the S-Estimator, the MM-Estimator, the RANSAC and the Theil-Sen estimator. This choice is motivated by the necessity of examining algorithms with different working principles. In particular, the M-, S-, MM-Estimator are based on a modification of the least squares criterion, whereas the RANSAC is based on finding the smallest subset of points that guarantees a predefined model accuracy. The Theil Sen, on the other hand, uses the median of least square models to estimate. The performance of the estimators under a wide range of experimental conditions is compared and analyzed. In addition to the linear regression problem, the dimensionality reduction problem is considered. More specifically, the locally linear embedding, the principal component analysis and some robust approaches of them are treated. Motivated by giving some robustness to the LLE algorithm, the RALLE algorithm is proposed. Its main idea is to use different sizes of neighborhoods to construct the weights of the points; to achieve this, the RAPCA is executed in each set of neighbors and the risky points are discarded from the corresponding neighborhood. The performance of the LLE, the RLLE and the RALLE over some datasets is evaluated.Na atualidade um grande volume de dados é produzido na nossa sociedade digital. Existe um crescente interesse em converter esses dados em informação útil e o aprendizado de máquinas tem um papel central nessa transformação de dados em conhecimento. Por outro lado, a probabilidade dos dados conterem outliers é muito alta para ignorar a importância dos algoritmos robustos. Para se familiarizar com isso, são estudados vários modelos de outliers. Neste trabalho, discutimos e analisamos vários estimadores robustos dentro do contexto dos modelos de regressão linear generalizados: são eles o M-Estimator, o S-Estimator, o MM-Estimator, o RANSAC e o Theil-Senestimator. A escolha dos estimadores é motivada pelo principio de explorar algoritmos com distintos conceitos de funcionamento. Em particular os estimadores M, S e MM são baseados na modificação do critério de minimização dos mínimos quadrados, enquanto que o RANSAC se fundamenta em achar o menor subconjunto que permita garantir uma acurácia predefinida ao modelo. Por outro lado o Theil-Sen usa a mediana de modelos obtidos usando mínimos quadradosno processo de estimação. O desempenho dos estimadores em uma ampla gama de condições experimentais é comparado e analisado. Além do problema de regressão linear, considera-se o problema de redução da dimensionalidade. Especificamente, são tratados o Locally Linear Embedding, o Principal ComponentAnalysis e outras abordagens robustas destes. É proposto um método denominado RALLE com a motivação de prover de robustez ao algoritmo de LLE. A ideia principal é usar vizinhanças de tamanhos variáveis para construir os pesos dos pontos; para fazer isto possível, o RAPCA é executado em cada grupo de vizinhos e os pontos sob risco são descartados da vizinhança correspondente. É feita uma avaliação do desempenho do LLE, do RLLE e do RALLE sobre algumas bases de dados.OutliersEstatística robustaRegressão linearRedução da dimensionalidadeLocally linearEmbeddingRobust algorithms for linear regression and locally linear embeddingAlgoritmos robustos para regressão linear e locally linear embeddinginfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisporreponame:Repositório Institucional da Universidade Federal do Ceará (UFC)instname:Universidade Federal do Ceará (UFC)instacron:UFCinfo:eu-repo/semantics/openAccessLICENSElicense.txtlicense.txttext/plain; charset=utf-81748http://repositorio.ufc.br/bitstream/riufc/22445/2/license.txt8a4605be74aa9ea9d79846c1fba20a33MD52ORIGINAL2017_dis_rettesjas.pdf2017_dis_rettesjas.pdfapplication/pdf3569500http://repositorio.ufc.br/bitstream/riufc/22445/1/2017_dis_rettesjas.pdf46cedc2d9f96d0f58bcdfe3e0d975d78MD51riufc/224452020-07-09 11:34:18.892oai:repositorio.ufc.br:riufc/22445Tk9URTogUExBQ0UgWU9VUiBPV04gTElDRU5TRSBIRVJFClRoaXMgc2FtcGxlIGxpY2Vuc2UgaXMgcHJvdmlkZWQgZm9yIGluZm9ybWF0aW9uYWwgcHVycG9zZXMgb25seS4KCk5PTi1FWENMVVNJVkUgRElTVFJJQlVUSU9OIExJQ0VOU0UKCkJ5IHNpZ25pbmcgYW5kIHN1Ym1pdHRpbmcgdGhpcyBsaWNlbnNlLCB5b3UgKHRoZSBhdXRob3Iocykgb3IgY29weXJpZ2h0Cm93bmVyKSBncmFudHMgdG8gRFNwYWNlIFVuaXZlcnNpdHkgKERTVSkgdGhlIG5vbi1leGNsdXNpdmUgcmlnaHQgdG8gcmVwcm9kdWNlLAp0cmFuc2xhdGUgKGFzIGRlZmluZWQgYmVsb3cpLCBhbmQvb3IgZGlzdHJpYnV0ZSB5b3VyIHN1Ym1pc3Npb24gKGluY2x1ZGluZwp0aGUgYWJzdHJhY3QpIHdvcmxkd2lkZSBpbiBwcmludCBhbmQgZWxlY3Ryb25pYyBmb3JtYXQgYW5kIGluIGFueSBtZWRpdW0sCmluY2x1ZGluZyBidXQgbm90IGxpbWl0ZWQgdG8gYXVkaW8gb3IgdmlkZW8uCgpZb3UgYWdyZWUgdGhhdCBEU1UgbWF5LCB3aXRob3V0IGNoYW5naW5nIHRoZSBjb250ZW50LCB0cmFuc2xhdGUgdGhlCnN1Ym1pc3Npb24gdG8gYW55IG1lZGl1bSBvciBmb3JtYXQgZm9yIHRoZSBwdXJwb3NlIG9mIHByZXNlcnZhdGlvbi4KCllvdSBhbHNvIGFncmVlIHRoYXQgRFNVIG1heSBrZWVwIG1vcmUgdGhhbiBvbmUgY29weSBvZiB0aGlzIHN1Ym1pc3Npb24gZm9yCnB1cnBvc2VzIG9mIHNlY3VyaXR5LCBiYWNrLXVwIGFuZCBwcmVzZXJ2YXRpb24uCgpZb3UgcmVwcmVzZW50IHRoYXQgdGhlIHN1Ym1pc3Npb24gaXMgeW91ciBvcmlnaW5hbCB3b3JrLCBhbmQgdGhhdCB5b3UgaGF2ZQp0aGUgcmlnaHQgdG8gZ3JhbnQgdGhlIHJpZ2h0cyBjb250YWluZWQgaW4gdGhpcyBsaWNlbnNlLiBZb3UgYWxzbyByZXByZXNlbnQKdGhhdCB5b3VyIHN1Ym1pc3Npb24gZG9lcyBub3QsIHRvIHRoZSBiZXN0IG9mIHlvdXIga25vd2xlZGdlLCBpbmZyaW5nZSB1cG9uCmFueW9uZSdzIGNvcHlyaWdodC4KCklmIHRoZSBzdWJtaXNzaW9uIGNvbnRhaW5zIG1hdGVyaWFsIGZvciB3aGljaCB5b3UgZG8gbm90IGhvbGQgY29weXJpZ2h0LAp5b3UgcmVwcmVzZW50IHRoYXQgeW91IGhhdmUgb2J0YWluZWQgdGhlIHVucmVzdHJpY3RlZCBwZXJtaXNzaW9uIG9mIHRoZQpjb3B5cmlnaHQgb3duZXIgdG8gZ3JhbnQgRFNVIHRoZSByaWdodHMgcmVxdWlyZWQgYnkgdGhpcyBsaWNlbnNlLCBhbmQgdGhhdApzdWNoIHRoaXJkLXBhcnR5IG93bmVkIG1hdGVyaWFsIGlzIGNsZWFybHkgaWRlbnRpZmllZCBhbmQgYWNrbm93bGVkZ2VkCndpdGhpbiB0aGUgdGV4dCBvciBjb250ZW50IG9mIHRoZSBzdWJtaXNzaW9uLgoKSUYgVEhFIFNVQk1JU1NJT04gSVMgQkFTRUQgVVBPTiBXT1JLIFRIQVQgSEFTIEJFRU4gU1BPTlNPUkVEIE9SIFNVUFBPUlRFRApCWSBBTiBBR0VOQ1kgT1IgT1JHQU5JWkFUSU9OIE9USEVSIFRIQU4gRFNVLCBZT1UgUkVQUkVTRU5UIFRIQVQgWU9VIEhBVkUKRlVMRklMTEVEIEFOWSBSSUdIVCBPRiBSRVZJRVcgT1IgT1RIRVIgT0JMSUdBVElPTlMgUkVRVUlSRUQgQlkgU1VDSApDT05UUkFDVCBPUiBBR1JFRU1FTlQuCgpEU1Ugd2lsbCBjbGVhcmx5IGlkZW50aWZ5IHlvdXIgbmFtZShzKSBhcyB0aGUgYXV0aG9yKHMpIG9yIG93bmVyKHMpIG9mIHRoZQpzdWJtaXNzaW9uLCBhbmQgd2lsbCBub3QgbWFrZSBhbnkgYWx0ZXJhdGlvbiwgb3RoZXIgdGhhbiBhcyBhbGxvd2VkIGJ5IHRoaXMKbGljZW5zZSwgdG8geW91ciBzdWJtaXNzaW9uLgo=Repositório InstitucionalPUBhttp://www.repositorio.ufc.br/ri-oai/requestbu@ufc.br || repositorio@ufc.bropendoar:2020-07-09T14:34:18Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC)false |
| dc.title.pt_BR.fl_str_mv |
Robust algorithms for linear regression and locally linear embedding |
| dc.title.alternative.pt_BR.fl_str_mv |
Algoritmos robustos para regressão linear e locally linear embedding |
| title |
Robust algorithms for linear regression and locally linear embedding |
| spellingShingle |
Robust algorithms for linear regression and locally linear embedding Rettes, Julio Alberto Sibaja Outliers Estatística robusta Regressão linear Redução da dimensionalidade Locally linear Embedding |
| title_short |
Robust algorithms for linear regression and locally linear embedding |
| title_full |
Robust algorithms for linear regression and locally linear embedding |
| title_fullStr |
Robust algorithms for linear regression and locally linear embedding |
| title_full_unstemmed |
Robust algorithms for linear regression and locally linear embedding |
| title_sort |
Robust algorithms for linear regression and locally linear embedding |
| author |
Rettes, Julio Alberto Sibaja |
| author_facet |
Rettes, Julio Alberto Sibaja |
| author_role |
author |
| dc.contributor.co-advisor.none.fl_str_mv |
Corona, Francesco |
| dc.contributor.author.fl_str_mv |
Rettes, Julio Alberto Sibaja |
| dc.contributor.advisor1.fl_str_mv |
Alcântara, João Fernando Lima |
| contributor_str_mv |
Alcântara, João Fernando Lima |
| dc.subject.por.fl_str_mv |
Outliers Estatística robusta Regressão linear Redução da dimensionalidade Locally linear Embedding |
| topic |
Outliers Estatística robusta Regressão linear Redução da dimensionalidade Locally linear Embedding |
| description |
Nowadays a very large quantity of data is flowing around our digital society. There is a growing interest in converting this large amount of data into valuable and useful information. Machine learning plays an essential role in the transformation of data into knowledge. However, the probability of outliers inside the data is too high to marginalize the importance of robust algorithms. To understand that, various models of outliers are studied. In this work, several robust estimators within the generalized linear model for regression framework are discussed and analyzed: namely, the M-Estimator, the S-Estimator, the MM-Estimator, the RANSAC and the Theil-Sen estimator. This choice is motivated by the necessity of examining algorithms with different working principles. In particular, the M-, S-, MM-Estimator are based on a modification of the least squares criterion, whereas the RANSAC is based on finding the smallest subset of points that guarantees a predefined model accuracy. The Theil Sen, on the other hand, uses the median of least square models to estimate. The performance of the estimators under a wide range of experimental conditions is compared and analyzed. In addition to the linear regression problem, the dimensionality reduction problem is considered. More specifically, the locally linear embedding, the principal component analysis and some robust approaches of them are treated. Motivated by giving some robustness to the LLE algorithm, the RALLE algorithm is proposed. Its main idea is to use different sizes of neighborhoods to construct the weights of the points; to achieve this, the RAPCA is executed in each set of neighbors and the risky points are discarded from the corresponding neighborhood. The performance of the LLE, the RLLE and the RALLE over some datasets is evaluated. |
| publishDate |
2017 |
| dc.date.accessioned.fl_str_mv |
2017-04-04T11:10:44Z |
| dc.date.available.fl_str_mv |
2017-04-04T11:10:44Z |
| dc.date.issued.fl_str_mv |
2017 |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
| format |
masterThesis |
| status_str |
publishedVersion |
| dc.identifier.citation.fl_str_mv |
RETTES, Julio Alberto Sibaja. Robust algorithms for linear regression and locally linear embedding. 2017. 105 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2017. |
| dc.identifier.uri.fl_str_mv |
http://www.repositorio.ufc.br/handle/riufc/22445 |
| identifier_str_mv |
RETTES, Julio Alberto Sibaja. Robust algorithms for linear regression and locally linear embedding. 2017. 105 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2017. |
| url |
http://www.repositorio.ufc.br/handle/riufc/22445 |
| dc.language.iso.fl_str_mv |
por |
| language |
por |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.source.none.fl_str_mv |
reponame:Repositório Institucional da Universidade Federal do Ceará (UFC) instname:Universidade Federal do Ceará (UFC) instacron:UFC |
| instname_str |
Universidade Federal do Ceará (UFC) |
| instacron_str |
UFC |
| institution |
UFC |
| reponame_str |
Repositório Institucional da Universidade Federal do Ceará (UFC) |
| collection |
Repositório Institucional da Universidade Federal do Ceará (UFC) |
| bitstream.url.fl_str_mv |
http://repositorio.ufc.br/bitstream/riufc/22445/2/license.txt http://repositorio.ufc.br/bitstream/riufc/22445/1/2017_dis_rettesjas.pdf |
| bitstream.checksum.fl_str_mv |
8a4605be74aa9ea9d79846c1fba20a33 46cedc2d9f96d0f58bcdfe3e0d975d78 |
| bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
| repository.name.fl_str_mv |
Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC) |
| repository.mail.fl_str_mv |
bu@ufc.br || repositorio@ufc.br |
| _version_ |
1847793283335454720 |