Metodologia de aprendizado AutoML baseado em informações de complexidade de instâncias (AutoML learning methodology based on instance complexity information)
| Defense year: | 2020 |
|---|---|
| Main author: | Habib Asseiss Neto |
| Advisor: | |
| Examining committee: | |
| Document type: | Doctoral thesis |
| Access type: | Open access |
| Language: | Portuguese (por) |
| Defense institution: | Universidade Federal de Minas Gerais |
| Graduate program: | Not reported by the institution |
| Department: | Not reported by the institution |
| Country: | Not reported by the institution |
| Keywords in Portuguese: | |
| Access link: | https://hdl.handle.net/1843/35575 |

Abstract:

Analysis of biological data is an important task, since it yields useful information and expands knowledge about biological domains. Biological materials can be analyzed by several techniques; a widely used one is infrared spectroscopy, which extracts information by shining infrared light onto the samples. Spectroscopy techniques produce large volumes of data, which makes manual analysis by experts difficult. Computer Science, through Machine Learning, can help with this task by offering ways to understand spectral samples and extract useful knowledge from them. Convolutional Neural Networks, which are generally applied successfully to image recognition, are particularly suitable for spectral data, since the spectral structure of a sample can be seen as an image. This work conducts experiments to detect adulteration in bovine milk through infrared spectroscopy analyses, using a proposed convolutional neural network architecture and decision tree ensemble models. In the experiments, spectral data from thousands of pure and adulterated bovine milk samples were submitted to the proposed convolutional neural network, to the ensemble models, and to other methods commonly used for this purpose, allowing different approaches to be compared. The proposed approach detected adulterants with an accuracy of up to 98.76% for the convolutional neural network and tree embedding, while the commonly used baseline methods achieved an average accuracy of 65.88%. Despite the good performance of the convolutional neural network on the milk problem, designing neural network architectures that perform well on generic problems is challenging. The search for an adequate architecture is generally specific to the problem addressed: it is conducted by specialized scientists through extensive manual testing and requires prior knowledge of similar problems.
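
The idea that a spectrum can be treated like an image is easiest to see in one dimension: a convolutional filter slides along the absorbance values and responds wherever a local pattern, such as a peak, matches its weights. The following is a minimal pure-Python sketch of one such filter followed by ReLU and max pooling. The spectrum values, the `peak_kernel` weights, and all function names are hypothetical illustrations, not the architecture proposed in the thesis, where filter weights are learned during training.

```python
from typing import List


def conv1d_valid(signal: List[float], kernel: List[float], bias: float = 0.0) -> List[float]:
    """Slide `kernel` over `signal` without padding (the cross-correlation used by CNN layers)."""
    k = len(kernel)
    if k == 0 or k > len(signal):
        raise ValueError("kernel must be non-empty and no longer than the signal")
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values: List[float]) -> List[float]:
    """Zero out negative filter responses."""
    return [max(0.0, v) for v in values]


def max_pool1d(values: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hypothetical absorbance values at consecutive wavenumbers (toy data, not from the thesis).
spectrum = [0.10, 0.12, 0.30, 0.80, 0.35, 0.15, 0.11, 0.50, 0.95, 0.40]
# Hypothetical hand-set "peak detector" kernel; a trained CNN would learn these weights.
peak_kernel = [-0.5, 1.0, -0.5]

feature_map = relu(conv1d_valid(spectrum, peak_kernel))
pooled = max_pool1d(feature_map, size=2)
print(feature_map)
print(pooled)
```

With these toy values the filter responds only at feature-map positions 2 and 7, whose windows are centered on the two local absorbance peaks (spectrum indices 3 and 8); pooling then halves the resolution while keeping those responses.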
Automated Machine Learning (AutoML) can assist in this process, since its objective is to search automatically, without human interaction, for the architectures best suited to the input problem. However, Machine Learning methods, and even AutoML methods, generally do not consider the individual characteristics of the instances in the analyzed datasets. Information that reflects the characteristics of each sample makes it possible to evaluate the abilities of the methods. Item Response Theory (IRT) is a psychometric approach that can be adapted to Machine Learning to describe complexity at the instance level and to characterize the inherent abilities of Machine Learning models. In this work, we propose a novel methodology based on AutoML and IRT that provides a high degree of explainability for Convolutional Neural Network models. The method selects the most adequate instances to be submitted to different models, based on the instances' difficulty and discrimination and on the models' abilities. Several experiments were conducted on real datasets to evaluate the viability of the methodology, comparing its results with other benchmark methods. The results show that the proposed method outperforms the other methods in most cases. For a specific dataset, the proposed methodology achieved an average accuracy of 96.96%, while a manually designed Convolutional Neural Network achieved 78.43%, a voting approach with hundreds of models achieved 69.06%, and an existing AutoML system achieved 91.81%.
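
In the IRT adaptation to Machine Learning described above, each instance plays the role of a test item and each model plays the role of a respondent, with a "correct response" meaning a correct classification. A minimal sketch, assuming the common two-parameter logistic (2PL) model (the thesis may use a different IRT variant), gives the probability that a model with ability θ correctly classifies an instance with difficulty b and discrimination a as P = 1 / (1 + exp(-a(θ - b))). The `route_by_cost` rule, the model names, and all parameter values are hypothetical: they show how difficulty, discrimination, and ability could drive instance selection, not the selection procedure actually proposed in the thesis.

```python
import math
from typing import Dict, List, Tuple


def p_correct_2pl(theta: float, b: float, a: float) -> float:
    """2PL item characteristic curve: P(correct | theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def route_by_cost(
    items: Dict[str, Tuple[float, float]],
    models: List[Tuple[str, float]],
    threshold: float = 0.8,
) -> Dict[str, str]:
    """Hypothetical selection rule, not the thesis's algorithm.

    `items` maps instance id -> (difficulty b, discrimination a).
    `models` is ordered from cheapest to most expensive as (name, ability theta).
    Each instance goes to the first model whose predicted P(correct) reaches
    `threshold`; if none does, it goes to the model with the highest P(correct).
    """
    routing: Dict[str, str] = {}
    for inst_id, (b, a) in items.items():
        probs = [(name, p_correct_2pl(theta, b, a)) for name, theta in models]
        chosen = next((name for name, p in probs if p >= threshold), None)
        if chosen is None:
            chosen = max(probs, key=lambda pair: pair[1])[0]
        routing[inst_id] = chosen
    return routing


# Toy IRT parameters (hypothetical, not estimated from the thesis data).
items = {
    "easy": (-1.0, 1.5),  # low difficulty b, moderate discrimination a
    "hard": (1.5, 2.0),   # high difficulty, highly discriminative
}
models = [("small_cnn", 0.0), ("large_cnn", 2.0)]  # (name, ability theta), cheapest first

print(route_by_cost(items, models))
```

Here the easy instance is already handled by the cheaper model with a predicted probability of about 0.82, while no model reaches the 0.8 threshold on the hard instance (the stronger model reaches about 0.73), so it falls back to the model with the highest predicted probability. Ordering models by cost is itself an assumption; a real system would also need a cost measure and IRT parameters estimated from the models' responses.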

| id | UFMG_ad5c2eb5bb8131da7e5a6e0e51eb2b7c |
|---|---|
| oai_identifier_str | oai:repositorio.ufmg.br:1843/35575 |
| network_acronym_str | UFMG |
| network_name_str | Repositório Institucional da UFMG |
| repository_id_str | |
| sponsor | CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Coordination for the Improvement of Higher Education Personnel) |
| OAI endpoint | https://repositorio.ufmg.br/oai |
| dc.title.none.fl_str_mv | Metodologia de aprendizado AutoML baseado em informações de complexidade de instâncias |
| author | Habib Asseiss Neto |
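| dc.subject.por.fl_str_mv | Computação – Teses (Computing – Theses); Aprendizado do computador – Teses (Machine learning – Theses); Redes neurais convolucionais – Teses (Convolutional neural networks – Theses); Teoria de Resposta ao Item – Teses (Item Response Theory – Theses); Leite - Análise - Teses (Milk - Analysis - Theses); Aprendizado de Máquina (Machine Learning); AutoML; Teoria de Resposta ao Item (Item Response Theory); Redes Neurais Convolucionais (Convolutional Neural Networks) |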
| publishDate | 2020 |
| dc.date.none.fl_str_mv | 2020-12-07; 2021-04-07T20:45:11Z; 2021-04-07T20:45:11Z; 2025-09-09T00:17:13Z |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/doctoralThesis |
| dc.identifier.uri.fl_str_mv | https://hdl.handle.net/1843/35575 |
| dc.language.iso.fl_str_mv | por (Portuguese) |
| dc.rights.driver.fl_str_mv | http://creativecommons.org/licenses/by/3.0/pt/; info:eu-repo/semantics/openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.publisher.none.fl_str_mv | Universidade Federal de Minas Gerais |
| dc.source.none.fl_str_mv | reponame:Repositório Institucional da UFMG; instname:Universidade Federal de Minas Gerais (UFMG); instacron:UFMG |
| repository.name.fl_str_mv | Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG) |
| repository.mail.fl_str_mv | repositorio@ufmg.br |