Maize common rust resistance classification with machine learning analyzes

Bibliographic details
Year of defense: 2023
Main author: Fonseca, Pollyanna Capobiango da
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Universidade Federal de Viçosa
Genética e Melhoramento
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://locus.ufv.br//handle/123456789/31122
https://doi.org/10.47328/ufvbbt.2023.202
Abstract: Maize (Zea mays ssp. mays) is a widely cultivated crop with one of the highest productivities among cereals, and it is of great importance for human consumption, both in natura and processed. It is also used in industry as a source of energy, through corn ethanol, and as animal feed. Many diseases can reduce maize yield, among them Maize Common Rust (MCR) (Puccinia sorghi Schwein), a leaf disease that causes pustules to appear. The aim of this study was to classify maize lines as resistant or susceptible, selecting 50% of them to be carried forward in the breeding pipeline. A dataset containing evaluations at three time points in two years, made with a visual score scale and with two Unmanned Aerial Vehicle (UAV)-coupled sensors (multispectral and thermal), was analyzed with six machine learning algorithms to identify the evaluation time whose training set delivers the best classification performance. The phenotypic data from the three time-point evaluations, along with genetic marker data, were used to explore the performance of the Support Vector Machine (SVM) and Artificial Neural Network (ANN) algorithms in a k-fold cross-validation analysis with nine datasets. Learning curves and feature importance rankings were analyzed using the SVM algorithm. The results showed that the training set from the last evaluation delivered the highest accuracies, of approximately 80%, with Logistic Regression and SVM outperforming the other algorithms. The analysis by year suggests that a homogeneous distribution of scores is of great importance for effective MCR resistance classification. The results also demonstrated the advantage of the SVM algorithm, whose models were able to generalize using a smaller number of features. SVM achieved similar performance metrics with the third evaluation alone and with the three time-point evaluations combined. The SVM learning curves indicate that adding more training samples would be beneficial for all datasets analyzed. The five most important features for each dataset were listed, with the Red wavelength predominating in the first position of the ranking. In addition, the protein-coding genes aligned with the allele sequences of the markers ranked as important should be further explored in functional genomics studies.
Keywords: Maize common rust. Machine learning. SVM. ANN. Data mining.
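The workflow summarized in the abstract (label half of the lines as resistant, then estimate classification accuracy by k-fold cross-validation) can be sketched in plain Python. This is only an illustration under stated assumptions, not the thesis's implementation: the data are synthetic, a lower visual score is assumed to mean a more resistant line, and a nearest-centroid classifier stands in for the SVM. The function names (`select_half`, `k_fold_indices`, `cross_val_accuracy`) and the two toy features are hypothetical.

```python
import random


def select_half(scores):
    """Label the half of the lines with the lowest scores as 1 (resistant).

    Assumption: a lower visual score means less disease.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    keep = set(order[: len(scores) // 2])
    return [1 if i in keep else 0 for i in range(len(scores))]


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Nearest-centroid stand-in for the SVM: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the class whose centroid is closest (squared Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def cross_val_accuracy(X, y, k=5, seed=0):
    """Mean accuracy over k folds, each fold held out once for testing."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx],
                              [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Synthetic example: 40 lines, two made-up sensor features per line
# (a red-band reflectance and a canopy temperature) loosely tied to the score.
rng = random.Random(42)
scores = [rng.uniform(1, 9) for _ in range(40)]
X = [[0.05 + 0.01 * s + rng.gauss(0, 0.01),
      25 + 0.3 * s + rng.gauss(0, 0.5)] for s in scores]
y = select_half(scores)
acc = cross_val_accuracy(X, y, k=5)
print(f"mean 5-fold accuracy: {acc:.2f}")
```

Ranking the lines and taking the lower half yields exactly 50% positives even when scores are tied, which a simple median threshold would not guarantee. The features are left unscaled here for brevity; a real analysis would normally standardize them before fitting a distance-based or SVM model.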
id UFV_e12adced0d04f51c90b8a7baeaac5dc2
oai_identifier_str oai:locus.ufv.br:123456789/31122
network_acronym_str UFV
network_name_str LOCUS Repositório Institucional da UFV
repository_id_str
dc.title.none.fl_str_mv Maize common rust resistance classification with machine learning analyzes
Classificação da resistência à ferrugem comum do milho via análise por machine learning
title Maize common rust resistance classification with machine learning analyzes
author Fonseca, Pollyanna Capobiango da
author_facet Fonseca, Pollyanna Capobiango da
author_role author
dc.contributor.none.fl_str_mv Oliveira, Aluízio Borém de
http://lattes.cnpq.br/1364570911812242
Junior, Francelino Augusto Rodrigues
dc.contributor.author.fl_str_mv Fonseca, Pollyanna Capobiango da
dc.subject.por.fl_str_mv Maize - Diseases and pests
Common rust
Machine learning
Data mining - (Computing)
Plant breeding
publishDate 2023
dc.date.none.fl_str_mv 2023-06-27T17:41:31Z
2023-06-27T17:41:31Z
2023-02-23
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv FONSECA, Pollyanna Capobiango da. Maize common rust resistance classification with machine learning analyzes. 2023. 54 f. Tese (Doutorado em Genética e Melhoramento) - Universidade Federal de Viçosa, Viçosa. 2023.
https://locus.ufv.br//handle/123456789/31122
https://doi.org/10.47328/ufvbbt.2023.202
identifier_str_mv FONSECA, Pollyanna Capobiango da. Maize common rust resistance classification with machine learning analyzes. 2023. 54 f. Tese (Doutorado em Genética e Melhoramento) - Universidade Federal de Viçosa, Viçosa. 2023.
url https://locus.ufv.br//handle/123456789/31122
https://doi.org/10.47328/ufvbbt.2023.202
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de Viçosa
Genética e Melhoramento
publisher.none.fl_str_mv Universidade Federal de Viçosa
Genética e Melhoramento
dc.source.none.fl_str_mv reponame:LOCUS Repositório Institucional da UFV
instname:Universidade Federal de Viçosa (UFV)
instacron:UFV
instname_str Universidade Federal de Viçosa (UFV)
instacron_str UFV
institution UFV
reponame_str LOCUS Repositório Institucional da UFV
collection LOCUS Repositório Institucional da UFV
repository.name.fl_str_mv LOCUS Repositório Institucional da UFV - Universidade Federal de Viçosa (UFV)
repository.mail.fl_str_mv fabiojreis@ufv.br
_version_ 1855045736682487808