From seed to canopy: high-throughput phenotyping and machine learning in soybean breeding

Bibliographic details
Year of defense: 2024
Main author: Miranda, Melissa Cristina de Carvalho
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/11/11137/tde-02072024-112314/
Abstract: Soybean breeding faces the challenge of evaluating large, complex populations across environments to obtain accurate genetic values that can serve as selection criteria. This study addresses that challenge by examining the potential of high-throughput phenotyping (HTP) and machine learning (ML) models to predict classic phenotypic traits in soybean breeding programs from seed images and aerial canopy images. The methodology consisted of the phenotypic characterization of 275 soybean genotypes under different environments and management practices, including management with and without fungicide application for the control of Asian rust. Predictions from regression algorithms (support vector machine (SVM), random forest (RF), multilayer perceptron neural network (MLP), and AdaBoost) were evaluated first; transfer learning was then tested, using convolutional neural networks (CNNs) (VGG16, VGG19, ResNet50, InceptionV3, and Inception-ResNetV2) to extract image features that were fed to the same regression models. In the first chapter, RGB (red-green-blue) images of seeds from each plot were collected, with seeds both sparsely and densely distributed. A custom image processing pipeline was developed for seed segmentation, which allowed a detailed morphological evaluation. ML algorithms and different CNN architectures were compared in predicting hundred-seed weight. The segmentation technique correctly identified over 98% of the seeds, and the morphological measurements achieved a predictive ability of 0.71, with a mean squared error (MSE) of 3.15. The same results were observed for the CNN features, highlighting the efficiency of the morphological measurements as extractors of image features. ResNet50 stood out as the most accurate CNN for feature extraction.
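The two accuracy figures quoted above can be illustrated with a minimal sketch. It assumes that "predictive ability" means the Pearson correlation between observed and predicted values, a common convention in breeding studies that the abstract does not confirm. The function names and the sample numbers are hypothetical and are not data from the thesis.

```python
import math


def mean_squared_error(observed, predicted):
    """Average of squared differences between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def predictive_ability(observed, predicted):
    """Pearson correlation between observed and predicted values.

    Assumption: the abstract does not define 'predictive ability';
    Pearson's r is used here only as a common convention.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


if __name__ == "__main__":
    # Made-up hundred-seed weights (g), for illustration only.
    observed = [14.2, 16.8, 15.1, 18.3, 13.9]
    predicted = [14.9, 16.1, 15.6, 17.2, 14.8]
    print("predictive ability:", round(predictive_ability(observed, predicted), 3))
    print("MSE:", round(mean_squared_error(observed, predicted), 3))
```

Reporting both values is useful because a model can rank genotypes well (high correlation) while still being biased in absolute terms (high MSE).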
In the second chapter, we investigated the heritability of vegetation indices obtained from aerial images and their correlation with traditional phenotypic traits. The RGBVI and GLI vegetation indices showed higher heritability (mean H² of 0.56) than other RGB-based indices, making them promising for genetic evaluations. Advanced ML techniques, especially transfer learning with ResNet50, improved the prediction of traits such as days to the R7 stage (DR7) and plant height measurement (PHM) from canopy images. Combining ResNet50 with RF for DR7 prediction and with MLP for PHM prediction showed promising results, highlighting the potential of these approaches to optimize decision-making in soybean breeding. In summary, the research concludes that integrating image data with machine learning models offers a robust decision support system that predicts classic phenotypic traits of soybean from images and helps identify high-performance genotypes.
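The two vegetation indices named in the abstract have widely used definitions from the remote sensing literature: GLI = (2G - R - B) / (2G + R + B) and RGBVI = (G² - R·B) / (G² + R·B). The sketch below computes them per pixel and averages them over a plot. It is an illustration under assumptions, not the thesis pipeline: the function names, the plot-mean aggregation, and the choice to return 0.0 when the denominator is zero are all hypothetical.

```python
def gli(r, g, b):
    """Green Leaf Index: (2G - R - B) / (2G + R + B)."""
    denom = 2 * g + r + b
    if denom == 0:
        return 0.0  # assumption: treat a fully black pixel as neutral
    return (2 * g - r - b) / denom


def rgbvi(r, g, b):
    """RGB Vegetation Index: (G^2 - R*B) / (G^2 + R*B)."""
    denom = g * g + r * b
    if denom == 0:
        return 0.0  # assumption: treat a fully black pixel as neutral
    return (g * g - r * b) / denom


def plot_mean_index(pixels, index_fn):
    """Average an index over an iterable of (R, G, B) pixel tuples."""
    values = [index_fn(r, g, b) for r, g, b in pixels]
    if not values:
        raise ValueError("plot contains no pixels")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Made-up pixels: two green canopy pixels and one gray soil pixel.
    plot = [(50, 100, 50), (60, 120, 40), (80, 80, 80)]
    print("mean GLI:", plot_mean_index(plot, gli))
    print("mean RGBVI:", plot_mean_index(plot, rgbvi))
```

Both indices are normalized ratios bounded between -1 and 1, and both evaluate to 0 for a gray pixel where R = G = B, so higher plot means indicate a greener canopy.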
id USP_265752fbe172610e1c9f8df5959204a7
oai_identifier_str oai:teses.usp.br:tde-02072024-112314
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
dc.title.none.fl_str_mv From seed to canopy: high-throughput phenotyping and machine learning in soybean breeding
Da semente ao dossel: fenotipagem de alto rendimento e aprendizado de máquina no melhoramento da soja
author Miranda, Melissa Cristina de Carvalho
author_facet Miranda, Melissa Cristina de Carvalho
author_role author
dc.contributor.none.fl_str_mv Pinheiro, Jose Baldin
dc.contributor.author.fl_str_mv Miranda, Melissa Cristina de Carvalho
dc.subject.por.fl_str_mv Aprendizagem por transferência
Convolutional neural network
Fenômica
Imagens RGB
Índices de vegetação
Morfologia de sementes
Phenomics
Rede neural convolucional
RGB images
Seed morphology
Transfer learning
Vegetation indices
publishDate 2024
dc.date.none.fl_str_mv 2024-05-02
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/11/11137/tde-02072024-112314/
url https://www.teses.usp.br/teses/disponiveis/11/11137/tde-02072024-112314/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1815257812592230400