Management zones and space-time prediction of soybean yield variability: machine learning techniques applied to soil physical quality parameters

Bibliographic details
Year of defense: 2023
Main author: Pereira, Gislaine Silva
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Biblioteca Digitais de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
ACP
PCA
Access link: https://www.teses.usp.br/teses/disponiveis/11/11152/tde-04012024-105156/
Abstract: Methods and tools for precision agriculture are key to increasing soybean production. In this respect, knowledge of intra-field variability is essential for supporting the agricultural producer in the decision-making process. However, methods for modelling production are generally based on regional conditions and/or agroecosystem models that do not represent local scales. The aim of this thesis is therefore to use machine learning techniques to improve data quality for predicting yield at the management-zone level. The research was divided into three chapters that apply precision-agriculture techniques and methods to demonstrate the need for greater support to the farmer at the local level. The first chapter sought to use machine learning to improve the quality of yield-mapping data and of information from high-resolution sensors when generating management zones (MZs), and to validate the differences between and within MZs with respect to soil factors. The hypothesis of this first chapter centred on the need to use principal component analysis (PCA) to improve the quality of MZ prediction from the observed data. The second chapter aimed to estimate soybean yield in each MZ over several years based on maps of soil water and crop development. Its first hypothesis concerned the need to confirm the existence of intra-regional yield variability. The second hypothesis tested the quality of near-infrared (NIR) reflectance surfaces for representing crop development compared with the normalized difference vegetation index (NDVI). The third hypothesis was that the machine learning technique Random Forest (RF) affords higher-quality yield prediction than the conventional method of multiple linear regression (MLR), owing to its efficiency in working with unbalanced data.
The aim of the third chapter was to understand the sensitivity of crop models (Aquacrop and CROPGRO) in estimating soybean yield at the management-zone level, especially as a function of available soil water. The hypothesis of this chapter was that crop models show reduced variability when estimating yield as a function of variations in soil water. The results of Chapter 1 showed that PCA afforded higher-quality clustering than the conventional method of normalization, and also ensured greater stability in defining the number of MZs. Soil variables were fundamental for validating the specific characteristics of each region, as demonstrated with the classification tree technique. The results of Chapter 2 showed differences between digital soil water surfaces as a function of the MZs, demonstrating the importance of differentiated management in each region, even at the local level. NIR reflectance improved the quality of soybean yield predictions in each region compared with NDVI. The RF method afforded higher-quality estimates than the MLR method. The results of Chapter 3 showed that the Aquacrop and CROPGRO models performed variably when estimating soybean yield in each zone, depending on whether years were predominantly dry or wet. More studies should be carried out using crop models to predict soybean yield at the local level. Overall, it was possible to highlight the importance of evaluation on a local scale and of the use of machine learning methods and digital mapping to support precision agriculture. The use of MZs is adequate for understanding the variability of soil and plant factors that can influence planning for the localized use of inputs and affect yield within the same field. Future studies should use local sensors to continuously monitor the variability of climate, soil and plant conditions as a means of improving the performance of machine learning methods in agriculture.
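The abstract contrasts raw NIR reflectance with NDVI as a proxy for crop development. As a reminder of what the index computes, the sketch below assumes the standard definition NDVI = (NIR - Red) / (NIR + Red). The function name, the zero-denominator convention and the sample reflectance values are illustrative assumptions, not taken from the thesis.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from NIR and red reflectance.

    Assumes the standard definition (NIR - Red) / (NIR + Red).
    """
    total = nir + red
    if total == 0:
        # The index is undefined when both bands are zero (e.g. masked pixels).
        # Returning 0.0 is a convention chosen for this sketch only.
        return 0.0
    return (nir - red) / total


# Hypothetical (NIR, red) reflectance pairs for three pixels.
pixels = [(0.50, 0.10), (0.30, 0.30), (0.45, 0.05)]
values = [ndvi(nir, red) for nir, red in pixels]
```

Because NDVI is a normalized ratio, it is bounded between -1 and 1. That bound can compress differences between dense canopies, which is one commonly cited reason to test NIR reflectance on its own, as the second chapter does.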
id USP_8648ec82fef72d63d7b82b6c7832c459
oai_identifier_str oai:teses.usp.br:tde-04012024-105156
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
dc.title.none.fl_str_mv Management zones and space-time prediction of soybean yield variability: machine learning techniques applied to soil physical quality parameters
Zonas de manejo e predição espaço-temporal da variabilidade da produção de soja: técnicas de machine learning aplicadas a parâmetros de qualidade física do solo
author Pereira, Gislaine Silva
author_facet Pereira, Gislaine Silva
author_role author
dc.contributor.none.fl_str_mv Gimenez, Leandro Maria
dc.contributor.author.fl_str_mv Pereira, Gislaine Silva
dc.subject.por.fl_str_mv ACP
Agrupamentos
Aquacrop
Aquacrop
Árvores de classificação
Classification trees
Clusters
CROPGRO
CROPGRO
PCA
Random forest
Random forest
Soja
Soybean
publishDate 2023
dc.date.none.fl_str_mv 2023-10-11
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/11/11152/tde-04012024-105156/
url https://www.teses.usp.br/teses/disponiveis/11/11152/tde-04012024-105156/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digitais de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digitais de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1815257894949486592