Aplicação e comparação de métodos policy gradient em problema de cadeias de suprimentos multiestágio com incertezas

Bibliographic details
Year of defense: 2021
Main author: Julio César Alves
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: por
Defending institution: Universidade Federal de Minas Gerais
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://hdl.handle.net/1843/38570
Abstract: Deep Reinforcement Learning (DRL) methods have been increasingly used in many fields and, recently, interest in them has also grown in the Optimization community. In this work, we apply and compare Policy Gradient methods on the problem of planning the production and distribution of products in a multi-stage supply chain. Most previous works that use similar methods consider only serial supply chains or chains with only two echelons, which generally limits the solution possibilities, and none of them considers stochastic lead times. We consider a chain with four echelons and two nodes per echelon, with uncertainty in the seasonal demands of customers and in the lead times of production at suppliers and of transport along the chain. To our knowledge, this work is the first to apply DRL methods to such a chain configuration using a centralized approach, in which all decisions are taken by a single agent based on the uncertain demands of the end customers. We propose a Markov Decision Process (MDP) formulation and a Linear Programming (LP) model with uncertain parameters. The MDP formulation is adapted so that Policy Gradient methods obtain good results. In the first phase, after an initial case study, we apply the Proximal Policy Optimization (PPO) algorithm in 17 experimental scenarios, considering seasonal and regular uncertain demands (with different levels of uncertainty) and constant and stochastic lead times. In this phase, an agent built from the solution of the LP model (obtained by considering expected demands and average lead times) is used as a baseline. In the second phase, we compare five algorithms, Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), PPO, Soft Actor-Critic (SAC), and Twin Delayed DDPG (TD3), in 8 of the 17 previous scenarios, using statistical tools for a proper comparison. PPO and SAC had the best performance in the experiments, with PPO having the shorter execution time. The experimental results indicate that Policy Gradient methods, especially PPO, are suitable and competitive tools for the proposed problem. In the third phase, we work with a multi-product version of the problem, generalizing the MDP formulation and the LP model. Experiments were carried out with PPO in 16 multi-product scenarios, considering two and three products and different cost and demand configurations. The results indicate that, as in the original problem, PPO performs better than the baseline in scenarios with stochastic lead times.
id UFMG_f56a2b37d7e7d549b7956ce2b1ca6d4d
oai_identifier_str oai:repositorio.ufmg.br:1843/38570
network_acronym_str UFMG
network_name_str Repositório Institucional da UFMG
repository_id_str
spelling Julio César Alves
http://lattes.cnpq.br/2547158184816891
Geraldo Robson Mateus
http://lattes.cnpq.br/6289602045034353
André Carlos Ponce de Leon Ferreira de Carvalho
Adriano Alonso Veloso
Cristiano Arbex Valle
Dilson Lucas Pereira
https://orcid.org/0000-0002-4848-9453
Brasil
Programa de Pós-Graduação em Ciência da Computação
CC-LICENSE license_rdf application/octet-stream 811
ORIGINAL Tese_versao_final.pdf application/pdf 4479522
LICENSE license.txt text/plain 2118
dc.title.none.fl_str_mv Aplicação e comparação de métodos policy gradient em problema de cadeias de suprimentos multiestágio com incertezas
dc.title.alternative.none.fl_str_mv Applying and comparing policy gradient methods to multi-echelon supply chain problem with uncertainty
title Aplicação e comparação de métodos policy gradient em problema de cadeias de suprimentos multiestágio com incertezas
author Julio César Alves
author_facet Julio César Alves
author_role author
dc.contributor.author.fl_str_mv Julio César Alves
dc.subject.por.fl_str_mv Computação - Teses.
Markov, Processos de.
Aprendizado por reforço - Teses.
Aprendizado profundo - Teses.
dc.subject.other.none.fl_str_mv Cadeias de suprimentos multiestágio
Tomada de decisão sequencial sob incerteza
Aprendizado por reforço
Aprendizado profundo
Métodos policy gradient
publishDate 2021
dc.date.accessioned.fl_str_mv 2021-10-30T19:47:03Z
2025-09-09T01:15:32Z
dc.date.available.fl_str_mv 2021-10-30T19:47:03Z
dc.date.issued.fl_str_mv 2021-10-06
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/1843/38570
url https://hdl.handle.net/1843/38570
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv http://creativecommons.org/licenses/by-nc-nd/3.0/pt/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-nd/3.0/pt/
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de Minas Gerais
publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFMG
instname:Universidade Federal de Minas Gerais (UFMG)
instacron:UFMG
instname_str Universidade Federal de Minas Gerais (UFMG)
instacron_str UFMG
institution UFMG
reponame_str Repositório Institucional da UFMG
collection Repositório Institucional da UFMG
bitstream.url.fl_str_mv https://repositorio.ufmg.br//bitstreams/0a9dd8b5-f0cb-4059-a4aa-16dbf6a81352/download
https://repositorio.ufmg.br//bitstreams/188e0a37-d829-4b2d-8e90-8f5b35ae1306/download
https://repositorio.ufmg.br//bitstreams/ec7b2d15-03f0-48aa-8acc-cde43452b96e/download
bitstream.checksum.fl_str_mv cfd6801dba008cb6adbd9838b81582ab
18b2f3afb9f413f9c2f69dbf4058b03a
cda590c95a0b51b4d15f60c9642ca272
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv repositorio@ufmg.br
_version_ 1862105707612995584