Algoritmos de aprendizagem por reforço para problemas de otimização multiobjetivo

Oliveira, Thiago Henrique Freire de

Bibliographic details
Year of defense:	2021
Main author:	Oliveira, Thiago Henrique Freire de
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	por
Defending institution:	Universidade Federal do Rio Grande do Norte Brasil UFRN PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA ELÉTRICA E DE COMPUTAÇÃO
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords (Portuguese):	Otimização multiobjetivo; Q-Learning; ε-constraint; Fronteira de Pareto; Hypervolume; Abordagem de política única
Access link:	https://repositorio.ufrn.br/handle/123456789/32753
Abstract:	Multi-objective optimization problems model real-world situations, which makes this class of problems extremely important. Although it has been studied for decades, it continues to pose challenges, especially as problems grow more complex over time. Among the difficulties of optimizing multiple objectives simultaneously, whether conflicting or not, one of the main ones faced by existing algorithms and approaches is the need for a priori knowledge of the problem: an importance must be predefined for each objective, in an attempt to establish an isomorphism between a weighting and a solution. When this class of problems is handled through reinforcement learning, two approaches predominate: single-policy and multi-policy. Algorithms and techniques that use the first approach suffer from the need for prior knowledge of the problem, an inherent characteristic of multi-objective problems. The second approach has other difficulties, such as a limited set of solutions and high computational cost. Given this context, the work proposes two hybrid algorithms, called Q-Managed with reset and Q-Managed without reset. Both hybridize the Q-Learning algorithm and the ε-constraint approach, techniques from reinforcement learning and multi-objective optimization, respectively. In summary, the proposed algorithms work as follows: Q-Learning is used to explore the environment, while the ε-constraint approach is used to dynamically delimit the environment (that is, to restrict the search in the solution space), which keeps the essence of how Q-Learning works intact.
This delimitation forces the learning agent to learn other solutions by blocking actions that lead to solutions already learned without improving them, that is, solutions to which the agent has already converged. The blocking is performed by a manager, which is responsible for observing everything that occurs in the environment. The two proposed algorithms differ in whether or not they reuse the knowledge already acquired about the environment once a solution is considered learned, that is, once the agent has converged to a particular solution. To test the effectiveness of the two versions of Q-Managed, traditional benchmarks also adopted in other works were used, allowing a fairer comparison. Two comparative approaches were adopted: the first implemented third-party algorithms for direct comparison, while the second used a metric common to all works that used the same benchmarks. In all tests that could be performed, the proposed algorithms proved effective, always finding the entire Pareto front.
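The interplay the abstract describes, with Q-Learning exploring while a manager blocks actions that lead to already-learned solutions, can be illustrated with a minimal sketch. This is not the thesis's Q-Managed algorithm: the one-step toy problem, the reward values in `OUTCOMES`, and the names `choose_action`, `learn_one_solution`, and `find_solutions` are all hypothetical, and the formal ε-constraint is simplified to blocking each converged action outright. The Q-table is rebuilt on every call, which loosely corresponds to a "with reset" behavior.

```python
import random

# Hypothetical toy problem: from a single start state, each action ends the
# episode with a two-objective reward (treasure, -time). Values are made up.
OUTCOMES = {0: (1.0, -1.0), 1: (5.0, -3.0), 2: (8.0, -7.0)}


def choose_action(q_row, blocked, epsilon, rng):
    """Epsilon-greedy choice restricted to actions the manager has not blocked."""
    allowed = [a for a in range(len(q_row)) if a not in blocked]
    if rng.random() < epsilon:
        return rng.choice(allowed)
    return max(allowed, key=lambda a: q_row[a])


def learn_one_solution(blocked, episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning on the first objective; returns the greedy action."""
    rng = random.Random(seed)
    q_row = [0.0] * len(OUTCOMES)  # fresh table on each call (a "reset")
    for _ in range(episodes):
        a = choose_action(q_row, blocked, epsilon, rng)
        reward = OUTCOMES[a][0]  # primary objective only
        # Every transition is terminal, so the bootstrap term is zero.
        q_row[a] += alpha * (reward - q_row[a])
    return choose_action(q_row, blocked, 0.0, rng)


def find_solutions():
    """Manager loop: block each converged action so the agent must find another."""
    blocked, found = set(), []
    while len(blocked) < len(OUTCOMES):
        a = learn_one_solution(blocked)
        found.append(OUTCOMES[a])
        blocked.add(a)
    return found
```

Calling `find_solutions()` returns one reward vector per action. In this toy setting every outcome is non-dominated, so the collected list covers the whole Pareto front of the example.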

Item metadata

id	UFRN_ccde4d2032f127d981290008e412f66f
oai_identifier_str	oai:repositorio.ufrn.br:123456789/32753
network_acronym_str	UFRN
network_name_str	Repositório Institucional da UFRN
repository_id_str
dc.title.none.fl_str_mv	Algoritmos de aprendizagem por reforço para problemas de otimização multiobjetivo
author	Oliveira, Thiago Henrique Freire de
author_facet	Oliveira, Thiago Henrique Freire de
author_role	author
dc.contributor.none.fl_str_mv	Doria Neto, Adrião Duarte http://lattes.cnpq.br/0465224964961501 http://lattes.cnpq.br/1987295209521433 Araújo, Aluizio Fausto Ribeiro http://lattes.cnpq.br/8715023255304328 Araújo, Daniel Sabino Amorim de http://lattes.cnpq.br/4744754780165354 Lima Júnior, Francisco Chagas de http://lattes.cnpq.br/9342041276186254 Melo, Jorge Dantas de http://lattes.cnpq.br/7325007451912598 Fernandes, Marcelo Augusto Costa http://lattes.cnpq.br/3475337353676349
dc.contributor.author.fl_str_mv	Oliveira, Thiago Henrique Freire de
dc.subject.por.fl_str_mv	Otimização multiobjetivo; Q-Learning; ε-constraint; Fronteira de Pareto; Hypervolume; Abordagem de política única
topic	Otimização multiobjetivo; Q-Learning; ε-constraint; Fronteira de Pareto; Hypervolume; Abordagem de política única
publishDate	2021
dc.date.none.fl_str_mv	2021-06-22T16:49:50Z 2021-06-22T16:49:50Z 2021-01-11
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	OLIVEIRA, Thiago Henrique Freire de. Algoritmos de aprendizagem por reforço para problemas de otimização multiobjetivo. 2021. 86f. Tese (Doutorado em Engenharia Elétrica e de Computação) - Centro de Tecnologia, Universidade Federal do Rio Grande do Norte, Natal, 2021. https://repositorio.ufrn.br/handle/123456789/32753
identifier_str_mv	OLIVEIRA, Thiago Henrique Freire de. Algoritmos de aprendizagem por reforço para problemas de otimização multiobjetivo. 2021. 86f. Tese (Doutorado em Engenharia Elétrica e de Computação) - Centro de Tecnologia, Universidade Federal do Rio Grande do Norte, Natal, 2021.
url	https://repositorio.ufrn.br/handle/123456789/32753
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal do Rio Grande do Norte Brasil UFRN PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA ELÉTRICA E DE COMPUTAÇÃO
publisher.none.fl_str_mv	Universidade Federal do Rio Grande do Norte Brasil UFRN PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA ELÉTRICA E DE COMPUTAÇÃO
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFRN instname:Universidade Federal do Rio Grande do Norte (UFRN) instacron:UFRN
instname_str	Universidade Federal do Rio Grande do Norte (UFRN)
instacron_str	UFRN
institution	UFRN
reponame_str	Repositório Institucional da UFRN
collection	Repositório Institucional da UFRN
repository.name.fl_str_mv	Repositório Institucional da UFRN - Universidade Federal do Rio Grande do Norte (UFRN)
repository.mail.fl_str_mv	repositorio@bczm.ufrn.br
_version_	1855758877447946240
