Multi-objective reinforcement learning methods for action selection : dealing with multiple objectives and non-stationarity
| Year of defense: | 2021 |
|---|---|
| Main author: | Anquise, Candy Alexandra Huanca |
| Advisor: | Bazzan, Ana Lucia Cetertich |
| Defense committee: | |
| Document type: | Master's thesis |
| Access type: | Open access |
| Language: | eng |
| Institution: | Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, BR-RS |
| Graduate program: | Programa de Pós-Graduação em Computação |
| Department: | Instituto de Informática |
| Country: | Brazil |
| Keywords in Portuguese: | Sistemas multiagentes; Aprendizagem |
| Keywords in English: | Multi-objective; Decision-making; Multi-objective route choice; Reinforcement learning |
| Access link: | http://hdl.handle.net/10183/231836 |
| Abstract: | Multi-objective decision-making entails planning, based on a model, to find the best policy for such problems. If this model is unknown, learning through interaction provides the means to behave in the environment. Multi-objective decision-making in a multi-agent system poses many unsolved challenges. Among them, multiple objectives and non-stationarity, caused by simultaneous learners, have so far been addressed separately. In this work, algorithms that address both issues by combining strengths of different methods are proposed and applied to a route choice scenario formulated as a multi-armed bandit problem; the focus is therefore on action selection. In the route choice problem, drivers must select a route while aiming to minimize both their travel time and toll. The proposed algorithms combine important aspects of works that each tackle only one issue, non-stationarity or multiple objectives, making it possible to handle both problems together. The methods drawn from these works are a set of Upper-Confidence Bound (UCB) algorithms and the Pareto Q-learning (PQL) algorithm. The UCB-based algorithms are Pareto UCB1 (PUCB1), discounted UCB (DUCB) and sliding window UCB (SWUCB). PUCB1 deals with multiple objectives, while DUCB and SWUCB address non-stationarity in different ways. PUCB1 was extended to include characteristics from DUCB and SWUCB. In the case of PQL, as it is a state-based method that focuses on more than one objective, a modification was made so that it tackles a problem focused on action selection. Results from a comparison in a route choice scenario show that the proposed algorithms deal with non-stationarity and multiple objectives, with the discount-factor-based variant performing best. Advantages, limitations and differences of these algorithms are discussed. |
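The abstract describes extending Pareto UCB1 with the discounting mechanism of DUCB. As a rough illustration of how such a combination can work, and not the thesis's actual implementation, the following Python sketch computes a discounted vector-valued UCB estimate per arm and selects uniformly at random among the arms whose estimates are Pareto-optimal. The class name `DiscountedParetoUCB1`, the parameters `gamma` and `c`, and the use of a generic UCB1-style exploration bonus (rather than the exact PUCB1 bonus) are assumptions made for the example.

```python
import numpy as np

class DiscountedParetoUCB1:
    """Sketch of Pareto UCB1 action selection with a DUCB-style discount on
    past observations. Rewards are vectors where higher is better, so costs
    such as travel time and toll would be passed in negated."""

    def __init__(self, n_arms, n_objectives, gamma=0.95, c=2.0, seed=0):
        self.gamma = gamma                             # discount on past observations
        self.c = c                                     # exploration constant
        self.counts = np.zeros(n_arms)                 # discounted pull counts
        self.sums = np.zeros((n_arms, n_objectives))   # discounted reward sums
        self.rng = np.random.default_rng(seed)

    def select(self):
        untried = np.flatnonzero(self.counts == 0)
        if untried.size > 0:                           # play every arm once first
            return int(self.rng.choice(untried))
        means = self.sums / self.counts[:, None]
        bonus = np.sqrt(self.c * np.log(self.counts.sum()) / self.counts)
        ucb = means + bonus[:, None]                   # one UCB value per objective
        return int(self.rng.choice(self._pareto_front(ucb)))

    def update(self, arm, reward_vector):
        # DUCB-style update: decay all statistics, then add the new sample.
        self.counts *= self.gamma
        self.sums *= self.gamma
        self.counts[arm] += 1.0
        self.sums[arm] += np.asarray(reward_vector, dtype=float)

    @staticmethod
    def _pareto_front(ucb):
        # Keep the arms whose UCB vector is not dominated by any other arm's.
        front = []
        for i in range(len(ucb)):
            dominated = any(
                np.all(ucb[j] >= ucb[i]) and np.any(ucb[j] > ucb[i])
                for j in range(len(ucb)) if j != i
            )
            if not dominated:
                front.append(i)
        return np.array(front)

# Hypothetical usage: 3 routes, 2 objectives (negated travel time and toll).
bandit = DiscountedParetoUCB1(n_arms=3, n_objectives=2, gamma=0.99)
route = bandit.select()
bandit.update(route, reward_vector=[-12.5, -1.0])
```

A sliding-window variant along the lines of SWUCB would instead compute the per-arm counts and means from only the last W observations rather than discounting all past ones.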
| id | URGS_9f3bd8543ee328327e1cfbba8be22578 |
|---|---|
| oai_identifier_str | oai:www.lume.ufrgs.br:10183/231836 |
| network_acronym_str | URGS |
| network_name_str | Biblioteca Digital de Teses e Dissertações da UFRGS |
| identifier_str (NRB) | 001133526 |
| dc.date.accessioned | 2021-11-17T04:24:22Z |
| dc.type.status | info:eu-repo/semantics/publishedVersion |
| dc.format | application/pdf |
| Full text (PDF) | http://www.lume.ufrgs.br/bitstream/10183/231836/1/001133526.pdf |