Heuristics based on projection occupation measures for probabilistic planning with dead-ends and risk

Bibliographic details
Year of defense: 2020
Main author: Fernández, Milton Raúl Condori
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: eng
Defending institution: Biblioteca Digitais de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/45/45134/tde-30062021-195910/
Abstract: In probabilistic planning, an agent interacts with an environment, and the objective is to find an optimal policy (a state-action mapping) that allows the agent to reach a goal state from an initial state while minimizing the expected accumulated cost. Efficient solutions for large instances of probabilistic planning are, in general, based on Stochastic Shortest Path (SSP) problems and use heuristic search techniques. However, these approaches have two limitations: (1) they cannot guarantee optimal policies in the presence of dead-ends (states from which the goal cannot be reached), and (2) their solutions may exhibit high variance in cost. In instances where unavoidable dead-ends exist, we can plan in two phases, first maximizing the probability of reaching the goal (maxprob) and then minimizing the expected cost (mincost); alternatively, we can define a penalty for reaching a dead-end state and only minimize the expected cost (mincost-with-penalty). While several heuristics exist for the mincost problem, there are no efficient heuristics for the maxprob problem. Recent work proposed the first heuristic that takes probabilities into account, called hpom, which solves a relaxed version of an SSP as a linear program in the dual space. In this work we propose two new heuristics based on hpom for probabilistic planning problems with unavoidable dead-ends, which include new variables and constraints for dead-end states. The first, h_p_pom(s), estimates the maximum probability of reaching the goal from s and is used to efficiently solve maxprob problems by ignoring action costs and considering only the probabilities. The second, h_pe_pom(s), is used to solve mincost-with-penalty problems; it estimates the minimal cost to reach the goal from state s and adds an expected penalty for reaching dead-ends.
To address the second limitation of traditional SSP solutions, we propose a third heuristic, h_rs_pom, also based on hpom, for a modified version of an SSP called the risk-sensitive SSP (RS-SSP), whose optimization criterion is to minimize an exponential utility function that includes a risk factor characterizing the agent's attitude as (i) risk-averse (risk factor > 0), (ii) risk-prone (risk factor < 0), or (iii) risk-neutral (zero risk factor). Empirical results show that the proposed heuristics can solve larger planning instances than state-of-the-art solutions for SSPs with dead-ends and for RS-SSP problems.
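The two dead-end formulations named in the abstract, maxprob and mincost-with-penalty, can be illustrated on a toy problem. The sketch below is not the thesis's method: it uses plain value iteration rather than the hpom-based heuristics, the four-state problem is invented for illustration, and it assumes the penalty is a fixed terminal cost charged on entering a dead-end. The names TRANSITIONS, max_prob, and min_cost_with_penalty are hypothetical.

```python
# Toy SSP (hypothetical, not taken from the thesis).
# TRANSITIONS[state][action] = (action_cost, {next_state: probability})
TRANSITIONS = {
    "s0": {
        "safe": (2.0, {"s1": 1.0}),
        "risky": (1.0, {"goal": 0.5, "dead": 0.5}),
    },
    "s1": {
        "go": (1.0, {"goal": 0.8, "dead": 0.2}),
    },
}
GOAL, DEAD = "goal", "dead"  # both are terminal; "dead" is an unavoidable dead-end


def max_prob(transitions, iterations=100):
    """Value iteration for maxprob: maximize goal probability, ignoring costs."""
    p = {s: 0.0 for s in transitions}
    p[GOAL], p[DEAD] = 1.0, 0.0
    for _ in range(iterations):
        for s, actions in transitions.items():
            p[s] = max(
                sum(pr * p[nxt] for nxt, pr in outcomes.items())
                for _cost, outcomes in actions.values()
            )
    return p


def min_cost_with_penalty(transitions, penalty, iterations=100):
    """Value iteration for expected cost, assuming a fixed dead-end penalty."""
    v = {s: 0.0 for s in transitions}
    v[GOAL], v[DEAD] = 0.0, penalty
    for _ in range(iterations):
        for s, actions in transitions.items():
            v[s] = min(
                cost + sum(pr * v[nxt] for nxt, pr in outcomes.items())
                for cost, outcomes in actions.values()
            )
    return v


if __name__ == "__main__":
    print(max_prob(TRANSITIONS)["s0"])
    print(min_cost_with_penalty(TRANSITIONS, penalty=10.0)["s0"])
    print(min_cost_with_penalty(TRANSITIONS, penalty=2.0)["s0"])
```

In this toy problem the best achievable goal probability from s0 is 0.8 (via the "safe" action). With a penalty of 10 the expected cost from s0 is 5 and "safe" is preferred, while with a penalty of 2 it drops to 2 and "risky" becomes optimal, which shows how the choice of penalty trades goal probability against cost.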
id USP_3d215a733f945c103903ea2b89fbc5d2
oai_identifier_str oai:teses.usp.br:tde-30062021-195910
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
spelling Heuristics based on projection occupation measures for probabilistic planning with dead-ends and risk
Heurísticas baseadas na projeção de medidas de ocupação para planejamento probabilístico
Heurística
Planejamento probabilístico
Planning as heuristic search
Probabilistic planning
Risk sensitive SSP
SSP as dual linear program
not available
Biblioteca Digitais de Teses e Dissertações da USP
Barros, Leliane Nunes de
Fernández, Milton Raúl Condori
2020-02-18
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
application/pdf
https://www.teses.usp.br/teses/disponiveis/45/45134/tde-30062021-195910/
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
Release the content for public access.
info:eu-repo/semantics/openAccess
eng
2021-09-03T18:03:03Z
oai:teses.usp.br:tde-30062021-195910
Biblioteca Digital de Teses e Dissertações
http://www.teses.usp.br/
PUB
http://www.teses.usp.br/cgi-bin/mtd2br.pl
virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
opendoar:2721
2021-09-03T18:03:03
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
false
dc.title.none.fl_str_mv Heuristics based on projection occupation measures for probabilistic planning with dead-ends and risk
Heurísticas baseadas na projeção de medidas de ocupação para planejamento probabilístico
author Fernández, Milton Raúl Condori
author_facet Fernández, Milton Raúl Condori
author_role author
dc.contributor.none.fl_str_mv Barros, Leliane Nunes de
dc.contributor.author.fl_str_mv Fernández, Milton Raúl Condori
dc.subject.por.fl_str_mv Heurística
Planejamento probabilístico
Planning as heuristic search
Probabilistic planning
Risk sensitive SSP
SSP as dual linear program
publishDate 2020
dc.date.none.fl_str_mv 2020-02-18
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/45/45134/tde-30062021-195910/
url https://www.teses.usp.br/teses/disponiveis/45/45134/tde-30062021-195910/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digitais de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digitais de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1815258594047688704