Heuristics based on projection occupation measures for probabilistic planning with dead-ends and risk

Fernández, Milton Raúl Condori

Bibliographic details
Year of defense:	2020
Main author:	Fernández, Milton Raúl Condori
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Dissertation
Access type:	Open access
Language:	eng
Defense institution:	Biblioteca Digitais de Teses e Dissertações da USP
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords in Portuguese:	Heurística; Planejamento probabilístico; Planning as heuristic search; Probabilistic planning; Risk sensitive SSP; SSP as dual linear program
Access link:	https://www.teses.usp.br/teses/disponiveis/45/45134/tde-30062021-195910/
Abstract:	In probabilistic planning, an agent interacts with an environment, and the objective is to find an optimal policy (a state-action mapping) that allows the agent to reach a goal state from an initial state while minimizing the expected accumulated cost. Efficient solutions for large instances of probabilistic planning are generally based on Stochastic Shortest Path (ssp) problems and use heuristic search techniques. However, these approaches have two limitations: (1) they cannot guarantee optimal policies in the presence of dead-ends (states from which the goal cannot be reached), and (2) the cost of the resulting policies may have high variance. In instances with unavoidable dead-ends, we can plan in two phases, first maximizing the probability of reaching the goal (maxprob) and then minimizing the expected cost (mincost); alternatively, we can define a penalty for reaching a dead-end state and only minimize the expected cost (mincost-with-penalty). While several heuristics exist for the mincost problem, there are no efficient heuristics for the maxprob problem. A recent work proposed the first heuristic that takes probabilities into account, called hpom, which solves a relaxed version of an ssp as a linear program in the dual space. In this work we propose two new heuristics based on hpom for probabilistic planning problems with unavoidable dead-ends; both include new variables and constraints for dead-end states. The first, h_p_pom(s), estimates the maximum probability of reaching the goal from s, and is used to solve maxprob problems efficiently by ignoring action costs and considering only the probabilities. The second, h_pe_pom(s), is used to solve mincost-with-penalty problems: it estimates the minimal cost to reach the goal from state s and adds an expected penalty for reaching dead-ends.
To address the second limitation of traditional ssp solutions, we propose a third heuristic, h_rs_pom, also based on hpom, for a modified version of an ssp called risk sensitive ssp (rs-ssp). Its optimization criterion is to minimize an exponential utility function that includes a risk factor characterizing the agent's attitude as: (i) risk-averse (risk factor > 0); (ii) risk-prone (risk factor < 0); or (iii) risk-neutral (risk factor approaching 0). Empirical results show that the proposed heuristics can solve larger planning instances than state-of-the-art solutions for ssps with dead-ends and for rs-ssp problems.
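The effect of the risk factor described in the abstract can be illustrated with a small sketch. It is not the thesis's formulation or the h_rs_pom heuristic; it only evaluates the certainty equivalent of an accumulated cost under an exponential utility, a standard way to read such a criterion. The function name `certainty_equivalent`, the parameter name `risk_factor`, and the two toy cost distributions are assumptions made for illustration.

```python
import math


def certainty_equivalent(outcomes, risk_factor):
    """Certainty-equivalent cost under an exponential utility.

    outcomes: list of (probability, accumulated_cost) pairs (toy data).
    risk_factor: hypothetical name for the risk factor in the abstract;
        > 0 is risk-averse, < 0 is risk-prone, near 0 is risk-neutral.
    """
    if abs(risk_factor) < 1e-12:
        # Limit as the risk factor goes to 0: the plain expected cost.
        return sum(p * c for p, c in outcomes)
    expected_exp = sum(p * math.exp(risk_factor * c) for p, c in outcomes)
    return math.log(expected_exp) / risk_factor


# Two toy policies with the same expected cost (10) but different variance.
safe = [(1.0, 10.0)]
risky = [(0.5, 0.0), (0.5, 20.0)]

for factor in (0.1, 0.0, -0.1):
    print(factor, certainty_equivalent(safe, factor),
          certainty_equivalent(risky, factor))
```

With a positive risk factor the high-variance policy is valued as more costly than its expected cost, with a negative one as less costly, and near zero the two policies are indistinguishable, which matches the three attitudes listed in the abstract.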

Item metadata

id	USP_3d215a733f945c103903ea2b89fbc5d2
oai_identifier_str	oai:teses.usp.br:tde-30062021-195910
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str
dc.title.none.fl_str_mv	Heuristics based on projection occupation measures for probabilistic planning with dead-ends and risk; Heurísticas baseadas na projeção de medidas de ocupação para planejamento probabilístico
title	Heuristics based on projection occupation measures for probabilistic planning with dead-ends and risk
author	Fernández, Milton Raúl Condori
author_facet	Fernández, Milton Raúl Condori
author_role	author
dc.contributor.none.fl_str_mv	Barros, Leliane Nunes de
dc.contributor.author.fl_str_mv	Fernández, Milton Raúl Condori
dc.subject.por.fl_str_mv	Heurística; Planejamento probabilístico; Planning as heuristic search; Probabilistic planning; Risk sensitive SSP; SSP as dual linear program
topic	Heurística; Planejamento probabilístico; Planning as heuristic search; Probabilistic planning; Risk sensitive SSP; SSP as dual linear program
publishDate	2020
dc.date.none.fl_str_mv	2020-02-18
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://www.teses.usp.br/teses/disponiveis/45/45134/tde-30062021-195910/
url	https://www.teses.usp.br/teses/disponiveis/45/45134/tde-30062021-195910/
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv	Release the content for public access. info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Release the content for public access.
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv	Biblioteca Digitais de Teses e Dissertações da USP
publisher.none.fl_str_mv	Biblioteca Digitais de Teses e Dissertações da USP
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br || atendimento@aguia.usp.br || virginia@if.usp.br
_version_	1815258594047688704
