Stochastic dynamic programming


Stochastic dynamic programming is a technique for modelling and solving problems of decision making under uncertainty involving multi-stage stochastic systems that evolve over a planning horizon. Originally introduced in (Bellman 1957), it is a branch of stochastic programming that takes a functional equation approach to the discovery of optimum policies for this class of problems.

Formal background

Consider a discrete system defined on $n$ stages, $t = 1, \ldots, n$, in which each stage $t$ is characterized by

  • an initial state $s_t \in S_t$, where $S_t$ is the set of feasible states at the beginning of stage $t$;
  • a decision variable $x_t \in X_t$, where $X_t$ is the set of feasible actions at stage $t$ – note that $X_t$ may be a function of the initial state $s_t$;
  • an immediate cost/reward function $p_t(s_t, x_t)$, representing the cost/reward at stage $t$ if $s_t$ is the initial state and $x_t$ the action selected;
  • a state transition function $g_t(s_t, x_t)$ that leads the system towards state $s_{t+1} = g_t(s_t, x_t)$.

Let $f_t(s_t)$ represent the optimal cost/reward obtained by following an optimal policy over stages $t, t+1, \ldots, n$. Without loss of generality, in what follows we will consider a reward maximisation setting. In deterministic dynamic programming one usually deals with functional equations with the following structure

$f_t(s_t) = \max_{x_t \in X_t} \{ p_t(s_t, x_t) + f_{t+1}(s_{t+1}) \}$

where $s_{t+1} = g_t(s_t, x_t)$ and the boundary condition of the system is

$f_n(s_n) = \max_{x_n \in X_n} p_n(s_n, x_n), \quad s_n \in S_n.$

The aim is to determine the set of optimal actions that maximises $f_1(s_1)$. Given the current state $s_t$ and the current action $x_t$, we know with certainty the reward secured during the current stage and – thanks to the state transition function $g_t$ – the future state towards which the system transitions. In practice, however, even if we know the state of the system at the beginning of the current stage as well as the decision taken, the state of the system at the beginning of the next stage and the current period reward are often random variables that can be observed only at the end of the current stage.
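To make the deterministic recursion concrete before randomness is introduced, the following Python sketch performs a backward sweep over a small, hypothetical three-stage problem; the particular states, actions, reward $p_t$ and transition $g_t$ are illustrative assumptions, not part of the formal development above.

  n = 3                                          # stages t = 1, ..., n
  states = {t: [0, 1] for t in range(1, n + 1)}  # S_t
  actions = {t: [0, 1] for t in range(1, n + 1)} # X_t

  def p(t, s, x):        # immediate reward p_t(s_t, x_t) (assumed form)
      return s + 0.5 * x

  def g(t, s, x):        # state transition g_t(s_t, x_t) (assumed form)
      return (s + x) % 2

  f = {n + 1: {0: 0.0, 1: 0.0}}  # f_{n+1} = 0, so f_n(s_n) = max_x p_n(s_n, x_n)
  policy = {}

  for t in range(n, 0, -1):                      # backward sweep t = n, ..., 1
      f[t], policy[t] = {}, {}
      for s in states[t]:
          values = {x: p(t, s, x) + f[t + 1][g(t, s, x)] for x in actions[t]}
          policy[t][s] = max(values, key=values.get)
          f[t][s] = values[policy[t][s]]

  print(f[1], policy)    # f_1(s_1) for each initial state, and an optimal policy

Tabulating $f_t$ from the last stage backwards in this way is the standard dynamic programming recursion; the stochastic version below changes only how the future term is evaluated.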

Stochastic dynamic programming deals with problems in which the current period reward and/or the next period state are random. The decision maker's goal is to maximise expected (discounted) reward over a given planning horizon.

In their most general form, stochastic dynamic programs deal with functional equations with the following structure

$f_t(s_t) = \max_{x_t \in X_t(s_t)} \left\{ p_t(s_t, x_t) + \alpha \sum_{s_{t+1}} \Pr(s_{t+1} \mid s_t, x_t)\, f_{t+1}(s_{t+1}) \right\}$

where

  • $f_t(s_t)$ is the maximum expected reward that can be attained during stages $t, t+1, \ldots, n$, given state $s_t$ at the beginning of stage $t$;
  • $x_t$ belongs to the set $X_t(s_t)$ of feasible actions at stage $t$, given initial state $s_t$;
  • $\alpha$ is the discount factor;
  • $\Pr(s_{t+1} \mid s_t, x_t)$ is the conditional probability that the state at the beginning of stage $t+1$ is $s_{t+1}$, given current state $s_t$ and selected action $x_t$.
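
A minimal numerical sketch of this recursion, under assumed data (two states, two actions, three stages, an assumed reward $p_t$, assumed transition probabilities and a discount factor of 0.9), is:

  n, alpha = 3, 0.9                       # horizon and discount factor
  states, actions = (0, 1), (0, 1)

  def p(t, s, x):                         # immediate reward p_t(s_t, x_t)
      return 1.0 if s == x else 0.0

  def prob(s_next, s, x):                 # Pr(s_{t+1} | s_t, x_t)
      return 0.7 if s_next == x else 0.3  # sums to 1 over the two states

  f = {n + 1: {s: 0.0 for s in states}}   # terminal condition f_{n+1} = 0

  for t in range(n, 0, -1):               # backward sweep t = n, ..., 1
      f[t] = {}
      for s in states:
          f[t][s] = max(
              p(t, s, x)
              + alpha * sum(prob(s2, s, x) * f[t + 1][s2] for s2 in states)
              for x in actions)

  print({s: round(f[1][s], 3) for s in states})  # max expected reward from stage 1

Compared with the deterministic sketch, the only structural change is that the future value is an expectation over $\Pr(s_{t+1} \mid s_t, x_t)$ rather than the value at a single successor state.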

Markov decision processes represent a special class of stochastic dynamic programs in which the underlying stochastic process is a stationary process that features the Markov property.

Solution methods

Stochastic dynamic programs can be solved to optimality by using backward or forward recursion algorithms. Memoization is typically employed to enhance performance. However, like its deterministic counterpart, stochastic dynamic programming suffers from the curse of dimensionality. For this reason approximate solution methods are typically employed in practical applications.
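
The recursion can equivalently be written top-down, computing $f_t(s_t)$ on demand and memoizing each (stage, state) pair so that it is evaluated only once. The sketch below does this for the same assumed problem data as the previous example.

  from functools import lru_cache

  n, alpha = 3, 0.9
  states, actions = (0, 1), (0, 1)

  def p(t, s, x):                         # immediate reward p_t(s_t, x_t)
      return 1.0 if s == x else 0.0

  def prob(s_next, s, x):                 # Pr(s_{t+1} | s_t, x_t)
      return 0.7 if s_next == x else 0.3

  @lru_cache(maxsize=None)                # memoization of f_t(s_t)
  def f(t, s):
      if t > n:                           # terminal condition f_{n+1} = 0
          return 0.0
      return max(p(t, s, x)
                 + alpha * sum(prob(s2, s, x) * f(t + 1, s2) for s2 in states)
                 for x in actions)

  print(f(1, 0), f(1, 1))                 # expected optimal reward from stage 1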

Forward and backward recursion

Forward and backward recursion approaches are discussed in (Bertsekas 2000).

Approximate dynamic programming

An introduction to Approximate Dynamic Programming is provided by (Powell 2009).
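
As a rough illustration of the idea only (not necessarily any method from that survey), the sketch below replaces the exact value table with a fitted linear approximation $\hat{V}(s) \approx \theta_0 + \theta_1 s$ that is re-estimated from sampled Bellman backups; the dynamics, rewards and feature choice are all assumptions made for the example.

  import random

  alpha, actions = 0.9, (-1, 0, 1)

  def reward(s, x):                       # assumed reward: stay close to s = 10
      return -abs(s + x - 10)

  def step(s, x):                         # assumed noisy transition on {0, ..., 20}
      return max(0, min(20, s + x + random.choice((-1, 0, 1))))

  theta = [0.0, 0.0]                      # parameters of the value approximation
  for sweep in range(50):
      xs, ys = [], []
      for _ in range(200):
          s = random.randint(0, 20)
          # one-sample Bellman backup under the current approximation
          target = max(reward(s, x) + alpha * (theta[0] + theta[1] * step(s, x))
                       for x in actions)
          xs.append(s)
          ys.append(target)
      # closed-form least-squares refit of the line theta_0 + theta_1 * s
      m, sx, sy = len(xs), sum(xs), sum(ys)
      sxx = sum(v * v for v in xs)
      sxy = sum(a * b for a, b in zip(xs, ys))
      theta1 = (m * sxy - sx * sy) / (m * sxx - sx * sx)
      theta = [(sy - theta1 * sx) / m, theta1]

  print(theta)                            # fitted value-function parameters

Approximation of this kind trades optimality guarantees for tractability when the state space is too large to enumerate, which is the curse-of-dimensionality issue mentioned above.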

Further reading

  • Bellman, R. (1957), Dynamic Programming, Princeton University Press. Dover paperback edition (2003), ISBN 0-486-42809-5.
  • Ross, S. M.; Birnbaum, Z. W.; Lukacs, E. (1983), Introduction to Stochastic Dynamic Programming, Elsevier, ISBN 978-0-12-598420-1.
  • Bertsekas, D. P. (2000), Dynamic Programming and Optimal Control (2nd ed.), Athena Scientific, ISBN 1-886529-09-4. In two volumes.
  • Powell, W. B. (2009), "What you should know about approximate dynamic programming", Naval Research Logistics, 56 (1): 239–249, doi:10.1002/nav.20347.

See also