Stochastic dynamic programming


Stochastic dynamic programming is a technique for modelling and solving problems of decision making under uncertainty involving multi-stage stochastic systems that evolve over a planning horizon. Originally introduced in (Bellman 1957), it is a branch of stochastic programming that takes a functional equation approach to the discovery of optimum policies for this class of problems.

Formal background

Consider a discrete system defined on $n$ stages, $t = 1, \ldots, n$, in which each stage $t$ is characterized by

  • an initial state $s_t \in S_t$, where $S_t$ is the set of feasible states at the beginning of stage $t$;
  • a decision variable $x_t \in X_t$, where $X_t$ is the set of feasible actions at stage $t$ – note that $X_t$ may be a function of the initial state $s_t$;
  • an immediate cost/reward function $p_t(s_t, x_t)$, representing the cost/reward at stage $t$ if $s_t$ is the initial state and $x_t$ the action selected;
  • a state transition function $g_t(s_t, x_t)$ that leads the system towards state $s_{t+1} = g_t(s_t, x_t)$.

Let $f_t(s_t)$ represent the optimal cost/reward obtained by following an optimal policy over stages $t, t+1, \ldots, n$. Without loss of generality, in what follows we will consider a reward maximisation setting. In deterministic dynamic programming one usually deals with functional equations with the following structure

$f_t(s_t) = \max_{x_t \in X_t} \{ p_t(s_t, x_t) + f_{t+1}(s_{t+1}) \}$

where $s_{t+1} = g_t(s_t, x_t)$ and the boundary condition of the system is

$f_n(s_n) = \max_{x_n \in X_n} p_n(s_n, x_n), \quad s_n \in S_n.$

The aim is to determine the set of optimal actions that maximises $f_1(s_1)$. Given the current state $s_t$ and the current action $x_t$, we know with certainty the reward secured during the current stage and – thanks to the state transition function $g_t$ – the future state towards which the system transitions. In practice, however, even if we know the state of the system at the beginning of the current stage as well as the decision taken, the state of the system at the beginning of the next stage and the current period reward are often random variables that can be observed only at the end of the current stage.
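To make the deterministic recursion concrete before randomness is introduced, the following Python sketch performs a backward sweep over a small, hypothetical three-stage problem; the particular states, actions, reward $p_t$ and transition $g_t$ are illustrative assumptions, not part of the formal development above.

  n = 3                                          # stages t = 1, ..., n
  states = {t: [0, 1] for t in range(1, n + 1)}  # S_t
  actions = {t: [0, 1] for t in range(1, n + 1)} # X_t

  def p(t, s, x):        # immediate reward p_t(s_t, x_t) (assumed form)
      return s + 0.5 * x

  def g(t, s, x):        # state transition g_t(s_t, x_t) (assumed form)
      return (s + x) % 2

  f = {n + 1: {0: 0.0, 1: 0.0}}  # f_{n+1} = 0, so f_n(s_n) = max_x p_n(s_n, x_n)
  policy = {}

  for t in range(n, 0, -1):                      # backward sweep t = n, ..., 1
      f[t], policy[t] = {}, {}
      for s in states[t]:
          values = {x: p(t, s, x) + f[t + 1][g(t, s, x)] for x in actions[t]}
          policy[t][s] = max(values, key=values.get)
          f[t][s] = values[policy[t][s]]

  print(f[1], policy)    # f_1(s_1) for each initial state, and an optimal policy

Tabulating $f_t$ from the last stage backwards in this way is the standard dynamic programming recursion; the stochastic version below changes only how the future term is evaluated.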

Stochastic dynamic programming deals with problems in which the current period reward and/or the next period state are random. The decision maker's goal is to maximise expected (discounted) reward over a given planning horizon.

In their most general form, stochastic dynamic programs deal with functional equations with the following structure

$f_t(s_t) = \max_{x_t \in X_t(s_t)} \left\{ p_t(s_t, x_t) + \alpha \sum_{s_{t+1}} \Pr(s_{t+1} \mid s_t, x_t)\, f_{t+1}(s_{t+1}) \right\}$

where

  • $f_t(s_t)$ is the maximum expected reward that can be attained during stages $t, t+1, \ldots, n$, given state $s_t$ at the beginning of stage $t$;
  • $x_t$ belongs to the set $X_t(s_t)$ of feasible actions at stage $t$, given initial state $s_t$;
  • $\alpha$ is the discount factor;
  • $\Pr(s_{t+1} \mid s_t, x_t)$ is the conditional probability that the state at the beginning of stage $t+1$ is $s_{t+1}$, given current state $s_t$ and selected action $x_t$.
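
A minimal numerical sketch of this recursion, under assumed data (two states, two actions, three stages, an assumed reward $p_t$, assumed transition probabilities and a discount factor of 0.9), is:

  n, alpha = 3, 0.9                       # horizon and discount factor
  states, actions = (0, 1), (0, 1)

  def p(t, s, x):                         # immediate reward p_t(s_t, x_t)
      return 1.0 if s == x else 0.0

  def prob(s_next, s, x):                 # Pr(s_{t+1} | s_t, x_t)
      return 0.7 if s_next == x else 0.3  # sums to 1 over the two states

  f = {n + 1: {s: 0.0 for s in states}}   # terminal condition f_{n+1} = 0

  for t in range(n, 0, -1):               # backward sweep t = n, ..., 1
      f[t] = {}
      for s in states:
          f[t][s] = max(
              p(t, s, x)
              + alpha * sum(prob(s2, s, x) * f[t + 1][s2] for s2 in states)
              for x in actions)

  print({s: round(f[1][s], 3) for s in states})  # max expected reward from stage 1

Compared with the deterministic sketch, the only structural change is that the future value is an expectation over $\Pr(s_{t+1} \mid s_t, x_t)$ rather than the value at a single successor state.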

Markov decision processes represent a special class of stochastic dynamic programs in which the underlying stochastic process is a stationary process that features the Markov property.

Solution methods

Stochastic dynamic programs can be solved to optimality by using backward or forward recursion algorithms. Memoization is typically employed to enhance performance. However, like its deterministic counterpart, stochastic dynamic programming suffers from the curse of dimensionality. For this reason approximate solution methods are typically employed in practical applications.
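
The recursion can equivalently be written top-down, computing $f_t(s_t)$ on demand and memoizing each (stage, state) pair so that it is evaluated only once. The sketch below does this for the same assumed problem data as the previous example.

  from functools import lru_cache

  n, alpha = 3, 0.9
  states, actions = (0, 1), (0, 1)

  def p(t, s, x):                         # immediate reward p_t(s_t, x_t)
      return 1.0 if s == x else 0.0

  def prob(s_next, s, x):                 # Pr(s_{t+1} | s_t, x_t)
      return 0.7 if s_next == x else 0.3

  @lru_cache(maxsize=None)                # memoization of f_t(s_t)
  def f(t, s):
      if t > n:                           # terminal condition f_{n+1} = 0
          return 0.0
      return max(p(t, s, x)
                 + alpha * sum(prob(s2, s, x) * f(t + 1, s2) for s2 in states)
                 for x in actions)

  print(f(1, 0), f(1, 1))                 # expected optimal reward from stage 1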

Forward and backward recursion

Forward and backward recursion approaches are discussed in (Bertsekas 2000).

Approximate dynamic programming

An introduction to Approximate Dynamic Programming is provided by (Powell 2009).
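
As a rough illustration of the idea only (not necessarily any method from that survey), the sketch below replaces the exact value table with a fitted linear approximation $\hat{V}(s) \approx \theta_0 + \theta_1 s$ that is re-estimated from sampled Bellman backups; the dynamics, rewards and feature choice are all assumptions made for the example.

  import random

  alpha, actions = 0.9, (-1, 0, 1)

  def reward(s, x):                       # assumed reward: stay close to s = 10
      return -abs(s + x - 10)

  def step(s, x):                         # assumed noisy transition on {0, ..., 20}
      return max(0, min(20, s + x + random.choice((-1, 0, 1))))

  theta = [0.0, 0.0]                      # parameters of the value approximation
  for sweep in range(50):
      xs, ys = [], []
      for _ in range(200):
          s = random.randint(0, 20)
          # one-sample Bellman backup under the current approximation
          target = max(reward(s, x) + alpha * (theta[0] + theta[1] * step(s, x))
                       for x in actions)
          xs.append(s)
          ys.append(target)
      # closed-form least-squares refit of the line theta_0 + theta_1 * s
      m, sx, sy = len(xs), sum(xs), sum(ys)
      sxx = sum(v * v for v in xs)
      sxy = sum(a * b for a, b in zip(xs, ys))
      theta1 = (m * sxy - sx * sy) / (m * sxx - sx * sx)
      theta = [(sy - theta1 * sx) / m, theta1]

  print(theta)                            # fitted value-function parameters

Approximation of this kind trades optimality guarantees for tractability when the state space is too large to enumerate, which is the curse-of-dimensionality issue mentioned above.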

Further reading

  • Bellman, R. (1957), Dynamic Programming, Princeton University Press. Dover paperback edition (2003), ISBN 0-486-42809-5.
  • Ross, S. M.; Birnbaum, Z. W.; Lukacs, E. (1983), Introduction to Stochastic Dynamic Programming, Elsevier, ISBN 978-0-12-598420-1.
  • Bertsekas, D. P. (2000), Dynamic Programming and Optimal Control (2nd ed.), Athena Scientific, ISBN 1-886529-09-4. In two volumes.
  • Powell, W. B. (2009), "What you should know about approximate dynamic programming", Naval Research Logistics, 56 (1): 239–249, doi:10.1002/nav.20347.

See also