Forward algorithm - Revision history

BjornChunt: /* Complexity */ edited for correctness and clarity

2025-05-24T14:43:11Z

Complexity: edited for correctness and clarity

← Previous revision		Revision as of 14:43, 24 May 2025
Line 83:		Line 83:

	==Complexity==		==Complexity==
	Complexity of Forward Algorithm is <math>\Theta(nm^2)</math>, where <math>m</math> is the number of ~~hidden~~ or latent ~~variables,~~ like weather in the example above, and <math>n</math> is the length ~~of the sequence~~ of the observed ~~variable~~. This is clear reduction from the ~~adhoc~~ method of exploring all the possible states ~~with~~ a complexity of <math>\Theta(nm^n)</math>.		Complexity of Forward Algorithm is <math>\Theta(nm^2)</math>, where <math>m</math> is the number of possible states for a latent variable (like the number of weather conditions in the example above), and <math>n</math> is the length of the observed sequence. This is a clear reduction from the ad hoc method of exploring all the possible states, which has a complexity of <math>\Theta(nm^n)</math>.
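The two counts quoted in the Complexity paragraph can be compared numerically. The sketch below is illustrative only: the function names are hypothetical, and it counts one multiply-add per term instead of measuring real running time.

```python
def forward_cost(n: int, m: int) -> int:
    """Multiply-adds for the forward recursion: n steps, m states, m terms per state."""
    return n * m * m


def brute_force_cost(n: int, m: int) -> int:
    """Rough cost of enumerating all m**n state sequences with n factors each."""
    return n * m ** n


# m = 3 weather states, as in the article's example; n is the sequence length.
for n in (3, 10, 20):
    print(n, forward_cost(n, 3), brute_force_cost(n, 3))
```

With three states the two counts are already far apart for short sequences, and the gap grows exponentially with <math>n</math>.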

	==Variants of the algorithm==		==Variants of the algorithm==

Kku: link speech recognition

2024-05-10T07:32:05Z

link speech recognition

← Previous revision		Revision as of 07:32, 10 May 2024
Line 93:		Line 93:

	==History==		==History==
	The forward algorithm is one of the algorithms used to solve the decoding problem. Since the development of speech recognition<ref name="speechRecognition">[[Lawrence Rabiner\|Lawrence R. Rabiner]], "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition". ''Proceedings of the [[IEEE]]'', 77 (2), p. 257–286, February 1989. [https://dx.doi.org/10.1109/5.18626 10.1109/5.18626]</ref> and pattern recognition and related fields like [[computational biology]] which use HMMs, the forward algorithm has gained popularity.		The forward algorithm is one of the algorithms used to solve the decoding problem. Since the development of [[speech recognition]]<ref name="speechRecognition">[[Lawrence Rabiner\|Lawrence R. Rabiner]], "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition". ''Proceedings of the [[IEEE]]'', 77 (2), p. 257–286, February 1989. [https://dx.doi.org/10.1109/5.18626 10.1109/5.18626]</ref> and pattern recognition and related fields like [[computational biology]] which use HMMs, the forward algorithm has gained popularity.

	==Applications==		==Applications==

Manoguru: /* Pseudocode */ reduction

2024-04-25T19:12:42Z

Pseudocode: reduction

← Previous revision		Revision as of 19:12, 25 April 2024
Line 74:		Line 74:
	#For <math>t = 1</math> to <math>T</math>		#For <math>t = 1</math> to <math>T</math>
	#:<math>\alpha(x_t) = p(y_t\|x_t)\sum_{x_{t-1}}p(x_t\|x_{t-1})\alpha(x_{t-1})</math>.		#:<math>\alpha(x_t) = p(y_t\|x_t)\sum_{x_{t-1}}p(x_t\|x_{t-1})\alpha(x_{t-1})</math>.
	#~~Calculate~~ <math>~~\beta_T~~ = \sum_{x_T} \alpha(x_T) </math>		#Return <math>p(x_T\|y_{1:T})= \frac{\alpha(x_T)}{\sum_{x_T} \alpha(x_T)}</math>
	#Return <math>p(x_T\|y_{1:T})= \frac{\alpha(x_T)}{\beta_T}</math>

	{{frame-footer}}		{{frame-footer}}
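The pseudocode steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `forward_filter`, the list-of-lists layout, and the convention `trans[i][j] = p(x_t = j | x_{t-1} = i)` are assumptions not taken from the article, and states and observations are assumed to be encoded as integers starting at 0.

```python
from typing import List, Sequence


def forward_filter(
    trans: Sequence[Sequence[float]],  # trans[i][j] = p(x_t = j | x_{t-1} = i)  (assumed layout)
    emit: Sequence[Sequence[float]],   # emit[j][k]  = p(y_t = k | x_t = j)      (assumed layout)
    obs: Sequence[int],                # observed symbols y_1..y_T as indices
    prior: Sequence[float],            # alpha(x_0)
) -> List[float]:
    """Return p(x_T | y_{1:T}) by running the recursion and normalising at the end."""
    alpha = list(prior)
    m = len(alpha)
    for y in obs:
        # alpha(x_t) = p(y_t | x_t) * sum over x_{t-1} of p(x_t | x_{t-1}) * alpha(x_{t-1})
        alpha = [
            emit[j][y] * sum(trans[i][j] * alpha[i] for i in range(m))
            for j in range(m)
        ]
    total = sum(alpha)  # equals p(y_{1:T}); zero if the observations are impossible
    return [a / total for a in alpha]


# Hypothetical two-state model, used only to exercise the function.
posterior = forward_filter(
    trans=[[0.7, 0.3], [0.4, 0.6]],
    emit=[[0.9, 0.1], [0.2, 0.8]],
    obs=[0],
    prior=[0.5, 0.5],
)
print(posterior)
```

For long sequences the unnormalised values shrink towards zero, so practical implementations often rescale `alpha` at every step; the sketch omits this for brevity.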

Manoguru: /* Algorithm */ correction regarding initial value; making it more similar to the article on Baum-Welch algorithm

2024-04-25T19:11:30Z

Algorithm: correction regarding initial value; making it more similar to the article on Baum-Welch algorithm

Manoguru: /* Algorithm */ further explanation

2024-04-25T18:12:39Z

Algorithm: further explanation

← Previous revision		Revision as of 18:12, 25 April 2024
Line 14:		Line 14:

	==Algorithm==		==Algorithm==
	The goal of the forward algorithm is to compute the [[Joint probability distribution\|joint probability]] <math>p(x_t,y_{1:t})</math>, where for notational convenience we have abbreviated <math>x(t)</math> as <math>x_t</math> and <math>(y(1), y(2), ..., y(t))</math> as <math>y_{1:t}</math>. Once the joint probability <math>p(x_t,y_{1:t})</math> is computed, the other probabilities <math>p(x_t\|y_{1:t})</math> and <math>p(y_{1:t})</math> are easily obtained. Both the state <math>x_t</math> and observation <math>y_t</math> are assumed to be discrete, finite random variables. The model's transition probabilities <math>p(x_t\|x_{t-1})</math> and emission probabilities <math>p(y_t\|x_t)</math> are assumed to be known. Computing <math>p(x_t,y_{1:t})</math> directly would require [[Marginal distribution\|marginalizing]] over all possible state sequences <math>\{x_{1:t-1}\}</math>, the number of which grows exponentially with <math>t</math>. Instead, the forward algorithm takes advantage of the [[conditional independence]] rules of the [[hidden Markov model]] (HMM) to perform the calculation recursively.		The goal of the forward algorithm is to compute the [[Joint probability distribution\|joint probability]] <math>p(x_t,y_{1:t})</math>, where for notational convenience we have abbreviated <math>x(t)</math> as <math>x_t</math> and <math>(y(1), y(2), ..., y(t))</math> as <math>y_{1:t}</math>. Once the joint probability <math>p(x_t,y_{1:t})</math> is computed, the other probabilities <math>p(x_t\|y_{1:t})</math> and <math>p(y_{1:t})</math> are easily obtained. Both the state <math>x_t</math> and observation <math>y_t</math> are assumed to be discrete, finite random variables. The model's state transition probabilities <math>p(x_t\|x_{t-1})</math> and observation/emission probabilities <math>p(y_t\|x_t)</math> are assumed to be known. 
Computing <math>p(x_t,y_{1:t})</math> directly would require [[Marginal distribution\|marginalizing]] over all possible state sequences <math>\{x_{1:t-1}\}</math>, the number of which grows exponentially with <math>t</math>. Instead, the forward algorithm takes advantage of the [[conditional independence]] rules of the [[hidden Markov model]] (HMM) to perform the calculation recursively.

	To demonstrate the recursion, let		To demonstrate the recursion, let

Manoguru: some structural changes with some additional information

2024-04-25T17:55:45Z

some structural changes with some additional information

Manoguru: /* Algorithm */ correction

2024-04-25T00:15:33Z

Algorithm: correction

← Previous revision		Revision as of 00:15, 25 April 2024
Line 33:		Line 33:
	Thus, since <math>p(y_t\|x_t)</math> and <math>p(x_t\|x_{t-1})</math> are given by the model's [[Hidden Markov model#Structural_architecture\|emission distributions]] and [[Hidden Markov model#Structural_architecture\|transition probabilities]], which are assumed to be known, one can quickly calculate <math>\alpha(x_t)</math> from <math>\alpha(x_{t-1})</math> and avoid incurring exponential computation time.		Thus, since <math>p(y_t\|x_t)</math> and <math>p(x_t\|x_{t-1})</math> are given by the model's [[Hidden Markov model#Structural_architecture\|emission distributions]] and [[Hidden Markov model#Structural_architecture\|transition probabilities]], which are assumed to be known, one can quickly calculate <math>\alpha(x_t)</math> from <math>\alpha(x_{t-1})</math> and avoid incurring exponential computation time.

	The recursion formula given above can be written in a more compact form. Let <math>a_{ij}=p(x_t=j\|~~x_t~~=i)</math> be the transition probabilities and <math>b_{ij}=p(y_t=j\|x_t=i)</math> be the emission probabilities, then		The recursion formula given above can be written in a more compact form. Let <math>a_{ij}=p(x_t=i\|x_{t-1}=j)</math> be the transition probabilities and <math>b_{ij}=p(y_t=i\|x_t=j)</math> be the emission probabilities, then

	::<math>\mathbf{\alpha}_t = \mathbf{b}_j \odot \mathbf{A} \mathbf{\alpha}_{t-1}</math>		::<math>\mathbf{\alpha}_t = \mathbf{b}_i^T \odot \mathbf{A} \mathbf{\alpha}_{t-1}</math>

	where <math>\mathbf{A} = [a_{ij}]</math> is the transition probability matrix, <math>\mathbf{b}_j</math> is the j-th ~~column~~ of the emission probability matrix <math>\mathbf{B} = [b_{ij}]</math> which corresponds to the actual observation <math>y_t = j</math>, and <math>\mathbf{\alpha}_t = [\alpha(x_t=1),\ldots, \alpha(x_t=n)]^T</math> is the alpha vector. The <math>\odot</math> is the [[Hadamard product (matrices)\|hadamard product]] between <math>\mathbf{b}_j</math> and <math>\mathbf{A} \mathbf{\alpha}_{t-1}</math>.		where <math>\mathbf{A} = [a_{ij}]</math> is the transition probability matrix, <math>\mathbf{b}_i</math> is the i-th row of the emission probability matrix <math>\mathbf{B} = [b_{ij}]</math> which corresponds to the actual observation <math>y_t = i</math>, and <math>\mathbf{\alpha}_t = [\alpha(x_t=1),\ldots, \alpha(x_t=n)]^T</math> is the alpha vector. The <math>\odot</math> is the [[Hadamard product (matrices)\|hadamard product]] between <math>\mathbf{b}_i^T</math> and <math>\mathbf{A} \mathbf{\alpha}_{t-1}</math>.
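Under the revised index convention, where `A[i][j]` stands for <math>a_{ij}</math> (new state <math>i</math> given previous state <math>j</math>) and `B[i][j]` stands for <math>b_{ij}</math> (observation <math>i</math> given state <math>j</math>), one update step of the compact form could look like the sketch below. The helper names are hypothetical, indices start at 0, and plain lists are used instead of a matrix library.

```python
from typing import List


def matvec(A: List[List[float]], v: List[float]) -> List[float]:
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def hadamard(u: List[float], v: List[float]) -> List[float]:
    """Elementwise (Hadamard) product of two vectors."""
    return [a * b for a, b in zip(u, v)]


def alpha_step(A: List[List[float]], B: List[List[float]],
               alpha_prev: List[float], y: int) -> List[float]:
    """One step: alpha_t = (row y of B) elementwise-times (A alpha_{t-1})."""
    return hadamard(B[y], matvec(A, alpha_prev))


# Hypothetical two-state model: columns of A and columns of B each sum to 1.
A = [[0.7, 0.4], [0.3, 0.6]]
B = [[0.9, 0.2], [0.1, 0.8]]
print(alpha_step(A, B, [0.5, 0.5], 0))
```

Because the row of <math>\mathbf{B}</math> is stored here as a flat list, the transpose in the formula has no counterpart in the code.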

	The initial condition is set as some prior probability over <math>x_0</math> as		The initial condition is set as some prior probability over <math>x_0</math> as

Manoguru: /* Example */ remove unnecessary subscripts

2024-04-25T00:09:01Z

Example: remove unnecessary subscripts

← Previous revision		Revision as of 00:09, 25 April 2024
Line 80:		Line 80:

	==Example==		==Example==
	This example on observing possible states of weather from the observed condition of seaweed. We have observations of seaweed for three consecutive days as dry, damp, and soggy in order. The possible states of weather can be sunny, cloudy, or rainy. In total, there can be <math>3^3=27</math> such weather sequences. Exploring all such possible state sequences is computationally very expensive. To reduce this complexity, Forward algorithm comes in handy, where the trick lies in using the conditional independence of the sequence steps to calculate partial probabilities, <math>\~~alpha_t~~(x_t) = p(x_t,y_{1:t}) = p(y_t\|x_t)\sum_{x_{t-1}}p(x_t\|x_{t-1})\~~alpha_{t-1}~~(x_{t-1})</math> as shown in the above derivation. Hence, we can calculate the probabilities as the product of the appropriate observation/emission probability, <math>p(y_t\|x_t)</math> ( probability of state <math>y_t</math> seen at time t from previous observation) with the sum of probabilities of reaching that state at time t, calculated using transition probabilities. This reduces complexity of the problem from searching whole search space to just using previously computed <math>\alpha</math>'s and transition probabilities.		This example on observing possible states of weather from the observed condition of seaweed. We have observations of seaweed for three consecutive days as dry, damp, and soggy in order. The possible states of weather can be sunny, cloudy, or rainy. In total, there can be <math>3^3=27</math> such weather sequences. Exploring all such possible state sequences is computationally very expensive. To reduce this complexity, Forward algorithm comes in handy, where the trick lies in using the conditional independence of the sequence steps to calculate partial probabilities, <math>\alpha(x_t) = p(x_t,y_{1:t}) = p(y_t\|x_t)\sum_{x_{t-1}}p(x_t\|x_{t-1})\alpha(x_{t-1})</math> as shown in the above derivation. 
Hence, we can calculate the probabilities as the product of the appropriate observation/emission probability, <math>p(y_t\|x_t)</math> ( probability of state <math>y_t</math> seen at time t from previous observation) with the sum of probabilities of reaching that state at time t, calculated using transition probabilities. This reduces complexity of the problem from searching whole search space to just using previously computed <math>\alpha</math>'s and transition probabilities.
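The seaweed example can be checked numerically. In the sketch below every probability is invented for illustration (the article gives no numbers), and the prior is taken over the day before the first observation. The forward recursion and the enumeration of all 27 weather sequences should agree on <math>p(y_{1:3})</math>.

```python
from itertools import product

states = ["sunny", "cloudy", "rainy"]
observations = ["dry", "damp", "soggy"]  # seaweed on days 1, 2, 3

# All numbers below are hypothetical.
prior = {"sunny": 0.5, "cloudy": 0.3, "rainy": 0.2}  # alpha(x_0)
trans = {  # trans[prev][cur] = p(x_t = cur | x_{t-1} = prev)
    "sunny":  {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy":  {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
}
emit = {  # emit[state][obs] = p(y_t = obs | x_t = state)
    "sunny":  {"dry": 0.7, "damp": 0.2, "soggy": 0.1},
    "cloudy": {"dry": 0.3, "damp": 0.4, "soggy": 0.3},
    "rainy":  {"dry": 0.1, "damp": 0.3, "soggy": 0.6},
}

# Forward recursion: reuse the previous alphas at every step.
alpha = dict(prior)
for y in observations:
    alpha = {
        cur: emit[cur][y] * sum(trans[prev][cur] * alpha[prev] for prev in states)
        for cur in states
    }
likelihood_forward = sum(alpha.values())

# Brute force: sum the joint probability over all 3**3 = 27 weather sequences.
first_day = {s: sum(trans[prev][s] * prior[prev] for prev in states) for s in states}
likelihood_brute = 0.0
for seq in product(states, repeat=len(observations)):
    prob = first_day[seq[0]] * emit[seq[0]][observations[0]]
    for t in range(1, len(seq)):
        prob *= trans[seq[t - 1]][seq[t]] * emit[seq[t]][observations[t]]
    likelihood_brute += prob

print(likelihood_forward, likelihood_brute)
```

The two printed values should match up to floating-point rounding, while the forward pass touches only <math>3 \times 3</math> transition terms per day.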

	==Applications of the algorithm==		==Applications of the algorithm==

Manoguru: /* Pseudocode */ remove unwanted subscripts

2024-04-25T00:08:20Z

Pseudocode: remove unwanted subscripts

← Previous revision		Revision as of 00:08, 25 April 2024
Line 70:		Line 70:
	#:emission probabilities, <math>p(y_t\|x_t) </math>,		#:emission probabilities, <math>p(y_t\|x_t) </math>,
	#:observed sequence, <math>y_{1:T}</math>		#:observed sequence, <math>y_{1:T}</math>
	#:prior probability, <math>\~~alpha_0~~(x_0)</math>		#:prior probability, <math>\alpha(x_0)</math>
	#For <math>t = 1</math> to <math>T</math>		#For <math>t = 1</math> to <math>T</math>
	#:<math>\~~alpha_t~~(x_t) = p(y_t\|x_t)\sum_{x_{t-1}}p(x_t\|x_{t-1})\~~alpha_{t-1}~~(x_{t-1})</math>.		#:<math>\alpha(x_t) = p(y_t\|x_t)\sum_{x_{t-1}}p(x_t\|x_{t-1})\alpha(x_{t-1})</math>.
	#Calculate <math>\beta_T = \sum_{x_T} \~~alpha_T~~(x_T) </math>		#Calculate <math>\beta_T = \sum_{x_T} \alpha(x_T) </math>
	#Return <math>p(x_T\|y_{1:T})= \frac{\~~alpha_T~~(x_T)}{\beta_T}</math>		#Return <math>p(x_T\|y_{1:T})= \frac{\alpha(x_T)}{\beta_T}</math>

	{{frame-footer}}		{{frame-footer}}

Manoguru: /* Algorithm */ removing unnecessary subscripts

2024-04-25T00:07:26Z

Algorithm: removing unnecessary subscripts

← Previous revision		Revision as of 19:11, 25 April 2024
Line 14:		Line 14:

	==Algorithm==		==Algorithm==
	The goal of the forward algorithm is to compute the [[Joint probability distribution\|joint probability]] <math>p(x_t,y_{1:t})</math>, where for notational convenience we have abbreviated <math>x(t)</math> as <math>x_t</math> and <math>(y(1), y(2), ..., y(t))</math> as <math>y_{1:t}</math>. Once the joint probability <math>p(x_t,y_{1:t})</math> is computed, the other probabilities <math>p(x_t\|y_{1:t})</math> and <math>p(y_{1:t})</math> are easily obtained. Both the state <math>x_t</math> and observation <math>y_t</math> are assumed to be discrete, finite random variables. The model's state transition probabilities <math>p(x_t\|x_{t-1})</math> ~~and~~ observation/emission probabilities <math>p(y_t\|x_t)</math> are assumed to be known. Computing <math>p(x_t,y_{1:t})</math> ~~directly~~ would require [[Marginal distribution\|marginalizing]] over all possible state sequences <math>\{x_{1:t-1}\}</math>, the number of which grows exponentially with <math>t</math>. Instead, the forward algorithm takes advantage of the [[conditional independence]] rules of the [[hidden Markov model]] (HMM) to perform the calculation recursively.		The goal of the forward algorithm is to compute the [[Joint probability distribution\|joint probability]] <math>p(x_t,y_{1:t})</math>, where for notational convenience we have abbreviated <math>x(t)</math> as <math>x_t</math> and <math>(y(1), y(2), ..., y(t))</math> as <math>y_{1:t}</math>. Once the joint probability <math>p(x_t,y_{1:t})</math> is computed, the other probabilities <math>p(x_t\|y_{1:t})</math> and <math>p(y_{1:t})</math> are easily obtained.

			Both the state <math>x_t</math> and observation <math>y_t</math> are assumed to be discrete, finite random variables. The hidden Markov model's state transition probabilities <math>p(x_t\|x_{t-1})</math>, observation/emission probabilities <math>p(y_t\|x_t)</math>, and initial prior probability <math>p(x_0)</math> are assumed to be known. Furthermore, the sequence of observations <math>y_{1:t}</math> are assumed to be given.

			Computing <math>p(x_t,y_{1:t})</math> naively would require [[Marginal distribution\|marginalizing]] over all possible state sequences <math>\{x_{1:t-1}\}</math>, the number of which grows exponentially with <math>t</math>. Instead, the forward algorithm takes advantage of the [[conditional independence]] rules of the [[hidden Markov model]] (HMM) to perform the calculation recursively.

	To demonstrate the recursion, let		To demonstrate the recursion, let
Line 36:		Line 40:
	where <math>\mathbf{A} = [a_{ij}]</math> is the transition probability matrix, <math>\mathbf{b}_t</math> is the i-th row of the emission probability matrix <math>\mathbf{B} = [b_{ij}]</math> which corresponds to the actual observation <math>y_t = i</math> at time <math>t</math>, and <math>\mathbf{\alpha}_t = [\alpha(x_t=1),\ldots, \alpha(x_t=n)]^T</math> is the alpha vector. The <math>\odot</math> is the [[Hadamard product (matrices)\|hadamard product]] between the transpose of <math>\mathbf{b}_t</math> and <math>\mathbf{A} \mathbf{\alpha}_{t-1}</math>.		where <math>\mathbf{A} = [a_{ij}]</math> is the transition probability matrix, <math>\mathbf{b}_t</math> is the i-th row of the emission probability matrix <math>\mathbf{B} = [b_{ij}]</math> which corresponds to the actual observation <math>y_t = i</math> at time <math>t</math>, and <math>\mathbf{\alpha}_t = [\alpha(x_t=1),\ldots, \alpha(x_t=n)]^T</math> is the alpha vector. The <math>\odot</math> is the [[Hadamard product (matrices)\|hadamard product]] between the transpose of <math>\mathbf{b}_t</math> and <math>\mathbf{A} \mathbf{\alpha}_{t-1}</math>.

	The initial condition is set as ~~some~~ prior probability over <math>x_0</math> as		The initial condition is set in accordance to the prior probability over <math>x_0</math> as

	::<math>\alpha(x_0) = p(x_0)~~</math> such that <math>\sum_{x_0} \alpha~~(x_0) ~~= 1.~~</math>		::<math>\alpha(x_0) = p(y_0\|x_0)p(x_0)</math>.

	Once the joint probability <math>\alpha(x_t) = p(x_t,y_{1:t})</math> has been computed using the forward algorithm, we can easily obtain the related joint probability <math>p(y_{1:t})</math> as		Once the joint probability <math>\alpha(x_t) = p(x_t,y_{1:t})</math> has been computed using the forward algorithm, we can easily obtain the related joint probability <math>p(y_{1:t})</math> as

	::<math>~~\beta_t =~~ p(y_{1:t}) = \sum_{x_t} p(x_t, y_{1:t}) = \sum_{x_t} \alpha(x_t)</math>		::<math>p(y_{1:t}) = \sum_{x_t} p(x_t, y_{1:t}) = \sum_{x_t} \alpha(x_t)</math>

	and the required conditional probability <math>p(x_t\|y_{1:t})</math> as		and the required conditional probability <math>p(x_t\|y_{1:t})</math> as

	::<math>p(x_t\|y_{1:t}) = \frac{p(x_t,y_{1:t})}{p(y_{1:t})} = \frac{\alpha(x_t)}{\~~beta_t~~}.</math>		::<math>p(x_t\|y_{1:t}) = \frac{p(x_t,y_{1:t})}{p(y_{1:t})} = \frac{\alpha(x_t)}{\sum_{x_t} \alpha(x_t)}.</math>

	Once the conditional probability has been calculated, we can also find the point estimate of <math>x_t</math>. For instance, the MAP estimate of <math>x_t</math> is given by		Once the conditional probability has been calculated, we can also find the point estimate of <math>x_t</math>. For instance, the MAP estimate of <math>x_t</math> is given by
Line 54:		Line 58:
	while the MMSE estimate of <math>x_t</math> is given by		while the MMSE estimate of <math>x_t</math> is given by

	::<math>\widehat{x}_t^{MMSE} = \mathbb{E}[x_t\|y_{1:t}] = \sum_{x_t} x_t p(x_t\|y_{1:t}) = \frac{1}{\~~beta_t~~}\sum_{x_t} ~~x_t~~ \alpha(x_t).</math>		::<math>\widehat{x}_t^{MMSE} = \mathbb{E}[x_t\|y_{1:t}] = \sum_{x_t} x_t p(x_t\|y_{1:t}) = \frac{\sum_{x_t} x_t \alpha(x_t)}{\sum_{x_t} \alpha(x_t)}.</math>
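The MAP and MMSE point estimates can both be read off the unnormalised <math>\alpha(x_t)</math> values. The sketch below assumes each state carries a numeric value (needed for the expectation); the function name `point_estimates` and the sample numbers are hypothetical.

```python
from typing import Sequence, Tuple


def point_estimates(alpha: Sequence[float],
                    values: Sequence[float]) -> Tuple[float, float]:
    """Return (MAP estimate, MMSE estimate) of x_t from unnormalised alpha(x_t)."""
    total = sum(alpha)  # p(y_{1:t})
    # The argmax is unaffected by the normalising constant, so alpha is used directly.
    best = max(range(len(alpha)), key=lambda i: alpha[i])
    # Posterior mean: sum of x_t * alpha(x_t), divided by the sum of alpha(x_t).
    mmse = sum(v * a for v, a in zip(values, alpha)) / total
    return values[best], mmse


# Hypothetical alphas for a two-state chain whose states take the values 1 and 2.
x_map, x_mmse = point_estimates([0.495, 0.09], [1.0, 2.0])
print(x_map, x_mmse)
```

The MMSE estimate is generally not one of the admissible state values, which is why the MAP estimate is the usual choice when the states are categorical.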

	The forward algorithm is easily modified to account for observations from variants of the hidden Markov model as well, such as the [[Linear–quadratic_regulator#Finite-horizon,_discrete-time_LQR\|Markov jump linear system]].		The forward algorithm is easily modified to account for observations from variants of the hidden Markov model as well, such as the [[Linear–quadratic_regulator#Finite-horizon,_discrete-time_LQR\|Markov jump linear system]].

← Previous revision		Revision as of 00:07, 25 April 2024
Line 21:		Line 21:
	To demonstrate the recursion, let		To demonstrate the recursion, let

	::<math>\~~alpha_t~~(x_t) = p(x_t,y_{1:t}) = \sum_{x_{t-1}}p(x_t,x_{t-1},y_{1:t})</math>.		::<math>\alpha(x_t) = p(x_t,y_{1:t}) = \sum_{x_{t-1}}p(x_t,x_{t-1},y_{1:t})</math>.

	Using the [[Chain rule (probability)\|chain rule]] to expand <math>p(x_t,x_{t-1},y_{1:t})</math>, we can then write		Using the [[Chain rule (probability)\|chain rule]] to expand <math>p(x_t,x_{t-1},y_{1:t})</math>, we can then write

	::<math>\~~alpha_t~~(x_t) = \sum_{x_{t-1}}p(y_t\|x_t,x_{t-1},y_{1:t-1})p(x_t\|x_{t-1},y_{1:t-1})p(x_{t-1},y_{1:t-1})</math>.		::<math>\alpha(x_t) = \sum_{x_{t-1}}p(y_t\|x_t,x_{t-1},y_{1:t-1})p(x_t\|x_{t-1},y_{1:t-1})p(x_{t-1},y_{1:t-1})</math>.

	Because <math>y_t</math> is conditionally independent of everything but <math>x_t</math>, and <math>x_t</math> is conditionally independent of everything but <math>x_{t-1}</math>, this simplifies to		Because <math>y_t</math> is conditionally independent of everything but <math>x_t</math>, and <math>x_t</math> is conditionally independent of everything but <math>x_{t-1}</math>, this simplifies to

	::<math>\~~alpha_t~~(x_t) = p(y_t\|x_t)\sum_{x_{t-1}}p(x_t\|x_{t-1})\~~alpha_{t-1}~~(x_{t-1})</math>.		::<math>\alpha(x_t) = p(y_t\|x_t)\sum_{x_{t-1}}p(x_t\|x_{t-1})\alpha(x_{t-1})</math>.

	Thus, since <math>p(y_t\|x_t)</math> and <math>p(x_t\|x_{t-1})</math> are given by the model's [[Hidden Markov model#Structural_architecture\|emission distributions]] and [[Hidden Markov model#Structural_architecture\|transition probabilities]], which are assumed to be known, one can quickly calculate <math>\~~alpha_t~~(x_t)</math> from <math>\~~alpha_{t-1}~~(x_{t-1})</math> and avoid incurring exponential computation time.		Thus, since <math>p(y_t\|x_t)</math> and <math>p(x_t\|x_{t-1})</math> are given by the model's [[Hidden Markov model#Structural_architecture\|emission distributions]] and [[Hidden Markov model#Structural_architecture\|transition probabilities]], which are assumed to be known, one can quickly calculate <math>\alpha(x_t)</math> from <math>\alpha(x_{t-1})</math> and avoid incurring exponential computation time.

	The recursion formula given above can be written in a more compact form. Let as <math>a_{ij}=p(x_t=j\|x_t=i)</math> be the transition probabilities and <math>b_{ij}=p(y_t=j\|x_t=i)</math> be the emission probabilities, then		The recursion formula given above can be written in a more compact form. Let <math>a_{ij}=p(x_t=j\|x_t=i)</math> be the transition probabilities and <math>b_{ij}=p(y_t=j\|x_t=i)</math> be the emission probabilities, then

	::<math>\mathbf{\alpha}_t = \mathbf{b}_j \odot \mathbf{A} \mathbf{\alpha}_{t-1}</math>		::<math>\mathbf{\alpha}_t = \mathbf{b}_j \odot \mathbf{A} \mathbf{\alpha}_{t-1}</math>

	where <math>\mathbf{A} = [a_{ij}]</math> is the transition probability matrix, <math>\mathbf{b}_j</math> is the j-th column of the emission probability matrix <math>\mathbf{B} = [b_{ij}]</math> which corresponds to the actual observation <math>y_t = j</math>, and <math>\mathbf{\alpha}_t = [\~~alpha_t~~(x_t=1),\ldots, \~~alpha_t~~(x_t=n)]^T</math> is the alpha vector. The <math>\odot</math> is the [[Hadamard product (matrices)\|hadamard product]] between <math>\mathbf{b}_j</math> and <math>\mathbf{A} \mathbf{\alpha}_{t-1}</math>.		where <math>\mathbf{A} = [a_{ij}]</math> is the transition probability matrix, <math>\mathbf{b}_j</math> is the j-th column of the emission probability matrix <math>\mathbf{B} = [b_{ij}]</math> which corresponds to the actual observation <math>y_t = j</math>, and <math>\mathbf{\alpha}_t = [\alpha(x_t=1),\ldots, \alpha(x_t=n)]^T</math> is the alpha vector. The <math>\odot</math> is the [[Hadamard product (matrices)\|hadamard product]] between <math>\mathbf{b}_j</math> and <math>\mathbf{A} \mathbf{\alpha}_{t-1}</math>.

	The initial condition is set as some prior probability over <math>x_0</math> as		The initial condition is set as some prior probability over <math>x_0</math> as

	::<math>\~~alpha_0~~(x_0) = p(x_0)</math> such that <math>\sum_{x_0} \~~alpha_0~~(x_0) = 1.</math>		::<math>\alpha(x_0) = p(x_0)</math> such that <math>\sum_{x_0} \alpha(x_0) = 1.</math>

	Once the joint probability <math>\~~alpha_t~~(x_t) = p(x_t,y_{1:t})</math> has been computed using the forward algorithm, we can easily obtain the related joint probability <math>p(y_{1:t})</math> as		Once the joint probability <math>\alpha(x_t) = p(x_t,y_{1:t})</math> has been computed using the forward algorithm, we can easily obtain the related joint probability <math>p(y_{1:t})</math> as

	::<math>\beta_t = p(y_{1:t}) = \sum_{x_t} p(x_t, y_{1:t}) = \sum_{x_t} \~~alpha_t~~(x_t)</math>		::<math>\beta_t = p(y_{1:t}) = \sum_{x_t} p(x_t, y_{1:t}) = \sum_{x_t} \alpha(x_t)</math>

	and the required conditional probability <math>p(x_t\|y_{1:t})</math> as		and the required conditional probability <math>p(x_t\|y_{1:t})</math> as

	::<math>p(x_t\|y_{1:t}) = \frac{p(x_t,y_{1:t})}{p(y_{1:t})} = \frac{\~~alpha_t~~(x_t)}{\beta_t}.</math>		::<math>p(x_t\|y_{1:t}) = \frac{p(x_t,y_{1:t})}{p(y_{1:t})} = \frac{\alpha(x_t)}{\beta_t}.</math>

	Once the conditional probability has been calculated, we can also find the point estimate of <math>x_t</math>. For instance, the MAP estimate of <math>x_t</math> is given by		Once the conditional probability has been calculated, we can also find the point estimate of <math>x_t</math>. For instance, the MAP estimate of <math>x_t</math> is given by

	::<math>\widehat{x}_t^{MAP} = \arg \max_{x_t} \; p(x_t\|y_{1:t}) = \arg \max_{x_t} \; \~~alpha_t~~(x_t),</math>		::<math>\widehat{x}_t^{MAP} = \arg \max_{x_t} \; p(x_t\|y_{1:t}) = \arg \max_{x_t} \; \alpha(x_t),</math>

	while the MMSE estimate of <math>x_t</math> is given by		while the MMSE estimate of <math>x_t</math> is given by

	::<math>\widehat{x}_t^{MMSE} = \mathbb{E}[x_t\|y_{1:t}] = \sum_{x_t} x_t p(x_t\|y_{1:t}) = \frac{1}{\beta_t}\sum_{x_t} x_t \~~alpha_t~~(x_t).</math>		::<math>\widehat{x}_t^{MMSE} = \mathbb{E}[x_t\|y_{1:t}] = \sum_{x_t} x_t p(x_t\|y_{1:t}) = \frac{1}{\beta_t}\sum_{x_t} x_t \alpha(x_t).</math>

	The forward algorithm is easily modified to account for observations from variants of the hidden Markov model as well, such as the [[Linear–quadratic_regulator#Finite-horizon,_discrete-time_LQR\|Markov jump linear system]].		The forward algorithm is easily modified to account for observations from variants of the hidden Markov model as well, such as the [[Linear–quadratic_regulator#Finite-horizon,_discrete-time_LQR\|Markov jump linear system]].