Longest common subsequence

The longest common subsequence problem is the search for an efficient method of finding the longest common subsequence (LCS) of two sequences. This computer science problem has gained prominence thanks in part to the field of bioinformatics.

An early method of searching for an LCS was brute force: given a sequence X of length m, enumerate every subsequence of X and check whether each one is also a subsequence of Y, keeping track of the longest subsequence found. Each subsequence of X corresponds to a subset of the index set {1,2,3,4,...,m}, so a simple counting argument shows that X has 2^m subsequences. The search therefore takes exponential time, which makes it impractical for long sequences such as human DNA strands.
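The brute-force search described above can be sketched in Python as follows. This is a minimal illustration, not a practical implementation; the function names `is_subsequence` and `lcs_brute_force` are hypothetical, and trying candidates from longest to shortest is a convenience assumed here so that the first match can be returned immediately.

```python
from itertools import combinations


def is_subsequence(candidate, sequence):
    """Return True if candidate can be obtained from sequence by deleting elements."""
    remaining = iter(sequence)
    # "item in remaining" consumes the iterator, so matches must occur in order.
    return all(item in remaining for item in candidate)


def lcs_brute_force(x, y):
    """Enumerate subsequences of x, longest first, and return the first one found in y."""
    for length in range(len(x), -1, -1):
        # Each choice of indices corresponds to one subsequence of x.
        for indices in combinations(range(len(x)), length):
            candidate = [x[i] for i in indices]
            if is_subsequence(candidate, y):
                return candidate
    return []
```

In the worst case the loops visit all 2^m index subsets of x, which is the exponential cost noted above.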

TODO: Add in four-step algorithm which calculates LCS in O(mn) time

Four Steps to LCS, Polynomial Time Edition

1. Analyze LCS properties
Many computer scientists have written papers on LCS properties. One key result is that the LCS problem has an optimal-substructure property: an LCS of two sequences contains within it an LCS of prefixes of those sequences.

The optimal-substructure theorem for an LCS states:

Let X = < x₁,...,x_m > and Y = < y₁,...,y_n > be sequences, and let Z = < z₁,...,z_k > be any LCS of X and Y. Write X_i for the prefix < x₁,...,x_i > of X, and likewise Y_j and Z_i.

If x_m = y_n, then z_k = x_m = y_n and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}.
If x_m ≠ y_n, then z_k ≠ x_m implies that Z is an LCS of X_{m-1} and Y.
If x_m ≠ y_n, then z_k ≠ y_n implies that Z is an LCS of X and Y_{n-1}.

2. Devise a recursive solution

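Let c[i,j] be the length of an LCS of the prefixes X_i = < x₁,...,x_i > and Y_j = < y₁,...,y_j >. The optimal-substructure theorem gives the recurrence

c[i,j] = 0 if i = 0 or j = 0
c[i,j] = c[i-1,j-1] + 1 if i,j > 0 and x_i = y_j
c[i,j] = max(c[i-1,j], c[i,j-1]) if i,j > 0 and x_i ≠ y_j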
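The optimal-substructure theorem translates almost directly into a recursive function for the length of an LCS. The sketch below is only an illustration under that reading; the name `lcs_length_recursive` is hypothetical, and because it recomputes the same subproblems many times it still takes exponential time in the worst case.

```python
def lcs_length_recursive(x, y, i=None, j=None):
    """Length of an LCS of the first i elements of x and the first j elements of y."""
    if i is None:
        i, j = len(x), len(y)
    # An empty prefix has only the empty subsequence in common with anything.
    if i == 0 or j == 0:
        return 0
    # Last elements match: they end some LCS, so recurse on both shorter prefixes.
    if x[i - 1] == y[j - 1]:
        return lcs_length_recursive(x, y, i - 1, j - 1) + 1
    # Otherwise drop the last element of one sequence or the other and keep the better result.
    return max(
        lcs_length_recursive(x, y, i - 1, j),
        lcs_length_recursive(x, y, i, j - 1),
    )
```

Python sequences are indexed from 0, so element x_i of the text is `x[i - 1]` in the code.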

3. Compute the LCS

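The lengths of the LCSs of all pairs of prefixes of X and Y can be stored in a table c with (m+1)(n+1) entries, where c[i,j] is the length of an LCS of the first i elements of X and the first j elements of Y. Filling the table row by row, so that each entry is computed from its already-filled neighbours c[i-1,j-1], c[i-1,j] and c[i,j-1], takes O(mn) time.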
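A bottom-up computation of the LCS lengths might look like the following sketch. It assumes a table `c` in which `c[i][j]` holds the length of an LCS of the first i elements of `x` and the first j elements of `y`; the function name `lcs_table` is hypothetical.

```python
def lcs_table(x, y):
    """Build the (m+1) x (n+1) table of LCS lengths for all prefixes of x and y."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay at 0: an empty prefix has an empty LCS.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                # Matching elements extend an LCS of the two shorter prefixes.
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                # Otherwise take the better of dropping one element from x or from y.
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c
```

The length of an LCS of the full sequences is then `lcs_table(x, y)[-1][-1]`. Each entry is filled once from three neighbours, so the work is proportional to m times n.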

4. Construct the LCS

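An LCS itself is recovered by tracing back through the table of LCS lengths computed in step 3, starting from the entry for the full sequences X and Y. When x_i = y_j, that element belongs to the LCS and the trace moves diagonally to the entry for X_{i-1} and Y_{j-1}; otherwise it moves to whichever of the two neighbouring entries, for X_{i-1} and Y or for X and Y_{j-1}, holds the larger value. The elements are found in reverse order, and the trace takes O(m+n) time.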
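One way to construct an actual LCS is to walk backwards through the table of LCS lengths. The sketch below rebuilds the table first so that it is self-contained; the function name `lcs` and the tie-breaking rule (prefer moving up when both neighbours are equal) are assumptions made for illustration, and a different rule may return a different LCS of the same length.

```python
def lcs(x, y):
    """Return one longest common subsequence of x and y as a list."""
    m, n = len(x), len(y)
    # c[i][j] = length of an LCS of the first i elements of x and first j of y.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    # Trace back from the bottom-right corner, collecting matched elements.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            result.append(x[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # assumed tie-break: prefer dropping an element of x
        else:
            j -= 1
    # Elements were collected from the end, so reverse them.
    result.reverse()
    return result
```

For example, `"".join(lcs("ABCBDAB", "BDCABA"))` evaluates to `"BCBA"` under this tie-breaking rule.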

Dynamic-programming methods have cut the exponential running time of the brute-force search down to O(mn) for two sequences of lengths m and n. The LCS problem continues to be examined by computer scientists looking for faster algorithms. Logarithmic time is out of reach, however, since any algorithm must at least read both input sequences.