Jump to content

String-to-string correction problem: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
Add inline citations, and correct the definition of the problem
New section: Extension
Line 1: Line 1:
In [[computer science]], the '''string-to-string correction problem''' refers to determining the minimum cost sequence of edit operations necessary to change one [[String (computer science)|string]] into another (i.e., computing the shortest [[edit distance]]). Each type of edit operation has its own cost value.<ref name="wagnerfischer">{{cite journal |first=Robert A. |last=Wagner |first2=Michael J. |last2=Fischer |author2-link=Michael J. Fischer |title=The String-to-String Correction Problem |journal=Journal of the ACM |volume=21 |issue=1 |year=1974 |pages=168–173 |doi= 10.1145/321796.321811}}</ref> A single edit operation may be changing a single [[Character (computing)|symbol]] of the string into another (cost W<sub>C</sub>), deleting a symbol (cost W<sub>D</sub>), or inserting a symbol (cost W<sub>I</sub>).<ref name="lowrancewagner">{{cite journal |last1=Lowrance |first1=Roy |last2=Wagner |first2=Robert A. |title=An Extension of the String-to-String Correction Problem |journal=Journal of the ACM |date=April 1975 |volume=22 |issue=2 |pages=177–183 |doi=10.1145/321879.321880 |url=https://dl.acm.org/doi/10.1145/321879.321880}}</ref>
In [[computer science]], the '''string-to-string correction problem''' refers to determining the minimum cost sequence of edit operations necessary to change one [[String (computer science)|string]] into another (i.e., computing the shortest [[edit distance]]). Each type of edit operation has its own cost value.<ref name="wagnerfischer">{{cite journal |first=Robert A. |last=Wagner |first2=Michael J. |last2=Fischer |author2-link=Michael J. Fischer |title=The String-to-String Correction Problem |journal=Journal of the ACM |volume=21 |issue=1 |year=1974 |pages=168–173 |doi= 10.1145/321796.321811}}</ref> A single edit operation may be changing a single [[Character (computing)|symbol]] of the string into another (cost W<sub>C</sub>), deleting a symbol (cost W<sub>D</sub>), or inserting a new symbol (cost W<sub>I</sub>).<ref name="lowrancewagner">{{cite journal |last1=Lowrance |first1=Roy |last2=Wagner |first2=Robert A. |title=An Extension of the String-to-String Correction Problem |journal=Journal of the ACM |date=April 1975 |volume=22 |issue=2 |pages=177–183 |doi=10.1145/321879.321880 |url=https://dl.acm.org/doi/10.1145/321879.321880}}</ref>


If all edit operations have the same unit costs (W<sub>C</sub> = W<sub>D</sub> = W<sub>I</sub> = 1) the problem is the same as computing the [[Levenshtein distance]] of the two strings.
If all edit operations have the same unit costs (W<sub>C</sub> = W<sub>D</sub> = W<sub>I</sub> = 1) the problem is the same as computing the [[Levenshtein distance]] of the two strings.
Line 5: Line 5:
Several [[algorithm]]s exist to provide an efficient way to determine string distance and specify the minimum number of transformation operations required.<ref>[[Edit distance#Computation]]</ref><ref>{{cite journal |first=Walter F. |last=Tichy |title=The string-to-string correction problem with block moves |journal=ACM Transactions on Computer Systems |volume=2 |issue=4 |year=1984 |pages=309–321 |doi= 10.1145/357401.357404|url=http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1377&context=cstech }}</ref> Such algorithms are particularly useful for [[Delta encoding|delta]] creation operations where something is stored as a set of differences relative to a base version. This allows several versions of a single object to be stored much more efficiently than storing them separately. This holds true even for single versions of several objects if they do not differ greatly, or anything in between.
Several [[algorithm]]s exist to provide an efficient way to determine string distance and specify the minimum number of transformation operations required.<ref>[[Edit distance#Computation]]</ref><ref>{{cite journal |first=Walter F. |last=Tichy |title=The string-to-string correction problem with block moves |journal=ACM Transactions on Computer Systems |volume=2 |issue=4 |year=1984 |pages=309–321 |doi= 10.1145/357401.357404|url=http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1377&context=cstech }}</ref> Such algorithms are particularly useful for [[Delta encoding|delta]] creation operations where something is stored as a set of differences relative to a base version. This allows several versions of a single object to be stored much more efficiently than storing them separately. This holds true even for single versions of several objects if they do not differ greatly, or anything in between.
Notably, such difference algorithms are used in [[molecular biology]] to provide some measure of kinship between different kinds of organisms based on the similarities of their [[macromolecule]]s (such as [[protein]]s or [[DNA]]).
Notably, such difference algorithms are used in [[molecular biology]] to provide some measure of kinship between different kinds of organisms based on the similarities of their [[macromolecule]]s (such as [[protein]]s or [[DNA]]).

== Extension ==

The extended variant of the problem includes a new type of edit operation: swapping any two adjacent symbols, with a cost of W<sub>S</sub>.

This version can be solved in polynomial time under certain restrictions on edit operation costs.<ref name="lowrancewagner" /><ref name="wagner">{{cite journal |last1=Wagner |first1=Robert A. |title=On the complexity of the Extended String-to-String Correction Problem |journal=STOC '75: Proceedings of the seventh annual ACM symposium on Theory of computing |date=May 1975 |pages=218–223 |doi=10.1145/800116.803771 |url=https://dl.acm.org/doi/10.1145/800116.803771}}</ref>

Robert A. Wagner (1975) showed if W<sub>I</sub> < W<sub>C</sub> = W<sub>D</sub> = ∞ and 0 < W<sub>S</sub> < ∞, then the problem is [[NP-complete]].<ref name="wagner" />


== References ==
== References ==

Revision as of 04:36, 1 April 2022

In computer science, the string-to-string correction problem refers to determining the minimum cost sequence of edit operations necessary to change one string into another (i.e., computing the shortest edit distance). Each type of edit operation has its own cost value.[1] A single edit operation may be changing a single symbol of the string into another (cost WC), deleting a symbol (cost WD), or inserting a new symbol (cost WI).[2]

If all edit operations have the same unit costs (WC = WD = WI = 1) the problem is the same as computing the Levenshtein distance of the two strings.

Several algorithms exist to provide an efficient way to determine string distance and specify the minimum number of transformation operations required.[3][4] Such algorithms are particularly useful for delta creation operations where something is stored as a set of differences relative to a base version. This allows several versions of a single object to be stored much more efficiently than storing them separately. This holds true even for single versions of several objects if they do not differ greatly, or anything in between. Notably, such difference algorithms are used in molecular biology to provide some measure of kinship between different kinds of organisms based on the similarities of their macromolecules (such as proteins or DNA).

Extension

The extended variant of the problem includes a new type of edit operation: swapping any two adjacent symbols, with a cost of WS.

This version can be solved in polynomial time under certain restrictions on edit operation costs.[2][5]

Robert A. Wagner (1975) showed if WI < WC = WD = ∞ and 0 < WS < ∞, then the problem is NP-complete.[5]

References

  1. ^ Wagner, Robert A.; Fischer, Michael J. (1974). "The String-to-String Correction Problem". Journal of the ACM. 21 (1): 168–173. doi:10.1145/321796.321811.
  2. ^ a b Lowrance, Roy; Wagner, Robert A. (April 1975). "An Extension of the String-to-String Correction Problem". Journal of the ACM. 22 (2): 177–183. doi:10.1145/321879.321880.
  3. ^ Edit distance#Computation
  4. ^ Tichy, Walter F. (1984). "The string-to-string correction problem with block moves". ACM Transactions on Computer Systems. 2 (4): 309–321. doi:10.1145/357401.357404.
  5. ^ a b Wagner, Robert A. (May 1975). "On the complexity of the Extended String-to-String Correction Problem". STOC '75: Proceedings of the seventh annual ACM symposium on Theory of computing: 218–223. doi:10.1145/800116.803771.