
Boolean satisfiability algorithm heuristics

From Wikipedia, the free encyclopedia
In [[computer science]], there are certain classes of algorithms ([[Heuristic (computer_science)|heuristics]]) that efficiently solve certain types of the [[Boolean satisfiability problem]] despite there being [[P versus NP problem|no known efficient algorithm]] in the general case.


== Overview ==

The Boolean satisfiability (or SAT) problem can be stated formally as:
given a Boolean expression <Math>B</Math> over the variables <Math>V = \{v_0, \ldots , v_n\}</Math>, find an assignment <Math>V^*</Math> of the variables such that <Math>B(V^*)</Math> is true. It is seen as the canonical [[NP-complete]] problem. Although no algorithm is known that solves SAT in polynomial time, there are classes of SAT instances that do have efficient algorithms to solve them.

The classes of problems amenable to SAT heuristics arise from many practical problems in [[AI planning]], [[Circuit satisfiability problem|circuit testing]], and software verification.<ref>Aloul, Fadi A., "On Solving Optimization Problems Using Boolean Satisfiability", American University of Sharjah (2005), http://www.aloul.net/Papers/faloul_icmsao05.pdf</ref><ref name = "princeton">Zhang, Lintao; Malik, Sharad. "The Quest for Efficient Boolean Satisfiability Solvers", Department of Electrical Engineering, Princeton University. https://www.princeton.edu/~chaff/publication/cade_cav_2002.pdf</ref> Research on constructing efficient SAT solvers has been based on various principles such as [[Resolution (logic)|resolution]], search, local search and random walk, binary decisions, and Stålmarck's algorithm.<ref name = "princeton"/> Some of these algorithms are [[Deterministic algorithm|deterministic]], while others may be [[Randomized algorithm|stochastic]].


As there exist polynomial-time algorithms to convert any Boolean expression to [[conjunctive normal form]] such as [[Tseytin transformation|Tseitin's algorithm]], posing SAT problems in CNF does not change their computational difficulty. SAT problems are canonically expressed in CNF because CNF has certain properties that can help prune the search space and speed up the search process.<ref name = "princeton"/>
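The idea behind a Tseitin-style conversion can be sketched in a few lines: each internal gate of the expression gets a fresh variable constrained to equal the value of that gate, so the clause count grows only linearly. The following is an illustrative toy sketch, not production code; the tuple-based formula representation and the helper names are choices made here, not anything from the article.

```python
# Illustrative Tseitin-style CNF conversion for tiny formulas.
# A formula is a string variable, ('not', f), ('and', f, g), or ('or', f, g).
# A literal is (name, polarity); a clause is a list of literals.

import itertools

def tseitin_cnf(formula):
    """Return (root_literal, clauses). The CNF formed by the clauses
    (the last of which asserts the root literal) is equisatisfiable
    with `formula`."""
    counter = itertools.count()
    clauses = []

    def fresh():
        return f"_t{next(counter)}"          # fresh auxiliary variable

    def walk(f):
        if isinstance(f, str):               # plain variable
            return (f, True)
        if f[0] == 'not':                    # negation just flips polarity
            v, pol = walk(f[1])
            return (v, not pol)
        a, b = walk(f[1]), walk(f[2])
        x = fresh()
        na, nb = (a[0], not a[1]), (b[0], not b[1])
        if f[0] == 'and':                    # x <-> a AND b
            clauses.append([(x, False), a])
            clauses.append([(x, False), b])
            clauses.append([(x, True), na, nb])
        else:                                # x <-> a OR b
            clauses.append([(x, True), na])
            clauses.append([(x, True), nb])
            clauses.append([(x, False), a, b])
        return (x, True)

    root = walk(formula)
    clauses.append([root])                   # assert the whole expression
    return root, clauses
```

Each binary gate contributes exactly three clauses, which is what keeps the translation polynomial.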
== Branching heuristics in conflict-driven algorithms ==


One of the cornerstone [[Conflict-Driven Clause Learning]] SAT solver algorithms is the [[DPLL algorithm]].<ref name = "princeton"/> The algorithm works by iteratively assigning free variables; when it encounters a bad assignment, it backtracks to a previous iteration and chooses a different assignment of variables. It relies on a branching heuristic to pick the next free variable assignment; the heuristic effectively turns the choice of variable assignments into a decision tree. Different implementations of this heuristic produce markedly different decision trees, and thus have a significant effect on the efficiency of the solver.
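The backtracking loop described above can be sketched as follows. This is a simplified illustration with an assumed clause representation (DIMACS-style integer literals); unit propagation and clause learning, which practical CDCL solvers depend on, are omitted for brevity.

```python
# A compact DPLL-style backtracking sketch with a pluggable branching
# heuristic. Clauses are lists of nonzero ints: 3 means variable 3,
# -3 its negation.

def dpll(clauses, assignment=None, choose=None):
    """Return a satisfying assignment dict {var: bool} or None."""
    if assignment is None:
        assignment = {}
    if choose is None:                       # trivial heuristic: first free variable
        choose = lambda cls, asn: abs(next(l for c in cls for l in c))
    # Simplify clauses under the current partial assignment.
    simplified = []
    for c in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in c):
            continue                         # clause already satisfied
        rest = [l for l in c if abs(l) not in assignment]
        if not rest:
            return None                      # clause falsified: trigger backtrack
        simplified.append(rest)
    if not simplified:
        return assignment                    # every clause satisfied
    v = choose(simplified, assignment)       # branching heuristic decides here
    for value in (True, False):              # second branch = the backtrack
        result = dpll(clauses, {**assignment, v: value}, choose)
        if result is not None:
            return result
    return None
```

Swapping in a different `choose` function changes the shape of the decision tree without touching the rest of the search.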


=== Early branching heuristics ===


Heuristics such as Bohm's Heuristic, Maximum Occurrences on Minimum sized clauses heuristic, and Jeroslow-Wang heuristic can be regarded as [[greedy algorithms]]. Their basic premise is to choose a free variable assignment that will satisfy the most already unsatisfied clauses in the Boolean expression. However, as Boolean expressions get larger, more complicated, or more structured, these heuristics fail to capture useful information about these problems that could improve efficiency; they often get stuck in local maxima or do not consider the distribution of variables. Additionally, larger problems require more processing, as the operation of counting free variables in unsatisfied clauses dominates the run-time.
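As a concrete example of such a greedy rule, the one-sided Jeroslow–Wang heuristic scores each literal <math>l</math> by <math>J(l) = \sum_{l \in c} 2^{-|c|}</math>, favouring literals that occur in many short clauses, and branches on the best-scoring literal. A minimal sketch (the function name and clause representation are assumptions made here):

```python
# One-sided Jeroslow-Wang branching rule over DIMACS-style clauses
# (lists of nonzero ints; -3 is the negation of variable 3).

from collections import defaultdict

def jeroslow_wang(clauses):
    """Return the literal with the highest J-score."""
    scores = defaultdict(float)
    for c in clauses:
        for lit in c:
            scores[lit] += 2.0 ** (-len(c))  # short clauses weigh more
    return max(scores, key=scores.__getitem__)
```

Note that the whole computation is a pass over all clauses, which is exactly the counting work that dominates run-time on large instances.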


=== Variable State Independent Decaying Sum ===
An influential heuristic called Variable State Independent Decaying Sum (VSIDS) attempts to score each variable. VSIDS starts by looking at small portions of the Boolean expression and assigning each phase of a variable (a variable and its negated complement) a score proportional to the number of clauses that variable phase is in. As VSIDS progresses and searches more parts of the Boolean expression, periodically, all scores are divided by a constant. This discounts the effect of the presence of variables in earlier-found clauses in favor of variables with a greater presence in more recent clauses. VSIDS will select the variable phase with the highest score to determine where to branch.


VSIDS is quite effective because the scores of the variable phases are independent of the current variable assignment, so backtracking is much easier. Further, VSIDS guarantees that each variable assignment satisfies the greatest number of recently searched segments of the Boolean expression.
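The scoring scheme described above can be sketched as a small bookkeeping class. This is an illustrative toy, not the article's implementation; the decay divisor and decay period are arbitrary choices made here.

```python
# Toy VSIDS-style scorer: each literal ("phase") accumulates a score when
# it occurs in a newly seen clause, and all scores decay periodically so
# that recently encountered clauses dominate the ranking.

from collections import defaultdict

class VSIDS:
    def __init__(self, decay=2.0, period=256):
        self.scores = defaultdict(float)
        self.decay = decay                   # divisor applied at each decay step
        self.period = period                 # clauses observed between decays
        self._seen = 0

    def observe_clause(self, clause):
        """Bump every literal of a newly encountered clause."""
        for lit in clause:
            self.scores[lit] += 1.0
        self._seen += 1
        if self._seen % self.period == 0:    # periodic decay of all scores
            for lit in self.scores:
                self.scores[lit] /= self.decay

    def pick(self, free_vars):
        """Highest-scoring phase among still-unassigned variables."""
        cands = [l for l in self.scores if abs(l) in free_vars]
        return max(cands, key=self.scores.__getitem__, default=None)
```

Because the scores ignore the current assignment entirely, nothing needs to be recomputed when the solver backtracks.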
== Stochastic solvers ==


For [[MAX-SAT]], the version of SAT in which the number of satisfied clauses is maximized, solvers also use probabilistic algorithms.<ref>Sung, Phil. "Maximum Satisfiability" (2006) http://math.mit.edu/~goemans/18434S06/max-sat-phil.pdf</ref> If we are given a Boolean expression <Math>B</Math> over the variables <Math>V = \{v_0, \ldots , v_n\}</Math> and we set each variable to true or false uniformly at random, then the chance that a clause <math>c</math> with <Math>|c|</Math> variables is satisfied by the resulting assignment is
:Pr(<Math>c</Math> is satisfied) = <Math>1-2^{-|c|}</Math>.
This is because each literal in <Math>c</Math> is true with probability ½, and only one literal in <Math>c</Math> needs to be true for the clause to be satisfied. This works for all <Math> |c| \geq 1</Math>, so
: Pr(<Math>c</Math> is satisfied) = <Math>1-2^{-|c|} \geq \frac{1}{2}</Math>.
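This bound is easy to check numerically: over all <Math>2^{|c|}</Math> assignments of the variables of a clause, exactly one falsifies it. The following quick check (illustrative, not from the source) enumerates assignments of a clause of <Math>k</Math> distinct positive literals.

```python
# Numerical check: of the 2^k assignments of a k-literal clause,
# only the all-false one falsifies it, so the satisfied fraction
# is 1 - 2^(-k).

from itertools import product

def satisfied_fraction(k):
    """Fraction of assignments satisfying a clause of k distinct literals."""
    hits = sum(any(vals) for vals in product([False, True], repeat=k))
    return hits / 2 ** k
```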



Now we show that randomly assigning variable values is a <Math>\frac{1}{2}</Math>-approximation algorithm, which means that it is an optimal approximation algorithm unless P = NP. Suppose we are given a Boolean expression <Math>B = \{c_i\}_{i=1}^n</Math> and


: <math>\delta_{ij} = \begin{cases}
0 &\text{if } c_i \text{ is satisfied},\\
1 &\text{if } c_i \text{ is not satisfied}. \end{cases}</math>


then we have that
: <Math>
\begin{align}
\operatorname{E}\left[\sum_{i=1}^n \delta_{ij}\right] &= \sum_{i=1}^n \Pr(c_i \text{ is not satisfied}) = \sum_{i=1}^n 2^{-|c_i|} \leq \frac{n}{2},
\end{align}
</Math>
so in expectation a random assignment leaves at most half of the clauses unsatisfied; equivalently, it satisfies at least <Math>\frac{n}{2} \geq \frac{\mathrm{OPT}}{2}</Math> clauses, where OPT is the number of clauses satisfied by an optimal assignment.
This algorithm cannot be further optimized by the [[PCP theorem]] unless P = NP.


Other stochastic SAT solvers, such as [[WalkSAT]] and [[GSAT]], are improvements to the above procedure. They start by randomly assigning values to each variable and then traverse the given Boolean expression to identify which variables to flip to minimize the number of unsatisfied clauses. They may randomly select a variable to flip or select a new random variable assignment to escape local maxima, much like a [[simulated annealing]] algorithm.
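A WalkSAT-style loop might look as follows. This is a simplified illustrative sketch under an assumed clause representation; real implementations add restarts and incremental "break count" bookkeeping rather than rescoring every clause per flip.

```python
# Simplified WalkSAT-style local search. Clauses are DIMACS-style lists
# of nonzero ints. With probability p, take a random-walk flip inside a
# broken clause (to escape local maxima); otherwise take the greedy flip.

import random

def walksat(clauses, max_flips=10_000, p=0.5, seed=0):
    rng = random.Random(seed)
    variables = {abs(l) for c in clauses for l in c}
    assign = {v: rng.choice([False, True]) for v in variables}

    def satisfied(c):
        return any(assign[abs(l)] == (l > 0) for l in c)

    def num_satisfied():
        return sum(satisfied(c) for c in clauses)

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign                    # found a model
        clause = rng.choice(unsat)           # focus on one broken clause
        if rng.random() < p:                 # random-walk step
            v = abs(rng.choice(clause))
        else:                                # greedy step: best flip in clause
            def gain(v):
                assign[v] = not assign[v]    # tentatively flip...
                g = num_satisfied()
                assign[v] = not assign[v]    # ...and undo
                return g
            v = max((abs(l) for l in clause), key=gain)
        assign[v] = not assign[v]
    return None                              # flip budget exhausted
```

Returning `None` does not prove unsatisfiability; like all local-search SAT procedures, this is an incomplete method.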



''Revision as of 05:21, 7 November 2024''


== Weighted SAT problems ==

Numerous weighted SAT problems exist as the optimization versions of the general SAT problem. In this class of problems, each clause in a CNF Boolean expression is given a weight. The objective is to maximize or minimize the total sum of the weights of the satisfied clauses given a Boolean expression. Weighted Max-SAT is the maximization version of this problem, and Max-SAT is the instance of weighted Max-SAT in which the weights of all clauses are the same. In the partial Max-SAT problem, some clauses must necessarily be satisfied (hard clauses), and the sum of the weights of the rest of the clauses (soft clauses) is to be maximized or minimized, depending on the problem. Partial Max-SAT represents an intermediary between Max-SAT (all clauses are soft) and SAT (all clauses are hard).

Note that the stochastic probabilistic solvers can also be used to find optimal approximations for Max-SAT.

=== Variable splitting ===

Variable splitting is a tool to find upper and lower bounds on a Max-SAT problem. It involves splitting a variable <Math>v</Math> into new variables <Math>v_1, \ldots, v_k</Math>, one for each occurrence of <Math>v</Math> but the first in the original Boolean expression. For example (an illustrative instance, as the original example did not survive), the Boolean expression <Math>(v \vee x) \wedge (v \vee y) \wedge (\neg v \vee z)</Math> becomes <Math>(v \vee x) \wedge (v_1 \vee y) \wedge (\neg v_2 \vee z)</Math>, with <Math>v, v_1, v_2</Math> being distinct variables.

This relaxes the problem by introducing new variables into the Boolean expression,[4] which has the effect of removing many of the constraints in the expression. Because any assignment of the variables in the original expression can be represented by an assignment of the variables in the split expression, the minimization and maximization of the weights of the split expression represent lower and upper bounds on the minimization and maximization of the weights of the original expression.
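The splitting step described above can be sketched mechanically on DIMACS-style clauses; the function name and representation here are assumptions for illustration.

```python
# Variable splitting sketch: every occurrence of `var` after the first,
# in either phase, is replaced by a fresh variable, relaxing the formula.
# Clauses are lists of nonzero ints; variables are numbered 1..n_vars.

def split_variable(clauses, var, n_vars):
    """Return (new_clauses, new_n_vars); fresh variables start at n_vars+1."""
    out, seen_first = [], False
    fresh = n_vars
    for c in clauses:
        nc = []
        for lit in c:
            if abs(lit) == var and seen_first:
                fresh += 1                   # fresh variable for this occurrence
                nc.append(fresh if lit > 0 else -fresh)
            else:
                if abs(lit) == var:
                    seen_first = True        # keep the first occurrence as-is
                nc.append(lit)
        out.append(nc)
    return out, fresh
```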

=== Partial Max-SAT ===

Partial Max-SAT can be solved by first considering all of the hard clauses and solving them as an instance of SAT. The total maximum (or minimum) weight of the soft clauses can then be evaluated, given the variable assignment necessary to satisfy the hard clauses, by optimizing over the free variables (the variables that the satisfaction of the hard clauses does not depend on). The latter step is an instance of Max-SAT with some pre-defined variables. Of course, different variable assignments that satisfy the hard clauses might have different optimal free variable assignments, so it is necessary to check several satisfying assignments of the hard clauses.

== Data structures for storing clauses ==

As SAT solvers and practical SAT problems (e.g. circuit verification) get more advanced, the Boolean expressions of interest may exceed millions of variables with several million clauses; therefore, efficient data structures to store and evaluate the clauses must be used.[2]

Expressions can be stored as a list of clauses, where each clause is a list of variables, much like an [[adjacency list]]. Though these data structures are convenient for manipulation (adding elements, deleting elements, etc.), they rely on many pointers, which increases their memory overhead, decreases [[cache locality]], and increases cache misses, which renders them impractical for problems with large clause counts and large clause sizes.

When clause sizes are large, a more efficient analogous implementation stores the expression as a matrix whose entries indicate which variables are present in each clause, much like an [[adjacency matrix]]. The elimination of pointers and the contiguous memory occupation of arrays serve to decrease memory usage and increase cache locality and cache hits, which offers a run-time speed-up compared to the aforesaid implementation.
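The contiguous layout described above can be sketched as one flat literal array plus an offsets array delimiting the clauses, in place of a pointer-heavy list of lists. This is an illustrative Python analogue of the C-style layout; the class and method names are assumptions made here.

```python
# Flat clause storage: all literals in one contiguous int array, with a
# second array of clause start offsets, instead of a list of lists.

import array

class ClauseDB:
    def __init__(self, clauses):
        self.lits = array.array('i')         # all literals, contiguous in memory
        self.starts = array.array('i', [0])  # clause i spans starts[i]:starts[i+1]
        for c in clauses:
            self.lits.extend(c)
            self.starts.append(len(self.lits))

    def clause(self, i):
        """Return the literals of clause i as a slice of the flat array."""
        return self.lits[self.starts[i]:self.starts[i + 1]]

    def __len__(self):
        return len(self.starts) - 1
```

Iterating clauses here walks one contiguous buffer, which is the cache-friendliness the paragraph above describes; the trade-off is that in-place insertion and deletion of clauses become more awkward than with a list of lists.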

== References ==

  1. Aloul, Fadi A. (2005). "On Solving Optimization Problems Using Boolean Satisfiability". American University of Sharjah. http://www.aloul.net/Papers/faloul_icmsao05.pdf
  2. Zhang, Lintao; Malik, Sharad. "The Quest for Efficient Boolean Satisfiability Solvers". Department of Electrical Engineering, Princeton University. https://www.princeton.edu/~chaff/publication/cade_cav_2002.pdf
  3. Sung, Phil (2006). "Maximum Satisfiability". http://math.mit.edu/~goemans/18434S06/max-sat-phil.pdf
  4. Pipatsrisawat, Knot; Palyan, Akop; Chavira, Mark; Choi, Arthur; Darwiche, Adnan (2008). "Solving Weighted Max-SAT Problems in a Reduced Search Space: A Performance Analysis". Journal on Satisfiability, Boolean Modeling and Computation (JSAT). 4: 4. UCLA. Retrieved 18 April 2022.