
Boolean satisfiability algorithm heuristics

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Iordantrenkov (talk | contribs) at 16:14, 11 December 2015. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Given a Boolean expression B with variables x1, …, xn, finding an assignment of the variables such that B is true is called the Boolean satisfiability problem, frequently abbreviated SAT, and is seen as the canonical NP-complete problem.

Although no algorithm is known to solve SAT in polynomial time, there are classes of SAT problems which do have efficient algorithms that solve them. These classes of problems arise from many practical problems in AI planning, circuit testing, and software verification.[1][2] Research on constructing efficient SAT solvers has been based on various principles such as resolution, search, local search and random walk, binary decisions, and Stålmarck's algorithm.[2]

Some of these algorithms are deterministic, while others may be stochastic.

As there exist polynomial-time algorithms to convert any Boolean expression to conjunctive normal form (CNF), posing SAT problems in CNF does not change their computational difficulty. SAT problems are canonically expressed in CNF because CNF has properties that can help prune the search space and speed up the search process.[2]

Branching Heuristics in Conflict-Driven Algorithms [2]

A cornerstone of conflict-driven clause learning (CDCL) SAT solvers is the DPLL algorithm. The algorithm works by iteratively assigning free variables; when it encounters a conflicting assignment, it backtracks to a previous iteration and chooses a different assignment of variables. It relies on a branching heuristic to pick the next free variable assignment; the branching heuristic effectively turns the sequence of variable assignments into a decision tree. Different implementations of this heuristic produce markedly different decision trees, and thus have a significant effect on the efficiency of the solver.
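The backtracking search described above can be sketched as a short recursive procedure. This is an illustrative toy, not a production solver: it omits unit propagation and clause learning, and its branching heuristic is deliberately naive (it just picks the first free variable).

```python
# A minimal DPLL-style sketch. Clauses are lists of non-zero integers:
# literal v means variable v is true, -v means variable v is false.

def dpll(clauses, assignment=None):
    """Return a satisfying assignment (dict: var -> bool) or None."""
    if assignment is None:
        assignment = {}

    # Simplify: drop satisfied clauses, remove falsified literals.
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue  # clause already satisfied
        rest = [l for l in clause if abs(l) not in assignment]
        if not rest:
            return None  # empty clause: conflict, caller must backtrack
        simplified.append(rest)
    if not simplified:
        return assignment  # every clause satisfied

    # Naive branching heuristic: the first free variable.
    var = abs(simplified[0][0])
    for value in (True, False):
        result = dpll(simplified, {**assignment, var: value})
        if result is not None:
            return result
    return None  # both branches failed

# (x1 or x2) and (not x1 or x2) is satisfiable, e.g. with x2 = True
print(dpll([[1, 2], [-1, 2]]))  # → {1: True, 2: True}
```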

Early branching heuristics (Böhm's heuristic, the Maximum Occurrences on Minimum sized clauses (MOM) heuristic, and the Jeroslow–Wang heuristic) can be regarded as greedy algorithms. Their basic premise is to choose a free variable assignment that will satisfy the most currently unsatisfied clauses in the Boolean expression. However, as Boolean expressions get larger, more complicated, or more structured, these heuristics fail to capture useful information about these problems that could improve efficiency. Additionally, larger problems require more processing, as the operation of counting free variables in unsatisfied clauses comes to dominate the run-time.

Another heuristic, called Variable State Independent Decaying Sum (VSIDS), assigns each variable phase a score. VSIDS starts by looking at small portions of the Boolean expression and assigning each phase of a variable (the variable and its negation) a score proportional to the number of clauses that phase appears in. As VSIDS progresses and searches more parts of the Boolean expression, all scores are periodically divided by a constant. This discounts the effect of variables appearing in earlier-found clauses in favor of variables with a greater presence in more recent clauses. VSIDS selects the variable phase with the highest score to determine where to branch.

VSIDS is quite effective because the scores of variable phases are independent of the current variable assignment, so backtracking is much easier. Further, VSIDS favors variable assignments that satisfy the greatest number of recently searched segments of the Boolean expression.
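The scoring scheme can be sketched as a small score table. This is a hedged sketch: the bump amount, decay factor, and method names here are illustrative choices, not the constants of any particular solver (real implementations such as Chaff or MiniSat differ in these details).

```python
# A toy VSIDS-style score table (illustrative parameters).
from collections import defaultdict

class VSIDS:
    def __init__(self, decay=0.5):
        self.scores = defaultdict(float)  # keyed by literal ("variable phase")
        self.decay = decay

    def bump(self, clause):
        """Raise the score of every phase appearing in a newly seen clause."""
        for literal in clause:
            self.scores[literal] += 1.0

    def rescale(self):
        """Periodically decay all scores so recent clauses dominate."""
        for literal in self.scores:
            self.scores[literal] *= self.decay

    def pick(self, free_literals):
        """Branch on the free variable phase with the highest score."""
        return max(free_literals, key=lambda l: self.scores[l])

v = VSIDS()
v.bump([1, -2])      # clauses seen early...
v.bump([1, 3])
v.rescale()          # ...count for less after a decay step
v.bump([-2])         # a recently seen clause
print(v.pick([1, -1, -2, 3]))  # → -2
```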

Stochastic Solvers [3]

MAX-SAT (the version of SAT in which the number of satisfied clauses is maximized) can also be solved using probabilistic algorithms. If we are given a Boolean expression B and we set each variable true or false uniformly at random, then each clause C with k distinct variables has probability Pr(C is satisfied) = 1 − (1/2)^k of being satisfied by that random assignment. This is because each literal in C has probability 1/2 of being true, and we only need one literal in C to be true; all k literals are simultaneously false with probability (1/2)^k, so Pr(C is satisfied) = 1 − (1/2)^k.
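For a clause with k distinct variables, the probability that a uniformly random assignment satisfies it is 1 − (1/2)^k; this can be checked by brute-force enumeration over a hypothetical k-literal clause (x1 ∨ … ∨ xk):

```python
# Brute-force check of the 1 - (1/2)^k formula: for a clause with k
# distinct positive literals, count the assignments that satisfy it.
from itertools import product

def satisfied_fraction(k):
    clause = list(range(1, k + 1))  # clause (x1 or x2 or ... or xk)
    total = satisfied = 0
    for bits in product([False, True], repeat=k):
        total += 1
        if any(bits[v - 1] for v in clause):
            satisfied += 1
    return satisfied / total

for k in (1, 2, 3):
    print(k, satisfied_fraction(k), 1 - 2 ** -k)  # the two values agree
```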

Now we show that, for expressions in which every clause contains exactly three distinct variables (MAX-3SAT), randomly assigning variable values is a 7/8-approximation algorithm, meaning it is an optimal approximation algorithm unless P = NP. Suppose we are given such a Boolean expression with m clauses. By the calculation above, each clause is satisfied with probability 1 − (1/2)^3 = 7/8, so the expected number of satisfied clauses is 7m/8. Since the optimal assignment can satisfy at most all m clauses, the random assignment achieves, in expectation, at least 7/8 of the optimum.

By the PCP theorem, no polynomial-time algorithm can achieve an approximation ratio better than 7/8 for MAX-3SAT unless P = NP.

Other stochastic SAT solvers, such as WalkSAT and GSAT, improve on the above procedure. They start by randomly assigning values to each variable and then traverse the given Boolean expression, flipping variables chosen to minimize the number of unsatisfied clauses. To escape local optima, they may flip a randomly selected variable or restart with a new random variable assignment, much like a simulated annealing algorithm.
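A WalkSAT-style search can be sketched as below. The noise probability p, the flip budget, and the fixed seed are illustrative assumptions rather than canonical parameters.

```python
# A WalkSAT-style local search sketch (illustrative parameters).
import random

def walksat(clauses, n_vars, p=0.5, max_flips=10_000, seed=0):
    rng = random.Random(seed)
    assign = {v: rng.choice([False, True]) for v in range(1, n_vars + 1)}

    def unsatisfied():
        return [c for c in clauses
                if not any(assign[abs(l)] == (l > 0) for l in c)]

    for _ in range(max_flips):
        unsat = unsatisfied()
        if not unsat:
            return assign  # every clause satisfied
        clause = rng.choice(unsat)  # focus on one unsatisfied clause
        if rng.random() < p:
            var = abs(rng.choice(clause))  # noise: random walk step
        else:
            # Greedy step: flip the variable in the clause that leaves
            # the fewest clauses unsatisfied.
            def cost(l):
                v = abs(l)
                assign[v] = not assign[v]
                c = len(unsatisfied())
                assign[v] = not assign[v]
                return c
            var = abs(min(clause, key=cost))
        assign[var] = not assign[var]
    return None  # give up: no satisfying assignment found in the budget

result = walksat([[1, 2], [-1, 2], [-2, 3]], n_vars=3)
```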

2-SAT Heuristics

Unlike general SAT problems, 2-SAT problems are tractable. There exist algorithms that can compute the satisfiability of a 2-SAT problem in polynomial time. This is a result of the constraint that each clause has only two variables: when an algorithm sets a variable x, each clause that contains x but is not satisfied by that assignment can only be satisfied through its second variable, which leaves exactly one possible assignment for that variable.

Backtracking

Suppose we are given the Boolean expression:

B1 = (¬x2 ∨ x3) ∧ (¬x1 ∨ x2).

The algorithm can select x1 = true; then, to satisfy the second clause, it will need to set x2 = true, and consequently, to satisfy the first clause, it will set x3 = true.

If the algorithm tries to satisfy B2 = (¬x2 ∨ x3) ∧ (¬x1 ∨ x2) ∧ (¬x1 ∨ ¬x3) in the same way it solved B1, then the third clause will remain unsatisfied, since x1 = true and x3 = true. This will cause the algorithm to backtrack, set x1 = false, and continue assigning variables from there.
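The backtracking effect can be checked by brute force on a small 2-SAT instance of this shape (the clauses here are chosen for illustration, written as integer literals):

```python
# Brute-force check of a small 2-SAT instance. Literal k means xk;
# -k means "not xk".
from itertools import product

def satisfies(assign, clauses):
    return all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses)

b1 = [[-2, 3], [-1, 2]]   # two-clause formula, satisfiable with x1 = True
b2 = b1 + [[-1, -3]]      # an extra clause that rules that path out

def solutions(clauses, n_vars=3):
    for bits in product([False, True], repeat=n_vars):
        assign = dict(enumerate(bits, start=1))
        if satisfies(assign, clauses):
            yield assign

# Setting x1 = True forces x2 = True and x3 = True, which solves b1...
assert {1: True, 2: True, 3: True} in list(solutions(b1))
# ...but violates the extra clause in b2, so a solver must backtrack
# and try x1 = False instead.
assert {1: True, 2: True, 3: True} not in list(solutions(b2))
assert any(not s[1] for s in solutions(b2))
```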

Graph Reduction [4]

2-SAT problems can also be reduced to a depth-first search over the strongly connected components of an implication graph. Each variable phase (a variable and its negation) is a vertex, connected to other variable phases by the implications of the clauses: a clause (a ∨ b) yields the implications ¬a → b and ¬b → a. In the same way, when the algorithm above set x1 = true, it was following the implications x1 → x2 and x2 → x3.

However, with the third clause added, the graph also contains the implication x3 → ¬x1, which contradicts x1 = true.

Once a 2-SAT problem is reduced to such a graph, if a depth-first search finds a strongly connected component containing both phases of some variable, then the 2-SAT problem is not satisfiable. Likewise, if no strongly connected component contains both phases of a variable, then the 2-SAT problem is satisfiable.
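The reduction can be sketched with Kosaraju's strongly-connected-components algorithm over the implication graph (the choice of SCC algorithm is an assumption here; any SCC algorithm works):

```python
# 2-SAT via strongly connected components of the implication graph,
# using Kosaraju's two-pass algorithm. Literal k means xk, -k means
# "not xk"; every clause has exactly two literals.
from collections import defaultdict

def two_sat(clauses, n_vars):
    """Return True iff the 2-CNF formula is satisfiable."""
    graph, rgraph = defaultdict(list), defaultdict(list)
    nodes = set()
    for v in range(1, n_vars + 1):
        nodes.update((v, -v))
    for a, b in clauses:
        # Clause (a or b) yields implications (not a -> b), (not b -> a).
        for u, v in ((-a, b), (-b, a)):
            graph[u].append(v)
            rgraph[v].append(u)

    # Pass 1: record finishing order with an iterative DFS.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, it = stack[-1]
            advanced = False
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(graph[v])))
                    advanced = True
                    break
            if not advanced:
                order.append(node)
                stack.pop()

    # Pass 2: label components on the reversed graph, in reverse order.
    comp, label = {}, 0
    for start in reversed(order):
        if start in comp:
            continue
        comp[start] = label
        stack = [start]
        while stack:
            node = stack.pop()
            for v in rgraph[node]:
                if v not in comp:
                    comp[v] = label
                    stack.append(v)
        label += 1

    # Unsatisfiable iff some variable shares a component with its negation.
    return all(comp[v] != comp[-v] for v in range(1, n_vars + 1))

print(two_sat([(1, 2), (-1, 2)], 2))   # satisfiable
print(two_sat([(1, 1), (-1, -1)], 1))  # forces x1 and not x1: unsatisfiable
```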

Data Structures for Storing Clauses [2]

As SAT solvers and practical SAT problems (e.g. circuit verification) become more advanced, the Boolean expressions of interest may involve millions of variables and several million clauses; therefore, efficient data structures for storing and evaluating the clauses must be used.

Expressions can be stored as a list of clauses, where each clause is a list of variables, much like an adjacency list. Though these data structures are convenient for manipulation (adding elements, deleting elements, etc.), they rely on many pointers, which increases their memory overhead, decreases cache locality, and increases cache misses, rendering them impractical for problems with large clause counts and large clause sizes.

When clause sizes are large, a more efficient analogous implementation stores the expression with each clause represented as a contiguous array of the variables present in that clause, much like an adjacency matrix. The elimination of pointers and the contiguous memory occupation of arrays serve to decrease memory usage and increase cache locality and cache hits, which offers a run-time speed-up compared to the aforesaid implementation.
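The two layouts can be contrasted in a small sketch. The (start, length) index scheme below is an illustrative simplification, not the clause database of any particular solver:

```python
# Two clause stores contrasted (illustrative sketch).
from array import array

# 1) List of lists: easy to edit, but each clause is a separate heap
#    object reached through a pointer, which hurts cache locality.
clauses_lists = [[1, -2], [2, 3, -4], [-1, 4]]

# 2) Flat contiguous storage: all literals in one machine-typed array,
#    with a parallel index of (start, length) pairs per clause.
literals = array('i', [1, -2, 2, 3, -4, -1, 4])
index = [(0, 2), (2, 3), (5, 2)]

def clause(i):
    start, length = index[i]
    return literals[start:start + length]

print(list(clause(1)))  # → [2, 3, -4]
```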

References