Interesting post, but I’m not sure this really speaks to what goes into actually writing what would be considered a “fast” SAT solver. It seems more like a post about how SAT pops up in a lot of places if you look at them right. For the state of the art in what constitutes fast solvers, the annual SAT competition papers are quite interesting to read if you’re interested in the techniques people come up with to make them fast. A few years ago I was working through Knuth’s satisfiability book and writing my own solvers, and was always amazed how stunningly fast the SAT competition winners were compared to the ones I’d code up.
SAT turns up everywhere because it's an almost universal kind of problem. Since it is NP-complete, everything in NP can be transformed into an instance of SAT. Since P is a subset of NP, everything in P can also be turned into an instance of SAT. Nobody knows whether everything in PSPACE can be, though.
Add to this that propositional logic (the language in which we express SAT) is a versatile language to encode problems in. Finding cliques in a graph is also NP-complete, but it is much less natural to use it as a language for encoding other problems.
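To make the "versatile language" point concrete, here's a toy encoding (my own example, not from the post): 3-coloring a triangle graph written directly as CNF clauses over DIMACS-style integer literals, ready to hand to any off-the-shelf SAT solver.

    # Variable v*3 + c + 1 means "vertex v has color c".
    edges = [(0, 1), (1, 2), (0, 2)]
    num_vertices, num_colors = 3, 3

    def lit(v, c):
        return v * num_colors + c + 1

    clauses = []
    # Every vertex gets at least one color.
    for v in range(num_vertices):
        clauses.append([lit(v, c) for c in range(num_colors)])
    # Adjacent vertices never share a color.
    for u, v in edges:
        for c in range(num_colors):
            clauses.append([-lit(u, c), -lit(v, c)])
    # `clauses` can now be fed to any SAT solver.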
> Since P is a subset of NP, everything in P can be also be turned into an instance of SAT.
This statement is kind of trivial. The same is true for any language (other than the empty language and the language containing all strings). The reduction is: (1) hardcode one string y that is in the language and another string z that is not; (2) solve the problem on the given input x in polynomial time poly(x); (3) output y if x is to be accepted and z otherwise.
The total running time is at most poly(x)+O(|y|+|z|) which is still poly(x) since |y| and |z| are hardcoded constant values.
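Spelled out as a sketch (decide_in_P, y, and z are placeholders for the hypothetical polynomial-time decider and the two hardcoded strings):

    # y: a fixed satisfiable formula; z: a fixed unsatisfiable one.
    def reduce_to_sat(x, decide_in_P, y="(a | ~a)", z="(a & ~a)"):
        # Solve the original problem outright, then emit a canned SAT instance
        # with the matching answer. The output size is constant, so this is a
        # polynomial-time many-one reduction, just a useless one.
        return y if decide_in_P(x) else z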
One would assume there are some low-hanging fruits that make up the bulk of the speed-ups, but maybe it’s really a huge pile of small incremental improvements?
My impression is that going from a very simple case like this to a moderately fast solver involves bringing in a fair number of intuitive improvements like memoization of anti-patterns and heuristic reordering of the search.
But getting a really fast solver requires multiple strategies that don't work well together (so they have to run semi-independently). That creates resource-management problems, such as deciding how much information each strategy is allowed to cache at the expense of the others. Tuning all of these trade-offs is exceedingly difficult to do well, since it depends a lot on the types of problems you need to solve.
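For a sense of what the starting point looks like before any of those improvements, here is a toy DPLL backtracker (my own illustration, nowhere near a competitive solver; clauses are lists of signed integers):

    def dpll(clauses, assignment=frozenset()):
        """Toy DPLL: returns a satisfying set of literals, or None if UNSAT."""
        # Unit propagation: keep assigning literals forced by one-literal clauses.
        changed = True
        while changed:
            changed = False
            remaining = []
            for clause in clauses:
                if any(l in assignment for l in clause):
                    continue                               # clause already satisfied
                active = [l for l in clause if -l not in assignment]
                if not active:
                    return None                            # clause falsified: conflict
                if len(active) == 1:
                    assignment = assignment | {active[0]}
                    changed = True
                remaining.append(active)
            clauses = remaining
        if not clauses:
            return assignment
        # Branch on an arbitrary literal; real solvers choose heuristically.
        literal = clauses[0][0]
        for choice in (literal, -literal):
            result = dpll(clauses, assignment | {choice})
            if result is not None:
                return result
        return None

    # dpll([[1, 2], [-1, 2], [-2, 3]]) -> frozenset({1, 2, 3})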
SAT shows up in a lot of problems. I'm doing my PhD in multi-agent robotics after spending some time working on real-life multi-robot deployments. I've been frustrated because most roboticists I talk to think SAT is a dead end, but we have been seeing insane advances in solver speeds over the years. I guess everyone is obsessed with the ML hype train right now, but where SAT shines is when we need to orchestrate at a task level. I feel there's definitely work to be done to bridge both worlds.
It really helps to have a hard problem to solve in the first place. Try scheduling all the classes in a university timetable into available classrooms, subject to constraints like maximum seating and not double booking an instructor. Now try to add constraints like “art classes must be in a studio.” Then to make it really fun, “no more than one fourth of the Electrical Engineering classes may be taught outside of the EE building.”
And even better, consecutive tutorials must be in the same room or nearby rooms (tutors and teaching materials can't teleport), and there are two or three simultaneous sequences of tutorials in big courses (so consecutive tutorials actually can be in distant rooms if they're in different sequences), and the number of students requires that the university's rooms be in use 90% of the time, and medium to large lecture theatres must be in use 98% of the time.
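To make that concrete, here is a rough sketch of a couple of the simpler constraints above using OR-Tools' CP-SAT solver (all the class, room, and timeslot data is invented for illustration):

    from ortools.sat.python import cp_model

    # Invented toy data: rooms 0-1 are in the EE building, room 2 is the art studio.
    rooms = ["EE-1100", "EE-2140", "Art-Studio", "Lecture-Hall"]
    classes = ["EE-Circuits", "EE-Signals", "Painting-101", "Calc-I"]
    is_ee = [True, True, False, False]
    is_art = [False, False, True, False]
    timeslot = [0, 0, 1, 0]          # pretend the timeslots were already decided

    model = cp_model.CpModel()
    room_of = [model.NewIntVar(0, len(rooms) - 1, f"room_{c}") for c in classes]

    # No double-booking: two classes in the same timeslot need different rooms.
    # (Instructor and seating-capacity constraints follow the same pattern.)
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            if timeslot[i] == timeslot[j]:
                model.Add(room_of[i] != room_of[j])

    # Art classes must be in the studio (room index 2).
    for i, art in enumerate(is_art):
        if art:
            model.Add(room_of[i] == 2)

    # No more than one fourth of the EE classes may be outside the EE building.
    outside = []
    for i, ee in enumerate(is_ee):
        if ee:
            b = model.NewBoolVar(f"ee_outside_{i}")
            model.Add(room_of[i] >= 2).OnlyEnforceIf(b)        # rooms 2+ are outside EE
            model.Add(room_of[i] <= 1).OnlyEnforceIf(b.Not())
            outside.append(b)
    model.Add(4 * sum(outside) <= sum(is_ee))

    solver = cp_model.CpSolver()
    if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        for name, var in zip(classes, room_of):
            print(name, "->", rooms[solver.Value(var)])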
This is an annual battle I engage in. The software sucks.
It does, but better scheduling will only go so far at an institution that underinvests in classroom space. Between “looks fancy” and “meets instructional needs,” donors prefer to have their names on “looks fancy” every time.
It's a public university, donors are not a significant source of funding, and government funding has not increased commensurately with the number of students, which has gone up dramatically in recent years (mainly due to relative cost of living, so no guarantee it will last).
The examples are in C#, but easy to adapt to other languages. I found them quite useful when writing a program for symbolic execution in F# that calls Z3 to simplify conditionals; here is its interface to Z3:
The bindings are perhaps a bit cumbersome, but rather easy to understand, and they work very well. Once you appropriately wrap what you need, you can forget about the bindings and avoid SMT-LIB completely.
For constraint programming solvers, you need to define a model (i.e. what are the variables that the solver can change). Typically, a good model naturally enforces hard constraints. For instance, consider the employee scheduling problem, where you have a list of shifts that need to be assigned a single employee. Two possible models for it are:
- Use a boolean variable that is true if and only if a particular employee is assigned to a particular shift. For 2 shifts (A, B) and 2 employees (Amy, Beth), the variables would be Amy_A, Amy_B, Beth_A, Beth_B
- Use an int variable, where each employee is mapped to a number. For 2 shifts (A, B) and 3 employees (Amy, Beth, Carl), the variables would be A, B (which will have value 0 for Amy, 1 for Beth, 2 for Carl).
Using an int variable is usually better, since it automatically encodes the "each shift must have exactly one employee" constraint, which would otherwise need to be added. That being said, sometimes the boolean model is used so a SAT solver can be used instead of an Integer Linear Programming solver.
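As a quick sketch of the two models with the z3 Python bindings (the tiny data set and the solver choice are just for illustration):

    from z3 import Bool, Int, If, Solver, sat

    # Boolean model: one variable per (employee, shift) pair; the "exactly one
    # employee per shift" rule has to be written out explicitly.
    bool_model = Solver()
    amy_a, beth_a = Bool("Amy_A"), Bool("Beth_A")
    amy_b, beth_b = Bool("Amy_B"), Bool("Beth_B")
    bool_model.add(If(amy_a, 1, 0) + If(beth_a, 1, 0) == 1)   # shift A
    bool_model.add(If(amy_b, 1, 0) + If(beth_b, 1, 0) == 1)   # shift B

    # Int model: one variable per shift (0 = Amy, 1 = Beth, 2 = Carl); "exactly
    # one employee per shift" now falls out of the encoding for free.
    int_model = Solver()
    shift_a, shift_b = Int("A"), Int("B")
    int_model.add(shift_a >= 0, shift_a <= 2, shift_b >= 0, shift_b <= 2)

    for s in (bool_model, int_model):
        if s.check() == sat:
            print(s.model())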
Typically, for constraint solvers such as Z3 or OR-Tools, you add a group of similar constraints in a loop that iterates over the relevant variables. For instance, to add constraints for overlapping shifts, you would have a dictionary mapping each shift to the shifts it overlaps, and add a not-equals constraint for each pair (since if they are equal, they have the same employee, and employees usually can't be in two places at the same time).
    for shift in shift_vars:
        for overlapping_shift in overlapping_shifts[shift]:
            solver.add(shift != overlapping_shift)
There are also local-search solvers, such as Timefold, which let you use your domain objects and functions directly in your constraints. For instance, the above constraint would look like this:
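(Roughly, in Timefold's constraint-streams style; the Shift class with employee, start, and end fields is assumed here, as in Timefold's employee-scheduling quickstart, and the exact imports may differ.)

    from timefold.solver.score import (ConstraintFactory, Constraint, Joiners,
                                       HardSoftScore, constraint_provider)

    # Shift is the planning-entity class, assumed to be defined elsewhere with
    # employee, start, and end fields.
    @constraint_provider
    def define_constraints(factory: ConstraintFactory) -> list[Constraint]:
        return [
            # Penalize any two shifts that share an employee and overlap in time.
            factory.for_each_unique_pair(
                    Shift,
                    Joiners.equal(lambda shift: shift.employee),
                    Joiners.overlapping(lambda shift: shift.start,
                                        lambda shift: shift.end))
                .penalize(HardSoftScore.ONE_HARD)
                .as_constraint("Overlapping shifts"),
        ]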
Read up on SMT-LIB: learning how to encode problems in it is a good way to start. The Python z3 bindings are a good starting point to play with it, too.
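A minimal z3py warm-up along those lines (my own toy example; Solver.sexpr() prints the asserted constraints in SMT-LIB form, which is a handy way to learn the two side by side):

    from z3 import Ints, Solver, sat

    x, y = Ints("x y")
    s = Solver()
    s.add(x > 0, x + y == 10, y > x)
    print(s.sexpr())          # the same constraints, printed as SMT-LIB text
    if s.check() == sat:
        print(s.model())      # e.g. x = 1, y = 9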
I wish I had something to offer, but I think there is little available but grit. The solvers are magical in what they can do, but structuring a problem into the DSL is an exercise in pain. Seemingly few available public examples of patterns you can crib.
Huh. As a UMD grad who faced this same problem, it's an interesting post. What I'll say is that the ECE department advisors (CS doesn't fall under ECE there; idk how they do it) gave us updated 4-year plans every semester or so. It meant I never had to worry about not having the right classes to graduate. And we had to get permission for every class, which, while annoying, meant my advisor looked over every major-specific class. I can't even count how many people I know, or know of, who wasted a year or more on classes that didn't count. None of that for me.
On the other hand, I remember that for my acceptance (which wasn't to the ECE program) I had to pick classes before I could confirm going to UMD? I don't remember why, but I remember panicking because here I am, still a high schooler, picking college classes that would set my next 4 years. I was even more terrified when, for my first-ever class, not only was I late because it was in the back of a basement, but the instructor made some comment about it not being the right class for engineers or something. It was, though. Here's how I fixed this and other problems:
1. I started using the UMD provided schedule planner and some 3rd party ones, with multiple backups for when they'd fill up
2. I made an app that showed most buildings' floor plans completely offline. No more wandering around during stressful times.
3. I did try to make a tree of classes using prerequisite data scraped from the SoC, but the prerequisite text wasn't something a regular expression could parse, so I gave up immediately.
The blogger liked CMSC430, and I agree it was a good class, though I had a different professor.
I really like the styling of this blog. It's easy on the eyes, gets out of the way, and the collapsed containers for extra info are a nice touch. There's a bit of layout shift, but that's about it.
> As a result, in order to determine if a formula is satisfiable, first convert it to conjunctive normal form, then convert the new formula into a course catalog.
I know this is a consequence of NP-completeness and so on and so forth, but I also find it a funny and charming way to phrase it. Once we've solved the fundamental problem (what courses to take), we're able to solve simple specializations and derivatives (boolean satisfiability).
True, finding one solution is easy but finding all the solutions can be a fun little optimization challenge.
I made a repo many years ago with a bunch of grab bag solutions for comparisons [1]; from dumb brute force to DLX (Knuth's Dancing Links) and a multithreaded bitwise backtracking algorithm. And one where I just hardcoded the answers because all the counts up to 27 are known.
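For reference, the core of the bitwise backtracking approach looks something like the sketch below (a generic version, not the repo's code, and assuming the puzzle is N-queens, which the "counts known up to 27" remark suggests):

    def count_n_queens(n: int) -> int:
        """Count N-queens solutions with the classic bitmask backtracking trick."""
        full = (1 << n) - 1          # n low bits set: all columns of the board

        def solve(cols: int, diag1: int, diag2: int) -> int:
            if cols == full:                          # a queen in every row
                return 1
            total = 0
            free = full & ~(cols | diag1 | diag2)     # unattacked squares in this row
            while free:
                bit = free & -free                    # take the lowest free square
                free -= bit
                total += solve(cols | bit,
                               ((diag1 | bit) << 1) & full,
                               (diag2 | bit) >> 1)
            return total

        return solve(0, 0, 0)

    # count_n_queens(8) == 92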
So I'm all for just jumping to the existing known solutions, but it seems like the OP is having fun while they learn a little bit. They seem to just be a college freshman.