Great essay. Maybe another thing to note is that in industry we often don't need the singular best answer, but one that is "close" or "good". In fact, sometimes this is a requirement - like a sudoku solver that can handle impossible boards by generating a solution with a small number of collisions.
As evidenced by the fact that traveling salespeople exist (and are not, in general, modeling experts or big AWS users), approximate solutions are often significantly easier to find - both practically and in theory.
I bring this up because it's interesting... and because my professors really didn't emphasize it until deep into my undergraduate career. So someone reading this might be one of today's lucky 10,000.
Right. Both the traveling salesman problem and integer linear programming are worst-case NP-hard, but far easier for almost all cases.
The halting problem is an extreme case of this. While halting is undecidable in general (with unbounded memory) and still intractable to decide for finite-memory machines, for most useful programs it's easy to decide.
The Microsoft Static Driver Verifier is a proof of correctness system used by Microsoft to eliminate crash-type bugs from Windows drivers. It's the main reason that kernel drivers don't crash Windows any more. It decides the halting problem for drivers. If it can't reach a decision in some fixed length of time, the driver is rejected. If it's that difficult to decide if a kernel driver finishes, you're doing it wrong.
A problem I've thought about lately has been making a programming language that optimizes for being expressive up to, but not including, being Turing complete (specifically, making non-terminating programs inexpressible).
Basically you disallow general recursion (while loops, etc.) and only allow structural/well-founded/inductive recursion (for loops, walking finite structures, etc.).
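Python doesn't enforce the distinction, but here is a small illustrative sketch of the two kinds of recursion (hypothetical examples, not from the thread):

```python
# General recursion: nothing about the loop's shape guarantees termination.
# (Whether this halts for every n is the famous open Collatz problem.)
def collatz_steps(n):
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

# Structural recursion: every call recurses on a strictly smaller piece of a
# finite structure, so termination is guaranteed by construction. A total
# language would accept this and reject the loop above.
def depth(tree):
    # tree is a (possibly empty) list of subtrees
    return 1 + max((depth(child) for child in tree), default=0)
```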
The problem being that if you can reduce a problem to being well-ordered on the naturals, it is already in the computable set.
Note how the limits on recursion given by the original paper apply to the Fibonacci sequence, but restricted to the Natural numbers to avoid the -2, -1, 1, 0, 1, 1, 2 portion that would break total ordering.
Any problem that you can map to von Neumann ordinals will be computable. Same thing with the Cantor diagonalization example below.
Basically if you can reduce your problem to a monoid, you can typically compute it easily.
But as a group has to be a monoid with inverses, the recursive example provided couldn't be used in a case where that function needed to be applied to algebras, rings, fields, etc.
Note that as Cantor diagonalization was used as a counterexample to Laplacian causal determinism, showing that no two computational devices can completely predict each other when defined by external behavior, there are further constraints.
For me, an easier lens for seeing whether there is a P solution to a typically NP problem is Schaefer's dichotomy theorem. Total Functional Programming is exactly equivalent to #3:
1) all relations which are not constantly false are true when all their arguments are true;
2) all relations which are not constantly false are true when all their arguments are false;
3) all relations are equivalent to a conjunction of binary clauses;
4) all relations are equivalent to a conjunction of Horn clauses;
5) all relations are equivalent to a conjunction of dual-Horn clauses;
6) all relations are equivalent to a conjunction of affine formulae.
The mapping to the Natural numbers is a leaky abstraction, as the Natural numbers are a non-empty totally ordered set with no upper bound.
Any function that takes finitely many natural numbers as arguments and produces a single natural number as its value is, by definition, computable. That restricted problem will always be computable whether or not the more generalized problem is in NP.
Yeah, you can definitely make a language that is not Turing complete, but your goal of "up to" is not possible.
For any complexity class C that is not Turing complete, there is a complexity class C' that is a strict super-set of C and is also not Turing complete.
Sure, the insight/intuition comes from diagonalization, or perhaps more directly from Cantor's theorem.
For any complexity class O(F(N)), there must exist a complexity class O(2^F(N)) that is a strict superset of O(F(N)). And if problems taking F(N) time are decidable, then so are problems taking 2^F(N) time.
That's interesting. My Algorithms professor illustrated this point by handing us Kirkman's schoolgirl problem[0] as a homework assignment. Sure, the naive solution falls into the category of "intractable within the lifespan of the universe", but with some clever heuristics you can find one of the solutions quickly (a lackluster program in a couple of minutes, a well-thought-through one instantly).
I remember an old physicist friend's comment about his books of tables to look up common things (sine, cosine, log, etc. -- yes, this was a long time ago). They had a precision of three decimal places. He said "for almost every problem, three is plenty. More than that just increases the work for the same result in the end."
Yep. And there are the formal optimality gaps like "this algorithm provably finds, in polynomial time, a solution no more than twice the optimum tour length", and there are also the informal "I have no proof of anything, but this algorithm usually finds optimal tours or tours within 1% of optimality on problems of up to 10 million cities" type of gaps. Both are useful to know and understand in practice.
If my job were to solve TSP instances, I'd not bother trying to tell someone why it's hard. I'd throw Iterated Lin-Kernighan or something similar at it and go get some lunch.
Modeling of problems is something that feels like a lost art, and it's completely a subtext in so much of what programmers discuss.
What I mean by subtext is that OO is a way to model a problem as much as it is a way to organize code. Same for what we often call functional. Yet we spend far more time and effort talking only about what it means to organize the code, with just an aside on how it helps to model the solution.
I do think it is there, but by the time you show someone how a SAT solver is just working on the true/false of various variables, and that it often can be seen as "learning" combinations of those variables that are known true/false, that is so far away from the typical understanding of the problem that it looks alien.
I think that this kind of novel problem modeling was always in (comparatively) short supply in programming. It's just so _easy_ to tell the machine to do what you want it to do, that it's a tough sell to figure out how to tell it to do something completely different that your modeling says will magically make your solution easier.
It feels like progress here is really jerky, with periods of aimless milling about interspersed with breakthroughs that spread rapidly. Some examples that come readily to mind are A* graph traversal, BNF parser generators, BSP trees for game world rendering.
It seems rare that a working developer gets to sit back and think, "huh, this problem actually feels a bit like that physics/math/statistics thing I was reading about the other day/year/decade" and spend any time usefully thinking about it. So any language or library we have to hand gets bent into solving the problem the way we're thinking about it right now.
Agreed with all of that. I would add BDDs and such in there, as great breakthroughs. In a very real sense, they are essentially moving how the computer builds up logical circuits into a data structure. (That is, I'm asserting that the math unit on a computer is closer to a BDD than it is to any other data structure in computer science.)
But, even outside of advanced modeling, I'm always amazed at how much effort we go through to not reduce our problems to the computation that we want them to be. Instead of seeing our programs as tools to translate from a high level model to a low level one, we seem to want that low level model to be completely implicit and not visible in what we build.
Even in code I have seen that uses solvers, there is rarely a good separation of "convert this problem specification into a SAT problem" with a corresponding "convert this SAT solution back into the problem specification."
I /think/ this is where programs that do focus on text seem to have a win. You have a "how to convert this text to a SAT problem", "how to convert this text to a problem specification", and "how to convert this solution to a problem specification." Stabilize what those text representations are, and you get to spend a ton of effort at each section without immediate impact on the other areas.
Sadly, I am mainly going on the praise I've seen from Knuth and similar. Use in CAD for circuits is what Wikipedia has to say, but it doesn't really give any links.
As such, mayhap I am misattributing how impactful they are?
My vague impression is that BDDs are good for the kinds of problems Knuth is interested in: exact answers, exhaustive enumeration, naive backtracking would be too slow, etc. So things like "there are exactly 13,267,364,410,532 undirected closed knight's tours on an 8x8 board". For those, BDDs can be much more efficient than writing out a giant truth table (so they're an amazing trick that can make certain things feasible to program that wouldn't otherwise be, and you can see why he likes them), but they hit their limits: the catch is that they are useful precisely when your Boolean function of interest has a small BDD, and not all functions will.
In some sense, optimization algorithms (for traveling salesman, integer linear programming…) are all about avoiding exhaustive enumeration with lots of tricks like branch-and-bound etc, so BDDs are not very useful there (AFAIK). The examples mentioned on Wikipedia (circuits, formal verification) are ones where exhaustive enumeration is desired and you really want to examine every case, which is not true for the bulk of SAT solver applications / other NP-complete optimization problems.
Fair. I had fun playing with a basic BDD to "solve" a linear boolean program at https://taeric.github.io/trading-with-bdds.html. That said, I know that is a toy usage of the data structure, as such I don't have any other data there. I /thought/ I had seen more examples of where it was a good data structure, but I can't find those offhand. :(
Got any ideas of books that address this? I'm generally curious about frequent problems that pop up all the time that would be more easily solved by some theoretical algorithm like a SAT solver.
Two years ago I had to create a quarterly schedule for a team of four, and about 20 tasks with dependencies.
I had just read a piece about using SAT solvers for designing keys and locks that was on HN, and decided to model my problem as a SAT problem.
An hour later (about 59:30 coding, 00:30 minicrypto3 running), I had the perfect schedule! It was optimal since it was also proven to be the shortest time to do all tasks.
No one caring about this schedule and it not mattering - one hour's work.
The joy I got from the exercise - priceless.
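Not the commenter's actual encoding, but a minimal sketch of how such a problem can be handed to a solver - here z3's optimizer (an SMT solver with optimization) rather than a plain SAT encoding; the task, duration, and four-worker names are illustrative:

```python
from z3 import Int, Optimize, Or

def schedule(tasks, deps, workers=4):
    """tasks: dict name -> duration; deps: list of (before, after) pairs."""
    opt = Optimize()
    start = {t: Int(f"start_{t}") for t in tasks}   # when each task begins
    who = {t: Int(f"who_{t}") for t in tasks}       # which team member does it
    makespan = Int("makespan")                      # total schedule length
    for t, d in tasks.items():
        opt.add(start[t] >= 0, who[t] >= 0, who[t] < workers)
        opt.add(makespan >= start[t] + d)
    for a, b in deps:                               # dependency: b starts after a finishes
        opt.add(start[b] >= start[a] + tasks[a])
    names = list(tasks)
    for i, a in enumerate(names):                   # same worker => tasks cannot overlap
        for b in names[i + 1:]:
            opt.add(Or(who[a] != who[b],
                       start[b] >= start[a] + tasks[a],
                       start[a] >= start[b] + tasks[b]))
    opt.minimize(makespan)                          # shortest overall schedule
    opt.check()
    return opt.model()
```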
Probably would have been better to use the word 'difficult' because NP-complete does in fact automatically imply NP-hard. I was expecting a very different article based on the title.
Haskell type inference is EXPTIME, O(2^p(n)), yet it works fine in practice. This reminds me of Rob Pike's rule about algorithms and complexity:
>>Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don’t get fancy.
That rule, while sensible at the time, has aged pretty poorly. n used to be small in the 70s, but now people have terabytes of data and run UNIX tools over them.
Is that a theoretical possibility based on the fact that Linux can run in a memory footprint < 768MB, or has someone actually demonstrated it? Can you boot one of these processors without any RAM installed?
Coreboot sets up cache as RAM through memory mapping tricks, to have something a normal C compiler can work with. (DRAM needs to be trained and set up properly before you have any RAM; before that, you have only registers, which is pretty painful.)
This reasoning works okay until you have to deal with machine-generated inputs, at which point your app freezes when it gets sent down a quadratic or worse complexity path.
The article explains the P vs NP issue [1] very well.
If there is one thing to nitpick, then
> in fact “convert NP-complete problem to SAT” is in P.
is not the most precise, since P is a class of decision problems, while said conversion [2] is not a decision problem, but a function computed in polynomial time.
It is nitpicky, but the fact that these classes deal with decision problems and not general problems is something which feels like should be mentioned more often.
Of course this ends up not mattering in practice since we can convert a problem with arbitrary (bitstring) output into a decision problem. To do this we just include an index n in the input and make the decision problem about whether the bit at the given index n is 1.
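A toy sketch of that conversion (illustrative names), which also makes the limitation discussed below visible - you need to know the output length:

```python
def recover_output(decide, x, output_len):
    # decide(x, i) answers the decision problem "is bit i of f(x) equal to 1?"
    # Polynomially many such queries rebuild f(x) -- assuming output_len is
    # known in advance, which is exactly the gap pointed out in the replies.
    return [1 if decide(x, i) else 0 for i in range(output_len)]
```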
> Of course this ends up not mattering in practice since we can convert a problem with arbitrary (bitstring) output into a decision problem.
Not always. My favorite counter-example is coloring k-colorable graphs. Consider that for some reason (and there could be many), you are guaranteed that all the graphs on input are k-colorable. Still, the input is just a graph without any coloring. Your algorithm is tasked to find a proper coloring using the smallest number of colors.
It is both a problem that has been already studied in terms of approximation and at the same time an optimization problem that has no good decision version, as far as I am aware.
The comment I was referring to was talking about "decision problems and general problems" and there always being a reduction between them.
Now "general problems" is a bit vague, but in my classes on optimization the students are intuitively led to believe that optimization problems always have a decision problem associated with them, so we can talk about NP-hardness of optimization problems, too. Which is often true, but not always.
As a good example, if you consider graph coloring, you can argue that the associated decision problem is "given a graph G, and a parameter k, answer yes if G is colorable with at most k colors". This way, slightly informally, you can talk about NP-hardness of finding the smallest number of colors for a graph.
However, the optimization problem I presented -- coloring k-colorable graphs -- is a valid optimization problem, it is interesting and has been studied in the past, but it has no good decision problem associated with it.
Your conversion doesn't address determining the length of the output.
Another conversion avoids that issue, by converting a polynomial-time function f into the language of pairs (x, y) where y is a prefix of f(x); polynomially many membership queries then recover both the bits of f(x) and its length.
> This is true in principle, but the reality is more complicated. When we say “NP-complete is hard” we’re talking about worst-case complexity, and the average-case complexity is often much more tractable. Many industry problems are “well-behaved” and modern SAT solvers can solve them quickly.
This is a really cool point, no one ever mentioned this to me when I learned about this stuff at university. I wonder though, is there data for this? Are businesses willing to take the risk, and how do they mitigate it?
I'm not sure how popular they are in the wild, but there are also a lot of approximation algorithms for solving versions of these problems that are almost-but-not-quite the same. I wonder if that's just what lots of real-world use cases actually need.
It's actually a very open question, even theoretically, whether we can build a "hard on average" problem from a "worst-case hard" problem in NP. This is why we haven't yet managed to design cryptography from 3SAT, for instance.
It's kinda obscure, but SAT solvers, SMT solvers, and production rule engines (all popular in old-school AI) have improved greatly post-2000, almost as much as neural networks.
The wording is a tiny bit misleading because assuming P ≠ NP, then the set of SAT problems that SAT solvers can solve in polynomial time is, by definition, in P and not NP-hard (if they were, then, again, by definition of NP-hard, we could reduce all of the problems in NP to those in that set, proving P = NP, contrary to our assumption). I.e. assuming that there's a subset of SAT that SAT solvers can solve -- of any problem size -- in polynomial time, then that subset is not NP-complete. SAT is still NP-hard (and NP complete), because we can reduce any problem in NP to SAT, but not to that "easy" subset. It's just that the mystery, as Hillel alludes, is that we don't yet know how to classify that subset so it doesn't have a name.
In other words, it's not that NP-complete isn't always hard; assuming P ≠ NP -- it is. It's just that (in addition to obvious subclasses of SAT) there's a non-obvious subclass of SAT that's in P that SAT solvers can solve (again, assuming they actually solve all of the problems in that class in polynomial time) that we have yet to classify, and it is, indeed, an interesting mystery.
I totally agree with this, both the part about having to be clever with the model implementation and how fast and usable it is. For my own research, it ended up being a type of set cover problem. To me it felt large - almost a million non-zero values to optimize (and I had to do 10k of them) - but the state of the art is so fast. CPLEX optimizes them in like 3 seconds and can tell me that the result is provably optimal. Even open-source solvers (HiGHS) solve it in like 10 seconds.
It is true that when I phrased the problem wrong (with multiplication not addition), it was taking forever, and I had to mess around a long time to figure it out. ChatGPT actually sorta helped a little
But it does feel like magic and not used enough. I think Julia's JuMP goes a long way towards just expressing the problem easily. It might take more work/tricks to get it to perform (big-M tricks are crazy to me), but at least as a first pass, seeing if it just works is worth a try, I think.
And to be fair, I really wanted it to be provably optimal for the fun of it. But when I set even a 10 minute timelimit, it got a really good solution across the board much faster than I could have found with manual programming
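For reference, a minimal weighted set cover model of the kind described above, using PuLP as a stand-in for the CPLEX/HiGHS/JuMP setups mentioned (the function and data names are illustrative):

```python
import pulp

def solve_set_cover(universe, subsets, costs):
    # 0-1 ILP formulation of weighted set cover: pick x_j in {0,1} per subset,
    # minimize total cost subject to every element being covered at least once.
    prob = pulp.LpProblem("set_cover", pulp.LpMinimize)
    x = [pulp.LpVariable(f"x_{j}", cat="Binary") for j in range(len(subsets))]
    prob += pulp.lpSum(costs[j] * x[j] for j in range(len(subsets)))   # objective
    for e in universe:
        prob += pulp.lpSum(x[j] for j, s in enumerate(subsets) if e in s) >= 1
    prob.solve()   # default CBC backend; other solvers can be plugged in
    return [j for j in range(len(subsets)) if pulp.value(x[j]) > 0.5]
```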
Also, O(N^2) is surprisingly doable even for very big N. Maybe I shouldn't have been surprised, but another problem was N^2 for a pretty big N (100k) with dynamic programming. I was worried it would never finish.
But making a 100k by 100k matrix in Julia really isn't so bad (and it can use memoization); just rent a big machine on GCP for an hour or two, and it chewed through it fast.
Competitive programming gives you some good intuition on these types of things. N of around 100000 is where you stop looking for O(n^2) algorithms, but the key there is you only have single-digit-seconds for your solution to run.
Which is just to say: yeah, machines can chew through _a lot_ of crap, especially if it's not something a user has to wait for and you can do it over-night or something.
The field of parameterized complexity deals with exactly this notion: the idea that some things are NP-hard, but the difficulty only materializes in specific and rare instances, and this does not stop them from being solved in most realistic scenarios.
There is an interesting result in complexity theory that shows that the obvious greedy algorithm for the minimum set cover problem (add the set that increases your coverage the most, repeat) is essentially the best possible polynomial-time approximation. This sometimes causes people to use the greedy algorithm and think they can't do better.
But this result is about the worst case approximation quality-- the quality on pathological inputs. On ordinary inputs the greedy algorithm by itself is not that great and can easily be outperformed by algorithms with heuristics that anyone can come up with after thinking about the problem for a bit. Heuristics with polynomial complexity just don't improve the worst case performance.
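For reference, the greedy heuristic under discussion is only a few lines (a sketch; `universe` is a set and `subsets` a list of sets):

```python
def greedy_set_cover(universe, subsets):
    """Classic greedy heuristic: repeatedly pick the subset covering the most
    still-uncovered elements. Its worst-case approximation ratio is about
    ln(n), essentially the best any polynomial-time algorithm can guarantee,
    but on typical inputs problem-specific heuristics often beat it."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not (uncovered & best):
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= best
    return chosen
```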
I think this is just a more narrow case of the same underlying error: Taking an interesting yet often rather narrow and academic result and assuming it applies more generally. I suspect it's exacerbated by researchers talking up the significance of their work. I suspect it's also partially a result of the fact that an answer can be surprising and worth study yet still not very useful. The fact that we can say much at all about the space of all possible polynomial time algorithms is very surprising, but it's much more surprising than it is practically useful.
The main points I tell people: Results from complexity theory are usually about best and worst cases; practice usually cares much more about typical cases (which might be constrained by the application to never be pathological). Results from complexity theory are usually about exact solutions; usually a good approximation is all we need, or it's fine to solve the problem exactly most of the time but give a bad result in a rare case. Many problems are actually small enough that they can be exhaustively checked. Many problems are large enough that only a very rough approximation will ever meet the timing requirements (so just because you have a polytime algorithm, it might not help). All these factors mean that while it can be useful to know the limits, they often don't matter.
That's a lot of fancy sounding words to say something that is obvious to any programmer. Big O algorithm analysis assumes that input size increases indefinitely. However, in real world problems the input size usually has a natural upper bound. As long as you can write a fast enough implementation (exact or approximate) for the largest conceivable input size, the problem is tractable. All algorithms designed for a bounded input size are technically O(1).
Even reducing it to SAT and invoking a SAT solver is too much. Most of these problems are solvable with simple enumerative recursion with optimizations like memoization, backtracking, and branch-and-bound.
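For instance, a subset-sum decision via plain enumerative recursion, with memoization and a simple bound for pruning (an illustrative sketch, assuming nonnegative values):

```python
from functools import lru_cache

def subset_sum(values, target):
    """Decide whether some subset of `values` sums to `target`.

    Plain enumerative recursion; memoization collapses repeated
    (index, remaining) subproblems, and the suffix-sum bound prunes
    branches that cannot possibly reach the target."""
    values = tuple(values)
    suffix_sums = [0] * (len(values) + 1)
    for i in range(len(values) - 1, -1, -1):
        suffix_sums[i] = suffix_sums[i + 1] + values[i]

    @lru_cache(maxsize=None)
    def go(i, remaining):
        if remaining == 0:
            return True
        if i == len(values) or remaining < 0 or remaining > suffix_sums[i]:
            return False   # branch-and-bound style pruning
        return go(i + 1, remaining - values[i]) or go(i + 1, remaining)

    return go(0, target)
```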
For example, airline computer systems solve NP-Complete problems on a daily basis.
You can also mix problems to change the average complexity.
Take SAT (in NPC) and encode each instance with a word over a binary alphabet (e.g. A and B). Then add 2SAT (in POLY) and encode it using an alphabet with four letters (e.g. A, B, C and D). If a word has only As and Bs, treat it as SAT, otherwise as 2SAT.
If you choose a random instance of length n (= a sequence of n letters from A to D), the probability that it is from SAT (i.e. contains no C or D) is only 1/2^n.
Thus most instances are solvable in poly time, and only with negligible probability do you get a "harder" problem from SAT.
Still, this problem is NP complete, as it is in NP and you can trivially reduce SAT to it.
> "I thought that SAT solvers were widely used in package dependency management. Turns out they’re not!"
Proceeds to link to PubGrub, described as: "Pub's version solving algorithm, called Pubgrub, solves these issues by adapting state-of-the-art techniques for solving Boolean satisfiability and related difficult search problems."
It's basically a SAT solver that works on the problem domain directly - which, they later acknowledge, might be a reason for not using an "off the shelf" solver.
It's just funny to present it this way - but I enjoyed the article nonetheless.
The class P is a huge ocean, and so far we have barely scratched the surface of a tiny droplet of that water. This gives us a false impression of comfort, as if a problem were somehow "solved" merely because it belongs to that class. Typical problems in P are neither linear nor quadratic, but O(n^k) with k a humongous constant built from several iterations of the Ackermann function. That P looks "easy" to us says more about our lack of imagination and of generic examples than about the class itself.
The same thing applies to undecidable problems. Yes, the halting problem is undecidable in the general case. But most real life programs are expected to always halt, or to repeatedly run a procedure that always halts (event loops, servers).
A trivial way to make sure a program halts is to add an upper bound on how much time it takes to run. A more complex way is using theorem provers. Turing completeness is usually not required for real life programs.
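The "upper bound on running time" idea is essentially a fuel parameter; a minimal sketch (names are illustrative):

```python
def run_with_fuel(step, state, fuel=1_000_000):
    """Drive a small state machine, but give up after `fuel` steps.
    `step` returns (new_state, done). Illustrative only: once the budget
    is exhausted, the program is treated as non-terminating."""
    for _ in range(fuel):
        state, done = step(state)
        if done:
            return state
    raise TimeoutError("step budget exhausted; treating as non-terminating")
```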
Typically you'll also see people chip away at Turing completeness. There are programs that in theory are correct, but aren't allowed by the rules of the environment which cannot tell your correct solution from a slightly different and wholly nefarious one.
We don't always need to solve the general problem, and in cases where the general problem is undecidable, you're foolish to even try. Your boss needs to be sat down and told that in this house we obey the Laws of Thermodynamics, Lisa.
Joe is a traveling salesman and has to visit 10 cities, each once. He wants to minimize the driving distance. That is an example of an NP-complete problem. Here NP abbreviates "non-deterministic polynomial", which can use a little explanation (below).

Writing a computer program to solve Joe's problem is easy.
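For instance, a brute-force solver over all (n-1)! orderings is only a few lines (an illustrative sketch; `cities` is assumed to be a list of (x, y) coordinates):

```python
from itertools import permutations
from math import dist, inf

def exact_tour(cities):
    """Exhaustive search over all (n-1)! tours; perfectly fine for Joe's
    n = 10 (362,880 permutations), hopeless for Sam's n = 1000."""
    first, rest = cities[0], cities[1:]
    best_len, best_tour = inf, None
    for perm in permutations(rest):
        tour = (first,) + perm + (first,)
        length = sum(dist(a, b) for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour
```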
Here is one step more generality: Let the number of cities be the positive integer n. For Joe we had n = 10. Joe has a friend Sam with the same problem, but for Sam n = 1000. Sam's problem is harder than Joe's in the sense that the same computer program will run longer for Sam.

Actually, a computer program that will also do well (explained below) for Sam is also easy to write; it should be the same program that did well for Joe.
Now we can get into the "hard" stuff: Joe and Sam are picky and want the minimum distance, down to the last tiny fraction of an inch, perfectly so, all the time, including the worst case, no matter where the cities are. Hmm. They want a perfect program.

Here it appears that the running time of the perfect program on the worst case grows like 2^n, exponential growth. Joe with his n = 10 and especially Sam with his n = 1000 would be very happy with growth of the worst-case running time like n^k for some positive integer k. That is, they want n^k, polynomial, running time.
NP? One approach is to be non-deterministic, that is, use a random number generator to pick an order for the cities and maybe get very lucky, and that is so easy to do that we can do it in polynomial time.

Biggie point: So far no one knows if a program that runs in time n^k is possible. But, remember, Joe and especially Sam are picky -- they want perfection, the minimum distance down to the last tiny fraction of an inch, all the time, on worst-case problems.

Uh, in practice, if we are willing to get close on nearly all real problems, we are in quite good shape.
How? For Joe and Sam, one way is to build a minimum spanning tree of the n cities -- there are at least two simple, fast algorithms for that. Then exploit the fact that the cities are in a space with Euclidean distances: for the solution, just follow the spanning tree, omitting any trips we don't have to take. A paper with this solution is old.
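A rough sketch of that spanning-tree heuristic (Prim's algorithm followed by a depth-first walk that skips repeated visits; an illustration, not the old paper's exact presentation):

```python
from math import dist

def mst_tour(cities):
    """Build a minimum spanning tree (Prim's algorithm), walk it depth-first,
    and shortcut the repeated visits. With Euclidean distances the resulting
    tour is at most twice the optimal length."""
    n = len(cities)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [None] * n
    best[0] = 0.0
    children = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = dist(cities[u], cities[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # Preorder walk of the tree gives the visiting order; skipping the
    # return trips along tree edges is the "omit trips" shortcut.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    tour.append(0)
    return sum(dist(cities[tour[i]], cities[tour[i + 1]]) for i in range(n)), tour
```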
It goes on this way: There are lots of problems in combinatorial optimization, all NP-complete, and if there is a polynomial algorithm for any one of them, then there is one for all of them, which is what the "complete" means.

But, e.g., one of the relatively general (in practical terms of simple problem formulation) NP-complete problems is 0-1 integer linear programming (ILP), and we can tweak the simplex algorithm of linear programming to do well on nearly all such practical problems. A source of such 0-1 ILP problems is ILP set covering.
E.g., I encountered one of those: Consider FedEx and its airplanes, the cities to be served, and the loads to be carried. Each plane leaves Memphis, makes its stops, and returns to Memphis. Call what one plane does a tour. There are lots of candidate tours, but a lot of them can't fly because of scheduled arrival times, limits on loads and distances, etc. So, call the candidate tours that can fly feasible. For each of the feasible tours, find the operating cost. Then the problem is to cover the set of all cities to be served with the feasible tours such that the total cost is minimized. That is an example of 0-1 ILP -- ILP set covering.
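In symbols (notation mine, not the commenter's): with x_j = 1 meaning feasible tour j is flown, c_j its operating cost, and a_ij = 1 when tour j serves city i, the set covering ILP is roughly

\[
\min \sum_{j} c_j x_j
\quad \text{subject to} \quad
\sum_{j} a_{ij} x_j \ge 1 \ \text{for every city } i,
\qquad x_j \in \{0, 1\}.
\]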
0-1 ILP set covering is nice, simple stuff: We get to evaluate the individual tours, or whatever stands in for them in some other problem, considering lots of really goofy costs and constraints, maybe even some probabilistic stuff where we get to take averages, and then do the optimization itself after this initial enumeration work. It's fairly general stuff. Keep it in mind; you might find another application.
The FedEx airplanes are expensive. A good set covering solution should have saved FedEx a lot of money, a significant fraction of what their total operating cost would otherwise have been. The FedEx founder liked my lecture on set covering and assigned me to do the work. But the promised stock was late, so I went for a pure/applied math Ph.D.

With such a set covering problem, by doing well we mean saving a lot of money, and/or nearly all the money that could be saved even if we had a solution that minimized cost down to the last fraction of an inch or dollar or whatever we were trying to save.

And, nicely enough, often the 0-1 ILP techniques, e.g., some cases of duality, Lagrangian relaxation, etc., can tell us when we can't save any more than, say, another 1% of the total. One problem I did once, in 0-1 ILP, but not for FedEx, with my solution I knew that I couldn't save more than another 0.025% of the total cost. That problem had 40,000 ILP constraints and 600,000 ILP 0-1 variables -- that is, it would not be regarded as small and trivial, and maybe not even as just medium sized.
So, generally we can do well for Joe and even Sam, even though the question about NP-completeness, polynomial time, minimum cost, and worst-case problems is not solved.

Net, in practice, that a problem is NP-complete is, in practice, nearly always of no concern, in practice. In practice, if we save $5 million a day and a perfect solution might save another $100 a day, in practice no one cares, in practice. Did I mention "in practice"? They care a lot about the $5 million and don't want to decline to save the $5 million because there might be $100 more to be saved that they don't know how to save.

By now the techniques of 0-1 ILP and, more generally, discrete or combinatorial optimization, e.g., in the fields of operations research and optimization, are old, 50+ years old. There are lots of carefully written, clever books and papers and lots of mature software.
The question of a polynomial algorithm for the NP-complete problems is interesting, curious, one heck of a challenging research problem, maybe a start on some really important work in math and applications of wide variety, maybe a pillar of what is fundamental about computing, etc. But, in practice, for Joe, Sam, FedEx, and more, we've been able to do well for many decades.

Early in the discovery of NP-complete problems there was a book written at Bell Labs. Bell needed to do network design, and that was discrete optimization. At the start of their book they confessed that they really didn't know how to solve the design problems, and then gave a cartoon showing a long list of other people who didn't know either. But by solve they were picky like Joe and Sam -- minimum cost down to the last tiny fraction of the last dollar or penny, even on worst-case problems.

Since then, at the least, experience has shown that their confession and cartoon were way too pessimistic, and that saving the first, say, 99% on essentially all the actual, practical problems was doable. Their attitude was one of the planet's biggest examples, for some work important in practice, of letting the perfect stand in the way of the very good.