Faster Integer Programming (acm.org)
167 points by pseudolus 4 months ago | 55 comments



> “In practice they are very efficient. They solve industrial-scale problems all the time,” sometimes with tens of thousands of variables.

Although the number of variables or constraints is not a good measure of the hardness of an ILP, "tens of thousands" is not that unusual. During my PhD, I often solved programs with millions of variables and constraints. These programs were so large that even "building" them in the solver library took several seconds. Impressively, Gurobi was often able to optimize them in under 24 hours. I have two results here where Gurobi solved problems with roughly 1M variables and 8M constraints to optimality in under 2 hours. That was several years ago, so I expect newer versions to be even faster. Even for larger problems (around 5M variables and 50M constraints) it often found provably near-optimal solutions (within 5% of optimality) very quickly. Gurobi is really an impressive piece of software, and free for academic use.


I was on the CPLEX team for a few years. The core developers are all PhDs from Stanford, MIT, etc., so it's very hardcore stuff, no less than AI research.

Completely agree that size is not a good proxy for estimating MIP difficulty. Internally we collected a bunch of very hard problems from different domains. Some are actually pretty small, say a few thousand variables/constraints. IMHO what makes hard problems difficult to solve is the 'internal structure' of the problem. Modern industrial solvers all have a lot of built-in heuristics to take advantage of that structure, i.e., which kinds of cuts and which presolve/diving/branching strategies to apply, and how to get good bounds ASAP.

Interestingly, at some point some folks even tried using machine learning to predict strategies. It didn't work very well back then (10+ years ago). There was some work on using seq2seq for MIP (a pointer network, I think) a few years ago; it worked OK. So I'm really looking forward to some breakthroughs from LLMs.

It's a shame that after IBM acquired ILOG (which owned CPLEX), most of the people left for Gurobi.


> IMHO what makes hard problems difficult to solve is the 'internal structure' of the problem.

20 years ago I wrote a master’s thesis in computer vision. The stereo matching algorithm I developed could be expressed as a big integer linear program. But after pondering it for some time I realized it could also be expressed as a dynamic programming problem, with tiny integer linear programs as subproblems. That reduced the runtime by a factor of 1000 or more.


Not that uncommon.

I feel most of those big search problems could be solved much more easily and quickly with some form of annealing, tree search, dynamic programming, or greedy algorithms, with results very close to the theoretical linear optimum.

But of course those won't get you a thesis ;)


> with results very close to the theoretical linear optimum

In this case I could prove it was the globally optimal solution.

But this was only possible of course due to the internal structure of the problem: it was in effect a simpler problem hiding within a linear integer program. Standard solvers couldn’t find this structure, but it was possible to do by hand.


Reformulation is a good strategy for many hard problems.

It might also be possible to add problem-specific cuts/heuristics to the solver so that it can solve it quickly.


Wow, this sounds interesting! Do you have a link to that thesis?


Unfortunately not. It’s on paper and in Swedish only.


What's the situation with CPLEX these days? Is IBM still investing any serious resources into it? Gurobi keeps getting massively better (either in performance or expressiveness) every release.


CPLEX has shown no progression in the benchmarks in recent years, so it is safe to assume that the number of developers they employ is either 0 or maybe 1.


As far as I know it is just a shadow of its former self. This is mostly the reason why we migrated to Gurobi.


Are you aware of any survey papers on some of the strategies used by (near) state of the art solvers?


Not the same person but maybe "Mixed-Integer Nonlinear Programming: A Survey of Algorithms and Applications" could be a good start? I haven't read it yet.

HiGHS is the best open source MILP solver at the moment, and MATLAB of all things has a pretty good explanation of how it works internally: https://nl.mathworks.com/help/optim/ug/mixed-integer-linear-...


I am an academic in the field. A good starting point would be Tobias Achterberg's PhD thesis:

"Constraint Integer Programming"

It details the implementation of SCIP, which is one of the leading open source solvers, and explains some of the most important tricks.

That said, there is a lot of literature out there; if you're interested in a particular aspect, e.g. presolving or cutting planes, then feel free to ask me further questions. It is worth noting that the top commercial solvers keep their implementation specifics hidden, so these are difficult to find anywhere.


thank you very much!

when i investigated the situation five years ago in https://dercuano.github.io/notes/linear-optimization-landsca..., scip was not open source, though zimpl was. looking at https://github.com/scipopt/scip?tab=License-1-ov-file#readme, it appears that scip has become open-source since then, which is very welcome news!

how does highs compare to scip?


I love how researchers state that a piece of code doing actual work is "slow" once it takes a few seconds to run, and yet Windows 11 often takes multiple seconds to show me the context menu when I right-click something.

We need more optimization in this industry.


> We need more optimization in this industry.

The Windows 11 context menu lag (and overall Windows bloat) is a product of Conway's law. Microsoft creates bloated software because it's a monopoly. So, "optimization" in this case would essentially be antitrust action.


Every other website is a mess of bloated and slow JavaScript code, both large and small sites alike.

I think the explanation isn't quite as simple as you make it seem.


I think the other piece going into it is that hardware keeps getting faster. I would imagine especially so the hardware of the average Windows developer. If something doesn't generally appear to be slow to those building it, they're unlikely to spend a lot of time optimizing.


I'm a fan of including benchmarks in CI pipelines. The Windows developers should have a historical record [1] of the performance degradation and be forced to look at it daily.

[1]: https://x.com/jmmv/status/1671670996921896960


To be fair, that machine is pretty high-spec for NT 3.51, given that the minimum requirements are 12 MB RAM and a 386 (which tops out at 40 MHz). It'd certainly be nice to see that kind of performance in newer Windows versions, though.


> I think the other piece going into it is that hardware keeps getting faster.

It's also that you're expected to update Windows whether or not they've actually improved performance. Why would Microsoft prioritize that when the bulk of their sales are driven by other concerns, like enterprise sales and what some might call basic functionality (e.g. Windows Defender when it was first released)?

Hell, now they're moving to a service model and there's even less reason to court consumers directly.


it depends on what you're using it for. if you use an optimization algorithm after every keystroke to relayout the paragraph you're typing, it had better not take a few seconds to run. (and indeed there are optimization algorithms for that problem in particular that are much faster than a few seconds.)


Anyone have any idea how large Gosplan's* matrices were?

How about matrices used by RAF/AAF targeters for mincut purposes, eg https://en.wikipedia.org/wiki/Oil_campaign_of_World_War_II ?

* surprisingly, en.wikipedia has more information than ru.wikipedia, but neither even has a hint as to headcounts over the years, let alone more technical details.


I have the book "Planning in the USSR" somewhere, packed away. I remember it discussing the number of variables and the size of the problem. Sorry, I don't remember the numbers off the top of my head.

Obviously it didn’t work out though.

The other book worth getting is “red plenty”.


Planning in the U.S.S.R.: Problems of Theory and Organisation

It is available on Anna's Archive, along with other similar books (search: "Planning in the USSR").

LOOK UPON ME AND DESPAIR!


Interesting, do you mind telling a bit about the context where the problems you were solving arose? What was the application?

What kind of hardware did you need for such large problems? Single machine or cluster?


It was a graph drawing problem, solved on a single 16 core machine with around 64 GB of RAM.


Out of curiosity, how does one prove near optimality in problems like this?

E.g., do you assume that the optimal value is the one produced by the non-integer LP?


It is called duality. For each linear program that is a maximization, you can find a dual linear program which is a minimization. They both have the same optimal objective value, so one serves as a lower bound and the other as an upper bound on that value. The gap between them is used to measure closeness to the global optimum.

The great thing is that you can write down the dual really easily, and any feasible solution of the dual gives a bound on the optimal value of the primal.

The math behind that is really neat.
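
To make that concrete, here is a tiny sketch with made-up numbers, using scipy.optimize.linprog (which minimizes, so the maximization is negated):

    import numpy as np
    from scipy.optimize import linprog

    # Primal: max 3x + 2y  s.t.  x + y <= 4,  x + 3y <= 6,  x, y >= 0
    c = np.array([3.0, 2.0])
    A = np.array([[1.0, 1.0], [1.0, 3.0]])
    b = np.array([4.0, 6.0])
    primal = linprog(-c, A_ub=A, b_ub=b, method="highs")

    # Dual: min b^T u  s.t.  A^T u >= c,  u >= 0  (rewritten as -A^T u <= -c)
    dual = linprog(b, A_ub=-A.T, b_ub=-c, method="highs")

    # Both print 12.0; any feasible dual value bounds the primal from above.
    print(-primal.fun, dual.fun)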


LPs can prove optimality of a result, but sometimes this is too expensive and you stop as soon as you reach near-optimality based on known bounds.

For several problems you can prove lower and/or upper bounds on the optimal result. For example, you can prove that a route between two points cannot be shorter than the straight-line distance between them.

If you use (I)LP in production you often don't care about optimality and can work with suboptimal results just fine. Worst case, you stop after a timeout and pick the best result found so far.

Edit: I forgot to mention that while solving an ILP you usually get dynamically computed bounds. The branch and bound algorithm works by solving relaxed subproblems, whose objective values bound the optimum of the original problem.
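
A crude way to see the bound/gap idea with PuLP's default CBC solver (toy knapsack data made up for the example; real solvers compute such bounds internally during branch and bound):

    import pulp

    values  = [10, 13, 7, 8]
    weights = [4, 5, 3, 4]
    cap = 8

    def build(cat):
        prob = pulp.LpProblem("toy_knapsack", pulp.LpMaximize)
        x = [pulp.LpVariable(f"x{i}", lowBound=0, upBound=1, cat=cat)
             for i in range(len(values))]
        prob += pulp.lpSum(v * xi for v, xi in zip(values, x))
        prob += pulp.lpSum(w * xi for w, xi in zip(weights, x)) <= cap
        return prob

    relaxed = build("Continuous")              # LP relaxation: an upper bound
    relaxed.solve(pulp.PULP_CBC_CMD(msg=False))
    bound = pulp.value(relaxed.objective)      # 20.5

    integer = build("Integer")                 # any feasible integer solution: a lower bound
    integer.solve(pulp.PULP_CBC_CMD(msg=False))
    incumbent = pulp.value(integer.objective)  # 20.0

    print(bound, incumbent, (bound - incumbent) / incumbent)  # gap of 2.5%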


I helped implement production and labor planning software on top of FICO Xpress some years ago. The paradigm of LP/ILP was all new to me, even though I was very much into math. Our software was solving hierarchical/nested optimization problems involving millions of variables and constraints every 15 minutes. It felt like magic that this was possible. I would really encourage anyone who has never worked with these tools to explore them, as they can open up a palette of tools and ideas you may not have considered before. A great free way to get into this is to read the examples and use PuLP for Python.
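
For anyone who wants to try it, here is a tiny PuLP sketch in that spirit (a made-up two-product toy, nothing like a real planning model):

    import pulp

    # Maximize profit from two products sharing machine time and labor.
    prob = pulp.LpProblem("tiny_production_plan", pulp.LpMaximize)
    a = pulp.LpVariable("units_a", lowBound=0, cat="Integer")
    b = pulp.LpVariable("units_b", lowBound=0, cat="Integer")

    prob += 40 * a + 30 * b        # objective: profit
    prob += 2 * a + 1 * b <= 100   # machine hours
    prob += 1 * a + 2 * b <= 80    # labor hours

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print(pulp.LpStatus[prob.status], a.value(), b.value(), pulp.value(prob.objective))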


The best solvers are really impressive. Unfortunately, they also cost a very large sum of money.

We should be happy that many of the best tools in other areas are free.


Pretty cool you got to see this stuff in the real world!

Questions:

1) Was the problem able to be solved with one machine?

2) Was the problem reliably able to be solved within a time-bound, or was it "spiky"?

3) Was the solution all or nothing? Aka would you get a partial/suboptimal solution after a time-bound?


Not an author but

1) It is usually done on a single machine, oftentimes even on a single core.

2) Spikiness of the solve time is a real problem in practice.

3) You get partial solutions, but they tend to be far apart in time, each bringing a large improvement.


Thank you, that's useful!


this is a pretty good overview; it's unfortunate that they omitted linear from the title of the article, because it's pretty crucial

my notes on the landscape five years ago: https://dercuano.github.io/notes/linear-optimization-landsca...


The optimization solver landscape changes slowly enough that a 5-year-old overview is still very relevant. However, in recent years the open source solver HiGHS (https://highs.dev/) has emerged and even surpassed the older open source solvers.


thank you very much!


Linearity has been a sticking point since the very beginning, with the famous exchange between Hotelling and Von Neumann. MILP solvers are amazing, and what they can solve feels like it should be impossible, but even so you get stuck sometimes banging nonlinear pegs into linear holes.


I wish the original article offered code, similarly to yours. I wish you could elaborate more in the Octave section though.


dercuano is a book i published in 02019; it's not being updated. you can download the final copy at http://canonical.org/~kragen/dercuano-20191230.tar.gz or a somewhat inferior pdf version for cellphones at http://canonical.org/~kragen/dercuano.20191230.pdf

it's in the public domain, so you can take the article and make your own copy of it with the expanded octave section you wish for


Note this is an improvement to a theoretical bound and doesn’t present any immediate improvements for most practically solvable Integer Programming problems (which are generally solved with more efficient heuristics).


Yes, and in practice you also often want to solve 'mixed' integer programming problems, i.e. those that have both continuous and integer variables.

That works really well in practice (because most solvers relax your (mixed) integer programming problem to a fully continuous problem anyway, and then try to 'repair' any fractional solutions as necessary), but is apparently really hard to analyse in theory.
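
A crude sketch of that relax-and-repair idea (made-up data and a hypothetical rounding step; real solvers use much smarter rounding/diving heuristics plus branch and bound):

    import math
    import pulp

    # min 3x + 2y  s.t.  2x + y >= 5,  x + 3y >= 6,  x integer >= 0, y continuous >= 0
    def build():
        prob = pulp.LpProblem("mixed_toy", pulp.LpMinimize)
        x = pulp.LpVariable("x", lowBound=0, cat="Continuous")  # x relaxed to continuous
        y = pulp.LpVariable("y", lowBound=0, cat="Continuous")
        prob += 3 * x + 2 * y
        prob += 2 * x + y >= 5
        prob += x + 3 * y >= 6
        return prob, x, y

    # 1) Relax: treat the integer variable x as continuous and solve the LP.
    relaxed, x, y = build()
    relaxed.solve(pulp.PULP_CBC_CMD(msg=False))
    print("relaxed:", x.value(), y.value())     # x = 1.8, fractional

    # 2) "Repair": round x up, fix it, and re-solve for the continuous part.
    repaired, x2, y2 = build()
    repaired += x2 == math.ceil(x.value())
    repaired.solve(pulp.PULP_CBC_CMD(msg=False))
    print("repaired:", x2.value(), y2.value())  # feasible, not necessarily optimal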


Once we know it can be done, there is a much higher chance somebody will look really hard into it, and with a bit of luck they will come up with a practical algorithm. Or at least a good one for some edge cases.


It misses Julia's JuMP: https://jump.dev/JuMP.jl/stable/


That seems to be only a frontend for declaring the problems, it is not a solver itself.


ah yes, my memory did not serve me well :-( thanks for your comment :-)


What struck me when writing integer programs was how quickly the rule and variable count grows with the problem size. Expressing simple ideas often required O(n²) rules, as every pair of objects needed a constrained relation. The default solver in PuLP solved all the problems with ease.
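
For example (a made-up sketch with hypothetical "conflict" data): even a rule as simple as "never pick two conflicting items" needs one constraint per pair:

    import pulp
    from itertools import combinations

    n = 20
    # Hypothetical conflict pairs; in a real model these come from the problem data.
    conflicts = [(i, j) for i, j in combinations(range(n), 2) if (i + j) % 3 == 0]

    prob = pulp.LpProblem("pairwise_toy", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(n)]
    prob += pulp.lpSum(x)          # pick as many items as possible

    # One constraint per conflicting pair -- O(n^2) of them in the worst case.
    for i, j in conflicts:
        prob += x[i] + x[j] <= 1

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print(int(pulp.value(prob.objective)), "items selected,", len(conflicts), "pairwise constraints")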


> “The proof basically says that you can solve an integer linear program, in theory, almost as well as you can solve one where the integer variable is only zero or one,”

I'm not sure I understand this. It's usually the opposite way

A linear problem is super fast. With integers it goes to branch and bound (a kinda fancy name for a systematic search). But lots of 0/1 variables just make your problem slow.


If the parameters are allowed to be real numbers, it's very fast: just move "up" to an edge (constraint), then follow the edges upward to your maximum.

If the parameters need to be integers, it becomes way more difficult. The optimum for the continuous case is probably not a valid integer solution, so you need to enumerate a lot of points.

If the parameters can only be 0 or 1, it's a special case of the integer case: you still need to enumerate, but far fewer points.

Hope this helps
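
A toy illustration of that middle point, with made-up numbers in a PuLP sketch: the relaxed optimum is fractional, and simply rounding it is not the integer optimum.

    import pulp

    values  = [8, 11, 6, 4]
    weights = [5, 7, 4, 3]

    def solve(cat):
        prob = pulp.LpProblem("toy", pulp.LpMaximize)
        x = [pulp.LpVariable(f"x{i}", lowBound=0, upBound=1, cat=cat) for i in range(4)]
        prob += pulp.lpSum(v * xi for v, xi in zip(values, x))
        prob += pulp.lpSum(w * xi for w, xi in zip(weights, x)) <= 14
        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        return [xi.value() for xi in x], pulp.value(prob.objective)

    print(solve("Continuous"))  # ~[1, 1, 0.5, 0], 22.0 -- fractional optimum
    print(solve("Integer"))     # [0, 1, 1, 1], 21.0   -- rounding down would only give 19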


> If the parameters are allowed to be real numbers, it's very fast: just move "up" to an edge (constraint), then follow the edges upward to your maximum.

Sounds like the simplex method, which is complicated and/or has potentially quite bad worst-case complexity. (You can implement it naively in very little code, and I’ve done this in some programming competitions, but you shouldn’t do this for serious use.) You can achieve better asymptotic behavior with “interior point” methods, where you arrange for all the constraints to repel your optimum, solve a (nice, convex, polynomial time) problem to find the best point that touches none of the constraints, and then relax the repulsion from the constraints a bit and iterate. All that being said, a good simplex implementation may well run circles around interior point methods for well-behaved inputs.

(I’m a fan of interior point methods because they can also naturally handle fairly large classes of nonlinear constraints while still running in more or less polynomial time. Check out CVXOPT for a nice example of what you can do with great ergonomics.)
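
For a taste of CVXOPT's ergonomics, a minimal LP sketch with toy numbers (solvers.lp uses an interior-point method by default):

    from cvxopt import matrix, solvers
    solvers.options['show_progress'] = False

    # max 4x + 5y  s.t.  2x + y <= 3,  x + 2y <= 3,  x, y >= 0
    # (written as a minimization; cvxopt.matrix is column-major, one list per variable)
    c = matrix([-4.0, -5.0])
    G = matrix([[2.0, 1.0, -1.0, 0.0],   # coefficients of x in the four <= rows
                [1.0, 2.0, 0.0, -1.0]])  # coefficients of y
    h = matrix([3.0, 3.0, 0.0, 0.0])

    sol = solvers.lp(c, G, h)
    print(sol['status'], list(sol['x']))  # 'optimal', roughly [1.0, 1.0]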


The point I was trying to make (it seems I failed) is that in the real-valued case you have a sense of direction (a gradient), while this does not really exist in the discrete case.


all the stuff you're talking about is continuous optimization, but the hard problem that the article is about is discrete optimization


I believe they're talking about unbounded integer programming vs binary integer programming.



