The Traveling Salesman with Simulated Annealing, R, and Shiny

cm2012 · on Sept 17, 2014

I once had a real life issue with the travelling salesman problem - I wanted to invite boutique fashion retailers to a conference in '09, and I thought the best way would be to tell them about it face to face. I mapped out the address of every single one on some app on top of Google Maps, and ended up optimizing based on how clustered different groups of businesses were, to minimize my walking time while maximizing number of visits.

e_modad · on Sept 17, 2014

That is so cool! I love HN for stories like this.

graycat · on Sept 17, 2014

Of course the traveling salesman problem is combinatorial optimization, and of course Uber, Lyft, Sidecar, etc. have such problems, as has long been the case for dial a ride.

And, for the case of trips regularly planned, e.g., using one car to get the same several people to work each day, would have a deterministic problem. Of course, part of the problem would be which people shared which car.

Otherwise there is something of a probabilistic problem where the driver gets a travel request at some random time for some random origin and destination and has to decide what to do, e.g., accept the offer or not and how to fit the offer in with what else he is doing.

Might be a lot of work, but might save some money and/or give better service to the customers.

At one time, my Ph.D. advisor wanted me to work on dial a ride scheduling. But I already had a problem I'd made good progress on and used that for my dissertation instead! One reason: For the problem I used, I could claim to have a solution that was optimal in a quite strong sense, but for dial a ride any solution would be only a big mess probabilistically, some big mess that was just heuristic, and that could be evaluated only empirically. Bummer.

Now, I also have another problem! So, anyone who wants to work on combinatorial optimization and hope to get interest from Uber, Lyft, Sidecar, etc. -- go for it!

graycat · on Sept 17, 2014

Nice visualization of an application of simulated annealing.

We should keep in mind that the OP is doing the traveling salesman problem (TSP) in the plane. So we should recall the nice Karp result in about 1972: With meager assumptions on the probability distribution of the cities, for any number x, no matter how small, as the number of cities n goes to infinity, there is a simple approach that will come within x% of optimality with probability 1 - x.

This simple approach? Start by connecting the cities with a minimum spanning tree. Then for the TSP tour, just traverse the spanning tree except going directly to the next city in the tree traversal not already visited. That is, when following the tree would cause visiting a city a second time, just go directly, not following an arc in the tree, to the next city in the tree traversal not yet visited.

So intuitively the value is, for large n, get about as close, about as often as wish, and for small n just enumerate!

Intuitively what is going on is that one of the main challenges is identifying the appropriate clusters of cities and, then, connecting the clusters, and the minimum spanning tree tends to find the right clusters.

Of course, there are at least two algorithms for finding minimum spanning trees, and as I recall both run in time proportional to n^2. So, we have a polynomial algorithm that is, for large n, for TSP in the plane, as close as we please.

For the "plane", I meant R^2, where R is the set of real numbers, that is, ordinary Euclidean 2-space. But Karp's approach also works the same in R^m for any positive integer m.

hi · on Sept 17, 2014

This is awesome and I can see so many use cases with logistics in cities like NYC by using addresses, and not capitals, as waypoints.

I would love to see this applied to the NYC taxi data to see how to optimise taxi movements and see how much ridesharing would decrease car traffic.

Code is here: https://github.com/toddwschneider/shiny-salesman

mck- · on Sept 17, 2014

I really don't mean to self-promote, but this is what we do: Route-Optimization-as-a-Service [1]

[1] https://www.routific.com/developers

cfontes · on Sept 18, 2014

How is the business like?

I kind of do the same thing but for a different sector, but we are not SaaS so would be nice to know.

hi · on Sept 17, 2014

sweet

HCIdivision17 · on Sept 17, 2014

I enjoy finding retrospectively obvious details from essays like this: an optimal solution won't have any crossing paths, and simply swapping cities is a handy way to try detangling the path! The animated graphs make this plain, but is it required? Having never thought about the TSP much (and not really having the mathematical chops to prove it myself) I looked it up, and sure enough SO user lhf [0,1] had a link to a grindingly in-depth article about the problem [2].

[0] http://stackoverflow.com/questions/2444125/crossing-edges-in... [1] http://stackoverflow.com/a/2444288 [2] http://www.ams.org/samplings/feature-column/fcarc-tsp

pushedx · on Sept 17, 2014

Simulated annealing is used heavily in chip design automation (EDA). I first used it when writing a toy placement tool while taking a course on the subject.

Slide 5 of this deck shows some of the problems in chip-design, most of which have been heavily optimized by simulated annealing. http://www.ispd.cc/slides/ISPD12Slides/ISPD12_3_2.pdf

I love the beauty of the analogy to annealing a metal in metallurgy, and how surprisingly well it translates into software. The fact that the result is non-deterministic, but always useful, makes me smile.

eevilspock · on Sept 17, 2014

The probability of accepting an inferior tour is a function of how much longer the candidate is compared to the current tour, and the temperature of the annealing process.

Shouldn't the distance of the "nearest" neighbor be a function of temperature, rather than (or at least in addition to) the probability of acceptance? Isn't that how simulated annealing can escape a local minimum far from better or global maxima? Intuitively that is how the real annealing process works.

pushedx · on Sept 17, 2014

One of the fundamental premises of simulated annealing is this "looking backwards" approach. It's for the sake of the sanity of the implementor. How do you modify scalar values that determine a random process in such a way that the outcome does not exceed some error condition? You can't (in general), that's why you turned to annealing in the first place. You were unable to control error bounds. Instead, you make a completely random guess and only accept it if it's within the current "acceptable" range. This is easy to implement and gives suprisigly good results.

makosdv · on Sept 17, 2014

It's cool to see this kind of stuff. I used simulated annealing for my thesis on cellular network optimization back in college. Haven't used any algorithms like this since then, but they have their uses.

tjl · on Sept 17, 2014

I used simulated annealing for my Master's thesis. I was trying to fit a differential equation to some experimental results. With small changes in the parameters, you could get huge changes in the solution of the DE. For some of the experiments, the parameters it chose fit almost perfectly. As a result, simulated annealing is one of the techniques I look at for highly non-linear optimization.

Fede_V · on Sept 18, 2014

Simulated annealing is not required for non-linear optimization. It's useful for non-convex optimization. Non-linear but convex problems are trivial to solve with Newton's method - a simple example is finding the weights of an SVM.

tjl · on Sept 19, 2014

True. But, in my case, I didn't know if it was convex. A small change of parameters could go from a feasible solution to an infeasible solution.

joshvm · on Sept 17, 2014

Presumably this could be extended to include, say, average flight costs and other modes of transport so you could work out the an optimally fast and cheap way to travel around the world?

alanctgardner3 · on Sept 17, 2014

It depends on how much you want to idealize things. If you add the assumption that all flights/trains/etc. are available every day, and the price never changes, then yes. In reality, trying to traverse a weighted graph optimally with a bunch of constraints (arrival times, how long to stay in each place, preference for difference modes of transit, transit times) is a whole field of research in math/CS. Operations research [1] is the study of optimizing processes with many interrelated factors.

1. http://en.wikipedia.org/wiki/Operations_research

pyk · on Sept 18, 2014

Wow, the article didn't even mention the "original" solve of the 48 US city problem (one in each state) in 1954 (they included Washington D.C., so 49 to be precise)! For those interested, Dantzig (creator of the simplex method), Fulkerson and Johnson made quite the stir when they tried out the TSP at Rand in California more than half a century ago in their paper, "Solution of a large scale traveling salesman problem" [1].

The paper contains a historical context of the problem and their methods towards finding and proving (important!) that their solution was in fact optimal.

For those interested in Dantzig, he was the famed PhD student who came late to class one day and wrote down two open problems mistakenly thinking it was homework. He ended up using those solutions as a portion of his PhD thesis [2].

Here's an excerpt from Dantzig himself:

It happened because during my first year at Berkeley I arrived late one day at one of [Jerzy] Neyman's classes. On the blackboard there were two problems that I assumed had been assigned for homework. I copied them down. A few days later I apologized to Neyman for taking so long to do the homework — the problems seemed to be a little harder than usual. I asked him if he still wanted it. He told me to throw it on his desk. I did so reluctantly because his desk was covered with such a heap of papers that I feared my homework would be lost there forever. About six weeks later, one Sunday morning about eight o'clock, [my wife] Anne and I were awakened by someone banging on our front door. It was Neyman. He rushed in with papers in hand, all excited: "I've just written an introduction to one of your papers. Read it so I can send it out right away for publication." For a minute I had no idea what he was talking about. To make a long story short, the problems on the blackboard that I had solved thinking they were homework were in fact two famous unsolved problems in statistics. That was the first inkling I had that there was anything special about them.

A year later, when I began to worry about a thesis topic, Neyman just shrugged and told me to wrap the two problems in a binder and he would accept them as my thesis.

The second of the two problems, however, was not published until after World War II. It happened this way. Around 1950 I received a letter from Abraham Wald enclosing the final galley proofs of a paper of his about to go to press in the Annals of Mathematical Statistics. Someone had just pointed out to him that the main result in his paper was the same as the second "homework" problem solved in my thesis. I wrote back suggesting we publish jointly. He simply inserted my name as coauthor into the galley proof.

[1] http://www.iro.umontreal.ca/~gendron/IFT6551/LECTURES/TSP.pd...

[2] http://www.snopes.com/college/homework/unsolvable.asp

pradn · on Sept 17, 2014

Lovely site design and clear explanation of simulated annealing. I've always more or less equated hill climbing and simulated annealing. Your article makes the difference easy to understand. Thanks!

chromaton · on Sept 17, 2014

I believe the effectiveness of search algorithms goes something like this:

Monte Carlo < Hill climbing < Simulated annealing < Genetic algorithm < Differential evolution

scishop · on Sept 17, 2014

No. It's completely context dependent.

A hill-climbing algorithm will take the lunch of every algorithm listed here if you have a clean convex optimization problem.

chromaton · on Sept 17, 2014

Where would you find such a problem in the real world?

gh02t · on Sept 17, 2014

Absolutely everywhere, it is a common subproblem in many many algorithms. For example, the basin hopping algorithm (sister of SA) solves one at every step.

A specific example is in image deblurring via Tikhonov regularization, which involves minimizing a function that is provably convex for many physically realistic types of blurring.

tjl · on Sept 17, 2014

For my Master's thesis, I couldn't get satisfactory results with a genetic algorithm, even after spending quite some time tweaking the parameters from the default. But, with the default parameters of the simulated annealing code, I got some excellent results. For a few cases, I managed to get my model to fit almost perfectly with the experiment. For those cases, the only spots it didn't fit were parts where the experimental apparatus wasn't characterized well.

So, it's highly problem dependent.