Read the top section then the first answer in the Q&A at the bottom.
Basically the problem is to find the least-cost (shortest) path across a graph of nodes from point a to point b, but the best algorithm we have (Djikstra’s) runs in exponential time in the worst case when there are some negative edge weights (see context below). This guy found an algorithm that runs must faster, in near-linear time.
Context that is helpful is to think of edge weights not as distances, but as time or cost (this allows negative weights). Imagine a process workflow or something.
Another helpful context that emerges from this is “negative cycle” which is an endless loop of negative weight edges (therefore asking an algorithm to find the “shortest” lowest weight path from one point to another that includes that cycle will fail). You can have negative weights without negative cycles though
> but the best algorithm we have (Djikstra’s) runs in exponential time in the worst case
Dijkstra doesn't handle negative weights so in that case you can use Ford-Bellman. It's slower than Dijkstra but far from exponential - O(VE) or something like that.
I took this straight from the Q&A I referenced in my link:
Q. Does Dijkstra's algorithm work with negative weights?
A. Yes and no. There are two shortest paths algorithms known as Dijkstra's algorithm, depending on whether a vertex can be enqueued on the priority queue more than once. When the weights are nonnegative, the two versions coincide (as no vertex will be enqueued more than once). The version implemented in DijkstraSP.java (which allows a vertex to be enqueued more than once) is correct in the presence of negative edge weights (but no negative cycles) but its running time is exponential in the worst case.
> The version implemented in DijkstraSP.java (which allows a vertex to be enqueued more than once) is correct in the presence of negative edge weights (but no negative cycles)
This is a strange parenthetical. No shortest-path algorithm can be correct in the presence of a negative cycle; the concept of "shortest path" does not apply to such a graph.
The video presentation from the author is approachable even for a non-domain expert. It looks like this has pretty broad applicability to applications in chemical processes, scheduling, logistics (pick ups and drop offs along the route), re-fueling, etc.
I think @DropPanda gave a great example of regenerative charging in an EV.
In the video the author does use cost/price-function terminology.
Djikstra doesn't handle negative weights, Bellman-Ford is optimal for positive edge only. This apparently is also easier to parallelize.
An interesting use case is if you want to need to apply graph algorithms to a problem where edges multiply instead of adding. Just take the log of the multiplier and now your multiplication problem is addition.
For example, suppose you cannot exchange the US dollar for the Russian ruble. You have to go through some path such as USD -> EUR -> RUB. Of all possible paths, which one gives you the most favorable exchange? You can use Bellman-Ford to find out.
> An interesting use case is if you want to need to apply graph algorithms to a problem where edges multiply instead of adding. Just take the log of the multiplier and now your multiplication problem is addition.
I don't see what this has to add over just using the weights as they are. Dijkstra's algorithm tracks the distance from a source node to a target node. There's nothing about the algorithm that says that distance must be the sum of all the edge weights along a path; if it's the product, the algorithm will work without any changes.[1] The only requirement on the distance function is that you know how to calculate d(s,t) from the two values d(s, t') and d(t', t).
[1] You'll still have problems [with Dijkstra's algorithm] if some weights are greater than 1 and some aren't, which is the exact analog of the problems that occur when distance is additive and some weights differ in sign. But that has no bearing on whether you can use multiplication directly or you need to convert it to addition.
>There's nothing about the algorithm that says that distance must be the sum of all the edge weights along a path
I think it relies on the assumption that d(a, c) >= d(a, b) + d(b, c). You could relax this of course, but then you're just back to bellman-ford algorithm.
When your distance aggregation is instead multiplicative and you allow multipliers < 1, this no longer holds.
Edit: I see your footnote already mentions this. I agree, the key point here isn't really whether things add or not, it's whether your aggregate distances are monotonically increasing or not. If they are, you can use dijkstra. If not, you have to use bellman ford or the fancier algorithms like the one in the arxiv.
You're right about the difference between Dijkstra and Bellman-Ford. But I thought the question was "why would you want negative edge weights?", and this doesn't seem like an answer to that.
> For example, suppose you cannot exchange the US dollar for the Russian ruble. You have to go through some path such as USD -> EUR -> RUB. Of all possible paths, which one gives you the most favorable exchange?
Why is it necessary to assume I can't exchange dollars directly for rubles? If that is possible, wouldn't I still want whatever path has the most favorable exchange rate?
Imagine a step in a process that saves you cost/time (negative edge weight) if you do it at one point, but adds cost/time (positive edge weight) if you do it at a different point
Negative cycles are bad for the exact reason you said. Not "bad" from the perspective of "let's lower the cost as much as possible", but "bad" from the perspective that it violates certain assumptions of Dikjstra's algorithm which leads it to produce incorrect results.
When have a negative weight cycle, you can drive down the cost to be arbitrarily low by running through that cycle as many times as you'd like. What does "shortest path" or "lowest cost path" mean in this context then?
Nit: It's not negative weight cycles that mess up Dijkstra's, it's negative weights in general.
Dijkstra's is effectively a greedy algorithm for finding shortest paths. By their very nature greedy algorithms cannot "look ahead." If there is a negative weight edge locked behind a expensive edge, Dijkstra's won't be able to see the negative weight edge until it's too late.
10
A -----------------> C
\ ^
\ 11 -15 /
\-----> B -----/
I have seen this in video games where there is some resource that respawns too quickly and the players just end up cycling back to the powerup every n-seconds. Easy to get stuck in a local minimum.
It creates a new situation you'll have to handle. If it's assumed the shortest path is finite, how many steps should it maximally take, etc? In many problem settings a negative cycle would indicate the model is broken, and you'd want to know that.
Nit: In a graph with negative cycles, shortest paths are still well-defined for vertices s to t if there is no path from s to a negative cycle or if there is no path from a negative cycle to t.
In the simplest case, suppose you have a graph with exactly one negative cycle that is completely isolated from the rest of the graph. The shortest paths between vertices in the negative cycle are not well-defined, but the shortest paths in the rest of the graph are well-defined.
In general, look at the meta-DAG induced by the SCC decomposition of the graph. Mark any meta-vertices whose SCC contains some negative cycle. For s, t in the original graph, the distance from s to t is well-defined iff there exists no path from s's meta-vertex to t's meta-vertex in the meta-graph that traverses through a marked meta-vertex. (Proof left to the reader.)
No comment on how useful this is for real world applications. But here is a technique of the algorithm. Most of my knowledge is from CS 270, where this was taught last week. (The course notes are public, the lecture is not.) https://cs270.org/spring23/lec/
A naïve attempt at adapting Dijkstra's to graphs with negative weight edges is to add a constant bias to every edge weight to make all weights positive, then run Dijkstra's. But that doesn't work because it changes the ordering of paths (wrt. cost) if their lengths differ. A path with length one will have its cost increased by B, while a path with length 100 will have its length increased by 100 * B. If the length-100 path was cheaper before adding the bias, after adding it bias it can be more expensive. So path ordering is not preserved.
But that idea, modified slightly, is not too far off a technique that works. Instead of adding a weight to each edge, the technique is to add a weight to each vertex (call this phi(v)) and modify each edge (u, v)'s weight to w(u, v) + phi(u) - phi(v). It turns out that modifying each edge's weight in this way preserves the ordering of paths that start from s and end at t. Consider a path s -> a -> b -> t; its new weight under w_phi is:
w(s, a) + phi(s) - phi(a) +
w(a, b) + phi(a) - phi(b) +
w(b, t) + phi(b) - phi(t)
= phi(s) + w(s, a) + w(a, b) + w(b, t) - phi(t)
Notice how phi(a) and phi(b) cancel each other out. In other words, any path from s to t's weight will be changed by exactly phi(s) - phi(t), which is a constant! Therefore, the ordering of paths under cost is preserved under this new weight function (weights modified by phi).
This technique, called "price functions" is not novel -- it has been known from the 70's. The paper's main contribution is how to discover price functions very efficiently. The main way the paper does this as follows: first, decompose the graph into smaller strongly connected components (SCCs) whose paths have a "low" number of negative edge weight graphs by removing some small subset of edges (low diameter decomposition, or LDD). Second, recursively find a phi function that makes all edges in the interiors of these SCCs have positive weight. Third, modify that phi (since phis compose additively) for the DAG induced by the SCC decomposition: "fixup" weights of edges that cross from one SCC into another. Finally, modify that phi again to account for the edges that were removed from the graph. (This last part is usually "slow," but it turns out that it is fast because of the LDD algorithm tends to only remove a small number of edges in any path.)
Glossed over a lot of things, and probably not completely accurate -- see the lecture notes or the paper for the real analysis.
(Almost) Everything we do in machine learning and optimization uses some graph traversal algorithm
In RL we use Bellman's equations as a core component and implementations of Dijkstra's are embedded in many software suites as a useful tools for navigation (navmesh) etc....
This algorithm is faster than all of those, so you know: The same but faster
Ah yes you're correct. Fibonacci heap's aren't usually used in most applications of dijkstra's algorithm (such as road networks) though because trading an O(logn) heap.decrease_key operation for an O(1) heap.decrease_key operation, but getting a slower heap.delete_min operation (by a constant factor) isn't worth it.
This is because there are much fewer heap.decrease_key operations on average than Dijkstra's worst case analysis suggests. The expected number of heap.decrease_key operations is not large enough to offset the loss in average runtime for the heap.delete_min operation.
It's always annoying that algorithm papers only ever include pseudocode. It wouldn't take much more effort to also throw in a python implementation. It also reminds me of cases where people found the presented pseudocode was subtly incorrect (although the idea of the algorithm still held).
If you want to interview for any job at an FPGA company, please read this paper first :-). I majorly blew an interview question which this paper would have helped me ace.
Pathfinding is crucial in the process of compiling your hardware description into a circuit on the FPGA. If you're writing for an FPGA, you wouldn't necessarily need to know it. If you want to work on the compilers, it would be quite relevant.
I mean, the runtimes are right there in the abstract; this is pretty easy to compare. Bellman-Ford takes time Theta(|V||E|), and as best I can tell the improvements you mention don't change that, they're just constant-factor improvements. As opposed to this algorithm being O(|E|*log(|V|)^8*log(W)), where W is the largest absolute value of a negative edge.
The abstract of this paper even compare the runtime to the state of the art, and what it compares to isn't Bellman-Ford or the improvements you mention; there had already been asymptotic improvements on those.
Real world benchmarks and big O measure different things, so one is not a substitute for another.
Real world benchmarks don't just measure the algorithm, they also measure the implementation. By comparing big O, you can compare algorithms directly instead of comparing implementations of the algorithms.
Big O doesn't take into account constant factors, the time variability of CPU operations (e.g. cache hit vs cache miss), implementation details, or that all useful input sizes might be small. This means that even though an algorithm might have a better big O, it might actually be worse than other algorithms for practical size inputs.
Creating an algorithm with a better big O is a mathematical breakthrough, but it's not proof that it's going to be practical in the foreseeable future. If you want to additionally prove that, you need real-world benchmarks. So real-world benchmarks would be useful here for some people. It might not be useful for the authors though if their goal is just to prove algorithmic superiority rather than real-world superiority. Creating a real-world benchmark also has the problem that existing implementations have likely been tuned with microoptimizations for years, and the new algorithm's implementation won't have been, leading to an unfair comparison. Your example of 10ms vs 9ms is such a small difference that microoptimizations would matter much more than it.
I was agreeing with you that real world metrics would be useful.
I was making a couple clarifications. You seemed to imply that real-world metrics can be a substitute for big O, but that's not the case, they do different comparisons.
Also, I was pointing out that the 10ms vs 9ms example isn't a set of numbers that would indicate one algorithm is better than the other, just that one implementation is better than the other.
One application of negative weights in a cost graph could be energy consumption for battery electric vehicles. Downhill, regenerative breaking can result in a net increase of battery charge.
I really do need to up my maths game so I can take advantage of all these papers and their new and novel solutions to things like this. Otherwise I’ll have to wait until someone throws it into a library :(
Well, this is a very interesting theoretical breakthrough, but it remains to be seen how it benchmarks in practice. The constant and log factors might be steep for practical inputs.
https://pimbook.org is accessible, modular, and does a great job in advising how to read intimidating-looking equations and translate them to basic programming contexts.
For most use cases shortest path is super simple and fast problem on modern PCs. So might be useful on extra large synthetic networks.
Neural nets?, Trading logs?, Image processing?
Also negative weights are rare in real life applications. And def not in what you would consider normal path planning.
For path finding heuristic methods give "sub-linear" time usually (if you have a pre-processed unchanging network with random access and amortize its construction over many queries). A-star is the old school one, but folks use fancier methods today. In a server farm linear time is too slow, basically.
Interestingly, on a vehicle just doing Dijkstra's for within-city pathfinding is probably fine though if you've optimised the constant factors down. Likely no negative costs.
Presentation by Aaron Bernstein (coauthor) on the paper https://www.youtube.com/watch?v=Bpw3yqWT_d0
Presentation by Danupon Nanongkai (coauthor) on the paper https://www.youtube.com/watch?v=awvBpvlbG1M
Implementation in Java https://github.com/nevingeorge/Negative-Weight-SSSP