10 milliseconds is far too long for STP for anything in the financial industry. I can also see it not being great for robotics or other latency-critical industries.
It's interesting. While I've seen a lot of stuff about GC in Go and have been programming in Go on and off for a while now, this is the first time I've seen any quantification of the latency, other than one report that's undoubtedly a far outlier.
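For anyone who wants to quantify this on their own workload, here is a minimal sketch using the standard library's `runtime/debug.ReadGCStats`. The helper name `pauseSummary` is made up for illustration, and forcing collections with `runtime.GC()` is only a stand-in for real load, so treat the numbers as indicative at best.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// pauseSummary (hypothetical helper) forces `cycles` collections, then
// returns the total number of collections the runtime has recorded and
// the longest pause found in the runtime's recent pause history.
func pauseSummary(cycles int) (numGC int64, worst time.Duration) {
	for i := 0; i < cycles; i++ {
		runtime.GC() // blocks until the collection has finished
	}

	var stats debug.GCStats
	debug.ReadGCStats(&stats)

	// stats.Pause holds the recent pause durations, most recent first.
	for _, p := range stats.Pause {
		if p > worst {
			worst = p
		}
	}
	return stats.NumGC, worst
}

func main() {
	n, worst := pauseSummary(5)
	fmt.Printf("collections so far: %d, worst recorded pause: %v\n", n, worst)
}
```

On a real service you would sample this periodically under production load rather than forcing collections by hand.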
For games, 10ms is way too high. I'm working on Oculus Rift stuff at the moment and really, you have about 13ms per frame to play with in total, and skipped frames are bad.
However, for typical web apps without low-latency realtime requirements, a guarantee that the GC will not add more than 10ms latency at once, and no more than 20% overhead (in the same thread) seems pretty good, especially considering that it can do this under high load conditions with relatively constrained resources (Java is probably faster, but likely to use more memory; most other languages will be significantly slower).
The one outlier report I mentioned was of GC randomly stopping processing for several minutes, which clearly isn't within the bounds of acceptable response time for web apps. The proposed guarantee would completely eliminate the possibility of that happening, which is good.
All in all it seems like a decent trade-off for the sorts of applications Go is already fairly good at. The stuff it's no good at might come later, but in the meantime everyone knows you can't use Go for those things (in fact you can, if you can design your entire application to use stack allocation only, in which case you can turn off the GC entirely, but that's easier said than done).
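To make the "turn off the GC entirely" part concrete, here is a rough sketch using `debug.SetGCPercent(-1)` from the standard library. The wrapper `withGCDisabled` is a name invented for this example. It only shows the switch itself; keeping the heap from growing without bound is still the hard part.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// withGCDisabled (hypothetical helper) runs fn with the collector switched
// off, restores the previous setting afterwards, and returns how many GC
// cycles completed while fn was running.
func withGCDisabled(fn func()) uint32 {
	// A negative percentage disables the collector, unless a runtime
	// memory limit has been set and is reached.
	old := debug.SetGCPercent(-1)
	defer debug.SetGCPercent(old)

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	fn()
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	var keep [][]byte
	cycles := withGCDisabled(func() {
		for i := 0; i < 64; i++ {
			keep = append(keep, make([]byte, 1<<20)) // 64 MiB of heap garbage-to-be
		}
	})
	fmt.Println("GC cycles during the critical section:", cycles, "buffers held:", len(keep))
}
```

Anything allocated while the collector is off simply stays allocated, which is why this only works if the hot path avoids the heap or the process is short-lived.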
Problem is, web apps with low-latency realtime requirements are not that uncommon. Think about anything related to trading or bidding. Of course, if the plans are to be believed, then a guarantee that the GC doesn't exceed X ms is good for soft real-time requirements, which is often enough.
> The one outlier report I mentioned was of GC randomly stopping processing for several minutes, which clearly isn't within the bounds of acceptable response time for web apps
That's not acceptable for any kind of app, and the problem gets bigger the more heap memory you have. The sweet spot for GC-enabled apps seems to be 4 GB - anything bigger than that and one should prepare for surprises.
While the transfer of the assets after a sale may take several days, the buying and selling takes place in milliseconds. If something is offered for sale at a bargain price, and 8ms later I accept while a Go user is stuck in a 10ms GC run, I get the deal, and he misses it.
If 8ms matters, aren't we talking about HFT? I'd be incredibly surprised if the rest of the industry ran on that kind of timing. Doesn't it take minutes to hours for a trade to happen if, for example, I click buttons on E-Trade or call my broker?
FYI, if you trade on E-Trade, the slower their infrastructure is, the harder it is for you to buy/sell large amounts of anything. Why? When you pull the trigger to buy 100,000 shares of xfoobarx, the market will see that and react accordingly. The faster you can actually buy those shares, the less time the market has to react, and as a result, you'll manage to succeed in making more of the trades you want.
Like it or hate it, electronic trading is the natural progression of the markets, just like the car eclipsed the horse. Low latency in trading applications affects me when I move my 401k as much as it affects the big electronic trading firms. Thinking it doesn't affect you is a mistake :)
When you, or a hedge fund, or any other market participant, decide to sell or buy a security, the trade is submitted to a broker. That broker may submit the trade to the floor, an ECN, or market maker, and so on. Only in very specific cases -- particularly an ECN to a specific exchange where it might have a spread with a regional exchange -- is timing so critically important.
But for 90% of those trades, it's a block limit order that sits on the order book until it's filled.
For many trades this is irrelevant. For those where it does matter, it is the very last, smallest part of the system -- the execution engine -- where timing is everything. That is the part of the system where HFT engineers are building hyper-speed interconnects and optimizing every single instruction, but it is not relevant to the majority of the stack.
Delays everywhere else in the system are largely meaningless -- yeah, prices may have changed slightly, but unless you've mastered market timing (which no one has), it will generally even out and is irrelevant.
You mention two industries with extremely demanding performance profiles (financial industry and robotics). I don't know if I'd trust any 5-year-old platform in those contexts.
There are lots of other contexts where the performance profile outlined in this document is sufficient. If you love writing Go, then I'd suggest staying out of the financial industry and robotics.
Not sure about self-driving cars. I work in robotics, and there are a few modules that need to hit hard real-time deadlines. You wouldn't use GC languages for those, but there are a LOT of other parts that don't, and you can get a BIG win by writing those in a more concise language.
Go was written by Google for high throughput and concurrent infrastructure. The "web scale" problem is basically the polar opposite of the finance problem. That doesn't mean Go can't (eventually) be used in both, it just needs a bit of work before it gets there.
As far as GC goes, the Azul Systems pauseless GC is certainly as state of the art as it gets. Too bad it is for Java.
> 10 milliseconds is far too long for STP for anything in the financial industry.
Does Java give any guarantees about garbage collection latency?
The LMAX Disruptor inter-thread communication library[1], which is the heart of the high performance[2][3] limit order book used by the LMAX Exchange, is written in Java.
Just because a language can't guarantee that the duration of GC stops is below a certain threshold does not mean that you cannot write programs in this language that offer better latency 99.99% of the time.
> Does Java give any guarantees about garbage collection latency?
Java has multiple JVMs and garbage collectors available.
The concurrent GC (CMS) in Oracle's JVM doesn't give any guarantee. The new G1 GC in Oracle's Java 7 lets you set a pause-time target while trying to minimize STW pauses, though that target is a soft goal rather than a hard guarantee. Azul Systems' Pauseless GC is advertised as being completely pauseless, though it is commercial.
What's quite painful is that for small heaps, the more you try to do concurrent garbage collection with small latencies, the more throughput suffers. On the other hand, the more memory you add, the bigger the latency. So depending on the app, its memory layout and access patterns, and the hardware used for deployment, one has to pick a GC strategy, as there's no one-size-fits-all.
FWIW, the LMAX Exchange currently uses Oracle's HotSpot JVM, but they've announced they want to switch to Azul's Zing.
I guess my overall point is that the GC guarantee in question is an upper bound. It's perfectly acceptable for a financial application to only deliver x ms latency 99.9% of the time.
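To illustrate the difference between an upper bound and a percentile target, here is a small sketch. The function `percentile` and the sample data are invented for the example; it uses the nearest-rank method with the percentile expressed in per-mille (999 means 99.9%) so the arithmetic stays in integers.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile (hypothetical helper) returns the nearest-rank percentile of
// samples, where perMille is the percentile in tenths of a percent
// (999 means 99.9%). It returns 0 for an empty input.
func percentile(samples []int, perMille int) int {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]int(nil), samples...) // copy so the caller's slice is untouched
	sort.Ints(sorted)

	// rank = ceil(perMille/1000 * n), clamped to [1, n].
	rank := (perMille*len(sorted) + 999) / 1000
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// Made-up data: 999 requests served in 1 ms and one that hit a 10 ms GC pause.
	latenciesMs := make([]int, 0, 1000)
	for i := 0; i < 999; i++ {
		latenciesMs = append(latenciesMs, 1)
	}
	latenciesMs = append(latenciesMs, 10)

	fmt.Println("p99.9 latency (ms):", percentile(latenciesMs, 999))
	fmt.Println("worst case (ms):   ", percentile(latenciesMs, 1000))
}
```

With data shaped like this, a 99.9% target is met even though the worst case sits at the GC bound.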
Also, the JVM is only one component in the stack. If you aren't using a real-time OS kernel, you're not guaranteed any maximum latency anyway.
I've heard of people who just "GC on market close". In the applications I've worked on you don't have the luxury of a "market close". They're 24/7 applications and literally 90B transactions per day at peaks of 1.8M qps...
It's not out of the question, just use your mind. If your financial upfuckery is implemented as a wide array (say 1000+) instances of a Go program, each of which is capable of saying "stop sending me traffic, I'm about to stop for a full GC" a few milliseconds before actually stopping for GC, then latency will not be impacted and throughput will only be degraded by the time required to cork and uncork the input with respect to the time required for the GC round.
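A very rough sketch of the cork/uncork idea follows. Everything here (`Instance`, `Pick`, `CollectOffline`) is hypothetical. As far as I know the Go runtime has no hook that announces an upcoming collection, so the sketch assumes each instance decides for itself when to collect, for example with automatic GC turned off and `runtime.GC()` called explicitly. Draining in-flight requests and telling a real load balancer are left out.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// Instance is a hypothetical worker that can take itself out of rotation.
type Instance struct {
	name     string
	draining atomic.Bool // true while the instance is "corked" (needs Go 1.19+)
}

// Accepting reports whether the balancer may route traffic to this instance.
func (in *Instance) Accepting() bool { return !in.draining.Load() }

// CollectOffline corks the instance, runs collect, then uncorks it again.
// In normal use collect would be runtime.GC.
func (in *Instance) CollectOffline(collect func()) {
	in.draining.Store(true)
	defer in.draining.Store(false)
	collect()
}

// Pick returns the first instance that is accepting traffic, or nil if
// every instance is corked.
func Pick(pool []*Instance) *Instance {
	for _, in := range pool {
		if in.Accepting() {
			return in
		}
	}
	return nil
}

func main() {
	pool := []*Instance{{name: "a"}, {name: "b"}}

	pool[0].CollectOffline(func() {
		fmt.Println("while a collects, traffic goes to:", Pick(pool).name)
		runtime.GC() // blocks until the full collection is done
	})
	fmt.Println("afterwards, traffic goes to:", Pick(pool).name)
}
```

The throughput cost is whatever capacity is offline at any moment, so the instances would need to stagger their collections.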
A technique along similar lines is to run your upfuckery implementation on multiple servers and have each of them publish the results. Then you can take the output of the first server, and discard the rest. This can be effective at weeding out a majority of small pauses.
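The "take the first answer" approach might look something like this in Go. The names `firstResult` and `replica` are made up, and the replicas are simulated with sleeps rather than real servers, so this is only a sketch of the pattern.

```go
package main

import (
	"fmt"
	"time"
)

// firstResult (hypothetical helper) runs every replica concurrently and
// returns whichever answer arrives first; later answers are discarded.
// It returns "" if there are no replicas.
func firstResult(replicas []func() string) string {
	if len(replicas) == 0 {
		return ""
	}
	// Buffered so that slow replicas can still send and exit instead of
	// blocking forever once nobody is listening.
	results := make(chan string, len(replicas))
	for _, r := range replicas {
		go func(run func() string) { results <- run() }(r)
	}
	return <-results
}

// replica simulates a server that answers with its name after delayMs,
// standing in for a server that may or may not be stuck in a GC pause.
func replica(name string, delayMs int) func() string {
	return func() string {
		time.Sleep(time.Duration(delayMs) * time.Millisecond)
		return name
	}
}

func main() {
	winner := firstResult([]func() string{
		replica("paused-by-gc", 300),
		replica("healthy", 1),
	})
	fmt.Println("answer taken from:", winner)
}
```

This only helps when the replicas' pauses are uncorrelated, and it multiplies the hardware bill by the number of replicas.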
That sounds like a possible solution, but a bit overcomplicated. Might as well stick to something like C++ rather than risk additional complexity, in my opinion.
Sure. You will still have uncertainty around memory allocation timing, even in C++. For example tcmalloc or jemalloc may need to take a global lock in order to satisfy a heap allocation if thread-local spans are exhausted.
> 10 milliseconds is far too long for STP for anything in the financial industry
Somehow very niche HFT platforms have come to define finance, when the reality is that the massive bulk of the industry runs on Excel spreadsheets, Java, or Fortran. An extraordinarily small subset of the stack has real-time needs, and is engineered as such.
The financial industry is a huge industry with an enormous variance of needs, and the overwhelming bulk of applications have absolutely no such real-time requirement.
I should mention that I've built systems in the financial industry for about a decade now, so I always marvel when people make broad, untrue, absolute statements.
It is a shame too. I love writing Go.