You’re always bounded by max single shard latency AND by coordination latency.
Ignoring how expensive it would be, over-sharding and over scaling (I.e. low volumes of data per shard and low shards per host) could reduce max single shard/host latency, however it’ll increase coordination latency but also memory (which directly or indirectly will cause more coordination latency).
Perfect data per shard and perfect shard per host numbers are currently an unsolved problem. They heavily depend on the domain, I.e. data types, data volume, data ingest, mappings, query types, query load.
:) if anyone has found a way to consistently add hosts to reduce latency, please let me know!
More accurately, 99% of the use cases ES is appropriate for don't need sharding. Every time I've needed to shard ES has been a nightmare bad enough that ES was abandoned.
I had a typical case of ingesting a ton of logs into ES. I needed sharding to keep up with multi-threaded writes while something else is doing intensive search queries. I think sharding was very useful in processing a lot of data efficiently.
Yes, It's marked for Q3 because it's a pretty complex feature. And we had a lot of other features to do at the same time. But the good news is that it's very well advanced and is likely to be released in mid-Q2.