Ask HN: Fast data structures for disjoint intervals?
185 points by grovesNL 89 days ago | 119 comments
Over the years I've worked on a few applications that needed to model time intervals that are disjoint. For example, there's some kind of equipment available and time slots are booked out. For a data structure to represent this, you generally need to be able to insert, remove, and perform range queries (i.e., looking for the next availability).

In the past I've represented these with some kind of ordered map or tree. In Rust this might look something like `BTreeMap<u32, u32>` with start being the key and the end being the value. This works really well in practice, but it's not as fast as I'd like for some real-time applications with really read-heavy range queries over thousands of intervals.

A lot of the specialized interval libraries I've found don't seem to perform better than a plain ordered map like I mentioned (many use an ordered map internally anyway). Many of these also weren't designed with cache-efficiency or SIMD in mind and spend a lot of time on tree (pointer) traversal.

I spent some time prototyping with some other data structures that might fit (adaptive radix trees to take advantage of the small integer key, dense bitsets for small ranges, applying a spatial index in 1-dimension, fixed grid hierarchies) but haven't been able to find anything much faster than an ordered map.

I was wondering if anyone has come across any interesting data structures that handle this well. Approaches that use a combination of multiple data structures to accelerate reads at the cost of slightly slower inserts/removals could be interesting.




A little late to the party...

> The slow cases usually tend to be trying to find wider slots and skipping through many smaller slots. There are often a lot of bookings up to the first wide enough slot too though, so it's a little bit of both.

This right here is the first critical issue. It doesn't matter how fast the data structure is if you're then going to do a linear search over the available slots to find the next one wide enough.

A good data structure for doing this efficiently is a range tree (https://en.wikipedia.org/wiki/Range_tree), where in each node you store the maximum width of the available slots covered by the node. That lets you find the first available slot after some time t and wider than some duration w in O(log(n)), where n is the number of available slots. (It might not be obvious if you're not familiar with range trees; I'm happy to provide more details.)

For the implementation, there are a few options:

A. The simplest is to use a complete range tree, where the leaf nodes are all the possible times. You can lazily create nodes to avoid massive memory usage for sparse ranges. The advantage is that you don't need to do any balancing; the disadvantage is that the time complexity is O(log(T)) where T is the total number of possible times; so it's going to be a little slower on very sparse datasets.

B. The O(log(n)) implementation that's being called for is a self-balancing binary tree (e.g. red-black), modified to also store the maximums in each node. Unfortunately most libraries don't give you low-level control over the tree's nodes, so you'd likely need to copy the code and modify it (or implement the self-balancing tree from scratch).

C. If those are still too slow (and you're certain that your implementation is really O(log(n))), you'll need to improve the cache efficiency. That basically comes down to using larger nodes. The obvious approach is to switch to a B-tree; but you could also keep your nodes binary and just change the way they are allocated to emulate the memory layout of a B-tree (this is simpler but slower because it still uses lots of pointers). Another idea is to replace the first few layers of the tree with a hash table (or a simple array if your data is dense enough). Likewise you can replace the leaf nodes with small arrays.
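
Here's a minimal sketch of the augmented query, with the tree flattened into arrays in the spirit of option C (assuming the free slots are kept in a Vec sorted by start time; all names here are made up for illustration). `first_fit` finds the first slot starting at or after `t` with width at least `w` in O(log n):

    // Max-width segment tree over free slots sorted by start time.
    struct SlotIndex {
        starts: Vec<u32>, // sorted start times of the free slots
        widths: Vec<u32>, // widths, parallel to `starts`
        seg: Vec<u32>,    // 1-indexed max segment tree over `widths`
        n: usize,         // number of leaves (power of two)
    }

    impl SlotIndex {
        fn new(slots: &[(u32, u32)]) -> Self {
            // `slots` holds (start, width) pairs, sorted by start.
            let n = slots.len().next_power_of_two();
            let mut seg = vec![0u32; 2 * n];
            for (i, &(_, w)) in slots.iter().enumerate() {
                seg[n + i] = w;
            }
            for i in (1..n).rev() {
                seg[i] = seg[2 * i].max(seg[2 * i + 1]);
            }
            SlotIndex {
                starts: slots.iter().map(|&(s, _)| s).collect(),
                widths: slots.iter().map(|&(_, w)| w).collect(),
                seg,
                n,
            }
        }

        /// First free slot with start >= t and width >= w, if any.
        fn first_fit(&self, t: u32, w: u32) -> Option<(u32, u32)> {
            let from = self.starts.partition_point(|&s| s < t);
            self.descend(1, 0, self.n, from, w)
                .map(|i| (self.starts[i], self.widths[i]))
        }

        // Leftmost leaf index >= `from` with width >= `w`; `node` covers [lo, hi).
        fn descend(&self, node: usize, lo: usize, hi: usize, from: usize, w: u32) -> Option<usize> {
            if hi <= from || self.seg[node] < w {
                return None; // subtree entirely before `from`, or nothing wide enough
            }
            if hi - lo == 1 {
                return (lo < self.starts.len()).then_some(lo);
            }
            let mid = (lo + hi) / 2;
            self.descend(2 * node, lo, mid, from, w)
                .or_else(|| self.descend(2 * node + 1, mid, hi, from, w))
        }
    }

Updates either rebuild the leaf arrays, or for in-place changes update one leaf and its O(log n) ancestors.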


I’ve run into a similar problem in collaborative text editing. I assign an integer to each edit (which is a single character insert or delete). But obviously, humans usually type characters and delete characters in runs. I needed a fast associative map from edit id (integer) to some associated data. But the associated data is internally run length encoded so that I can assign a single value to an entire sequential run of edit operations. For example, set(10..20, X), set(12..15, Y) gives a map with 3 values: {10..12: X, 12..15: Y, 15..20: X}. I can do lookups by querying individual values, and the query returns the maximum run containing a consistent value.

I’m not sure how applicable my solution is to your problem, but it might work well. I implemented a custom btree with the semantics I needed, since btrees are fast and have great cache locality. The trick is making one that understands ranges, not just singular keys like std’s BTreeMap.
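
To make the semantics concrete, here's a toy sketch of that run-splitting behaviour on top of std's BTreeMap (this is not my diamond-types code linked below, just the simplest thing that reproduces the example above):

    use std::collections::BTreeMap;

    // Key = run start, value = (run end, data); set() splits any runs it overlaps.
    fn set<V: Clone>(map: &mut BTreeMap<u32, (u32, V)>, start: u32, end: u32, val: V) {
        debug_assert!(start < end);
        // A run beginning before `start` that overlaps it gets truncated,
        // keeping a right remainder if it also extends past `end`.
        let prev = map.range(..start).next_back().map(|(k, ev)| (*k, ev.clone()));
        if let Some((s, (e, v))) = prev {
            if e > start {
                map.insert(s, (start, v.clone()));
                if e > end {
                    map.insert(end, (e, v));
                }
            }
        }
        // Runs starting inside [start, end) are removed; a tail that sticks
        // out past `end` is reinserted as its own run.
        let inside: Vec<u32> = map.range(start..end).map(|(k, _)| *k).collect();
        for k in inside {
            let (e, v) = map.remove(&k).unwrap();
            if e > end {
                map.insert(end, (e, v));
            }
        }
        map.insert(start, (end, val));
    }

With this, set(10..20, X) followed by set(12..15, Y) leaves {10..12: X, 12..15: Y, 15..20: X}, as in the example above.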

The code is here if you’re curious. It’s implemented on top of a pair of vecs - which makes it 100% safe rust, and curiously faster than the equivalent tree-of-raw-allocations approach. Please forgive the poor documentation - it’s an internal only type and the code is new enough that the paint hasn’t dried yet.

https://github.com/josephg/diamond-types/blob/master/src/ost...


Thanks, I'll take a look! It sounds like they're similar problems.


I've spent the last 8+ years working on various booking engines for one of the world's major travel companies. From the provided information it's a bit hard to see the full requirements, so I'll just mention some of the aspects I had to deal with. Managing availability isn't as straightforward as it looks at first glance.

There are things like:

* in-advance-known closeouts

* sporadic closeouts (e.g. due to the bad weather conditions)

* sell-outs

* varying availability by date/start time

* links to partner services who can partially reduce the availability count

* allotments (availability quotas for different sellers)

* resources linked to availabilities (which is another dimension)

* the list goes on and on...

Anyway, back to the data structures.

After many iterations I've settled on using Guava's Table (yes, we use Java). There are many ways to model this, e.g. you can have start times as rows and dates as columns.

It might not sound as sexy as you'd expect but it's super easy to visualise the model in your head and read/maintain the code later.

Then you can use Guava Range functionality for dates or times and do intersections etc. Hope this helps.


“It might not sound as sexy as you'd expect” - is usually a good sign ;)


Interval trees or ordered segment sets are the "correct" answer, but sometimes you can accept constraints in your domain that radically simplify the problem. For example, with equipment availability and time slots, you may have a fixed window and allow slots to fall only on 5-minute marks. With those 2 constraints in place, a simple bitmask becomes a viable way to represent a schedule: each bit represents 5 minutes, and 1=avail, 0=unavail.
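
A minimal sketch of that encoding, with illustrative names and sizes (one day at 5-minute resolution is 288 slots, so five u64 words):

    const SLOTS_PER_DAY: usize = 24 * 60 / 5; // 288 five-minute slots

    struct DaySchedule {
        bits: [u64; (SLOTS_PER_DAY + 63) / 64], // bit i set => slot i available
    }

    impl DaySchedule {
        fn is_free(&self, slot: usize) -> bool {
            (self.bits[slot / 64] & (1 << (slot % 64))) != 0
        }
        fn book(&mut self, slot: usize) {
            self.bits[slot / 64] &= !(1 << (slot % 64));
        }
        fn release(&mut self, slot: usize) {
            self.bits[slot / 64] |= 1 << (slot % 64);
        }
    }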


I tried a bitmask/bitset/bitvec but found the performance of searches to be worse than the ordered map I'm using. Intuitively I expected it to perform slightly better for searches because of the cache efficiency, but I guess the way my ranges were distributed made it spend a lot of time searching within each element of the set. I'd like to revisit it eventually though because I think this direction is promising.


A few rough ideas for bitsets:

- To find runs of 15+ 1s you can find bytes with the value 255 using SIMD eq

- For windows of width N you could AND the bitmap with itself shifted by N to find spots where there's an opening at index i AND index i+N, then check that those openings are actually contiguous

- You can use count-trailing-zeros (generally very fast on modern CPUs) to quickly find the positions of set bits. For more than 1 searched bit in a word, you can do x = x & (x-1) to unset the last bit and find the next one. This would require you to search for 1's and keep the bitmap reversed. You can of course skip any words that are fully 0. (See the sketch after this list.)

- In general, bitmaps let you use a bunch of bit hacks like in https://graphics.stanford.edu/~seander/bithacks.html
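
To make a couple of those concrete, a rough sketch (free = 1 bits, word-level only; the cross-word carry is left out, and n is assumed < 64):

    // Shift-AND: bit i of the result is set iff slots i and i+n are both
    // free; survivors are candidate window starts that still need a
    // contiguity check on the bits in between.
    fn window_candidates(word: u64, n: u32) -> u64 {
        word & (word >> n)
    }

    // ctz scan: visit the set-bit positions of a word with trailing_zeros,
    // clearing the lowest set bit each round with x & (x - 1).
    fn for_each_set_bit(mut word: u64, base: usize, mut f: impl FnMut(usize)) {
        while word != 0 {
            f(base + word.trailing_zeros() as usize);
            word &= word - 1; // clear the lowest set bit
        }
    }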


Thanks for the ideas! I did try lzcnt/tzcnt with bitshifts to find contiguous intervals across elements, but I found that it tended to be more expensive than searching the map depending on the density. I think it's a really promising approach though, and I'd like to figure out a way to make it work better for my test case.


Gotcha. I wasn't quite sure from your post, are you storing free slots in the map, or booked slots? And do you know, for the cases that aren't fast enough, how much they tend to search through? Like are there just a ton of bookings before the first free slot, or are the slow cases maybe trying to find very wide slots that are just hard to find?


I'm storing free slots right now, but either would be ok if a data structure would benefit from the other representation.

The slow cases usually tend to be trying to find wider slots and skipping through many smaller slots. There are often a lot of bookings up to the first wide enough slot too though, so it's a little bit of both.


Maybe an additional "lower resolution" index on top of the map could help as a heuristic to skip whole regions, then using the map to double check possible matches. I have a rough idea that I think could possibly work:

Maintain the sum of booked time within each coarser time bucket, e.g. 1 hour or something (ideally a power of two though so you can use shifts to find the index for a timestamp), and keep that in a flat array. If you're looking for a slot that's less than 2x the coarser interval here, e.g. a 90-minute one, look for two consecutive buckets with a combined sum <= 30 minutes. 2 hours or more is guaranteed to fully overlap a 1 hour bucket, so look for completely empty buckets, etc. These scans can be done with SIMD.

When candidate buckets are found, use the start timestamp of the bucket (i*1hr) to jump into the map and start scanning there, up to the end of the possible range of buckets. The index can confidently skip too-full regions but doesn't guarantee there's a contiguous slot of the right size. I don't have a great sense of how much this would filter out in practice, but maybe there's some tuning of the parameters here that works for your case.

Updates should be relatively cheap since the ranges don't overlap: just +/- the duration of each booking added/removed, though it would have to deal with bookings that span multiple buckets. The logic is just arithmetic on newly inserted/deleted values which are already in registers, rather than scanning parts of the map (as you would need to, e.g., to maintain the min/max value in each bucket while supporting deletes).
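
A rough sketch of that coarse index (the names and the 60-minute bucket width are illustrative, and candidates still have to be verified against the exact map):

    struct CoarseIndex {
        bucket_mins: u32, // bucket width in minutes, e.g. 60
        booked: Vec<u32>, // booked minutes per bucket
    }

    impl CoarseIndex {
        fn add_booking(&mut self, start: u32, end: u32) {
            // Split the booking across the buckets it touches; assumes
            // `end` falls within the indexed horizon.
            let w = self.bucket_mins;
            let mut s = start;
            while s < end {
                let bucket_end = (s / w + 1) * w;
                self.booked[(s / w) as usize] += end.min(bucket_end) - s;
                s = bucket_end;
            }
        }

        // Indices i where buckets i and i+1 together leave room for `need`
        // minutes (assumes need < 2 * bucket_mins). E.g. a 90-minute slot
        // against 60-minute buckets needs a combined sum <= 30.
        fn candidate_pairs(&self, need: u32) -> Vec<usize> {
            let budget = 2 * self.bucket_mins - need;
            self.booked
                .windows(2)
                .enumerate()
                .filter(|(_, pair)| pair[0] + pair[1] <= budget)
                .map(|(i, _)| i)
                .collect()
        }
    }

The `candidate_pairs` scan is the part that could be done branch-free with SIMD over the flat array.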


> Maybe an additional "lower resolution" index on top of the map could help as a heuristic to skip whole regions

That sounds a bit like circling back around to a tree-based system (albeit with depth=2) where each parent is somehow summarizing the state of child nodes below it.


Yeah it's definitely similar to having parent nodes summarize their children. The motivation for the flat array structure was to have an entirely fixed-size array that's contiguous in memory to make it friendly for SIMD filtering. Having the data in the tree on the other hand could probably filter a little better since it can summarize at multiple levels, but my bet is that the flat array allows for a faster implementation for the case here with a few thousand intervals. On the one hand, the whole working set should probably fit in L1 cache for both the tree or a flat array, but on the other hand, a tree with pointers needs to do some pointer chasing. The tree could alternatively be represented in a flat array, but then elements need to be shuffled around. I don't know which is faster in practice for this problem, but my gut says the speed of a flat array + SIMD probably outweighs potential asymptotic improvements for the few thousand case


I don't think I've put the same amount of thought into it, but my gut feeling is:

1. If you're summarizing "a contiguous chunk exists from A to B", that would require a mutable rebalancing tree where the borders constantly shift and merge and subdivide.

2. If you're instead just summarizing "the range between constant indices X-Y is [1/0/both] right now", then that can be done with a fixed tree arrangement that can have an exact layout in memory.

A simple example of the latter would be 8 leaves of 01110000, 4 parents of [both, 1, 0, 0], 2 grandparents of [both, 0], etc. The 3-value aspect could also be encoded into two bitmasks, i.e. "has a 1" and "has a 0".


If your ranges end up sparsely distributed, using roaring bitmaps can speed things up a lot.

https://roaringbitmap.org/


Erik Demaine's DS class has a bunch of range trees and cascading methods to speed up queries: https://courses.csail.mit.edu/6.851/spring21/lectures/


His explanations are great and just wanted to +1 on the idea of range trees. Adding a reference here to a very lightweight read on range trees:

https://courses.csail.mit.edu/6.851/spring10/scribe/lec03.pd...


Thanks, I came across some of Erik's videos in my research but I didn't realize they were part of a bigger series focused on ranges. Cascading methods are exactly the kind of ideas I'm thinking about.


> In Rust this might look something like `BTreeMap<u32, u32>` with start being the key and the end being the value.

This approach would indeed be slow. You can improve your approach by switching to a `BTreeMap<u32, bool>` with the key being a border between intervals and the value telling you whether the next interval is 'in' or 'out'. (And the interval starting at logical minus infinity being eg 'out' by default.)

You can test whether a specific point is 'in' or 'out' by using something like `upper_bound` [0] to find the key just before your query point (and the value associated with that key tells you whether your query point is 'in' or 'out'.)

See https://doc.rust-lang.org/std/collections/struct.BTreeMap.ht...

Insert and remove are pretty simple to code up too, and looking for next availability after a query point is almost trivial: take the cursor you get from `upper_bound`, and if it's not already `out`, advance the cursor by one. (This is assuming you keep the invariant that `in` and `out` intervals interleave. Ie you merge any adjacent intervals of the same kind, by just removing the key-value pair for the second interval from the BTreeMap. Otherwise, you might have to advance your cursor a few more times, perhaps cleaning up your intervals as you go to restore the invariant.) You can also implement merging of two interval maps represented this way pretty efficiently.
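
A sketch of that representation, assuming the alternation invariant holds (on stable Rust, `range(..=t).next_back()` gives the same answer as the `upper_bound` cursor):

    use std::collections::BTreeMap;
    use std::ops::Bound;

    // Key = boundary time, value = whether the interval starting there is
    // 'in'; everything before the first key is 'out' by default.
    fn is_in(map: &BTreeMap<u32, bool>, t: u32) -> bool {
        map.range(..=t).next_back().map_or(false, |(_, &b)| b)
    }

    // Next time >= t that is 'out' (free), relying on 'in'/'out' runs
    // alternating so the boundary after an 'in' run is always 'out'.
    fn next_free(map: &BTreeMap<u32, bool>, t: u32) -> Option<u32> {
        if !is_in(map, t) {
            return Some(t);
        }
        map.range((Bound::Excluded(t), Bound::Unbounded))
            .find(|&(_, &b)| !b)
            .map(|(&k, _)| k)
    }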

I read elsewhere that you also want eg the next free block that's at least n items long, or the next occupied block of at least some minimum size.

You can do these operations by augmenting your tree nodes with caches of eg what's the longest free block under and the longest occupied block under them.

You can use that cache to find the next free interval of at least some specified size quickly, and you can rebuild that cache quickly, too, whenever you make any modification to the tree. (Quickly here meaning in O(number of layers).)


This sounds interesting! From my understanding it’s similar to an inversion list but storing explicit in/out instead of deriving it from the element’s rank (i.e., index if it’s stored as an ordered `Vec`) mod 2.

I’m not too worried about storage requirements though, so I wonder whether having to search the adjacent node to determine the duration would beat the current representation. I could see how this could improve update performance though.

Augmenting tree nodes and/or keeping track of min/max duration across coarser regions seems like a good approach. I could see each layer being represented in-tree like you described, or as separate layers each with their own granularity, where they could help to tighten bounds during range queries.

Do you know of any libraries implementing this?


Yes, it's basically an inversion list, but organised as a tree.

If you write your own btree, you can drop storing the explicit in/out and get it from the element's rank. I was suggesting the explicit value mostly to be able to re-use Rust's existing standard library BTreeMap, where you don't have access to the rank (as far as I can tell).

You could also use something like a 'log structured merge tree' to store inversion lists as (a collection of) Rust vectors.

> I could see each layer being represented in-tree like you described, or as separate layers each with their own granularity, where they could help to tighten bounds during range queries.

I would suggest keeping them in-tree, so you only need to chase pointers once. And your invariants are much simpler, because whenever you make any changes to the btree you are already touching exactly the same nodes you might need to update for the cache.

> Do you know of any libraries implementing this?

I've looked around a bit, but didn't find much for Rust. There's https://codeforces.com/blog/entry/68419 which comes close.

For Haskell, https://hackage.haskell.org/package/dual-tree should do pretty much exactly what we talked about.

I've only looked very briefly. So I'm sure there's lot I missed that you can maybe find with different keywords or so.

I briefly considered implementing my own version for Rust by hacking up the standard library's BTreeMap. I got as far as extracting the BTreeMap code from the standard library and making it compile and pass tests in its own crate, but then got distracted by actual work. :)

> I’m not too worried about storage requirements though, so I wonder whether having to search the adjacent node to determine the duration would beat the current representation. I could see how this could improve update performance though.

Well, going to the next node (or previous node) should be cheap on average.

You could cache the duration in the nodes, if you wanted to, at the cost of making your invariants a bit more annoying (and slightly more expensive to update).

Because getting to the next node should be so cheap on average, I can't really predict whether storing your durations also with your nodes would actually improve performance.


Btw, what I described as a 'cache' is explored in the discussion at https://news.ycombinator.com/item?id=24385095 as 'Monoid cached trees'.


It's really going to depend on the queries that you want to optimize. I think the best help might be to point you to a book: https://www.amazon.com/Foundations-Multidimensional-Structur...

An RTree is the first structure that comes to mind, but the way you describe the problem, it sounds like the intervals never overlap, so I have my doubts. Sounds like you might be looking to optimize the query "what is the first interval of at least N days?" Maybe look at priority trees. They're good at queries that are bounded in one dimension and half-open in the other.


Thanks! I did try a 1-dimension R-tree but the performance tended to be much worse than an ordered map.

Priority trees could be really interesting. I did consider them early on but wasn't sure how well they'd apply here, so I'll take another look.


Elsewhere in the thread, it sounds like your range queries with inequality constraint might actually be nearest neighbor queries with an inequality constraint. I'm not sure off the top of my head how feasible that would be with a priority search tree.


If time intervals are truly discrete and you don't need to union two different sets of time intervals, then you can keep track of free time intervals and scheduled times separately.

For a wide variety of time slots, keep separate start-time-ordered maps of free slots of size 2^n-1 to 2^(n+1) and search smallest possible first which should be O( log_2 (buckets)) for earliest-possible or O(log N) for arbitrary desired start time, assuming number of buckets isn't too huge.
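
Sketching that idea (names and the exact size-class rule are illustrative; note a query may still have to peek at every class at or above the minimum):

    use std::collections::BTreeMap;

    // Bucket k holds free slots whose width w satisfies 2^k <= w < 2^(k+1),
    // each bucket ordered by start time.
    struct SizeClassedFree {
        buckets: Vec<BTreeMap<u32, u32>>, // bucket k: start -> end
    }

    impl SizeClassedFree {
        fn class(width: u32) -> usize {
            (31 - width.leading_zeros()) as usize // floor(log2(width)), width >= 1
        }

        fn insert_free(&mut self, start: u32, end: u32) {
            self.buckets[Self::class(end - start)].insert(start, end);
        }

        // Earliest free slot of width >= need. Only the smallest viable
        // class can contain too-small slots; every class above qualifies.
        fn earliest_fit(&self, need: u32) -> Option<(u32, u32)> {
            self.buckets
                .get(Self::class(need)..)?
                .iter()
                .filter_map(|b| {
                    b.iter()
                        .find(|&(&s, &e)| e - s >= need)
                        .map(|(&s, &e)| (s, e))
                })
                .min_by_key(|&(s, _)| s)
        }
    }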

I'm curious if a heap would be faster than a btree; I think the speed tradeoff is how long it takes to re-add the portion of the free space back to the data structure. With luck, a btree should only need to move a free space entry within a single node which should be very fast.

If appointments are cancelled, insert an appropriately-sized free space into the data structure. Coalesce adjacent free space (probably fastest in a btree) and move to a larger size bucket if necessary.

For high concurrency also use a rotating series of buckets by time interval so that you can entirely drop structures once they have aged off (maximum time interval end is less than now() ), with your free space at the end of each bucket overlapping the next to at least the duration of the maximum appointment length with special case handling to allow an appointment to overlap two buckets (shrinking free space in both). This should allow full multithreaded processing unless every reservation is soonest-available-appointment.

Appointments can live in a hash by their key or ID with their (start,end) interval as the value for O(1) lookups on appointment deletion.

As others have pointed out it's a lot like malloc algorithms with the constraint that you care about address order (time). So linked lists of free buckets of size 2^k become time-ordered maps, and rotating ephemeral maps to account for unbounded time plus the analog of thread-local storage for concurrency.


LMDB uses https://git.burd.me/greg/sparsemap for storing compressed ranges of bitmaps for their page allocator, similar to RoaringBitmap or Linux's fdarray, which might be applicable here. With compressed bitmaps finding the next free is cheap. If your ranges are very sparse then it will have bad performance, however


Thanks for the suggestion! I tried dense bitsets in an earlier iteration but the performance wasn't great, so I thought something like RoaringBitmap probably wouldn't work out very well. The ranges are relatively dense but there also aren't that many of them (only a few thousand or so), so the bitset seemed to spend a huge amount of time searching within elements.


This sparsemap uses essentially run-length encoding so it might still have slightly better performance. I think RoaringBitmap only uses the plain list of set bits below 1024 entries, which you'd be over, before it switches to the compressed representation, and then you'd have to do the compressed scan.


You talk a lot about performance in this thread but I haven't seen any numbers. What is the performance requirement before it's "too slow"? On what data size?

Part of your issue is doing the calculation on read. If you can get away with slower writes you can store all available slots for the day on write at minute/5min etc granularity, then remove used slots from the tree as they are taken.


The rough numbers I’ve been using in my test cases are in the scale of thousands of relatively densely distributed intervals and hundreds of thousands of range queries. This is for a real-time interactive use case targeting the range of tens of milliseconds or so (obviously depends on CPU, but ideally supporting the last couple generations or so). “Too slow” is the upper end of that range because it’s part of the critical path for interactions.

Slower writes with a dense representation like that would be great. The problem I’ve been seeing in my prototypes is that querying the slots back tends to be about the same performance as the ordered map. Some people mentioned a variation on this idea that would store some coarser-grained statistics based on minimum duration in certain regions which might help a bit.


Thanks. Also, interesting, so this all runs on the client?


Right, this calculation runs on the client and assumes the clients are fast enough. This helps a lot with interactivity because the network stack isn’t involved at all.

It could be interesting to optionally offload some compute to much faster servers when it would be useful, but that introduces request/response latency overhead too.


Do you really need hundreds of thousands of range queries because you are trying to process hundreds of thousands of network requests, or trying to retrieve/transform hundreds of thousands of entries? Or are you actually attempting to answer a single more complex query, and the hundreds of thousands is just you iterating over all potential candidates searching for a suitable range?

If it is the latter case, instead of attempting to speed up hundreds of thousands of queries, it might be better to create a proper structure so that you can replace them with a single query that directly answers your real question. A well designed augmented search tree for a specific situation can directly answer quite complex read queries (and even perform modifications). The downside is that you will likely need to at least partially roll your own, since there are many different combinations you can make, and not every one has an off-the-shelf library or even a dedicated name. Providing a clean interface usually results in hiding access to some of the implementation details needed to implement the customization required for your specific operations.

For example, when I was making a graph layout algorithm I had to answer the query "Given a mapping of position->value, what is the closest position to some position x with value at least y?". It could be done using a single log(n) query, with no iteration over ranges or values of x.

Assuming the number of queries is orders of magnitude higher than the number of items in the structure, do you really need a read/write data structure? In most situations I had with similar constraints it was acceptable to have a read-only data structure, meaning it only supported two operations, Create(data) and Query, instead of Create(empty), Insert, Query or Create(empty), Insert, Modify, Query. Then the overall process is: create/fill the structure with all your data, then answer the batch of all your queries. Not having to support dynamic inserts/modifications can give a good improvement to the constant factor (and the hidden non-constant factor once you take into account stuff like memory cache behavior). Having a fixed, non-modifiable tree means you don't have to deal with as much dynamic memory allocation, pointer indirection and tree rebalancing. Not having to deal with self-balancing tree machinery also significantly simplifies the code, making it a lot more practical to roll your own specialized tree which directly supports the queries you need.

One more trick that can help, depending on your overall constraints, is key remapping. If you have all your dataset ahead of time (or it's rarely modified between large batches of read queries), you can map your key range from (floats, timestamps, 64-bit integers or whatever your range position type is) to integers 0..n where n is the number of unique positions in your dataset. To do the mapping just sort the values and do a binary search; there's no need for a general-purpose dynamic dictionary. The same sorted array can later be used for mapping query positions. Do your own benchmarks to see how search in a sorted array compares to a hashmap for your specific constraints. The benefit is that it can make your overall structure more dense, get rid of more complex comparison operations in the tree structure, and in some situations it can allow you to do a direct lookup by index in a flat array or the leaf layer of a fixed, nearly complete b-tree.


Many moons ago I had to program something similar. There were basically two approaches I settled on given the question being asked:

- I'd like to acquire <resource> at time T for N units of time.

- I'd like to acquire <resource> for N units of time, give me possible T's.

First, it was a great benefit to identify what "unit of time" was. For my case it was 30 minutes; this helped tremendously as you'll see.

Next, for the first question (I want <resource> at time T), once I had enough data that searching became slow enough that a return of "resource not available" became problematic, I found a bloom filter seriously improved things. It allowed not only the common failing case to fail really fast, but bit-shifting the filter also allowed for some very fast parallel queries of "nearby" time slots. I was able to rapidly return things like "2pm isn't available, but 1:30pm is".

Last, for the second question (I'd like <resource> for N units of time), this is _very_ similar to a heap memory allocator. At its simplest it's just a b-tree of slots, etc. Grabbing a big slot, splitting it up, etc. This gets even easier if there are time boundaries that cannot be crossed (eg, a day).

To be clear, the allocator approach isn't about you managing a BTree like you have been. It's about a tree of _available_ time slots that are _free_. Initially it begins as a tree with a single node for all time. Make a request, scan the tree for a time slot large enough to meet the length needed, subdivide it, etc. If a time slot is no longer needed, you put it back into the tree by scanning for the node it belongs to (or fits between), coalescing, etc.

Hope this helps!


Thank you! I should have clarified that my map does currently store free time slots instead of booked time slots, but either representation would be ok if it would be beneficial for the data structure.


If BTreeMap is insufficient, and you're talking about cache-efficiency or SIMD, you probably can't use anything off-the-shelf. To design the right thing for your use case, it would help to know more about your workload.

BTreeMap should be nearly optimal, up to fiddling with the layout, changing the number of entries stored in one node, and vectorizing your within-node query function.

For people who need to store sorted sparse records and have memory to spare, or who know the keys of their sorted sparse records take on few values (like only a few million?), they can use a robinhood hash table without any hash function. The whole thing may not fit in a relevant cache, but you generally won't hit more than 1 cache line for the types of queries described. Again, I can't really recommend a library, but it's pretty easy to write this yourself.

Edit: Here's a link to one robinhood hash table: https://github.com/dendibakh/perf-challenge6/blob/Solution_R...

For queries like "next/previous free interval with length at least K", scanning within the appropriate level of a Fenwick tree may be better than scanning within a list of occupied intervals. Maybe you could use a layout that keeps all the entries for a level together, rather than the usual layout that mixes in less useful information and puts useful information powers-of-2 apart just to make the cache behavior as bad as possible. For example, if you have K=100, then you could look in the level that has sums of runs of 64. It's necessary but not sufficient to see 2 consecutive entries with at most 28 missing or 3 consecutive entries with at most 92 missing and the middle entry completely empty. Generalizing to arbitrary K is left as an exercise for the reader.


Totally agree with other comments that a sorted array will be hard to beat. I don't see how binary search could help with the "look for the next availability" query, but even a linear search could be fast on a strong CPU with cache prefetching, out-of-order execution, etc.

A few thousand elements is still small for a computer. I would be curious to see a benchmark comparing sorted arrays of intervals against the fancier structures.


I did some simple benchmarks of a `Vec<(u32, u32)>` to a `BTreeMap<u32, u32>` for the test case and found them to be roughly comparable. The intervals are relatively dense so I was hoping for the vector to perform better than it did.

The binary search was useful to find the start position if there are more than a few hundred elements or so (whatever the breakeven point for linear search vs. binary search is on your CPU).


The first place I'd look is using binary search over a sorted `Vec<(u32, u32)>`. The `superslice` crate is handy for this, or just using the built-in binary search methods.

The improved cache behavior alone can make an order-of-magnitude difference. The cost you pay, of course, is on inserts and removals, but perhaps these can be amortized in a read-heavy workload (build up a list of changes, and then rebuild in one go).
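
A minimal sketch of that, using std's built-in `partition_point` (the `superslice` crate offers similar lower-bound helpers), over disjoint (start, end) pairs sorted by start:

    // First interval starting at or after `t`, found by binary search.
    fn first_starting_at(intervals: &[(u32, u32)], t: u32) -> Option<(u32, u32)> {
        let i = intervals.partition_point(|&(s, _)| s < t);
        intervals.get(i).copied()
    }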


Thanks, I had a similar idea and tried this with Meta's `sorted_vector_map` (https://github.com/facebookexperimental/rust-shed/tree/main/...) but found the performance to be slightly worse than a `BTreeMap<u32, u32>` for my test case. I did try to change it a bit to do fewer binary searches (`range` tried to find the end immediately) but the binary searches seemed to be slightly worse than finding starting intervals in the map.


Honest questions (this might be over my head...):

My assumption (is it correct?) is that the sorted Vec<(u32, u32)> represents start/end times in a tuple?

What comes to mind at first is always storing less information. Do you _need_ 32bits? What is the granularity of your time intervals?

Then, it might make sense to encode intervals differently if you have a lot of adjacent slots:

Instead of using start/end times, you might be able to use just a single time and tag it. The tag would tell you whether you're looking at the next (adjacent) start time, or at an end time (rest of the slots are free until the next time).

That could be encoded as an enum like so: Start(u32) | End(u32). I assume that this would take up a little bit of memory for the tag and then 32bits + padding for the rest. I'm not familiar enough with Rust, but I assume you end up with at least 40 bits unfortunately because of alignment.

Another approach could be to use an i32: you only get half of the granularity and would have to do a little bit of bit twiddling, but you have a much more compressed data structure.

(Aside: In Zig this might be a bit easier because you can have arbitrarily sized integers).
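
For illustration, a sketch of that single-word variant: steal the top bit of a u32 for the start/end tag, halving the usable time range but keeping every boundary at exactly 32 bits with no padding:

    // Top bit tags the boundary kind; the low 31 bits carry the time.
    fn encode(time: u32, is_end: bool) -> u32 {
        debug_assert!(time < 1 << 31);
        time | ((is_end as u32) << 31)
    }

    fn decode(v: u32) -> (u32, bool) {
        (v & !(1u32 << 31), v >> 31 == 1)
    }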

You said it's a read heavy use case so it might be useful to only having to search for an end tag (or "negative" number) in order to get free slots.

Another, slightly more radical, approach would be to only concern yourself with availability, flipping the problem on its head, or at least have a separate availability data structure.

This could make some write operations a bit more involved, but you'd have a data structure that specifically cares about finding free slots.


I implemented a toy market ledger in Rust. I initially thought to use a B-Tree because that's the conventional wisdom right? It was sooooo slow. Running valgrind showed that 99.5% of the time was spent doing memory allocation and traversing pointers.

A hashmap of vectors was so stupidly fast. If you keep them sorted each insertion is dirt cheap, and binary search is dirt cheap.


Do you mean time was used as the key for the hashmap? What was the vector used for in that setup?


Vectors contain buy/sell orders and are sorted by price; the keys of the hashmap were different securities. Buy orders and sell orders lived in separate vectors.
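
In sketch form (type and field names here are illustrative, not the actual ledger code):

    use std::collections::HashMap;

    // One sorted Vec of (price, qty) per security per side.
    struct Books {
        buys: HashMap<String, Vec<(u64, u64)>>,
        sells: HashMap<String, Vec<(u64, u64)>>,
    }

    // Insertion keeps the Vec sorted: binary search for the spot, then a
    // memmove-style shift, which is cheap on contiguous memory.
    fn insert_sorted(book: &mut Vec<(u64, u64)>, order: (u64, u64)) {
        let i = book.partition_point(|&(price, _)| price < order.0);
        book.insert(i, order);
    }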


In the past for this sort of thing I've used an interval tree

https://en.wikipedia.org/wiki/Interval_tree

However, that was mainly for the case where I needed to detect overlapping intervals. Depending on your application, perhaps a K-d tree, in this case with K=2 (one dimension for the start-time, the other for the end-time)? If you know that all your intervals are entirely disjoint (no overlap at all) I don't think you'll be able to do better than just an ordered map or list.


You can probably recast your problem into a 1D multiple particle collisions problem.

Each interval is represented by a particle of radius half the interval, positioned at the middle of the interval.

Because the radii vary, you can either use an adaptive algorithm, or split the interval into multiple small particles.

There is not a lot to gain when doing a single query, but when you are batching queries, you can share the computation and memory accesses. Using radix sort for sorting the integers you get a complexity for iterating over all collisions of O( NbKeys + NbQueries + NbCollisions ).


AFAICT the query is about "the next unbooked interval of at least a given duration". I assume the problem with the B-tree is that when the schedule is contentious, it essentially degenerates into a scan for the next large gap. (And when it's not contentious, the B-tree will be approximately optimal so there'd be no point looking for a better data structure for that regime).

To improve queries that are contentious I'd try building something like an R-tree or range/interval tree of the unbooked intervals (the complement of the booked intervals). One invariant would be that availability intervals are placed at the highest possible layer in the tree (the smallest region that they entirely fit inside). Then query the R-tree (or interval tree) but only to the depth corresponding to the interval size requested. (Any lower levels would contain only availability windows that are too small, so we can skip searching them.)

That would avoid scanning all the small intervals. Not sure how much overhead this would add and if it would work out significantly faster than your B-tree though...


I've written a data structure for grouping intervals into disjoint connected clusters. Each cluster in the data structure is a connected component of the interval space (see https://en.wikipedia.org/wiki/Connected_space for the definition of connected component).

This is useful to determining maximum concurrent use of a resource; for instance, if there are only 3 projectors, a maximum of 3 lessons that require projectors can overlap at any given time.

The basic idea is to use a sorted multi-set of each interval's end points, doing case analysis on insertion to determine what interval clusters to merge; removal re-computes the entire cluster the removed range was in, splitting the cluster into two or more when gaps are encountered.

For the implementation, see https://github.com/TimefoldAI/timefold-solver/blob/main/core...

As a side effect of keeping track of connected clusters, the data structure also keeps track of gaps between clusters (which could be used to find the next availability).

For how it is used in practice, there an example in Timefold's docs: https://docs.timefold.ai/timefold-solver/latest/constraints-...

A simpler algorithm can be used when each interval represent a time grain (fixed length, no two intervals overlap). See https://github.com/TimefoldAI/timefold-solver/tree/main/core... for that.


Thank you for the reference, I’ll take a look! I like the idea of being able to use connected components somehow if it could avoid some of the overhead of range queries during updates. I’ve been considering whether there might be a way to apply union-find in a way that would still be faster than some kind of bitset representation.


Last time I had to do something similar (for TCP reassembly) I simply used a sorted array.

I didn't think it through fully, but did you consider using a bit set that only stores gap/non-gap (and non-gap/gap) transitions, then compressing it into a rank-select data structure? The even-odd result of a rank query will tell you if you are in a gap or not. It should also be possible to use select to find the closest hole.

The problem is that AFAIK rank-select structures cannot be modified, so you will need to regenerate it from scratch at every update. That (or a hybrid scheme) will work only if updates are rare enough.


That makes sense. I tried dense bit sets but storing a transition bitset could be interesting. That sounds similar to an inversion list compressed into a bitset.

The rank-select sounds like a great idea and seems like it would work great for mostly static intervals, but updates are pretty common in my case.


You mentioned that the intervals are disjoint, but are they adjacent? I've had success with maintaining a "coalescing interval set" structure, where adjacent intervals are merged into a single interval. It's very, very efficient for lookups, and not terrible for insertion / removal of intervals. At the very least it might work as a view for your read-heavy processes.

You mentioned shop scheduling below as well. If your intervals represent booked time, you could also insert 'unavailable' time into the coalesced set, so that, when fully booked, the interval set is reduced to a single interval.


Exactly, intervals are disjoint and adjacent (automatically coalesced). The ordered map I'm using will coalesce them together which works pretty well. The problem is that I'd like to accelerate range queries even further, probably by excluding certain intervals with some acceleration structure on the side (e.g., skip any intervals with a duration less than X) at the cost of slightly more expensive updates.


Can you give any more details about the range queries? e.g. are you interested in finding any gap of at least duration X, or the number of bookings in a date range?

Put another way, are you more interested in what's been consumed, or what's free?


I'm interested in both, but quickly finding the nearest gaps of at least duration X is most important.


"Nearest gap" doesn't sounds like a range query to me. It sounds more like a nearest neighbor query. It sounds like you have (start time, duration) tuples and want to search for nearest neighbor on start time with an at-least-X constraint on duration. (start time, duration) is more point-like than interval-like (as far as data structures go), so anything that can handle nearest neighbor on point-like data would be a candidate. If this is an accurate mapping for your problem, you might check out KD trees. You'd probably have to write a custom tree traversal algorithm to query NN on one dimension and >X on the other, but I think it could be done. Sounds kinda fun.


I have (start time, duration) tuples but would like to find the nearest entries (in either direction) given a specific time and minimum duration.

Thanks for the suggestion! I've tried some other spatial acceleration structures (e.g., R-tree and some others) and applying them in 1-d but they didn't outperform the ordered map unfortunately. It could be interesting to try out KD trees at some point though.


Sure thing! I'd reframe this as a nearest neighbor search over (start time, duration) tuples. KD trees are what I'm most familiar with for that problem, but there are others that would fit. Check out chapters 1 ("Multidimensional Point Data") and 3 ("Intervals and Small Rectangles") of Foundations of Multidimensional and Metric Data Structures: https://books.google.com/books?id=vO-NRRKHG84C&pg=PR9&source...

R-trees are more generic in that they can store intervals, shapes, volumes, etc, but (start time, duration) tuples are point-like, and that opens more possibilities. The ordered map is going to give you good memory locality. It might be that you'll only beat that when searching for larger durations (that require you to iterate over more of your ordered map). There will probably be an inflection point somewhere, depending on the distribution of your data and queries.

Hope that helps!


There are so many nuances here, but perhaps you could maintain two coalescing structures: what's free and what's used, complements of one another. Adding to one means removing from the other. One might give you a more constrained search space than the other.

An ordered structure feels like the way to go, here, since you said "nearest gaps". Once you know the time you'll get locality in your search. You could also run the search in both directions on two threads.


Have you tried a 1-d kd tree that, at each subinterval, keeps track of the largest gap? Should be fairly simple to keep this updated, and it effectively matches/generalizes your "coalescing" logic.



It seems like a good way to describe a common language around interval operations. I'd be interested in data structures that were specialized for some of these operations.


I've built an interval treap for a similar application. Each node holds a spanning interval derived as follows:

   node->spanning = union(node->interval, node->left->spanning, node->right->spanning) 
On rotations, only two spanning intervals need to be recomputed. You can search for an interval and the result is the subset of nodes whose intervals overlap with it. And of course, rotations keep the treap balanced at all times.


You can come up with all kinds of fancy stuff, but if they're disjoint I don't think you're going to do much better than storing them in an efficient data structure for ordered data.

Something to look into would be to ensure that the starts of intervals are sorted after the ends of intervals if they happen to be at the same time. If you got that then the starts and ends of intervals simply alternate and most queries easily translate to one or two lookups.
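
Taken to its conclusion that's an inversion list: keep the boundaries in one sorted Vec where even ranks open an interval and odd ranks close one, and a point query becomes a binary search plus a parity check:

    fn covered(boundaries: &[u32], t: u32) -> bool {
        // Number of boundaries <= t; odd means t falls inside an interval
        // (half-open [start, end) semantics).
        boundaries.partition_point(|&b| b <= t) % 2 == 1
    }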


I'll explain what my preferred 3D method would look like collapsed to 1D. This was originally for octtrees.

Make a binary tree where node sizes are powers of 2. I'm assuming you can define some "zero" time. To add an interval, first check if it exceeds the tree size and if so, double the size of the tree until it fits inside. Doubling involves making a new root node and inserting the existing tree under it (this was extra fun in 3D). Then decide what node size your interval should reside in. I would use the first power of 2 less than or equal to the interval size. Then insert that interval into all nodes of that determined size. Yes, I allow more than one object in a node, as well as allowing smaller nodes under one containing an object. An object may also span more than one node. In other words the objects (intervals) determine their own position in this tree regardless of others in the tree. It's not perfectly optimal but it allows inserts and deletes in O(logN) as well as intersection checks.

If this sounds not too terrible for the 1D case, I can elaborate on the data structure a bit.

Edit: From another comment "quickly finding the nearest gaps of at least duration X is most important." That possibly invalidates this whole approach.


Thanks! This sounds very similar to the experiment I tried with sieve-tree (https://github.com/Ralith/sieve-tree/) which can store children directly on a node until it reaches some threshold. I had some problems with the nearest query as you mentioned, because children are in arbitrary order and you might have to search multiple tree levels to find the nearest.


Not my area at all, but having done some work on ray-tracing back in the day, it kinda reminded me of the ray-triangle acceleration structures there[1], except your entities are 1-D lines rather than 3-D triangles.

One which came to mind which possibly might be interesting would be the Bounding Interval Hierarchy[2], where you partition the space but also keep the min/max intervals of all children in each node. The "Instant ray tracing" paper[3] details some interesting techniques for speeding up construction.

Or perhaps it's a rubbish idea, haven't really thought it through.

[1]: https://en.wikipedia.org/wiki/Binary_space_partitioning

[2]: https://en.wikipedia.org/wiki/Bounding_interval_hierarchy

[3]: http://ainc.de/Research/BIH.pdf


Definitely! It's a great idea and something I've been trying out. So far I've tested typical bounding volume hierarchies and spatial acceleration structures in 1-d (e.g., some of the cache-friendly quadtree designs applied to 1d, r-trees, k-d trees) but they weren't able to outperform my simple ordered map yet unfortunately.


Since you mentioned SIMD acceleration, did you look at things like QBVH[1]? I mean it would be kinda like a B-tree I suppose but with the min/max stuff.

[1]: https://jo.dreggn.org/home/2008_qbvh.pdf


I did look at QBVH and flattening it to a single dimension, but my understanding is that most specialized BVHs like this prefer mostly static geometry because updates are so expensive. I'd be happy to be wrong on this though - QBVH looked complicated enough that I didn't want to experiment with it without some strong signals that it would be the right direction.


From my ray-tracing days, I recall that the majority of the time spent in the acceleration structure was due to cache misses when traversing nodes.

It might be you want to use a binary partitioning algorithm or similar for just a few levels, and then have the leaf nodes be N spans in a (sorted) list, where N is somewhat large. Then you can have some fast loop to mow through the leaf spans.


Fair point. They mention they do a regular construction and then reduce it to the QBVH, so I was wondering if the BIH approximate sort trick could be used effectively.

Again, I haven't put a lot of thought into it.

Sounds like a fun challenge though.


I think OR-Tools uses a sorted `std::vector` for this in `SortedDisjointIntervalList` [1] (if I get your description of the problem right).

--

1: https://or-tools.github.io/docs/cpp/classoperations__researc...


Interesting, thanks! It makes sense that OR-Tools would have something similar. I wonder if they have an explanation behind using `std::vector` vs. some other data structures. I could imagine `std::vector` doing pretty well but in practice I found vectors (Rust's `Vec` in my case) to be roughly on par with the ordered map I'm using. Obviously it can vary depending on how widely it has to binary search.


Since you're using Rust, the Cranelift JIT compiler implements something like this[0] to construct an e-graph for its expression rewriting subsystem called ISLE, however if I'm not mistaken it is for disjoint sets (not intervals), and therefore it does not deal with ordering.

Maybe you can adapt it for your use case and add those new constraints in?

Keep in mind though that this was not written to be in the hot-path itself, you could probably do significantly better by pouring your soul into the SIMD rabbit hole (though SIMD in Rust is usually very annoying to write)

Best of luck, hope this helps!

[0] https://github.com/bytecodealliance/wasmtime/blob/7dcb9bd6ea...


Thanks for the reference! This looks like an implementation of a disjoint set/union-find data structure. I think it could be an interesting direction to explore by adapting union-find to handle intervals efficiently, or maybe apply some of the union-find amortized operations tricks to another interval-based tree or similar.


I'd recommend a combination of data structures.

1st - Put all the actual data in a sorted / linked list. Keep this updated as you always have

2nd - As a separate thread, keep an ordered set of lists, each list matching a time in the past, present, or future, with increasing increments as you move away from the present. Each of those lists holds all the items available at that time. The past is for reporting on old events, etc.

  -365 days
  -180 days
  -90 days
  -45 days
  -30 days
  -29 days...
  -1 day
  -23 hours... -1 hour

  NOW

  +1 hour... +23 hours
  +1 day...  +30 days
  +45, 90, 180, 365 days
In each of those lists, have pointers to all the relevant records in the main list.

If there's a pointer error, scan the whole list, the slow but reliable way.


What I came up with is interval arithmetic. The point is that operations on intervals are lazy (there are two main operations over sets of intervals, union and intersection). When doing one of these two operations, you just provide another view over the underlying data without doing any actual operation on the data.

It turns out that, doing such algebra, you can use any structure for the underlying representation of a set of intervals (it is trivially replaceable), and I believe that using a vector would be most efficient.

I do have some prototyping code lying around; let me know if you want to play around with it.


A comment that's orthogonal to the datastructure you're choosing, but do you really need u32?

If you need second level precision for scheduling, you probably do, but a u16 will cover a 45 day range with minute level precision.


You're looking for an implicit interval tree. The key idea is to store sorted ranges and simulate the interval tree via binary search. The best implementation I know of is coitrees: https://github.com/dcjones/coitrees, but you could also roll your own. cgranges is another good implementation: https://github.com/lh3/cgranges


Thank you, I should look into these more. I actually came across them earlier but it wasn’t obvious that these would outperform a plain sorted `Vec` with binary search, which seemed to have slightly worse performance than the ordered map I’m currently using. My understanding is that some of the bioinformatics-oriented interval trees assume intervals are mostly static, so they don’t support updates very well without expensive tree rebuilding after each batch of updates.


Yup, a Vec with binary search is basically all these are. Tree rebuilding is "just" sorting the Vec. To the extent that's fast enough for small updates (merge sort with new elements) you can update rather often without issue.


Check out interval trees.

Example use case: https://chrlschn.dev/blog/2022/11/concurrent-processing-dotn...

    Queries require O(log n + m) time, with n being the total number of intervals and m being the number of reported results. Construction requires O(n log n) time, and storage requires O(n) space.


This is the same problem as `malloc`, right?

I imagine the answer depends on exactly what queries you need. If you have a bunch of different standard event sizes, then managing multiple different sorted freelists for the next open 1 hour slot, the next open 3 hour slot, etc might work. If you need high concurrency and "next available slot" is allowed to be fuzzy, then managing four distinct "event heaps" and parallelizing across them might work.


There are definitely a lot of similarities. I spent some time reviewing allocator designs to see if some ideas might carry over.

One problem I found is that a lot of allocators benefit from being able to track freelists based on some minimum sizes needed for allocations (e.g., https://github.com/sebbbi/OffsetAllocator). This problem is a bit different in that I want to know the first available block even though it might be much bigger than I need. Having multiple lists might still be useful though, but it's not as effective as how they're used in allocators - you'd usually need to search multiple freelists to know the first available instead of one.


What if you structure the "lists" so that each "freelist" contains all slots >= 2^n units of time? Then you only ever need to search the list that's the next size down from your target slot?

This will be bad for write performance, but solves the problem of needing to do multiple searches to find the next available slot, and you mentioned that you might not mind sacrificing write performance.


That's definitely possible and could be a reasonable tradeoff. I'd be slightly worried about the extent that it sacrifices write performance, but it might not be too bad if the number of levels is kept small.


How about hierarchical timing wheels?

"Hashed and Hierarchical Timing Wheels: Data Structures for the Efficient Implementation of a Timer Facility" by George Varghese and Tony Lauck

http://www.cs.columbia.edu/~nahum/w6998/papers/sosp87-timing...


Sounds interesting, I’ll check it out, thanks! Skimming through it quickly it seems like it may be focused on actual running timers instead of tracking intervals, but some of the ideas might still apply to this problem.


It sounds like you should precompute the duration available at each offset, so that instead of simply searching for the first good-enough slot, you can retrieve the smallest satisfying interval from a list precomputed to some acceptable count.

If it's not worth doing that pre-computation, as your trials have shown and others say, ordered maps might be the best available.


How granular are the durations? If days, for example, you could bucket into N-week intervals (e.g. intervals with 7N consecutive days) to reduce the search scope. Then you only need to store start times, and can join or use a set to combine other ranges with the same availability.


The duration granularity is days right now, but the total duration of the map might extend out for tens of years or so. I did try bucketing them into hundreds of day long intervals but it didn't seem to help very much.


[Vague response that probably misses the mark]

Your description of the problem smells a bit like job-shop scheduling. Optimizing job-shop scheduling is NP-Hard, so at best one general purpose algorithm might be better than another, but you can't determine which by simple inspection. Efficiency will vary based on the actual details of your data.

The standard approach (after managing IO) is to tune for the regularities in your data. Regularities are what make your data arbitrary data rather than random values. Tuning the algorithm can get you a long way in the time domain. No data structure alone can get you out of NP. Good luck.


Thank you! Absolutely, the problem I'm solving is closer to the resource-constrained project scheduling problem (RCPSP) but it's also closely related to job shop scheduling.

I've mostly focused on reducing the search space so far, but I've always wondered if there's a way to significantly accelerate range queries. My B-tree map implementation is already very fast in practice, but intuitively it seems like a data structure should be able to take advantage of the fact that intervals are disjoint to greatly improve performance. For example, inversion lists exploit disjointedness to reduce storage cost - I'd like to do the same but for performance. Many range queries also have a minimum duration requirement, so it could be useful if a data structure could take advantage of this to quickly exclude intervals during the search.


> Many range queries also have a minimum duration requirement, so it could be useful if a data structure could take advantage of this to quickly exclude intervals during the search.

Check out priority search trees. They search two dimensions, one of them being half-open (that would be your minimum duration requirement). Depends if the other half of your queries fits the closed dimension of a priority tree or if you can adapt it to fit your needs.


Thanks, I hadn't heard of RCPSP, but that's not surprising. My first thought was "can RCPSP be expressed as job shop scheduling?" because I have a mild interest in scheduling problems. The mathematics of scheduling provides insight into why-are-things-that-way questions when it seems like there ought to be something better than that-way...anyway...

My intuition is that "available ranges" sounds like a sorted index, and that suggests "SQLite" (or "SQLserver" etc.) as the data structure. I mean, if you are already building on B-trees, you're conceptually on the way to an RDBMS, just without the robust tooling, low-level IO optimizations, and tuning interfaces already built, tested, and supported with maintenance contracts.

Or to put it another way: data is always a snowflake. RDBMSs are a great tool for handling snowflakes. And reasoning about them.

Of course I might be wrong about the particulars of your use case and in any organization there's NIH and politics. And maybe an RDBMS was where you started, etc. etc. It's a brownfield project.
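For what it's worth, the first-fit query is a single indexed SELECT. A hedged sketch using the `rusqlite` crate (one binding among several; the schema and names are illustrative):

    use rusqlite::{Connection, Result};

    fn main() -> Result<()> {
        let conn = Connection::open_in_memory()?;
        // `stop` rather than `end`, because END is an SQL keyword.
        conn.execute_batch(
            "CREATE TABLE free(start INTEGER PRIMARY KEY, stop INTEGER NOT NULL);",
        )?;
        conn.execute("INSERT INTO free VALUES (?1, ?2)", (10, 25))?;

        // First free interval at or after t = 10 with width >= w = 5.
        let (s, e): (u32, u32) = conn.query_row(
            "SELECT start, stop FROM free
             WHERE start >= ?1 AND stop - start >= ?2
             ORDER BY start LIMIT 1",
            (10u32, 5u32),
            |row| Ok((row.get(0)?, row.get(1)?)),
        )?;
        println!("first fit: [{s}, {e})");
        Ok(())
    }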


The representation I suggested in https://news.ycombinator.com/item?id=41053437 takes advantage of intervals being disjoint.


> Optimizing job-shop scheduling is NP-Hard, so at best one general purpose algorithm might be better than another, but you can't determine which by simple inspection. Efficiency will vary based on the actual details of your data.

Lots of job scheduling can be solved in P. And even most instances of most NP-hard problems aren't that hard to solve in practice.


edit: from OP's comments, the library was already performing the coalescing.

There's the Discrete Interval Encoding Tree (Diet). The basic idea is that, in a BST of intervals, if an insertion fills the hole between two intervals, they get merged into a single node.

The paper: https://web.engr.oregonstate.edu/~erwig/diet/

A Scala implementation example: https://typelevel.org/cats-collections/diet.html
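And a minimal Rust sketch of the merge rule itself (my own illustration, not taken from either link), over an ordered map of half-open start -> end intervals:

    use std::collections::BTreeMap;

    // Insert point `p` into disjoint, coalesced half-open intervals.
    // (Ignores the p == u32::MAX edge case for brevity.)
    fn insert_point(map: &mut BTreeMap<u32, u32>, p: u32) {
        let prev = map.range(..=p).next_back().map(|(&s, &e)| (s, e));
        if let Some((_, e)) = prev {
            if p < e {
                return; // already covered
            }
        }
        let next = map.range(p + 1..).next().map(|(&s, &e)| (s, e));
        let touches_prev = matches!(prev, Some((_, e)) if e == p);
        let touches_next = matches!(next, Some((s, _)) if s == p + 1);

        match (touches_prev, touches_next) {
            (true, true) => {
                // `p` fills the hole: collapse both nodes into one.
                let (ps, _) = prev.unwrap();
                let (ns, ne) = next.unwrap();
                map.remove(&ns);
                map.insert(ps, ne);
            }
            (true, false) => {
                map.insert(prev.unwrap().0, p + 1); // extend prev upward
            }
            (false, true) => {
                let (ns, ne) = next.unwrap();
                map.remove(&ns);
                map.insert(p, ne); // extend next downward (re-key)
            }
            (false, false) => {
                map.insert(p, p + 1); // isolated point
            }
        }
    }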


This DuckDB blog post on range joins might be relevant: https://duckdb.org/2022/05/27/iejoin.html


Maybe look into implicit in-order forests (https://thume.ca/2021/03/14/iforests/)


Out of curiosity and domain ignorance, why is the performance of something like your scheduling example something that needs to be optimized so much that you’ve tried so many structures already to no avail?


In this specific case, it’s part of an automatic scheduler that performs hundreds of thousands of range queries in a real-time interactive use case, so small improvements here can make a huge difference to the overall feel of the application. A lot of scheduling applications take seconds or minutes to run queries like that, but that would be way too slow for real-time interactions like I’m doing.

The current approach I’m using takes tens of milliseconds for some large test cases (which is great!), however I’d like to improve on it more so I could eventually run much bigger cases at real-time interactive speeds too.


Does the range query aggregate the elements in range? If the aggregation operation forms a group (i.e., it's associative and has an inverse), you can use prefix-sum data structures like a Fenwick tree.

If it doesn't have an inverse, so you cannot subtract prefix sums, a sparse table can be used instead.
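For the invertible case, a minimal Fenwick tree sketch in Rust (point update plus prefix sums; the range query subtracts two prefixes, which is exactly where the inverse is needed):

    // 1-indexed internally, as is conventional for Fenwick trees.
    struct Fenwick {
        tree: Vec<i64>,
    }

    impl Fenwick {
        fn new(n: usize) -> Self {
            Self { tree: vec![0; n + 1] }
        }

        // Add `delta` at 0-indexed position `i`.
        fn add(&mut self, mut i: usize, delta: i64) {
            i += 1;
            while i < self.tree.len() {
                self.tree[i] += delta;
                i += i & i.wrapping_neg(); // jump to next responsible node
            }
        }

        // Sum of positions 0..=i.
        fn prefix(&self, mut i: usize) -> i64 {
            i += 1;
            let mut acc = 0;
            while i > 0 {
                acc += self.tree[i];
                i -= i & i.wrapping_neg(); // strip lowest set bit
            }
            acc
        }

        // Sum of positions lo..=hi via subtraction (the group inverse).
        fn range_sum(&self, lo: usize, hi: usize) -> i64 {
            self.prefix(hi) - if lo == 0 { 0 } else { self.prefix(lo - 1) }
        }
    }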


> dense bitsets for small ranges

Have you tried roaring bitmaps? They internally use either a sparse or dense representation based on the bits set, and use SIMD for their operations.


I haven't tried roaring bitmaps on this problem yet. I think it's generally fairly dense so that's why I tried dense bitsets first, but it's a fair point. It might still be beneficial to go with a sparse representation depending on the interval lengths for example.

I also didn't have a good way to measure runs from roaring bitmap's Rust API (https://github.com/RoaringBitmap/roaring-rs/issues/274) which makes it hard to track interval duration, but would be interested to compare this approach.


That's fair, the Rust port isn't as feature-rich as the original Java library. The Java version has methods for previous/next absent value, which I think would solve your issue?

The method is implemented using a count-trailing-zeros intrinsic in the dense container, so it is very performant for dense bitmaps: https://github.com/RoaringBitmap/RoaringBitmap/blob/d2c6c3bc...

Maybe this can be ported to the Rust lib as well.
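The core of the trick is small in Rust too; an illustration of the idea (not a port of the linked Java): invert each word so absent positions become set bits, then count trailing zeros.

    // Next absent (zero) bit at or after `from`, or None if none exists.
    fn next_absent(words: &[u64], from: usize) -> Option<usize> {
        let mut w = from / 64;
        // Invert so 1-bits mark absent slots; mask off bits below `from`.
        let mut cur = !*words.get(w)? & (!0u64 << (from % 64));
        loop {
            if cur != 0 {
                return Some(w * 64 + cur.trailing_zeros() as usize);
            }
            w += 1;
            cur = !*words.get(w)?;
        }
    }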


"range query" is different from "find next availability".

Consider explicitly using 16(-ish)-bit pointers instead of native-sized ones, and go for a SIMD-friendly binary trie; with some degree of implicitness, where it is addressed as if it were storing a set of mutually prefix-free variable-length bitstrings (capped to something, though, like 32 in your example; no entry is a prefix of another entry), but you're not looking to distinguish `{00, 01}` from `{0}`.

Operations are restricted, but it should be very fast for "find first empty of at least X large starting at or after Y" as well as "clear from X to Y inclusive" and "add/mark from X to Y inclusive".
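The trie itself isn't implemented anywhere yet (see below), but as a baseline stand-in, here's that operation set over a plain dense bitset in Rust; the trie's pitch is to beat this on sparse data by scaling with the number of ranges rather than the number of slots:

    // Baseline stand-in, NOT the proposed trie. Bit = 1 means booked.
    struct BookedBits {
        words: Vec<u64>,
    }

    impl BookedBits {
        fn new(slots: usize) -> Self {
            Self { words: vec![0; (slots + 63) / 64] }
        }

        // "add/mark from X to Y inclusive" and "clear from X to Y inclusive".
        fn set_range(&mut self, lo: usize, hi: usize, booked: bool) {
            for i in lo..=hi {
                let (w, b) = (i / 64, i % 64);
                if booked {
                    self.words[w] |= 1 << b;
                } else {
                    self.words[w] &= !(1 << b);
                }
            }
        }

        // "find first empty of at least `width` starting at or after `from`".
        // Naive bit-at-a-time scan; a real version would skip full words.
        fn first_free(&self, from: usize, width: usize) -> Option<usize> {
            let total = self.words.len() * 64;
            let (mut run, mut start) = (0, from);
            for i in from..total {
                if self.words[i / 64] >> (i % 64) & 1 == 1 {
                    run = 0;
                    start = i + 1;
                } else {
                    run += 1;
                    if run == width {
                        return Some(start);
                    }
                }
            }
            None
        }
    }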


I think this is an interesting idea. It might have some of the same problems as the dense bitset designs I've tried so far though but I'd like to try it out. Is there a detailed description of this anywhere?


Not yet; I haven't gotten around to doing any of the implementing yet.

The main difference is that this is trying to lock its big-Os to the number of ranges, assuming that the data will not benefit (much) from a dumb dense bitset representation.


Could you separate out the data structure, i.e. keep your BTreeMap<u32, u32> for intervals, but add a Vec<u32> for available slots. Update both on insert/remove, then for range queries use the Vec<u32>?

Sorry if this is dumb; I'm (a) not a Rustacean and (b) not really a computer scientist, lol.


Definitely possible! I tried doing something similar which stored the ranges in a single `Vec<(u32, u32)>` but found the performance a bit worse than just using the ordered map.


Reading through your other responses, I'm gathering your requirements are something like a throughput of 300 clock cycles per query, working with thousands of items in the data structure. Details are necessary for specific recommendations, but here are a few observations which might be applicable here or elsewhere.

- If you're only querying one of these things frequently, just do something better than a linear scan and ensure the backing memory is contiguous. It'll fit in the L1 cache and be fast enough.

- If you're querying hundreds of thousands of these (e.g., checking different pieces of equipment?), don't do that. Go up one level of abstraction so that you don't have to do hundreds of thousands of queries. Going to the equipment example, some of that equipment will be equivalent for a given query (maybe all of it, maybe each piece is truly unique for a given application, but for any incoming query there exists a category describing which subset of the equipment you care about). The simplest thing that could possibly work is if you have a low enough category count that you can just have an ordered multiset of available intervals for each category, with some auxiliary data for the bookkeeping on mutations. Any time you're looking at hundreds of thousands of disjoint things in a loop though, that's suggestive of the wrong level of abstraction (from a performance perspective). Even throwing it into in-memory sqlite with a multi-column index is probably better.

- It's almost always worth looking at your desired performance for each possible action on a data structure. E.g., I'm imagining "insert 1" is an atomic thing you don't want to feel laggy to the user, same with "delete 1", and same with "query all." If that's a good enough approximation of your requirements then it buys you a boatload of flexibility, since you can move work from query time (which, if you have to do a linear scan of subsets of 100k biggish things, is going to push uncomfortably close to the limits of random-access main memory loads no matter which data structure you choose) to mutation time. You can linearly scan the entire L1 cache a thousand times in 10ms, which is a lot of budget to compact a data structure, duplicate its indices hierarchically, or otherwise manipulate the data to make queries faster.

- You can make that idea more flexible with some sort of "deferred maintenance" approach (see the sketch after this list). E.g., if you're inserting `n` things, you can almost certainly afford to defer maintenance till after they're all inserted, hopefully saving some work if a few of those were in the same structure. E.g., if the only maintenance you're doing is compaction for cache reasons, you can maintain a counter of how many inserts have happened (or how slow recent queries were), and the next insert once the scale is tipped will be the one to actually do that maintenance -- allowing a simple strategy of blindly rebuilding these things every time to actually scale to reasonable volumes of bulk inserts within a 10ms window (to keep users from noticing lag).

- You gain a lot of power from your intervals being disjoint. You gain more if they're never adjacent (i.e., a booked block is always an endpoint in the data structure or bounded by unbooked blocks, and vice versa). Consider representing each contiguous block of bookings as a single unit referencing the actual blocks (that unit could be some sort of balanced tree if you want, easy to split in half on deletions; just hold a pointer to it and its endpoints, and basically never peek into the details till you decide to actually render that contiguous block as being comprised of its parts).
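Here's the promised sketch of deferred maintenance (illustrative names, not a drop-in design): mutations go to the authoritative map and bump a counter, and once the counter tips, the next insert rebuilds a flat sorted Vec that queries can binary-search and scan cache-friendly.

    use std::collections::BTreeMap;

    const REBUILD_AFTER: u32 = 64; // tune against real mutation batches

    struct Intervals {
        map: BTreeMap<u32, u32>,  // authoritative: start -> end
        compact: Vec<(u32, u32)>, // flat copy, rebuilt lazily for queries
        dirty: u32,
    }

    impl Intervals {
        fn insert(&mut self, start: u32, end: u32) {
            self.map.insert(start, end);
            self.dirty += 1;
            if self.dirty >= REBUILD_AFTER {
                // Amortized compaction: one contiguous, scan-friendly copy.
                self.compact = self.map.iter().map(|(&s, &e)| (s, e)).collect();
                self.dirty = 0;
            }
        }

        fn first_at_or_after(&self, t: u32) -> Option<(u32, u32)> {
            if self.dirty > 0 {
                // Compact copy is stale: fall back to the map until rebuilt.
                return self.map.range(t..).next().map(|(&s, &e)| (s, e));
            }
            let i = self.compact.partition_point(|&(s, _)| s < t);
            self.compact.get(i).copied()
        }
    }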


All great points, thank you! I'll need to think about this some more.

For context, queries are trying to find the next available equipment for tens of thousands of tasks, so hundreds of thousands of queries isn't too bad relative to the number of tasks. Equipment candidates are matched based on their capabilities (some equipment might support A or B, while others support B or C) and tasks might require certain combinations. This might complicate the multiset slightly but it seems like a good idea.

The ordered map I'm currently using handles adjacent available intervals by automatically coalescing them together, so I'm definitely taking advantage of that.


Yeah, so with capabilities, the naive idea of "categories" might or might not make sense. It depends how much duplication you get (how many pieces of equipment have large counts of capabilities). Most real-world scenarios I've seen only have 2-10 capabilities per piece of equipment though, which makes the simplest, dumbest solution work in the real world (anything fancier pushes closer to the database literature, and there are good solutions, but you hopefully don't have to care).

> tens of thousands of tasks

Am I misinterpreting, or are you saying that you have tens of thousands of queries and a much lower equipment count? That enables all kinds of solutions. Even on a single machine, multi-thread it. For deletions and insertions, use something like bounded-wait hazard pointers [0] with a bias toward cheap reads. If you have concurrent tasks, horizontal scaling is the name of the game. 10 microseconds is an easy baseline for intra-datacenter networking, easily allowing 10 millisecond reads and mutations even with a distributed solution.

Related, simply caching results might get you the performance you want on a single core if you only have hundreds of pieces of equipment you're searching each time. For each capability, just record the last time you mutated it. You can re-use any query result whose associated results don't have potential mutations. Especially if writes are much less frequent than reads, and especially if you have a bias toward not handing out equipment with all possible capabilities when something cheaper would suffice, most of the time you never have to hit the raw data structure, allowing a single core to handle a much higher query load even with worse data structures.

[0] a decent walkthrough: https://pvk.ca/Blog/2020/07/07/flatter-wait-free-hazard-poin...
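A hedged sketch of that per-capability version-stamp cache (names illustrative; one capability per query for simplicity): each mutation bumps the versions of the capabilities it touches, and a cached result is reused only while the version it recorded is unchanged.

    use std::collections::HashMap;

    #[derive(Default)]
    struct CapCache {
        versions: HashMap<String, u64>,
        results: HashMap<String, (u64, Vec<u32>)>, // cap -> (version seen, slot starts)
    }

    impl CapCache {
        // Call on any mutation touching equipment with this capability.
        fn invalidate(&mut self, cap: &str) {
            *self.versions.entry(cap.to_owned()).or_insert(0) += 1;
        }

        fn query(&mut self, cap: &str, recompute: impl FnOnce() -> Vec<u32>) -> Vec<u32> {
            let v = self.versions.get(cap).copied().unwrap_or(0);
            if let Some((seen, slots)) = self.results.get(cap) {
                if *seen == v {
                    return slots.clone(); // nothing relevant has mutated
                }
            }
            let slots = recompute();
            self.results.insert(cap.to_owned(), (v, slots.clone()));
            slots
        }
    }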


Multi-threading would be really nice, although it's not clear to me how to parallelize this in a way where the synchronization cost wouldn't regress performance. Tasks are prioritized, so each booking happens in sequence (i.e., each task needs to know where the higher priority tasks ended up being placed). There are cases where it's possible to determine whether they might be fully isolated (e.g., if one booking couldn't possibly affect another) to run those in parallel, but sometimes determining that might be non-trivial too.

I have been experimenting with a few caching approaches, but the caches tend to be invalidated so often that they didn't seem to help much yet. Some of the coarser-grained caches mentioned in some of the other comments could be interesting though, as a way to quickly skip over entire regions when the remaining availability intervals are too short for typical queries.


> multi-threading

This is one of those places where the details matter a ton. E.g. low contention hazard pointers are nearly free (under 10 clock cycles per read), and if writes are less common than reads (seems likely? you present 10-100 reads for available equipment before somebody chooses 1 or 0 of the available options) you'll barely notice the overhead. Synchronization only gets expensive when you have contention.

Examining the problem of allocating equipment, how often do you actually have contended writes on the same piece of equipment? If it's very often, presumably the top few matches are nearly interchangeable (or, even if they aren't, milliseconds separate who wins the non-interchangeable equipment, and from a user's perspective if you're not running a stock exchange they don't know if they didn't win because the equipment wasn't available or because you have a few milliseconds of fuzz behind the scenes). Can you tweak the business requirements and uniformly choose from the top k matches when displaying results to readers to artificially reduce contention?
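That top-k trick is tiny to implement; a sketch (assumes the `rand` crate):

    use rand::Rng;

    // Pick uniformly among the top `k` near-interchangeable candidates
    // instead of always the single best, spreading contended writes.
    fn pick_among_top_k<T>(ranked: &[T], k: usize) -> Option<&T> {
        let k = k.min(ranked.len());
        if k == 0 {
            return None;
        }
        Some(&ranked[rand::thread_rng().gen_range(0..k)])
    }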

Adding onto one of the previous ideas, even if tasks are prioritized, if outside parties don't communicate those priorities there isn't much practical difference over the span of a few milliseconds between low and high priority tasks. Sure, if a high priority task has been waiting longer than a few milliseconds you can prioritize it over low priority tasks, but you have a few milliseconds of slop available to quickly execute a "maybe wrong" solution and increase system throughput.

> caching doesn't work

That happens. It's almost always possible to construct malicious workloads against any caching strategy. If your real workload isn't very cacheable, that's annoying, but it's fine.

Did you happen to measure the relative performance of a cache hit vs a cache miss? Your perf budget is fairly tight, and I'd be curious to know if the lack of improvement is because the caches aren't optimized or because the misses reduce the gains.



