Parallel Universe (YC S12) Developing Spatial Databases For Matrix-Style Games (techcrunch.com)
95 points by pron on July 10, 2012 | 42 comments



What a terrible article.

> Oracle and Redis are examples of traditional, spatial database environments.

Redis is traditional? Redis is spatial? What are they talking about?

> Spaccebase is an in-memory technology, which is how it can be so fast. Thats’ what truly separates it from the rest. It tracks not when or what happened but where it occurred. And that all occurs in-memory. It’s like any application that runs this way. It’s very fast in getting data. In this case, we are talking about spatial data.

This entire paragraph is ridiculous. Just a bunch of buzzwords repeated ad nauseam. What database does not use "in-memory" technology? I guess that's why "it's like any application that runs this way". Yeah, and French cuisine is like any food that's cooked that way.


Yet we still link to TC articles...

YC startups always launch on Techcrunch and top HN within an hour.

Is there not a better place?


In this case, a direct link to the site (http://paralleluniverse.co/) would be better. There is actual information there. To be fair, the first two words of the TC article were a link to the site. Signal-to-noise over the first couple of words of that article is quite high!


I remember years ago when a commenter on Techcrunch bashed an article and called Techcrunch "the special olympics of tech journalism." At the time, I thought it was an insult to Techcrunch. Now I realize it's an insult to the fine athletes of the Special Olympics.


I also enjoyed:

> It’s the NOSQL capabilities that makes this so interesting. A NOSQL database is designed for big data.

Still, glad to see the HN comments here provide the perspective and analysis that TC couldn't.


Lol. I shudder whenever a Techcrunch writer starts throwing around the term "NoSQL", as if NoSQL automatically makes something fast.......

Attention buzzword-slinging pseudo-technologist journalists: NoSQL is a term which describes what something ISN'T, not what it is.


Maybe they meant MongoDB, which has spatial queries and doesn't live in memory?


Couldn't agree more. Does anyone know what the cutoff is for getting a downvote button on posts?


It's being marketed as a database technology, but from a game development perspective it sounds more like "Engine" technology. The lines are admittedly blurry sometimes, but this doesn't seem to offer durability of data, which is what I think of when someone says "database" in the context of MMO servers. That said, the problems this is solving are important problems for any MMO server engine.

Requiring that a JVM be embedded in your server is probably not going to go over well with many game developers. It's not initially going over well with me, but if I were currently trying to solve this problem, I would at least want to run some benchmarks against it.

I guess the big question I have is, how would this make existing MMOs better? If we take WoW as an example, dumping everyone into a single realm (shard) would solve the problem of wanting to play with your friends, who are on different realms. But 200x as many players competing for kills or standing around the auction house aren't going to make the game better. Most of your time would still be spent in instances, with a maximum of 25 players (40 if you go back in time - a case where less massive was deemed more fun). And that's ignoring the problem of actually rendering that much stuff on the client.

I don't mean to sound overly critical; I could find ways to use this, but I'm not sure they're marketing it very well to MMO developers. Maybe the Eve guys would like it, but other than that game, I'm not sure the scaling problems MMOs have are the same scaling problems that this solves.


"Most of your time would still be spent in instances, with a maximum of 25 players (40 if you go back in time - a case whre less massive was deemed more fun). And that's ignoring the problem of actually rendering that much stuff on the client."

You haven't played Eve Online, have you? It's a game that lets you play on the same "shard" with hundreds of players on grid at the same time, fighting it out.

Any tech that puts the massive back into MMO is good; lately, it's all been instancing and sharding. That sort of game isn't massive enough to count as an MMO. In fact, I've lately been hearing people call Diablo 3 an MMO. What a farce.

Edit: oh, you did mention Eve already.


I always thought instancing and sharding were solutions to the social aspects of server overpopulation - it's not fun to do a quest when you have to wait for spawns, etc.


I don't think it will make existing MMOs better. It will free them from the terrible restrictions that MMOs had to take on (like shards and instances) based on technology, rather than gameplay choice.


I'd be curious to hear more about what they're actually doing (technically speaking). Is this a wrapper around kd-trees, R-trees, and their friends and relatives? Is it something fancier?


The spatial index is an R-tree variant that doesn't degrade and allows concurrent writes (multiple writers at a time, not to mention readers). Readers don't block writers and don't use locks, while writers lock a small subset of the database for atomic transactions (one of the most common operations we need to support is a "move" of an object from one location to another, and that has to be atomic).
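A minimal sketch of that locking pattern, under loud assumptions: the R-tree variant isn't published, so a uniform grid stands in for it, lock striping (a fixed pool of locks shared by hash) is my own substitution, and `StripedGrid` and its methods are hypothetical names. Each cell holds an immutable frozenset, so readers grab the current set without locking; writers lock only the stripes covering the cells they touch, in sorted order, which makes a "move" atomic with respect to other writers.

```python
import threading
from math import floor


class StripedGrid:
    """Uniform-grid index with lock-free reads and lock-striped writes (a stand-in for an R-tree)."""

    def __init__(self, cell_size, stripes=64):
        self.cell_size = cell_size
        self.cells = {}      # (cx, cy) -> frozenset of object ids; replaced, never mutated
        self.positions = {}  # object id -> (x, y)
        self.locks = [threading.Lock() for _ in range(stripes)]

    def _key(self, x, y):
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def _stripes(self, *keys):
        # Deduplicated, sorted stripe indices: every writer locks in the same order, so no deadlock.
        return sorted({hash(k) % len(self.locks) for k in keys})

    def insert(self, oid, x, y):
        key = self._key(x, y)
        with self.locks[self._stripes(key)[0]]:
            self.cells[key] = self.cells.get(key, frozenset()) | {oid}
            self.positions[oid] = (x, y)

    def move(self, oid, x, y):
        """Move oid to (x, y) atomically with respect to other writers."""
        while True:
            old = self.positions[oid]
            src, dst = self._key(*old), self._key(x, y)
            held = [self.locks[i] for i in self._stripes(src, dst)]
            for lock in held:
                lock.acquire()
            try:
                if self.positions[oid] != old:
                    continue  # another writer moved it first; retry from its new position
                self.cells[src] = self.cells[src] - {oid}
                self.cells[dst] = self.cells.get(dst, frozenset()) | {oid}
                self.positions[oid] = (x, y)
                return
            finally:
                for lock in reversed(held):
                    lock.release()

    def cell_contents(self, x, y):
        # Lock-free read: returns whichever immutable set is currently published for the cell.
        return self.cells.get(self._key(x, y), frozenset())
```

Two caveats of this sketch: a reader scanning several cells is not atomic with respect to a concurrent move (it can see the object in both cells or in neither), and it leans on CPython making single dict assignments atomic.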

For parallelization of queries and transactions we use fork-join.
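The comment says no more than "fork-join" (on the JVM that presumably means `java.util.concurrent.ForkJoinPool`). As a rough illustration only, here is the shape of a fork-join range query in Python: each call forks half its points to a thread pool, scans the other half itself, then joins. `range_query`, `MAX_DEPTH` and `LEAF_SIZE` are hypothetical; there's no work stealing, and under CPython's GIL this won't actually speed up a pure-Python scan.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_DEPTH = 3     # fork at most this many levels deep
LEAF_SIZE = 1024  # below this many points, just scan sequentially
# At most 2**MAX_DEPTH - 1 tasks are submitted per query, so for one query at a time
# parents blocked waiting on children can never exhaust this pool.
POOL = ThreadPoolExecutor(max_workers=2 ** MAX_DEPTH)


def in_box(p, lo, hi):
    return all(l <= c <= h for l, c, h in zip(lo, p, hi))


def range_query(points, lo, hi, depth=0):
    """Return the points inside the box [lo, hi], in their original order."""
    if depth >= MAX_DEPTH or len(points) <= LEAF_SIZE:
        return [p for p in points if in_box(p, lo, hi)]
    mid = len(points) // 2
    left = POOL.submit(range_query, points[:mid], lo, hi, depth + 1)  # fork
    right = range_query(points[mid:], lo, hi, depth + 1)              # keep working
    return left.result() + right                                      # join
```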


How do you do lock-free updates to the R-tree structure?


Most updates aren't lock-free.


I think what pron is saying is that initial writes are lock-free but updates to existing data will require locks in most cases?


As a longtime professional game developer, I kind of don't get it. Is this company out of business as soon as Oracle implements a loose octree that is ACID? Is there something wrong with spatial hashing?

etc, etc.


It depends on the requirements of your data model. Anybody with a modicum of competence can scale a point cloud, but real-world non-point geometry models are where systems like Oracle have difficulty. Polygons, lines, vectors, etc. are a real problem. So-called "spatial hashing" (it had a different name in the 1970s and again in the early 1990s -- the wheel of computer science) has a number of real limitations, which is why it was never really used (Oracle has patents on it that have already expired!).
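To make one of those limitations concrete (my example, not necessarily the commenter's list): a uniform spatial hash stores a point in one cell, but an extended object has to be registered in every cell it overlaps, so no single cell size suits both cars and storms. `SpatialHash` is a hypothetical name, not any engine's API.

```python
from collections import defaultdict
from itertools import product
from math import floor


class SpatialHash:
    """Uniform spatial hash: cell index tuple -> set of object ids."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.buckets = defaultdict(set)

    def _cells(self, lo, hi):
        # Every cell overlapped by the axis-aligned box [lo, hi], in any dimension.
        s = self.cell_size
        return product(*(range(floor(l / s), floor(h / s) + 1) for l, h in zip(lo, hi)))

    def insert(self, oid, lo, hi):
        cells = list(self._cells(lo, hi))
        for c in cells:
            self.buckets[c].add(oid)
        return len(cells)  # a point touches 1 cell; a big box touches many

    def query(self, lo, hi):
        found = set()
        for c in self._cells(lo, hi):
            found |= self.buckets.get(c, set())
        return found  # candidates only; an exact overlap test is still needed
```

With `cell_size=10`, `insert("car", (5, 5), (5, 5))` touches 1 cell while `insert("storm", (0, 0), (999, 999))` touches 10,000, and `query((0, 0), (1, 1))` returns the car as a false-positive candidate.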

Also, traditional transactional database engines are not designed for the kinds of insert/update rates that are common for many spatial applications. This is a problem for machine-generated data sources generally. It requires an architecture designed specifically for that kind of (ab)use case.


How would you do it then?

I mean, if you wanted to implement a 3d spatial db like that yourself?


This is a big topic, but there are two main components.

First, you need a storage engine architecture that is designed for very fast appends concurrent with queries. This is trickier than it sounds because you can't use secondary indexes and queries still need to be efficient. Some recent database engines focused on non-batch "real-time analytics" are designed for this; it is a different internal model than traditional distributed analytic engines. Database engine boffins know ways of achieving this; they are esoteric but well understood.
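One well-known way to keep appends cheap without a secondary index (not necessarily what the commenter's engine does) is to append into fixed-size chunks and keep a min/max bounding box per chunk, a "zone map", so a query can skip whole chunks. This is a single-threaded sketch with hypothetical names; a real engine would also seal full chunks so queries could read them while only the tail keeps growing.

```python
class AppendOnlyStore:
    """Append-only point store with a per-chunk bounding box ("zone map")."""

    def __init__(self, chunk_size=1024):
        self.chunk_size = chunk_size
        self.chunks = []  # each entry: [records, lo, hi]; only the last one still grows

    def append(self, rec):
        """rec = (record_id, x, y, z); touches only the tail chunk."""
        coords = rec[1:]
        if not self.chunks or len(self.chunks[-1][0]) >= self.chunk_size:
            self.chunks.append([[], list(coords), list(coords)])  # start a new chunk
        records, lo, hi = self.chunks[-1]
        records.append(rec)
        for i, c in enumerate(coords):
            lo[i] = min(lo[i], c)
            hi[i] = max(hi[i], c)

    def query(self, qlo, qhi):
        """Return (matching records, number of chunks actually scanned)."""
        matches, scanned = [], 0
        for records, lo, hi in self.chunks:
            # Skip the whole chunk if its box misses the query box on any axis.
            if any(h < ql or l > qh for l, h, ql, qh in zip(lo, hi, qlo, qhi)):
                continue
            scanned += 1
            matches.extend(
                r for r in records
                if all(ql <= c <= qh for c, ql, qh in zip(r[1:], qlo, qhi))
            )
        return matches, scanned
```

This only pays off when appends arrive with some spatial locality (e.g. a vehicle's consecutive positions); with randomly ordered data every chunk's box covers the whole world and nothing gets skipped.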

Second, you need a distributed interval index i.e. a distributed data structure that can act as an efficient index for 3-dimensional cube types. Scalable distributed interval indexing requires that data models be embedded in a higher dimensionality space, so at least 4-dimensions. The well-known example from literature is multi-level grids but those have many limitations. The state-of-the-art structures are adaptive spatial sieves; advanced versions are computationally efficient even for very high dimensionality cubes. However, these algorithms are encumbered and little has been published on them. (Disclosure: I am the inventor of the first useful spatial sieve algorithms. The idea dates back to at least 1990 but had unsolved theoretical issues until 2007.)
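The spatial sieve algorithms are unpublished, so this doesn't attempt them. For contrast, here is a single-machine sketch of the baseline mentioned above, a multi-level grid: each box is stored once, at the finest level whose cell contains it whole. `MultiLevelGrid` is a hypothetical name, and coordinates are assumed to lie in `[0, base_cell * 2**(levels - 1))` so every box fits at the top level.

```python
from collections import defaultdict
from itertools import product
from math import floor


class MultiLevelGrid:
    """Hierarchical grid of axis-aligned boxes; level L has cells of side base_cell * 2**L."""

    def __init__(self, base_cell=1.0, levels=16):
        self.base_cell = base_cell
        self.levels = levels
        # grids[L] maps a cell index tuple -> list of (oid, lo, hi)
        self.grids = [defaultdict(list) for _ in range(levels)]

    def _cell(self, level, coords):
        size = self.base_cell * 2 ** level
        return tuple(floor(c / size) for c in coords)

    def insert(self, oid, lo, hi):
        """Store the box [lo, hi]; returns the level it landed on."""
        for level in range(self.levels):
            cell = self._cell(level, lo)
            if cell == self._cell(level, hi):  # the whole box fits in one cell
                self.grids[level][cell].append((oid, lo, hi))
                return level
        raise ValueError("box does not fit the coarsest level")

    def query(self, qlo, qhi):
        """Return ids of stored boxes that overlap the query box [qlo, qhi]."""
        hits = []
        for level, grid in enumerate(self.grids):
            clo, chi = self._cell(level, qlo), self._cell(level, qhi)
            for cell in product(*(range(a, b + 1) for a, b in zip(clo, chi))):
                for oid, lo, hi in grid.get(cell, ()):
                    if all(l <= qh and ql <= h for l, h, ql, qh in zip(lo, hi, qlo, qhi)):
                        hits.append(oid)
        return hits
```

One of its limitations shows up immediately: with `base_cell=1.0`, a bird-sized box straddling x = 4096 is a boundary at every finer level, so it gets promoted to level 13 next to a 5000-unit hurricane. That is the kind of problem loose grids and loose octrees try to soften.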

I am building a real-time analytical database similar to this right now, and petabyte-scale 3-dimensional spatial data models are a core part of its functionality. Building fast, distributed 3d spatial databases is achievable; it just requires a different data structures and algorithms skill set than you would use for more traditional database designs.


Is this going to be targeted towards big companies or startups (in pricing)? I'm working on a game at the moment that could definitely benefit from this tech and I'd love to try it out, but I'm bootstrapping so I'm not sure it would be affordable.


Shoot us an e-mail.


So, say hypothetically, if I wanted to try my hand at implementing a toy version of something similar for fun, what would I need to at least have a grasp at?


Hmm, I wonder how many potential customers they have.


Far more than you might imagine, at least if they develop a more sophisticated geometry model. The real limitation may be the pure in-memory model. Many of the high-value applications have enormous working sets, much larger than what will fit in a small RAM-based cluster. Hundreds of terabytes is where the applications just start to become interesting. In this sense, focusing on game worlds is probably a good idea because it is one of the use cases that will fit within their immediate scaling targets.

They are correct that traditional database spatial indexes are slow and scale terribly, being designed for relatively small and static data sets. It does not sound like they are pushing the state-of-the-art, just meeting an under-served need in the gaming space, a market which I can validate as existing but with somewhat limited revenue potential even if you sign the major game companies. It is a good "base hit" startup opportunity if they can execute it well. (I have designed massive-scale real-time spatial database engines for a number of years; we passed up the gaming market because the size of the market was too small relative to other markets for this technology.)


Actually, we're very much pushing state-of-the-art :)

It's just that we're tackling a different problem - low latency applications - and when I say low latency I mean in the microseconds range. Big spatial data is a very interesting problem, and tracking a large number (though less than billions) of moving objects is another - though very different - interesting problem.

For working sets that don't fit in one machine's RAM we offer a cluster.


While I do not currently work on ultra-low latency spatial databases (more like milliseconds), I have in the past and so have some idea of what is out there. :-) I am not all that familiar with the design of your system so I was mostly working off the scale numbers offered.

The best example I can think of is an ultra-low latency in-memory prototype I designed in 2009 on a parallel cluster. The working set was several billion irregular 3-cubes ranging in (metaphorical) size from birds to hurricanes. The average CPU cost of an access operation was sub-microsecond, so the latency was mostly interconnect related (which was a slow but proper low-latency supercomputing fabric). The current work I do uses complex geodetic polygon geometries, so the computational cost of operations is quite a bit more, but the actual computational cost of the access method is below the noise floor of the network fabric.

You are correct though that if you are mostly dealing with tracking points or cubes then in-memory is sufficient to hold many applications. It is the sensing data that really kills you... :-)


Forgive the basic question: Is cost the only limiting factor for pure in-memory with large datasets?


Cost is a big factor but it is not the only consideration. Even if cost were no object and you could just throw a lot of machines at the problem, designing distributed algorithms that will scale to thousands of machines is a different kind of problem than what you can get away with on several dozen machines, especially when talking about spatial access methods (which have their own unique nuances).

This is surmountable but few people know how to design the data structures and algorithms required to make non-trivial spatial data models scale to that level. There are significant gaps in the published literature.


Having talked to Ron's cofounder, it sounds like there are a surprising number of applications for SpaceBase outside of gaming. The TC article seems to focus on gaming but I have a hunch it's mostly that they're going after one vertical at a time.


The company I work for is a potential customer. We store thousands of locations per minute in one of our Postgres databases.


Location-based services could be big. Asking "what's nearby meeting criteria A, B, C" requires a non-traditional backend.


It doesn't necessarily require a non-traditional backend, as you could design your backend using a plain old RDBMS. What would require a non-traditional design, however, is your app.


Not really. Postgres works fine for most of this stuff, and as of 9.1 it does so without PostGIS.

'nearby' means ordering your query using the <-> operator.

'meeting criteria' goes in the where clause.

For example, to find interesting things nearby in Postgres:

  -- $1, $2: your current coordinates, in the same axis order used to store location
  SELECT * FROM interesting_things AS it
  WHERE it.A = 'A' AND it.B = 'B' AND it.C = 'C'
  ORDER BY it.location <-> point($1, $2) LIMIT 10;
http://wiki.postgresql.org/wiki/Whats_new_in_PostgreSQL_9.1#...


Some of the data sources of interest for location-based applications generate tens of millions of updates per second. I love PostgreSQL but it does not operate at that scale. The index locking alone would kill it even if you weren't touching disk.

Extremely high spatial ingest rates concurrent with queries require different architecture and algorithms.


Founder of Parallel Universe here.

Sure, Postgres works great and it's not our competition. The idea is that you'll store all of your static stuff in Postgres, and all of your dynamic objects in SpaceBase. We've been approached by companies that need to track thousands of vehicles, each updating its location every second or so, and Postgres just couldn't handle it. For LBSs, SpaceBase complements Postgres.


E-mail us and we'll gladly send you a trial version.


Yeah, Postgres would fall over pretty fast on that workload without sharding, and it's the kind of ephemeral data that doesn't need ACID semantics.

Do you have a trial? Or another demo? I may be encountering those kinds of problems in the near future and would like to see what's out there. Right now Postgres is perfect for our workload, but we've crafted the problem we're solving to avoid the realtime issue.


Is this limited to 2d and 3d space or can it handle n-dimensions?


2d and 3d only. Data of higher dimensions is usually not as dynamic (the data points are created and deleted but rarely moved), and SpaceBase is optimized for lots of updates. Also, higher dimensions require a different data structure (see: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.6...)


n-dimensional would be useful for realtime machine learning applications.



