Hacker News
GUN 0.7.9 – 15M read/sec, 15K write/sec, 2K sync/sec MIT Licensed Graph Database (github.com/amark)
142 points by marknadal on June 11, 2017 | 47 comments



That collision resolution algorithm looks like a doozie from a "malicious peer" point of view.

Updates in the past are "recorded and discarded", updates in the future queue up. First, let's see if we can run our peers out of memory with a few (billion) quick future updates. Funny thing, gzip; it's so easy to compress highly repetitive patterns. Could also just try and spam the historical log too, could we fill the disk as well as working memory?

If we don't run everything out of memory, let's just write out a few billion updates for every state interval, and make sure it evaluates as "greater than" (yay, JavaScript) any real value. Those updates should preemptively overwrite every other update that comes in.
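To sketch the attack (toy last-write-wins resolver, nothing from GUN's actual codebase; the names and tie-break rule are my own invention):

```javascript
// Toy last-write-wins resolver (illustrative only, NOT GUN's HAM code).
// A peer that fabricates a maximal state value wins every merge.
function resolve(current, incoming) {
  if (incoming.state > current.state) return incoming; // "newer" state wins
  if (incoming.state < current.state) return current;  // older update discarded
  // tie: break lexically on the value, as many LWW registers do
  return String(incoming.value) > String(current.value) ? incoming : current;
}

const honest = { state: 1497139200000, value: 'hello' };    // a real timestamp
const attack = { state: Number.MAX_VALUE, value: 'pwned' }; // "greater than" anything

console.log(resolve(honest, attack).value); // 'pwned'
console.log(resolve(attack, honest).value); // 'pwned' - order doesn't matter
```

Any honest update with a sane state value loses to the fabricated one, regardless of arrival order.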

Do you have an operating window of less than, say, 300ms? Heaven help the poor client from Sydney who keeps trying to update a master in London. Their updates will always be discarded (I'm not certain, but the docs read as if this occurs even when updating a value which hasn't been overwritten by a future state). Darn you, speed of light; why can't you be just a little faster?

I guess you can only hope that your clients all decide to be honest and never change your code. Or get their state counter (or clock) too far out of sync.


This is a really great comment, thank you - hopefully I can address some of the points you brought up:

- LRU/GC hasn't been added (planned for v0.8.x), so you are correct: sending in a bunch of updates will currently crash the peer. The "out of bound" updates (for everybody else not sure what the parent is referencing, see this tech talk where I explain what is going on: http://gun.js.org/distributed/matters.html ) are volatile (this is intentional) because they are considered potentially malicious, so upon a crash they'll be lost; it is the origin peer's responsibility to retry updates (and gun does so automatically) until ACKs have been received. Thankfully, updates that are "within bounds" will be kept safe.
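As a rough sketch of that retry-until-ACK behavior (illustrative only, not gun's internals; all names here are made up):

```javascript
// Hedged sketch of the retry-until-ACK idea: the origin peer keeps every
// update pending until an ACK arrives, and resends anything still
// unacknowledged.
const pending = new Map();

function send(update, transport) {
  pending.set(update.id, update); // remember it until ACKed
  transport(update);
}

function onAck(id) {
  pending.delete(id); // acknowledged, stop retrying
}

function retryUnacked(transport) {
  for (const update of pending.values()) transport(update); // resend survivors
}

// Example: 'a' gets ACKed, so a retry pass resends only 'b'.
const wire = [];
send({ id: 'a', value: 1 }, u => wire.push(u.id));
send({ id: 'b', value: 2 }, u => wire.push(u.id));
onAck('a');
retryUnacked(u => wire.push(u.id));
console.log(wire); // ['a', 'b', 'b']
```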

- However, you are right (in your "If we don't...") that "within bound" updates may also be malicious. GUN's base algorithm is designed to work in an entirely ad-hoc, anonymous mesh network. To deal with this, we just recently announced our Security, Encryption, and Authorization framework to handle trusted peers, here: https://github.com/amark/gun/wiki/auth

- GUN is master-master, so no, even with a sliding window of 300ms, latency shouldn't affect the data. Your Sydney to London example is a good one; I've played a live action game built on top of GUN from Australia <-> USA, and even with P2P logic (no master server) it is responsive. The whole space ship game is only 190 LOC and you can play it here: http://gunjs.herokuapp.com/game/space.html (Warning: it is kind of a lame game, but it is proof that conflicting updates in game state work just fine.)

- Also, fun fact: in a Master-Master system, the clock drift/skew on machines from different continents can become quite bad. Since we don't have atomic clocks like Spanner, we had to write a P2P version of NTP that runs alongside the game. You can test how well it works across your devices here: http://gunjs.herokuapp.com/game/nts.html
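For the curious, the classic NTP-style offset math looks like this (toy sketch; our NTS internals may differ from it):

```javascript
// Classic NTP-style offset/delay calculation. t0/t3 are the client's
// send/receive timestamps; t1/t2 are the server's receive/send
// timestamps, all in ms.
function clockOffset(t0, t1, t2, t3) {
  const roundTrip = (t3 - t0) - (t2 - t1);    // time spent on the network
  const offset = ((t1 - t0) + (t2 - t3)) / 2; // estimated skew of the client clock
  return { roundTrip, offset };
}

// Example: client clock ~510ms behind the server, ~60ms on the wire.
const { offset, roundTrip } = clockOffset(1000, 1540, 1560, 1080);
console.log(offset, roundTrip); // 510 60
```

The client's offset is estimated as half the asymmetry between the two one-way trips.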

- Additionally, you will very rarely have that many writes within a 300ms span on the /same/ record. However, you very easily might have that many for a Twitter-like app. A year ago, we ran a load test on a prototype storage driver for workloads like this (append-only) and scaled to 100M+ messages for $10/day (all costs: CPU, disk, backup); check out the proof here: https://www.youtube.com/watch?v=x_WqBuEA7s8

Thank you very much for writing your comment. Did I miss anything? More than happy to address any other concerns. The more we can challenge the assertions/claims of database vendors (like me), the better off the industry will be. Let me know if I can answer anything else. Thanks again!


It would be good to see your claims backed up with data and benchmarks. I would love to see you run the Jepsen test suite against GUN also, so it's clear how it behaves in weird circumstances.


Agreed! I know Kyle, and we're planning on having him review the system, but we're still polishing some things up on our end and he's backlogged. You'll definitely hear about it when it happens.

As for the other claims, most of these links are already scattered throughout the comments, but here they are again:

- Numbers on the performance benchmark: https://github.com/amark/gun/wiki/100000-ops-sec-in-IE6-on-2... , also see https://youtu.be/BEqH-oZ4UXI , and run it yourself clone repo/test/ptsd/ptsd.html (we'll make this easier in the future)

- Tests, load/scaling, split brain, and other: see https://youtu.be/x_WqBuEA7s8 , https://youtu.be/-i-11T5ZI9o , https://youtu.be/-FN_J3etdvY , with https://github.com/gundb/panic-server you can run (let me know if you need any help getting it set up) https://github.com/amark/gun/blob/master/test/panic/load.js , https://github.com/amark/gun/blob/master/test/panic/holy-gra... .

Anything else I can provide?


Some people asked why this release is so important:

- It fixes several critical bugs that happened during the performance rewrite. Example: if a server crashed and had its data wiped, there wound up being some sync issues. This release fixes those.

- First time for us to hit 2K table inserts/second synced end-to-end across a federated (browser <-> server <-> server <-> browser) network topology. This load test was running on low end hardware, so expect better results on better hardware.

- These tests are now available for anyone to run, using our distributed testing framework called PANIC, which simulates failure cases (inspired by Aphyr's Jepsen.io tests). Code and docs for it at https://github.com/gundb/panic-server .

- - If you want to run (or write your own) please read through this well-commented 300LOC test: https://github.com/amark/gun/blob/master/test/panic/load.js .

- - The test that was added in this release simulates what happens to GUN in a split brain network partition during a server loss. We expect the data to converge once the network heals (it previously was not, but now does). The PANIC test for this is here: https://github.com/amark/gun/blob/master/test/panic/holy-gra... (Warning: not commented, please see the previous test to understand what is going on)

Happy to answer any other questions. For anybody using GUN, this is one of the most important releases and upgrading is strongly recommended.


What are some appropriate real production use-cases for this? Particularly since you can't trust the user to tell you the 'real' state of a resource; the REST model is to send discrete state changes to resources and to have an explicit endpoint to validate that the change is allowed (and logical).


We're currently seeing these use cases:

- Autonomous IoT on mission critical hardware. (We rolled out a pilot with a government for this)

- Distributed machine learning. (https://github.com/cstefanache/cstefanache.github.io/blob/ma...)

- Games (built with React! https://github.com/PsychoLlama/connect-four ) and realtime GPS tracking (https://youtu.be/7ALHtbC9aOM).

- And we're chatting with a bunch of companies doing some really cool stuff: robotic manufacturing, ethereum/blockchain 'lightning network' state channels, realtime decision engine / predictive fraud detection, federated home servers / DNS on top of gun, multiplayer VR, and more.

You should NOT use gun for any globally/strongly consistent data, like bank account balances and such.

Ever since we introduced our Security, Encryption, Authorization framework, people are starting to build more user facing apps, like P2P crypto social networks, and collaborative tools.

Which leads to... if you cryptographically sign the data, you can trust the user's copy of the data! It is very untraditional, but that is what we are trying to push/encourage. We even produced an entire animated explainer series, 1 minute each, to explain the concepts: http://gun.js.org/explainers/data/security.html !

REST is a very good model, I actually designed gun around the best parts of it - GET and PATCH (although for simplicity it is just called PUT). Graphs are a stateless representation of the data, which when statefully connected (via WebSockets or something), allow us to efficiently transfer only the delta/diffs and maintain sync. While you can run a traditional server with gun as an endpoint for validation, it is actually possible to do validation client side (as long as all clients are running the same rules) as well.
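To make the delta/diff idea concrete, here's a toy sketch (plain objects, not gun's actual wire format):

```javascript
// Toy delta computation: only the fields that changed between the old
// and new version of a node need to travel over the socket.
function diff(oldNode, newNode) {
  const delta = {};
  for (const key of Object.keys(newNode)) {
    if (newNode[key] !== oldNode[key]) delta[key] = newNode[key]; // changed field
  }
  return delta;
}

const before = { name: 'Mark', city: 'SF', karma: 10 };
const after  = { name: 'Mark', city: 'SF', karma: 11 };
console.log(diff(before, after)); // { karma: 11 }
```

Only `karma` changed, so only `karma` crosses the wire.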


> - Autonomous IoT on mission critical hardware. (We rolled out a pilot with a government for this)

> You should NOT use gun for any globally/strongly consistent data, like bank account balances and such.


Autonomous IoT data is exactly the opposite of needing global consistency. I can't say what it is, but this system isn't even hooked up to the internet. It needs immediate, local, and autonomous decisions made, which is part of why they chose gun.

@jimktrains2 it is nearly impossible to lose data with gun because of its master-master replication even into the edge/peers. It is about as fault tolerant as you can get (check out the links to the tests I mention in the other comments).

I understand now how it might have sounded confusing, sorry about that! Mission critical systems often require HA, not master-slave. Master-slave is very important in other use cases.


> - Autonomous IoT on mission critical hardware. (We rolled out a pilot with a government for this)

What mission critical system are you working with where it's OK to lose data?


For god's sake, man, run a linter on this thing. The mixing of spaces and tabs alone makes the code embarrassingly difficult to read.

http://imgur.com/a/XlSQD


Fix your editor and set the tab width to the correct value? I'm guessing 2 spaces is the right value in this case.


Or maybe it's the developer who needs to fix his editor. A proper editor should convert tabs into spaces or vice versa, and not create complications for everyone else.


Last I checked I don't have access to "fix" how GitHub displays source code, although it looks like the maintainers can add an .editorconfig to their project to address this particular case.
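For reference, a minimal .editorconfig along those lines might look like this (the indent values are a guess at what the project intends):

```ini
# Hypothetical .editorconfig for the repo root
root = true

[*.js]
indent_style = space
indent_size = 2
```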


   We're getting even better numbers on other devices:

   Android phone, ~5M ops/sec.
   Macbook Air, Chrome, ~30M ops/sec.
   Macbook Pro, Chrome Canary, ~80M ops/sec.
   Lenovo netbook, IE6, ~100K ops/sec

   [...]

   These numbers represent a breakthrough in performance not possible with other databases. 

Care to share some details on this benchmark? What kind of operations were these? Your results don't really sound likely for any kind of non-trivial operation - is it possible that you were testing a routine that was optimized out (to a static return/noop) by the JIT?

from https://github.com/amark/gun/wiki/100000-ops-sec-in-IE6-on-2...


Yeah, 5 million "ops"/second on an Android phone? I want to know what the "ops" were. Is it just reading a value from a static memory address? A network request? What's going on here?


JS performance is terrible; it took us a year to rewrite everything from scratch to get these types of numbers.

As mentioned near the bottom of that link, they are on cached reads which is unrealistic in real world settings but a useful baseline number to compare against.

To understand our methodology on the benchmark approach, check out this tech talk on all the problems we encountered while testing (including things like you mentioned, dealing with how the JIT optimizes, or often times in Chrome, doesn't): https://youtu.be/BEqH-oZ4UXI

However, a perk of the push-based model (realtime data sync like with Firebase), is that cached reads become a lot more realistic because often the data will already be there before you read it (unlike with a pull/poll based model).

Any details I can expand on for you?


> Android phone, ~5M ops/sec.

what are the "ops" in this case?


Sorry, I was trying to reply to both of you but I was not clear - I apologize. The ops are cached reads also, it was the same test across different devices.

For people wanting to run them yourself, clone the repo and go to test/ptsd/ptsd.html - we need to make it easier in the future though.


Look, not to sound harsh, but since you're claiming the "80M ops/sec" as a technical achievement "not possible with other databases", I found it fair to actually review the benchmark you linked. What you're benchmarking for the "read" is this:

    benchmark(function(){
      gun.val(ok);
    });
Looking at what the gun.val call does, it seems to be more or less an identity/noop function (one that simply returns its input).

It appears your benchmark is basically testing how fast you can do a simple javascript function call, which has nothing to do with any database operation whatsoever. How is that useful?

The benchmark doesn't reflect the performance of your gun database any more than it reflects the performance of _any javascript application_. The only thing being benchmarked here is the javascript interpreter. Seriously, this is like somebody claiming their database runs at 4 billion ops/sec because that is how many instructions the CPU can perform.
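For contrast, even a microbenchmark of a trivial function should at least consume its results so the JIT can't elide the work (a toy harness I wrote to illustrate, not the one from the repo):

```javascript
// Toy microbenchmark harness that defeats dead-code elimination by
// accumulating every result into a sink the JIT must keep alive.
function benchmark(fn, iterations) {
  const start = Date.now();
  let sink = 0; // consume every result so the work can't be optimized out
  for (let i = 0; i < iterations; i++) sink += fn(i) | 0;
  const ms = Math.max(1, Date.now() - start); // avoid divide-by-zero on fast runs
  return { opsPerSec: Math.round(iterations / (ms / 1000)), sink };
}

const { opsPerSec, sink } = benchmark(i => i * 2, 1e6);
console.log(opsPerSec > 0, typeof sink === 'number'); // true true
```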

Given how much effort you (clearly) put into your marketing and the way you worded this as a "cached read" I think you probably know all of this though -- so here's a plea: Please don't pull these kinds of completely dishonest stunts (you even put the BS numbers in the title of this submission). You just make others in the javascript [database] space look bad by association. Cheating on benchmarks will not convince anybody but the most junior web developers.

source (hopefully I'm wrong ;) ): https://github.com/amark/gun/blob/master/test/ptsd/perf.js#L...


The nitty gritty details are on this readthesource.io podcast: https://youtu.be/70dn1oZQFCk (watch on 2X speed).

Each process does concurrency control and then has a centralized in-memory cache for the values (like what a lot of other in-memory databases do). So when `gun.val(cb)` is called, prototype context holds its value and is able to do an immediate read - without this, JS is so slow that every function call logarithmically decreases your performance (see the previously linked https://youtu.be/BEqH-oZ4UXI ).

That is the advantage of the realtime/push-based model: you can cache most data before it is even read. Even in-memory databases, like Redis and others, do this, so there is no BS here, but I appreciate you trying to call it out. Additionally, not all in-memory reads are equal - my previous implementation of gun could only do a thousand or so reads/sec despite being cached. And please compare against other in-memory javascript database benchmarks: https://github.com/techfort/LokiJS/wiki/Indexing-and-Query-P... (Joe's work is very good!)

I sincerely wish it was as easy as leaving it up to the JS interpreter ;) but I've unfortunately found JS to be very slow. :P

Hit me up with any other hard questions or if I missed anything. And please keep on trying hard to call out database vendors - I agree, it is extremely important to keep us open and honest. :) Cheers!


I don't really get what you're trying to say.

Are you defending the fact that you benchmarked what's essentially a "function() { return 1; }", called that a "database operation", and then proceeded to claim to have achieved a level of performance "not possible with other databases"? Can you really not see how that is wrong on more than just one level?

Or are you saying it's okay because other "javascript database" vendors are also cheating in their benchmarks?


Woah woah, that is not true! `gun.val(cb)` performs a real dynamic read, you can even uncomment the `console.log(val)` to see it printed in the debugging console (although note: `console.log` has terrible performance, so don't have it on when you are actually benchmarking).

Other javascript databases and even other regular in-memory databases do similar benchmarks. We (both them and us) note at the bottom of the article that: "Take all performance testing benchmarks with a huge grain of salt." Which is why we have our PANIC tests (see my other comments on where to find them and how to run them, let me know if you need any help).

There is nothing unethical about our tests (and they are not the only tests), so /please/ call me out, but please don't misinform other readers that the op is a static function, that is misleading and damaging. Is there anything specific I can do to alleviate any concerns?


The title and the first line of the README

> GUN is a realtime, distributed, offline-first, graph database engine. Doing 15M+ ops/sec in just ~12KB gzipped.

But no mention of any guarantees provided by the system. What kind of transactions are supported? are writes consensus-based, best effort, eventual consistent?

The "GUN Survives a Primary Fault" raises more questions than answers. The "primary" appears to be the only server, and the clients are simply caching un-confirmed updates. Unless every client is a full copy of the database (which would seem prohibitive for the types of things this is aimed at?), there wasn't really any fault tolerance shown here.

Also, all clients resolved to the most recent update; none failed because the row had been updated since the last read. While that's OK for some designs, it just smells fishy. I shouldn't be able to trash someone else's update.

I feel like this is yet another datastore trying to call itself a database, while offering no guarantees of any kind that your data is safe.


The README mentions it in the #documentation section and links to these resources: (let me know how they can be improved)

- CAP Theorem tradeoffs: https://github.com/amark/gun/wiki/CAP-Theorem (We are AP, which is eventually consistent)

- My tech talk of how the CRDT works: http://gun.js.org/distributed/matters.html (compared to consensus protocols)

Clients keep a copy of the data they are interested in (not the whole data set, correct), and together they can reconstruct the entire data set. You can also run multiple server peers that back up the whole data set, or use a shard key to determine which peer stores which subset. If this isn't fault tolerant, would you mind giving me some examples of what is?
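As a toy sketch of the shard-key idea (illustrative only, not gun's actual code; the hash is a stand-in):

```javascript
// Toy shard-key routing: hash the key deterministically to pick which
// server peer backs up that subset of the graph.
function shardFor(key, peers) {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0; // simple string hash
  return peers[h % peers.length];
}

const peers = ['peer-a', 'peer-b', 'peer-c'];
console.log(shardFor('users/alice', peers)); // always the same peer for this key
```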

Yes, good point, to prevent data from getting trashed you need to run auth on the system, more info here: https://github.com/amark/gun/wiki/auth

We certainly aren't a storage engine, but we are a database/datastore (whatever you want to call it), or more appropriately: an open data sync protocol. I have tried to make our guarantees/tradeoffs very well laid out in the above mentioned articles/README, anything I can do to make this more clear/obvious?


> - My tech talk of how the CRDT works: http://gun.js.org/distributed/matters.html (compared to consensus protocols)

Maybe I'm just being silly, but that talk describes the problem. It doesn't really go into detail about the CRDT(s) you're using.

> Yes, good point, to prevent data from getting trashed you need to run auth on the system, more info here: https://github.com/amark/gun/wiki/auth

I wasn't talking about auth.

If conflict resolution is timestamp-based, it feels like you could very easily end up with inconsistent data. I update field Y based upon the value of field X I see. So does someone else, but based upon a different value of field X. If I can't wrap this in a transaction, then I could update fields based on bad/out-of-date data.
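To spell out the hazard (toy JS, nothing GUN-specific):

```javascript
// Lost-update illustration: two clients each read x, derive y from the
// value they saw, and write back; with last-write-wins and no
// transaction, one client's intent silently vanishes.
const store = { x: 10, y: 0 };

const seenByA = store.x;           // client A reads x = 10
store.x = 99;                      // x changes before A writes
const seenByB = store.x;           // client B reads x = 99

const writeB = { y: seenByB * 2 }; // B derives y = 198 from fresh x
const writeA = { y: seenByA * 2 }; // A derives y = 20 from stale x

Object.assign(store, writeB);
Object.assign(store, writeA);      // A's write lands last and wins

console.log(store.y); // 20 - derived from x = 10, even though x is now 99
```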

> Clients keep a copy of data that they are interested in (not the whole data set, correct), and together they can reconstruct the entire data set. You can also run multiple server peers that backup the whole data set, or use a shard key to determine which subsets. If this isn't fault tolerant, would you mind giving me some examples of what is?

I mean, I guess if you accept being able to lose data as fault tolerant, then OK. Having multiple, dedicated, full-database backups is what every other distributed database does as well.

I mean, perhaps I'm being too harsh; there are many use cases, particularly with data of little value, that systems like this make easy to build. I just feel like so many people are working on datastores with loose guarantees about anything, hailing them as awesome, new things.

Heck, even Wave had a more refined system where you wouldn't just randomly lose data.

You're not Google. You're not Amazon. You don't need to embrace all of this looseness and lack of guarantees.


Thanks for the reply! I really appreciate the dialogue.

Yes, the first half of the talk reviews the problem, but the second half (starting with the boundary functions) explains the CRDT. If that wasn't helpful, then there is also this article on the implementation (not written by me): https://github.com/amark/gun/wiki/Conflict-Resolution-with-G....

Timestamps are bad, yes (I mention this in the talk as well); GUN uses a hybrid vector/timestamp/lexical CRDT. Let's take your analogy of "updating Y when someone else sees X"; a perfect example of this is realtime document collaboration (gDocs, etc.). Even with GUN, you'd not want to have the collaborative paragraph as a value on a property in the node. Each value is treated as atomic, so if two people write at the same time it would cause exactly what you are describing: they overwrite each other. Instead, we can preserve the intent by running a distributed linked list (a DAG, actually) on gun, which works quite nicely. See:

- Early working prototype/demo here: https://youtu.be/rci89p0o2wQ

- Based off the interactive data algorithms explainer article here: http://gun.js.org/explainers/school/class.html

If instead you don't want the results to merge, but indeed want them to be atomic (only one person or the other "wins"), then even with transactions one is going to overwrite the other. Transactions don't help unless you have a journal to roll back from - and guess what, that is possible with gun too. Or alternatively, if you just want users to be informed of the conflict so that they can decide, it is trivial to store both updates and then present the conflicting values to choose from, saving whichever is picked.
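A sketch of that "store both, let the user pick" strategy (toy code I'm writing here, not a gun API):

```javascript
// On a same-state conflict, keep both candidate values instead of
// silently choosing a winner; otherwise fall back to a normal LWW merge.
function mergeKeepBoth(current, incoming) {
  if (current.state === incoming.state && current.value !== incoming.value) {
    return { state: current.state, conflict: [current.value, incoming.value] };
  }
  return incoming.state > current.state ? incoming : current; // normal LWW path
}

const merged = mergeKeepBoth(
  { state: 5, value: 'alice-edit' },
  { state: 5, value: 'bob-edit' }
);
console.log(merged.conflict); // ['alice-edit', 'bob-edit'] - show a picker, save the choice
```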

Does that make sense? Do you see/have any problems with those approaches? Thanks for your input so far.

Regarding clients. Yes, absolutely, you should still run multiple dedicated full-database backups. I don't disagree with you on that point, in fact we make it easy and scalable (see our demo video of a prototype storage engine that did 100M+ records for $10/day all costs - servers, disk, and backup: https://youtu.be/x_WqBuEA7s8). The unique thing about GUN is that it is still capable of surviving complete data center outages, because the data is also backed up on the edge peers.

You can also reduce your bandwidth costs by having edge peers distribute data to their nearby peers, versus always having to pull from your data center. Aka, the Bittorrent model.

Let's be clear here: there is a big difference between Master-Slave systems and data guarantees. Databases like Cassandra have better data availability guarantees because they are HA (and gun is the same), despite not being Master-Slave.

But at the end of the day, you are right: Your not-losing-your-data is only as good as how many full replication backups you have. What I hope to have communicated to you is that gun makes it ridiculously easy to make full (and partial) replications beyond traditional databases/datastores/whatever-you-call-thems, and I hope you do think that is awesome.

Just don't use us to balance bank account data, cause we don't provide those types of guarantees, but we do provide the HA / AP / fault-tolerance ones. :)


> Timestamps are bad, yes (I mention this in the talk as well), GUN uses a hybrid vector/timestamp/lexical CRDT. Lets take your analogy "updating Y when someone else sees X", a perfect example of this is realtime document collaboration (gDocs, etc.). Even with GUN, you'd not want to have the collaborative paragraph as a value on a property in the node. Each value is treated as atomic, which if two people write at the same time would cause what you are saying: them to overwrite each other. Instead, we can preserve the intent by running distributed linked list (a DAG, actually) on gun, this works quite nicely. See:

This is exactly my point. And GDocs is a perfect example of what _I_ mean. The Wave protocol (and operational transforms in general; I guess they use a descendant of it now) was made for exactly these types of use cases, so that you don't lose data unexpectedly when editing values.

> Or alternatively, if you just want users to be informed of the conflict such that they can decide, it is trivial to store both updates and then present them with the conflicting values to choose from, which it then saves.

> Does that make sense? Do you see/have any problems with those approaches? Thanks for your input so far.

How is it trivial? Do you have array-value types? What's trivial about it?

And yes, I do have problems when the default conflict resolution method is to simply select a value and toss the other out. It's a bad default. You have a default that causes data loss, it _will_ come back to bite you.

This is perhaps the most egregious thing and honestly makes me have 0 trust in your system.

> Lets be clear here, there is a big difference between Master-Slave systems and data guarantees. Databases like Cassandra have better data availability guarantees because they are HA (and gun is the same), despite not being Master-Slave.

I never mentioned master-slave. You can set up most SQL databases to be async- or sync- master-master.

> You can also reduce your bandwidth costs by having edge peers distribute data to their nearby peers, versus always having to pull from your data center. Aka, the Bittorrent model.

Maybe I'm not sure what you mean by edge here. Do you mean it in the CDN sense, or in the client sense? If you're expecting your clients to be part of your DR scheme, I'm sorry, but that's moronic. You can't guarantee client availability. You can't guarantee that all available clients will even have a copy of all data. Sure, it's great that you're using the same protocol between servers, but you can't say that because of that clients can be part of DR.


Do you plan to add presence to GUN? One of the key features that we require from Firebase is the ability to tell the server to delete a key in the event the client disconnects.

https://firebase.google.com/docs/reference/android/com/googl...


Not yet! Good suggestion though. Deletes right now are currently very difficult in gun. But we plan on fixing that in v0.8.x releases. Your idea of being able to do it on disconnects is a great one, I'll add it to the to-do list!


I checked GUN out a while back and it looked cool, but I wanted something to use with react-native.

Now there seems to be a package to do this (https://github.com/staltz/gun-asyncstorage), but I am still unsure about how production ready this is. Any thoughts in general or experiences with Gun and react-native?


Yes, Andre Staltz (of CycleJS) finally got it working on Android and iOS with React Native!

We're planning on having an example/starter app for it soon (unfortunately, Google Chrome and iOS follow the WebSocket spec just slightly differently, so it errors on iOS currently but it is an easy fix, we just need to figure out a way for them to both work simultaneously).


That sounds great :). Any place I should keep an eye on for updates?


The Gitter channel (https://gitter.im/amark/gun) is the best place, but it is also pretty noisy with chatter. Otherwise you can probably just keep an eye on Andre's Twitter; he's posted his other stuff there. Anywho, I'd love to learn what you are working on! Shoot me an email or something. :)


Come on man, do we have to do this every time?

Next time post on a weekday so you'll get maximum criticism.

They should've implemented scylladb on top of gundb, not the amateur-designed-by-kernel-hackers seastar framework.

Do re-evaluate your time, seriously & sincerely.


> ... and then when the network comes back online GUN will automatically synchronize all the changes and handle any conflicts for you.

Any conflicts? How?


Very important question! The best explanations that cover how it works, as well as its tradeoffs/weaknesses are here:

(Note: You should not use GUN for any data you need strong/global consistency with, like bank account balances and such. There are much better solutions than GUN.)

- http://gun.js.org/distributed/matters.html

- https://github.com/amark/gun/wiki/CAP-Theorem

- https://github.com/amark/gun/wiki/Conflict-Resolution-with-G...


This looks really cool. I'm no database expert, but a decentralised graph database is an intriguing idea. Has it been tried before? (decentralised databases I mean, not graph databases)



I'm building one, slightly different use case though.


Nice! Have any links to share?


not yet but perhaps in the future. We're hiring fwiw.


I hope it was a sarcastic comment.


The docs seem to indicate you do not store properties on relationships? Is that right? Then why call it a graph database when it is really an RDF store?


You can do this, but it is not a built-in primitive (like it is in Neo4j). We left it out because the majority of edges are usually directed, and we wanted to keep the base system as small as possible (all of gun is only about 12KB). It is easy to create "edge" nodes that store relationship properties, but nobody has created a full-featured extension/framework for this yet; I expect one will be built, because this is a very useful feature to have.
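A sketch of the edge-node pattern with plain objects (illustrative only; not the gun API, and the node ids are made up):

```javascript
// Instead of linking A directly to B, link both to an intermediate
// "edge" node; since the edge is itself a node, it can hold properties.
const nodes = {
  'user/alice': { name: 'Alice' },
  'user/bob':   { name: 'Bob' },
  'edge/1':     { type: 'follows', since: 2017, from: 'user/alice', to: 'user/bob' },
};

const edge = nodes['edge/1'];
console.log(`${nodes[edge.from].name} ${edge.type} ${nodes[edge.to].name} since ${edge.since}`);
// "Alice follows Bob since 2017"
```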


This poster has been working on this and spamming it all over HN for years; across all those posts it's been shown to have major flaws and to be full of misleading claims.

Millions of reads per second on mobile devices for a distributed database? Just come on, why even post this?


Throughout the rest of the comments I've tried to provide a bunch of evidence/links for the performance claims (and tests that you can run yourself). Let me know if I missed anything and I'd love to chat about it!

But proven flawed? Could you send me some links/evidence on that? I'd love to do my best to reply/address any concerns.



