"However, it’s still not an endorsement of MySQL as much as it is a recognition of Facebook’s database chops."
That's certainly one way to interpret Facebook's dilemma (I know a few DBAs, and this statement certainly isn't false). Another way is that nothing else scales as well as MySQL, or nothing else offers the same tools to manage them at scale.
Relational data stores aren't going anywhere; there are still quite a few use cases they fulfill that NoSQL, NewSQL, and Graph DBs do not. This just triggers the "use the right tool for the job" reflex in me.
Can someone explain this to me? In terms of ACID principles, wouldn't you want your transactions to be durable?
>Durability means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently (even if the database crashes immediately thereafter). To defend against power loss, transactions (or their effects) must be recorded in a non-volatile memory.
I'm a cofounder at MemSQL. Durability for in-memory databases is more nuanced than the picture people carry over from disk-based systems.
First, it's worth mentioning that you use in-memory databases when you have performance requirements that far exceed the physical capabilities of hard disks or even SSDs. The workload will have a high degree of contention, high concurrency, and so on.
Second, you're probably working with non-human-generated data. If you're not a bank that needs to guarantee a debit and credit went through (and that's slow human-generated data anyway), then you're looking at in-memory technology because you can guarantee every read will hit memory.
For writes, you basically have to use more machines to guarantee durability.
Any in-memory database worth its salt will write a transaction log to disk, so in the event of a power failure it can replay that log from disk to rebuild its state in memory.
You can push the probability of zero data loss even higher, even under high-contention, highly concurrent workloads, by running the dataset across a set of machines: that both multiplies the number of disks writing sequential logs and keeps an extra copy in memory for high availability.
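For anyone trying to square this with the durability question above, here is roughly what "write a transaction log to disk" means in practice. This is a toy sketch of the general write-ahead-log idea (my own illustration, not MemSQL's or any particular product's implementation): a write is appended and fsynced to an append-only log before it is applied in memory, reads only ever touch memory, and recovery after a power failure is just replaying the log.

    import json, os

    class TinyKV:
        """Toy in-memory store with a write-ahead log; illustration only."""

        def __init__(self, log_path="wal.log"):
            self.log_path = log_path
            self.data = {}
            self._recover()                    # replay the log after a crash/power loss
            self.log = open(log_path, "a")

        def _recover(self):
            if not os.path.exists(self.log_path):
                return
            with open(self.log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

        def put(self, key, value):
            self.log.write(json.dumps([key, value]) + "\n")
            self.log.flush()
            os.fsync(self.log.fileno())        # durable only once this returns
            self.data[key] = value             # then apply to the in-memory copy

        def get(self, key):
            return self.data[key]              # reads never touch disk

Shipping that same log stream to a second machine is what buys you the extra in-memory copy for high availability that the parent comment describes.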
Transaction is being used in different senses there. The first sense refers to an OLTP workload. The second sense refers to ACID transactions. The in-memory architecture would relate to details of how the database manages memory internally. There's no question that updates need to be written out to disk as soon as possible for them to be durable.
I think VoltDB could be huge, but the utter reliance on stored procedures is misguided in my opinion. The reasoning behind it is really to optimize database access at the database level, including targeting features like JIT compilation and leveraging zero-copy, but it's so hard to get this all to work right, unit test, debug, etc... On the other hand, his "NewSQL" approach can probably become a good replacement for many use cases where MongoDB would otherwise be used.
VoltDB currently requires stored procedures for multi-statement transactions, but you can send single-statement ad-hoc queries to it just fine (JDBC/Native Clients/HTTP-JSON). Our SQL has gotten significantly more sophisticated lately and many kinds of interesting queries and updates can be performed with one statement.
Stored procs are used to allow rich transactions without using client-side transaction control. For all of the concerns about stored procs (some of which are valid), the fundamental issue is that external transaction control is performance's nemesis. It's not a coincidence that few No/NewSQL systems support external transaction control, and those that do don't publish many benchmarks using that feature.
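To make the round-trip point concrete, here is a rough sketch of the difference, using Python's built-in sqlite3 as a stand-in (this is not VoltDB's API; the accounts table and the transfer procedure are made up). With external transaction control the client holds the transaction open across several statements, i.e. several network round trips on a real client/server system; with the stored-proc style the whole unit of work is shipped to the server as one call, so locks are held only for as long as the work itself takes.

    import sqlite3

    conn = sqlite3.connect(":memory:", isolation_level=None)   # autocommit; BEGIN/COMMIT issued by hand
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])

    # External (client-side) transaction control: one round trip per statement,
    # with locks held from BEGIN all the way to COMMIT.
    conn.execute("BEGIN")
    conn.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")
    conn.execute("COMMIT")

    # Stored-proc style: the whole transaction runs server-side in one call, so
    # lock hold time is just the execution time, not N network round trips.
    def transfer(db, src, dst, amount):     # stand-in for a server-side procedure
        db.execute("BEGIN")
        db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        db.execute("COMMIT")

    transfer(conn, 1, 2, 10)
    print(conn.execute("SELECT * FROM accounts").fetchall())    # [(1, 80), (2, 20)]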
I suppose it depends heavily on the application, but I don't think external transaction control is necessarily tied to performance. That said, it's usually up to the user to come up with some sort of managed transaction on the application/middleware side, which usually needs to be coupled with some sort of database locking mechanism. In reality, that's really hard to get right, even if you "get" multithreading.
Again, I think VoltDB is going places. I haven't been able to actually dig into it personally due to time constraints, and when I realized Command Logging was only available in the Enterprise Edition, that effectively killed a few use cases for me.
One of the uses I hope to investigate is to use it as a nice materialized-view playground for bridging data on-demand from various databases, in coordination with Presto:
It depends what you mean by scale. A serious corporate database might have a hundred applications connecting to it, written in a dozen different languages, and the guys who wrote the oldest one retired before the current guys were out of college. How many times do you want to re-implement the same validation logic? How many apps do you want to have to change and test when something in the business changes? How many great engineers can you hire when you mention that legacy maintenance is a big part of the job?
For the 99.999% of people who will never need to worry about Google or Facebook's "scale", stored procedures are absolutely the only sane way to build large systems.
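To illustrate the "validation logic in one place" argument, here is a hypothetical sketch, assuming PostgreSQL with PL/pgSQL; the orders table, the place_order function, and the connection string are all made up for the example. The rule lives once, in the database, and every application, old or new, whatever its language, goes through it.

    import psycopg2   # assumes a reachable PostgreSQL server and an existing orders table

    DDL = """
    CREATE OR REPLACE FUNCTION place_order(cust int, qty int) RETURNS void AS $$
    BEGIN
        IF qty <= 0 THEN
            RAISE EXCEPTION 'quantity must be positive';
        END IF;
        INSERT INTO orders (customer_id, quantity) VALUES (cust, qty);
    END;
    $$ LANGUAGE plpgsql;
    """

    conn = psycopg2.connect("dbname=shop")            # placeholder DSN
    with conn, conn.cursor() as cur:
        cur.execute(DDL)
        # The decade-old legacy app, the Java middleware, and the new Python
        # service all call the same one-liner, so the rule can never drift:
        cur.execute("SELECT place_order(%s, %s)", (42, 3))

When the business rule changes, you change it once in the procedure instead of in a hundred client codebases.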
> pushing more and more work into the single hardest piece of software to scale - your database.
I never understood this argument. Maybe you can elaborate a bit? Almost all our apps use stored procs for the data-heavy lifting (most of the so-called "business logic"), and we've never had scaling problems. Granted, we don't have that many users or that much data either, but how many users, or how much data, would it take before your stored procs couldn't keep up? What numbers are we talking about?
What makes the DB slow is throwing bad SQL at it, not stored procs.
Just for reference: a simple query by primary-key index on my very, very low-end PC¹, on a table with 1 million rows, takes about 0.04 ms (so says pgsql analyze). That's 25,000 queries per second. What kind of app are we talking about where the DB needs to scale beyond that? Something like Reddit? Stack Overflow? Just curious. I've never seen such a beast where a DB server needs that much power. Where I work we have 200 apps on the same DB server, and it's idling most of the time.
¹i5 CPU 2.40GHz with 2GB of RAM
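The arithmetic checks out: 0.04 ms per lookup is 1 / 0.00004 s = 25,000 lookups per second from a single connection. If anyone wants to reproduce that kind of back-of-the-envelope number, here is a rough sketch using Python's built-in sqlite3 (a different engine than the parent's Postgres setup, so the exact figure will differ):

    import sqlite3, time

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", ((i, "x") for i in range(1_000_000)))
    conn.commit()

    N = 100_000
    start = time.perf_counter()
    for i in range(N):
        conn.execute("SELECT val FROM t WHERE id = ?", (i,)).fetchone()
    elapsed = time.perf_counter() - start

    print(f"{elapsed / N * 1000:.4f} ms per PK lookup, ~{N / elapsed:,.0f} lookups/sec")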
> run Python scripts as stored procs. Great way to slow down your DB.
There is more than one way to shoot yourself in the foot :)
One size could fit everything. With a good, flexible query language and a sufficiently smart compiler, it might be possible. Which means: one size, one winner, and SQL isn't flexible enough. SQL will fall down to earth.
At some point, I think traditional database platforms like Oracle and SQL Server will build things that make it easier to treat them like a NoSQL store (simpler replication, ability to distribute read only queries across multiple replicas, a few enhancements to sparse columns) and the NoSQL stores will add ACID, a query language, ODBC support etc. At which point talking about the "relational market" or the "NoSQL market" will not make a lot of sense. They'll all just be competing storage engines.