DB-Engines Ranking ranks database management systems according to popularity (db-engines.com)
46 points by X4 on Nov 23, 2013 | 28 comments



What's not apparent from the title is that this is a popularity ranking, similar to the TIOBE Programming Community Index:

http://www.tiobe.com/index.php/content/paperinfo/tpci/index....


Interesting but I wonder how accurate it really is. Two (at a minimum) of the metrics in their average can't really be used to measure popularity (of use):

- Number of mentions of the system on websites

- General interest in the system

There are two different problems with these metrics:

- Both will include people who have negative opinions of the system. There's no effective way to separate out the people who are discussing their distaste for a system (remember comcastsucks.com?).

- Both are subject to misrepresentation caused by external factors such as the quality of the system's documentation, use of forums for support and how consistent the system is in general. Some systems require much more research to be run in a stable fashion.

I've used Oracle, PostgreSQL and CouchDB extensively, and MySQL plus a couple of others to a (much) lesser extent. I get the sense that the rankings are in the right order (I have no idea about MSSQL), but I doubt that the scale is really that logarithmic.
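
To make the concern about averaged metrics concrete, here is a minimal sketch of a composite score. It assumes the score is a plain average of each system's share of every metric; that is an illustrative assumption, not DB-Engines' published formula, and the systems "A" and "B" and their counts are invented.

```python
from statistics import mean


def composite_scores(metrics):
    """Average each system's share of every metric.

    `metrics` maps a metric name to {system name: raw count}.
    Assumption for illustration only: this is NOT DB-Engines' actual method.
    """
    systems = sorted({s for counts in metrics.values() for s in counts})
    scores = {}
    for system in systems:
        shares = []
        for counts in metrics.values():
            total = sum(counts.values())
            # A mention counts the same whether it is praise or a complaint.
            shares.append(counts.get(system, 0) / total if total else 0.0)
        scores[system] = mean(shares)
    return scores


# Hypothetical raw counts, invented purely for illustration.
example = {
    "web_mentions": {"A": 900, "B": 100},
    "search_interest": {"A": 600, "B": 400},
}
scores = composite_scores(example)
print(scores)
```

Under this assumption a single lopsided metric, such as web mentions, can pull the averaged score a long way, regardless of whether those mentions are positive.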


Mentions of the database, no matter the popularity, are signals of its use and the activity of its community. I guess it depends on whether you define popularity as being liked, or as being deployed and used. They should make the raw data available if they want this to be believed. I'd like to know which metrics contribute most to a specific DB's rank.

I have a problem, however, with the use of Google as a source. Not sure how you can trust it to give you an unbiased result now that it produces personalized results using who knows what (e.g. incognito mode in Chrome still exposes IP addresses).


Yeah, the order looks about right but the jumps in numbers look far too large. DB2 should also probably be a bit lower, after SQLite and MongoDB.


There's little use for me in popularity rankings. Much more useful are TPC-C rankings (http://www.tpc.org/tpcc/results/tpcc_results.asp?orderby=dbm...)


wtf, are you kidding?! There are ONLY commercial solutions from a handful of commercial vendors. Yeah, truly trustworthy resource... Really, have you even looked at the results? I mean, dude: IBM, MS, Oracle, NEC, and then it starts to repeat...

So I understand that you request a cost per transaction column and it makes sense, but that link sucks.
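
For what it's worth, TPC-C results are reported with a throughput figure (tpmC) and a price/performance figure (price per tpmC), which is close to the cost-per-transaction column being asked for. A rough sketch of that arithmetic, with made-up systems and numbers (nothing here is taken from the linked results):

```python
def price_per_tpmc(total_system_cost, tpmc):
    """Price/performance: total system cost divided by throughput (tpmC)."""
    if tpmc <= 0:
        raise ValueError("tpmC must be positive")
    return total_system_cost / tpmc


# Hypothetical entries, invented for illustration; not real TPC-C results.
results = [
    {"system": "hypothetical-1", "cost": 1_000_000, "tpmc": 500_000},
    {"system": "hypothetical-2", "cost": 200_000, "tpmc": 50_000},
]

# Rank by cost per unit of throughput, cheapest first.
ranked = sorted(results, key=lambda r: price_per_tpmc(r["cost"], r["tpmc"]))
for r in ranked:
    print(r["system"], price_per_tpmc(r["cost"], r["tpmc"]))
```

In this invented example the more expensive system is still the cheaper one per unit of throughput, which is why the ratio is more informative than either number alone.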


That's still a useful set of benchmarks. Do you know of something similar that includes F/OSS? I'm assuming tpc.org doesn't include them because open source OLTP isn't as performant. The TPC-C benchmark is particularly interesting to me because I work with high availability systems where eventual consistency isn't something I can consider.


> I'm assuming tpc.org doesn't include them because open source OLTP isn't as performant.

That assumption is incorrect. The open source databases are not included due to nobody having spent the time and money necessary to be included there. Sun did it for PostgreSQL many years ago and got decent results.


Ouch. Your combination of ignorance and arrogance is painful to behold. If you don't know anything about a subject, take an inquisitive approach rather than a dismissive one.

TPC is a non-profit that creates and publishes standard benchmarks that can be used to compare database systems. They have done so for over 20 years, with widespread consensus among competing vendors (who outright hate each other) that the benchmarks are fair and representative of real workloads.

MySQL and PostgreSQL have both been in the benchmarks published by TPC. They are rarely there because people simply don't spend the time and money to do the benchmarks. Back when Sun was providing commercial support for PostgreSQL, they ran benchmarks for PostgreSQL and submitted them. Anyone is welcome to submit benchmarks of any database, running on any OS and hardware combination.


The fact that MySQL and PostgreSQL are not included in any recent benchmarks still removes much of TPC's usefulness.


Hmm, how does Oracle come out that far ahead, given what they supposedly measured?

- Number of mentions of the system on websites

- General interest in the system


It's not really surprising to me. Oracle has been the DB system for enterprise systems for years - they have a lot of corporations running on their platform.


This is a decent list.

Columns I'd like to see added:

1. Language the db is written in

2. Lines of code

Popularity tells me little about how capable the developer is and the quality of the software.


How do the implementation language and LOC counts tell you anything about the capability of the developer and the quality of the software?


See my other reply.

In short (pun intended), it is part of the heuristics I have developed over the years for evaluating software.

I like small programs. That's not unheard of. I also look for programs that compile quickly and easily on more than one platform.

The latter is actually of practical significance, since I cannot as quickly and easily try new software if it will not compile without modifications, or if it takes too much memory or too much time. If it takes me longer to compile some userland program than it takes me to compile my kernel, I'm more likely to look for lighter weight alternatives.

Personally, I have not found popularity alone to be a reliable heuristic when it comes to software.


They list the implementation language for many systems, e.g. for Oracle: http://db-engines.com/en/system/Oracle

Lines of code are typically not disclosed by the vendors. They could be counted for open-source databases, but then it's not obvious what really should be counted (e.g. third-party libraries/frameworks).


Yeah, it's not easy to measure, since there is so much use of third party libraries.
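
As a rough sketch of how the third-party problem is usually handled when counting lines in an open-source tree: skip directories that hold vendored code. The directory names (`vendor`, `third_party`) and file extensions below are assumptions for illustration; every project lays this out differently, so the result is only as good as the exclusion list.

```python
import os


def count_lines(root, extensions=(".c", ".h"),
                exclude_dirs=("vendor", "third_party")):
    """Count non-blank lines in source files under `root`.

    `exclude_dirs` is a hypothetical list of vendored-code directory names.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if name.endswith(tuple(extensions)):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    total += sum(1 for line in f if line.strip())
    return total
```

Calling `count_lines("path/to/checkout")` on a source checkout gives one number, but changing the exclusion list can change it substantially, which is the ambiguity pointed out above.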


A famous computer entrepreneur had this† to say about the usefulness of the LOC metric: “Measuring programming progress by lines of code is like measuring aircraft building progress by weight.”

http://www.goodreads.com/quotes/536587-measuring-programming...


Thank you. But that computer entrepreneur's products do not interest me. In fact they have cost me countless hours of wasted time.

Another "computer entrepreneur" I know writes terse code. His company is on that db list. I'm not sure he has anything to say on the matter of LOC. He keeps his programs small and simple. As far as I'm concerned, that speaks for itself. And, if having the world's largest investment banks as loyal clients is any indication, his code is high quality. Anyway, I consider the code high quality.

Now, I cannot prove that LOC is related to quality, but I know several programmers whose software I like very much who have a keen appreciation for brevity. Keeping things small and simple tells me something about the mind of the programmer. Not to mention it saves me time: less to read. I like things small and simple.

Perhaps not all 'computer entrepreneurs' agree when it comes to software aesthetics.


Based on that quote and my understanding of aeronautical engineering, I would say Mr Bill agrees with you. Even if, as you imply, he has been less than successful in achieving the goal.


Where's Datomic? Maybe it's excluded because it relies on other databases as storage services?


It would be great if column oriented DBs weren't tagged as relational databases!


Column orientation is an implementation detail. It has nothing to do with whether or not a DBMS is relational. Whether a system should be classified as relational depends on how fully it supports the standard SQL interfaces, not on how it works under the covers.
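
A toy sketch of that point: the same relation stored row-wise and column-wise gives the same answer to the same query. The table, its column names, and both "stores" are invented for illustration and do not model any real DBMS.

```python
# A tiny relation, invented for illustration.
rows = [
    {"id": 1, "name": "a", "price": 10},
    {"id": 2, "name": "b", "price": 25},
    {"id": 3, "name": "c", "price": 40},
]


def to_columns(rows):
    """Re-lay the same data out as one list per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def select_names_row_store(rows, min_price):
    """SELECT name WHERE price >= min_price, scanning whole rows."""
    return [r["name"] for r in rows if r["price"] >= min_price]


def select_names_column_store(columns, min_price):
    """Same query, touching only the two columns it needs."""
    return [n for n, p in zip(columns["name"], columns["price"])
            if p >= min_price]


columns = to_columns(rows)
row_result = select_names_row_store(rows, 20)
column_result = select_names_column_store(columns, 20)
print(row_result, column_result)
```

The physical layout changes which data a query has to touch, not what the query means.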


It would be interesting to have a programming language with an integrated OO DBMS.


It's cool to see graph DBs like Titan and Neo4j moving up the ranks.

There are some major advancements coming down the pipe in the world of open-source graph computing that will make working with big graph data accessible to anyone, not just the Googles, Facebooks, and Twitters of the world.

Here are some of the things coming down the pipe:

Titan 0.4 was just released (https://github.com/thinkaurelius/titan), and the number of backend datastores Titan supports is growing. Datagrid support was just added for Hazelcast, and it can serve as the reference implementation if anyone wants to add support for Infinispan, Galaxy (http://puniverse.github.io/galaxy/), or one of the other datagrids.

MapR is in the final stages of certifying Titan on M7 Tables (https://groups.google.com/d/msg/aureliusgraphs/RTeFVssIvoI/m...), which will allow you to run Titan on HBase without all the HBase complexity. AWS (http://aws.amazon.com/elasticmapreduce/mapr/) and GCE (http://www.mapr.com/products/google-cloud-platform) already have direct support for M7 so it's easy to spin up a cluster.

TinkerPop3 is scheduled for release within the next six months (https://github.com/tinkerpop/tinkerpop3/wiki), and it will blur the lines between graph databases and graph processing engines.

Marko just released the first version of TinkerPop3's OLAPGraph -- this is Blueprints for graph-processing engines, which means that in addition to OLTP graph databases like Titan and Neo4j, TinkerPop3 will support OLAP engines like Giraph, HAMA, Faunus, GraphLab, and the new GraphX engine in Spark (http://amplab.github.io/graphx/, http://www.youtube.com/watch?v=mKEn9C5bRck&list=PLbDk7g7PotW...).

You'll be able to run Gremlin over any Blueprints-enabled graph-database or graph-processing engine, and Gremlin will be able to jump between the database and processing engine, depending on if it's a local or global graph algorithm (http://markorodriguez.com/2011/04/19/local-and-distributed-t...).

And Marko's recent breakthrough on swarm computing over derived graphs (https://groups.google.com/d/msg/gremlin-users/1KObZ8F2d00/CJ...) means you'll be able to run traditional graph algos over property graphs. This will pave the way for the community to construct a massive library of graph algorithms in Gremlin (https://github.com/tinkerpop/furnace/wiki).

All of this is coming together now. 2014 will see a major leap in open-source graph computing.


I understand very little of what you just posted, but purely because of my own ignorance. This is wonderful, I now have a reference doc on what to read up on this weekend. HN FTW.


Amazes me to see that there exist this many databases.


Yeah, me too. I have the feeling that developers find themselves attacking the problems of compression, entropy and abstraction of meaning more and more often these days. I see that as a good future, though I hope that some convergence of the evolutionarily better DBs happens. Commercial DBs rot and die in popularity compared to open-source solutions, which marks the value of our time: cooperation and usefulness over individual benefit.

Can't wait for 2014, and not just because 2013 was a bad year.



