Learn to stop using shiny new things and love MySQL (pinterest.com)
56 points by nreece on Oct 10, 2015 | hide | past | favorite | 62 comments


I agree, I've found through painful experience that the shiny and new, while incredibly fun, tends to cost more than it's worth.

With that said, if you're going to take the prudent route and use an established database platform, why in the name of all that is holy would you pick MySQL over Postgres?

Granted, I am rather biased, but I do have extensive experience using both in anger. Every time I have to use MySQL I walk away with the impression that it was built with the same design mentality that PHP was. The whole point of a transactional RDBMS is to keep your data as consistent and accurate as possible. I cannot in good faith use a database that (silently!) will coerce badly formatted input into what it thinks should be in the column.
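A sketch of the coercion in question, under MySQL 5.x's default (non-strict) sql_mode; the table is hypothetical:

```sql
-- MySQL, default sql_mode: bad input is coerced with a warning, not rejected.
CREATE TABLE t (age INT, name VARCHAR(3));
INSERT INTO t VALUES ('abc', 'too long');
-- MySQL stores the row anyway: age = 0, name = 'too' (two warnings).

-- Postgres (or MySQL with sql_mode = 'STRICT_ALL_TABLES') rejects the
-- INSERT outright with an error instead of guessing.
```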


I don't know why anyone in 2015 would pick MySQL when Postgres is an option. Love Postgres. So nice to work with. Its JSON support is amazing too.
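For instance, a sketch of what 9.4's jsonb makes possible (table and data are hypothetical):

```sql
-- jsonb columns can be indexed and queried by structure.
CREATE TABLE events (id serial PRIMARY KEY, payload jsonb);
CREATE INDEX ON events USING gin (payload);
INSERT INTO events (payload) VALUES ('{"type": "click", "user": {"id": 42}}');

-- Containment query; can use the GIN index.
SELECT payload->'user'->>'id' AS user_id
FROM events
WHERE payload @> '{"type": "click"}';
```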


Agreed. I consider MySQL a red flag and avoid any projects or teams that choose it over Postgres. Postgres is reliable, flexible, fast, and well documented. It's one of the crowning achievements of the open source world.


> I consider MySQL a red flag and avoid any projects or teams that choose it over Postgres.

I am in the "pro Postgres" camp, but this attitude is too black and white for me. For many projects Postgres is superior to MySQL, but there are situations and requirements where MySQL makes a lot more sense.

For just one example, I can get MySQL as a service much cheaper and easier than I can get Postgres as a service. Which isn't to say Postgres as a service doesn't exist (it does), just to say MySQL is more readily available and there is more competition in that sector.

Additionally, if I had a DBA who was a MySQL expert and didn't know a thing about Postgres, I'd go with MySQL for my project. A good DBA is worth their weight in gold, and an expert in a particular DBMS can be a massive asset to a project.

But given no restrictions or needs, I'd pick Postgres as my "default" choice. I just don't think it is right to say you'd avoid a team/project just because they picked MySQL, sometimes there are completely legit reasons (or politics).


What I love about Postgres is how it's the only remaining innovator in the SQL space. It's like Microsoft and Oracle were circa 1999 (the last true innovation out of either was sparse columns in MSSQL). Genetic algorithms, the works. Everything else is mostly stagnant and archaic.


Depends on your data. Sometimes (particularly when you have only 1-3 tables) MySQL is much faster and you have to go out of your way writing stored procedures to make Postgres compete.

Example:

   SELECT COUNT(1), status FROM some_table GROUP BY status ORDER BY status; -- Postgres 10x slower.
Also, mysql CLI is IMO much nicer than the postgres version. Better and faster autocompletes.


Regarding CLI: mycli and pgcli are both much nicer than the built in alternatives.


I'd like to see your full setup for your claim about 10x slower, because I suspect you're doing something wrong.


Sure, here's the pastebin of the postgres one: http://pastebin.com/Phc8KnYU

I thought I was doing something wrong, so I asked on SO:

http://stackoverflow.com/questions/32755348/group-by-count-q...

This is a very simple example, but I always run into major performance issues with Postgres. Some of them are my fault, from not knowing that array aggregation is a slow operation.

MySQL advantages for me are that there is an actual UPSERT operation (an inferior version of which might be added to Postgres 9.5), that the CLI is miles ahead, and that it's much faster for a small number of tables with a lot of data and with non-complex operations.


I'm going to ignore your baseless (and incorrect) assertion about "an inferior version" of UPSERT (MySQL's doesn't work properly except for the primary key).

Your problem is that you are not doing an index only scan. The reason is, most likely, that the visibility bit is not set on the table. This is usually a result of excessive writing to the table, but sometimes it is due to incorrectly tuned planner settings. Try setting your cpu_tuple_cost higher and/or cpu_index_tuple_cost lower; I have had to do this in the past to get index only scans to work. BTW, a quick way to force the issue is to SET enable_seqscan TO off in psql.
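A sketch of the session-level experiment described above, using the table from the earlier query (note the GUC is spelled enable_seqscan; the cost values are illustrative, not recommendations):

```sql
-- See what plan the slow query actually gets:
EXPLAIN (ANALYZE, BUFFERS)
  SELECT COUNT(1), status FROM some_table GROUP BY status ORDER BY status;

SET enable_seqscan TO off;        -- quick test: force the index path
SET cpu_tuple_cost = 0.03;        -- default 0.01; makes seq scans look costlier
SET cpu_index_tuple_cost = 0.001; -- default 0.005; makes index scans look cheaper

-- Index only scans also need the visibility map to be current:
VACUUM ANALYZE some_table;
```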

I would add, you should really upgrade to 9.4, as there were some performance improvements to aggregates.


Thanks for the tips.

I am not sure in which cases the MySQL one doesn't work correctly. ON DUPLICATE KEY UPDATE checks every unique field in the table, not just the PRIMARY key.
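For comparison, a sketch of the two upsert forms under discussion, against a hypothetical counters table (assumes counters.name has a UNIQUE constraint):

```sql
-- MySQL: fires the UPDATE on a collision with ANY unique key, not only the PK.
INSERT INTO counters (name, hits) VALUES ('home', 1)
ON DUPLICATE KEY UPDATE hits = hits + 1;

-- Postgres 9.5: names the exact conflict target explicitly.
INSERT INTO counters (name, hits) VALUES ('home', 1)
ON CONFLICT (name) DO UPDATE SET hits = counters.hits + EXCLUDED.hits;
```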

Also, I'll try these suggestions, but you can see that it can be frustrating that simple queries don't work quickly or randomly work really slowly and you have to find some configuration settings to hand tune.


You know what, about two weeks ago I was starting a new Rails hobby project, and I thought "all I've heard about for the past year is how everyone loves Postgres", and was about to throw it in there, when I saw a link to Google search trends comparing the two, and MySQL is still so far ahead: https://www.google.com/trends/explore#cmpt=q&q=mysql,+postgr...

It's quite possible this is just due to people supporting existing applications, but I dunno, I am tempted to stick with MySQL for new stuff for the next little while.


MySQL is popular because it's easy to use (in the same sense as PHP), easy to setup, and has a lot of tutorials written for it. Also speed used to be a pretty big factor, but that was only because MyISAM took shortcuts with data consistency.

If you care about your data and would rather not deal with the many problems MySQL inevitably brings, then Postgres is the better option. I can't say I've seen many compelling reasons to use MySQL over Postgres other than "what if we need to replace you with a freelance PHP guy", "I want to pay lower wages", and "I want to use phpmyadmin". (Btw: check out Sequel Pro and Querious!)

Popularity isn't a very good reason to choose something. Postgres will remain a juggernaut for a long time. Unless you really think there's a good chance you'll need to hire somebody that only knows MySQL, then I'd go with Postgres.


That may be why MySQL is popular for small installations, but that's not why it's popular with companies like Facebook, Google, Twitter, and Pinterest who run replicated clusters that number in the thousands.

It's popular at scale because unlike Postgres, it has replication that is well understood, is reliable, and isn't a massive pain in the ass to operate.


Please point me to the well-understood, reliable replication solution for MySQL (I'll agree about pain in the ass to operate re: Postgres, but you might want to check out aphyr's recent posts).


Granted, I haven't done anything with it in nearly 7 years now... but every time I've touched MySQL for anything, there was always some quirky behavior that bit me in the ass that no other DBMS has...

Postgres, MS-SQL, Firebird, Oracle, DB2 are all better options in my mind... unless you need a good replication story without a pricey support contract, in which case Postgres and Firebird are sadly out of the running imho.


Is google trends really how you should be choosing your stack?


Choose? I'd argue no. But inform your choice? Sure.

Looking at trends can help you:

- See how easy hiring people will be in the future.

- See the likelihood that your chosen framework will lose support.

- See the likelihood of third party support/addons/libraries/etc.

- How easy it will be to find help/tutorials/guides/etc.

But looking at it, to me, seems prudent.


One of the reasons MySQL is still popular is Wordpress.


It seems weird to me that Postgres and MySQL are both going down. I'd expect interest in Postgres to be rising at the very least—what are people switching to?


Probably hosted SaaS offerings, some of which may actually be Postgres and MySQL... there's just less need for queries about operational issues.

There are also a lot of NoSQL solutions being ever more widely used, most of which have much lower costs for HA early on.


Good article on the dangers and time-wasters of MySQL: http://grimoire.ca/mysql/choose-something-else


MySQL has more baked-in solutions for replication that don't require a support contract or a convoluted mess of choosing between half a dozen half-baked options for higher availability?

Honestly, it's the one area where I feel open-source PostgreSQL is really lacking: a good replication/automagic-failover story. I think MS-SQL is probably the best for the supported versions (but almost as expensive as an EnterpriseDB support contract). MySQL is probably the best option for a lot of people.

That said, I think RethinkDB is a pretty decent all-around option for most workloads at this point.


As someone who took a stab at automating slony deployment and configuration, I have to agree that the replication situation is not quite there yet. There are maybe 2-3 examples online on how to use some of those tools, and maybe half a dozen people who actually would be able to answer questions about them. Betting your business's scaling future on that kind of tooling is terrifying.

Would be great to have both out of the box failover (you can get that in RDS nowadays) and logical replication. I suppose the latter can stave off the need for sharding for a while.

I believe there's work happening on UDR/BDR right now that will land in the product by 9.5 or 9.6.


You can go a long way with read-only replication, as long as you have a good failover strategy for writes before you need sharding... The only time you really need sharding is once either your reads or writes require more than a single system in your cluster could keep up with in terms of requests.

That said, using something like RethinkDB (not sql) from the start isn't a bad thing... And depending on your needs Cassandra, ElasticSearch and others are viable options... beyond that there's no need to stick with a single database for different types of data. It should be a case by case basis.

As to the comments about Postgres replication/failover, it's a pretty sad state imho for an otherwise very respectable database.


The most heavily scaled websites today are built on PHP/MySQL: Facebook, Wikipedia, Tumblr, Wordpress, Baidu, Yahoo...

I know it doesn't look cool to defend the popular technology, but it's hard to argue against for now.

(Hope Rust will crush it on web performance so I can hipster around about it :)


I have a friend at Wikipedia who was looking into writing some of their PHP extensions in Rust, so hipster away. :P


How big is the team over at Pinterest managing the MySQL infrastructure? I think that's the big difference here.

I have a particular project. The database has well over 1 billion rows in multiple tables which need joins to generate complex reporting.

As the only tech here, there is only so much I can be working on. Before, time spent trying to get MySQL to work was time not well spent. I spent countless hours on IRC talking to devs on #mysql/#mariadb and was told many times: you seem to have the indexes in the right places and the query is well written, you are lucky you are getting it that fast.

Sure, if I had a huge team like Instagram (who I think sharded MySQL) or Pinterest, I could assign DBAs to get my MySQL infrastructure to scale and possibly fork my application to work differently.

I even called Percona and MariaDB and was given pricing from $50k to $70k to work the issue. Note that they didn't even promise to resolve it. They needed between $10k and $20k up front to check my schema and the data, and then the rest to do the work. Really, almost $100k and no concrete answer?

So I don't have a team and I didn't employ Percona/Mariadb to help solve my issue. Instead, I did something completely different. Some may even say drastic.

I need something simple that will deliver five nines and is very robust. So instead, I went to Postgres. I ran to Postgres. I ran and didn't look back.

With Postgres, 80% of my issues went away. PG 9.5 has some features that will make ALL of my issues go away, and I may be able to run quarterly queries over 10b rows across multiple tables in under 2 seconds. Cloudflare is currently doing it and I'm trying to replicate it.

Bottom line is this. MySQL for me, was a dead end. Postgres seems to every year go from strength to strength.

Postgres isn't shiny. It's dependable, and it actually cares about what you are putting in the database. It makes your schema simple because you can go from varchar(x)/int(x) to text/int and it's still fast.

I've converted about 10 other DBAs who were used to MySQL and thought PG was hard. Now, like me, they won't look back.

If you are still using MySQL, take a look at Postgres. MySQL has lost its shine. It's now dull.

Fingers crossed PG 9.6 comes with MPP capability. That would really blow the doors off!


I agree. MySQL has been one problem after another. Postgres isn't perfect either, but it is constantly improving and normally does the right thing.

The idea of Postgres being a "shiny new toy" is... an interesting viewpoint.


> MySQL has been one problem after another.

Please elaborate. MySQL/MariaDB can be pretty relaxed when it comes to SQL standards (which you can disable/tweak), but I've yet to see data loss or data corruption. And speaking from experience, the system would be on its knees and MySQL would happily do its best to answer queries.
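A sketch of the tweak alluded to here: in MySQL 5.x, strictness is a session or global setting (table name hypothetical):

```sql
-- Opt in to strict handling for this session:
SET SESSION sql_mode = 'STRICT_ALL_TABLES,NO_ENGINE_SUBSTITUTION';

-- The silent coercion others complain about now becomes a hard error
-- (ERROR 1366, incorrect integer value) instead of a stored 0:
INSERT INTO t (age) VALUES ('abc');
```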

I'm not saying Postgres isn't stable (haven't really used it) - I'm just rather suspicious of the comments here. I find it hard to believe that it's as broken as people here claim when big businesses are using it [1].

[1] http://stackoverflow.com/questions/1113381/what-databases-do...


Same here. I've been a MySQL DBA for over 10 years and I've never had data loss or corruption even with power supplies dying and servers shutting down completely.

I've also used PG in quite a few projects and I can't say I find it better or faster, at least for my usage. If anything, it's more difficult to manage, there are far fewer decent tools for monitoring, profiling is a PITA, and those backslash commands just drive me nuts. I hate stuff like "\dt".



A lot of the original article is a retrospective of technologies Pinterest picked before they could afford a fairly large team. If you do have a really huge engineering budget and can afford a commercial support contract, you are now in the business of spending money more to maintain your existing business than to scale it faster and faster.


Depending on your reporting needs, or data shaping, you may be better off doing rolling exports or additional replication to a Hadoop or ElasticSearch setup for your reporting... Especially since reports are pretty much snapshots in time, not real time; there's no need to slow down the server for other users.


A couple years ago I built a very robust and intricate virtual inventory system for integrating with third-party vendors. I created two accessible endpoints: one highly customizable webservice endpoint (I'll literally take any valid XML and work with you to transform it into what I need), and one luddite-friendly SFTP server supporting extraordinarily simple export-from-Excel XML output.

I was raked over the coals for providing the SFTP/Excel solution, "FTP?! What is this, the '80s?" was the quote that sticks with me.

Almost all of our vendors turned out to be small shops. Only two of our dozens of vendors use the webservice, even as easy-like-Sunday-morning as it is. Everyone else uses the Excel solution.

It's hard to let yourself write the uncool solution that actually fits the bill.


One of our data providers at MSN gave us this binary blob, which we had a decoder to run over and produce a text file, which we then transformed in something regular, did a diff, and updated the table we needed to update via SQL.

We begged and begged for XML support. Many releases on, they trumpeted XML support! Huzzah! The XML had a header and one huge CDATA block of the Base64 encoded blob they usually sent.

Thanks!


I work at a Fortune 500 company in an industry where FTP and EDI are the norm, and SOAP/XML is considered fancy. Half the time our customers (many of whom are other F500 companies) think that all that stuff is too hard or expensive, and prefer to just e-mail us an Excel spreadsheet every day. Enough so that we recently introduced a new service offering where a customer can e-mail their spreadsheet of data to one of our employees, who will upload it to our system for them once a week.

This is just how IT works in most businesses.


I understand what you're saying but please replace MySQL with Postgres. Only one of those two is an actual consistent database.


Agree -- I hope people don't still consider Postgres to be "shiny and new"


You have no idea how many developers I've wanted to hit in the testicles with a baseball bat for crying my ear off to use redis, mongo, etc... anything but mysql. Really you want to take our 99% relational data (billing/invoicing, crm, etc...) and put it in a key store? You're fucking retarded.

What I found is that I can browse the posts here in the AM and then by noon they will be asking me for whatever landed on the front page here. I just laugh and tell them to go back to work.


Please do not use MySQL... unless you've read its dual licenses. MariaDB was forked for good reason. MariaDB > MySQL.


I agree. Also, MariaDB is being actively developed. A lot of Linux distributions are defaulting to MariaDB. The recent release of Debian stable (jessie) has MariaDB. So far I've only been using it on new machines, but I may even migrate my existing MySQL servers to MariaDB.


Overall a good point, but MySQL auto-increment sequences and other bugs have given me corrupt state many times, while good old (yet shiny) PostgreSQL has caused zero problems over more than 10 years and keeps gaining features. I have been using hstore more lately, and PostGIS is hard to match.


> Some technology is designed for massive scale and some is not. You may have to layer a little scale magic on top.

This is as wrong as saying that everyone needs to scale day one.


Relational Databases still rule the roost for almost every possible problem minus a handful of instances.

Personally I would go with the MariaDB fork, or my preferred DB, Postgres.


I haven't used Java in a while, but does Oracle still run around extorting folks for using the JDBC license? I'm curious.

Frankly, I don't have any really serious problem with MySQL other than its ownership and speed of improvement. Plausibly MariaDB sorts that out, but I agree with other commenters that at this point Postgres seems like the best choice for most new projects.


Some similar articles and threads:

- Choose Boring Technology: http://mcfunley.com/choose-boring-technology, comments at https://news.ycombinator.com/item?id=9291215

- Chasing the Shiny and New: https://www.nemil.com/musings/shinyandnew.html, discussion at https://news.ycombinator.com/item?id=10279008

- Tech is moving too fast for me, I'm out: https://news.ycombinator.com/item?id=7947551

etc.

I especially liked the first one, which described the idea of "innovation tokens": every startup or project gets three innovation tokens. If you decide to use a new development language, you've just spent one. A new framework, you've just spent another, and so on. If you spend more than three, your chances of launching reach epsilon.

I find that as I'm getting to be an older (and older, sigh...) developer, I'm also more and more reluctant to keep throwing away all of the knowledge and hard-won skills I've developed in favor of picking up new ones. It seems enormously, almost criminally wasteful. How many times should I throw away tens of thousands of lines of code in polished, battle-tested libraries, and knowledge of edge cases and gotchas and little efficiencies and everything else, so that I can start over again with a blank slate and a new language and a long road of new bugs and edge cases to learn?

No other industry is as wasteful in terms of talent expended as ours is. Imagine if every single bridge ever built in this country was torn down and replaced every five years with a new bridge that featured new materials and new technology.


Oh my gods, I can't up vote this enough. This should be mandatory reading before one is allowed to use a text editor.

Though PostgreSQL is as good a choice as MySQL. Technically better (AIUI) but with a less gigantic support base.


Oddly, Pinterest is a flagship user of HBase.


Maybe they're just referring to transactional scenarios?


As someone who's only briefly looked into Postgres, are there any disadvantages to going with it instead of MySQL?

It seems (especially from the comments here) that Postgres is something to seriously consider in future projects, but I'm wondering if there are any reasons to stick with MySQL.


I've answered this question once before, and this is what I wrote. MySQL is battle tested and operationally sane to manage at scale; I'm talking specifically about replication.

I've been a DBA for thousands of MySQL hosts in production and just a handful of PostgreSQL clusters, so I'm not a PostgreSQL expert by any stretch. These PostgreSQL machines were also inherited from a company we acquired, so some of my negative impressions can likely be explained by them being set up poorly, which sadly seems to be the norm.

PostgreSQL's replication has evolved slowly over the years, and as a result it's idiosyncratic, complicated to get right, and hard to generalize about. There aren't a lot of great setup guides and tutorials on the subject either.

Postgres replication works by shipping WAL (transaction/redo) logs to slaves, and the slaves are in a constant state of DB recovery. Streaming replication ships writes as they happen; file-based log shipping copies each log once the 16MB segment is complete.

Ours were set up to use streaming replication, without a separate file-based archive log host to fetch older logs from. Whenever the write load became too much and the slaves could no longer keep up, the WAL logs on the master would expire and replication would break entirely, requiring you to rebuild all your slaves from backup. This was my introduction to PG replication. WAL logs were also applied without checksums or verification, which corrupted the entire replication chain, and as I remember we didn't discover this until the corrupt pages got hit by a query.

Software upgrades with PG seemed to be a huge pain as well: we could never figure out how long they were going to take, and even if the on-disk formats didn't change, somehow all the cardinality stats on our tables disappeared and all our query plans went to shit until we ran some process to rebuild them, which took days.
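A minimal sketch of the era's streaming-replication setup described above (PG 9.x; hostnames, paths, and values are illustrative assumptions, not recommendations):

```
# On the master (postgresql.conf):
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 256   # keep enough 16MB WAL segments for lagging slaves

# Safer: also archive WAL so slaves can catch up after wal_keep_segments runs out
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'

# On each slave (recovery.conf):
standby_mode = 'on'
primary_conninfo = 'host=master.example.com user=replicator'
restore_command = 'cp /mnt/wal_archive/%f %p'
```

Skipping the archive pieces is exactly the failure mode described: once a lagging slave needs a WAL segment the master has already recycled, the only recovery is a full re-clone.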

Operationally there were some things that made our team angry. We couldn't figure out how to reparent a slave without completely re-cloning it from the new parent, even though it was completely up to date from the authoritative master at the time of the reparenting. Also, do you really have to take down the entire cluster to change max connections on a slave? Many such settings seemed not to be dynamic and had to remain in sync across the entire chain. One of the things people seem to love about PG is how correct and proper it is respecting the sanctity of your data. As an ops person I'd much rather deal with slightly inconsistent replicas (common in MySQL) than have to fight with how rigid PG is.


Run, and never look back.


In some cases using shiny new things does give you some type of advantage over bigger companies that are not willing to take the risk. That being said, when it comes to choosing a database, developers should stick to mature technology with a non-restrictive license, aka Postgres.


There is lots of love for MySQL, especially from Facebook and Pinterest.

I wish the same for PostgreSQL


Postgres FTW, with native JSON support it is the best RDBMS out there.


%s/MySQL/PostgreSQL/ and I violently agree with this article.


OMG, that article had some serious common sense.


Damn, always getting downvoted for stating an opinion. What's up with that? (By the way, I actually read the article.)


Probably because your comment didn't add anything to the discussion. There's somewhat of a culture here of downvoting "Me too" or "+1" type content-free comments, and even jokey one-liners that don't progress the discussion.

My respectful suggestion is that when you want to comment, try to say something that uses your own experience or opinion to add something to the discussion, rather than to mirror or agree with some point in the article or in a comment. I'm far more likely to upvote something that makes me realize something I hadn't already thought of.


The correct response is not to give a damn about something so arbitrary.


You are absolutely correct.

I blame it on a temporary lapse of reason.

(I usually don't run with the herd, so this shouldn't be a surprise to me...)



