MemSQL Does Oracle’s Own Demo Ten Times as Fast, Sixty Times Cheaper (memsql.com)
156 points by frostmatthew on Oct 1, 2014 | 68 comments



It is generally not that difficult to greatly out-perform Oracle on specific benchmarks. Postgres can do it too. Companies do not buy Oracle for the performance per se but as time has gone on Oracle has definitely been losing ground in that arena.

MemSQL makes a good product but nothing about their architecture is particularly novel, and other systems have a similar design. I've designed several database engines; I would expect numbers like they demonstrated on a decent columnar implementation with the tradeoffs MemSQL has made. And Oracle has never been optimized for these types of queries because those tradeoffs adversely impact performance in other areas Oracle cares about. Features like compiling queries are mundane; many commercial databases did that a decade ago and it is a standard design element today. Same with the two-tier in-memory and columnar disk storage.

I'm sure MemSQL is fast, but "world's fastest" should be assumed to be hyperbole. Most of their benchmarks are table stakes for a modern, real-time analytical database kernel. It is apparently a well-designed product, but it does nothing that many other companies are not also doing.

Here is what I tell people a properly designed database kernel on modern mid-range hardware should be able to do per server: continuously insert millions of records per second all the way through storage, leaving virtually all of the cores idle for some large number of concurrent queries. If you can't saturate 10 GbE on ingest concurrent with saturating 10 GbE on query output, your design is most likely obsolete or broken.

Things have moved a very long way since I first started designing massive scale database systems (on Oracle, way back in the day).
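
To put the 10 GbE target above in terms of records, here is a back-of-the-envelope sketch. The record sizes are assumptions picked for illustration (the comment gives none), and protocol overhead is ignored, so treat the output as an upper bound rather than a measurement.

```python
# Back-of-the-envelope: records per second needed to saturate a 10 GbE link.
# Record sizes below are hypothetical; protocol overhead is ignored.
LINK_BITS_PER_SEC = 10 * 10**9  # 10 GbE line rate


def records_per_second(record_bytes: int) -> float:
    """Records/s that would fill the link at the given (assumed) record size."""
    return LINK_BITS_PER_SEC / 8 / record_bytes


if __name__ == "__main__":
    for size in (100, 500, 1000):  # assumed record sizes in bytes
        print(f"{size:>5} B/record -> {records_per_second(size):,.0f} records/s")
```

With records in the 100-byte to 1 KB range, filling the link corresponds to millions of records per second, which lines up with the ingest rate described above.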


Oracle designed their Database In-Memory Option specifically for these sorts of scenarios. But it's a fairly crudely designed column store cache and MemSQL are right to bash them.

But the strength of Oracle remains OLTP and I don't see customers fleeing to MemSQL for this.


What in your opinion counts as modern mid-range hardware?


The physical servers used in clusters I buy today run about $4k per server without storage. Outside of the dual 10 GbE interfaces, they are like most other budget servers. Storage costs per node vary depending on the size, type, and mix of disks. Driving the disk I/O at 10 GbE can be trivial with a good I/O scheduler design, so ironically the details of the storage device are almost irrelevant for many workloads. Cheap SSDs are the default target but cheap spinning disk works surprisingly well for many workloads.

Various AWS nodes are also used in clusters quite a bit (though we use ephemeral disks rather than EBS). Nothing exotic.


Any good reading on the engineering/algorithms one can use to hit those throughput targets?


- C-store paper
- Provably correct lockfree skiplist
- Mass tree
- BW Tree


There is a lot of use case optimization going on here. The sorting inside the columnar store is in and of itself a huge performance boost. And saying that it scans 138 billion rows per second is completely wrong because it uses a lot of shortcuts to get the result and it doesn't even use rows to store the data.

There is nothing stopping Oracle using the same optimization for their use case, although they would be better suited if someone from the crowd queried something that MemSQL didn't sort/optimize for.


It's hard to define what a "fair" columnar store scan is. We use a technique called "segment elimination", which allows us to skip over compressed segments of data as we scan. However, every solid columnar database uses such techniques. It just happens that Oracle 12c doesn't have the ability to sort columnar indexes, nor does it have the ability to store columnar indexes on flash or disk, which bloats the overall cost.
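
For readers unfamiliar with segment elimination, here is a minimal sketch of the general idea, not MemSQL's implementation: each segment keeps min/max metadata, and a range query skips any segment whose range cannot match. The `Segment`, `build_segments`, and `count_in_range` names are hypothetical, and compression is left out entirely.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """A chunk of one column plus min/max metadata (compression omitted)."""
    values: List[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def build_segments(column: List[int], segment_size: int) -> List[Segment]:
    """Split a column into fixed-size segments."""
    return [
        Segment(column[i:i + segment_size])
        for i in range(0, len(column), segment_size)
    ]


def count_in_range(segments: List[Segment], lo: int, hi: int) -> Tuple[int, int]:
    """Count values with lo <= v <= hi; also report how many segments were read."""
    count = 0
    scanned = 0
    for seg in segments:
        # Eliminate the segment if its value range cannot overlap [lo, hi].
        if seg.max_value < lo or seg.min_value > hi:
            continue
        scanned += 1
        count += sum(1 for v in seg.values if lo <= v <= hi)
    return count, scanned


if __name__ == "__main__":
    # A sorted column makes the min/max ranges of segments disjoint,
    # which is what lets most segments be skipped.
    segments = build_segments(list(range(1000)), segment_size=100)
    matched, scanned = count_in_range(segments, 250, 349)
    print(f"matched={matched}, segments scanned={scanned} of {len(segments)}")
```

This also shows why sorting matters: on an unsorted column, most segments would span a wide value range and few could be skipped.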


The much bigger problem with Oracle In-Memory is that a lot of operations like joins and sorts often happen in the SGA, in row form. So you end up needing a truckload of SGA for temporary calculations. And when the SGA runs out, which it invariably does, you end up spilling to the temporary tablespace. And everything comes to a crawl.


How does your tech differ from SAP HANA's columnar DB?


Isn't this tantamount to saying: "You used an O(log(n)) solution, but I mandated an O(n) solution, and therefore your solution is wrong even though it's faster"?

At some point, how you solved the problem becomes moot if you get the same answer and your way is faster in most cases. That's the entire point of most algorithms texts.
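
A small sketch of that point, using a sorted list as a stand-in for a sorted column (an illustration only, not how any of the databases discussed here work): a linear scan and a binary search return the same count, but the second touches far fewer elements.

```python
from bisect import bisect_left, bisect_right
from typing import List


def count_linear(sorted_values: List[int], lo: int, hi: int) -> int:
    """O(n): inspect every value."""
    return sum(1 for v in sorted_values if lo <= v <= hi)


def count_bisect(sorted_values: List[int], lo: int, hi: int) -> int:
    """O(log n): locate both ends of the matching run by binary search."""
    return max(0, bisect_right(sorted_values, hi) - bisect_left(sorted_values, lo))


if __name__ == "__main__":
    values = sorted([8, 3, 1, 9, 3, 5, 8])
    # Both approaches must agree; only the amount of work differs.
    assert count_linear(values, 3, 8) == count_bisect(values, 3, 8)
    print(count_bisect(values, 3, 8))
```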


Assuming the correctness of the algorithm is maintained!


Then again, MemSQL didn't design the benchmark. They beat Oracle at their own benchmark on massively cheaper nodes.


That doesn't mean a whole lot. Back in the early '00s, ATI and Nvidia were caught optimizing for certain benchmarks that were commonly used to compare video card performance (IIRC one of them was caught with code that specifically cut corners if it detected being run by the benchmark program).

I'm not saying MemSQL is doing anything nefarious, but the fact that the benchmark tests are controlled by someone else doesn't carry a lot of weight.


I would say that doing the other guy's benchmark is the most objective thing you can do. You should take it one step further and include other competitive products. In this case, it's pretty easy to beat the crap out of Oracle; it's an elephant. I would challenge the MemSQL guys to run the same benchmark against VoltDB and report on that.

Also, let's get down to brass tacks - I don't see the actual queries that are being graphed in the MemSQL post, and what is that graph actually measuring?

I think the LMDB team has done a really good job with this type of comparative benchmark. They took the LevelDB benches, then ran them against a lot of competitive products, both new and legacy, and gave a really good view of the entire field, in terms of performance on a number of dimensions:

http://symas.com/mdb/inmem/

http://symas.com/mdb/inmem/scaling.html


> There is nothing stopping Oracle using the same optimization

There is. Skill, bureaucracy, grit, and the right motive.

The meat here is MemSQL's sophisticated indexing with splay trees. http://en.wikipedia.org/wiki/Splay_tree


MemSQL is doing great work. However, from a strategic point of view, I wonder how they plan to compete against two of the world's largest DB vendors (SAP and Oracle) at their core product. Oracle's flagship DB has more R&D, sales, and marketing than a startup could ever muster, even with the help of big-name VCs. But I'm sure that's part of startup glory. "No one thought we could do it!" is how it always starts. I suppose they could be an acquisition target for HP or EMC or a company that's lagging in the data management world.


They will be able to pull in customers who cannot afford to spend 3 million on hardware for a database but still need very high performance. There should be enough to build a business in that niche.

Those who can afford the high price tag will likely continue to buy from Oracle and SAP in the short term. Over time, MemSQL can probably win some of these customers as they establish more credibility and a proven track record.


Doing some basic math... Wikipedia is around 80m rows a day, so 4 months of Wikipedia is around 9.5bn rows. But they show 17bn on the graph.

Typical columnar compression gives about 11GB per 1bn rows, so 17bn rows should be 187GB. The AWS machines they are using should be c3.4xlarge which are 30GB, and 6 of them is 180GB. But you can't run an in-memory column store at 100% RAM, you need to run it at 50-70% so you have capacity for calculations.

Is it just me or do the results not make any sense? Seems likely they actually had 9.5bn or 4 months data, which conveniently is what the graph shows?
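
Spelling that arithmetic out as a quick sketch: every constant below comes from the figures quoted above (the 11GB per 1bn rows compression ratio and the c3.4xlarge instance type are the comment's own assumptions, not confirmed by the post).

```python
# All constants are the figures quoted in the comment above, not measured values.
GB_PER_BILLION_ROWS = 11   # "typical columnar compression"
GRAPH_ROWS_BILLION = 17    # row count shown on the graph
NODE_RAM_GB = 30           # c3.4xlarge, as assumed in the comment
NODES = 6

data_gb = GRAPH_ROWS_BILLION * GB_PER_BILLION_ROWS
cluster_ram_gb = NODES * NODE_RAM_GB


def capacity_billion_rows(utilization: float) -> float:
    """Billions of rows that fit if only `utilization` of RAM holds data."""
    return cluster_ram_gb * utilization / GB_PER_BILLION_ROWS


if __name__ == "__main__":
    print(f"data for 17bn rows: {data_gb} GB, cluster RAM: {cluster_ram_gb} GB")
    for utilization in (0.5, 0.7):
        rows = capacity_billion_rows(utilization)
        print(f"at {utilization:.0%} RAM utilization: {rows:.1f}bn rows")
```

Under those figures, 50-70% utilization works out to roughly 8.2 to 11.5 billion rows, a range that contains the 9.5bn estimate but not 17bn.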


Is MemSQL a free product? I can't seem to find anything about pricing or otherwise on their site.


The 60x cheaper claim is based on hardware costs, comparing 6 EC2 nodes (they say ~$20k worth of servers) vs. an Oracle server costing more than $1 million.


I looked for that too and couldn't find it either.

This is a very annoying trend. I had the same problem checking out VoltDB. Is this free? What license? What price?

In the end looking it up on Wikipedia is the fastest way; for MemSQL it states "Proprietary License".


I was hoping the blogpost would give at least a pointer or some insights about the actual experiment. Am I missing some context here?


I'm amused that Nikita Shamgunov's bio suggests that he was a 'distinguished senior database engineer' at Microsoft. The title "Distinguished Engineer" at Microsoft refers to a pretty select group of people, not just any senior dev.


That's right. "Distinguished Engineer" is a title at Microsoft. At the time of my tenure at Microsoft my title was "Senior Engineer". The bio was written without referring to the title, but rather the fact that I was "distinguished" with Gold Star awards and HiPo program. I asked to remove "distinguished" from the bio to remove any confusion. It's already live.


This is a classy response. Corporate bios, like any PR, can walk a fine line in what they communicate.


The lower case "distinguished senior database engineer" seems distinctly different than the awarded title "Distinguished Engineer." I am unsure if he meets the dictionary definition of "distinguished", but if he does, it isn't wrong to use it.


distinguished - "successful, authoritative, and commanding great respect". Is English your second language, perchance? Not meaning to be rude, but you're taking things very literally and seem to think that MS has somehow trademarked the word "distinguished." If a school gives out "Academic Excellence Awards" to its students, it doesn't mean that only those students who get the award (I don't know what Mr. Shamgunov's title at MS was) can describe themselves as excellent students. As noted below, the fact that it's in lower case and precedes the words "senior database" makes it pretty clear that it's being used as a descriptor, not as an official title. Especially since, unless you're from MS, no one would even know about it. I see such literal interpretation of English from immigrants who come to the US as adults. Unsolicited advice: try reading fiction to get the hang of colloquial English :)


My grandfather wrote his own HTML website from scratch. Technically he could be called a "senior programmer", but without further context most people would assume that he is an experienced programmer, not just an elderly one. It's the same problem here; in the context of Microsoft job titles, "distinguished engineer" means the person is very highly paid and is at the top of their field. It's not incorrect to use that title, per se, but it can be misleading, which is why the commenter above brought it up. It also adds to the confusion when the entire title is in lowercase; if it was written as "distinguished Senior Database Engineer", it would not be as potentially misleading.

Anyway, the wording was changed in his bio so it's a moot point now.


But the key here is "most people": most people don't know that distinguished engineer is a title at MS and would just assume that he was a well-respected engineer at MS. It isn't fair to compare the dictionary definition of a word to a title structure that is fairly little known outside the company. There were three descriptors of the word "engineer": senior, database and distinguished. It turns out Mr. Shamgunov's title was Senior Engineer; I didn't read it as his literal title but more as a description of what he did at MS. Database wasn't in his official title either, but it still applies. Seems kinda weird that people are so touchy about titles, seems so unSV.


> in the context of Microsoft job titles, "distinguished engineer" means the person is very highly paid and is at top of their field.

How many people (even in tech) know that "Distinguished Engineer" is a specific, special title at Microsoft?


DE has meaning outside of Microsoft. I recognize it, and am not in Microsoft's orbit.

Generally it's one step above Senior Principal Engineer and considered an executive position.

That said, tech titles have not normalized as much as executive titles. The general progression I've witnessed is (with managerial level set):

- Engineer
- Senior Engineer (Team Leader)
- Principal Engineer (Manager)
- Senior Principal Engineer (Senior Manager)
- Distinguished Engineer (Director)
- C[TI]O (C-Level)

Oddly there's no level set to [AS]VP positions. Engineering tolerates less hierarchy.

Also DE (and all tech ladder positions) are a recognition of sphere of influence over technical acumen as follows:

- Engineer (own work)
- Senior Engineer (team)
- Principal Engineer (department)
- Senior Principal Engineer (corporation)
- Distinguished Engineer (industry sector)
- C[TI]O (corporation and broader industry)

YMMV


Probably 50-100,000 among their employees, ex-employees, aspiring employees, partners, contractors, and people who've seen it mentioned somewhere like this (including me and you, now).


Google, Oracle and Amazon, among others, have a Distinguished Engineer title, so quite a few people know about it.


I wonder if it's a good idea to list Comcast as one of your clients...


Sure, why not? It's a big company with a lot of brand recognition. The people most likely to be interested in this kind of product will first think 'oh, they have a big client!' before they think 'oh, they are used by a company I dislike because of anecdotal data or their stance on so-called net neutrality.'

Heck, I'd be even more interested if they told me they serve the NSA or Palantir, as much as I dislike everything they stand for.



Ohhh... interesting. Thanks for the link!


"Big companies choose to give us their money instead of to other companies."

What's the issue there?


A font with strokes 1px thick when you scale it to 200 percent? What a great design! I'm sure it looks so cool on the designer's retina display.

You don't want me to read your page? Fine, then I won't.

Edit: Hehe, now the contrast of my comment is about as bad as the contrast of the blog. I guess criticizing Silicon Valley's favorite design fad of the 2010s is too much for the HN crowd to stomach.


I feel your pain. Most of my monitors are 100 dpi. Most new computer screens sold are approx. 100 dpi. I guess these UX developers use the latest Macs with retina displays, and create eyesores for the rest of us.


I've been using the same ~85dpi Samsung SyncMaster monitor since 2007. I see no problem with the font.


Hmm. I wonder if there's something wrong with my config then. Does it not look like this [1] for you?

[1] https://imgur.com/PMBofq0


Nope, much darker: https://imgur.com/FsuVIot

That's from Chrome on OS X 10.9. Firefox and Safari seem identical.

I get something a little closer to yours in form from IE 11 in a Windows 7 VM, but still significantly darker: http://imgur.com/FzjsKJu

Firefox on Windows seems identical or darker.

The closest I can get is Chrome on Windows 7, which does a bad job with this font, though it's not exactly identical to yours: http://imgur.com/YwsxMhl

As I recall, Chrome does its own font rendering, and tries, on Windows, to match Windows' rendering, and does an even worse job of it than Windows does.


On my current system, Konqueror renders quite like the Chrome screenshot that I posted earlier. Firefox is much lighter.

Thanks to your samples, I think I can tweak knobs on my system to render better.

Edit: Yup, my infinality has a setting that it calls "MacIsh". Switching to that made the webpage a bit more bearable: https://imgur.com/na1gTwC


That is irony, your comment's contrast being the result of your complaint. :)

In general, I don't think webfonts are yet appropriate for body text. Headlines, maybe. But it seems to be a crapshoot whether your reader's UA will render them right or not.


> In general, I don't think webfonts are yet appropriate for body text.

Personally, I would agree with that, yes. For body text, I highly prefer a screen-optimized Arial or Georgia over any oh-so-trendy webfont. But I acknowledge that this is the fault of Windows' subpar font rendering. I understand it if designers who have access to an OS with better rendering, and probably even to a hi-res display, don't want to be stuck with "font-family: Helvetica, Arial, sans-serif;" for life.

But this is not the point that I was trying to make. Here, they chose a font-face/font-size combination that makes the strokes 0.5px thick. It just physically can't work. What I see on the screen is probably just the artifacts of subpixel hinting. It will look better or worse depending on the rendering engine, but it's still a terrible design to rely on the half-pixels.


I agree, the site is just not readable on my Windows machine. If one is going to use thin fonts, one should use them in titles, not in the body of the text. I'm really surprised at people telling you to "buy glasses or change computer screen"; this is typography 101, and this website (HN) often boasts about taking UI/UX seriously.

Designers, don't do fancy stuff unless you test it on a wide range of browsers and devices. This is a design fail.


That's just the 300-weight version of the Lato font, one of the most widely used fonts on the internet.

If you have trouble reading it, you may need a new monitor or to get your eyesight checked.


> you may need a new monitor

No, I'm actually perfectly fine with my Eizo. It's not its fault that it isn't able to show half-pixels. That's by design.

> the Lato font, one of the most widely used fonts on the internet

I seriously doubt that this is the case. Last time I stumbled across it was heartbleed.com, and I had to use the F12 console to de-Lato the page, that's why I remember. That was half a year ago, thankfully.

If you spend your day reading about cool parallax scrolling plug-ins for your site, then you might see a lot more Lato than I do, yeah.


> I seriously doubt that this is the case

http://www.google.com/fonts#Analytics:total shows it as the 5th most popular font at the moment (2,059,357,755 views for Lato in the last 7 days and 64+ billion in the past year).

And that doesn't even include the Typekit deployments (or locally hosted ones).


> 5th most popular font at the moment

So, Google Web Fonts now is the benchmark for professional screen typography?

None of the sites that I visit on a regular basis use Google Web Fonts. The only one using any webfont for body text is The Guardian, which uses an excellent custom font designed by Christian Schwartz and Paul Barnes.


> So, Google Web Fonts now is the benchmark for professional screen typography?

It's certainly a better benchmark than "none of the sites that I visit on a regular basis"...

And if hard numbers like 64 billion+ views per year don't convince you then I won't bother.


Yea because "sites you visit on a regular basis" is a wayyy better, and more objective metric. /s


It's certainly a more objective measure than what you've offered.


The problem is certainly the weight in this case. It's just too thin.


We nerds dwell more on the function, than the form. Would, say, Knuth's notes be less useful if they were written in Comic Sans? Sure, they'd be hard(er) to read, but the message would be the same.


Knuth is a remarkably odd choice of example for this, since the whole reason for the existence of TeX is his dislike of the form of the time.


Strictly speaking, yes, Knuth's writing would indeed be less useful in Comic Sans, as fewer people would read, and hence use, it.


The name of this makes me think of the line from the "MongoDB is web scale" video [1]: 'Is /dev/null web scale?'.

[1] https://www.youtube.com/watch?v=b2F-DItXtZs


Utterly stupid post from a company who should know better. The biggest reasons that companies buy expensive appliances is (a) data sovereignty and (b) ongoing support. The cloud, e.g. AWS, does not have decent answers for this yet.

And the companies that are buying these types of appliances like we do couldn't care less about a few hundred thousand dollars.


I think you are missing the point. They chose to deploy on AWS because the 'modestly priced' instances they are talking about are gigantically less powerful than Oracle's 'appliances'.

The article is about software, algorithms and modern database technology. Oracle has basically been flogging the same code base for twenty odd years, and claiming they are competitive by running their ancient, single node system on more powerful nodes. It is pure BS. Using a modern, distributed, in-memory store like MemSQL on crap nodes (AWS), still beats the pants off Oracle's ancient single node POS running on a super-computer.


> The biggest reasons that companies buy expensive appliances is (a) data sovereignty and (b) ongoing support

Don't underestimate reason (c): because some C-levels went golfing together.


That's actually more likely to be reason (a).


From TFA:

    ... $5/hr on Amazon or less than $20k if you wanted to buy 6 servers outright.

I actually read the article...


> The biggest reasons that companies buy expensive appliances is (a) data sovereignty and (b) ongoing support. The cloud, e.g. AWS, does not have decent answers for this yet.

But they state that clients could buy 6 machines outright for just $20K if they didn't want to trust their data to the AWS cloud. So while you can be negative on the cloud, their solution seems pretty reasonable (cost and speed wise) for the use case presented -- I don't think it is "utterly stupid."


In terms of (a), why not build/host the cluster on your own gear?

For (b), how do you earn that trust other than continuing to try taking on customers and... well, supporting them?


Larry,

I thought you stepped down?



