1.1B Taxi Rides on Kdb+/q and 4 Xeon Phi CPUs

buckie · on Jan 25, 2017

In my experience, kdb+'s k and q (which is broadly speaking a legibility wrapper around k, which again broadly speaking is APL without unicode) are phenomenally fast for dense time series datasets. Though they can struggle (relatively, still pretty fast) with sparse data that's not really what they are built for. They were built for high-performance trading systems, and trading data is dense.

If you like writing dense, clever regexs (which I do) then you'll love k & q. The amount that you can get done with just a few characters is unparalleled.

Which leads to, IMHO, their main drawback: k/q (like clever regexes) are often write-only code. Picking up another's codebase or even your own after some time has passed can be very hard/impossible because of how mindbendingly dense with logic the code it. Even if they were the best choice for a given domain, I'd try to steer clear of using them for anything other then exploratory work that doesn't need to be maintained.

RodgerTheGreat · on Jan 25, 2017

I program in K professionally and help maintain a fairly large codebase. It's an unusual-looking language and it takes practice to learn to "skim" K code in the way that most programmers are used to for their favorite curly-bracket language, but it can be learned.

The biggest messes I have to clean up come less from "clever" code than they do from people who try to program in K as if it were some other language. For example, somebody fond of for loops might write the following to apply a function 'f' to pairings of values and their index in a list:

    result:()
    i:0
    do[#v
      r:r,,f[v[i];i]
      i:i+1
    ]

Ugly, complicated, but "close at hand". There is of course a much nicer and more idiomatic way to do the same thing:

    result: f'[v;!#v]

Most of the time conciseness isn't the goal, but you get it as a side effect of writing good code and working "with the grain" of the language.

shitgoose · on Jan 25, 2017

K, not Q? nice! if you don't mind me asking, where? i am curious who is using K these days.

RodgerTheGreat · on Jan 25, 2017

I work for an NYC-based company called 1010data which specializes in business analytics products. Most of our backend infrastructure is written in K3. And if anyone's curious, yes, we're hiring.

posnet · on Jan 25, 2017

If you don't mind me asking, how does someone start with Q/Kdb+?

Can you learn it with just the free version or realistically does it require working with it in a professional setting and learning from others who already use it.

Are there any sources you recommend for a someone just starting out?

Thanks.

branchless · on Jan 26, 2017

You can get a kdb/q cookbook here:

http://code.kx.com/wiki/Cookbook

This includes a tutorial with a smartmeter concept which has several queries of increasing complexity:

http://code.kx.com/wsvn/code/cookbook_code/tutorial/

Step through this, look at the function definitions, cross-ref with the kdb reference page. Also read Q for mortals on their site:

http://code.kx.com/wiki/Tutorials

Yes it's not as accessible as other langs with loads of books, but on the plus side it's coherent, unlike c++.

dinnerPlaht · on Jan 26, 2017

I interviewed at the NYC office a few months ago and had a good experience. The people I met were smart, friendly, and respected my time.

elemeno · on Jan 25, 2017

Bank of America uses it to quite an extent - especially in eFX.

jxy · on Jan 25, 2017

   > their main drawback: k/q (like clever regexes) are often write-only code

This depends on the code reader's mentality. One line of k (or q or j or apl) would do what 10 lines of verbose languages do. For verbose languages, your expectation is spending 1 minute for 10 lines of code to fully understand it; but for terse languages, you need to change your expectation to spending 1 minute for only 1 line of code. You are not going to understand anything if you still want to spend 6 seconds per line.

On the other hand, proficiency is important. You read English articles in a slow pace in your first grade; you are not going to be a speed reader without practice, even if English is the only language you speak. No one would expect you to speed read Japanese right after you can compose simple Japanese sentences.

gwu78 · on Jan 26, 2017

"If you like writing..."

For me it is the opposite. I like not having to type any more than necessary. I like writing as little code as possible.

I also like being able to read through a program listing without having to wade through page upon page of long, camel case function names, and trying to follow five levels of indentation.

Not because I think that is the "wrong" approach for everyone but because I personally struggle with overcoming language and documentation verbosity in order to make any progress. It is the wrong approach for me. I wonder if perhaps this is how it feels for the typical programmer, in the OP's comment, who might struggle with a lack of verbosity.

Sometimes I streamedit (search and replace) blocks of code to remove indentation and shorten long function names to two letter labels, just so I can read without distraction.

What are the chances of finding someone who has the same preference for terseness and who is as skilled as Arthur Whitney?

q.k is about 240 lines in the version I am using. Most of the functions in .q, .Q and the other namespaces fit on a single line. This is a thing of beauty, IMO. For me, this makes studying the language manageable.

Someone in this thread posted a fizzbuzz solution last week that I thought illustrated how one can "work outward", incrementally adding primitives to the left or right to keep modifying output until the desired result is reached.

I use kdb+ daily but I'm a slow learner and still not very good with composing programs from scratch in k.

What little I have written was also done by "working outward", so the fizzbuzz example gave me some hope maybe I'm not too far off track. Thank you for that example. I hope we see some more.

I am thankful k exists, even if the intepreter is not open source and there's no BSD port.

As someone else commented somewhere, perhaps the most awkward aspect of k is getting it to interface with anything else. I think this is what keeps me from using it more frequently for more daily tasks.

But this may be more of a problem with everything else, and not necessarily with k.

I wish a more competent C programmer than I would write a lightweight replacement using linenoise or libedit to substitute for "rlwrap".

de_Selby · on Jan 25, 2017

That's more on you than it is the language though. You can write obtuse write-only code in any language.

I will concede that there is a culture of trying to be a bit too clever on the k4 mailing list but it's perfectly possible to write maintainable code in kdb+

Cyph0n · on Jan 25, 2017

But from what I recall, the syntax is terse by design - this is not inherently bad, though. In other mainstream languages, you have to go out of your way to write obtuse code (e.g, code golf).

I'm guessing that best way to address this issue is through liberal use of explanatory comments.

de_Selby · on Jan 25, 2017

Terse syntax isn't an issue, you just need to get used to reading it. Using 1 character variable names is another matter though.

There is no reason not to use camel cased variable names and indent functions, if/else blocks etc, and when written this way the code can be perfectly legible even to non q programmers.

Something else that leads people into the write-only trap is that the usual way of working with the language is to use the REPL loop while working, where you can tend to be doing multiple things on 1 line. It's just laziness not to reformat and clean up the code afterwards though.

geocar · on Jan 25, 2017

> There is no reason not to use camel cased variable names and indent functions,

There is: it makes the program bigger. Program source code length is significant, and if you have more lines, you have more opportunities for bugs.

nl · on Jan 26, 2017

Is this supposed to be a joke? Or maybe related to some weird coding environment?

Long variable names may take more characters, but there is no way they increase bugs.

Indenting(!) doesn't even take many more characters if you use tabs...

geocar · on Jan 26, 2017

> Is this supposed to be a joke? … there is no way they increase bugs.

No, it's not supposed to be a joke.

Steve McConnell 1993 observed density is proportional to source program length, so this should be obvious to every programmer: If a small program (as measured in source code bytes) is more likely to be a correct program, then this follows.

A major issue with discussing programming is the sheer number of people who believe they know how to program, when any non-programmer could see quite obviously that they don't: A professional bridge-builder doesn't often fail to build a bridge, but a professional CMS programmer seems to unable to get much past hello world without a bug or two; Less code will therefore produce less bugs.

nl · on Jan 26, 2017

You are completely misrepresenting what McConnell says.

There is a summary of his advice here[1], but just to highlight

Describe everything the routine does And we mean literally everything. If that makes the name ridiculously long, the name isn't the problem. Your routine is.

And

Make names as long as necessary According to McConnell, the optimum name length for a variable is 9 to 15 characters; routines tend to be more complex and therefore deserve longer names. Make your names as long as they need to be in order to make them understandable.

I've read McConnell, and your claims are so completely the opposite of what he recommends that I'm still unconvinced you aren't trolling.

[1] https://blog.codinghorror.com/i-shall-call-it-somethingmanag...

geocar · on Jan 26, 2017

I don't agree with everything McConnell says, but if you have it handy, note at page 173:

"A study from Basili and Perricone found that routine size was inversely correlated with errors: as the size of routines increased (up to 200 lines of code), the number of errors per line decreased (Basili and Perricone 1984).

And the conclusion (on page 174): …None of the studies that reported decreased cost, decreased error rates, or both with larger routines distinguished among sizes larger than 200 lines, and you're bound to run into an upper limit of understandability as you pass 200 lines of code

With that in mind, consider that KDB's SQL92 interface is 35 lines (the parser is 14 lines). It may be an extreme example of what McConnell is observing, and yet was himself unable to learn from.

> I'm still unconvinced you aren't trolling.

Look at it this way: Here is software that is faster than what you can write (or maybe you want to write your own taxi problem implementation), and if you don't try hard, you will miss out in finding out how to do something that you can't do.

That downvote button is so easy, that internet person is just a troll, I've been programming for ages, so of course I know what I'm talking about, but if you look through my comment history, you might find it easier to convince yourself at least that I believe that program length matters and permit yourself a discussion about it. You might learn something.

* https://news.ycombinator.com/item?id=8476294

* https://news.ycombinator.com/item?id=8477064

* https://news.ycombinator.com/item?id=10872209

and so on.

nl · on Jan 27, 2017

Ok, I accept you aren't trolling :)

I find it difficult to agree that using non-representative names for variables or functions improves understadability.

Notably, using something like x to represent a meaningful value means the brain has to hold the mapping between the two, which will decrease the number of useful pieces of information kept in short term memory[1].

The brain doesn't keep track of the number of characters in a variable name.

[1] http://www.psych.utoronto.ca/users/peterson/psy430s2001/Mill...

geocar · on Jan 27, 2017

So icsa[1] said something interesting on this point:

The issue, for me, is not readability but context. Even a word or two (e.g. cuts, begins, dims) helps greatly to establish the context of the code. Like having a map before hiking.

A comment can clearly provide that context without making the variable names long.

[1]: https://news.ycombinator.com/item?id=8747314

> The brain doesn't keep track of the number of characters in a variable name.

Surely you must appreciate that if we use too many characters the window will scroll?

shitgoose · on Jan 27, 2017

long names are distracting. try using short names for variables where code requires scrolling and very short name for variables in short blocks of code (where you see the context without scrolling). something like 'idxLS' instead of 'indexOfLastSlash'. Or x+=totalsMst[i] instead of accumulatorOfTotalValues+=totalValuesFromMasterList[i] if this is the only line in a loop.

as your opponent has mentioned, context is the key. you operate with objects in your brain, and the faster the transition from code element to the brain object, the better you understand the code. long names make this lookup unnecessarily difficult.

ams6110 · on Jan 26, 2017

Mathematical formulas don't use long variable names. They use x, y, and various greek letters.

nl · on Jan 26, 2017

This is generally (amongst mathematicians) considered to be a problem, not a good thing (Source: work with university Math departments).

Edit, see http://mathoverflow.net/questions/8295/origins-of-mathematic... for the confusion this causes..

maemre · on Jan 26, 2017

That might be because:

- you need to write same variable name on each step of derivation of a formula

- paper and blackboards don't have autocomplete

- juxtaposition is used for multiplication, thus SwapBlue can be Swap times Blue or a single variable called SwapBlue, spacing for multiplication is not a good solution when writing to board/paper

Locke1689 · on Jan 26, 2017

And despite lambda calculus being written entirely using single letter variable names, Scheme programmers all use proper word-size variable names in their programs.

Mathematics notation is so notoriously awful that mathematicians often can't even read equations from other disciplines without significant extra context.

valarauca1 · on Jan 25, 2017

Bugs don't hide in variable names or white space.

Compiler errors do. But you catch those first run.

If tab vs 2 spaces vs 3 spaces can induce a logic error that is a purposely obtuse language.

geocar · on Jan 26, 2017

> Bugs don't hide in variable names or white space.

Bugs do hide in variable names and whitespace.

    thisIsALongVariableName = True
    thisIsaLongVariableName = True

is easy to spot when they're side-by-side, but not so much when they're two pages away from each other. Python's tabs and spaces can confuse the indention in cases so that code that "looks" aligned actually isn't.

If you haven't run into a problem caused by a typo, you haven't been programming long enough.

wott · on Jan 26, 2017

This is indeed a problem with languages without mandatory variable declaration. Is it the case of K, Q or whatever its name is?

geocar · on Jan 26, 2017

> This is indeed a problem with languages without mandatory variable declaration.

Typos affect everyone. It is not a function of the language that causes you to mistype. It is that the program is long and you cannot see the whole thing that causes you to miss it.

> Is it the case of K, Q or whatever its name is?

{foo}[] generates an error because foo is unbound. What's the difference between {foox} and {fooy}? They both might be variables, and yet it is only by reading the program that you can tell which one is correct.

dx034 · on Jan 26, 2017

> Typos affect everyone. It is not a function of the language that causes you to mistype. It is that the program is long and you cannot see the whole thing that causes you to miss it.

In any language with mandatory variable declaration the code wouldn't even compile. And any decent IDE would even flag this while writing the code.

geocar · on Jan 26, 2017

> In any language with mandatory variable declaration the code wouldn't even compile.

Nonsense. memmove and memcpy are both defined, and if you choose the wrong one your program compiles and runs, but it simply produces the wrong result.

> And any decent IDE would even flag this while writing the code.

There is no IDE that will flag this. Advanced whole-program static analysis has a hard time with this problem and I'm not aware of any general solutions.

buckie · on Jan 25, 2017

While I agree that it's possible to write maintainable in kdb+, I'd argue that it takes noticeably more effort; or, as in inverse, if you drop your coding standards things get worse faster. I'd say the same for regex as well (though in a more limited sense).

The "characters to clever/unmaintainable" ratios you achieve with k (which most kdb+ platforms end up reaching for at some point) are almost unparalleled. A famous example of how awesome/powerful/ridiculous k can be is the "4 lines of K" text editor: http://www.kparc.com/$/edit.k

I guess my point is that letting your guard down, for even a single line, can be orders of magnitude more trouble than it would be in most languages and for that reason I wouldn't base a new stack on it. Also, the developers for it are rare-ish and expensive + it has a serious learning curve for those unfamiliar with FP or Lisp.

jimbokun · on Jan 25, 2017

Not knowing k, seems like the worst part of the edit.k examples is every variable has a one character name.

Or do those single character tokens have special meanings in k? Which would be even worse.

geocar · on Jan 25, 2017

> Not knowing k, seems like the worst part of the edit.k examples is every variable has a one character name. Or do those single character tokens have special meanings in k? Which would be even worse.

It can seem that way, but there is value in dense programs: They take up less room on the screen. This means you can see where you repeat yourself, and can often find bugs by just reading your code.

I can't count the number of times I scrolled over some code and missed a bug.

hueving · on Jan 25, 2017

Change your scrolling settings?

geocar · on Jan 25, 2017

Not sure what you mean by that.

anonu · on Jan 25, 2017

Yes, the terseness of the language itself lends itself to terseness everywhere - even in variable namings. It's not necessary and I usually avoid that when I write k or q. Purposefully obfuscating code like that lends to the "bad rep" that the language gets.

branchless · on Jan 26, 2017

The main problem with Q is the users. Most users are math grads first, programmers second. They don't use version control. They don't comment their code.

Yes Q is dense but it can be annotated with comments.

I also have to maintain some java code written by this lot and it's nearly as unintelligible.

mtanski · on Jan 25, 2017

I've build columnar OLAP databases and database engines in C++ for work. Now I'm doing it in my free time. Based on my experience the Phi and it's architecture is very exciting for OLAP databases workloads.

Reasons:

- Even in a OLAP database you end up with quite a few places that have very branchy code. Research on GPU friendly algorithms on things like (complex) JOINS and GROUP BY is pretty new. Additionally complex queries will functions and operations that you might not have a good GPU implementation for (like regex matching)

- Compression. You can use input data that compressed in anyway that there is a x86_64 library for. So you can now use LZ4, ZHUFF, GZIP, XZ. You can have 70+ independent threads decompressing input data (it's OLAP so it's pre-partitioned anyways). (Technically branching, again)

- Indexing techniques that cannot efficient implemented on the GPU can be used again. (Again branching)

- If you handle your own processing scheduling well, you will end up with near optimal IO / memory pattern (make sure to schedule the work on the core with local memory) and you not bound PCIe speed of the GPU. With enough PCIe lanes and lots of SSD drives you process as near memory speeds (esp. when we'll have Xpoint memory)

So the bottom line is if can intelligently farm out work in correct size chunks (it's OLAP so it's prob partitioned anyways) the the Phi is fantastic processor.

I'm primarily talking about the bootable package with Omni-Path interconnect (for multiple).

anonu · on Jan 25, 2017

I've been using kdb+/q for a long time (7 years now) - and can attest to its speed. Objects are placed in memory with the intention that you will run vectorized operations over them. So both the memory-model and the language are designed to work together.

Lots of people complain about the conciseness of the language and that it is "write-once" code. I tend to disagree. While it might take a while to understand code you didn't write (or even code you wrote a while ago), focusing on writing in q rather than the terser k can improve readability tremendously.

My only wish is that someone would write a free/open-source 64-bit interpreter for q - with similar performance and speed to the closed version. Kona (for k) gets close https://github.com/kevinlawler/kona

srpeck · on Jan 25, 2017

Some other k-inspired languages to have a look at:

- https://github.com/johnearnest/ok

- https://github.com/zholos/kuc

- https://github.com/ngn/k

- http://t3x.org/klong/

- https://github.com/tlack/xxl

gricardo99 · on Jan 25, 2017

Somewhat related is Kerf, also by Kevin Lawler.

http://www.kerfsoftware.com/

It seems more like a kdb+ competitor than open-source alternative, and isn't using q.

svan99 · on Jan 25, 2017

Nice writeup. If you are interested in learning KDB/Q, please take a look at this book: http://code.kx.com/mkdocs/qformortals3/

picodoc · on Jan 25, 2017

or for a really quick high level overview you can use this: https://learnxinyminutes.com/docs/kdb+/

it's a really beautiful little language once you get into it :-)

nnx · on Jan 25, 2017

I absolutely love this blog series. Can't wait to read what's next :)

First time I noticed (mention of) recap at http://tech.marksblogg.com/benchmarks.html

qume · on Jan 25, 2017

If he made the layout a bit uglier, and made the language more esoteric and generally difficult to understand, this would make a fantastic academic paper.

But seriously, what a wonderful world it would be if all papers were this well written.

chiph · on Jan 25, 2017

Under a second to do an avg across 1.1 billion rows spread over four machines. That's pretty amazing.

jxy · on Jan 25, 2017

For a columnar database, that's a continuous chunk of memory. Assuming 32bit q defaults to 32bit int, 1.1 billion integers across four machines means each 64-core (with 4 threads/core) KNL chip is averaging over 275M elements of int array, or 1.1M 32bit int operations per thread. Now think again whether that's amazing or not.

shaklee3 · on Jan 25, 2017

You're not accounting for the memory bandwidth at all. Yes, that's still amazing. Try doing that in opentsdb.

throwawayish · on Jan 26, 2017

~4.4 GB in 150 ms are just about 30 GB/s.

obl · on Jan 25, 2017

is it ? those things are trivial enough to be entirely bandwidth limited. total_amount is 4 byte, passenger_count is 1 and those are tightly packed in a column layout. streaming through that in 150ms is almost within the reach of a single normal chip with dual channel DDR3 ram.

Of course, not quite, and that's discounting the (small) sync overhead but still, no need to shell out 4 big servers, overpriced phi chips and fancy wide bus memory.

mmcclellan · on Jan 25, 2017

This was a good idea for a test. I'll definitely check out the author's other stuff. Commenting briefly on cost: while the article mentions the free 32 bit version early on, the actual benchmarks were done using the commercial version. I've had the impression the comercial version was cost prohibitive for us poor folks. For those interested in experimenting with Xeon Phi though, it looks like you can get started for ~$5k: http://dap.xeonphi.com/

Twirrim · on Jan 25, 2017

If you just want to meddle with a Xeon Phi, you can get some as cheap as $300: https://www.amazon.com/Intel-BC31S1P-Xeon-31S1P-Coprocessor/... , though that's from the 3100 rather than 7200 family, and so won't perform as fast. That kind of money puts it more into the hobby territory, though.

protomyth · on Jan 25, 2017

Is there a difference in how you program the different Xeon Phi families?

mtanski · on Jan 25, 2017

With the new generation there is now a difference. This is because this generation has Phis in both addin card (PCIe) and a bootable CPU package (when you run Linux or windows or whatever on the Phi).

Generally with the PCIe one you're running something like OpenCL and with system CPU package you run threads and processes like you normally would.

Technically you could run software directly on the old addin cards since they boot to Linux but you had handle the distribution, running and communication of your software with the host. (you could run any x86_64 binary)

VodkaHaze · on Jan 26, 2017

Do you still need the costly intel compiler suite to run some C++ code on it?

Honestly they make it pretty hard to hack with as a device. It could succeed as a device if they let devs easily create cool applications with it.

mtanski · on Jan 26, 2017

Both gcc & llvm can compile binaries targeting the Phi. Since it runs in Linux (as addin PCIe card or host Linux CPU) it can run any kind of ELF x86_64 binary. They also support generating AVX-512 instruction of the Phi.

WhitneyLand · on Jan 25, 2017

I don't see how these results provide much useful information in terms of being able to say x is faster than y.

The hardware doesn't seem consistent across different benchmarks. He says it's fast for a "cpu system", but for practical purposes Phi competes more with GPGPUs.

Would this be just as fast with one redis system with 512GB ram? I don't know too many apples to oranges here.

sologub · on Jan 25, 2017

The author isn't testing just the software but combinations of various software/hardware systems, including for example PostgreSQL on an i5 CPU, 16GB RAM and 850 SSD hardware: http://tech.marksblogg.com/billion-nyc-taxi-rides-postgresql...

WhitneyLand · on Jan 25, 2017

Great, and btw I think it's cool and I enjoyed reading it.

But what useful conclusions can be drawn from it?

LeifCarrotson · on Jan 25, 2017

All conclusions are only valid for similar workloads, but each of MapD and GPUs, Q/kdb+ and Xeon Phi, Redshift, Athena, Big Query, Presto, and Elasticsearch claim to be fast, inexpensive, easy to work with, and otherwise great for Big Data. Which ones really are fast? How fast is fast? How much is this going to cost? Do I need 5 nodes or 50?

A few examples of some useful conclusions:

- Just because a relatively well-optimized PostgreSQL database on a regular workstation takes 5 minutes to run a query doesn't mean you can't get special hardware to run that query faster than you can type.

- Spark + S3 + Amazon Elastic Map Reduce look like an ideal tool to work with large data, but they're pretty slow compared to better tools, and even compared to plain PostgreSQL.

- HDFS really is a lot faster than S3.

- Performance of an Xeon Phi 64-core CPU is within an order of magnitude to an NVidia Titan X.

- Loading 104 GB of compressed data into Q/kdb+ expands to 125 GB with and takes about 30 minutes, but on Redshift expands to 2 TB and takes many hours to upload on a normal connection, plus 4 hours to actually import!

- It might cost $5000 to custom-build a GPU-based supercomputer that can do these queries in under a second, but you can run similar queries if you're willing to wait for 5 minutes each by spinning up instances for a few dollars an hour plus a few more dollars an hour for storage, or by just running PostgreSQL on your workstation.

Also, not a conclusion, but it's incredibly useful to have a simple example exactly how to configure the tool and import some CSV data

WhitneyLand · on Jan 25, 2017

These conclusions don't seem very useful because either they are already well established or are not valid. Some examples:

Just because a relatively well-optimized PostgreSQL database on a regular workstation takes 5 minutes to run a query doesn't mean you can't get special hardware to run that query faster than you can type.

Already well established for years with systems like redis, and more recently with gpu databases, and other techniques posted on HN regularly.

Spark + S3 + Amazon Elastic Map Reduce...is pretty slow compared to better tools, and even compared to plain PostgreSQL.

Not valid because it doesn't generalize. It so much depends on type of work being done, system architecture, etc, that you can only say it may or may not be true.

HDFS really is a lot faster than S3.

This is already well established, Amazon states aa much right in the docs: http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-pl...

Performance of an Xeon Phi 64-core CPU is within an order of magnitude to an NVidia Titan X.

Not precise enough to matter because getting within 10x difference is not close to being competitive.*

Loading 104 GB of compressed data into Q/kdb+ expands to 125 GB with and takes about 30 minutes, but on Redshift expands to 2 TB and takes many hours to upload on a normal connection, plus 4 hours to actually import!

I don't see how it's possible for 104GB of csv text data to decompress into only 125GB. For cvs to compress only ~20%...doesn't make sense.

It might cost $5000 to custom-build a GPU-based supercomputer that can do these queries in under a second

No, two problems here. The hardware in question could have used 1 cheap CPU instead of two expensive Xeons and been much less expensive. Bigger problem: The MapD software itself will be $50,000.

LeifCarrotson · on Jan 25, 2017

The speed comparisons may be well known to you, but as someone only really using trivial desktop app SQLite databases, they weren't known to me. Thanks for pointing out my errors!

> I don't see how it's possible for 104GB of csv text data to decompress into only 125GB. For cvs to compress only ~20%...doesn't make sense.

The CSV file itself is around 500 GB. The internal representation, which might use binary formats for numbers, or compress text, uses 125 GB. Redshift expands it to 2TB for all the indexing and mapping.

> Bigger problem: The MapD software itself will be $50,000.

Ouch. That's a rather large oversight. Is the author affiliated with MapD, perhaps?

juliangoldsmith · on Jan 25, 2017

>Just because a relatively well-optimized PostgreSQL database on a regular workstation takes 5 minutes to run a query doesn't mean you can't get special hardware to run that query faster than you can type.

This is actually my biggest complaint with the article. He used cstore_fdw with Postgres, which doesn't allow much real indexing, and as far as I can tell (knowing only a little bit about it) he didn't really use any of the benefits of cstore_fdw.

I'd be interested to see how plain Postgres, possibly on a compressed filesystem, with properly-indexed tables stacks up.

sologub · on Jan 25, 2017

At least that Xeon Phi is a great piece of engineering! :)

lorenzhs · on Jan 25, 2017

A single machine doesn't have the required memory bandwidth to do this in the same time.

WhitneyLand · on Jan 25, 2017

Why not? Sounds like the initial data load and indexing are done up front. Once you get past that to run the benchmark it's not clear that a quad channel ddr4 system would be saturated.

lorenzhs · on Jan 25, 2017

A current-generation Xeon E7 has a memory bandwidth of up to 102 GB/s. The article says 90, which is probably a realistic estimate of achievable throughput. But the data is 500GB. So the problem is not with loading it from disk (that can be done beforehand), but getting it from CPU to RAM fast enough.

throwawayish · on Jan 26, 2017

Inaccurate; the very point of columnar stores is that if I'm only interested in a column I only expend the memory bandwidth required for that very column and nothing else. Hence typical queries to a columnar store would never stream the whole 500 GB for processing.

lorenzhs · on Jan 26, 2017

Right, a columnar store would touch only a couple (1.1bn * three numerical columns) of those 500 GB. But the question was about Redis :)

The author has an overview of the benchmarks with various systems at http://tech.marksblogg.com/benchmarks.html – but one can't just compare rows as many factors vary (esp. hardware).

mattnewton · on Jan 25, 2017

This is the comparison to the titan X / mapD article I was looking for. Still looks like the gpu is very competitive.

Sort of meta, but Mark's job seems awesome. Gets all these toys and writes about configuring them. (The actual configuring is probably a pain but still)

1024core · on Jan 26, 2017

    % cat startmaster.q

    k).Q.p:{$[~#.Q.D;.Q.p2[x;`:.]':y;(,/(,/.Q.p2[x]'/':)':(#.z.pd;0N)#.Q.P[i](;)'y)@<,/

Looks like line noise... :D

gravypod · on Jan 25, 2017

At what point will CPUs out parallelize GPUs and will we be able to move vidoe rendering back onto the CPU?

I see that as being something I'd very much like.

dunkelheit · on Jan 25, 2017

Pretty wide array of technologies covered in these benchmarks. I wonder how ClickHouse will fare, should be very competitive.

wyldfire · on Jan 25, 2017

How does Phi's MCDRAM compare to GDDR5 (wrt throughput)?

loeg · on Jan 25, 2017

According to wikipedia, the fastest GDDR5 can do 256 Gbit per chip[0]. I don't know how many chips are typically used. MCDRAM in the article does 400 GB/s, or 3,200 Gb/s. That would require 12.5 of those GDDR5 chips, assuming they scale linearly.

[0]: https://en.wikipedia.org/wiki/GDDR5_SDRAM#Commercial_impleme...

throwawayish · on Jan 26, 2017

Speed and size are somewhat independent. Speed is set by the bus frequency and how many chips you use - every chip has a fixed data bus width, typically 32 bits, while overall size is then how many chips you have times how big each chip is.

Unlike typical computer memory architectures, where the memory bus connects multiple chips or modules to one controller, GDDR doesn't do that; every slice of the memory controller only speaks with a single chip, strictly point-to-point. (Reducing bus load and layout issues and thus allowing higher clock rates).

That's why, with GPUs, it's usually sufficient to say how wide the bus is (often 64 - 128 - 256 - 384 - 512 bits) to get a rough idea of it's performance, since memory clock frequencies occupy a rather narrow range. (However, narrow-bus, lower-end GPUs often don't use the same technology as higher-end GPUs, eg. DDR3 instead of GDDR5)

shaklee3 · on Jan 25, 2017

GDDR5 can typically do 240GB/s access time on a typical GPU, and there are multiple chips on many cards (Tesla K80). The newer cards use HBM2 and can do 732GB/s (http://www.nvidia.com/object/tesla-p100.html).

AlphaSite · on Jan 25, 2017

HBM2 is 1024GB/s (256 per stack).

shaklee3 · on Jan 26, 2017

Kind of. Nvidia lowered the voltage on their P100 so it does not hit those rates. Theoretically it can go that high, but the power draw was too large. Next gen we'll likely see that.

astrodust · on Jan 25, 2017

1TB/s is pretty nuts by today's standards but I bet it'll elicit a yawn in ten years time. Amazing indeed.

pvitz · on Jan 25, 2017

Has somebody here experience with Jd and could comment on the status or the performance? Thanks!

eggy · on Jan 26, 2017

The interpreted Jd is fast, but you need the compiled, commercial license Jd, or you used to, for the speed test.

I love J compared with K, but that is because I found it first, and the differences between J and K are minimal, but a different enough to keep me using J.

scottlocklin · on Jan 26, 2017

I think jd is mostly plain old interpreted J. There are a few shared libraries to add functionality to core J, but it's mostly just J. You can probably get a non-commercial license if you ask nicely.

sndean · on Jan 26, 2017

> You can probably get a non-commercial license if you ask nicely.

This was my experience. I got a nice email from Eric Iverson in response along with the activation key.

g0xA52A2A · on Jan 25, 2017

Nice to see some KNL usage outside the traditional large HPC centers :D

nextos · on Jan 25, 2017

Yes, I remained a bit skeptical, but it seems to be taking off after latest iteration.

stuntprogrammer · on Jan 25, 2017

If the combination of such languages, high-performance hardware, and large scale compute problems is interesting.. the startup I work for in Mountain View is hiring...

hpcjoe · on Jan 26, 2017

:D Hope things are well by you!

stuntprogrammer · on Jan 26, 2017

And you too sir!

andrewstuart2 · on Jan 26, 2017

I get the distinct feeling this is not the usual price for a Xeon Phi. Still, might keep an eye to see if it comes back into stock.

https://www.walmart.com/ip/INTEL-SERVER-CPU-SC7120P-XEON-PHI...

tmostak · on Jan 26, 2017

It looks like year is extracted from pickup_datetime at ETL, so hence its not a fair comparison against the other databases that do this at runtime in Q3 and Q4. In something like MapD Q3 would be nearly as fast as Q1 (~20ms) without the extract function, which involves relatively complicated math.

thedarkknight0 · on Jan 25, 2017

Wow, they are some incredibly impressive numbers. Great write up. HT to Mark

smulh76 · on Jan 25, 2017

Unbelievably fast! Interesting blog tbh.

mrcactu5 · on Jan 25, 2017

for reference the city is New York City taxi rides from 2009-2015 and there's about 500GB of metadata

http://tech.marksblogg.com/billion-nyc-taxi-rides-redshift.h...

_pw2q · on Jan 25, 2017

Astonishing speed given the scale...

vegabook · on Jan 25, 2017

All lots of fun, but kdb has an eye watering cost of 200k dollars per year per server.

Here's hoping some combo of Apache Arrow (also cache aware, much more language stack flexibilty), Aerospike (lua built in), Impala, and others, can finally take on this overpriced product, which has had a lack of serious competitors for 20 years, owing to its (price inelastic) finance client base.

kpierre · on Jan 25, 2017

> Apache Arrow (also cache aware, much more cross platform)

kdb+ is available for raspberry pi, is that cross platform enough?

https://kx.com/2016/06/08/kx-releases-raspberry-pi-build-wit...

vegabook · on Jan 25, 2017

32 bit. Please be serious. You know full well that for all non-toy work kdb is exhorbitantly expensive.

bladecatcher · on Jan 25, 2017

Are you sure about that? I think it's significantly lower than 200k

atemerev · on Jan 25, 2017

The binary file of the database (all-inclusive and statically linked) is around 300 kilobytes (!), which makes is probably the most expensive (non-custom) software per kilobyte.

geocar · on Jan 25, 2017

kdb is not statically linked.

edit:

    $ file q/l64/q
    q/l64/q: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18,