These conclusions don't seem very useful, because they are either already well established or not valid. Some examples:

> Just because a relatively well-optimized PostgreSQL database on a regular workstation takes 5 minutes to run a query doesn't mean you can't get special hardware to run that query faster than you can type.

Already well established for years with systems like Redis, more recently with GPU databases, and with other techniques posted on HN regularly.

> Spark + S3 + Amazon Elastic Map Reduce...is pretty slow compared to better tools, and even compared to plain PostgreSQL.

Not valid because it doesn't generalize. It depends so much on the type of work being done, the system architecture, etc., that you can only say it may or may not be true.

> HDFS really is a lot faster than S3.

This is already well established; Amazon says as much right in the docs: http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-pl...
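
For anyone unfamiliar with how that difference shows up in practice: on EMR it largely comes down to the URI scheme you hand to Spark, so it's easy to measure both backends against the same data. A rough PySpark sketch, assuming an EMR cluster with the data already present in both places (the bucket and paths are hypothetical placeholders; outside EMR you would typically use the s3a:// scheme):

    import time

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-vs-s3").getOrCreate()

    def timed_count(path, label):
        """Read the CSV data at `path` and count rows, reporting wall-clock time."""
        start = time.time()
        n = spark.read.option("header", "true").csv(path).count()
        print(f"{label}: {n} rows in {time.time() - start:.1f}s")

    # Same dataset, two storage backends; only the URI scheme differs.
    timed_count("s3://example-bucket/trips/", "S3")
    timed_count("hdfs:///data/trips/", "HDFS")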

> Performance of a 64-core Xeon Phi CPU is within an order of magnitude of an NVidia Titan X.

Not precise enough to matter, because being within a 10x difference is not close to being competitive.

> Loading 104 GB of compressed data into Q/kdb+ expands to 125 GB and takes about 30 minutes, but on Redshift it expands to 2 TB and takes many hours to upload on a normal connection, plus 4 hours to actually import!

I don't see how it's possible for 104 GB of CSV text data to decompress into only 125 GB. For CSV to compress by only ~20% doesn't make sense.

> It might cost $5000 to custom-build a GPU-based supercomputer that can do these queries in under a second

No, there are two problems here. The hardware in question could have used one cheap CPU instead of two expensive Xeons and been much cheaper. Bigger problem: The MapD software itself will be $50,000.


The speed comparisons may be well known to you, but as someone who only really uses trivial desktop-app SQLite databases, they weren't known to me. Thanks for pointing out my errors!

> I don't see how it's possible for 104 GB of CSV text data to decompress into only 125 GB. For CSV to compress by only ~20% doesn't make sense.

The 104 GB is the compressed download; the CSV file itself is around 500 GB. The internal representation, which might use binary formats for numbers or compress text, uses 125 GB. Redshift expands it to 2 TB for all the indexing and mapping.
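
To make the arithmetic concrete, here is a back-of-the-envelope check using only the figures quoted in this thread (approximate, not measured):

    # Sizes quoted in the thread (GB, approximate).
    raw_csv_gb     = 500    # uncompressed CSV
    compressed_gb  = 104    # gzipped download
    kdb_gb         = 125    # kdb+ internal representation on disk
    redshift_gb    = 2000   # Redshift storage after import

    print(f"gzip vs raw CSV:     {raw_csv_gb / compressed_gb:.1f}x smaller")  # ~4.8x
    print(f"kdb+ vs raw CSV:     {raw_csv_gb / kdb_gb:.1f}x smaller")         # ~4.0x
    print(f"kdb+ vs gzipped CSV: {kdb_gb / compressed_gb:.2f}x larger")       # ~1.20x
    print(f"Redshift vs raw CSV: {redshift_gb / raw_csv_gb:.1f}x larger")     # 4.0x

So the kdb+ files are only about 20% larger than the compressed archive, but roughly 4x smaller than the raw CSV, which is the comparison the "104 GB expands to 125 GB" wording obscures.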

> Bigger problem: The MapD software itself will be $50,000.

Ouch. That's a rather large oversight. Is the author affiliated with MapD, perhaps?
