Dremel – Google's tool for analyzing trillion-row tables in seconds [pdf] (melnix.com)
56 points by tillulen on Dec 29, 2010 | 18 comments



Quick view: http://docs.google.com/viewer?url=http%3A%2F%2Fsergey.melnix...

Abstract:

Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. In this paper, we describe the architecture and implementation of Dremel, and explain how it complements MapReduce-based computing. We present a novel columnar storage representation for nested records and discuss experiments on few-thousand node instances of the system.

Overview by Greg Linden: http://glinden.blogspot.com/2010/12/papers-on-specialized-da...
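
To make the abstract's two ideas concrete, here is a minimal Python sketch of a columnar scan combined with a multi-level aggregation tree. It is a toy under stated assumptions, not Dremel's implementation: the table contents, the column names (`country`, `clicks`, `url`), and the functions `leaf_aggregate`, `merge_partials`, and `run_query` are all hypothetical, and it does not attempt the nested-record encoding the paper describes.

```python
from collections import defaultdict

# Hypothetical toy table, stored column-by-column rather than row-by-row.
# Each dict is one horizontal partition, as it might sit on one leaf server.
PARTITIONS = [
    {"country": ["US", "DE", "US"], "clicks": [3, 5, 2], "url": ["a", "b", "c"]},
    {"country": ["DE", "FR"], "clicks": [1, 4], "url": ["d", "e"]},
    {"country": ["US"], "clicks": [7], "url": ["f"]},
]


def leaf_aggregate(partition, key_col, value_col):
    """Leaf node: read only the two columns the query needs, return partial sums."""
    partial = defaultdict(int)
    for key, value in zip(partition[key_col], partition[value_col]):
        partial[key] += value
    return dict(partial)


def merge_partials(partials):
    """Intermediate or root node: combine the partial sums from its children."""
    merged = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            merged[key] += value
    return dict(merged)


def run_query(partitions, key_col, value_col, fanout=2):
    """Aggregate bottom-up through a tree whose nodes have `fanout` children.

    Assumes fanout >= 2 so that every level is smaller than the one below it.
    """
    level = [leaf_aggregate(p, key_col, value_col) for p in partitions]
    while len(level) > 1:
        level = [
            merge_partials(level[i:i + fanout])
            for i in range(0, len(level), fanout)
        ]
    return level[0] if level else {}


# Roughly: SELECT country, SUM(clicks) FROM table GROUP BY country
result = run_query(PARTITIONS, "country", "clicks")
```

The `url` column is never touched by this query, which is the point of a columnar layout: an aggregation reads only the columns it names. The tree matters because each level returns small partial results instead of raw rows, so no single node has to see the whole table.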


Yahoo also had a tool for interactively querying night datasets. Dremel was way faster... many orders of magnitude.

The ability to query and re-query and refine in real-time lets you learn so much more about your data - the leverage is simply unreal.


Having fast turnaround on data of this size is immeasurably important and a significant competitive advantage. I worked on large search-engine systems until very recently, and I can say without a doubt that turnaround time is more important than overall dataset size.

Our not-so-awesome mitigation was to have strictly defined, preprocessed pipes to plug into. This meant either rigidly defining queries in advance or sucking it up and waiting hours, if not days, for results.


Dremel is the technology behind BigQuery. The talk from Google IO is informative.

http://code.google.com/apis/bigquery/


Dremel, Sawzall, what's next?


I don't know, but I wonder if this may get them in a little hot water. Both Dremel and Sawzall are trademarked.

I also don't particularly like the use of the words. I am very attached to my Dremel, and I don't see how the name applies to processing a database :)


Trademark restrictions are tied to the field of use.

If Google were releasing power tools and calling them Dremel, they would have a problem. But they can release a programming language called Dremel and are likely to be legally in the clear.

(Or such is my non-lawyerly understanding. I am not a lawyer, this is not legal advice.)

((The reason people say that about not legal advice is that if you give legal advice and someone gets in trouble because of it - even if the advice was correct and the person misunderstood - you are liable for the consequences of that advice.))


But these are not products. They're internal code names and there's no chance that Google will ever be releasing these pieces of software to the public. Thus no possible trademark issues.


Google Naming conventions:

Indexing Systems: Caffeine, Percolator

Database Systems: Spanner, Sawzall, Dremel

Mobile: Cupcake, Donut, Eclair, Froyo, Gingerbread, Haggis


Seeing 'Haggis' really brought a smile to my face. I wish it were the case but 'Honeycomb' is the name there.


You're right; this was from an old source file.


Interesting, but not very useful w/out code =)


The paper describes the algorithms, implementation and operation in several real-world scenarios. The essential routines are described using pseudo code in the appendix.

Not really seeing the problem here. This is actually more useful than a dump of some source code.


You could have said the same thing about papers covering BigTable and MapReduce. Today we have Hadoop...


It's certainly tough to come up with project names that are unique these days, but Dremel? It's a well-known consumer brand of tools in the US. Did Google's Dremel name derive from something other than the tool brand?

Google's project is certainly not in the same market as the Dremel tool brand, but is it OK for a company like Google to borrow a brand name from a different market?


Footnote on the first page:

   Dremel is a brand of power tools that primarily rely 
   on their speed as opposed to torque. We use this 
   name for an internal project only.


It's an internal tool. Relax.


Google also has a programming language called Sawzall, named after a brand of electric reciprocating saw.



