Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. In this paper, we describe the architecture and implementation of Dremel, and explain how it complements MapReduce-based computing. We present a novel columnar storage representation for nested records and discuss experiments on few-thousand node instances of the system.
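To make the two ideas in the abstract concrete, here is a minimal sketch of a columnar layout feeding a multi-level aggregation tree. It is not Dremel's implementation: the table, the field names (`country`, `clicks`), and the helpers `to_columns`, `leaf_aggregate`, and `merge` are all made up for illustration, and it handles only flat records, so the paper's encoding for nested records is not reproduced.

```python
from collections import Counter

# Hypothetical flat table; real Dremel data is nested, which this sketch ignores.
rows = [
    {"country": "US", "clicks": 3},
    {"country": "DE", "clicks": 1},
    {"country": "US", "clicks": 2},
    {"country": "FR", "clicks": 5},
    {"country": "DE", "clicks": 4},
    {"country": "US", "clicks": 1},
]


def to_columns(records):
    """Store each field as its own list, so a query touches only the fields it needs."""
    return {name: [r[name] for r in records] for name in records[0]}


def leaf_aggregate(shard):
    """Leaf node: partial SUM(clicks) GROUP BY country over one columnar shard."""
    partial = Counter()
    for country, clicks in zip(shard["country"], shard["clicks"]):
        partial[country] += clicks
    return partial


def merge(partials):
    """Intermediate or root node: combine the partial aggregates of its children."""
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts rather than replacing them
    return total


# Three shards of two rows each, one per (assumed) leaf server.
shards = [to_columns(rows[i:i + 2]) for i in range(0, len(rows), 2)]
leaf_results = [leaf_aggregate(s) for s in shards]

# Two intermediate nodes, then a root that merges their outputs.
intermediate = [merge(leaf_results[:2]), merge(leaf_results[2:])]
result = dict(merge(intermediate))
print(result)
```

Each level only passes small partial aggregates upward, which is the property that lets this kind of tree fan out over many machines.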
Having fast turnaround on data of this size is immeasurably important and a significant competitive advantage. I worked on large search-engine systems until very recently, and I can say without a doubt that turnaround time is more important than overall dataset size.
Our not-so-awesome mitigation was to have strictly defined, preprocessed pipes to plug into. This meant having to either rigidly define queries or suck it up and wait hours, if not days, for results.
Trademark restrictions are tied to the field of use.
If Google were releasing power tools and calling them Dremel, they would have a problem. But they can release a query system called Dremel and are likely to be legally in the clear.
(Or such is my non-lawyerly understanding. I am not a lawyer, this is not legal advice.)
((The reason people say that about not legal advice is that if you give legal advice and someone gets in trouble because of it, even if the advice was correct and the person misunderstood, you can be held liable for the consequences of that advice.))
But these are not products. They're internal code names and there's no chance that Google will ever be releasing these pieces of software to the public. Thus no possible trademark issues.
The paper describes the algorithms, implementation, and operation in several real-world scenarios. The essential routines are described using pseudocode in the appendix.
Not really seeing the problem here. This is actually more useful than a dump of some source code.
It's certainly tough to come up with project names that are unique these days, but Dremel? It's a well-known consumer brand of tools in the US. Did Google's Dremel name derive from something other than the tool brand?
Google's project is certainly not in the same market as the Dremel tool brand, but is it OK for a company like Google to borrow a brand name from a different market?
Overview by Greg Linden: http://glinden.blogspot.com/2010/12/papers-on-specialized-da...