Why We Chose Redshift

edwintorok · on March 27, 2015

The title should say 'Amazon Redshift'. At first I thought it was going to be about redshift vs f.lux: http://jonls.dk/redshift/

Edit: Why the downvote? redshift (and f.lux) have existed since before 2010, whereas Amazon Redshift was only introduced in 2012. I think it is reasonable to assume that someone who has never heard of Amazon Redshift would think of the open source project first (which exists as a package in various distributions), and not the Amazon service.

untog · on March 27, 2015

If we took a poll I suspect the majority would be thinking of the Amazon service - I know I was. The date the projects were introduced isn't necessarily relevant.

O____________O · on March 27, 2015

The UI colorizer is what I thought of immediately, too.

"If we took a poll I suspect the majority would be thinking of the Amazon service"

That's just personal projection, and is as irrelevant as an argument beginning with, "I think most people would agree that..."

Personally, regardless of Amazon vs UI hack, I'm really tired of ambiguous naming in tech projects.

chc · on March 27, 2015

I'm much more tired of comments on ambiguous naming. There are at least two other people in my city who have my name, and many more who share either my first or last name. Somehow life goes on and this is not a topic of major controversy. But when two pieces of software have similar names, people just can't resist commenting endlessly and upvoting this content-free bikeshedding at the expense of actual discussion.

O____________O · on March 28, 2015

"There are at least two other people in my city who have my name"

I'm presented with information to parse about technology topics daily. Sometimes, I have to search for them and have all sorts of name collisions.

I never, ever search for information about you.

"people just can't resist commenting endlessly and upvoting this content-free bikeshedding at the expense of actual discussion"

One man's bike shedding is another thousand men's irritating trend. Personally, I very rarely see anyone called out for the trendy names and awful, buzzword-laden non-descriptions that infest projects.

jakobegger · on March 27, 2015

Yes! Especially when people use an existing word like 'redshift' as their product name. (It seems to be a popular choice! I remember there also was this astronomy software for the Mac called Redshift.)

Of course, it gets even more ridiculous as the words get more common; e.g., see the recent discussions about 'Paper' or 'Layout'.

ajkjk · on March 28, 2015

It's not 'just' personal projection, nor irrelevant, if accurate. Obviously you'd have to run the experiment to find out for sure, but it's not meaningless to contribute "I would expect anyone I know to think of Amazon's Redshift first". For someone who doesn't know that others feel the opposite definition is the 'default', it might be useful to find out.

cheshire137 · on March 27, 2015

Same here, since I was recently using Redshift on an Ubuntu laptop after coming from F.lux on my Mac. Never heard of Amazon Redshift.

meritt · on March 27, 2015

Because why would a company make a blog post on what color adjuster their company uses, and if they did, why would anyone care?

Crito · on March 27, 2015

Far weirder things have been blogged about before.

Not to mention that if you are not already familiar with the company, you probably would not know what sort of blog it was just by looking at what was presented on the HN front page. It could have been some random Joe Schmo's blog.

argonaut · on March 27, 2015

FWIW, I didn't downvote you but you're also kind of hijacking the discussion (especially since this comment is the top comment and it really shouldn't be).

gurkendoktor · on March 28, 2015

But what can the grandparent poster do? It would be better if a mod could rename the post and delete this subthread.

(I also came here expecting to read something about the f.lux competitor :) )

argonaut · on April 2, 2015

He/she can downvote the post. Which is the point.

ecaron · on March 27, 2015

I wish he would talk about how they protect one customer from running a query that brings down the full stack. When we permitted Tableau to start talking to Redshift, we frequently encountered "Oh crap, Peter is running that query and that's why everything is at a standstill..."

omgbear · on March 27, 2015

You can set up Workload Management[1] to restrict the amount of compute / query_slots each query/user can use. It splits the memory/compute into slices, and queries can use multiple slices, so you can get some fine-grained control, but it takes a bunch of work.

[1] http://docs.aws.amazon.com/redshift/latest/dg/cm-c-modifying...
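
A minimal sketch of what such a Workload Management setup might look like, assuming the JSON format used by the cluster parameter `wlm_json_configuration`. The key names (`user_group`, `query_concurrency`, `memory_percent_to_use`, `max_execution_time`) are as I recall them from the WLM documentation and should be checked there; the group names and numbers are hypothetical. The idea is to give the BI tool its own small queue so one heavy query cannot take every slot.

```python
import json


def build_wlm_config(queues):
    """Validate a list of WLM queue definitions and return them as JSON text."""
    total_memory = sum(q["memory_percent_to_use"] for q in queues)
    if total_memory > 100:
        raise ValueError(f"queues claim {total_memory}% of memory, which exceeds 100%")
    for q in queues:
        if q["query_concurrency"] < 1:
            raise ValueError("each queue needs at least one query slot")
    return json.dumps(queues, indent=2)


# Hypothetical queue for BI-tool users: few slots, capped memory, time limit.
bi_queue = {
    "user_group": ["tableau_users"],   # hypothetical database user group
    "query_concurrency": 3,            # number of query slots in this queue
    "memory_percent_to_use": 20,
    "max_execution_time": 120000,      # milliseconds; assumed unit
}

# Hypothetical queue for batch loads.
etl_queue = {
    "user_group": ["etl_users"],
    "query_concurrency": 2,
    "memory_percent_to_use": 50,
}

# Default queue: no user_group, catches everything else.
default_queue = {
    "query_concurrency": 5,
    "memory_percent_to_use": 30,
}

wlm_json = build_wlm_config([bi_queue, etl_queue, default_queue])
print(wlm_json)
```

The resulting JSON would be applied through the cluster's parameter group. A session can also ask for more than one slot of its queue for a heavy statement (for example `SET wlm_query_slot_count TO 2;`), which is the "queries can use multiple" part of the comment above.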

sskates · on March 27, 2015

*She :) I can understand why Ben Horowitz tries to default to using female pronouns.

ernestipark · on March 27, 2015

Curious if your funnels are just queries directly in Redshift or if there's more going on behind the scenes.

silverrc21 · on March 27, 2015

Amplitude here - Most of our dashboards are powered separately from Redshift. We offer Redshift access as a way for our customers to answer more complex questions not offered by the dashboards.

ripberge · on March 27, 2015

Why not power your dashboards with it? What do you use?

I am considering using a columnar data store (maybe redshift) with a BI tool like bimeanalytics.com specifically to do dashboards.

nemothekid · on March 27, 2015

My guess is latency - using Redshift for short-lived, small queries might not be the best.

fsaintjacques · on March 27, 2015

That extra order of magnitude you pay in pricing you gain in response time.

exelius · on March 27, 2015

Yeah, but a data warehouse isn't supposed to have great response times. Data warehouses are for large, low-value sets of historical data that you don't always know how you want to use.

If you want to use data in real-time, you should be driving it from your transactional systems. Redshift and other data warehouse solutions are for doing reporting and dashboards, not triggering real-time reactions.

luckydata · on March 27, 2015

Well, used to be true, but now those systems are converging. -- Full disclosure, I work for a company working on exactly that problem called Treasure Data.

exelius · on March 27, 2015

Most companies are generally more concerned about reducing their data warehouse costs than they are about improving the performance of their data warehouses. Many companies implement a multi-tiered DW structure to get a mix of the two, but the core driver is managing the cost of storing petabytes of data while keeping performance acceptable.

luckydata · on March 27, 2015

How do you guys handle the constant shifting of analytic schema that happens when handling a fast-iterating application?

silverrc21 · on March 27, 2015

We update the table schema as we run into new fields, up to a limit. We also store the unstructured part of the data in a column that can be queried via json_extract_path_text.
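
A rough sketch of the approach described above, not Amplitude's actual implementation: new fields are promoted to real columns until a cap is reached, and everything else is kept in a single JSON text column. The table name `events`, the column name `extra_properties`, the cap, and both helper functions are hypothetical. `extract_path_text` is only a local stand-in for illustrating what `json_extract_path_text` does with a key path; the real function's handling of missing keys and invalid JSON should be checked in the Redshift documentation.

```python
import json


def split_event(event, known_columns, max_columns):
    """Split one event into promoted columns and a JSON string of leftovers.

    known_columns is updated in place: unseen fields are promoted until the
    cap is reached (a real system would run ALTER TABLE ... ADD COLUMN here).
    """
    row, extras = {}, {}
    for key, value in event.items():
        if key not in known_columns and len(known_columns) < max_columns:
            known_columns.append(key)
        if key in known_columns:
            row[key] = value
        else:
            extras[key] = value
    return row, json.dumps(extras, sort_keys=True)


def extract_path_text(json_string, *path):
    """Local stand-in: walk a key path and return the value as text, or None."""
    node = json.loads(json_string)
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node if isinstance(node, str) else json.dumps(node)


columns = ["user_id", "event_type"]
event = {
    "user_id": 7,
    "event_type": "click",
    "plan": "pro",
    "referrer": {"source": "ad"},
}
row, extras = split_event(event, columns, max_columns=3)

# 'plan' was promoted to a column; 'referrer' hit the cap and stays in JSON.
print(row)
print(extras)
print(extract_path_text(extras, "referrer", "source"))

# The same lookup as a query against the hypothetical table:
query = (
    "SELECT json_extract_path_text(extra_properties, 'referrer', 'source') "
    "FROM events;"
)
print(query)
```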

bnastic · on March 28, 2015

O.T. but what the hell is a "Director Of Customer Success"?

blumkvist · on March 28, 2015

It's pretty straightforward I think.

You have a complex product/service, with very diverse application scenarios. => Customer adoption is hindered by this complexity. => Customer is not getting value => Customer is angry and stops paying

You hire a person who is familiar with the applications of your technology. He talks to customers to figure out what they want to do, how they plan to achieve it, and what the hurdles are. He helps them. Writes best practices, implementation plans, helps marketing to position and sales to close.

It turns out so well that you hire many such people, who specialize in particular customer segments. Those people need management.

You need a Director of Customer Success.

It's something between account manager/service/marketing.

jlintz · on March 27, 2015

Is each customer given their own redshift cluster for their data?

sskates · on March 27, 2015

No, clusters are multi-tenant. We have a cap on the number of customers per cluster and we monitor usage to make sure no one customer is hammering the cluster.
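
As an illustration only, the two ideas in this answer (a cap on customers per cluster, plus usage monitoring) could be sketched as below. Nothing here comes from the original post: the function names, the data shapes, the cap, and the threshold are all hypothetical, and a real system would read query time from the cluster's own system tables rather than a dict.

```python
def assign_cluster(clusters, customer, cap):
    """Place a customer on the least-loaded cluster that is under the cap.

    clusters maps cluster name -> list of customers. Returns the chosen
    cluster name, or None when every cluster is full.
    """
    candidates = [name for name, tenants in clusters.items() if len(tenants) < cap]
    if not candidates:
        return None
    target = min(candidates, key=lambda name: len(clusters[name]))
    clusters[target].append(customer)
    return target


def heavy_users(query_seconds, share_threshold):
    """Return customers whose share of total query time exceeds the threshold."""
    total = sum(query_seconds.values())
    if total == 0:
        return []
    return sorted(c for c, s in query_seconds.items() if s / total > share_threshold)


clusters = {"cluster_a": ["acme", "globex"], "cluster_b": ["initech"]}
print(assign_cluster(clusters, "umbrella", cap=2))   # goes to the emptier cluster
print(assign_cluster(clusters, "hooli", cap=2))      # None: a new cluster is needed

usage = {"acme": 900.0, "globex": 50.0, "initech": 50.0}
print(heavy_users(usage, share_threshold=0.5))
```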

deeviant · on March 27, 2015

Redshift is like a prison, but with excellent accommodations. It's a great platform but it's pretty much the perfect example of vendor lock-in.

eva1984 · on March 27, 2015

How is Redshift a vendor lock-in though?

Put your data in S3, in csv/tsv/json format. If you want to switch to another provider, just figure out how to import it, and you are all set. Figuring out the limitations of the different platforms, and tuning and optimizing for them, is the difficult part.

Data migration is almost always painful and time-consuming. When choosing your data provider, you have to be careful because it is very likely to be a long-term commitment. In that sense, in the DW world there is always vendor lock-in. Only it is largely driven by the essence of the application itself, less so by the intention of the provider.

paladin314159 · on March 27, 2015

Compared to the lock-in of the AWS ecosystem in general, Redshift honestly isn't that bad. You can unload all of your data into S3 and then do whatever you want with it. I'd be surprised if most data warehousing solutions had such an easy way of exporting the data.

vosper · on March 27, 2015

In addition, if you store your data in S3 and have Redshift load it from there then you don't even need to do an export - just leave your source data in S3 after Redshift's loaded it, and you're all ready to switch to another platform.
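
For readers who have not used them, the export and import paths mentioned in this subthread are the `UNLOAD` and `COPY` commands. Below is a small sketch that only builds the statement text; the bucket, role ARN, and table name are hypothetical, and the `IAM_ROLE` clause is an assumption about how the cluster is authorized (older setups passed a credentials string instead). The exact option list should be checked against the command reference.

```python
def sql_literal(text):
    """Quote text as a SQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_unload(query, s3_prefix, iam_role):
    """Build an UNLOAD statement that writes query results under an S3 prefix."""
    return (
        f"UNLOAD ({sql_literal(query)})\n"
        f"TO {sql_literal(s3_prefix)}\n"
        f"IAM_ROLE {sql_literal(iam_role)}\n"
        "DELIMITER ',' GZIP;"
    )


def build_copy(table, s3_prefix, iam_role):
    """Build a COPY statement that loads delimited, gzipped files from S3.

    The table name is inserted as-is, so it must come from trusted code.
    """
    return (
        f"COPY {table}\n"
        f"FROM {sql_literal(s3_prefix)}\n"
        f"IAM_ROLE {sql_literal(iam_role)}\n"
        "DELIMITER ',' GZIP;"
    )


# Hypothetical names throughout.
role = "arn:aws:iam::123456789012:role/example-redshift-role"
unload_sql = build_unload(
    "SELECT * FROM events WHERE event_type = 'click'",
    "s3://example-bucket/export/events_",
    role,
)
copy_sql = build_copy("events", "s3://example-bucket/source/events/", role)

print(unload_sql)
print(copy_sql)
```

A plain comma delimiter is the simplest case; data that itself contains commas or newlines would need the quoting or escaping options on both sides.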

not_kurt_godel · on March 27, 2015

Can you explain what you mean by that? I fail to see how a PostgreSQL query interface could possibly qualify as a perfect example of vendor lock-in.

exelius · on March 27, 2015

If you want to move to another DW platform, it's probably not going to be Postgres-based. As every vendor has a slightly different flavor of SQL with different behaviors, this will require redesigning your queries, schemas, and most if not all of your stored procedures. Depending on the company and age of the platform, this could be many thousands of hours of work.

Really, vendor lock-in is pretty much a given with data warehousing platforms. Though these days, it's not uncommon for large companies to have multiple DW platforms all pulling data from each other. When one platform falls out of favor, the users just migrate themselves to another since most reporting systems not made by SAP or Oracle are compatible with pretty much everything.

kermatt · on March 27, 2015

In contrast, Vertica, Greenplum, Netezza, Teradata Aster, and CitusDB are all based on PostgreSQL forks. In many cases, the client libraries behave like psql, and ease conversions at that level.

As to SQL language differences, no DW platform uses "standard SQL", just as no two RDBMS use the exact same SQL dialect.

I dare say database platform lock-in is a universal issue. Any migration will involve effort.

exelius · on March 27, 2015

To be fair, that kind of comes with the territory when talking about data warehousing. The data volumes are so large that migrating them is usually out of the question, and query languages vary between vendors pretty significantly.

otterley · on March 27, 2015

"Resort" is probably a better analogy than "prison." Most people wouldn't choose to leave, since the accommodations are so nice, but for the expense.

blumkvist · on March 28, 2015

Greenplum (and associated tech) is partly open source now.