Why We Chose Redshift (amplitude.com)
103 points by alishiu on March 27, 2015 | 40 comments



The title should say 'Amazon Redshift'. At first I thought it was going to be about redshift vs f.lux: http://jonls.dk/redshift/

Edit: Why the downvote? redshift (and f.lux) have existed since before 2010, whereas Amazon Redshift was only introduced in 2012. I think it is reasonable to assume that someone who has never heard of Amazon Redshift would think of the open source project first (it is packaged in various distributions), and not the Amazon service.


If we took a poll I suspect the majority would be thinking of the Amazon service - I know I was. The date the projects were introduced isn't necessarily relevant.


The UI colorizer is what I thought of immediately, too.

> If we took a poll I suspect the majority would be thinking of the Amazon service

That's just personal projection, and is as irrelevant as an argument beginning with, "I think most people would agree that..."

Personally, regardless of Amazon vs UI hack, I'm really tired of ambiguous naming in tech projects.


I'm much more tired of comments on ambiguous naming. There are at least two other people in my city who have my name, and many more who share either my first or last name. Somehow life goes on and this is not a topic of major controversy. But when two pieces of software have similar names, people just can't resist commenting endlessly and upvoting this content-free bikeshedding at the expense of actual discussion.


> There are at least two other people in my city who have my name

I'm presented with information to parse about technology topics daily. Sometimes, I have to search for them and have all sorts of name collisions.

I never, ever search for information about you.

> people just can't resist commenting endlessly and upvoting this content-free bikeshedding at the expense of actual discussion

One man's bike shedding is another thousand men's irritating trend. Personally, I very rarely see anyone called out for the trendy names and awful, buzzword-laden non-descriptions that infest projects.


Yes! Especially when people use an existing word like 'redshift' as their product name. (It seems to be a popular choice! I remember there also was this astronomy software for the Mac called Redshift.)

Of course, it gets even more ridiculous as the words get more common, e.g. see the recent discussions about 'Paper' or 'Layout'.


It's not 'just' personal projection, nor irrelevant, if accurate. Obviously you'd have to run the experiment to find out for sure, but it's not meaningless to contribute "I would expect anyone I know to think of Amazon's Redshift first". For someone who doesn't know that others feel the opposite definition is the 'default', it might be useful to find out.


Same here, since I was recently using Redshift on an Ubuntu laptop after coming from F.lux on my Mac. Never heard of Amazon Redshift.


Because why would a company make a blog post on what color adjuster their company uses, and if they did, why would anyone care?


Far weirder things have been blogged about before.

Not to mention that if you are not already familiar with the company, you probably would not know what sort of blog it was just by looking at what was presented on the HN front page. It could have been some random Joe Schmoe's blog.


FWIW, I didn't downvote you but you're also kind of hijacking the discussion (especially since this comment is the top comment and it really shouldn't be).


But what can the grandparent poster do? It would be better if a mod could rename the post and delete this subthread.

(I also came here expecting to read something about the f.lux competitor :) )


He/she can downvote the post. Which is the point.


I wish he would talk about how they protect one customer from running a query that brings down the full stack. When we permitted Tableau to start talking to Redshift, we frequently encountered "Oh crap, Peter is running that query and that's why everything is at a standstill..."


You can set up Workload Management[1] to restrict the amount of compute / query_slots each query/user can use. It splits the memory/compute into slices, and queries can use multiple slices, so you can get some fine-grained control, but it takes a bunch of work.

[1] http://docs.aws.amazon.com/redshift/latest/dg/cm-c-modifying...
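To make the WLM suggestion above a bit more concrete, here is a minimal sketch of the session-level side of it. It assumes manual WLM with a queue that matches on a query group, and that a queue's memory is split evenly across its slots. The group label "tableau" and the numbers are made up for illustration; `query_group` and `wlm_query_slot_count` are the session settings involved, but the queue definitions themselves still have to be configured on the cluster.

```python
from typing import List


def wlm_session_statements(query_group: str, slot_count: int = 1) -> List[str]:
    """Build the SQL a client would run at session start to pick a WLM queue.

    query_group is a hypothetical label that a queue is assumed to match on.
    slot_count is how many of that queue's slots each query should claim.
    """
    if slot_count < 1:
        raise ValueError("slot_count must be at least 1")
    # Double any single quotes so the label stays a valid SQL string literal.
    label = query_group.replace("'", "''")
    return [
        f"SET query_group TO '{label}';",
        f"SET wlm_query_slot_count TO {slot_count};",
    ]


def slot_memory_percent(queue_memory_percent: float, concurrency: int,
                        slots_used: int = 1) -> float:
    """Rough share of cluster memory one query gets in a manual WLM queue.

    Assumes the queue's memory is divided equally among its slots.
    """
    if not 1 <= slots_used <= concurrency:
        raise ValueError("slots_used must be between 1 and concurrency")
    return queue_memory_percent / concurrency * slots_used


if __name__ == "__main__":
    for statement in wlm_session_statements("tableau", slot_count=2):
        print(statement)
    print(slot_memory_percent(40, 5, slots_used=2))
```

The idea would be to give the BI tool its own small queue so one heavy query can only exhaust that queue's slots, not the whole cluster.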


*She :) I can understand why Ben Horowitz tries to default to using female pronouns.


Curious if your funnels are just queries directly in Redshift or if there's more going on behind the scenes.


Amplitude here - Most of our dashboards are powered separately from Redshift. We offer Redshift access as a way for our customers to answer more complex questions not offered by the dashboards.


Why not power your dashboards with it? What do you use?

I am considering using a columnar data store (maybe redshift) with a BI tool like bimeanalytics.com specifically to do dashboards.


My guess is latency - using Redshift for short lived, small queries might not be the best.


That extra order of magnitude you pay in pricing you gain in response time.


Yeah, but a data warehouse isn't supposed to have great response times. Data warehouses are for large, low-value sets of historical data that you don't always know how you want to use.

If you want to use data in real-time, you should be driving it from your transactional systems. Redshift and other data warehouse solutions are for doing reporting and dashboards, not triggering real-time reactions.


Well, used to be true, but now those systems are converging. -- Full disclosure, I work for a company working on exactly that problem called Treasure Data.


Most companies are generally more concerned about reducing their data warehouse costs than they are about improving the performance of their data warehouses. Many companies implement a multi-tiered DW structure to get a mix of the two, but the core driver is managing the cost of storing petabytes of data while keeping performance acceptable.


How do you guys handle the constant shifting of analytic schema that happens when handling a fast iterating application?


We update the table schema as we run into new fields, up to a limit. We also store the unstructured part of the data in a column that can be queried via json_extract_path_text.
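For anyone wondering what that could look like in practice, here is a rough sketch, not Amplitude's actual code. The column cap, the field names, and the `events` table with its `extras` column are all assumptions for illustration. Fields become real columns until the cap is hit, and everything after that goes into a JSON string that json_extract_path_text can read later.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def split_event(event: Dict[str, Any], known_columns: Iterable[str],
                max_columns: int) -> Tuple[Dict[str, Any], str, List[str]]:
    """Split one event into typed columns plus a JSON blob of leftovers.

    Returns (row, extras_json, new_columns), where new_columns lists the
    fields that would need an ALTER TABLE ... ADD COLUMN before loading.
    """
    columns = set(known_columns)
    row: Dict[str, Any] = {}
    extras: Dict[str, Any] = {}
    new_columns: List[str] = []
    for key, value in event.items():
        if key in columns:
            row[key] = value
        elif len(columns) < max_columns:
            # Still under the cap: promote the new field to a real column.
            columns.add(key)
            new_columns.append(key)
            row[key] = value
        else:
            # Over the cap: keep the field in the unstructured JSON column.
            extras[key] = value
    return row, json.dumps(extras, sort_keys=True), new_columns


# Hypothetical query against the leftover column (table and column names assumed).
EXAMPLE_QUERY = "SELECT json_extract_path_text(extras, 'referrer') FROM events;"

if __name__ == "__main__":
    event = {"user_id": 1, "event_type": "click", "plan": "pro", "referrer": "hn"}
    print(split_event(event, ["user_id", "event_type"], max_columns=3))
    print(EXAMPLE_QUERY)
```

The trade-off is that promoted columns are cheap to filter and aggregate on, while anything left in the JSON string has to be parsed per row at query time.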


O.T. but what the hell is a "Director Of Customer Success"?


It's pretty straightforward I think.

You have a complex product/service, with very diverse application scenarios. => Customer adoption is hindered by this complexity. => Customer is not getting value => Customer is angry and stops paying

You hire a person who is familiar with the applications of your technology. He talks to customers to figure out what they want to do, how they plan to achieve it, and what the hurdles are. He helps them. He writes best practices and implementation plans, and helps marketing to position and sales to close.

It turns out so well that you hire many such people, who specialize in particular customer segments. Those people need management.

You need a Director of Customer Success.

It's something between account manager/service/marketing.


Is each customer given their own redshift cluster for their data?


No, clusters are multi-tenant. We have a cap on the number of customers per cluster and we monitor usage to make sure no one customer is hammering the cluster.


Redshift is like a prison, but with excellent accommodations. It's a great platform but it's pretty much the perfect example of vendor lock-in.


How is Redshift a vendor lock-in though?

Put your data in S3 in csv/tsv/json format; if you want to switch to another provider, just figure out how to import it, and you're all set. Figuring out the limitations of the different platforms, and tuning and optimizing for them, is the difficult part.

Data migration is almost always painful and time-consuming. When choosing your data provider, you have to be careful because it is very likely to be a long-term commitment. In that sense, in the DW world there is always vendor lock-in. It is just driven largely by the nature of the application itself, less so by the intention of the provider.


Compared to the lock-in of the AWS ecosystem in general, Redshift honestly isn't that bad. You can unload all of your data into S3 and then do whatever you want with it. I'd be surprised if most data warehousing solutions had such an easy way of exporting the data.
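As a rough illustration of the unload path mentioned above, here is a small sketch that builds an UNLOAD statement as a string. The bucket, prefix, table, and role ARN are placeholders, and the choice of IAM_ROLE for authorization (rather than a CREDENTIALS string) and of comma-delimited gzip output are assumptions. It only produces the SQL; running it still needs a connection to the cluster.

```python
def build_unload(select_sql: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift UNLOAD statement that exports a query result to S3.

    The inner SELECT is passed to UNLOAD as a quoted string, so any single
    quotes inside it are doubled here.
    """
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must look like s3://bucket/prefix")
    escaped = select_sql.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "DELIMITER ',' GZIP;"
    )


if __name__ == "__main__":
    # All names below are placeholders, not real resources.
    print(build_unload(
        "SELECT * FROM events WHERE event_type = 'click'",
        "s3://example-bucket/export/events_",
        "arn:aws:iam::123456789012:role/ExampleUnloadRole",
    ))
```

The output is a set of delimited files under the given prefix, which is roughly the format most other warehouses can bulk-load.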


In addition, if you store your data in S3 and have Redshift load it from there then you don't even need to do an export - just leave your source data in S3 after Redshift's loaded it, and you're all ready to switch to another platform.


Can you explain what you mean by that? I fail to see how a PostgreSQL query interface could possibly qualify as a perfect example of vendor lock-in.


If you want to move to another DW platform, it's probably not going to be Postgres-based. As every vendor has a slightly different flavor of SQL with different behaviors, this will require redesigning your queries, schemas, and most if not all of your stored procedures. Depending on the company and age of the platform, this could be many thousands of hours of work.

Really, vendor lock-in is pretty much a given with data warehousing platforms. Though these days, it's not uncommon for large companies to have multiple DW platforms all pulling data from each other. When one platform falls out of favor, the users just migrate themselves to another since most reporting systems not made by SAP or Oracle are compatible with pretty much everything.


In contrast, Vertica, Greenplum, Netezza, Teradata Aster, and CitusDB are all based on PostgreSQL forks. In many cases, the client libraries behave like psql, and ease conversions at that level.

As to SQL language differences, no DW platform uses "standard SQL", just as no two RDBMS use the exact same SQL dialect.

I dare say database platform lock-in is a universal issue. Any migration will involve effort.


To be fair, that kind of comes with the territory when talking about data warehousing. The data volumes are so large that migrating them is usually out of the question, and query languages vary between vendors pretty significantly.


"Resort" is probably a better analogy than "prison." Most people wouldn't choose to leave, since the accommodations are so nice, but for the expense.


Greenplum (and associated tech) is partly open source now.



