
Suggestion: provide a dimensionally-modelled reporting schema, it will be much easier to query.

Kimball's The Data Warehouse Toolkit is the definitive tutorial on the subject.
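To make the suggestion concrete, here is a minimal sketch of a dimensionally modelled (star) schema using Python's built-in sqlite3 module. The table names, column names and sample rows are all made up for illustration and are not anyone's real schema; the point is only that a "sales by city" report becomes one join from the fact table to a dimension.

```python
import sqlite3

def build_star_schema(conn):
    # Hypothetical star schema: one fact table keyed to two dimension tables.
    conn.executescript("""
        CREATE TABLE dim_city (city_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, iso_date TEXT, month TEXT);
        CREATE TABLE fact_sales (
            city_key INTEGER REFERENCES dim_city(city_key),
            date_key INTEGER REFERENCES dim_date(date_key),
            amount_cents INTEGER
        );
    """)
    # Made-up sample rows, purely for illustration.
    conn.executemany("INSERT INTO dim_city VALUES (?, ?, ?)",
                     [(1, "Ottawa", "CA"), (2, "Toronto", "CA")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20130101, "2013-01-01", "2013-01"),
                      (20130102, "2013-01-02", "2013-01")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 20130101, 1000), (1, 20130102, 2500), (2, 20130101, 4000)])

def sales_by_city(conn):
    # The report is a single join from the fact table to the city dimension.
    return conn.execute("""
        SELECT c.city, SUM(f.amount_cents)
        FROM fact_sales AS f
        JOIN dim_city AS c ON c.city_key = f.city_key
        GROUP BY c.city
        ORDER BY c.city
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(sales_by_city(conn))
```

Slicing by month instead of city would only swap the joined dimension, which is most of the "easier to query" argument.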




Our first sprint on this product was following the first principles as outlined by that book.

We found that the strategy works really well on paper, but there are significant scaling issues on the querying side when you throw massive amounts of data at it. A simple sales report over cities for one of our larger shops under moderate load could take 5-10 seconds to generate, which is pretty unacceptable. Caching would only take us so far because of how much data gets ETL'd every moment.

I definitely don't discount the dimensionally modelled strategy, but to make it proper fast, and not 1990s let-me-hit-report-and-go-get-a-coffee fast, you might need to write your own OLAP stack that's optimized for what you need[0]. I'd also do it in Go or C.

Once we ship, we'll do a technical post on what worked and what didn't.

[0] I'd love to be proven wrong on this, so if you can generate fast reports with massive amounts of data ETL'd in real time, I'd love to hear from you. oren.mazor@shopify.com


I fail to understand why waiting five seconds to generate a report that will impact my business decisions is "unacceptable". I have numerous reports I've implemented that take over a minute to generate; they are complex queries that sometimes involve interesting statistics, and the idea of calling them outdated solely because they don't return data in a fraction of a second seems like a fundamental misunderstanding of what businesses do with reports.


How massive are we talking? What's your finest grained fact table? Are we partitioning fact tables by customer? What's your current underlying store?

You're actually at the coalface and I'm the bookworm, so I am totally prepared to be schooled on this one.


We started out with Mondrian + MySQL, but quickly had to drop MySQL ("mysql is great until you want to put data into it and then get it back out again" - unattributed, to protect the guilty). Our primary work was with PostgreSQL in the beginning, which, in my opinion, is a pretty solid database to work with overall.

We did partition all of our dimension and fact tables, which helped a great deal. Aggregates were a problem that is specific to us, so we couldn't cheat the usual way that reporting servers do.

The other problem is that our stack was suddenly full of things like Java and OLAP and PostgreSQL, which made onboarding people who wanted to help, and just debugging, a pain.

I like that comment about the coalface/bookworm, but sometimes it takes somebody on the outside to see what I'm missing.


With all due respect to pgsql, Mondrian and MySQL - you're using the wrong toolkit. You need a column store, like Vertica, kdb+, TimesTen (now Oracle?), Sybase anytime (I think that was what it was called).

You can do very well at much, much lower cost in the Python world: pandas, PyTables, or even just straight numpy.

Seriously, using any of these would make the report generation time basically zero, and you'd just have to make your ETL work quickly enough to feed it; how well this can be solved depends on how you store the original data ("pre-facts").

The book written by this guy http://blog.wesmckinney.com/ (and the guy himself, if you can get him) will probably advance you way more than experiments.
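A rough sketch of why a columnar layout helps with this kind of report: each attribute lives in its own contiguous array, so an aggregate only scans the columns it needs. This uses just the standard library's array module to stay dependency-free; the class, its fields and the sample values are hypothetical and not anyone's actual design.

```python
from array import array
from collections import defaultdict

class SalesColumns:
    """Hypothetical column-oriented fact store: one typed array per attribute."""

    def __init__(self):
        self.city_ids = array("l")      # dimension key column
        self.amount_cents = array("q")  # measure column

    def append(self, city_id, amount_cents):
        # A row is split across the columns; position i in each array is row i.
        self.city_ids.append(city_id)
        self.amount_cents.append(amount_cents)

    def total_by_city(self):
        # Scans only the two columns the report needs, not whole rows.
        totals = defaultdict(int)
        for city_id, amount in zip(self.city_ids, self.amount_cents):
            totals[city_id] += amount
        return dict(totals)

store = SalesColumns()
for city_id, amount in [(1, 1000), (1, 2500), (2, 4000)]:  # made-up rows
    store.append(city_id, amount)
print(store.total_by_city())
```

With numpy the Python loop could presumably be replaced by a vectorized call such as `numpy.bincount(city_ids, weights=amounts)`, which is where most of the speedup would come from.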


I didn't make it clear in my response, and that's my fault. We actually don't use pg, Mondrian, or MySQL anymore.

We ended up with a golang solution that we use essentially like you would with a column store. It's very fast and extremely thin.


> ...you're using the wrong toolkit. You need...

Ouch. Do you really feel so well acquainted with their internal design that you can use such strong language? It comes across as dogmatic.


I am only as familiar as orenmazor describes above.

Would you have said that if I urged him to get a hammer to hit nails, rather than a screwdriver? It is about as dogmatic. (You can put nails into a wall with a large enough screwdriver - I've done it myself when the nail was small enough and I had no hammer - but it's still the wrong toolkit.)



