Ask HN: Who is using a self hosted analytics system?
27 points by mfrye0 on Dec 5, 2016 | 39 comments
I'm so used to SaaS-based solutions these days like Mixpanel, GA, Kissmetrics, etc. But with everything happening over the last few years with regard to privacy and security, I've been looking into self-hosted options.

Is anyone else considering the same? I know this is standard for companies like Amazon and Facebook, but what about everyone else?

If you are using a self hosted setup, what is it? Custom built, open source, etc?




If you're serious about self-hosting analytics there is only one serious place to go: http://snowplowanalytics.com/

I don't use them, but they are building enterprise-grade self-hosting.

Disclaimer: I am working on a project in the marketing analytics space. I don't use them and this isn't an endorsement, just a pointer to research more!


This looks awesome. Just what I was looking for.

Seems like they have some decent companies using them too.


Are you still working on thebigpicture?


Yeah. I'm actually asking the question as one of my friends mentioned we should offer a full self hosted option for enterprise.

I don't know if we want to go in that direction though. Plus I wasn't even sure who is actually self-hosting these days. Kind of reminds me of the "box" in the data center dilemma from the show Silicon Valley.


> I wasn't even sure who is actually self hosting [analytics] these days

Just about every company that doesn't outsource everything + every company that adheres to strict privacy rules + every company that is forbidden to share user data because of regulations + every company that has outgrown what free GA has to offer.

That's a shit ton of companies, a lot of which are established and have money. Just sayin'


Yeah, I hear you. Snowplow is the only commercial one I've heard of that allows self-hosting. It makes sense, but then you need a different billing model (e.g. services).

The bigger the data the more self-hosting will make sense for you, but the less for your customers.


I'm not sure if I follow you there. What did you mean by, "The bigger the data the more self-hosting will make sense for you, but the less for your customers"?


I meant that if you're running an analytics company and each client has a huge amount of data, it costs you less when the clients self-host (better for you as the company owner). If the clients have to self-host, then the clients are the ones paying for the hosting.

This works for Snowplow because they sell services, which only really works for bigger companies who have the resources to self-host and to pay the service fees.


Ah gotcha. Thanks for the input.

I've been looking at Snowplow and it seems really cool.


I'm actually on the same road right now; I'm currently testing Piwik.

https://piwik.org


What's your experience with it so far? I was playing around with it at a startup a few years ago. It seemed to have a strong community and features were popping up rather quickly. However, it didn't scale that well for us.


I'm very pleased so far, but as I said, it's too early for me to say anything.

> However, it didn't scale that well for us.

What kind of scaling problems did you encounter?


In fairness to Piwik, it was some of the custom metrics that required more processing power. We ended up doing delayed batch processing for them.

Towards the end, IIRC, the calculation of unique visitors within a custom range was also slow.
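For anyone wondering why unique visitors over a custom range tends to be the slow query: daily unique counts can't simply be added up, because the same visitor can show up on several days. Below is a minimal sketch of the batch-then-query idea, assuming events arrive as `(day, visitor_id)` pairs. This is not how Piwik implements it; the function names and sample data are made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_daily_sets(events):
    """Batch step: group visitor IDs into one set per day."""
    daily = defaultdict(set)
    for day, visitor_id in events:
        daily[day].add(visitor_id)
    return daily


def unique_visitors(daily, start, end):
    """Query step: union the per-day sets over an inclusive date range."""
    seen = set()
    day = start
    while day <= end:
        seen |= daily.get(day, set())
        day += timedelta(days=1)
    return len(seen)


# Hypothetical sample data: visitor "a" returns on the second day.
events = [
    (date(2016, 12, 1), "a"),
    (date(2016, 12, 1), "b"),
    (date(2016, 12, 2), "a"),
    (date(2016, 12, 3), "c"),
]
daily = build_daily_sets(events)
two_day_uniques = unique_visitors(daily, date(2016, 12, 1), date(2016, 12, 2))
three_day_uniques = unique_visitors(daily, date(2016, 12, 1), date(2016, 12, 3))
print(two_day_uniques, three_day_uniques)
```

The union has to touch every day in the range, so the cost grows with the range length and the number of visitors. At real volume you would probably swap the exact sets for an approximate structure such as HyperLogLog.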


Last I checked it only runs on MySQL.

I don't know how many visitors and page views you have, but that's a red flag for most established sites.


Yeah I've been looking at Piwik. What do you think so far?

The only aspect I didn't like was that it's written in PHP, which isn't really my strongest language if I wanted to customize it.


I'm measuring its footprint, but it's too early for me to tell. I can get back to you after I've done some tests.


Yeah, that would be awesome.


I built my own analytics App for Splunk to offer business insights for my wife's small business. Mostly how traffic correlates with purchases and where buyers are coming from.

As a side effect, the same system also detects malware and cyberattacks on other websites pretty well.

https://splunkbase.splunk.com/app/2676/


Whoa. That's badass.

Do you pull any info regarding the IP addresses, or is it only the raw logs that you're going through?


Only raw logs. Splunk resolves IP to Country/Region/City (and geo coordinates if you want to map them).

Mostly I'm playing with raw logs, and then even RAW-er logs using Splunk Stream (a tool that switches the network interface into promiscuous mode and gives me all the data for all protocols, with any context I want).

For example, I can analyze anomalies in web hits and anomalies in web sessions to discover new, previously unknown traffic sources and patterns.

It helped to discover 2 new classes of cyberattacks I didn't know were targeting my server.
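To make the "anomalies in web hits" idea concrete outside of Splunk, here is a rough sketch in plain Python: count hits per hour from raw access-log lines and flag hours that sit far above the average. It assumes the common/combined access-log layout and a simple z-score cutoff. The regex, the threshold, and the sample lines are all assumptions for illustration, not what the Splunk app does.

```python
import re
from collections import Counter
from statistics import mean, pstdev

# Assumed layout: IP ident user [dd/Mon/yyyy:HH:MM:SS zone] "request" status size
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):(\d{2}):\d{2}:\d{2} [^\]]+\] '
    r'"[^"]*" \d{3} \S+'
)


def hits_per_hour(lines):
    """Count hits per (date, hour) bucket, skipping lines that don't parse."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            counts[(m.group(2), m.group(3))] += 1
    return counts


def anomalous_hours(counts, threshold=2.0):
    """Return buckets whose hit count is more than `threshold` std devs above the mean."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return sorted(k for k, v in counts.items() if (v - mu) / sigma > threshold)


def fake_line(hour):
    """Build a made-up log line for the given hour (sample data only)."""
    return f'203.0.113.5 - - [05/Dec/2016:{hour:02d}:15:32 +0000] "GET / HTTP/1.1" 200 512'


# Two hits per hour for hours 00-09, then a burst of 50 hits in hour 10.
lines = [fake_line(h) for h in range(10) for _ in range(2)]
lines += [fake_line(10) for _ in range(50)]

counts = hits_per_hour(lines)
flagged = anomalous_hours(counts)
print(flagged)
```

A fixed z-score is crude (daily traffic cycles alone will trip it), so treat it as a starting point rather than a detector.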


Sounds really useful. I'll have to check that out.


I've used https://piwik.org successfully.


Thoughts on using it so far?


To give you some perspective, a startup I worked for began with a self-hosted web analytics option.

In fact, most of the systems we had were homegrown, and while it was initially a great idea, the maintenance took a lot of time away from optimizing for revenue.

We spent a lot of time trying to figure out the structure of our self-hosted web analytics platform, how it tracks data, how it stores data, etc. The majority of our company were engineers but we still wasted a lot of time fiddling with the self-hosted analytics.

There will likely be a subset of customers who would be interested in self-hosted, but I'm willing to bet that they are more likely to be engineering-oriented companies.


Good feedback.

Yeah I was thinking the same thing - that companies who are self hosting are more likely to be engineering-oriented. Otherwise like you said it's just too much of a pain to handle it yourself.

May I ask, why did you self-host vs use an external service? To save money, for greater security and privacy?


I believe we self-hosted because of privacy worries and the engineering culture (or myth that we can and should build everything).

Ultimately though, the business demanded a switch to something more robust and we went with Omniture.

I do believe that self-hosted is an option if it's easy to maintain and it also delivers on business features like reporting and data exploration.


Yeah I hear you. I run into the same issue - why pay if you can build it yourself.

Thanks for the insight.


Yeah, but I believe successful companies are the ones that can figure out the right balance between buying and building.


Yeah I agree. It's a tough call sometimes.


I don't yet have a product I've settled on, but I'm in the early stages of developing a website for a government body.

With all the regulations and policies on data protection, using something not self-hosted is just not going to happen. (I believe that if there was a possibility that their users were affected by a 3rd party breach, the fine is around 1000x the project budget).

If this means a little less information about the audience, that is perfectly acceptable.


Yeah that makes sense being a government org.


I have a custom-built one. Its main feature is that it can email me the details I want on a scheduled basis to save me logging in.

It tracks a few million records a month, so not high scale, and runs on a single $5 DigitalOcean instance.

I did attempt to make it SaaS for hosting resellers as an upsell but never made any progress. It's just running for myself these days.
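The "email me the details on a schedule" part is easy to picture with the standard library alone. A minimal sketch, assuming page views are available as a list of paths and that something like cron triggers the script; `build_report`, the addresses, and the sample data are hypothetical and not taken from the poster's system.

```python
from collections import Counter
from email.message import EmailMessage


def build_report(page_views, sender, recipient, top_n=3):
    """Build a plain-text summary email listing the most viewed paths."""
    counts = Counter(page_views)
    lines = [f"{path}: {n}" for path, n in counts.most_common(top_n)]
    msg = EmailMessage()
    msg["Subject"] = f"Analytics summary: {len(page_views)} page views"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(lines))
    return msg


# Made-up sample data; a real setup would query the database instead.
msg = build_report(
    ["/", "/pricing", "/", "/docs", "/"],
    sender="reports@example.com",
    recipient="me@example.com",
)
print(msg["Subject"])
print(msg.get_content())
```

Sending is left out on purpose. With an SMTP server available, `smtplib.SMTP(host).send_message(msg)` would deliver it, and a cron entry would handle the schedule.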


Sounds pretty cool. What tech is it based on? Just a standard API and DB?

Also, can I ask how big your company is? A few million records is decent volume.


Nginx, Django/Python and MySQL. Nothing fancy.

No company actually. Just side projects. The largest being searchcode.com

Feel free to email me if you want further details on either. Details in my profile.


Sure thing. Thanks.

I've actually been looking at Snowplow and that seems pretty badass so far. Might work for what I'm looking at.


Piwik is pretty good, but like most open source, it's not as feature-packed as commercial tools. It's also resource-heavy. We had problems with large amounts of data that couldn't be loaded before timeouts. Caching helps, but it definitely needs a stronger machine.


I've been talking to a friend about it and he mentioned using Elasticsearch to build our own setup. Idk if it's the right use case though...


Working for a bank processing millions of calls a minute; using WebTrends


I haven't heard of WebTrends before. I'm looking at their website now. Are you self-hosting it? I can't find info on that option.



