alexdean's comments

Very cool. Snowplow CEO here - if you need any help on integrating with Snowplow just ask in our forums: https://discourse.snowplowanalytics.com/


Snowplow CEO here. We haven't used Jitsu before but are very familiar with Segment. It looks like Jitsu sits in the Segment product family, along with Rudderstack: basically a Customer Data Platform bundle of simple JSON event tracking, Fivetran-style transactional/SaaS data ingest, and then relaying of data out to various SaaS endpoints plus cloud DWs.

Snowplow started at the same time as Segment (2012) but has evolved along a separate tech tree: a microservice architecture, cloud native, using Kinesis or Google Cloud Pub/Sub as the data transit, with an enrichment framework plus a Confluent-style schema registry supporting very rich, versioned JSON Schema-based event payloads. We are built by and for data platform teams; our open-source behavioral data engine doesn't have a UI (our commercial Behavioral Data Platform does). Hosted trial here: https://try.snowplowanalytics.com/

Definitely room for both product families in the market! I'm sure Jitsu will do great.


Alex - love your work on Snowplow.

Looking at Jitsu as someone familiar with Snowplow, I did a quick browse of their marketing site and couldn’t find anything about their back-end architecture. My immediate thought was that it isn’t the focus here, which is concerning when thinking about enterprise-scale data patterns.

Also appreciate you taking the high road (“room for both”) while the founder of Jitsu says “we are better”.

I’ll stick with the product with a solid schema strategy, thank you…


Back in the early years of Snowplow, we adopted a similar approach for our support of 3rd party webhooks[1] as an event stream source supported natively by Snowplow (https://github.com/snowplow/snowplow). (The important background here is that all events and entities flowing through Snowplow are described using versioned JSON Schema).

We built a kind of Maven Central for 3rd party webhooks' schemas[2], and got to work adding schemas for various popular webhooks, e.g. Sendgrid and Pingdom. But, we never met a single SaaS vendor who was interested in 'adopting' the JSON Schemas for their webhook, let alone publishing versioned JSON Schemas for their whole API. This meant that at Snowplow we have stayed on the hook for keeping these vendors' webhook schema definitions up to date in our registry.

It's a real tragedy of the commons - oceans of developer time wasted on tedious low-leverage work (updating API client code) so that each SaaS vendor can 'move fast and break things'. The ultimate irony is that if these vendors were to adopt, publish and respect versioned schema definitions internally, using something like OpenAPI, they would see huge productivity gains (think enhanced CI/CD testing, auto-gen of client code etc).

1. https://docs.snowplowanalytics.com/docs/getting-started-on-s...
2. https://github.com/snowplow/iglu-central/tree/master/schemas
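To make the "versioned JSON Schema" point concrete, here is a minimal sketch of how a self-describing payload can carry its schema reference, and how a consumer might parse the version out of it. It assumes the Iglu-style URI layout `iglu:vendor/name/format/MODEL-REVISION-ADDITION`; the vendor `com.example`, the event name `webhook_event`, and the helper names `SchemaKey`, `parse_schema_key` and `same_model` are hypothetical, chosen only for illustration, and are not Snowplow's actual code.

```python
import re
from typing import NamedTuple


class SchemaKey(NamedTuple):
    """Parsed reference to one version of a schema (illustrative type)."""
    vendor: str
    name: str
    format: str
    model: int
    revision: int
    addition: int


# Assumed layout: iglu:<vendor>/<name>/<format>/<MODEL>-<REVISION>-<ADDITION>
_SCHEMA_URI = re.compile(
    r"^iglu:(?P<vendor>[A-Za-z0-9_.\-]+)/(?P<name>[A-Za-z0-9_\-]+)/"
    r"(?P<format>[A-Za-z0-9_\-]+)/"
    r"(?P<model>\d+)-(?P<revision>\d+)-(?P<addition>\d+)$"
)


def parse_schema_key(uri: str) -> SchemaKey:
    """Split a schema URI into its parts, rejecting malformed input."""
    m = _SCHEMA_URI.match(uri)
    if m is None:
        raise ValueError(f"not a valid schema URI: {uri!r}")
    return SchemaKey(
        m["vendor"], m["name"], m["format"],
        int(m["model"]), int(m["revision"]), int(m["addition"]),
    )


def same_model(a: SchemaKey, b: SchemaKey) -> bool:
    """True when both keys name the same schema and the same MODEL number."""
    return (a.vendor, a.name, a.format, a.model) == (
        b.vendor, b.name, b.format, b.model)


# A hypothetical self-describing webhook payload: the data travels
# together with a pointer to the exact schema version that describes it.
event = {
    "schema": "iglu:com.example/webhook_event/jsonschema/1-0-2",
    "data": {"status": "delivered"},
}
key = parse_schema_key(event["schema"])
```

Because every payload names its own schema version, a vendor changing its webhook format would show up as a new version in the registry rather than as silently broken client code.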


Seems like this is a feature you could demand if you were a big enough client of their enterprise products. Better yet, get a critical mass of big clients together and push for it. You could probably do this by showing up at a product-specific conference and networking with the other devs who care enough about the tool to show up.

Also, if this is a genuinely useful thing to have, you could probably monetize it directly. Decouple the schema repository from your primary application and sell access to well-validated, performant client libraries for popular SaaS APIs.


I just wanted to jump in and say the same thing. I had the hardback of the Cuckoo's Egg as a teenager in the 90s. Huge inspiration to me and I have worked in and around tech ever since. Thank you.


He may think he doesn't have an 'HR department', but whoever holds the pen on his holiday policy, equipment policy, training & development policy, maternity & family friendly rights policy, expenses policy - well, there's his HR department.


Not having an HR department doesn't mean you don't have HR and no one is claiming that. The entire point is that you don't need a department to do those things and the argument presented here is that those responsibilities are left integrated within a team rather than compartmentalized into a separate and independent system.


That’s exactly how HR departments formed... You can distribute their functionality across the company and, guess what, it will cause issues that lead to an HR department forming.


His managers handle that for their own employees.


That sounds like a lawsuit waiting to happen. All it takes is one bonehead. Not even a bonehead, really: I wouldn't want to try to maintain any semblance of technical acumen in my role, take care of the folks I support, and also know all the intricacies of labor-related laws and standards.


What makes HR immune to having 'boneheads' in any other company? If your risk is 'someone might be a bonehead', then having HR as a separate department doesn't help, does it?


Possibly, but if you have a separation of duties with the manager and HR tasks, you’re more likely to have someone go to bat for you when one gets out of line.

When it all goes through a single person, there’s less of a balancing force there.

Both obviously can be abused.


Nothing, but as a manager I wouldn’t want to deal with clearly HR-related issues alone.

From a manager’s perspective, I see HR essentially as your legal counsel. If the company wants to delegate that to the legal department, sure, but it will likely be overpaying those tasked with dealing with HR issues all day long.


You have a legal department to guide you on legal issues, from harassment to mandated leave.


So, they're manager and HR.


Ok, so now you understand what the article is saying. No HR department.


Snowplow has had a strong following among HNers ever since we launched back in 2012. The truth, though, is that Snowplow open-source is still a complex project to stand up, and a lot of technology teams still aren't aware of the power of having all their own behavioral data (events from webapps, mobile etc) in their own data warehouse or lake, and available in real-time streams (Kinesis and Google Cloud Pub/Sub).

With this background, I am delighted to be able to Show HN our new Try Snowplow experience. Under lockdown last year, the team decided to go back to basics and build a new version of Snowplow called Try Snowplow; that version is now in GA.

Try Snowplow is a small version of the Snowplow technology that can be set up quickly and easily, normally in under 15 minutes. Like all versions of Snowplow, it runs in your own cloud environment.

The landing page is here: https://snowplowanalytics.com/get-started-try-snowplow/

And if you want to learn more about Try Snowplow before giving it a spin, there's a video here: https://www.youtube.com/watch?v=Aw5hdIjwVhY&ab_channel=Snowp...

If you are still puzzled about why you would want your own behavioral data in your own warehouse, check out our library of use cases here: https://snowplowanalytics.com/use-cases/

Any questions just shout! Happy to answer anything Snowplow-related, and thanks again for Hacker News' support of Snowplow over the years.


Is there an easy way to deploy this to my own infrastructure, possibly via k8s or Docker containers, without AWS?


Hey dylz - not yet. Historically Snowplow has been engineered in a way that is idiomatic to each host cloud - in other words, it runs using the AWS / GCP services that a data platform team would use if they were building a behavioral data pipeline from scratch on that cloud.

This said, we are steadily working to refactor all our components to be more generic, and already we run a lot of the Snowplow components on k8s for our customers.


Agree, building this in 2021 is not a good use of data engineering time.

As well as the SaaS packages like Amplitude and Mixpanel, you also have great open-source tools and platforms for mobile and product analytics like PostHog (https://posthog.com/), Countly (https://count.ly/) and Snowplow (https://snowplowanalytics.com/).

Disclosure: Snowplow co-founder.


I regularly see a lot of GA analytics on HN.

How does your solution compare to Matomo? I feel the interface is extremely dated and not intuitive, but that might just be me.

Why is Matomo so rarely cited?


cofounder of analytics company says it's not worth building an analytics solution.

I welcome the competition.


I chortled when I saw the health warning against railway-oriented programming. Reading that post deeply influenced (and continues to influence) the architecture of Snowplow (https://github.com/snowplow/snowplow) and I titled a chapter of Event Streams in Action 'Railway-Oriented Processing' (https://www.manning.com/books/event-streams-in-action). Thanks Scott.


It's not that simple. For example, Waymo has external investors alongside Alphabet: https://blog.waymo.com/2020/03/waymo-raises-first-external-i...


Sure, it's a bit more complicated, but the point here is that Waymo remains an Alphabet subsidiary. External investors are investing under that understanding and with full knowledge of Alphabet’s control of Waymo (which is why the blog entry you link to links to the Alphabet 10-Q reporting the external investments and the resulting “noncontrolling interests” in the subsidiary).

A “spin-off” within a common corporate umbrella is a different thing done for different reasons than a corporate divorce kind of spin-off like IBM is doing.


The event producer in this case is one of our Snowplow trackers (https://github.com/snowplow/snowplow-javascript-tracker) - and these tracking SDKs do indeed attach a UUID as a unique event ID at event creation time.

However, this event ID is not enough to identify and then dedupe all types of duplicate events. This blog post provides more information:

https://snowplowanalytics.com/blog/2015/08/19/dealing-with-d...
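As a rough illustration of why the event ID alone falls short, the sketch below pairs each ID with a fingerprint of the rest of the payload. It drops events that match on both, and gives a fresh ID to events that share an ID but differ in content. This is one possible approach under stated assumptions, not necessarily the one the linked post describes; the field names `event_id` and `original_event_id` and the helpers `fingerprint` and `dedupe` are hypothetical.

```python
import hashlib
import json
import uuid


def fingerprint(event, ignore=("event_id",)):
    """Hash every field except the ignored ones into a stable digest."""
    payload = {k: v for k, v in event.items() if k not in ignore}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedupe(events):
    """Drop exact duplicates; re-key events whose ID collides with another."""
    seen = {}  # event_id -> set of fingerprints already emitted
    out = []
    for event in events:
        eid = event["event_id"]
        fp = fingerprint(event)
        fps = seen.setdefault(eid, set())
        if fp in fps:
            # Same ID and same content: a true duplicate, so skip it.
            continue
        if fps:
            # Same ID but different content: keep the event, remember the
            # colliding ID, and assign a new unique ID.
            event = {**event,
                     "original_event_id": eid,
                     "event_id": str(uuid.uuid4())}
        fps.add(fp)
        out.append(event)
    return out


events = [
    {"event_id": "a", "page": "/"},
    {"event_id": "a", "page": "/"},         # exact duplicate of the first
    {"event_id": "a", "page": "/pricing"},  # ID collision, different content
    {"event_id": "b", "page": "/"},
]
result = dedupe(events)
```

This in-memory version only works for a single batch; deduplicating across a stream would need the seen IDs and fingerprints held in some shared store.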

Big thanks to pragmacoders for putting this tutorial together! It's awesome seeing what you are doing with the Snowplow platform :-)

