Hacker News
How we raised $3M for an open source project (posthog.com)
158 points by illuminated on June 5, 2020 | 60 comments



For those not familiar with Posthog, it's an open source product analytics tool. That means you install some JavaScript on your website. It then collects information on which features people are using on your website and lets you run charts and graphs to ask questions about feature engagement and product usage.

Product analytics is a pretty large space. The three main existing product analytics tools are Mixpanel[0], Amplitude[1], and Heap[2] (I used to work at Heap). Notably, Mixpanel was valued at $865MM as of 2014 and Amplitude was valued at $1B as of two weeks ago.

Posthog is an open source competitor to these tools that you are able to self host. This is interesting for several reasons. If you don't feel comfortable sending data to third party tools, you control the data yourself. In addition, existing product analytics tools can get expensive. After you get off the public pricing, you will likely find yourself paying >$12k/year for one of these tools.

There are also challenges I see for them going forward. In my experience, the person who makes the decision to buy a product analytics tool isn't in engineering. It's usually a product manager or sometimes a marketer. I imagine the fact that it's open source is less interesting to one of the traditional product analytics purchasers. Compare this to a company like Gitlab where the end users are engineers.

Scaling these tools is also a huge challenge. My job at Heap was basically to scale Heap. If you self-host a product analytics tool, you are responsible for scaling the system. On top of that, because Posthog instances are isolated from each other, Posthog won't be able to take advantage of shared compute power. Imagine a product analytics tool that has 100 customers and 100 servers. If the compute power were shared, each query could make use of all 100 servers. If the servers were isolated from each other, each query could only make use of 1 server. That's basically a 100x slowdown in performance. Posthog does sell a managed version, which in theory shouldn't have this problem. It will be interesting to see long term whether that becomes a big driver of revenue for them.
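The 100x figure comes from a simple model in which a query's scan work divides evenly across however many servers can work on it. A minimal sketch under that assumption, with made-up row counts and per-server throughput (real clusters lose some of this to coordination overhead, data skew, and contention):

```python
def query_seconds(rows_scanned: int, rows_per_sec_per_server: float, servers: int) -> float:
    """Idealized scan time: work split evenly across `servers`, no coordination cost."""
    if servers < 1:
        raise ValueError("need at least one server")
    return rows_scanned / (rows_per_sec_per_server * servers)


# Hypothetical numbers, for illustration only.
ROWS = 1_000_000_000        # rows one customer's query has to scan
THROUGHPUT = 10_000_000     # rows/second a single server can scan

shared = query_seconds(ROWS, THROUGHPUT, servers=100)   # pooled cluster
isolated = query_seconds(ROWS, THROUGHPUT, servers=1)   # one box per customer
print(f"shared: {shared:.1f}s, isolated: {isolated:.1f}s, ratio: {isolated / shared:.0f}x")
```

The pooled cluster only wins like this when customers' queries rarely overlap; if all 100 customers queried at the same moment, each would again get roughly one server's worth of capacity.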

[0] https://mixpanel.com/

[1] https://amplitude.com/

[2] https://heap.io/


(I'm the person that wrote the article)

A few thoughts:

As you say, the thing that's really different about us is exactly that our tool is focussed on engineers, not PMs. PMs are of course welcome to use it too, but it feels a little more technical.

We felt the people building the thing should have the context of usage data and this shouldn't only sit in another team - which was a behavior we saw happening quite frequently when we did user interviews early on.

We started with the fundamentals of product analytics, plus some features we wanted ourselves, but are now focussed on features that are much more engineering-specific. For example, we are about to release an (optional) "inspect element" for usage data: https://github.com/PostHog/posthog/issues/870, so you can pop that up whilst working on localhost.

We had no idea if that hypothesis was correct - that engineers would care. We did a launch HN to find out and got quite a lot of good feedback and growth.

Re scaling, you are spot on - this is definitely hard! We have users doing 5 million events/day on Heroku's cheapest standard tier dyno. For people who need still higher volume, if it doesn't work well out of the box, we offer paid support. We're working on supporting databases other than Postgres as people need them.
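For a sense of scale, here is the average ingest rate that 5 million events/day implies. A back-of-envelope sketch, assuming events are spread evenly over the day (real traffic is bursty, so peaks run well above this):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_events_per_second(events_per_day: int) -> float:
    """Mean ingest rate under an even-spread assumption."""
    return events_per_day / SECONDS_PER_DAY


rate = average_events_per_second(5_000_000)
print(f"{rate:.1f} events/second on average")  # 57.9 events/second on average
```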


> We felt the people building the thing should have the context of usage data and this shouldn't only sit in another team - which was a behavior we saw happening quite frequently when we did user interviews early on.

What is preventing the engineers from getting access to that data? Existing product analytics tools usually don't charge by seats, so if they wanted to look at the data, they should be able to get access to Mixpanel, Amplitude, etc.

Compare that to Posthog's pricing of $25/user/month, which incentivizes you to minimize the number of people at your company who have access to Posthog.
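To make the seat-pricing point concrete: under a flat $25/user/month price, the bill grows linearly with headcount, and at 40 seats it reaches the $12k/year mark mentioned earlier for the incumbent tools. A sketch of that arithmetic, assuming no volume discounts or tiers (none are mentioned in the thread):

```python
def annual_seat_cost(seats: int, per_seat_monthly: float = 25.0) -> float:
    """Yearly bill under a flat per-seat price (assumes no discounts or tiers)."""
    return seats * per_seat_monthly * 12


for seats in (5, 20, 40, 100):
    print(f"{seats:>3} seats -> ${annual_seat_cost(seats):,.0f}/year")
```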

> We had no idea if that hypothesis was correct - that engineers would care. We did a launch HN to find out and got quite a lot of good feedback and growth.

Heap also found a lot of success on HN when they launched: https://news.ycombinator.com/item?id=5424206. While there was a lot of excitement early on, the kind of crowd it attracted wasn't a good demographic to sell to long term.

> We have users doing 5 million events/day on Heroku's cheapest standard tier dyno.

The challenge isn't ingesting the data, especially with an analytics tool where you can turn off synchronous_commit so that committing an event doesn't have to wait for a flush to disk. The bigger challenge is processing queries that span years of data. Queries get proportionally slower the more data you've collected: a query over 12 months of data is going to be 4x slower than a query over 3 months.
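For context on the `synchronous_commit` point: in PostgreSQL, turning it off lets a commit return before its WAL record is flushed to disk, at the cost of possibly losing the last few transactions (without corrupting data) if the server crashes. A minimal sketch of setting it per transaction from Python, assuming a psycopg2-style DB-API connection and a hypothetical `events` table with `name` and `properties` columns:

```python
import json
from typing import Any, List, Mapping, Optional, Tuple

# (sql, params) pairs; params is None when the statement takes no parameters.
Statement = Tuple[str, Optional[Tuple[Any, ...]]]


def build_insert_statements(event: Mapping[str, Any]) -> List[Statement]:
    """Statements for one event insert that doesn't wait on a WAL flush.

    The `events` table and its columns are hypothetical.
    """
    return [
        # SET LOCAL scopes the setting to the current transaction only.
        ("SET LOCAL synchronous_commit TO OFF", None),
        (
            "INSERT INTO events (name, properties) VALUES (%s, %s)",
            (event["name"], json.dumps(event.get("properties", {}))),
        ),
    ]


def insert_event(conn: Any, event: Mapping[str, Any]) -> None:
    """Run the statements in one transaction on a psycopg2-style connection."""
    with conn:  # psycopg2: commits on success, rolls back on an exception
        with conn.cursor() as cur:
            for sql, params in build_insert_statements(event):
                cur.execute(sql, params)
```

Usage would look like `insert_event(psycopg2.connect(dsn), {"name": "pageview", "properties": {"path": "/"}})`, where `dsn` is your own connection string. Setting it per transaction keeps the durability trade-off confined to event writes rather than the whole server.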


> Heap also found a lot of success on HN when they launched: https://news.ycombinator.com/item?id=5424206. While there was a lot of excitement early on, the kind of crowd it attracted wasn't a good demographic to sell to long term.

I hope Posthog's founders heed this crucial implication: the paying customers for analytics are analysts, not engineers.

Analysts want analytics that work. Engineers want analytics to go away. An analyst wants to augment their data analysis, in order to produce more conclusions or support more decisions for the decision makers they report to. An engineer wants to focus on building technically challenging, long-lived systems, not on churning out short-lived weapons in the justification-explanation wars waged between Data Science, Business Intelligence, Product Management and the C-Suite.

To use a market analogy, the engineer-as-a-customer will pay on cost, not value. Analytics are necessary but secondary to their day-to-day work: for most, it's just a one-line telemetry SDK logging call. If an engineer builds a feature that doesn't lead to an uptick in a business metric, that engineer can shift to another org with little fuss. The analyst, on the other hand, is tied to proving the performance of their feature or subject of analysis. The longer an analyst has to wait for engineering to build a tool to analyze (sell) a feature, the riskier their position in the company. Analysts, like Product Managers, embedded Data Scientists, or BI experts, will accordingly pay a value-based price for analytics. They need it for their jobs.

Posthog will likely work for companies where engineers are also analysts. But in larger companies where the roles are more clearly divided, I foresee Posthog's current focus leading them astray.


> What is preventing the engineers from getting access to that data?

In addition, should they even be spending a lot of time looking at it? There is a reason companies hire PMs and data analysts (who are cheaper than engineers). I'd totally encourage engineers to understand the business, but at a certain point, they should probably get back to engineering.


Curious, in the article, you mentioned - “we believe that open source will eat SaaS's lunch in many product categories” — what to you is the core filter for “SaaS vs Open Source” product market fit?


I think the closer your product is to something that developers use, the stronger the open source proposition. Other value props (privacy, or being cheaper at scale) can definitely help too, though.


Analysis like this is why I love hackernews, thank you.


Just wanted to highlight that Heap's engineering blog is one of the best I have ever come across. Love all the articles by Kamal and Michael Malis.


I am Michael Malis :)


Stupid question: Can't you do all of this in Google Analytics? What are the advantages of Posthog/Mixpanel/Amplitude/Heap over GA?


Not a stupid question. These products tend to offer a lot more functionality and flexibility in terms of how you consume your data. Different analysis possibilities, fancy ML models, custom dashboards, etc.

Also more control over how the data is ingested (sampling or not, etc.)

These products feel more suited to teams building rich apps/webapps, whereas GA seems to have its roots in sites that are more "content-based" (news sites, etc.).

And for many, not sending the data to Google is an advantage in and of itself.


I'll also add that product analytics software is built more towards measuring depth of engagement: How many times has this user returned to our site? What is the average LTV of this cohort? What is cohort X's n-day retention compared to cohort Y? What features have correlation to those differences? Did this experiment lead to improvements in retention - 30 days later?

Google Analytics isn't built to go that deep. Sure, you can see that 3% of users have used feature X, but you can't effectively dig in and see how upstream or downstream actions and events influence each other. Sure, you can create some custom segments on a session/user level, but that quickly turns complex and unwieldy if you have several segments, cohorts, and funnels. Also, a lot of charts are just plain better in product analytics tools; retention charts, funnels, and path diagrams are obvious examples.
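Two of the analyses named above, n-day retention by cohort and the "what happens next" counts behind a path diagram, fall straight out of the raw event stream. A minimal sketch, assuming events arrive as plain Python tuples with hypothetical user IDs and event names, and using one common retention definition (a user counts as retained if active exactly n days after their first day; "on or after day n" variants also exist):

```python
from collections import Counter, defaultdict
from datetime import date, timedelta
from itertools import groupby
from typing import Dict, Iterable, List, Set, Tuple


def n_day_retention(events: Iterable[Tuple[str, date]], n: int) -> Dict[date, float]:
    """Share of each first-seen-date cohort that is active exactly n days later."""
    first_seen: Dict[str, date] = {}
    active_days: Dict[str, Set[date]] = defaultdict(set)
    for user, day in events:
        active_days[user].add(day)
        if user not in first_seen or day < first_seen[user]:
            first_seen[user] = day

    cohorts: Dict[date, List[str]] = defaultdict(list)
    for user, start in first_seen.items():
        cohorts[start].append(user)

    return {
        start: sum(start + timedelta(days=n) in active_days[u] for u in users) / len(users)
        for start, users in sorted(cohorts.items())
    }


def next_event_counts(events: Iterable[Tuple[str, int, str]], after: str) -> Counter:
    """How often each event immediately follows `after` in a user's timeline."""
    counts: Counter = Counter()
    ordered = sorted(events, key=lambda e: (e[0], e[1]))  # by user, then timestamp
    for _, user_events in groupby(ordered, key=lambda e: e[0]):
        names = [name for _, _, name in user_events]
        for current, following in zip(names, names[1:]):
            if current == after:
                counts[following] += 1
    return counts


visits = [("a", date(2020, 6, 1)), ("a", date(2020, 6, 8)),
          ("b", date(2020, 6, 1)), ("c", date(2020, 6, 2)), ("c", date(2020, 6, 9))]
print(n_day_retention(visits, 7))  # cohort 2020-06-01: 0.5, cohort 2020-06-02: 1.0

clicks = [("u1", 1, "signup"), ("u1", 2, "view"), ("u2", 5, "signup"),
          ("u2", 6, "view"), ("u3", 1, "view"), ("u3", 2, "signup")]
print(next_event_counts(clicks, "signup"))  # Counter({'view': 2})
```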


At my previous employer we were using GA, and it was very flexible. We were able to track all DOM interactions just by configuring the included JS. Querying that data later with tools engineers can use and build on, though, is a whole different story.


There are two issues with GA. First, the functionality is somewhat bare bones. It does a really good job out of the box, but it's pretty hard to use beyond that. Take the funnel feature: you need to define your funnels ahead of time before you can start analyzing them. Compare this to any of the other product analytics tools, where as long as you were collecting the events up front, you can compute any funnel you want.

Second, once you cross the free tier, you will have to pay at least $150k/year. The free tier is 10 million hits per month[0], which is high enough that most businesses never hit it. Other analytics tools often come in at a better price point at that kind of scale.

[0] https://marketingplatform.google.com/about/analytics/terms/u...
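On the first point: because these tools store the raw events, a funnel is just a query over them, so its steps can be chosen after the data was collected. A minimal sketch, assuming events are (user_id, timestamp, event_name) tuples with hypothetical event names, and with no conversion time window (real tools typically let you set one):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Event = Tuple[str, int, str]  # (user_id, timestamp, event_name)


def funnel_counts(events: Iterable[Event], steps: Sequence[str]) -> List[int]:
    """Number of users who completed each step, in order, at any later time."""
    by_user: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for user, ts, name in events:
        by_user[user].append((ts, name))

    reached = [0] * len(steps)
    for user_events in by_user.values():
        step = 0
        for _, name in sorted(user_events):  # chronological order per user
            if step < len(steps) and name == steps[step]:
                reached[step] += 1
                step += 1
    return reached


events = [
    ("u1", 1, "view"), ("u1", 2, "signup"), ("u1", 3, "purchase"),
    ("u2", 1, "signup"), ("u2", 2, "view"),
    ("u3", 1, "view"), ("u3", 5, "signup"),
]
print(funnel_counts(events, ["view", "signup", "purchase"]))  # [3, 2, 1]
```

The same stored events answer a different funnel, such as `["signup", "purchase"]`, without any up-front configuration.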


>> By hosted, we mean charging customers for the hosted version of the open source product. The risk is that a cloud provider decides to compete with your hosted version.

An interesting approach is what cockroachdb did, which basically forbid this very specific use case, and is otherwise standard open-source.


> An interesting approach is what cockroachdb did, which basically forbid this very specific use case, and is otherwise standard open-source.

I have no qualms with what they’ve done. It’s their intellectual property and they can do as they see fit. But it’s not open source. There’s plenty of proprietary software that also provides the source code (generally without support).

Maybe we need to settle on a new term like “source available”. Or “source provided”. But it’s not open source.

To quote Arthur Dent, it's almost, but not entirely unlike, closed source.


If you can host it on your own systems for free and edit the source code it’s more open than an open source software encumbered by patents.

Really, I think we just need new terms like unencumbered software (do anything), open license software (give credit and include source code), trapped software (beware patent minefield or infectious nature), etc. Because frankly, people toss around incompatible definitions of open source all the time.


At the risk of being cynical, I don't think it was common to toss around incompatible definitions of open source until very recently. There seems to have been a considerable push by a number of people to dilute the meaning of open source -- in my tin-foil-hat opinion with the intent of destroying the community behind it. Free and open source software has weathered these kinds of attacks many times before and it certainly looks very similar to the kinds of things I've seen before.

I think it's a matter of people with vested interests throwing spaghetti against the wall until something sticks. We've often seen cries of "But how will programmers get paid if everything is free?", and "You don't need to modify the code as long as you can see it", and "Free and open source software is incompatible with commercial endeavours". What's new now is, "If you go with a free or open source piece of software, you are playing into the hands of the likes of Google who will gobble you up". This approach has been demonstrably far more successful than the others.

From the perspective of someone who values free (as in freedom) software, I'm wary of diluting the brand of both free software and open source software. I think it only really serves the interests of people who do not value free and open source software. Given that the OSI actually has a trademark for open source, I hope they enforce it. I'm worried they don't have the resources to do so, though.


> There seems to have been a considerable push by a number of people to dilute the meaning of open source -- in my tin-foil-hat opinion with the intent of destroying the community behind it.

A cynic always looks cleverer than someone who takes people at their word. I prefer to do the latter even if I look stupid.

Companies like CockroachDB and Sentry invest years of work and millions of dollars making useful products. A concern they have is that someone swoops in just as they find product market fit and hosts a version of their own. If AWS/Azure/GCP offer hosted CockroachDB, fewer people will pay Cockroach for their product. It’s as simple as that.

At the same time these companies also want to make sure that firms have the option of self hosting. If you’re too small to be able to afford it, self host. If you’re large and need to be on prem for regulatory reasons, self host. If you want to make a custom fork, go right ahead. If you want to upstream any changes, you can also do that. Customers have options! The only companies left out in the cold are AWS/Azure/GCP.

This seems like a reasonable compromise to me. I’d rather these companies stayed in business and continued innovating rather than being put out of business by the big boys.

When they say this is their reason for going with a different license, I believe them. I don’t come up with conspiracy theories that they’re maliciously trying to harm something I love. Even if I was prone to conspiracy theories, I’d at least come up with a mechanism for _how_ the movement would be harmed. I wouldn’t just fling a serious accusation at people who are working hard just for the sake of farming upvotes


> When they say this is their reason for going with a different license, I believe them. I don’t come up with conspiracy theories that they’re maliciously trying to harm something I love. Even if I was prone to conspiracy theories, I’d at least come up with a mechanism for _how_ the movement would be harmed. I wouldn’t just fling a serious accusation at people who are working hard just for the sake of farming upvotes

Look, free software and open source have a certain tradeoff and definition. If you don't like it, don't use the license and don't call your products open or free software.


I specifically referred to Sentry and CockroachDB in my comment

> This means that CockroachDB core is no longer Open Source (according to OSI’s Open Source Definition), although the complete source code is still available, and any commercial usage is allowed with the one exception of building a DBaaS

https://www.cockroachlabs.com/blog/oss-relicensing-cockroach...

> Although we’ve come to refer to the BSL as eventually open-source since it converts to an OSI-approved license at the conversion date, due to the grant restriction, it is formally not an open-source license.

https://blog.sentry.io/2019/11/06/relicensing-sentry

Does that satisfy you?


> Given that the OSI actually has a trademark for open source

They don't. They have a trademark for "Open Source Initiative Approved License".

There are no legal ramifications to using the term: I can state that everything I do behind closed doors is open source, and I can only be judged socially by my peers.


IMHO this "almost open-source" is even better because otherwise people wouldn't be willing to start companies around their open-source solutions knowing that AWS would just offer a hosted version and screw them over.


What I always wanted to know is what prevents Amazon from creating a service 'inspired by' an open source project. Like... check the source, take ideas, and write it in 'your own words'?



Wow, shit.

So much wrong in this world that we can't predict and handle ahead of time.


In theory patents, in practice nothing.


Agreed. Open source has a specific definition. People shouldn't repurpose it to mean something it's not, however close it may be. I'm not against the business model of blocking cloud vendors, it's your code if you wrote it, do whatever you want, but don't call it open source, call it something else.


It offers basically everything that normal open source licenses do except that they have a very specific non-commercial clause regarding offering it as a paid service.

I think that calling them “source available” or “source provided” does them a disservice as it implies that you can't modify/distribute/use it.


The issue that I have with this license type is: what if there's some functionality that they have built that I would find useful, unrelated to their core business. Like, imagine that they have some tool to make generating UI components easier.

If I want to use a bit from their UI component generator in my commercial software, am I violating their license?

This is why real open source is so important to distinguish -- I don't want to have to worry about this sort of thing, let alone bring in lawyers to help me decide.

Don't get me wrong, or take this as ungrateful. I'm still glad it exists, and am really happy that this works as their business model. I think it is valuable to use as a learning tool (how did they go about solving x problem?), as well as valuable for use in other Open Source software. I hope more companies are able to use licenses like this in the future, to be totally honest. I just want it to be distinguished from true Open Source software.


> If I want to use a bit from their UI component generator in my commercial software, am I violating their license?

It depends on the license. Based on what I have read about the CockroachDB license and the SSPL (Mongo), it seems that you can do that (but I am not sure about it; someone more knowledgeable might be able to verify it).

> This is why real open source is so important to distinguish

Consider another case, then. You are making proprietary software that you might or might not be planning to sell, and you take the UI generator from an open source project. Are you free to do that? The truth is that, just like above, the answer is "it depends on the license". The GPL will not let you do it, while MIT will. The LGPL will let you do it if you make the UI generator a library that you dynamically link with and publish your changes to it, while the MPL... I have no idea; I think it works on a per-file basis.


These new licenses can be called non compete licenses. You can do everything except compete by offering a SaaS to other companies.


Ha, that's a great term for it


That is why the term open source misses the point. It is about users' freedom.


First of all congratulations on this. It is nice to see Open Source being on the table for VCs. I am not sure how it pans out from a business point of view though.

If you truly want a product built with the community, then you might lose some control, or give control to the community, at least through voting. Unless you want a "source available" type of thing. Open source is both an ideology and a tool.

Less control might mean that you prioritize what is good for the community, not necessarily what is good for your VCs. I started working full time on an open source product which, like PostHog, I feel is going to be core infrastructure for any business that runs on data/software.

I want to build a sustainable business around it. I have enough money in bank to sustain product building and marketing with my cheap living costs and get traction. Once there, I will have to think of next steps, but I do not feel I want to get into accelerated growth that comes with VC money. Mentors are highly appreciated though.

*Edited typos.


>> "Going for VC means you are committing to an exit or a failure - you can't really change your mind later that you want to take things more steadily"

Exactly. And that is a huge reason NOT to use VC-funded opensource.


Not exactly, you can still back out by not raising future rounds, or buying back their shares, as Gumroad did.


It's too late. The intent has been signaled. PostHog is leveraging the goodwill of the "opensource" meme for personal profit-seeking. Their intention is literally to sell this analytics product to a Google/Amazon/Microsoft.

If you're going to raise VC with the intent to sell, why even bother going opensource other than to manipulate a percentage of developers who will convert on the basis of "opensource = good".


What's wrong with being open source but also for profit? I'd rather they be open source, make money and be sustainable than be closed source. At least this way I know I can continue using it if any unwanted changes are made, perhaps via someone forking it, and also continue to self host it.


Early-stage VC-funded isn't the opposite of sustainable, but it's close.

The intent is (generally) to overshoot by mythical-man-month'ing for ~1yr and, on failure, firesale/acquihire to a likely inherently sociopathic bigco, vs. keep going. Ex: when hiring folks, you're pitching a fat VC-dependent salary and the promise of IPO-or-sale, and as that pressure increases and expectations solidify, if revenue doesn't grow at artificial growth levels (2-4X YoY for 3-5 years straight), you're now facing layoffs + non-giant compensation. Morale plummets, senior staff disappear, etc. Founders will feel the same. There are additional inherent reasons why this is the default, such as a now-broken cap table.

I'm for it and wish them luck, but it's asking for an internal + external disaster to delude yourself, team, and users relying on you about such a fundamental assumption.

(It's a very different story if there's already a popular federated OSS project not reliant on the founders+VCs lucking into reliable recurring revenue, and hey, the startup may be the lucky 1%.)


Sure, it might be better to bootstrap their product then rather than taking VC (which I know is the point of the article but I digress). Actual revenue will validate whether their product can work or not, and VC can definitely obscure this validation.


Sounds like I was unclear.

VC-funded is a structurally and historically bad path to sustainability.

Picking an unsustainable path is OK! People can now have a high salary while working at a risky business operations venture! It's easier to hire better people! I do deep tech, and VC funding is almost necessary there. VC requires the product to be not just good, but so crazy good that you can raise $3M -> $10M -> $20M over 4-5yrs... or close shop. Which is OK!

My caution is against founders setting up employees & users with misleading expectations as they're getting out of the gate: messaging sustainability when you just lit a timebomb. This particular topic can lead to all sorts of heartache for your team & users as reality hits, and sets a bad cultural precedent for future ones.


regarding this "you want to seem like you're having your * together":

My experience is that this is bad advice. There are people who care a lot about this appearance stuff, and there are people with skills, in any role, even as pure investors. You don't want to work with the appearance people, because even the you-can-trust-them part is often just appearance. And the people with actual skills don't give a shit whether you sit there with a 4k camera and a straight shirt or in your undergarments and a 3-day beard. They care about whether you can deliver something useful, whether you can stick to your goals+principles in tough times, and whether your goals align with theirs well enough that they can believe you'll stay on board until they've made their profit.

So not being well prepared for the video call part is actually not a minus point. It might even be a plus.

Kudos for the fundraiser. Keep posting, you guys are interesting!


I hope that they can put some of this funding to work by hiring a copywriter, as it's impossible to figure out what this software does from reading the homepage.


It says "Open source product analytics" front and center, and then it goes on to elaborate. I grant you that there are features in the screenshot which are not clearly described in the homepage, but it's hyperbole to say it's impossible to figure out what this does from reading the homepage.


It's not hyperbole in the slightest.

"Understand your users." So like, translation? People with accessibility issues?

"Build a better product." But you don't know what my product is, and I still don't know if you can help me achieve that goal because a service for everyone is a service for nobody.

"Join 1000 companies" Is 1000 a lot? It is if they are Fortune 5000 companies, but not a lot if it's Shopify dropshippers or people selling Amway. If you name names, I can quickly decide if I belong in this group.

"Try hosted or self-managed" I still don't know what you're trying to pitch me, so how can I possibly know what to click on?

"Open source product analytics" - wait, are you an open source analytics tool, or an analytics tool for my open source product?

"Understand users and events" So it can tell me how my churn will be impacted by COVID-19 and who is in need of special attention during the Black Lives Matter protests?

I could go on and on, and no - I'm not being difficult or contrarian... they have one opportunity to convince me that I'm in the right place and they have what I need. They don't make any kind of attempt to compare or contrast what they are offering to current products I'm likely already using or existing behaviours that I'll be able to change.

The whole thing basically screams "if you're here, you already know what this is". That's a missed opportunity and it's what inspired me to post.


There is some value in brevity. The 'homepage' typically serves that purpose.


What's not to understand on that page? IMO there's no point overcomplicating/oversimplifying or otherwise modifying your message to try and reach people who aren't the target audience anyways.


The only people who could understand already know what it is. That's a problem if they'd like more customers.


Really great to see they've raised money.

I wished the founders the best of luck and have been watching from the sidelines since their launch thread here. Super solid product.


Others covered a lot of aspects already, so I'll just add this: we are seeing the same story again and again. A few Europeans go to the US, pitch, and get funded. I can't imagine raising $3M+ on this side of the pond only a few months after starting, and without having some very good connections.


Raising VC funds outside of the greater Bay Area is not only harder, but you generally get less in return per unit of equity traded. Investors generally cluster around an asset type, industry, jurisdiction, and geographic proximity.

Bay Area startups raised roughly $46 billion in 2019, compared to $36 billion raised across all of Europe; if you're raising more than $10 million, not attempting to raise funds in the Bay Area is likely a poor choice.


Congrats guys. It is always great to see other players bringing open source approach to product analytics. This will only bring more competition hence better products. (PS: I am one of the cofounders of Countly).


Been following PostHog along for a while and am super glad to see this! Congratulations and excited to see what happens in the future. Love this model!


Open source > SaaS is a hot take.


You can have an open source SaaS, they're not mutually exclusive.


Well it complicates the business model a bit. I am working full time on an open source data/content management product. I know my reasons to do it this way, but I also want to build a sustainable business around it. That is something I need to figure out.


Congrats guys. Great progress in a very short time!!

We @RudderStack also raised a round recently. Glad to see investors putting money in Open-Source projects.


Congrats



