>Well, the first problem I had, in order to do something like that, was to find an archive with Hacker News comments. Luckily there was one with apparently everything posted on HN from the start to 2023, for a huge 10GB of total data.
This is actually super easy. The data is available in BigQuery.[0] It's up to date, too. I tried the following query, and the latest comment was from yesterday.
SELECT
  id,
  text,
  `by` AS username,
  FORMAT_TIMESTAMP('%Y-%m-%dT%H:%M:%SZ', TIMESTAMP_SECONDS(time)) AS timestamp
FROM
  `bigquery-public-data.hacker_news.full`
WHERE
  type = 'comment'
  AND EXTRACT(YEAR FROM TIMESTAMP_SECONDS(time)) = 2025
ORDER BY
  time DESC
LIMIT
  100
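If you'd rather pull the data programmatically than through the console, here's a minimal sketch using the official google-cloud-bigquery Python client. It assumes you have a Google Cloud project and credentials configured; the query is the same as above, minus the year filter.

from google.cloud import bigquery

# Assumes Google Cloud credentials are configured in the environment
# (e.g. via GOOGLE_APPLICATION_CREDENTIALS) and a default project is set.
client = bigquery.Client()

sql = """
SELECT id, text, `by` AS username, TIMESTAMP_SECONDS(time) AS ts
FROM `bigquery-public-data.hacker_news.full`
WHERE type = 'comment'
ORDER BY time DESC
LIMIT 100
"""

# query() submits the job; result() blocks until the rows are ready.
for row in client.query(sql).result():
    print(row.id, row.username, row.ts)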
Congrats on launching this! The design looks really slick.
This might be tough to hear, but I think you fell into a common indie founder trap. It looks like you've let engineering get way ahead of market validation. You have all these features and different plans, but it sounds like they're guesses rather than things that paying customers have asked for.
I did the same thing when I was starting out. I created an API as a service because it was something I wanted, and there weren't good solutions available.[0] But then I spent a few months building it and realized I didn't have a plan for how to find customers. I thought they'd just find me and recognize me as better than the competition, but they didn't.
If I were in your position, I'd think about who would be a good customer for your service. Is there a way to find someone who's doing this now but is paying too much / getting poor results? Can you reach out to them? Even if you have to reach out to people one-by-one at first,[1] that's fine and gives you valuable feedback.
Also, my more opinionated feedback about your product (take with a grain of salt because my feedback is worth way less than someone who's a real prospective customer):
- If I try the example curl command on your landing page, I get HTTP 403
- The landing page could do a better job of showing the result of the curl command, or make it easy for users to just push a button and get a result in the browser (for an allowlist of example domains)
- "Powered by AWS Serverless" is meaningless to customers and shouldn't be something you advertise, much less the first thing
- "99% uptime" also doesn't sound very good. Customers for this product likely aren't comparing vendors based on uptime.
- "Offering 30+ Customizable PDF Options" seems like a meaningless claim
I think the product page would benefit from you talking to more customers and including the language they use, language that lets them know you're solving the problem they're trying to solve.
Also, I'd get rid of the free tier. There's mixed opinion on this, but I think at this stage, you're better off not having freeloaders. You should only be paying attention to customers who want to pay.
I'd even get rid of anything below $30/mo unless you live in a place where USD is very high relative to your local currency, as it's hard to make a business work if customers are paying you so little. At $9/mo, you can't really run ads or hire people to help you because you're making so little per customer. I'd do something like $50/mo, $100/mo, and Enterprise (contact us).
A free trial is fine, but they should be entering their credit card upfront to show some interest in buying. And then if they sign up at that stage, I'd reach out to each signup and say something like, "Hi! I'm the founder, and I'm a real person typing this message to you. Thanks for trying it out! I'd love to understand how to make the product work better for you, so I'd love to hop on a call, or you can just let me know what you'd like it to do."
I don't think the use cases you're describing are what any critics are talking about.
How do you feel about someone with more funding than you going to an LLM and saying, "Reimplement the entire Overte source for me, but change it superficially so that Overte has a hard time suing me for stealing their IP?"
>We have an Apache licensed project. You absolutely can use it for anything you want, including AI analysis. I don't appreciate third parties deciding on their own what can be done with code that isn't theirs.
That's not what the Apache license says.
According to the Apache license, derivative work needs to provide attribution and a copyright notice, which no major LLMs do.
>Let’s say you have a team that owns a library that is used by ten other teams. The library team has their own goals, their own deadlines, and their own priorities. They are focused on what they have to do to accomplish those. If they are allowed to ship breaking changes to their customers every day and they experience no consequences themselves for doing so, then they will ship breaking changes every day, because that’s the most efficient and effective way to accomplish their goals.
This is an interesting idea. It feels obvious, but I've never considered this.
I use so much software where I feel like the developers must never use the software at all, as workflows that would be obvious to any user are buggy or convoluted.
>If you are a library team and you want to make a breaking change, you have to refactor the codebases of all your customers in order to adopt the new, breaking change. If you are a cybersecurity team and you want a new control to be implemented on all the codebases at the company, you have to install it yourself in all those codebases and make sure it works.
I think this is an interesting idea, but I'm skeptical that it's the full solution.
It seems hard to put into practice outside of very tiny and very huge orgs.
The author led the Code Health team at Google when I was there, so I saw this work, because Google invested a huge amount in tooling to make it possible. But Google still had a lot of problems with misaligned incentives between libraries and clients, and there was still a lot of pain around breaking changes.
In contrast, I worked at Microsoft for 3 years and never really worried about my dependencies breaking me. But there I also had to deal with way more legacy code and a fear of changing anything, because there were no tools for large-scale changes, and the automated testing at the time was very coarse-grained.
>I think in the future, websites will learn to serve pure markdown to these bots instead of blocking. That way, websites prevent bandwidth overages like in the article, while still informing LLMs about the services their website provides.
Why?
There's no value to the website for a bot scraping all of their content and then reselling it with no credit or payment to the original author.
Unless you're selling something. If you have articles praising your product/service/person and "comparison" articles of the "top 10 X 2025" variety (where your offering happens to be number one), you want the bots to find you.
The LLM SEO game has only just begun. Things will only go downhill from here.
OP in this case is by no means the original author. In the linked post, they mention they scrape third parties themselves. OP's bots might not be as sophisticated, but they're still "borrowing" others' content the same way.
ChatGPT and others have some sort of attribution, where they link to the original webpage. How or when they decide to attribute is unclear. But websites are starting to pay attention to GEO (generative engine optimization) so that their brand isn’t entirely ignored by ChatGPT and others.
I do agree that LLM-as-search is likely going to become more and more prevalent as inference gets cheaper and faster, and people don't care too much about 'minor' hallucinations.
What I don't see, however, is any way this new way of searching will give back. There's some hand-waving argument about links, but the entire value proposition of an LLM is that you DON'T need to go to the source content.
>I tried to explain how review workflows work in a PR-based setup, and gave concrete suggestions for how we could improve the process. But they didn’t want to try it. That might sound like a small thing, but at that stage, all I wanted was a smooth and efficient collaboration process. The easier it was for me to track changes, the better.
I'm surprised the copy editor was more comfortable using git than using a web-based review tool to leave comments, especially given that she was reviewing a Go book and didn't seem to know what Go was.
How does that even happen? It seems bizarre that Manning had this copy editor at all.
I recently had a negative experience with Manning. I sent them an email saying that I'm in the process of writing a book, and I'm self-publishing it, but I was curious about the possibility of applying to Manning for a second edition. I asked whether they accept second editions after a self-published first edition and what document formats they accept.
I got back a form letter telling me that they'd rejected my proposal. When I followed up and said I didn't send a proposal but was asking preliminary questions, they told me that they understood I hadn't sent a proposal, but they were going off of the table of contents on my book's website. I guess they decided to pre-emptively reject me?
They also named only Google Docs as an accepted document format, but based on this blog post, they clearly accept AsciiDoc.
I must compliment your ability to keep the reader hooked. I had to see which chapters they saw, stalked your website, and ended up reading the whole post about your pre-sale.
This is pretty off-topic, but did you test how your book works on an e-reader? I checked a sample chapter, and there were a lot of pictures and colors used to distinguish information; that will probably not work very well on my Kindle.
The first few chapters, I've been primarily targeting web and not testing on e-readers. I figured that until I knew whether people actually wanted to read it, I should just focus on making the web excerpts look decent and try to avoid over-optimizing for web.
Now that the book is officially a go, the PDF version is a first-class citizen, and I'll be testing e-reader experience on my rm2.
That kinda blew my mind too; I'd have expected the complaint to be about needing to use some online review tool. Editing the AsciiDoc source directly sounds archaic, and I would have expected the authoring/editing world to have had 'code review'-style software years before the software development world did.
I mean, all mainstream word processing applications have a 'commentary' / 'review' mode where someone can leave comments and suggest edits.
I think this is a fun thing for TigerBeetle to do, but I'm pretty skeptical that this was a good engineering decision.
And it's fine to make non-optimal engineering decisions for fun, but the top reason in the article should be, "Because we thought it would be fun to code a docs site from scratch."
This post reminds me a lot of an article I read on HN about a year ago and can't find now, but the author was explaining how so many organizations end up investing humongous amounts of effort rolling their own databases from scratch because none of the off-the-shelf solutions meet all their requirements. But in most of these cases, it's because some of the "requirements" were actually "nice-to-haves" and they could have gotten by fine with an off-the-shelf database, but they talked themselves into building one from scratch.
A lot of the justifications here feel pretty weak:
- Didn't want to use a complicated React app? Use Hugo or Pelican or Eleventy.
- Wanted nice reading experience? Replace the default CSS in any SSG.
- Want a nice search experience? Theirs looks good, but it's probably also achievable with off-the-shelf SSGs, and it probably wasn't worth rolling their own docs site from scratch.
>We employed a Content Security Policy to prevent Cross Site Scripting (XSS) as defense-in-depth, in case a seemingly friendly PR contains some innocent looking MathML. This MathML could contain obfuscated code that would run in the user’s browser. CSP prevents any unwanted inline scripts from running and keeps our users safer.
This was the silliest reason of all. Who's XSS'ing a docs site through obfuscated markup contributions? That sounds pretty difficult to achieve in the first place, and then what's the reward for achieving XSS on TigerBeetle's docs site? There's no valuable data there. At worst, you'd mine tiny amounts of crypto in a service worker. But also, you can mitigate this risk in lots of ways that don't involve rolling your own docs site.
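For reference, the kind of CSP they describe is just a response header that any host or CDN can set, regardless of how the site is generated. A minimal sketch of a policy that blocks inline scripts (their actual policy isn't in the post, so this is purely illustrative):

Content-Security-Policy: default-src 'self'; script-src 'self'

Because 'unsafe-inline' is absent from script-src, inline <script> tags and inline event handlers won't execute, which is the protection they describe, and you get it with Docusaurus or Hugo just the same.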
We didn't design our docs because it was "a fun thing" (as suggested), but because we simply care deeply about the experience of developers reading our docs. Performance and offline use, for example, were further reasons we gave in the post.
We have a high bar for taking on dependencies. We don't take on dependencies automatically without justification. It's just not a philosophy that we share, to assume or to insist that everything needs to be a dependency.
(The discussion on CSP in our post was also not given as motivation, but as an example of the thought process that went into this. Again, as per the post, it's a matter of defense-in-depth. We have plans for how our docs will be used in future, that you may not be aware of, and security is important.)
Finally, we're happy with the result, the project was small and didn't take long. We're used to "painting" things like this fairly quickly. It's just easier for us than trying to "sculpt" off the shelf dependencies. That's not to suggest that everyone needs to paint like we do at TigerBeetle, but it's equally true that not everyone needs to sculpt either. [1]
For context, I like TigerBeetle, and I respect the team. I'm not trying to take cheap shots but rather to disagree respectfully.
>We didn't design our docs because it was "a fun thing" (as suggested), but because we simply care deeply about the experience of developers reading our docs. Performance and offline use, for example, were further reasons we gave in the post.
To me, this still sounds like "for fun."
The blog post just talks about performance and offline use, but "maximize performance" isn't a real goal. You can invest infinite hours improving performance, so it comes down to how many engineering hours you're willing to trade in exchange for improving some set of performance metrics.
Maybe the issue is that the blog post doesn't present the decision-making process well? Because the critical questions I don't see addressed are:
- What were the performance metrics that were critical to achieve?
- What alternative solutions were considered beyond Docusaurus?
- How do the alternatives perform on the critical metrics?
- How does the home-rolled solution perform on TigerBeetle's critical metrics?
In the absence of those considerations, it feels like the dominant factor is that it's more pleasant to work with greenfield, home-baked code than off-the-shelf code, even if the existing code could achieve the same thing in fewer engineering hours.
And to be fair, we did present the metrics (footprint etc.), and we did discuss alternatives to Docusaurus (e.g. Zine, which is pretty great!).
I think at the heart of your argument is the assumption that unquestioningly taking on dependencies would achieve the same quality in less time, and that a methodology such as TigerStyle, which challenges this assumption, must necessarily take "infinite time". You almost force us to apologize that we don't share this view! :)
But again, this was the quickest, highest quality path (for us, at least!).
Have you read TigerStyle, our engineering methodology? And have you watched our talk? Perhaps that will help close the gap in understanding how we think about engineering at TigerBeetle: not as an expense to be minimized, but as an asset to be invested in, since we build it once, but developers enjoy it many times over. And as you watch TigerStyle, you'll see it's not only about quality, but also a way to get quality in less time (go slow to go fast).
In other words, I think we differ when it comes to Total Cost of Ownership. We're not trying to minimize only our own development time, but investing in it, to produce something quality for our community, and so minimize the Total Cost of Ownership across the relationship as a whole (ours + community) [1].
To evaluate this as you are describing, you must reveal your estimate of the workload of what TigerBeetle has done to roll their own docs. If it took them 5 minutes, for instance, the calculus is far different than if it took 5 years. Plus, you must compare that time estimate to their other priorities to estimate the opportunity cost, something that you simply cannot do accurately from the outside looking in.
And we must estimate the potential future value of what TigerBeetle has done here. I value "no dependencies" pretty deeply, and I can see how TigerBeetle values it supremely. I don't see how you can hand-wave it away so easily.
To assert that you don't take TigerBeetle at their word here is deeply disrespectful, imo.
>To evaluate this as you are describing, you must reveal your estimate of the workload of what TigerBeetle has done to roll their own docs. If it took them 5 minutes, for instance, the calculus is far different than if it took 5 years. Plus, you must compare that time estimate to their other priorities to estimate the opportunity cost, something that you simply cannot do accurately from the outside looking in.
I don't need them to reveal their numbers to me to offer my critique, as I think few people would argue that the upfront cost of rolling your own docs site could possibly be lower than the cost of deploying an off-the-shelf solution like Hugo.
I think where reasonable people might disagree is about the total cost of ownership of Hugo vs. the home-rolled solution over five years, but I'd find it surprising if the home-rolled solution wins.
>To assert that you don't take TigerBeetle at their word here is deeply disrespectful, imo.
Where did I say that I doubt TigerBeetle's claims? I disagree with the justifications in the blog post, but it's a difference of opinion, not a question of facts.
They published this blog post, and this is HN, so I think it's well within the community standards to offer a respectful critique.
> I think few people would argue that the upfront cost of rolling your own docs site could possibly be lower than the cost of deploying an off-the-shelf solution like Hugo.
I'm not convinced. At some point, you will have to debug something weird in your docs system.
If you deploy Hugo, that means understanding Go. With Docusaurus, it's JavaScript, Node, and that entire ecosystem. With this, it's Zig all the way down.
Zig users tend to be (possibly notoriously) anti-dependency.
For a docs site with no special requirements, I'd be surprised if Hugo or another SSG can't do what they need out of the box. So, it's the cost of implementing your own SSG vs. the cost of figuring out how to use an existing one.
Also, just as a datapoint, I've been using Hugo on multiple sites for about five years, and I don't recall ever having to drop into Go to fix an issue. Hugo might be unique in this regard, as it ships as a single-file binary. You have to learn Go templates, but you don't have to learn anything about Go the language or standard library.
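To make that concrete, here's roughly what a Hugo list template looks like, a sketch using standard Hugo page variables like .Pages, .Title, and .RelPermalink. It's all Go template syntax, but no Go code:

<ul>
{{ range .Pages }}
  <li><a href="{{ .RelPermalink }}">{{ .Title }}</a></li>
{{ end }}
</ul>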
Before Hugo, I used Jekyll, and I don't recall ever having to learn Ruby, but I did have to work within the Ruby ecosystem because Jekyll required a Ruby environment.
Incidentally, TigerBeetle seems to have rolled their own rudimentary templating language, too.[0] I think that has the potential either to limit the functionality they can support or to cause a lot of bugs.
> Zig users tend to be (possibly notoriously) anti-dependency.
I don't get being anti-dependency, to be honest. Say you want regex support in your database. You code a regex parser from scratch? What are the odds your parser doesn't have a vulnerability?
I think overzealous dependency usage is also bad, but it cuts both ways.
>To be honest, the hard part of static site generation is parsing the Markdown, since Markdown is a complex language. Everything around it is simple scripting, which we can easily do ourselves.
I would think 95%+ of the work would be in the pandoc part (the Markdown parsing) if everything were from scratch. And they would have used Zine if it had supported the features they wanted.
For larger projects, mostly DB selection, I completely agree with what you said. But for an SSG, especially when there's similar OSS like Zine available, I think they are fine.
Although I do wish Zine had all the improvements TigerBeetle wanted, so at least the Zig community could all use one rather than rolling their own.
>I think this is a fun thing for TigerBeetle to do, but I'm pretty skeptical that this was a good engineering decision.
Ha, yeah, as a software engineering manager of 8 years, I'll agree that "fun" is not a good initial look for a new project, sadly; the best engineering decisions are boring far more often than not.
After years of insisting on picking boring options, I realized working like that was a buzzkill long-term for my reports, so I tried to relax and figure out how to have fun projects too. Give people with ideas space to run. My deal now is: the tighter the blast radius of the project you can give me, the more I'm OK with you going nuts.
Documentation is a great place for fun, low-blast-radius projects, so I totally get TB on this one!
Some other rules I give up front for project proposals. Hopefully the theme of blast radius control is charmingly obvious:
- No new languages. (I have had professional arguments over this)
- No fun projects that require ongoing labor/upkeep.
- No fun projects in stateful storage infrastructure. (I have had distressingly passionate professional arguments over this)
- No fun projects that involve new SaaS / hosting providers that can't be trivially cut loose or cost > $50-100/mo.
- Fun projects in generally persistent infrastructure need solid justification.
- Fun design system / UI infrastructure projects must be able to be gracefully incrementally adopted, or scoped tightly.
> how so many organizations end up investing humongous amounts of effort rolling their own databases from scratch because none of the off-the-shelf solutions meet all their requirements. But in most of these cases, it's because some of the "requirements" were actually "nice-to-haves" and they could have gotten by fine with an off-the-shelf database, but they talked themselves into building one from scratch.
I love the term "arbitrary uniqueness" for this too. Like how different are your needs, really?
This is one of the best articles about running a bootstrapped business that I've ever read.
These are all great tips that obviously come from years of hard work and introspection.
> When I started, I integrated with standard SaaS product analytics software that most big SaaS products use. They tend to have features like session recording, where you can see exactly where their mouse moves in your product, and funnel tracking for working out how many users make it the whole way through from landing page to using your product.
I had the same experience. When I started out, I'd see people talk about complicated views in their analytics with cohort analysis and A/B testing. I'd think those people were succeeding because of their analytics, so I kept trying to build complicated views in Google Analytics or investigate expensive alternative analytics platforms. And I eventually just landed on going even simpler than Google Analytics and not checking it unless I had a specific question I wanted to answer.
> People will suggest you should build particular features to improve your product. They'll never use those features.
I've experienced this as well. Early on, prospective customers would tell me that they'd definitely buy if I had X feature, and I'd spend a week implementing it, and then the customer would disappear or say they couldn't purchase for some other reason.
> When a user signs up for OnlineOrNot, I have an automated email going out asking what brought them to sign up today. I explicitly tell them I read and reply to every email. This is the main source of my insight for building product.
I like this a lot. The main competitive advantage indie founders have is a personal touch and direct access to the founder.
I think too many indie founders over-automate and over-outsource their customer interactions. It always drives me crazy when I use a product from an indie founder, and I reach out with feedback and the response is just a generic, outsourced customer service rep who says, "Thank you for your feedback. I'll pass it along to the team."
> Tracking your MRR is a crap way to measure how you're doing as a business... Find another success metric to figure out if people are actually using your product, and whether it's bringing them value. Things like number of images generated, or number of form completions, for example.
I agree, but I'll add the caveat that the other metric should be as proximate to revenue as you can get.
Early on, I made the mistake of measuring success based on things like social media followers or SEO rank, even though those things didn't directly translate into revenue. I felt like I was succeeding, but I eventually realized I was pursuing metrics that were too loosely related to revenue.
Also worth noting that I spent about 5 years trying other projects (from writing books to coding add-ons and building SaaS products) before landing on this one.
I think SaaS might be one of the hardest ways to make money on the Internet, but I'm patient.
[0] https://console.cloud.google.com/bigquery?ws=!1m5!1m4!4m3!1s...