Ask HN: What's a build vs. buy decision that you got wrong?
326 points by kiernanmcgowan on Dec 28, 2022 | 272 comments
What are services/products that you built and wished you had bought?

Have you bought something that you had to scrap and build yourself anyways?




> Have you bought something that you had to scrap and build yourself anyways?

Yes, but that hasn't meant it was a mistake -- 'buy-then-build' can be a great strategy. Often the 'then-build' never happens, but by going into a decision ready for it, you can learn from existing products, hit their limits, and understand what the custom version you'll need in your context looks like. My recent examples are smaller in scale, though: using a library that speeds up early work, hitting its limits, then replacing or extending it with a DIY version that does fewer things but goes deeper for my use case.

> What are services/products that you built and wished you had bought?

The most annoying recurring version of this has been being just a little too early - building something, then discovering a few months or years later a public product that does the same thing, but better. At that stage, rebuilding around it has low ROI, and one ends up maintaining a legacy monster. There was a period when public offerings for supporting backend infra were maturing, i.e. things like secrets/configuration management, logging, observability, monitoring, A/B tests, and a bit earlier even basic web frameworks (building anything on PHP before Laravel came out meant you built your framework first; IIRC worse than the frontend framework landscape in 2022).


I 100% agree with your second point. Case in point: my previous employer built an in-house development platform to create ephemeral environments and run deployment pipelines for production. At the time, around 2016, there weren't many (or any? I wasn't part of the research team) options to buy this type of software. Today, there are more than a handful in this space.

I wonder at what point companies will finally bite the bullet and swap to buying? Maybe by cataloging the feature set of the internal software, seeing what is a necessity, and seeing which vendors can cover those requirements?


> Today, there are more than a handful in this space.

Really? I was a DevOps engineer a few years ago and can't really imagine a useful tool in that category, so I'd appreciate a few names I could check out. While I've switched to frontend development, I'm still interested in CI/CD and just can't imagine any tooling being able to make it easier than what we had back then (ephemeral envs using docker-swarm/k8s and pipeline triggers, basically).


https://releasehub.com/ (Disclaimer, first listed because I work here)

https://www.quali.com/

https://www.withcoherence.com/

https://webapp.io/

https://www.uffizzi.com/

https://www.qovery.com/

https://www.devzero.io/

These exist in the B2B space (and this isn't an exhaustive list); there are a ton more in the B2C space.


Vercel, Netlify, AWS Amplify, and Google App Engine come to mind. Not sure if they're strictly better than the setup you describe, but easier to get started.


… interesting.

So we went the "build-then-buy" route with Netlify. AFAICT in hindsight, what we built was more stable than Netlify, and had the same interface for devs, basically. (… and might have been operationally cheaper.)

(It's all moved to an -and-buy-another-different-thing-for-no-real-reason-other-than-different-people, now.)


Infrastructure automation tools like Pulumi or AWS CDK make this relatively straightforward... just deploy a new stack (or use the APIs to automate a deploy/test/destroy cycle.)

It's even possible with good old Terraform, used correctly.


Time travel, time travel, wherefore art thou time travel.

When it comes to build versus buy, there are only right answers in hindsight.

>> Maybe cataloging the feature set of the internal software, seeing what is a necessity, and seeing which vendors can cover those requirements?

Feature set is part of it. Longevity is another. (Been relying on Google Reader lately?)

Cost is part of it (internal is usually more expensive, but not always).

Flexibility is another - building up the knowledge and expertise helps morph the system to your needs, not the other way around.

I'd say it's mostly better to buy - you'll likely get much more for much less - but it comes at a cost beyond just money. Beware of things on your critical path that leave you beholden to outside "partners".

All that said, I think developers err on the build side way more than they should.


Can you list some of the solutions that are out there now? I'm considering using one of them so I can have preview environments!


I replied to a sibling comment with a list of a few. https://news.ycombinator.com/item?id=34166459


Workflow engines. Every company develops one based on queues or some form of async messaging. Works great when prototyping and for your initial customer base. Works less great as you grow, add more complicated features, and realize you didn't have the distributed systems expertise to write this thing to begin with. It doesn't handle any of the common edge cases, and is increasingly painful to operate, needing constant babysitting.

Use Temporal, Step Functions, something off the shelf, and try to resist this urge.


"Workflow engines" and their close cousins "DSLs" are the ultimate newbie trap. In theory, it's awesome for everyone: programmers get to work on interesting, abstract problems like distributed systems, syntax parsing, event-based architectures, and "business users" get to make changes to "business rules" without bothering developers or impacting feature roadmaps. Win-win, right?

In reality you just end up making a shitty, nerfed version of a programming language, that business users can't understand, because you still have to understand conditional logic to model workflows, oh and your documentation is terrible because devs don't bother with the boring stuff. Most of the time the devs end up implementing the workflows anyway because they don't actually work properly.

If you really need a workflow engine definitely use something off-the-shelf, but I would go so far as to say that in the 95% case, you don't even need a workflow engine: you need a developer who is capable of writing some python scripts. Even if you pay a developer a full salary to do nothing but sit around and make changes to Python scripts on-demand, that's still going to be way cheaper than the complicated workflow engine solution, which will probably require a team (or multiple teams) to maintain.
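To make the "some python scripts" claim concrete, here's a sketch of what that 95% case can look like: plain sequential code plus a small retry helper. The step functions and data are invented placeholders, not any real system's API.

```python
import time

def with_retries(step, attempts=3, delay=0.0):
    """Run one step, retrying on failure. This is the whole 'engine'."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)

# Placeholder steps standing in for real work (names are made up).
def fetch_order():
    return {"id": 42, "total": 99.0}

def charge(order):
    return {"order": order["id"], "status": "charged"}

def notify(receipt):
    return "emailed receipt for order %d" % receipt["order"]

# The workflow itself is just sequential code: easy to read, debug,
# and change on demand when the business rules change.
order = with_retries(fetch_order)
receipt = with_retries(lambda: charge(order))
result = with_retries(lambda: notify(receipt))
print(result)
```

Anyone who can read top-to-bottom code can follow the "workflow"; there's no DSL to document or engine to operate.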


This reminded me of the time I wrote a mini-language for querying web server log files and generating reports (this was early 2000s) so you could answer questions like "how many users searched for 'cyprus flight' per day and went on to purchase a holiday?". With a nice web-based interface etc.

It was a disaster of course. The business people weren't programmers, had no understanding of programming, didn't want to learn programming even greatly simplified with docs and examples, and I ended up translating their queries into code anyway. So essentially I had written a DSL for myself.


With some tweaks to your second paragraph, the description fits CMake:

In reality you just end up [with] a shitty, nerfed version of a programming language, that [...] users can't understand, because you still have to understand [CMake's] logic to [utilize its magic behavior], oh and [the] documentation is terrible because devs don't bother with the boring stuff. Most of the time [you] end up implementing [workarounds] anyway because [CMake doesn't] actually work properly.


While CMake has its large share of idiosyncrasies, a build system is much more graph-oriented and must therefore at some point have a way to declare such dependencies, often using a DSL, for concurrency and incremental builds to be reliable. Let's not forget IDEs that need to understand the project structure as well.

Workflows on the other hand are more conditional and imperative, mapping better to normal programming languages, with the exception of transient error handling, long timers, and distributing workload. Here, writing a custom DSL would be a much bigger mismatch.

Best for build configuration would be a hybrid, where dependencies are declared in a normal programming language, which at the end calls a build(dep_tree) function where all the magic happens. There's the risk of developers abusing this setup step and running half the build in their own reinvented imperative flow instead... trust me, I've seen this happen even in makefiles that run shell commands outside of targets. This is what SCons tries to be; it seems, however, not to be very popular compared to CMake.


Isn't SCons abandonware? I remember it being part of the build setup at work at some point and it caused quite a bit of frustration because it broke every now and then. I believe it was mainly failing on Windows.

Conan's recent development is promising and it gives you the full power of Python. It can also be used declaratively but with limitations. If I remember correctly, there are discussions in the C++ world about a declarative exchange format for dependencies and build information, but it's in the early stages. It's not trivial because there are also C++ modules now.


It will work that way with anything :)

Especially gradle.


> the complicated workflow engine solution ... will probably require a team (or multiple teams) to maintain

You can get these "as a service" which might scratch the itch for some.

(Disclaimer: I work for a company that sells Airflow-as-a-service and adjacent consulting)


Low level infrastructure of any kind is the worst.

Programmers love to work on it, so there's never a shortage of good implementations; companies need it, so some of them get proper funding and teams. And yet it's really hard, so there's no way your DIY solution will have the features of the big ones, and it probably won't have the reliability.

And worse, you'll probably need to learn and support the existing thing anyway, why not just skip to it?


You're probably thinking of a different type of "workflow engine" than what Temporal is. It's not something where business people drag boxes in a GUI. It's still all code, within your code base, only with different approach to handling long running (whatever that means in your context - 2 minutes or 2 years) tasks.


I agree about the limitations of DSLs.

Temporal is a workflow engine that doesn't use a DSL or nerfed version of a programming language. It runs your arbitrary code with any of the supported runtimes (currently Go/Java/Node/Python).

I'm able to write, deploy, and maintain workflow code by myself and use their cloud service for persistence and admin UI.


Damn, this hits the nail right on the head.


>"Workflow engines"

I actually had quite the opposite experience in this particular area. At the time, we were developing a product for a client company. The product would have greatly benefited from using a workflow engine. I did some shopping around, talked to sales reps, and discovered that we would have had to shell out at least $350K for our particular case. So I proposed to the boss that I would quickly build one that covered the basics. The boss agreed, and I built it in about a month. It worked fine for what it was intended for.

After a while we approached a vendor (the one we would have paid that $350K). We showed them what we had built and how it was used. They were impressed enough that we became sales and implementation partners, and they routed gobs of jobs, training, and installations our way.

As for the original home-built engine - over time we replaced it with the one from our partner without much trouble.

Win-win for everyone involved.


I had a very similar experience. We built a custom workflow engine, with a visual designer, in a few weeks, which then shipped with our custom DMS solution. Any workflow engine we tested, before or after the build, was either too complex or too expensive. We had a few bugs and newbie mistakes in ours, but nothing too bad. We quickly sold a few more installations because we could implement any customer need promptly. It's now been more than 10 years and I've moved on, but the company is still selling the engine with other solutions. I would say, a big win for build vs. buy.


Temporal has really solved so many problems for us. It is only opinionated about a few things that actually matter, and gives you complete flexibility otherwise.

The days of Airflow and similar seem like a stone age in comparison.


Major benefit of Airflow is the number of already implemented integrations. Importing data from GCS to BigQuery, copying data from Postgres to GCS, KubernetesPodOperator and so on. IIUC with Temporal you get only workflow management which can be easily integrated with any application to implement business logic. And this is great, because implementing business workflow in Airflow is even more awful than the Airflow itself. But for any ETL or plumbing job Airflow is IMO better due to existing integrations.


You are correct, that's the main difference. I wrote some more on the topic of data workflow engines like Airflow vs general-purpose application development workflow engines like Temporal here: https://community.temporal.io/t/what-are-the-pros-and-cons-o...


Is Temporal meant to be an Airflow replacement? The website exclusively offers examples of executing multi-step core business logic and not ETL workflows.


It’s for whatever your code can do, same as Airflow.

One of the great things a Temporal workflow can do is send or wait around for signals from external processes, indefinitely if needed. It's much easier to start orchestrating things you already have; you don't really need to buy into it as much as you do with Airflow. If it exceeds retries or timeouts, it can send a signal or launch a process to notify a human that something needs fixing, then someone can intervene and notify the workflow that it can keep going. Airflow is much more all-or-nothing success or failure in my experience. Very hard to re-enter the workflow after something got twisted.

Certainly Airflow has more ETL integrations at this point in part due to how much longer it’s been around and the use cases it’s been evangelized for.

I've never worked anywhere that had much investment in the ETL integrations; we used dockerized processes and just the Docker operator, as it was easier to develop and test independent of an Airflow instance.


So, when I looked at Airflow some time ago, it seemed good at 'fixed workflows': something like fetching the last day's data from a website, processing it, then loading it into the DB.

But it seemed bad at more flexible ones, like loading data, then processing each entry in a certain way, then triggering a new workflow based on each entry (like sending an email for every entry in the fetched data based on some condition).

Does Temporal do this?


Yes, Temporal workflows are as dynamic as needed.

The other useful pattern is always running workflows that can be used to model lifecycle of various entities. For example you can have an always running workflow per customer which would manage its service subscription and other customer related features.


Workflow engines are the most difficult thing to sell in organisations, as the use cases are open-ended. Organisations need a certain level of maturity to understand that they need one.


People assume all workflow engines need all the complicated document logic and business-aware routing.

We built one for a medium sized business without any of that. It's essentially a form system where the users can select the next destination for the form's approval on their own. Users understand the business process and are responsible for implementing it in any other context, turns out, they are capable of managing it in an online forms system as well.

Then all you really need is an auditing system that tracks all the states the document has moved through and displays that to users who are making decisions based on the form and that state. Add "final approval" and "return for revision" and "recall" states and you're pretty much set.

No business specific logic. No need to keep the system configuration in sync with the organizational chart. No need to build "vacation delegation" or "user impersonation" features. You just need to keep the forms up to date with the business use cases, the users will manage everything else on their own.

Our system has been in place for around 6 years now. We do maybe two form updates a year. We have not changed the backend code or system logic since it was deployed. The only other support issues we have to deal with are when the LDAP integration configuration needs to be updated.
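As a sketch of this routing model (class, form, and user names are my own invention, not the commenter's system): the engine holds no business rules at all; users pick each next destination themselves, and the system only records the audit trail of states, restarting the trail on a return, as described above.

```python
class RoutedForm:
    """Minimal sketch: users choose where the form goes next; the
    system only records an audit trail of the states it moved through."""

    def __init__(self, form_id, author):
        self.form_id = form_id
        self.trail = [("created", author)]
        self.closed = False

    def route_to(self, destination):
        # The user, not the system, decides the next destination.
        if self.closed:
            raise ValueError("form already finalized")
        self.trail.append(("routed", destination))

    def return_for_revision(self, user):
        # Returning (or recalling) restarts the audit trail.
        self.trail = [("created", user)]
        self.closed = False

    def final_approval(self, approver):
        self.trail.append(("approved", approver))
        self.closed = True

# Hypothetical usage: alice drafts a form and routes it to bob.
form = RoutedForm("PO-17", "alice")
form.route_to("bob")
form.final_approval("bob")
print(form.trail)
```

No org chart, delegation rules, or routing logic live in the code; keeping the forms themselves current is the only maintenance left.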


In a way your solution sounds like JIRA tickets. Not criticising, but a lot of upper middle management want guard rails.


To a certain extent, that was certainly the idea, the main difference would be that once a flow is started it is not editable. It has to be returned or recalled to be changed, and then the audit trail starts over again.

That's also a valid criticism. Especially if you're expecting a lot of automated processes to be kicked off once an appropriate approval chain exists. This really is only well suited to businesses where there are limited opportunities for automation. In this particular case, once they recognized that all of the terminal business process steps are mostly manual anyways, they understood the utility of something so simple.. and cheap.


I'm glad you got a solution implemented out of it. Nice to see reason win the day.


Nah, it sounds like Lotus Notes.


That and a lot of engineers are genuinely excited to build their own workflow engine - whether they call it that or not - because it’s complicated and it feels like they just discovered a brand new and powerful abstraction.

Tears follow when the team either doesn’t account for all the edge cases or doesn’t have the resources to address them.


Absolutely agree with you.


What exactly is a workflow engine?

At a previous job, we had a fair amount of celery tasks and logic around starting them based on user input or on a schedule, retrying on failures and marking progress or cleaning up state in various databases.

Is that a workflow engine?


Sure.

Open source analogue would be Apache Airflow.

Abstractly, it's some directed acyclic graph (DAG) that is asynchronously computed, sometimes on a schedule.

Unfortunately, most things fall under "DAG". But the framework/engine exists to manage the complexity of the ever-extending pipelines declared by the engineers.

Event/push-based workflows also fall under this taxonomy.
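A toy illustration of that abstraction, using Python's stdlib topological sorter (the task names are invented; a real engine adds persistence, retries, scheduling, and concurrency on top of this core):

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on: a small DAG.
deps = {
    "extract": set(),
    "clean":   {"extract"},
    "enrich":  {"extract"},
    "load":    {"clean", "enrich"},
}

results = {}

def run(task):
    # A real engine would execute independent tasks concurrently and
    # persist outputs; here a task just records its (sorted) inputs.
    results[task] = sorted(deps[task])

# static_order() yields each task only after all of its dependencies,
# which is exactly the scheduling guarantee a workflow engine provides.
for task in TopologicalSorter(deps).static_order():
    run(task)

print(list(results))
```

The engine's value is everything around this loop - distribution, failure handling, observability - which is why the DIY versions grow painful.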


Why acyclic? Review steps can send stuff back to previous input steps, can't they?


Yeah, I mean, the computation can be re-materialized.

DAG may be too specific. It's really a dependency graph, which likely has a DAG topology.


It is usually better to have explicit returns.


What makes Temporal a "workflow engine" rather than a background job runner? I think I used Active Job or its like a career ago in Rails. The docs on Temporal are showing me retries and storing results of the job. That does seem useful!


Temporal jobs/tasks are called workflows because the code is effectively translated into workflow steps—i.e. code progress is made by workers, and each step is persisted so that if the worker dies, you don't lose anything—another worker picks up in the exact same place in the code, with all local variables and threads intact. It also provides helpful built-in features like retries and timeouts of anything that might fail, like network requests, and the ability to "sleep" for arbitrary periods without consuming threads/resources (other than a timer in a database that is automatically set up for you when you call `sleep('1 month')`).
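The "persist each step, replay on restart" mechanism can be sketched in a few lines. This is a conceptual illustration of durable execution, not Temporal's actual implementation: completed steps are appended to a history, and a restarted worker replays recorded results instead of re-executing them.

```python
def durable_run(workflow_steps, history):
    """Run steps in order; results already in `history` are replayed
    rather than re-executed, so a new worker resumes mid-workflow."""
    results = []
    for i, step in enumerate(workflow_steps):
        if i < len(history):
            results.append(history[i])   # replay the recorded result
        else:
            value = step()
            history.append(value)        # persist before moving on
            results.append(value)
    return results

calls = []  # tracks which steps actually executed
steps = [
    lambda: calls.append("a") or "charged",
    lambda: calls.append("b") or "emailed",
]

history = []
durable_run(steps[:1], history)    # "worker dies" after the first step
out = durable_run(steps, history)  # a new worker resumes; step 1 replays
print(out, calls)
```

Step "a" runs only once even though the workflow was executed twice; Temporal applies the same idea to real code, including local variables and timers, via deterministic replay.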


This comment is just too real and too precise.


I built an entire custom e-commerce platform - product catalog, cart, checkout, CRM, order tracking, fulfilment management and back office. The site is a print on demand store, using contract printers. In hindsight I wish I had gone with either an off the shelf package or taken the punt on Shopify (this was 2012).

My assumptions around how much integration our custom product customiser/editor needed with the rest of the e-commerce platform were wrong. I thought I needed a user system and "saved designs" for the customers, but that's somewhat rarely used, and could have been bolted onto a standard system.

Maintaining and updating it is extra work over what the core business is, there is now a lot of custom code to fix old assumptions and implement features that we didn't previously expect. All of which come as standard with Shopify.

We also believe that customers are increasingly used to seeing the Shopify checkout, it is a reassuringly familiar experience. I suspect it has a measurable effect on dropouts.

If I was to start again now I would 100% just use Shopify, no question. We are considering a large project to move to it. It would be quite satisfying to delete all that code. But it would probably bring new problems, and things we are used to being able to customise that we will no longer be able to.

Do I regret doing it? No, not really; hindsight is 20/20. A lot of lessons were learnt, and that enabled us to build a successful business.


We are repeating this process because the Shopify API and library infrastructure make it difficult for us to manage our 30K products. It takes 5 hours to sync our catalogue via the Shopify REST API. The once-amazing Shopify API gem regularly breaks.

Additionally, the admin interface is slow and buggy, lacks the necessary information to effectively pick and pack orders at scale, and is unable to handle multiple kinds of inventories (local vs supplier inventory) and lead times.

Furthermore, it takes 47 clicks to ship a product (together with Australia Post), whereas our system can do it with one click. This saves us just under 2K hours per year. Our system also integrates with a robot[0], which can automate the shipping process further.

Our system removes the siloing of data, which is a huge problem when it comes to effective e-commerce customer support and follow-up service. As an example, our system tracks deliveries to customers, so we can touch base with them when an order is delivered, or reach out when an order is awaiting collection at Australia Post.

[0] https://raspberry.piaustralia.com.au/pages/the-raspberry-pi-...


Sounds neat. Tell us about the problems with it.


Could you please be more specific on "it"?


I did this same exercise circa 2002. At the time there really weren't any good options (at least none that I was aware of) so I was pretty much charting my own territory. It took the business well past $1,000,000 in revenue before they switched to some off the shelf package long after I had moved on. I guess it wasn't too bad for a comp-sci student with no e-commerce experience and little serious programming experience.


Too funny - did the same thing in 2009 for a print on demand greeting card site. Wonder how many of these are still around. Cardstore, Greeting Card Universe, SendOutCards, etc. There was a moment ~ 13-15 years ago...


ORMs. I didn’t “buy” Hibernate but we adopted it and the whole JavaEE thing wholesale. What a disaster. Learned a lot of lessons from that, #1 being, don’t use an abstraction layer (ORM) to abstract away another abstraction layer (SQL).

The performance of Hibernate relative to plain SQL was abominable, and this directly caused us to lose at least one contract. It turns out - on reflection, of course - that it’s not even theoretically possible to get into the same performance ballpark.

After years of doing battle with the tools, I eventually kicked them all out, decided to work hand-in-glove with the database, and suddenly things became both straightforward and performant. I now think that ORMs are a code smell.


I'm not sure I agree that ORMs are a code smell. I've used NHibernate (the C# equivalent to Hibernate) and more recently, Django ORM, both extensively. It's possible to get out the SQL generated by each, and 80% of the time, it's roughly what I'd write if I were writing SQL, and performs comparably.

The other 20% of the time there are performance differences, but only about half of those are big enough and in hot enough spots to warrant fixing. For those cases, some of them can be fixed by doing things a bit differently in the ORM, but often it's easiest to dip out of the ORM and write a little SQL. The mistake I see people making here is "fighting with the tools": if you find yourself fighting the ORM, just dip into SQL: the ORM doesn't make this hard to do.

What emerges from this strategy is that the ORM handles all your trivial cases, while you dip into SQL for your more complex queries.

It doesn't have to be all-or-nothing. The ORM can co-exist peacefully with a little SQL. If I had to choose "everything ORM" or "everything SQL", I'd certainly choose the latter, but we have other options.

All-SQL has its own problems. As your number of queries grows, you start to find duplication between the parts of the code that move tabular data into more user-palatable object forms. If you create abstractions to remove that duplication... slowly a half-baked in-house ORM emerges. Some projects aren't large enough or long-running enough for that to happen, but it isn't usually the small, short-term projects which have the non-trivial datasets that ORMs can't handle.

Some of our differing ORM experiences might come from the different ORMs: maybe Hibernate is just bad and you'd have had a better time with one of the ORMs I've used. I wouldn't know as I haven't used Hibernate.


I can’t comment on your experience either but I can say that IME Hibernate doesn’t play well with others. You end up with all these cache problems that you have to mop up, and interop either becomes super fragile or you disable the cache and the app becomes very slow.

Maybe that’s hibernate, maybe we were just Holding It Wrong, but whatever the reason - we spent a lot more time writing Hibernate code and a lot less time working on domain logic than I was comfortable with (the “blue flame” of a couple of recent comments on other posts)

For me the clincher came when I started using stored functions for business logic. There is so much less duplication (entity objects etc) and you can call the logic from any language, including the SQL CLI.

FWIW I am not at all opposed to libraries that help me work with SQL, but these days I want any tool I use to be very close to the database.


Type safe APIs for SQL are the happy medium.


What do you mean by this - do you have a specific example?


I think that he means generating application code from a database schema. A tool like Jooq (https://www.jooq.org/). If so, then I agree with him.


Libraries like Jooq and SQLDelight include what I'm talking about and then build on top of it with codegen which is even nicer since it adds compiler safety

But even without codegen you'd still get a much nicer interface than manually hacking together strings, as the Golang example others have linked shows: https://github.com/Masterminds/squirrel


Zapatos is a good example. https://jawj.github.io/zapatos/


golang https://github.com/Masterminds/squirrel

Constructing SQL by concatenating strings has a few issues: it's repetitive, it's hard to assemble certain queries conditionally, and at least in Golang it's easy to write code vulnerable to SQL injection; you can avoid that by using types.


I never use string concat to generate SQL in Go - isn’t it normal to use placeholders? ie,

    db.QueryRow("select $1", n)
Looking at squirrel, I really don’t see how this

    sql, args, err := sq.Insert("users").Columns("name", "age").
    Values("moe", 13).Values("larry", sq.Expr("? + 5", 12)).
    ToSql()
Is better than this

    sql == "INSERT INTO users (name,age) VALUES (?,?),(?,? + 5)"
That said, I will happily agree that that SQL statement composition is not the same as an ORM, and I can see the benefit of Squirrel for those rare times you do need to conditionally build SQL statements.
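For what it's worth, conditional composition with placeholders doesn't strictly need a library either; here's a sketch using Python's stdlib sqlite3 (table, columns, and data invented for illustration), where values always travel through placeholders even as the clause list varies:

```python
import sqlite3

def find_users(conn, name=None, min_age=None):
    """Build the WHERE clause conditionally while keeping all values
    in placeholders, so input is never concatenated into the SQL."""
    clauses, args = [], []
    if name is not None:
        clauses.append("name = ?")
        args.append(name)
    if min_age is not None:
        clauses.append("age >= ?")
        args.append(min_age)
    sql = "SELECT name, age FROM users"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, args).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("moe", 13), ("larry", 17)])
print(find_users(conn, min_age=15))
```

A builder like Squirrel earns its keep when the conditional combinations multiply; below that threshold, this pattern covers most needs.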


Not OP, but I would recommend Kysely as a great example. I’m on mobile so don’t have a link at hand sorry.


Sqlx for Rust type checks your queries at compile time.


What are the use cases for an ORM like Hibernate? I've worked in fairly complex domains, an I'd rather just use a lightweight library and write the SQL myself. Typing out SQL queries is easier than, for example, using the JPA mappings and Spring Data JPA query format.


That is the right approach. We were sucked in by the claims that Hibernate would make life “easier”, with the idea that we could represent everything in Java. This was very appealing.

I think another factor was that for a while, databases were seen as a necessary evil, rather than the beating heart of our application (see also: NoSQL). Working in a “real” language was seen as better than working with this messy data thing. The whole “impedance mismatch” nonsense.

(I’m not defending our decision, which in hindsight was wrong and stupid. just trying to explain it.)


Seconding: databases as an "implementation detail" is pretty wrong. If you're writing a basic CRUD app you can get away with it, and in fairness this is a lot of cases. But as soon as you start saying things like "how do we lock rows" or "let's put 10M rows in this table" you really need to write the SQL yourself, because at this point you don't have a software application, you have a data application.


I've always thought the caching Rails added was invaluable most of the time. Granted, I've also worked with plenty of people who don't even look at the query logs and wonder why their APIs are slow.


I didn't get it wrong, but the company I worked for did. I spent 2.5 years creating a sales tool that they dropped the same day I released it. The company went with Salesforce instead.

It was really disheartening to have 2.5 years of hard work dropped like that, but it was absolutely the right choice. They had designed it themselves and it was barely functional. They had a lot of upgrades planned to make it do what they really wanted.


That's just bad management. It didn't take them until the release date to realize that SF was the right choice. They just procrastinated until they couldn't anymore when you released your project.

I had the same thing happen at a previous job. My team built and executed and hit every objective and was ready to ship, and that's when they decided to inform us that they didn't want to ship what we built and wanted us to integrate our features into a product they had acquired. That decision could have been made 6 months or even a year earlier. Instead they procrastinated until the last moment. I decided to leave the company after that. Seeing 2 years of effort and good execution flushed down the toilet was very disheartening.


The last three years of my work will be permanently abandoned

https://news.ycombinator.com/item?id=33804293 by Eric Lippert last month


Oh! https://ericlippert.com/ now has 5 "Retrospective" posts up about https://beanmachine.org/ . Hmm, but no easy way to link to the set ("Uncategorized").


Can you tell me more about the Salesforce decision? Salesforce is one of those companies that I hear the name a lot, but seldom do I get to read insights about them.


Salesforce (SFDC) provides the customer relationship management (CRM) solution where your entire sales team will view prospects and do data entry on their status in the sales cycle. People build entire industry-specific solutions on top of its (IMO horrid) API and have been doing so since the early 2000s; they had an App Exchange long before Apple had a consumer-facing App Store: https://9to5mac.com/2011/08/26/salesforce-boss-tells-a-story...

It's absurdly complex and had a reputation for being annoyingly slow for any user interaction (though reportedly that's improved in the recent half-decade). If your needs are unusual (say, for a B2B2C multi-sided marketplace where dynamics are counterintuitive and your schema constantly changes) it may be difficult to iterate fast enough with Salesforce. But if you're doing any kind of B2B sales, and you want something that everyone knows, that will grow (however painfully) with your organization, it's the default choice.

Also worth noting that they acquired Slack last year for $27B, and overall have a $128B market cap.


I was in the same boat and finally did an implementation just recently. Salesforce is a crazy beast. At its heart it's really just a CRM: a tool for tracking and nurturing customer relationships. Different businesses have very different kinds of relationships with their customers (a giant retailer vs. someone selling industrial equipment to manufacturers), so making a tool to cover all those cases is hard. Salesforce to me really feels like a UI-on-a-database kind of tool surrounded by a load of data manipulation tools. It's also rife with DSLs (Apex, derived from Java, and two custom flavors of SQL).

The big thing about Salesforce is just how big it is. It integrates with everything, they have a robust app marketplace and dozens of integration partners, and they've acquired loads of complementary software companies like ExactTarget, Tableau, MuleSoft, Heroku and now Slack. It's expensive and has a ton of operational overhead, but everyone in that domain knows what it does and how to use it.


I was in a similar situation until the Retool blog enlightened me [0].

[0] What's Salesforce. https://retool.com/blog/salesforce-for-engineers/


I added some code for our Salesforce integration, but I never actually used it, nor did I see it used. So I don't have any insights there beyond what's already on their own page.


I have frequently found myself directed to build something when it clearly would have been better for the company to buy (or use SaaS). It seems like a lot of business owners and managers have some mental block that makes it hard for them to pay for that kind of thing directly, but easy for them to burn shitloads of developer, ops, and support hours doing the same thing in-house even when there's no strong business reason to do so. Totally bizarre.


Did you continue with this company?


I did, but I can't remember how much longer it was until they decided to stop giving me good raises, so I left them. I think I got 1 more good raise out of them after that, but then they hired a new CTO and that stopped.

I now think they were desperately trying to pump up their valuation so they could sell, since they did that less than a year after I left.


Not a decision I've got wrong, but I'd suggest authentication services are a common regret. Yes authentication is important to get right, but it's not _that_ complex. SaaS tools for auth are incredibly expensive for the feature set they offer, and are difficult to replace once embedded in your services. SSO in large scale business is a case where buy is the right option but I'd anticipate that for many simpler use cases teams have to work hard to unentangle themselves from Okta etc.


I work at Stytch, a company that provides auth services

> Yes authentication is important to get right, but it's not _that_ complex...

Though I agree in some cases, I think that authentication complexity is rapidly changing and providing a _good_ authentication flow is not straightforward.

Consumer adoption of Passkeys, biometrics on mobile, OIDC/OAuth etc is really starting to take off and that really complicates your login flow quickly.

The eng time to get auth done right (and, importantly, securely) is not trivial, nor is maintenance. Even companies whose core competency is security get hacked (LastPass just this last week); it's that much harder to stay on top of when it isn't your core business.

> SSO in large scale business is a case where buy is the right option...

100% agree; any team that I've talked to wants solid, off the shelf SSO to add into their product within a sprint and doesn't want to embark on untangling the SAML/OIDC knot.

> ...teams have to work hard to unentangle themselves from Okta etc.

Agreed, a huge complaint that I hear all the time. Okta/Auth0 have decided to take the interesting road of increasing cost per user as you scale rather than offering volume discounts.

Whenever you're considering SaaS, it is critical that you look at cost per user over time and make sure your contract scales with you instead of explodes when you cross a threshold.
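One way to do that check (illustrative numbers only, not any real vendor's pricing) is to model the contract's cost-per-user schedule across your growth projections and look for cliffs:

```python
def annual_cost(users: int, tiers: list[tuple[int, float]]) -> float:
    """Total annual cost under tiered per-user pricing.

    tiers: list of (user_threshold, price_per_user_per_year), ascending.
    The last tier you qualify for applies to ALL users -- cliff pricing,
    which is where the "explodes when you cross a threshold" surprise
    comes from.
    """
    rate = tiers[0][1]
    for threshold, price in tiers:
        if users >= threshold:
            rate = price
    return users * rate

# Hypothetical schedules: one where the rate rises as you scale
# (the Okta/Auth0 pattern described above) and one with volume discounts.
escalating = [(0, 2.00), (10_000, 3.00)]
discounted = [(0, 2.00), (10_000, 1.50)]

for users in (9_000, 11_000):
    print(users, annual_cost(users, escalating), annual_cost(users, discounted))
```

Crossing the 10k-user threshold under the escalating schedule nearly doubles the bill for a 22% increase in users; that's the contract shape to negotiate away before signing.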


I'm looking forward to your SSO product. I hope that it adopts the same developer friendly pricing as your current product.


Authentication is a nightmare web of complexity to me that takes me away from the differentiators to my product that provide value. If I spend all that time and get it right, I might save $10 a month in a cloud bill. If I get it wrong, terrible, terrible things happen.


You can certainly pay $10 and still have terrible terrible things happen.


It’s hard to get right. In my experience plenty of developers fool themselves into thinking it’s not that hard. The problem is they’ve never read a NIST standard or spent time learning state-of-the-art black hat techniques. If you don’t keep up, you’ll implement something outdated or broken and not even know it.


Authentication is (usually) not that complex, but identity and authorization are. I've seen plenty of institutional regret when each application has its own pool of users and an internal authZ system.


Authorization is one of these areas that has traditionally been viewed as too hard to extract from the application, but it's core to no one's business and in recent years lots of companies have started to use authorization products. I'd chalk this up to:

1) Better abstractions for disentangling authorization

2) Better technical literature on the subject [1][2]

3) Increasing comfort with third-party infra services (RDS, LaunchDarkly, etc.)

Note: I'm cofounder of an authorization-as-a-service company (Oso) [3]

[1] https://www.osohq.com/academy

[2] https://research.google/pubs/pub48190/

[3] https://www.osohq.com/


I feel like authentication itself is probably straightforward, but there's a lot of boilerplate for account management that these services provide. Creating a new account, resetting a password, etc.

I'm thinking of Firebase specifically. I'm using it for a website I'm building and I've spent very little time on integrating and using it. Far less than it would take me to write the stuff myself.


(Note: I work for Passage.id, now part of 1Password...)

Auth is pretty easy to implement, but difficult to get and keep right. Then there are the nooks and crannies that keep getting discovered, which you have to be aware of and keep up with. I am of course biased, but it seems to me that paying a company to keep up with the rapidly changing environment is much more efficient than trying to do it yourself.

And with WebAuthn and Passkeys -- you can implement that yourself without too much trouble. It's not trivial but not impossible, but the same argument applies -- nooks, crannies, corner cases, risks, etc.


I've spent the better part of last week upgrading a Keycloak docker container from 13.0.1 to the latest version (and then rolling back the changes after I failed to make it work; I will try again after New Year's). IMHO, Red Hat just introduces breaking changes on purpose to force you to pay for their single sign-on product.

Installing and configuring it was relatively easy. Keeping it up to date and secure is a different thing.


I agree, but only in a consumer environment, such as games and other trivial consumer apps with a lot of free sign-ups, storing little or no PII, where SSO etc. aren't ever going to be required. These SaaS products are far too expensive for free consumer apps.

Otherwise, if you are going to be selling into the enterprise and the majority of your users are paid, this is an area where using a SaaS tool is a no-brainer. Your sales team is going to run into customers that need XYZ-compliant auth and it's solved out of the box with the SaaS and cost isn't an issue since it'll just be baked into the per-user pricing.


I think auth SaaS is very expensive, but I’m currently playing with on-prem Zitadel, and it’s very good so far.


I twice worked at companies that built authentication as part of their application (using open source libraries, not from scratch). It's a mistake, and a mistake that's very expensive to fix later. Even more expensive not to fix given the sensitivity of this area of software.


All the companies I worked with in the last 15 years had custom auth (all bcrypt in the end).

Nothing bad ever happened. They're still rocking the same system, and the only notable change that went through was adapting to GDPR deletion requests. And they all avoided that Okta hack from some time ago.

What expensive mistake are you talking about?
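For reference, the "custom auth, bcrypt in the end" approach described above really is small in code terms. A minimal sketch, using Python's stdlib PBKDF2 in place of bcrypt so the example stays dependency-free (bcrypt itself needs a third-party package):

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # PBKDF2-SHA256 work factor; tune to your hardware

def hash_password(password: str) -> bytes:
    """Hash with a random per-user salt; store salt + digest together."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt + digest

def verify_password(password: str, stored: bytes) -> bool:
    """Re-derive the digest from the stored salt and compare."""
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(candidate, digest)
```

The hashing itself is the easy part; the expensive mistakes the thread debates tend to live in everything around it (resets, sessions, lockout, migration off an old scheme).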


I thought I was alone on this one. I feel the same way.


We built our own elaborate auth system.

Soon afterwards Keycloak came on the scene and negated a lot of what we had done.


Shameless plug.

Our company (https://aembit.io/) solves auth problems (specifically identity and authentication between workloads).

I have been doing security and auth for the last 20 years in different shapes and forms. It's a minefield. Grabbing and using some SDK for auth is simple. Making sure that you account for the whole lifecycle (identity, authentication, authorization, secrets management, secrets rotation, addressing vulnerabilities as they pop up) is incredibly complex.


Most common mistakes I see people make:

- They ignore overhead. If an engineer makes $100k they calculate with that number. The reality is 2X to 10X overhead depending on the company.

- Required return. You can't spend $100 to make $100; no investor will fund that. So you need to do activities that generate an adequate return.

- Opportunity costs. Say you have an engineer that costs $100k; overhead takes that to $400k, so they need to generate at least $600k a year. That still doesn't mean you should do the project if you have something else that's even better.

- Maintenance. People always underestimate the cost of maintenance. I'm skeptical of any estimate where maintenance is < build costs.
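Plugging in the numbers above (the overhead multiplier and required return are of course assumptions that vary wildly by company):

```python
def required_annual_value(salary: float,
                          overhead_multiplier: float = 4.0,
                          required_return: float = 1.5) -> float:
    """Revenue an engineer-year must generate to justify a 'build'.

    overhead_multiplier: fully-loaded cost vs. base salary (2X-10X above).
    required_return: the multiple on cost an investor expects back.
    """
    loaded_cost = salary * overhead_multiplier
    return loaded_cost * required_return

# $100k salary -> $400k loaded -> needs to generate at least $600k/year.
print(required_annual_value(100_000))
```

Comparing that figure against the annual SaaS bill (plus integration overhead) is the core of the build-vs-buy arithmetic, before opportunity cost and maintenance tilt it further.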

One thing I regret building is an analytics pipeline. We should have relied on Segment for that. I also once built an analytics platform from scratch, which was bad.

On the flip side, one time we bought an ETL tool and it was terrible compared to in-house solutions


The common mistake I see people make is presuming that the "buy" will deliver on their end of the deal. When they don't, that engineer making $100k is spending a lot of time (and money) writing support tickets, desperately trying to get it escalated to someone on the vendor's side who can actually do something, as inevitably you have to first peel away the layers and layers of customer support, account managers, and "solution" people who aren't actually capable of fixing anything. (Even for some inanely simple questions, sometimes, things that I think "wow, a support rep should actually be able to answer this, it's not a highly technical problem for once" and then, nope, it has to get escalated to eng, inevitably.)

The amount of my own salary that has gone towards "Azure support ticket monkey" is frankly frightening. And I have never once seen that included in the estimate for "buy".

Many of the "build" solutions I have run require <1 engineer to maintain. When they require maintenance (whether due to something needing an upgrade, new feature request, etc.), yeah, there's maybe a two or three week piece of work, but then there are months long spans where it just sort of hums along in the background.

Even outside of "support", I still end up having to dedicate eng time to "buy" solutions, to fill in gaps in their implementation. (E.g., an artifact store having read-your-writes bugs. Heck, Github Actions ("buy" of CI) has so many bugs in it that we hit in the course of adopting it…)


You are correct that 'buy' doesn't mean zero cost. At scale both 'build' and 'buy' suffer from the same problem: overhead.

'Build' has the overhead of the initial development, documentation, testing, the follow-on maintenance, refactoring, etc. If you have one 'build' item in your code base that's not a huge amount of overhead in the long term. If your entire code base is custom you have a scaling problem; you will end up spending a lot of your development effort on this overhead rather than revenue generating work.

Ideally 'buy' lets you go further but you will eventually reach the same scaling problem. The overhead here is in research, prototyping, integration, bugs, workarounds, etc. As you have more and more external dependencies this overhead goes up until you are in the same boat as above. Also not all dependencies are created equally; some have more overhead than others.

For background I do embedded software where basically everything is 'build'. When I say everything I mean file systems, network stacks, hardware abstraction layers, communication protocols, testing frameworks, and so on. I would estimate that roughly 90-95% of what my team does is maintain all of the 'build' pieces, none of which makes a profit but all of which are required for the features that do make a profit; we just don't have time to work on the latter unless we take more shortcuts in the former. So when you say that a 'build' solution takes < 1 engineer to maintain consider what that looks like in 5, 10, and 25 years. The overhead stacks up, the industry changes, institutional knowledge is lost, hiring and onboarding becomes more difficult, and so on. Of course my bias is in systems that last a long time where it's incredibly difficult to replace custom bits that have outlived their usefulness with third party bits.


What was the ETL tool? Fivetran is great at EL, but leaves so much work on T. They have some open source dbt models[0], but they need a lot of work. For example, the Hubspot model doesn't have a way to join most of the tables (e.g. you can't join Deals and Companies, which seems like an obvious thing you'd want).

If they dedicated one data engineer to building an amazing set of dbt models on top of their raw EL output they could drastically improve the lives of their customers.

0 - https://hub.getdbt.com/#:~:text=feature_store-,fivetran,-ad_...


This is a great explanation of overhead and opportunity cost. I'm always trying to make that pitch for selling my product, but the biggest issue is that I think many companies are just jumping in on BUILD without even researching BUY.


That's wild, at every company I've worked at so far the default was always to BUY. That meant dev teams needed to fight for BUILD and fight for their existence often.


I'm thinking it depends on the purpose of the software and how much customization and integration is required. A peripheral application may be a stronger BUY than something that is deployed at the core of the business process.


In the past I've built servers by carefully selecting parts, calculating maximum power usage versus airflow, calculating full time electrical load, building, testing with fans disabled in rooms with high ambient temperatures, looking for failure modes, researching failure rates of specific power supplies and drives, et cetera.

I'd build one or two, test them for several months while making adjustments and fixing various weaknesses, then I'd build ten, twenty, however many were needed.

One particular client decided they wanted to skip all that and just spend the money on Dell. They didn't have many servers, certainly not enough to justify a separate server room. Their offices weren't air conditioned at night, and they had an entire summer where one server or another would become unresponsive over the course of a weekend, and sometimes at night between weekdays. Accessing iDRAC was beyond what they wanted to do, so of course I had to do that.

They had Dell support, but Dell had no "fix" for unstable machines other than to tell them to build a server room. Mind you - the ambient temperatures were always below 100°F - any reasonable person would say that while that's not ideal for servers, "premium" servers should still be able to handle warm rooms, particularly when they're idle, and not crash or lock up.

After that fiasco, they gradually replaced the Dells, one at a time, with machines I built. I wish they had tried harder to return the Dells, but they just wrote them off.

I've learned that any savings in time and money (mostly - the Dells cost more than the machines I build, even accounting for extra billable time) aren't worth the loss in time, productivity and reliability in the long run. Of course, the opposite would be true if I just wanted more billable work, but I can't do that, unlike many others in the field.

BTW - when I build a new generation of servers, I do months of testing, but the same general platform can last a good five years, like Ryzen 1000 through 5000 systems with ECC have lasted since 2017.


I might have a consulting gig for you if you’re interested. Let me know how to get in touch if relevant!


Sure, I'd be interested. My email address is john at klos dot com :)


Is there much business building custom rack mount servers for people? I would have thought that market would have been taken out by the big brands.

Do you have a favored brand for server motherboards?


ASRock makes some good motherboards, and all support ECC by default.

Somewhat related:

I don't like server motherboards like those from, say, Supermicro, because they don't care very much about security. For example, they've had motherboards for years that let you share the primary ethernet with IPMI. Is there a way to disable this? No. Is there a way to force IPMI to ONLY use the dedicated ethernet? No - the settings are stored in the battery backed memory. Wouldn't a jumper make sense? Sure, but they don't think so.

What this means is that if your motherboard battery dies and your dedicated IPMI port is somehow disconnected, your server can be completely, 100% taken over via the public interface. While it's not a likely attack scenario, the point is that if you have a need to ship servers to various datacenters, you genuinely can't be sure that your server won't be utterly vulnerable on first boot unless you're there, confirming it yourself.

Does Supermicro care? No. They refuse to consider the loss of all security due to a dead battery to be a security issue. Having a security issue is one thing - ignoring it when told about it (or worse - pretending it's not an issue) is something else entirely.


I do something similar as part of a team that builds bespoke equipment (can't be too specific). It's a full time job for 2 people for runs of around 50 custom half-racks every year. It's definitely a niche, but there's demand for sure.


Sometime around 2019 I went all in on the Ubiquiti ecosystem, specifically the UDM-PRO "Dream Machine" and all of the integration with cameras and wifi APs that could be done with that.

I made this decision vs. building out a new gateway/router/switching/monitoring/SSIDRoaming infrastructure from scratch.

This was a bad decision.

Even now, nearly 2023, the UDM-PRO is a beta product and I am a beta tester. Further, we have all learned that Ubiquiti is a dysfunctional organization not focused on anything at all resembling technical/engineering goals.

Ubiquiti wifi APs are still, probably, some of the best available so I will probably keep those ... but everything else - including PoE switches is getting ripped out and replaced.


I'm getting ready to do a build for a client with about 60 people and no expectations of growth. They are technical and want gear that one or two of their folks can be trained to 'power user level admin' vs calling me every time (hourly consultant). I was investigating Ubiquiti gear, but you've given me pause.

What gear did you go with and/or would recommend to check out?


I second this. I went all in on the dream machine and some of its parts and it failed in an unrecoverable way, I sent it in for repairs; and it failed again in a different unrecoverable way. I just use the crappy Comcast router with a ubiquiti AP now.


I've seen a project outsource a core component "to speed up development". It turns out integrating with a 3rd party service added a similar level of complexity; all the design decisions had to be made around the 3rd party system, which constrained the project pointlessly. The project eventually got scrapped - part of the reason was that their core component wasn't even their own and they would have had to renegotiate the license to pivot.


Definitely had this happen in recent memory. The thing to be careful about here, from experience, is that you still need to QA the contractor's deliverables. The team had a contractor delivering features, and it turns out what was “delivered” many times didn't actually work. Wasted several months of my engineering time babysitting and validating said contractor. Net result was wasted time and money as we just ended up reimplementing it ourselves.


Yep, this was part of the reason I said integration vs. developing in house would have been a similar level of complexity. The pitch to our management: "everything we requested was already covered by their system, it's just a matter of exposing it publicly and configuring it for us". Reality: we would hit blockers regularly and were basically dealing with a black box system plus cross-org communication delay. It was obvious that some of their shit wasn't even tested before shipping, let alone done before we signed.


Magento Cloud... a.k.a. platform.sh was exactly that.


cough azure...


A commercial NAS (QNAP, this time), vs just a simple Linux box with hard drives. The simple Linux box was the better option. We even wrote a blog post about it: https://www.factorio.com/blog/post/fff-330 (scroll down, it's topic #3. Also, I don't work there anymore)


ZFS is ridiculously good too. So is Factorio, in fact!


Virtually everything in my self-hosted tech stack: Web, mail, specialty apps for CRM and project management... all of it.

I like the control and I like tinkering. But as I've become busier, I realize how much revenue-producing/move-the-ball-forward time I've lost and am still losing by doing all of this myself.


I am spending an inordinate amount of time on my homelab (and the todo list does not seem to get any smaller). But on the other hand this activity gives me a ton of experience that I could not get doing $DAYJOB, and this experience comes extremely handy from time to time.


100%.

What are you running?


- Unbound+nsd for all dns needs

- Tiny Tiny RSS

- Nextcloud

- Docspell

- Gitea

- Youtrack

- Prometheus

- Minio

- Set of wireguard mesh networks for management, monitoring, service access, backups

Almost everything is managed through a single Nix flake with deploy-rs and agenix


::::partial overlap high-five::::

All of my stuff is public-facing and needs more reliability than I can provide via home internet, so I have a rented 1U in a datacenter. Proxmoxing my way through about 20 Wordpress sites, mail infrastructure to handle 100k messages a month for a couple of organizations, Gitlab (would have preferred Youtrack!), Nextcloud, FreshRSS, Mautic, Bookstack, etc. Really want to dive into Minio on another machine for backup storage.

EDIT: See what I did in that last sentence? More tinkering! Argh!


Minio has been great IME. I have set it up to replicate the backups into Backblaze B2.

I do recommend managing it via terraform from the get go as it will save you quite some time later and lower the wtfs/minute metric.


There's people who use hammers because they have nails that need hammering, and there's people who use hammers because they enjoy hammering.


The folks who just love hammering and teach themselves in their spare time, may well come across an opportunity to hammer professionally.

In which case, nobody will care how they got their skills, whether it was on the job or not. Just that they are really good at hammering and can prove it.


You missed my point. I don't disagree that as long as you're good at hammering, it doesn't matter if you learned it by hammering on the job or because you love hammering so much that you kept doing it by yourself. My point was that some people don't have hammering as a goal. Some people's goal is to make money, and some times that's best achieved by hammering, some times that's best achieved by sawing. But the people who hammer because they like it will try to hammer even when sawing would be more effective.


- A former CTO made the decision to fork the Thread mesh protocol for an IoT project. Thread (and our development) was totally inadequate tech for industrial usage. Months of development went by and no progress was made. We wound up licensing some proprietary mesh protocol.

- Same CTO somehow convinced investors there was no x86 machine available on the market with the right specs for what was needed… so they put together a team of hardware people to design and build a motherboard.

- Personal… bought a late 70’s / early 80’s Sol Cat catamaran sailboat for 500 bucks while in college. Unbeknownst to me, the hulls were notorious for delaminating and I didn’t know what to look for at the time when I purchased it. Long story short - I spent months of effort fixing and painting it, a lot of money, and sailed it once before giving up on it (of course it partially sunk during the maiden voyage). It wound up blowing away in a hurricane.


Your boat story is pretty much the same as every boat owner story I hear. The jokes about boats being a black hole for time and money seem to be accurate.


I think the joke is "a boat is a hole in the water that you throw money into"...also, the two happiest days of a boat owners life - the day he buys it, and the day he sells it.


I have always heard that the second best day is when you get the boat, the best day is when you sell it.


> so they put together a team of hardware people to design and build a motherboard

That seems wild to me... well, not that it happened, but that someone thought it was a good idea.


>but that someone thought that was a good idea.

All of those "someones" were irrationally exuberant C-suiters and investors who had no prior experience in hardware.


My opinion is most companies that have built in-house "AI" or machine learning teams, or hired consultants to custom-build AI "use cases" for them, have made the wrong decision and are coming around to it. This is an area where buying a product is going to win, because most companies don't have the capability to successfully manage in-house ML teams (I've seen line departments trying to hire a PhD ML scientist), and because a company with a product has done the work to figure out a proper value proposition, as opposed to just trying to ham-fist AI somewhere into operations.


How can you buy a product in the AI/ML space when the data needed to generate the model (and the value) is proprietary?

You may be correct that buying is better than building in a lot of these areas, but I don't think it's practical given IP concerns.


Good question. I think the "data" aspect is largely a misconception. Any proprietary data might have some value in model fine tuning that would come with product setup, but almost no businesses actually have the big diverse datasets that support a nontrivial AI model. Look at the hyped up current AI language and image generation models. They're not based on proprietary data. Nor were the past generation.

So I'd argue that while proprietary data has a role in what is effectively the setup of a product, it's not the defining factor. And if anyone thinks that ML is going to come in and do something amazing on the strength of their own data, they're probably setting themselves up for disappointment. In fact, I think it's that misconception that has led companies to (imo foolishly) invest so much in in-house AI


About a decade ago or more, I spent a ridiculous amount of time trying to build a rig to mine bitcoin.

Should have bought $1000 worth and then just posted this from my Caribbean island.


1k USD a decade or more ago would've netted no more than 20 bitcoin. Even at the all-time high of, let's say, 70k USD, that would've given you a maximum of 1.4M USD before taxes. I doubt any Caribbean island costs less than 10M USD, unless you buy one where you can only stand on one foot.


If you go back less than 5 more years than that, it was less than $1. That's what he's talking about.

Fell into the same trap myself. Thought about it as free money generation and not an investment.

If I had just bought and held 10% of what I spent trying to generate BTC, I might not be on a private Caribbean Island, but I would be retiring a lot earlier.


You can't do this to yourself. Once your combined Bitcoin hit 20-100K you probably would have sold. And you would have been smart.


For sure, I cannot complain. There are plenty who lost money to create those crazy valuations, so I am definitely on the fortunate side of things.


You could just build an exchange :)


Most buy-vs-build decisions that go wrong on the buy side do so because there was insufficient due diligence on the vendors, insufficient comparison to other vendors, and general ignorance of the thing being bought-or-built.

The worst outcomes I've seen are when it's SOWs that are buy-to-build. That is - paying a vendor to build something that doesn't exist yet. It's like the blind leading the blind. You have to agree to detailed specs and project plans and a contract, and then at the end of the day hope they deliver, because there's so much nuance in software that it's not going to hold up in court or be worth your time. If you did write specs and acceptance criteria detailed enough to be bulletproof, you'd probably have been better off just writing the software instead.

My company did two of these recently. One ended up overrunning by 100%, with 50% of requirements not met, never going to production, and half the people on the vendor side quitting or getting fired. The other one went to production and then the vendor went bankrupt, lol.


This is sort of related, but I've often wished that I had hired someone to work on my house instead of DIYing. I'm never going to be good at most house-related repair and upgrade tasks that I will do one or two times in my life. And I've been unhappy with the results of most of my work.

The main reason I don't immediately jump to hiring is it takes weeks to get people out for estimates, then half of them ghost you, and if you do manage to convince someone to take your money they are booked months out (and they also ghost you).


> I'm never going to be good at most house-related repair and upgrade tasks that I will do one or two times in my life. And I've been unhappy with the results of most of my work.

I just got done with a home reno. Did quite a bit ourselves, and hired out a lot of it. It's 100% possible to hire someone who does a way shittier job than you do and end up having to rip it all out. Ask me how I know.

Point is, it's not always true that spending the money will save you anything in the end. Finding people who do mediocre work is hard enough, but finding people who actually care about your home and your projects is exceedingly difficult. Sometimes a morning on youtube and an afternoon with the circular saw is the best you're going to get.


It's very hard to contract out the management part of a job. Even Instacart isn't a pure buy-vs-DIY decision. It's amazing how many questions the desire for a tomato can generate.


> Have you bought something that you had to scrap and build yourself anyways?

I'm very close to that with another product.

HashiCorp is way overhyped. Yes, their open source products aren't bad as long as you don't pay for them, but then you get an enterprise license and realize how bad their support is. It seems like the company thinks "enterprise" just means a product with additional features, but you're still on your own when you need help.

Consul, especially when running on k8s, is very complex, and it feels like support barely knows more than you do. They won't answer questions explaining how a given feature works, instead referring you to solutions architects, who take months to get access to. And unlike with open source, since you don't have the source, you can't just modify the code yourself to add missing functionality; if you ask your rep about it, they might tell you it could take a year (if it even happens) to implement it. WTF?


Totally agreed on being disappointed with Support. There are some diamonds in the rough, but I'd give them a hearty thumbs-down overall. Agreed on feature requests too. If it isn't already on the product's roadmap, good luck!

> They won't answer any questions explaining how given feature works, referring to solutions architects, which takes months to get access to them.

It sounds like your account team is really letting you down here. We meet with ours, including SAs, bi-weekly. Generally we can get answers to our questions/issues within a few business days, including feedback from product engineers.


> It sounds like your account team is really letting you down here. We meet with ours, including SAs, bi-weekly. Generally we can get answers to our questions/issues within a few business days, including feedback from product engineers.

Thanks. So I see that this is not normal, maybe I can escalate.


Have never regretted a buy. Sometimes you need to pivot if price gets too high, but there's always been mitigations and alternatives.

Anything I've seen built that was not contributing business value ended up being a distraction.


At my previous job the company heavily relied on an industry standard application. This app basically had a monopoly on that particular industry. However, they were pushing the app beyond the limits of most other companies. It was also just generally an old and busted Windows 2000 era kind of deal.

When I first arrived, and knew nothing, they told me we were going to build a replacement. That sounded great to me at the time. Obviously the right thing to do is replace something old and busted.

Well, I and the company learned over time that actually that was not really the best business strategy. It may have been possible, but not with the resources we had available. I ended up doing a whole lot of work that went unused, through no fault of my own.


A better strategy is to first move a small part out of the system rather than going for a complete rewrite, and then make sure the old system can work as a proxy for the new system, or vice versa.

When both can be used simultaneously you can now gradually move from one to the other, or revert back as needed.

This is what I followed.
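Sketched as code (with hypothetical stand-in backends, just to illustrate the routing idea):

```python
# Minimal sketch of the gradual migration above: both systems stay live
# behind one entry point, and each operation moves (or reverts) by
# editing an allowlist. The backends here are hypothetical stand-ins.

MIGRATED = {"invoices", "customers"}  # grown (or shrunk) one feature at a time

def old_system(op, payload=None):
    return f"old:{op}"  # placeholder for the legacy system

def new_system(op, payload=None):
    return f"new:{op}"  # placeholder for the replacement

def handle(op, payload=None):
    """Proxy entry point: routes migrated operations to the new
    system and everything else to the old one."""
    backend = new_system if op in MIGRATED else old_system
    return backend(op, payload)
```

Reverting a feature is just removing it from the set, which is what makes the simultaneous-use period safe.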


It reduces risk of failure but can end up costing more in the end. It's like doing a home renovation piece by piece rather than all at once: if you have the capital upfront and an experienced professional, doing the whole reno at once will be cheaper.


If you need to keep things running and to make changes while doing the rewrite, then you might not have any other option than the Strangler Pattern.



The software that ran the "core business" of the company. The CEO and I decided not to buy an off-the-shelf solution, which would have done everything we wanted, because it would put our data into the vendor's hands and we thought that wasn't worth the risk. After all, we are gonna be huge and our data is our oil!

A few years later, our internal version of this software was crappy and our data was still not something we could monetize. In the future, I'd refrain from being too cautious about the sexy risks at the beginning, when you're actually facing bigger risks that may not be sexy but are very dangerous.


It's amazing how the attitude about not wanting to put data in the vendor's hands has changed over the last ten years.

I remember having a hard time convincing our CEO to use BitBucket in 2011 because our precious code (that no one would actually want or could actually understand.....) was going to be "in the cloud".

I bet there are many firms that gave into the fear and regret it now.


I initially built a system for web scraping but constantly ran into issues with getting blocked, even when using good-quality residential proxies. I had to keep investigating why I was getting blocked and updating tools. Sometimes the effort was significant, like when I had to switch to a different framework that was giving me a better success rate.

Then, I switched to web scraping API (I'm using https://scrapingfish.com as they have convenient pricing for my use case, but there are other alternatives). Now I only have to maintain parsing logic in scrapers. It also actually reduced my costs of scraping since I no longer pay for proxies which are more expensive for my scale than a web scraping API.
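The resulting division of labor looks roughly like this (the endpoint and parameter names below are hypothetical, not any particular provider's actual API):

```python
import re
import urllib.parse

API_ENDPOINT = "https://api.example-scraper.com/v1"  # hypothetical endpoint

def build_fetch_url(api_key, target_url):
    # The scraping API fetches target_url through its own proxies and
    # anti-blocking machinery; we just tell it which page we want.
    qs = urllib.parse.urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{qs}"

def parse_titles(html):
    # The only part left to maintain in-house: site-specific parsing.
    # (Crude regex for illustration; a real scraper would use an HTML parser.)
    return re.findall(r"<h2[^>]*>(.*?)</h2>", html)
```

Everything about blocking, retries and proxy rotation disappears behind `build_fetch_url`; only `parse_titles` needs updating when the target site changes.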


At a past company, we (incorrectly) assumed we'd always have to support an on-prem deployment capability, so we decided to build our own virtual appliance (assuming we'd do actual hardware one day). That appliance cluster had to do a bunch of heavy lifting to provide a bunch of services, and we had to do it all ourselves. We even had a Cassandra-like DB cluster, which we stupidly tried to do using ScyllaDB (Scylla is amazing now, but it was just getting started at the time, and while it was super fast even then, it was not stable or reliable enough). To add insult to injury, we only ever did a single on-prem deployment, and that customer never actually converted to a paying one...

If I could do it all again, I would have gone cloud-native (or at least leveraged K8s), and I'd use as many managed cloud services as humanly possible. At a later gig - we did just that, and we very very rarely had even 1% of the infra struggles we had with the solution I described above.

Nowadays, my basic advice is to always buy the best possible service when you start out, and only start to think about replacing it with DIY services when you have enough scale to pay talented engineers a salary to build AND support replacing it - and even then, the potential loss of focus and velocity might still make this a bad idea. There's a reason Netflix is still on AWS.


> There's a reason Netflix is still on AWS.

I would counter with stack overflow which has scaled great over all this time on only a few self-hosted servers.

The trouble with replacing the buy with a diy later on is that it will now cost at least twice as much to build, because 1) you're maintaining the existing system which takes some of the people most familiar with the problems of the existing system, and 2) the existing system will still evolve a bit because that's business, so you're also trying to hit a moving target.

I think other posters have it correct, you have to think hard about how central a particular feature or service is to your business model. Chances are, differentiating yourself means off the shelf solutions won't be a perfect fit for your core revenue, but all other supporting services should probably be off the shelf.


I suspect there's a hidden trap here due to survivorship bias - those of us who were lucky enough to see hugely successful products/services that got some of these decisions spot-on in the beginning, see how wonderful it can be when you are able to make these investments early on and build things once. The risk is that you fail to count the number of teams that did the same sort of analysis process and still made the wrong decision because they ended up pivoting, or just straight-up made the wrong call...

Replacing a managed solution in a small-scale product really isn't required, because it likely doesn't cost you enough. Replacing it in a high-scale product does matter, but that also means you should be making enough money for the replacement to pay for itself. If you make the decision to use the managed solution and it's wrong, that's something you can usually fix. If you go DIY when you should have paid for it, you rarely get to correct it in time, and it might just be another nail in your coffin due to lower velocity, focus, etc...


It seems strange to me that anyone would go anything but Cloud Native today.


This reminds me of situations where people “buy then customize”. Such as buying a payments gateway, then using its internal RDBMS tables directly for reports and integrations. Or buying Great Plains and then massively building on it.

Then one day these products eventually go EOL and the company is often stuck maintaining a zombie product. Or the products undergo a major refactor that breaks their customizations and integrations, and they end up stranded forever on version X.

I hear ERP systems often go like this too.

The only thing worse to me is entire huge products built in stored procedures. I know one product that was written in about 2 million lines of PL/SQL. It did some amazing things, but we were locked into PL/SQL for all time, and the Oracle scaling bills and HW were astronomical…


> Then one day these products eventually go EOL

This happens with any product you buy. Dealing with it has to be included in the value proposition.

> we were locked into PL/SQL for all time, and the Oracle scaling bills and HW were astronomical

Yeah, Oracle is expensive. Scaling costs should have been part of the calculation to commit to it. For most orgs that use Oracle, the cost is not really an issue in the grand scheme of what they spend on IT and services.


I agree, all products die, but how you integrate is all the difference. When you start reaching into the private guts of the system (in this case, the database), you are hopelessly intertwined with them. There are much cleaner ways to do this that makes a migration to something new a lot less painful.
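One such cleaner way is a thin adapter (an anti-corruption layer) that is the only module allowed to know vendor-specific shapes, ideally talking to a supported API rather than private tables. A sketch with made-up names:

```python
from dataclasses import dataclass

@dataclass
class Payment:
    """Our own vendor-agnostic domain model."""
    id: str
    amount_cents: int

class VendorAdapter:
    """The single place that knows the vendor's representation.
    Swapping or upgrading vendors means rewriting this module,
    not every report and integration built on top of it."""

    def __init__(self, client):
        self.client = client  # the vendor's supported API client

    def payments(self):
        # Translate vendor records into our domain model.
        return [Payment(id=row["PmtID"], amount_cents=int(row["Amt"] * 100))
                for row in self.client.list_payments()]
```

Reports and integrations consume `Payment`, never the vendor's tables, so an EOL or major refactor on the vendor side stays contained.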


Yup, they are better off making small changes to their processes to fit the ERP system, instead of the other way around.


While fun and enlightening, my Toyota SR5 EV conversion was twice the cost of the used Nissan leaf I had before it, and apart from the stunning good looks (IMHO) has crappy range and all the other drawbacks of a 1990 car. My BMW 528 conversion I started in 2019 will probably never get 100% finished as I always seem to need another fix to get something to work properly, last year the wife got tired of my efforts and bought a Model 3. (Sigh!)


Do you have a blog or something to follow? :) I am thinking about converting my cancer producing E39 530d and would love to read how others fare with the effort :)


I didn't (too embarrassed), but there is this guy who's quite famous in the EV conversion world and video-documents the whole process for an E39 https://www.youtube.com/watch?v=BcyJm5CrLJI&list=PLPHK4T9kKE...


I spent $800 on a gaming PC, via parts, and then couldn't get it to boot anything at all. It sat in my home from early 2020 until last week, when I put it on the street with a FREE COMPUTER sign, never to see it again. I bought a PlayStation and have been using it for years with no problems, so maybe I just need to get a good prebuilt PC someday and not mess with it too much.

I don't feel particularly bad, though, because in the past I've done more impressive things that were harder (building a 3D printer from scratch, putting a V8 into a Datsun Z, etc.), but for some reason could not figure out how to build this computer from parts.


If you ever feel the need again, have a look through PCPartPicker. They'll warn if parts are incompatible, and you can also look at others' successful builds to get ideas.

The computer shop near me will price match parts and for $25 will put them all together and run the system through tests like memtest etc. That is a small price to pay in my mind for them to handle RMA, etc. I can load up the OS etc after and know that the system will boot up.


Also, look at Logical Increments. It's a great site for figuring out what you should actually be buying at any given price point.


Thanks for the tip! I'd never heard of Logical Increments before!


I recently had a conversation with a colleague who mentioned:

“We can only be great at one thing. The rest we can only be good at.”

This doesn’t quite answer the question, but I think it’s related.


That's a quality aphorism.


Company I consulted for had 12 engineers, spent millions and 18 months to build some niche custom e-commerce solution.

They were burning money and about to go belly up.

We spend 3k on shopify plus some outsourced shopify engineers to get 95% of their solution in ~4 weeks with zero engineers.

Turned the company around and ended up doing a M/A 2 years later.


The biggest issue tends to be when people double down on continuing down the wrong path when a better alternative is obvious. The thing you bought or built is often the wrong decision three-to-five years down the road.

When you do hit the fork in the road on build/buy I don't have a hard and fast rule but generally I take the view that if it isn't directly related to revenue generation for the company than it shouldn't be built.


Ah yes, the sunk cost fallacy! Just because you've spent $20 in the past doesn't mean you should spend $20 more to realize $15 in gains.

A lot of times dollars are man-hours which are harder to estimate though.


I've been bitten multiple times by relying on external products, only for the vendor to either go bankrupt, or pivot, or get acquired and then prices exploded.

As a practical example, Heroku has been pretty unbearable support- and stability-wise since Salesforce bought them. But moving off their platform, now that we've spent 6 years integrating it into everything, is surprisingly difficult...

Similarly, all of our Allegorithmic tools became worthless when Adobe bought them and replaced $100 one-time perpetual indie licenses with a $250 monthly rental. I've abandoned custom 3ds Max and V-Ray plugins for the same reason.

And ask anyone running a dropshipping store how renting their platform worked for them. Platform prices go up until you (the merchant) have practically no margin left.


Yes. We had a nice home-grown B2B helpdesk/ticket system that worked really well for our customers, but we decided to ditch it for HelpScout and it has been a big mistake. It just doesn't work for B2B and charges like $20/agent, when our home-grown solution could be hosted on a $10 VPS with hundreds of customers on it and unlimited team members. It wasn't perfect, but far better than switching to HelpScout.

I am almost reconsidering going back to the home grown solution even though it will require a bit of work since we haven't touched that code for 2+ years and almost retired that tool.


Shortly after joining the PyTorch compiler team, I was part of a team that decided that we should build our own tensor-expression compiler for PyTorch (called NNC, although it wasn’t well-publicized) instead of using an existing one like Halide or TVM.

We ended up sinking two years into it, and never ended up with a particularly good compiler (although we did absolutely crush a couple toy benchmarks).

Arguably both sides of that tradeoff were wrong, though, as the eventually successful PyTorch 2.0 compiler (TorchInductor) was based on Triton (plus some custom higher-level scheduling logic).


My company is currently paying through the roof for Datadog. While I'm not certain that it's cheaper to staff a full-time observability platform team, there are a lot of open source off-the-shelf TSDB solutions like M3, Timescale, Influx, etc. that should make maintaining an in-house platform less arduous.


I did triage / performance consulting for a long time with several products in that space, added features to shipping commercial projects, and worked on a product of my own in that space. While in theory you could build your own tool tailored to your specific organizational needs, I can't really say I'd recommend it. In fact, it wasn't uncommon for me to replace whatever homegrown tooling had been built with our kit because of scale, features, better UI / time-to-resolution tools, etc. If you are just doing some basic app logging, sure, roll your own if you want. But if you are trying to do real-time tiered analysis across something bigger than a few servers, there are commercial products that in this day and age are relatively cheap (or at least a lot more affordable than in the past).


I've looked into all sorts of APM platforms and have used most. All the free, open-source ones have absolutely nothing on Datadog and New Relic (which is what we use).

I wish this wasn't the case, but all the time I spent on Honeycomb and various other OpenTelemetry packages simply can't get it done. Automated APM platforms are simply that much better.


Wouldn't be so sure. Observability tools usually have the highest traffic, highest data volume, and highest availability requirements of any system a company runs. The times you need them most tend to be the times all of your other systems are failing, so having a shared blast radius is a really bad thing. In any microservices architecture, they're multi-tenant systems with plenty of noisy neighbors.

Those factors make running them well both harder and more expensive than running any ol system.


A monitoring system is more than just the TSDB. You've got all those pre-built integrations with everything else, all those logs and traces, and a nice GUI for your users to actually be able to use them.

Currently in the later stages of a Datadog project, which of course means we are working to get the cost down a little. Modern apps produce a lot of logs, traces and metrics.

SaaS solutions for monitoring are usually great except for the cost. You can build (I'd advise going with the Grafana stack) but I'd only go down that route if SaaS was way too expensive.

Monitoring always takes a long time to set up whatever you do. Manually setting up every app, team and server just takes ages.


The elastic stack is pretty good, but you’ll need to either run it on your own cluster/s or pay for the cloud product. It’s likely to be much cheaper than datadog but not quite as good


How big is your company? I would try to stick with Datadog until 1000+ engineers at least


The biggest issue here is that you have to hire real engineers to manage the bought product, and they feel dead ended or just waiting for a layoff opportunity.

The institutional knowledge around your data never develops because the engineer whose job it is to manage datadog and observability never gets the opportunity to learn or demonstrate technical depth, and so jumps ship to another team or company literally as soon as possible.

SRE recruiters make this worse, making point after point of gaslighting new hires.


I don’t buy your take at all. Isn’t this like arguing we should be running our own Postgres bare-metal instead of relying on RDS, because engineers who work with RDS are “dead ended” and a “layoff opportunity” and will never develop institutional knowledge about our database schema? In my experience working on all kinds of infra, I have 1000 things I need to do, and buying a product that lets me avoid worrying about implementing 300 of those things myself is a no-brainer.


If you're spending time jumping through hoops with RDS, then it's very possible bare-metal will be easier.

I don't believe the same is true today but in the past we did exactly this because we were spending a bunch of time banging our collective heads against some RDS idiosyncrasies and it turned out running our own Postgres was a breeze.

Plenty of things can and should just simply be bought. It's not always so cut and dry, though.


All too often the integration costs are overlooked.

Do we spend two engineers to build a system or buy a system (and spend two engineers integrating and maintaining it)? I've seen this happen more times than I can count.


More SaaS vs self-host. But. We used self-hosted Mattermost to “save money” on Slack.

The loss in productivity was so big that it was definitely not worth the money saved. And the promised “it’s open source, if it misses a feature, we will implement it” never materialised either, as the whole thing is really complex.

I don’t know why Mattermost is so slow and buggy. But it was. And nobody had time to research and profile it.


Years ago, a camera gimbal. One of those DIY kits at less than a quarter of the price of name brands.

Spent hours messing around with the physical rig and software settings, but the balance was never quite right.

I'd imagine that the kits today are better all-around. At the time it was pretty cutting edge and the idea of being able to do steadicam shots with my DSLR but on a budget I could afford was too good to be true.


I love tinkering with hardware, but I always make a mental note to myself and say that building it will often cost 2-3 times what I thought it would, and that I'm actually paying for learning and having fun along the way, not the final product (I often end up using many of my projects only a handful of times, but still enjoy making them).


I spent a long time tinkering with my personal blog theme. Turns out, I really hate CSS. I eventually broke down and pulled a polished one off the shelf and adjusted it a bit. I’m a much happier person for it.


Not myself, but I used to work for a company where there wasn't much of any front-end development experience (almost all were the usual CS grad dotnetter/ex-Java types). Quite unfortunately, and almost unbelievably, they made the wrong decision both ways in a short space of time: they bought when they should have built, and built when they should have bought (or used open-source packages).

So, due to the lack of front-end exp (a while before I joined), they chose to buy a license for a moderately well-known UI component library to heavy-lift a big front-end rewrite. Due to the same inexperience, no due diligence was done, and it turned out said component library had tonnes of bugs, wasn't easily extendable, had no real TypeScript support, and on and on. The product suffered immensely for years, but dev leadership I think took on a sunk-cost/hopeless mentality.

Years later, just before I joined, they chose to try to do something about it and pivot (I suppose as some political recoil), deciding to have a go at creating their own UI components and gradually strangler-pattern out the existing external UI library. However, leadership thought they themselves and the other CS grad dotnetter/Java types could "learn on the job", so they didn't hire ANY devs with real experience, i.e. JS, TS, React/Angular/whatever, build tools, general front-end best practices, anything!

Fast-forward half a year and there were already even more bugs, a growing mountain of tech debt, etc. This is roughly when I joined, as their first developer with (back then) a couple years of f/e experience. It took me around a year to start turning around the generally careless culture the business took towards its f/e dev work, and to argue for more learning, adoption of industry norms, craftsmanship, and all that jazz.

It's all in my rear view mirror now, but wow was that an interesting time.


I've been the lead on an insurance / vehicle appraisal platform for the past 3-4 years. One of the primary functions is scheduling the claimant to come in for an appraisal; we used a platform called YouCanBookMe for the scheduling, wiring webhooks and API calls into the core product. We mostly used YCBM because of the owner's previous experience. It was great initially, but I tremendously regret not refactoring YCBM out after the first year. It had a few outages and quirks that caused tremendous issues on our platform, all for a commodity of a function.

I actually wrote about this experience and the ultimate refactor. [0]

[0] https://koptional.com/article/nuanced-strategy-build-vs-buy


That's a great article. The divergence of domain is a really big sticking point that doesn't always rear its head until you scale up. I would be interested in reading more about the solution you built to replace it.


Interesting, we need to implement something similar but for real estate appraisal appointment scheduling. Any advice on modeling that all out?


I’m not the OP, but was involved in a radiology booking platform. It still blows my mind how hard a problem this is and how poor the existing solutions are (that I have used).

Hooking in staff rosters, asset availability, appointment reminders, holidays and the multitude of other quirks is daunting.


Happy to talk and offer what I found- my email is jack at koptional dot com feel free to shoot me a message


While not strictly build vs buy (eg: technically you still have to "build" a Postgres cluster and in-house expertise even when buying AWS Aurora), I generally like the ideas in the classic https://mcfunley.com/choose-boring-technology

IMO your product is the only thing you build; as much as possible, you should buy everything else.

There is eventually an inflection point where your product is so mature that spending on operational improvements (over the marginal next feature) will eventually save you money, but the scale at which that happens has been growing YoY for a while (AWS mostly gets cheaper every generation, so far).


Man… just about every woodworking project ever.


Do you enjoy woodworking? Or do you do it to save money?

Just like with coding, it's best to start small so you get better at the core skills. Building a chess board as a first project is like learning JavaScript then trying to build a GitHub clone; you're just setting yourself up for failure.


Absolutely for pleasure, yes.


What I got wrong is not knowing management’s priorities and accounting rules about where the cost shows up on a balance sheet. This had much more weight in decision making simply because something works better for taxes (services purchased) vs large up front costs with maintenance (buying software) vs building it (labor costs plus a lot more time).

When management already says buy instead of build, the actual technology being bought is much farther down the priority list than engineers want to admit.


BigQuery is not cheap, but it's remarkable at analyzing big data quickly. My datasets are currently 500GB a day and it handles them like a walk in the park.


What do you use to query it? The BigQuery UI? I've found it incredibly slow and clunky. It doesn't have many of the things I'd want in an app like that (e.g. command+k to jump to and preview tables).


I think the exercise of saying "let's legitimately try to build this" before buying is a great place to start. It will help determine what you need in an off-the-shelf solution, while also serving as a useful pricing target. If the buy solution costs more than your own maintenance and development would, then it makes sense to build yourself, avoid vendor lock-in, and allow for bespoke flexibility. I've experienced both situations; some rules of thumb to go by:

1. is this service or product similar to what your company creates, or would you have to create an entirely new business unit to develop and support it

2. how complex would it be to integrate the off the shelf solution into your current system. if it's the same as building on your own, then why pay for it

3. is there competition in the market for the buy solution so pricing and innovation are competitive.

4. who is the company selling this product or service, what is their track record, and do you serve as just their financier and guinea pig testers


Using your own company's database application that is meant for managing physical objects in the field of cultural heritage and shoehorning it into a CRM and project management software is not a good idea IMO. One of the reasons given by the engineering leads was that this way, the software is tested internally before it ships to customers. It did break multiple times, even corrupting some records. Would not recommend. Do proper QA testing before upgrading your internal tools at least.

My manager shared this opinion and said we can't sell our software with the argument that users need specialized software and then do the exact opposite ourselves by using software that wasn't even remotely designed for project management and the like. Custom solutions do also have certain advantages, but you get like 80% of what you want with common software, usually for a reasonable price and without the headache.


Assuming you're talking about inside an enterprise, rather than something minor or personal, then yes (although not necessarily my decisions).

A trap that sometimes gets laid out is treating it as a binary build-or-buy decision; there are typically options in between, in my opinion. Build isn't necessarily as onerous as it once was either: the use of cloud, frameworks, libraries, low-code products and SaaS means you can often construct something from these Legos.

In my experience, there are some people in enterprises with procurement and IT management skills who tend not to have a clue about building modern software, and who often push for buying stuff (it keeps them busy) and sell it as a win: bought a thing, set it up, declared victory, and fucked off to the next project, leaving the users and technologists to figure out how to unfuck the mess that has accumulated around this clunky COTS product that is now a critical part of the business.


I wrote about this at length in my book, and hopefully this part will fit within the limits of a Hacker News comment. What I got wrong: I thought this work needed to come in-house, but in the end that was not the crucial issue. The crucial issue was building a trusting, long-term relationship with the team, and that team could have been an out-sourced team.

Open Verse Media

When I first started, in 2016, at Open Verse Media, an ebook publisher, they asked me to look at their content management system (CMS/CRM). The staff had to rely on it, but it was very slow. The COO, whom I'll call Robin, had overseen the creation of this app. The actual work of creating the software was outsourced to one firm, but after two years Robin felt they were too expensive. She fired that first firm and then hired a firm in Ohio, which I'll call MegaStars.

The app had been built using a popular software framework called Ruby On Rails. Whenever Robin felt that a new feature was needed, she would ask MegaStars to add the feature. MegaStars billed $500 an hour, and over the course of seven years, a total of $3 million had been spent on the creation of this app.

The staff hated the app. When the head of marketing wanted to bring up the top 100 best-selling books, they would click on a link, and it would take a full 60 seconds for the page to come up. The staff had gotten used to the fact that they always needed to be engaged in two tasks, that is, something to keep them busy while they waited for the pages in the CMS to render. An advanced search, with multiple filters, could take up to five minutes to render a report. Many of the lower-level staff would simply go into Slack and engage in gossip with their peers while waiting for each page to slowly appear.

So on my first day I logged into our main web server, and right away I could see that the app was generating several thousand errors each hour, all of which were being written to a log file. Since this app was single-threaded, the work of writing the errors to the log file had to happen while each page was rendering. This was one reason why it was so slow.

This arms-length relationship needed to be closer.

Why did this app have so many problems? Well, when Robin requested a new feature, MegaStars would tell her exactly how much time was needed to get that feature done. If they felt a new feature needed 30 hours to build, they would simply quote $15,000 as the price tag. Sometimes the new work conflicted with old work and generated new problems, but that wasn't in the estimate and therefore the new problems needed to be ignored as much as possible. This tactic of ignoring new problems had been going on for many years. Additionally, much of the code base was now out of date and suffered version conflicts whenever some parts of the system needed to use newer libraries of code (which in Ruby On Rails are called “gems”).

MegaStars could have said, "Pay us $100,000 and we will clean up all of these problems." But then Robin might ask, "Why did you allow these problems to exist? What are we paying you for?" It might seem like a scam, if MegaStars asked for more money to fix the problems that they themselves had created.

Here was the central dynamic of the situation: Robin felt she held power because she could terminate the relationship at any time. In fact, all of the problems in the relationship were because she could end the relationship at any time and was leaning on that fact as her main way of getting compliance. MegaStars was unwilling to commit to the long-term health of the software while Robin was constantly threatening to fire them.

When you work with an outside agency, they typically can't or won't go back and clean up the code, because the customer is not willing to pay $500 an hour for that work. Some of the better agencies try to include the clean-up work in the overall price, but then those agencies seem expensive — and they get undercut by other agencies that are willing to do the absolute minimum, even if that means writing poor-quality code full of errors.

More one-on-one meetings would have helped

In many ways, the situation was worse than what I’ve already described. “Robin asked MegaStars to add a new feature” – what does this really mean? As a practical matter, the real process was something like this:

    1. The staff hated the CMS.
    2. Occasionally the frustration was so intense that it bubbled up to Robin.
    3. Robin would convene a large meeting, including all team leads and their assistants. 
    4. Robin would give a speech emphasizing the need to control the budget, plus various warnings she had received from MegaStars – without doing a full re-write, MegaStars felt there was a limited amount they could do. Plus a full re-write would be too expensive. 
    5. Then Robin opened the floor to suggestions. 
    6. Everyone threw out some ideas, but without any knowledge of how much a feature might cost, and no real idea of what the budget was, the staff tended to engage in self-censorship. 
    7. Robin would pick three or four ideas that seemed interesting, then send them in written form to MegaStars.
    8. MegaStars would send back a cost estimate.
    9. Robin would then approve whatever items she felt were within the budget.
    10. A new contract would be signed between Robin and MegaStars, regarding the next batch of work. 
    11. MegaStars would deliver the work, but without cleaning up some of the long-standing problems. 

Please note, this is not a rant about out-sourcing. I’ve seen companies have great results while working with an outside agency. The real issue is this: if your company depends on an outside relationship, then that relationship needs to be a close, long-term, trusting relationship.

There were several factors that caused things to get so bad at Open Verse:

    1. The CEO was an industry legend, but rather elderly, so she pushed most of her responsibilities onto her COO. Robin was therefore spread thin with too many responsibilities. 
    2. The CEO and COO had spent much of their careers in print publishing, and were slow to realize how different ebooks were. (Books that sold well were more topical, less based on the prestige of the writer.)
    3. Robin was very slow to realize how much the organization depended on the CMS. She herself didn’t use it, so perhaps she didn’t realize how painful it was for staff to have to wait 60 seconds for a page to render. 
    4. Robin thought her power, regarding MegaStars, lay in the fact that she could fire them. In fact, this was a source of weakness in the relationship. 

It does not matter if your company has an internal tech team or works with an external agency: if you are the COO, be prepared to have long one-on-one conversations with whoever is heading up your software development. Obviously the COO is going to push day-to-day management of the tech team to someone else (a CTO or a project manager who can operate at a high level), but then the COO needs to be in frequent contact with that person.

Who should accumulate requests for new features in the software? That should be the CTO or project manager, not the COO. It should be a regular, on-going process, not an occasional ad-hoc event. It needs to be someone who has the time to sit with those making the requests, talk to them one-on-one, and translate what they claim to need into what they really need.

One way or another, the only path forward for Open Verse Media was to find someone who could manage the software on a day-to-day basis. There were two possibilities:

    1. Hire a project manager and let her manage the relationship with the outside agency. The project manager could focus on building a close, trusting relationship with that agency.

    2. Hire a CTO, plus several software developers, and bring all software development in house.

Open Verse Media decided to go with the latter option. They fired MegaStars and instead hired a CTO plus several software developers. This should have given Open Verse Media the ability to move forward with faster and better software, as well as the ability to imagine software projects much more ambitious than anything that had been possible in the awkward and distrustful relationship with MegaStars.

As it happened, Open Verse Media hired the wrong person to be CTO. This person was an egomaniac and very controlling. This irritated the software developers, and after eight months there was a mass exodus where the whole tech team quit. If the goal was to get a team that could care about the software over the long-term, choosing the wrong person to be the team leader undermined the intent — a fact that keeps us from drawing any easy or simple conclusions from this story, regarding the benefits of out-sourcing versus in-sourcing. It evidently isn't true that bringing the work in-house ensures the project will go smoothly. There remain other factors that can sabotage the situation. However, the fact remains that the relationship between the COO and MegaStars was unable to be productive because of distrust between the parties.

excerpt from this book:

https://www.amazon.com/meetings-underrated-Group-waste-time-...


I enjoyed this story quite a bit. Thanks for sharing!

(I don't think it was answering the question, which I took to be about build-it-yourself vs. buy-something-prebuilt, not outsource-to-someone-else, but I enjoyed it nonetheless.)


Good rules of thumb, never build anything related to:

- Time/timezones

- Taxes

- Encryption
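A concrete taste of why the first item is on the list: even "one day later" is ambiguous across a DST transition. A small Python illustration (stdlib `zoneinfo`, Python 3.9+):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/New_York")
# DST ended at 2:00 AM on 2022-11-06 in New York (clocks fell back).
before = datetime(2022, 11, 5, 12, 0, tzinfo=tz)
after = before + timedelta(days=1)  # "one day later", same wall clock

# Same-zone datetime arithmetic in Python is wall-clock arithmetic...
print(after - before)  # 1 day, 0:00:00
# ...but 25 real hours elapsed, because the UTC offset changed
# overnight from -4:00 (EDT) to -5:00 (EST).
print((after.timestamp() - before.timestamp()) / 3600)  # 25.0
```

Whether "one day later" should mean 24 elapsed hours or the same wall-clock time is a product decision, which is exactly why you want a battle-tested library making the mechanics explicit.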


Ugh my first work project fresh out of college was all about timezones. Scarred me for life.


DIY monitoring solution. Every service had its own way of gathering metrics that were pushed into another home-made service where it could be fetched using a half-assed query-language and rendered using, you guessed right, another home grown Javascript graphing library built with raw D3 calls. The whole thing required one full time guy just to keep it running and to tweak whenever someone needed to make a new dashboard using anything beyond the most basic queries.

Just the graphing library alone would have been worth it to buy rather than develop ourselves.

As for the rest, Prometheus and Grafana are obviously way easier to set up, integrate with other third-party tools, and are much more powerful and battle-proven than something developed from scratch, at no cost either.


Kind of in the Bought & Built: I fell for the Kickstarter trap and bought a 3D printer kit. Spent almost a year of tinkering and upgrading but it never worked reliably (always more to fix). Bought a few commercial ones which worked (mostly - 3D printers are frustrating).


I see several posts with disclaimers, it makes me wary enough to dig in a little bit before following any advice in this discussion. (I appreciate the honesty.)

______ is hard, you should buy it. Btw, I happen to work for a company that sells it!


Not my decision, but:

The company had bought the CloudShell test automation system from QualiSystems. After struggling with it for a few years and hitting pretty much every possible road bump along the way, it was scrapped along with all the artifacts that had accumulated inside it, because of course none of them were transferable. Or rather, I wish we had scrapped it already; in reality we are in the second year of this process, and it will take at least one more year to finish. The replacement is just Python code with some sane, commonly used tooling where needed.


Build coupon flows. Every billing platform/provider/service provides coupons, but they never provide what seems like the obvious types of coupons you want, or the flow you would want, unless you are just starting out and have the most basic of coupon needs. If you try to use built-in coupons from your payment processor, after the 2nd or 3rd promotion you run, you are left tacking on custom functionality, writing custom coupon validation, checkout or activation flows, invoicing, or modifying customer account details.
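To illustrate the kind of logic that ends up hand-rolled, here is a sketch of custom coupon validation; all names and rules are hypothetical, not from any particular billing provider:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical coupon rules of the kind processors rarely support out of
# the box: first-purchase-only, plan-restricted, expiring promotions.
@dataclass
class Coupon:
    code: str
    percent_off: int
    expires: date
    first_purchase_only: bool = False
    allowed_plans: tuple = ()

def validate(coupon, plan, is_first_purchase, today):
    if today > coupon.expires:
        return False, "expired"
    if coupon.first_purchase_only and not is_first_purchase:
        return False, "existing customers not eligible"
    if coupon.allowed_plans and plan not in coupon.allowed_plans:
        return False, "not valid for this plan"
    return True, "ok"

promo = Coupon("NEWYEAR50", 50, date(2023, 1, 31),
               first_purchase_only=True, allowed_plans=("pro",))
print(validate(promo, "pro", True, date(2023, 1, 5)))    # (True, 'ok')
print(validate(promo, "basic", True, date(2023, 1, 5)))  # (False, 'not valid for this plan')
```

Each new promotion tends to add one more branch here, which is why the built-in coupon objects stop being enough after the second or third campaign.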


Everything outside the core product should be bought. Building an email marketing product for internal teams does not lower costs nor does it give you a competitive advantage.


Some parts of my old project at work:

1- WebSocket Server / Service (Poorly designed, barely alive)

It was fine until it wasn't. It turns out managing a lot of connections is harder than our team thought. I still don't get why we dedicated a couple of people to this for such a long time. We should have used one of the existing services like Pusher or SignalR.

2 - Mobile Push Notifications Service only for our usage.

To be honest, this was working fine, but they designed it as if it were going to be one of the commercial competitors. It was not worth the effort.


Not my decision but a previous org invested years in a custom CRM system. Huge waste.

On the other hand: for a number of years we were running AWS managed Redis. Way worse inspectability, worse performance, worse everything than just running it ourselves. And a tad more expensive too. It's easy to think "oh, I'll just buy it", but any time there is tight integration, things get hairy quickly...


F*cking Thingworx from PTC. Utterly useless, old, bloated garbage. My co-founder brought it in to our startup. I had to replace it when he left, although much to my surprise I had already built 90% of the replacement by building software around it to fix its various shortcomings.

I will never get that $10k licensing fee back. Awful company, awful product.


On a long enough time scale - all of my decisions to reuse vs reinvent were wrong. At a given point in time though they all seemed fine.


A C# Redis lib. For just basic set/setex/del/exists, plus an auth layer and proper retry logic, instead of StackExchange's lib or the other popular one (try-then-buy, with a command limit!), I just made my own single-file driver. I didn't need clustering or async. Much better, something I can trust, and much less code, so less error-prone.
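For what it's worth, the retry piece of such a driver is small. A sketch of retry-with-backoff, in Python rather than C# and not tied to any real Redis client:

```python
import time

def with_retry(op, attempts=3, base_delay=0.01, transient=(ConnectionError,)):
    """Retry op() with exponential backoff on transient errors only."""
    for attempt in range(attempts):
        try:
            return op()
        except transient:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}
def flaky_set():
    # Simulated SET command: fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection reset")
    return "OK"

print(with_retry(flaky_set))  # OK
```

The key design point is retrying only transient network errors; retrying on everything hides real bugs.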


Every time a company moves from a paid logging solution to something in-house, like ES/kibana, I loathe it.

I still miss Sumo Logic. Expensive as it may be, I probably prevented many millions of dollars of churn by being able to investigate and fix issues quickly. Businesses aren't good at measuring this.


I really wish Gmail and Outlook would become so expensive that companies would start putting up mail servers again. Or maybe logging solutions should have such low margins that nobody in their right mind would do it in-house...


I have never in my life built something and then wished I had bought it.

I've bought something many times only to find it did not sufficiently address my needs.

To be fair, I've also never built something in the time I originally estimated, often off by a factor of up to 10, but the end result is always better for me.


In my personal life I don’t usually wind up thinking “I wish I had bought this” but I do usually blow both my schedule and budget such that it would have been cheaper and faster to buy.

In particular I’ve learned that if I’m doing something new, it usually takes me three tries to make something I’m really satisfied with.


> I have never in my life built something and then wished I had bought it.

I think that's normal. If you're in a team, usually your peers (or even other teams) are the ones who would complain.


Hindsight is 20/20.

If things go well we made the right decision. If things go poorly we should have gone the other way.


You reminded me of an anecdote: the government decided to make a really good car in Tolyatti.

They rebuilt all the buildings, bought all new equipment, and hired Germans, but still got something awful.

So one day they decided: "this place is cursed".

That's the story of my life: I usually buy discounted top products that are still officially supported, and they usually work excellently; the only issue is the quickly outdated software. For example, I use an old SGS4 as a photo camera.

A few times I used new products, but from the cheapest category (costing about the same as discounted top products), and it was a terrible experience. For example, I once had a corporate phone, the cheapest Sagem, and it was so awful that even an unlimited tariff didn't help.

And yes, I'm smart enough not to DIY a cellular phone.


Building our own ETL tool on top of Apache Spark.

It did work better than Informatica or whatever, but now we run 20k Spark jobs a day in the cloud, and it is incredibly wasteful to start a whole Spark instance just to touch a ten-megabyte CSV file or database table.
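For the ten-megabyte end of that spectrum, a single process with stdlib tooling is plenty; there's no cluster to spin up. A trivial Python sketch, with inline data standing in for the file:

```python
import csv
import io

# A ten-megabyte CSV streams comfortably through one process;
# no Spark startup cost. In-memory data stands in for the file here.
rows = "id,amount\n1,10\n2,32\n3,58\n"
total = 0
for row in csv.DictReader(io.StringIO(rows)):
    total += int(row["amount"])
print(total)  # 100
```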


I built a monitoring system. Datadog seemed way better and more immediately useful. Then we got the bill.

What I should have done is added three engineers to the project and made our own homegrown system better and more useful.


Hiring Devops contractors to put together any piece of infrastructure. Infrastructure is so core to your business, you always regret and have to rebuild. Tried twice now and regretted it both times.


Maintained an in-house Java ORM for years instead of moving to Hibernate like everyone else. Good for me, learning lots of cool techniques; bad for the company, but it was their idea/plan in 1999.


Built a silencer.

Terrible idea.

There is so little margin in them that it’s definitely better to buy!

Even if you own an EDM, and mill, and lathe, and anodizing line, and don’t value your time at all - still buy it unless you are going to make ten.


These are less decisions I have made myself; rather, I have made a career out of correcting such mistakes.

I'm a database systems guy fundamentally, so at the previous ~3-4 companies I have worked at, my main achievement has usually been coming in, ripping out a proprietary database (either custom-built or a poorly selected hosted one) and porting it to OSS. Generally to PostgreSQL, but also to more specialized stores like Apache Druid (replacing a custom TSDB).

The issue is that folks seem to get enamored with shiny things, or think a certain feature of a proprietary database is too hard to replicate on PostgreSQL. Currently that is Google Firestore, which has a real-time capability. Replicating such a capability on PostgreSQL isn't that difficult if you are aware of logical replication and the tools necessary for scaling it if need be (namely a distributed log like Apache Kafka or Pulsar). The end result otherwise is a system that is heinously expensive for what it does and poor at the sort of queries it needs to do, and thus riddled with workarounds for its shortcomings.
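The real-time capability described above reduces to a change feed: row changes flow from the database's replication stream out to subscribers. A toy in-process sketch of that shape; the names are illustrative, and the real wiring (a logical decoding plugin such as wal2json or pgoutput, optionally fanned out through Kafka) is omitted:

```python
from collections import defaultdict

# Toy change feed: maps tables to subscriber callbacks. In a real
# deployment, emit() would be driven by the logical replication stream.
class ChangeFeed:
    def __init__(self):
        self.subscribers = defaultdict(list)  # table -> callbacks

    def subscribe(self, table, callback):
        self.subscribers[table].append(callback)

    def emit(self, table, op, row):
        for cb in self.subscribers[table]:
            cb(op, row)

feed = ChangeFeed()
seen = []
feed.subscribe("orders", lambda op, row: seen.append((op, row["id"])))
feed.emit("orders", "INSERT", {"id": 1, "total": 42})
print(seen)  # [('INSERT', 1)]
```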

In the past it also included RethinkDB, MongoDB, Cloud Bigtable, DynamoDB, etc. Some of which can be good datastores if they perfectly match what you are trying to do but most of which you should never touch with a 30ft pole cough MongoDB cough.

Generally speaking, you should almost always use PostgreSQL unless you know you need something else.

The other big one for me is porting from runtimes like Lambda and CloudFunctions to k8s (either on EKS or GKE). Both of those runtimes result in terrible architecture and aren't worth the ~2-3 days you save with initial setup of k8s.

For the more general question of build vs. buy, I see it this way: if something is core to the flow of your product, the selected solution should be maximized for control and ownership, i.e. favour build over buy unless there are overwhelming positives to buying, or the component is so commoditized it really doesn't matter.

Everything purely line of business however should be Buy > Build. i.e Slack, GMail/GSuite, ERP/CRM, etc.

An example of buy > build because of superiority is Cloudflare > self-baked CDN. You can't match CF's network; it has to be bought.

An example of buy because it's commoditized is low-level infra components from AWS/GCP, e.g. VMs, networking, LBs, DNS, k8s controllers, etc.

Stuff you should never outsource are authz/authn, runtimes (i.e never use Lambda or shit like it), databases (except relatively portable like RDS/CloudSQL), etc.

Murky stuff where it's not clear:

- CI/CD - lots of tradeoffs and a spectrum here; a hosted controller + self-hosted runners seems like a good sweet spot.

- Email - you generally need multiple upstreams configured with different sending domains to handle being occasionally, randomly blacklisted.

- Observability - it's a bitch to run yourself, but hosted options are heinously expensive and generally come with restrictions that reduce fidelity, directly or indirectly through cost minimization.

- Feature flagging - hosted services usually result in CORS requests, which are slow, and the SDKs often block application startup; better handled yourself, but maybe getting started with something hosted is OK.


I would suggest you learn about serverless platforms like Lambda and Cloud Functions. Hint: it's not about the setup.


I have moved multiple latter stage startups off of them thus I am intimately aware of all of their capabilities, limitations, etc.

You should calculate the $/CPU/RAM of such platforms vs. k8s on spot/pre-emptible instances, and it quickly becomes obvious that you can't ever be cost-effective on any of these runtimes, even with steep 50%+ off-list discounts.
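That comparison is simple arithmetic once both are normalized to the same unit. The prices below are purely hypothetical placeholders, just to show the shape of the calculation:

```python
# All prices hypothetical, for the shape of the calculation only.
faas_price_per_gb_s = 0.0000167   # per GB-second of allocated memory
spot_vm_price_per_hour = 0.03     # e.g. a 2 vCPU / 4 GB spot instance

# Normalize both to $/GB-hour of memory.
faas_per_gb_hour = faas_price_per_gb_s * 3600
vm_per_gb_hour = spot_vm_price_per_hour / 4

ratio = faas_per_gb_hour / vm_per_gb_hour
print(f"FaaS costs {ratio:.1f}x the spot VM per GB-hour")  # 8.0x
```

The multiple shrinks with low utilization (you only pay FaaS while handling requests), which is why the break-even depends heavily on traffic shape.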

So no I don't think I need to learn more about serverless, I think you perhaps need to learn more about k8s and modern container orchestration.


I work at an enterprise SaaS startup that uses Lambda for the backend. I agree with a lot of what you say.

But I also disagree with you. Lambda ends up being very cost-effective at the place I work. It's a payment processor, so Lambdas are only ever run in response to approved transactions. For this sort of SaaS, the economics of Lambda work very well. I'm not sure what would be better. Of course, if I were running a public-facing free web service I would not use Lambda, but for something where every user is a paying customer? I think Lambda is very cost-effective. And if it ever got to the point where moving to k8s would result in massive cost savings, I would just pay a guy like you to handle the migration. At that point it's probably not a small startup anymore.


Yeah that is fair. I think the contention I have with it is those runtimes generally don't save time or money but come with more constraints - constraints that usually lead to poor architecture decisions.

Examples I have seen include:

Massive/mono-lambda - this is caused by the overhead of managing many functions, so instead everything gets packed into one with some sort of router logic. Managing many Lambda/FaaS things simply becomes too hard, especially if you are also using API Gateway and friends.

Resource misuse - the nature of these FaaS runtimes is they are essentially glorified cgi-bin, in all the worst ways. Namely they tend to mask memory leaks and other poor programming habits until they start leaking through as spurious failures.

Poor observability - because FaaS results in unpredictable lifetimes you end up making concessions in your observability stack. Can't do in-process rollup, can't do pull model metrics, i.e Prometheus, etc. You can somewhat get away with this with doing full tracing and sampling on the collector side instead but you have just made the whole thing inefficient for...what exactly?

Unnecessary cron-like things - this usually comes from places that try to go all-in on FaaS and squeeze everything into FaaS-shaped boxes. Something that would be much better as a daemon that spreads its work over time cleanly becomes a cron-like thing that fires every minute, usually spawning 1000s of sub-FaaS things to accomplish some maintenance task, paying huge amounts of overhead. Again... for what?

Poor DB connectivity options - again linked to unpredictable lifetimes and "unlimited scale" you don't have as many options for DB connectivity, you have to invest in bouncers immediately even if your application would actually be fine running on just 10 connections or so... if you had 10 containers instead of potentially 100s of lambdas.

Poor resource sizing options - sometimes you just need a fk-ton of RAM to solve your problem. Most of the FaaS providers limit not just your max runtime but also max resources you can grant said function. Leading again to sub-optimal work splitting for memory hard problems.

The list keeps going on tbh, this is far from exhaustive.
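As a footnote to the DB connectivity point: with long-lived containers you can bound concurrency in-process rather than deploying a bouncer immediately. A minimal sketch using a semaphore (illustrative, not tied to any particular driver):

```python
import threading
import time

MAX_CONNECTIONS = 10  # the cap a small in-process pool would enforce
pool = threading.BoundedSemaphore(MAX_CONNECTIONS)
stats = {"now": 0, "peak": 0}
lock = threading.Lock()

def query(_sql):
    with pool:  # blocks if MAX_CONNECTIONS queries are already in flight
        with lock:
            stats["now"] += 1
            stats["peak"] = max(stats["peak"], stats["now"])
        time.sleep(0.001)  # stand-in for the actual query
        with lock:
            stats["now"] -= 1

threads = [threading.Thread(target=query, args=("SELECT 1",)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(stats["peak"] <= MAX_CONNECTIONS)  # True
```

A hundred independent FaaS instances have no shared semaphore, so each opens its own connection and the cap has to live in an external bouncer instead.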

At the end of the day the tradeoffs aren't worth it. The only thing of value you get in return is "unlimited scalability" which is easily approximated with k8s primitives.

I really don't see why k8s shouldn't be the default option for anyone considering these FaaS things given GKE/AKS/EKS exist and do 99.99% of the work for you and are WAY less complicated than the FaaS runtimes and their associated requirements.


I think the biggest one I got wrong was in a previous job, I spent a lot of time writing abstractions and helper functions to reduce the amount of boilerplate I had to write in the name of saving time overall. In hindsight, I should've just bought Copilot or some other AI autocomplete tool to both save me time as desired and make the code easier to follow.

Disclaimer: motivated by this, I'm now building Codeium (https://www.codeium.com), which is a free alternative to Copilot so that no one needs to consider cost as a reason for not using this tech.


Interested in hearing any tales of build vs. buy for CMS. Anyone have anything to report? Medium/large size company usage would be interesting to me.


I've worked at 20 people companies and 2000 people companies, and can never recommend building a CMS. There are so many out there, and the one you build will just have a different limitation to the one you buy.


I second that. Any CMS-like system that you attempt to build with some customization will fail to implement some core elements of a CMS that you can adopt - be it versioning, publishing, workflows, etc.


Yes. Buy a CMS, and then cache the ever-loving shit out of it so its flaky slowness can’t affect your production delivery.


(If we're talking about physical things) trying to build/fix anything has always been a failure, a waste of time and/or money.


I have never regretted a build decision. I have often regretted a buy decision.

Software that I (or the team) built just works. And is a perfect fit for our needs. External software usually isn’t a perfect fit. And bug fixes can take anything from months to infinity to get resolved.

An example was a commercial game engine we used for a PC/Console game. It was buggy and slow. We sent the developers code fixes and speed improvements. They ignored it. We ended up ripping out their engine and replacing it with our own.


I always cover this with post-purchase rationalization. You just can't go wrong with that!


Do you build a new rationalization each time or use an off-the-shelf one?


I went with an inhouse dsl. It's fully self-rationalizing and recursively optimistic.

Everything has something good however small then you integrate those things into your personality and delete the timestamp.


My own Node+Express REST API backend, model declaration format and RBAC. :-)


Most things but more recently self-serve promoted listings


Oh I have such a good answer for this one: a camper van.

TLDR: I wanted to travel the US and explore national parks in a camper van. I'm handy as a builder and knew that I could, technically, build everything. Turns out that the challenge wasn't in the technical aspect but rather the sheer volume of work involved.

A little more detail: Having grown up around 4x4 vehicles, I wanted something with four-wheel drive. No full-sized American-made vans come from the factory with four-wheel drive. Even if you find a new 4x4 van on the lot, it will be an aftermarket conversion. This means that they're harder to find and, thus, cost a bit more than your typical 4x4 truck. But I was determined. I found a 1987 Ford Econoline in rough shape for around a grand. I bought it, named it "Polar Bear," and set to work on it.

One of the biggest setbacks for the project was the ongoing expense and hassle of repairing an old van with a custom conversion. I learned more about automotive mechanics with this vehicle than anything else I've owned. Still, a lot of repairs were well beyond my scope and I ended up spending tens of thousands of dollars rebuilding various components. Repairs were a near constant problem and this drastically slowed my build process.

Another hitch in my plan was fuel economy. Polar Bear would take me a mere 9.5 miles on a gallon of gas. With a 3-speed transmission, the engine was always running at a high RPM. The alternative factory transmission with an overdrive wasn't as strong as the 3-speed, and the gas mileage was only nominally better. After extensive research, I learned that a manual 5-speed swap could be done and would increase my MPG to over 20. However, with all of the expense and hassle of installing a clutch, I never took on this endeavor. Ultimately, I never did that national parks tour. I did, however, go a whole heckuvalotta cool places.

The camper conversion, which was supposed to be the focus, took me a solid ten years to finally complete. There are a lot of rather sloppy looking camper vans in the world and I really wanted mine to look good. This meant taking time on the details. And oh my, were there details. Unlike a box truck, a cargo van's walls are all continuously curved. Cutting wood to smoothly fit the curves of the walls is tricky and takes time to figure out. I also learned a ton about automotive and RV electrical systems in the process. When I began, I imagined a very complex system. I quickly learned that the simpler the design, the better the design. Everything has the potential to break, and the more complex it is, the more likely it is to break. In the end, I decided to entirely skip water pumps and simply went with a gravity-based system. And despite having a propane tank mounted to the van, I opted to use a portable camping stove instead of running more propane lines. In my opinion, these were good decisions.

Over the course of ten years, I spent enough money to have bought a nice completed rig right from the start. At the very least, I could have purchased a completed 2wd camper and had it converted to 4x4 for far less money. This would have also given me a more modern 4x4 drive train and suspension.

Still, I have no regrets. What I learned was priceless and the adventures I had along the way are some of my best memories in life. I finally sold the van last year for almost 10x what I paid for it - but far far less than I had invested in it. The following link is the build thread I posted to the Sportsmobile Forums for anyone who may be interested in seeing what she looked like:

https://www.sportsmobileforum.com/forums/f24/polar-bear-1-a-... (the last page has the final pictures)


Yes, this is exactly why I bought my campervan (2017 promaster 2500) already built out by a youngish kid who wasn't a professional, but sure made nice instagram pictures.

It certainly cost me a lot more than just doing it myself, but by doing it this way, I achieved two goals: immediate use for camping and the ability to figure out what I would do differently by living in it first.

There are a few things I learned pretty quickly:

1. Get rid of the "fancy" toilet. Just use a bucket with eco bags that can be buried or tossed into a trash can. No cleanup mess, lighter, and frees up a lot of valuable space.

2. Moved the sink from the door side and put it on the other side of the van next to the stove (there was an uncomfortable seat there before). Again, freed up a lot of space and made better use of it.

3. Removed a lot of hidden unnecessary support wood. Like in the backs of cabinets against the walls. People don't realize how heavy that stuff is and every extra pound affects van handling and gas mileage.

The list goes on and on, but one thing I did learn after taking my van into cold climates... insulation is critical and very hard to add after the fact.


Typically, home renovations and custom vehicle builds/resto-mods have negative payback. You do them for the personal enjoyment you get from the end result, which it sounds like you got.


You can't buy some locations, so home renos oftentimes make sense.

Vehicle builds rarely, if ever do. Unless you absolutely, positively, must have an electrical motor in a Javelin or something.


Yes but it's worth understanding that houses rarely sell for more than average in the area. An "overimproved" house is likely not going to recoup the money spent. Of course if the house is very dated or hasn't been kept up, simple updates like paint, carpet, new appliances, etc. can be inexpensive enough to be worth doing.

If you're renovating because you want to continue to live in and enjoy the house, it's more a personal question of whether spending the money will increase your happiness.


Thanks for posting that. I clicked through a bunch of pages and read a decent portion of your journey.

I really want to try the van thing but there are a few hard factors preventing it for me. So, I live vicariously through such accounts from time to time.


Have you tried renting? I love the folks at GoCamp [1] (no affiliation, I've just rented from them) and they've been great even after getting acquired by Storyteller.

[1] https://gocampcampervans.com/


Thank you for the suggestion! Health issues in the family make this sort of thing impossible for me.

But, I think it's highly possible your suggestion could help somebody else reading this and in any event, I definitely appreciated your effort. Thank you.


A lot of stories about building and wishing you had bought. I'm going to go the other direction. I have a story of buying something that was later scrapped and brought in-house.

I started blogging in earnest in 2005. At the time, I thought building my own blogging tool would be too much work, so I looked around for options that would allow me to store my content locally in source control. I settled on Blosxom, a Perl-based tool. Static site generators weren't really a thing in 2005, or I probably would have gone with that.

Blosxom served me well for 15 years, but it had problems. It ran on Apache, and every time I upgraded MacOS, I had to fiddle with Apache config to get it working again. It was customized with various plugins, and over the years I accumulated a lot of plugins and tweaks, to the point where I was afraid of making changes for fear of inadvertently breaking something. And it was very slow. I have a huge amount of content—I just checked, and it's currently at 6.9MB of text—and Blosxom couldn't handle the volume.

These problems meant that I eventually got to the point where I couldn't realistically make changes to the website any more. Blosxom had to be replaced.

Let's set that story aside for a moment. Back in 2012, I started a subscription-based screencast called "Let's Code JavaScript." To learn more about Node.js, I decided to build the website for the screencast myself, rather than using something like WordPress.

This code was a pleasure to work with. I had written it with tests, so it was easy to change and evolve. Starting around 2018, I decided to evolve it to support more than one site, with the goal of eventually migrating my blog off of Blosxom. I did this gradually, in my spare time, as kind of a fun side project. Mostly it involved reversing questionable design decisions involving globals, and cleaning up bad tests so I could do so.

By 2020, the codebase had evolved into a general-purpose content management engine. I migrated my content from Blosxom. In 2022, I migrated a PHP website a third party built for another business I was involved with. Now the system handles three websites, all on a single server: letscodejavascript.com, jamesshore.com, and agilefluency.org.

The result is so much better than using Blosxom, and I think it's better for my needs than anything I could buy. It's well tested, has a great development feedback loop, and any feature I want is a small code change away. For example, I replaced Google Analytics with simple email alerts that inform me when a piece of content gets a surge of attention. It was an hour or two of work.
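A surge alert like that can be sketched in a few lines of Node.js. This is only an illustration of the idea, not the post author's actual code: the function name, the hourly-bucket input shape, and the 5x threshold are all hypothetical, and the email-sending step is left as a comment so the sketch stays self-contained.

```javascript
// Hypothetical surge detector: compare each page's hits in the most
// recent hour against its trailing hourly average, and flag pages
// whose traffic has spiked past a multiplier.

const THRESHOLD = 5; // alert when the last hour exceeds 5x the trailing average

// hits: { [path]: number[] } — hourly hit counts, oldest first,
// with the final entry being the most recent hour.
function surgingPages(hits) {
  const surging = [];
  for (const [path, counts] of Object.entries(hits)) {
    if (counts.length < 2) continue; // need history to compare against
    const recent = counts[counts.length - 1];
    const prior = counts.slice(0, -1);
    const avg = prior.reduce((a, b) => a + b, 0) / prior.length;
    // Math.max(1, avg) keeps near-zero baselines from alerting on noise.
    if (recent > Math.max(1, avg) * THRESHOLD) surging.push(path);
  }
  // In a real setup you'd now email the list, e.g. with nodemailer,
  // instead of just returning it.
  return surging;
}

module.exports = { surgingPages };
```

Run on a cron or timer against whatever request log the site already keeps, this replaces a full analytics product with one small, testable function.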

I don't think there are any grand lessons to be learned from this, other than the one lesson everybody forgets when they're contemplating build vs. buy: it's not "the cost of building" vs. "the cost of buying". It's "the cost of building + the cost of maintaining" vs. "the cost of buying + the cost of keeping up with vendor updates + the opportunity cost of some feature changes being unsupported."

In my case, in 2018-2020, "build it yourself" was the right decision, because I had a robust codebase that I could evolve into a content management engine. In 2005, "buying" Blosxom was the right decision. I don't regret either choice, and I'm happy with where I've ended up.


this is an incredible question and ought to come up on here periodically


yes


You created an account just to say this.



