Lock-In and Multi-Cloud (tbray.org)
120 points by mooreds on Feb 10, 2022 | 135 comments



I think there's a missing angle from the engineer's point of view.

I have no interest in developing skills that are essentially just shallow knowledge of a single proprietary vendor's product brochure. I want to develop skills that are highly transferable, fundamental skills that allow me to continuously build over a career. I also think there is danger here for the profession if engineers allow their skills to be pigeon-holed and segmented that way. It's already the bane of our industry ("oh you only know React, we are looking for a Vue developer ...") that recruiters look at shallow skill segments instead of fundamentals.

For that reason if nothing else, I'd always want to work on Plan C - cloud providers as suppliers of standard commodity infrastructure. Possibly in a managed way but still standard things that, after I spend years acquiring knowledge about them, will have some value when I show up at a different cloud provider or need to design for internal infrastructure.


You touch on a point that has been bothering me for a while.

In the 90s I remember what it was like to be in an industry beholden to MS and other proprietary software. I worked for an MS partner and did the cert thing etc but it felt uncomfortable to me. There were lots of people happy to soak in everything MS and ignore everything else. They were basically tiny pawns in a global Microsoft sales strategy. Most of their knowledge was product details, and how to glue multiple MS products together.

I switched to the FOSS world to get away from that product focus and lockin. I enjoyed the freedom of evaluating and choosing tools and building things from interchangeable layers. I wasn't beholden to any vendor and was part of a community instead. As Linux later grew into a mainstream OS it worked out great for my career too.

Now with everything in my circles moving to AWS and moving into higher level managed services with massive lockin, the creeping notion of everything (incl the certified engineers) turning back into proprietary products is reminding me of how I felt in the late 90s about MS.

It's all in the name of efficiency, simplicity and value for money. But I'm not seeing it play out like that. Our Infra/Devops/SRE/etc teams seem to be getting ever larger and ever more swamped in ever-complexifying CI/CD pipelines, endless freaking IAM policy tweaking and hand-holding of self-service development teams. Meanwhile the annual AWS bills dwarf our previous on-prem colo and hardware replacement budgets (e.g. 10x), and we haven't saved anything on staff costs either; those have gone way up too.

We're back to just gluing vendor products together again, with general purpose tech knowledge being replaced by proprietary product knowledge. And the new generation of engineers and managers that don't remember the 90s see all this as progress and not a problem.

I am now an "old man despairs at cloud" meme....


Yeah, it's history repeating. I've said it before on here and I'll say it again: I don't think running on AWS will ever be cheaper than DIY. It's kind of a fallacy and a sales tactic by AWS to tell you how the elasticity of their offering allows you to "save money during the quiet hours". I don't think I've ever seen a system that actually does that.


The savings usually come from the corners being cut on DIY solutions: not accounting for the time spent on infrastructure services, skipping various security and monitoring features, sacrificing reliability, etc.

You can definitely beat it with certain workloads (e.g. high-volume egress) but I’ve seen more cases where someone thought they had beaten it but was really comparing unlike things (e.g. comparing S3 to unreliable storage in a single location) or leaving out operator time.


Cutting corners assumes not doing things that are necessary. What I've found is that systems in the cloud are often made resilient in areas where they don't need to be.

The example of S3 is apt: ephemeral files are often kept in S3, in which case they don't need the durability guarantees that you'll get with S3.

I've also found that paying over the odds for AWS's SLA doesn't mean jack for your system's actual uptime. All systems I've dealt with in AWS / Azure have had unplanned outages. I think almost all of these have been problems in the software itself, nothing that AWS will help you with.


> Cutting corners assumes not doing things that are necessary. … The example of S3 is apt: ephemeral files are often kept in S3, in which case they don't need the durability guarantees that you'll get with S3.

… or which didn't seem necessary until something breaks. For example, the cost of using S3 for temporary files is often negative because in most cases your usage charges are going to be lower than keeping a sufficiently-large storage volume provisioned 24x7 for it, and if you know that you don't need geographic redundancy you can use the one zone storage for that. In all cases, you'll benefit from things like bitrot protection which that scratch storage space doesn't have built-in.
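
To make that concrete, here is roughly what that pattern can look like with boto3 (bucket name, prefix, and retention period are made up for illustration):

    # Rough sketch only: scratch files in S3 with automatic expiry.
    # Bucket name, prefix, and retention are made up for illustration.
    import boto3

    s3 = boto3.client("s3")

    # Expire anything under tmp/ after 7 days so scratch data never piles up.
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-scratch-bucket",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "expire-tmp",
                "Filter": {"Prefix": "tmp/"},
                "Status": "Enabled",
                "Expiration": {"Days": 7},
            }]
        },
    )

    # One Zone-IA drops the cross-AZ redundancy you don't need for scratch data.
    # (It bills a 30-day minimum per object, so it suits scratch data that lingers.)
    s3.put_object(
        Bucket="example-scratch-bucket",
        Key="tmp/intermediate-result.bin",
        Body=b"...",
        StorageClass="ONEZONE_IA",
    )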

> I think almost all of these have been problems in the software itself, nothing that AWS will help you with.

Definitely. What you get from a cloud environment is better tools for dealing with this — for example, shifting load is easy and making it trivial to get new resources makes it easy to do things like high-fidelity testing, blue-green deployments, rolling reboots, etc. which most organizations struggle to do efficiently in a traditional data center environment.


Optimistically, those unplanned outages are software errors instead of data center management errors. There is an impact to your uptime from using AWS, but it's relative to having the same thing running on not-AWS rather than an absolute value of uptime. The scope of what you can screw up is smaller.


The counter to this is that companies rarely cut ops staff when moving to the cloud. The problem is that you still have security and monitoring knobs to fiddle with in a cloud. And the staff you had is worse at fiddling with these knobs than the old knobs that they knew well. Plus, you've now added a whole new category of cost optimization problems.


> And the staff you had is worse at fiddling with these knobs than the old knobs that they knew well. Plus, you've now added a whole new category of cost optimization problems.

More typically, it makes those problems visible — you get a bill from AWS every month in a way which you don't get a report that 70% of the hardware in your data center is either under-utilized or being used inefficiently. I think cloud migrations often suffer from that as a source of attribution error: “OMG, we don't have perfect security in the cloud!” “Aren't you running a production app on RHEL 6 under the account of someone who left 8 years ago?” “yeah, but we know about that!”


I'm talking about things like exorbitant egress fees changing how you have to architect your systems. Like at a previous job we had e-mail servers running in every AWS region our product ran in, partly because of egress fees. If we had run them in 1 (or a couple) regions we could have used a lot fewer machines due to demand smoothing and sharing things like Redis clusters.

It also caused much more operational complexity. Being in 10 times as many regions meant that we got hit with data center issues 10 times as often. Similarly, we had to run 10x the instances instead of going with fewer bigger ones. Ultimately that means we got 10x the issues caused by random things like disk failures.

Our alerting got much harder because the volume was smaller since it was split into 10 parts. The smaller the baseline is the harder it is to detect problematic variations. We frequently got paged because some customer in a small region did something unusual in the middle of the night or someone tried an outgoing spam attack.

That's one example but there's a whole category of "avoid getting extorted" problems that arise because of AWS's predatory pricing.


I think companies like Cloudflare and fly.io are trying to tackle some of those multi-datacenter complexity issues you're describing. Those ridiculous egress fees appear to have peaked as well since Cloudflare announced their zero egress S3 competitor.

I believe the way forward is not going back to self-managed infrastructure. It looks like there will be sufficient competition to limit the most egregious price gouging. The way to avoid lock-in is to build on APIs that are at least somewhat portable.

I know the portability devil is very much in the non functional details. But I think we'll see more of what happened with S3. Everybody will be adopting the more popular APIs. Over time we will converge on some new POSIX-like industry standard for the cloud.


> Like at a previous job we had e-mail servers running in every AWS region our product ran in, partly because of egress fees. If we had run them in 1 (or a couple) regions we could have used a lot fewer machines due to demand smoothing and sharing things like Redis clusters.

Were you sending a huge amount of email, or was this something related to the internal traffic needed to generate each message? I've definitely seen this in other contexts but am impressed to see email driving that (and, of course, preventing use of SES).

Note that I said “more typical” — I would define “cutting corners” as something like having a production service running with only one instance but once I go greater than one I'd be investing in automation anyway for reliability, and that would have included things like health checks & automatic rebuilds. It sounds like your business was pretty aggressive in region usage and I'd be curious what was driving that many regions.

> That's one example but there's a whole category of "avoid getting extorted" problems that arise because of AWS's predatory pricing.

Don't take anything I'm saying as defense of their egress pricing. My point was just that cloud pricing is very visible and that can lead many places to focus on that without accurately assessing what they're currently paying for their own infrastructure.


I share your frustration, but you have to factor in what we get for all the complexity. IAM roles are much more secure than "pwn one, pwn them all", containers are much easier to develop and deploy with, etc.

I'm not saying it's worth it, but we should evaluate it on those merits. I, too, have a suspicion it's not, but maybe the larger ops team on AWS can run circles around the smaller ops team on on-prem? There has to be a reason, after all.

> I am now an "old man despairs at cloud" meme....

"At cloud" indeed :P


It's not the cost or complexity that really bothers me (that was just a side rant hehe), more the vendor "productisation" of all the tools. After all, in many ways open source stuff is often lower level with the extra complexity that involves.

With open source tools (hell, even they are often turning into products now) there was always a feeling you could dig in and understand them, then join the community and contribute, and even shape their future. Now tools are just things to consume on the vendor's terms.

Maybe the open pendulum will swing back again in 10yrs - probably at the hands of something like the Cloud Native Computing Foundation, rather than the last peak 10yrs ago driven by Linux distros and some key apps/languages as well as dissatisfaction with MS etc.


Ah yeah, that's frustrating to me as well. The network is now the OS, and it's proprietary.


I also did the 6 developer MS cert thing in 2010. Not because I thought certifications were valuable. Just as a guided learning path. I never told anyone or put them on my resume. I was transitioning from a decade of C bit twiddling.

I did the same thing 8 years later to learn the AWS ecosystem. Where I now work 2 years later in ProServe (just disclaiming my biases). There is a chance I could retire from AWS or I could move on to the next hype cycle. Managing a career in technology is about staying relevant.

It’s not like my vast knowledge of 65C02 assembly that I learned in the 80s in middle school comes in handy in 2021.

However “my general tech knowledge” has served me well as well as my ability to know what level of the stack provides a business an “unfair advantage”. Managing infrastructure is seldom it unless you’re a DropBox.

Every company is dependent on dozens of vendors for their operations.


I found that even a proprietary product can improve your skills in a general way, and it has the advantage of being pre-selected for "what works". Anecdata: Google Firebase, specifically Firestore. On a superficial level, it's just another NoSQL database and you'd probably (rightfully) realize all the things that MongoDB can do that Firebase cannot. OTOH, I learned two things from it that I didn't from previous work with MySQL and PostgreSQL. (Note that I did not previously work with MongoDB beyond some tinkering, so maybe that would work as well).

The first was data-change notifications, i.e. getting notified when a record changes. I toyed around with this idea with MySQL some time ago, masquerading my server as a replica server but generating derived data from it. IIRC PostgreSQL has something similar with WAL streaming. With Firebase, I learned how this can actually work, and it convinced me that the idea is sound. It definitely encourages me to actually do this for real when I work with PostgreSQL again.
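
For the curious, a minimal sketch of how that could look on the PostgreSQL side using a trigger plus LISTEN/NOTIFY, which is the lightweight cousin of full WAL/logical decoding (table, channel, and connection details are invented):

    # Sketch of change notifications in PostgreSQL with a trigger plus LISTEN/NOTIFY
    # (table, channel, and connection details are invented for illustration).
    import select
    import psycopg2

    conn = psycopg2.connect("dbname=app")
    conn.set_session(autocommit=True)
    cur = conn.cursor()

    # One-time setup: publish the changed row's id on every insert/update.
    cur.execute("""
        CREATE OR REPLACE FUNCTION notify_order_change() RETURNS trigger AS $$
        BEGIN
            PERFORM pg_notify('order_changes', NEW.id::text);
            RETURN NEW;
        END;
        $$ LANGUAGE plpgsql;

        DROP TRIGGER IF EXISTS order_change ON orders;
        CREATE TRIGGER order_change AFTER INSERT OR UPDATE ON orders
            FOR EACH ROW EXECUTE FUNCTION notify_order_change();
    """)

    # Consumer side: block until the server pushes a notification, then react.
    cur.execute("LISTEN order_changes;")
    while True:
        if select.select([conn], [], [], 30) == ([], [], []):
            continue  # timeout, nothing happened
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            print("row changed:", note.payload)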

The second is what PostgreSQL calls row-level security, i.e. using the data in a row to determine required permissions. In Firestore, this is the only kind of permission control (except for all-in admin accounts), and it showed me how this easily prevents a whole range of security issues that can occur through bugs in a back-end server.
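
And a rough sketch of the PostgreSQL equivalent of that permission model (table, column, and role/setting names are invented):

    # Sketch of PostgreSQL row-level security in the same spirit as Firestore rules
    # (table, column, and setting names are invented; policies don't apply to the
    # table owner or superusers unless you also FORCE ROW LEVEL SECURITY).
    import psycopg2

    conn = psycopg2.connect("dbname=app user=app_backend")
    cur = conn.cursor()

    cur.execute("""
        ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

        -- Rows are only visible/modifiable by their owner, so a buggy query in
        -- the back end can't leak or touch someone else's documents.
        CREATE POLICY documents_owner ON documents
            USING (owner_id = current_setting('app.current_user_id')::bigint);
    """)
    conn.commit()

    # Per request: identify the caller before running ordinary application queries.
    cur.execute("SELECT set_config('app.current_user_id', %s, false)", ("42",))
    cur.execute("SELECT id, title FROM documents")  # only rows owned by user 42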

The specific coding around Firestore is shallow as you said, and non-transferable, but understanding what is possible is something that will definitely help me in the future, with other databases.


This might be true in some cases. For example, I'd argue that doing a Cisco CCNA/CCNP certification is going to teach people not entirely familiar with the matter a ton of useful stuff about network technology in general even if they go to work on a Juniper site.

On the other hand, I've rejected all certs that relate to "_______ certified architect/developer/whatever", most famously pushed in our company as the "AWS certified in-house advertiser".

In any case, those certs are an extremely lucrative business.


Disclaimer: I work at AWS in Professional Services; take my bias as you will…

I’ve been developing professionally for over 25 years and 10 years before that as a hobby. My goal has always been to solve business problems. Infrastructure is just “undifferentiated heavy lifting” to me. At my second job over 20 years ago, we sold hosted storage because we had to and we had to maintain our own SAN, infrastructure, databases, 20 or so application servers, DB servers, etc. That wasn’t part of our core business. Today, a company would never do that themselves.

I went from belatedly opening the AWS console in 2018 for the first time, to rearchitecting my previous 60 person company’s entire architecture to be “cloud native” and “serverless” so we could effortlessly scale as we moved into selling access to our microservices to large health care systems. Being able to scale came in handy in 2020 when there was a little worldwide pandemic. They were acquired 6 months after I left for 10x revenue. They weren’t concerned about “cloud lock-in”. My CTO who championed the charge to go all in on AWS - and is still very up to date technically - is in his 50s.

Two years later, I got a job at AWS. There is nothing magical about the concepts of learning AWS. Most of the services are just hosted versions of well known open source tools and the rest are conceptually the same as I have done for over 25 years. I didn’t get my job because of the vast knowledge I had about AWS from 2 years working at a 60 person startup (please note sarcasm). I got it because I knew how to lead development and deployments and do “application modernization”.

I spent half my loops discussing how I had used HashiCorp's Nomad, Consul, and Mongo to create a master data model at a company that didn't use AWS at all.

There are plenty of people hired in my department who have no prior AWS knowledge but are subject matter experts in their domains.

I’m 100% sure that if I applied for a similar role as mine with MS/Azure or Google/GCP they wouldn’t be deterred from hiring me even though I know nothing about either.


There are plenty of PHP developers that focus on PHP careers, C# developers that focus on C# careers. Windows developers that don't develop for Linux, and vice-versa.

The cloud providers are just one abstraction above OSs. AWS is Linux. Azure is Windows. GCP is Mac. No, not really. Just as an analogy of approximate customer market share, and again illustrating that there are plenty of people who could build careers on specializing in only one. Most developers, even.

Then there's something like Kubernetes. Which, extending the analogy above, makes me think of HTML5/JavaScript development with Electron - write your logic once, run on any OS (or Cloud Provider).

Now Electron is great, and web development is clearly very popular. But would you recommend that from an engineer's point of view, focusing on Web Development and Electron is the only way to build skills that are "highly transferable, fundamental skills"?

Now if by transferable skills, you meant actually the underlying OS/networking/infrastructure/data center skills, well then that's A) not really cloud skills, and B) soon will be as niche as C/Assembly programmers - still vital, essential, and widespread - but also a tiny minority.


> Windows developers that don't develop for Linux, and vice-versa.

Heh, the vice-versa reminded me of: A profitable, growing, useful, legal, well-loved... failure (2012), https://apenwarr.ca/log/20120326


That seems like pretty much the entire point of containers and/or kubernetes.


...and of HashiCorp, more aptly.


I think HashiCorp is made fairly redundant by Kubernetes, but yes if you don't want to adopt Kubernetes, you're correct. And that’s totally valid.

Kubernetes is already multicloud, and you don't need Terraform, Nomad, or Consul if you're using k8s. I would argue Vault is the exception.


I understand your point about career vendor lock-in, but I am not sure it is as big of a problem as you may think. If I were a hiring manager for a team using Google Cloud or Azure and I saw a candidate with a strong background using AWS-specific services such as Lambda or DynamoDB, I certainly would think they can handle whatever GCP/Azure's versions are too.

Heck, AWS in 2022 has changed quite a bit since AWS 2018, so for better or worse we all need to keep our skills fresh even if we are locked in!


>> I think there's a missing angle from the engineer's point of view... I'd always want to work on Plan C - cloud providers as suppliers of standard commodity infrastructure. Possibly in a managed way...

tbray did address this point with a different question:

So if I'm in technology leadership, one of my key priorities... is something along the lines of "How do I succeed in the face of the talent crisis?"

I'll tell you one way: Go all in on a public-cloud platform and use the highest-level serverless tools as much as possible.

You'll never get rid of all the operational workload. But every time you reduce the labor around instance counts and pod sizes and table space and file descriptors and patch levels, you've just increased the proportion of your hard-won recruiting wins that go into delivery of business-critical customer-visible features.


> Here’s an example: Last year, AWS dramatically reduced data egress charges, under pressure from CloudFlare, Oracle, and of course their customers.

This is an argument for how bad lock-in is though!

AWS was fucking everyone on egress price and nobody could do anything about it. It was not dropped until a serious alternative to the moat came in.

That’s not an example of some innovation or anything like that. It’s exactly like the Microsoft and oracle examples where the cloud is just extracting as much as the market will bear.


> AWS was fucking everyone on egress price […]

And they of course still are, as they didn't change anything related to the prices, but just increased the amount of traffic in their free tiers. That removed the pain for toy projects and workloads not requiring much egress traffic, but doesn't really make a difference for traffic heavy applications.

For example transferring 10TB/month from EC2 in us-east-1 to the internet did cost $899.91 before and costs $891 now.

For reference here is the announcement of these changes: https://aws.amazon.com/blogs/aws/aws-free-tier-data-transfer...
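
For anyone checking the arithmetic, those two figures fall out of the first pricing tier; a quick sketch under the assumptions the comment implies ($0.09/GB first tier, 10 TB counted as 10,000 GB, free allowance going from 1 GB to 100 GB per month):

    # Where those two numbers come from, assuming the $0.09/GB first pricing tier
    # for us-east-1 egress and treating 10 TB as a decimal 10,000 GB.
    rate_first_tier = 0.09   # $/GB
    monthly_gb = 10_000

    old_free_gb = 1          # free allowance before the change
    new_free_gb = 100        # free allowance after it

    print(round((monthly_gb - old_free_gb) * rate_first_tier, 2))  # 899.91
    print(round((monthly_gb - new_free_gb) * rate_first_tier, 2))  # 891.0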


Because the reality is that the bulk of people complaining aren't established businesses who actually look at the pricing before buying, but tiny startups and toy projects strapped for cash.

Sure, it's highway robbery, but they neatly segmented the market into the people who complain and the people who don't, and charged the former less.


Isn't it what we also do when we negotiate the highest possible salary? What's so terribly and inherently immoral about that?

To me this sounds like a bit of ideologically driven decision making.

The fact is they never raised their prices. People decided to use it knowing the price. They weren't screwed afterwards. If the service is saving precious time and money, buying opportunity for the business, why would I ditch it?

Because AWS is extracting as much as they can? Great, I wanna use services from successful people. I just want them to come forward in advance about how much they cost, not later after I decided.


The argument is that this is rent-seeking. A mechanism like this allows the company to capture 99% of the value they add compared to their competitors, which is obviously not fair.


Rent-seeking involves manipulating public policy to artificially increase profits. This isn't really the case, is it?

The point sometimes people miss is that the value of a service varies to each person, business or team.

It might be that, for you, in the contexts you were in, they were capturing too much with their price, to the point that it's not advantageous for you to use the service. That's fine, use something else. But it doesn't mean everyone lives the exact same experience as you.


Rent seeking doesn't require manipulating public policy; manipulating public policy is an effective method of rent seeking. I think vendor lock-in leading to higher prices is an example of rent seeking. The prices are going up not because that cloud provider is doing more/better things, but because of social things built around it.


This isn't rent seeking, it's called sales. You may not like it, but it's how the world works and what puts food on our tables.

The problem I see with vendor lock-in is when, after you're locked in, the service provider exploits this to change their terms (price and/or else) in a disproportionate and unjustifiable way against your initial interests in the service.

This never happened with AWS, to my knowledge (please correct me if I'm wrong). Everyone knew the price when they subscribed. Little changed afterwards. AWS was able to meet a demand with good enough products at an acceptable price for whoever decided to buy.

This is a commercial transaction. They're tremendously successful at marketing, sales and delivering, and that's OK. They don't have to be ashamed of that, there's nothing immoral or "rent-seeking" here. They're providing a service and selling it honestly, with transparent and predictable prices.

Lock-in exploitation happened with GCP at least once, that I know, with App Engine. And their service was doomed after that, as developers reacted against their practice and they weren't able to grow past a tiny fraction of the market.


> The problem I see with vendor lock-in is when, after you're locked in, the service provider exploits this to change their terms (price and/or else) in a disproportionate and unjustifiable way against your initial interests in the service.

> This never happened with AWS, to my knowledge (please correct me if I'm wrong). Everyone knew the price when they subscribed. Little changed afterwards. AWS was able to meet a demand with good enough products at an acceptable price for whoever decided to buy.

Vendor lock-in does not require any changes to terms, price, or any part of the service or product provided. In most classically studied cases of vendor lock-in (e.g., Microsoft Windows and Office binary formats, VHS vs. Beta), everyone knew the lack of interoperability when they bought the service/good, and nothing changed afterwards. That didn't stop those cases from becoming "vendor lock-in".


I totally agree that lock-in is there. What I'm disputing is that AWS has exploited it maliciously to screw their customers. More specifically, this statement:

> AWS was fucking everyone on egress price and nobody could do anything about it

I disagree because AWS has been transparent with their prices since day one. Everyone who subscribed was aware of how much it cost. AWS didn't screw anyone by increasing prices after they were locked-in.


Good, we're in agreement.


> This isn't rent seeking, it's called sales.

It's rent seeking when you don't produce new value, you just leverage existing systems to extract value.


Sure, we can certainly agree on that.

But can we make it more tangible with an example?

Would you argue that AWS, for instance, is a company that produces no new value?

One might argue they add value, but their price is too high for the value they add. We'd go back to what I've said before: the value perceived from a service is subjective and highly dependent on the context.


Clearly AWS provides a valuable and ongoing service with ongoing costs. I'd contend that the rent seeking aspect would be the difference between AWS's barest bones hosting and their more advanced/expensive offerings once that software (mostly) stops updating. There's not much difference between rent seeking and amortization.


"Leading to higher prices" is where the issue lies

The prices didn't get higher, they just didn't get lower by as much as they could have. That really describes profit, though. While I agree that all profit is rent seeking in some way and thus unfair, it's not very useful for splitting some practices up as bad rent seeking and others as less bad.


Yeah, I feel like this point is often conveniently overlooked because it's just uncomfortable. HN comments often lean towards maximizing comp, thus tacitly endorsing the company maximizing its stock price so their RSUs are worth more. But they don't like seeing the other end of where the money comes from.


Sure, but don’t pretend AWS is purely driven by innovation and delivering value to customers. Just say that AWS is trying to extract as much money as customers are willing to pay regardless of how little something costs them because they have a moat.


That has nothing to do with lock-in, you can still do your business elsewhere and have egress.

It is more like the position of power AWS has; you can go elsewhere, but you'd have to choose not to make use of the facilities AWS provides. So you either pay for egress at AWS, or pay for reinventing the wheel elsewhere so you won't have to pay the same egress rates.


How does it not have anything to do with lock-in? They literally make it expensive to use someone else's services, outside of the AWS data center.


Lock-in means "making it hard to move your business data and processes to a different offering", not "network costs money".


> "making it hard to move your business data and processes to a different offering", not "network costs money"

Correction: "Network costs obscene amounts of money" ~= "making it hard to move your business data and processes to a different offering"


That's not hard, just costly. Hard would be "here is your data in an obscure binary format and we are not going to tell you how to decode it". If you want to make a point of it being expensive, just say that it is expensive. Expensive isn't hard, it's just a calculation.


> That's not hard, just costly.

"High cost" doesn't mean "not hard". "Hard" isn't exclusively owned by engineers; business decisions can also be hard.

> Hard would be ...

Stop right there. You're awfully close to gatekeeping what "hard" means, but you have artificially limited your vision to "hard engineering problems".

> Expensive isn't hard, it's just a calculation.

Everything is a calculation. The cost of reverse engineering (or using an RE'd method) is a calculation. The cost of switching is a calculation. Deciding where to stop when making a pithy comment on a forum is a calculation.

Here's a simple method to determine (in the context of lock-in) what's "hard": is it significantly preventing you from doing what you want?

Maybe you have a lot of money to burn in getting your own data out from a predatory silo, so it's not "hard" for you, but consider that not everyone might be in the same boat as you.


Nobody is 'gatekeeping' anything. Words have meanings, and have existed for a long time. Their meaning is well understood at exec, management and engineering levels and don't need to be redefined. You seem to assume that I'm writing about technology lock-in, which I am not. I am writing about the ability to use a different vendor. Not about runtime cost or integration.

Known runtime cost is part of normal OPEX. Having to reverse engineer something because your MRI vendor won't give you the specs to write drivers for the 64-bit PCI-X acquisition card is vendor lock-in. If you need more examples, just ask around in any organisation of reasonable size. And no, we are not talking about individual people here (just in case you were making that assumption - it's hard for me to detect that in your writings).

If you want to change the meaning of what vendor lock-in means (it already has a definition, which I have written down a few times for your ease of not having to look it up yourself), good luck trying that out in your own context.

Just for the sake of completeness, here is a random paper that explains it in extra detail: https://personal.utdallas.edu/~liebowit/paths.html you could have found that relatively easily, even Wikipedia refers to it making it highly discoverable. If you don't like papers that are published by the author, feel free to request a copy at the Journal of Law, Economics, and Organization. If an explanation, reference or paper isn't good enough for you, why bother posting at all?

Perhaps the simplest distinction might be: you keep talking about OPEX while vendor lock-in is vendor-controlled CAPEX to 'get out'.


> Words have meanings, and have existed for a long time. Their meaning is well understood at exec, management and engineering levels and don't need to be redefined.

The world progresses. Language evolves (yes, even legal language, often understood to move glacially). But I'm not redefining anything here. "Vendor lock-in" is well understood at exec, management, and engineering levels and doesn't need to be artificially restricted to your limited reading of it.

> I am writing about the ability to use a different vendor.

The word "ability" also has meaning. It does not mean "absolute ability". E.g., Nearly every human has the "ability" to fly, through the modern technology of airplanes and the commonly provided services of airlines; but those (say, in countries like mine) who cannot afford a flight ticket are said to not have the ability to fly.

> Not about runtime cost or integration.

> Known runtime cost is part of normal OPEX.

Migrating off of one vendor is not "runtime cost" or "normal OPEX". By making it more expensive to migrate away, a vendor raises OPEX costs of such an "abnormal" move. Such weaponisation of OPEX costs is made to price-out the option, and is absolutely a denial of ability for business who cannot afford or justify such high abnormal OPEX costs.

----

> Having to reverse engineer something because your MRI vendor won't give you the specs to write drivers for the 64-bit PCI-X acquisition card is vendor lock-in. If you need more examples, just ask around in any organisation of reasonable size.

Look, no one is taking away your use-cases of the term "vendor lock-in". No one is disagreeing with you that those examples you cite absolutely are "vendor lock-in". But those are not the only cases or methods of it.

While intentional incompatibility is a far too common method of creating vendor lock-in, it is not the only method. For example, vaporware by dominant vendors used to deny market access to competitors is an accepted form of vendor lock-in.

> If you want to change the meaning of what vendor lock-in means (it already has a definition, which I have written down a few times for your ease of not having to look it up yourself)

I'm not trying to do any such thing. I have seen your "definition" and explained to you how the AWS egress tax is a means for promoting vendor lock-in, _within your definition_ of it.

> Just for the sake of completeness, here is a random paper that explains it in extra detail

I have read your "random paper"; it is a document explaining some history of economic literature. It does not define "vendor lock-in" (itself being careful to often use the phrase "lock-in" in scare quotes), but restricts itself to describing "lock-in by historical events".

What it does define, though, is _path dependence_. And what does it define it as? "... a minor or fleeting advantage or a seemingly inconsequential lead for some technology, product or standard can have important and irreversible influences on the ultimate market allocation of resources ..."; or, simply put, "sensitive dependence on initial conditions."

----

> If an explanation, reference or paper isn't good enough for you, why bother posting at all?

If an explanation, reference, or paper isn't even pertinent to the topic, and doesn't illustrate your point, why bother posting it at all?


Nothing in your word salad has anything to do with lock-in; you first describe network egress cost (which is OPEX) and then you describe it as lock-in, which it isn't (migrations are CAPEX). Make up your mind before you try again.


> Nothing in your word salad ...

I see that you don't like others contributing to your salads. My bad.

> network egress cost (which is OPEX) ... (migrations are CAPEX). Make up your mind before you try again.

You're the one pushing the agenda here that migration cost off of AWS is a "known" network cost, and thus OPEX, and thus missing the point. I don't care how you pencil down the money; the point is that migration costs are often NOT a "known ongoing network cost" at the beginning of service, and have been artificially inflated.

I should've seen the signals before venturing this far down this thread, trying to engage with you, but this drives the nail in the coffin. I no longer believe you're engaging in honest discourse; or worse, that you don't actually understand the business terms you're talking about.

----

From your other comments, you appear like you're an AWS apologist; or at the very least, that your business depends on people using AWS. I have a strong feeling I should have listened to Upton Sinclair's advice about getting someone to understand something, but it's not too late for me yet.

Good day to you.


It seems that you are wrong at every turn. This scenario where we migrated clouds has been pretty similar between IBM, Oracle, Azure, GCP and also AWS. Most businesses we're migrating (from/to arbitrary clouds) are using 2 of them, some more, and every time network egress cost is just a small factor in the CAPEX for the one-time migration during a vendor switch. While vendor switching has generally just been for arbitrary reasons, vendor lock-in was never a parameter conjoined with network egress costs. Besides, we don't migrate using network egress anyway, we migrate using physical bulk exports/imports using trucks and only do a small delta update afterwards. Even if in theory network egress was so expensive that people can't pay for migration anymore, you wouldn't be migrating that way anyway making the whole concept moot. The article is also not about migration egress but about operational egress.

Example: for small migrations we order an IBM Cloud Mass Data Migration device and an Azure Data Box, transfer between boxes, wipe the IBM box, send both back to their vendors, then do a delta transfer of the remaining updates.

Other Example: For acquisitions where we can't retain IAM between AWS accounts, order a snowball or snowmobile between them, all your RDS and bucket data goes from A to B in a matter of days. Not weeks or months over some slow 10Gbit link.

Maybe you could come up with some real-world examples where your point was a reality, because I'm not seeing it, the terminology doesn't match, and none of the scenarios from small to large have seen egress OPEX as a migration CAPEX issue.

We've been running successful migrations with no vendor lock-in woes between clouds for over a decade, egress and transfer costs have been a very small fraction of the whole operation, and they have never been an ongoing pain point during post-migration usage. And none of this is AWS-specific, except the egress cost element of the article linked here on HN. But hey, you do you.


> It seems that you are wrong at every turn.

At this point, I don't know what you're talking about. My point has been simple — and correct — from the beginning: AWS promotes vendor lock-in by making network egress obscenely costly.

> This scenario where we migrated clouds ...

That's your scenario, not mine. I did not make this thread about migration, you did. I did not follow you down your strawman path, but you seem to believe I have.

> ... we don't migrate using network egress anyway, we migrate using physical bulk exports/imports using trucks

Good for you, I guess. Doesn't serve my purpose. Also, still on the wrong track about one-time, total migration.

> The article is also not about migration egress but about operational egress.

You almost came back onto the right track here, but quickly jumped back to your strawman path.

> Maybe you could come up with some real-world examples where your point was a reality, because I'm not seeing it

You're ignoring use-cases that aren't similar to ones you've had before, largely because you keep trying to see everyone else's needs through your "terminology" and comparing them to your "scenarios".

Here's a real-world use-case #1:

1. I started building an industrial-IoT analytics product for a client when it wasn't clear which cloud (private or public) they were going to run it on. Think lots of data, and lots of ad-hoc compute for the analysis.

2. I built it in a cloud-agnostic way, with all cloud-provider interaction safely nestled away behind a common API. It also had first-class live data replication across deployments. (If you're wondering, this kept most of the "migration CAPEX" down, in case the client wanted to move workloads after deployment.)

3. The client initially deployed it on AWS.

4. Later, the client's physical DC (closer to HQ) was upgraded and became capable of running our stack with nearly no changes.

5. Eventually, the client wanted to run some workloads in their physical DC. Some, mind you, not all.

6. Even including the cost of running their DC, it'd have been cheaper to run some compute workloads in it than on EC2. It seemed like it made perfect business sense to go through with it. Until network egress costs of the data were calculated.

7. Snowmobile was more expensive than biting the bullet and paying the egress tax. Not that it'd have mattered anyway, because ...

8. Snowball didn't exist, but even if it did, repeated usage to keep moving deltas of updated data was too operationally onerous.

9. They owned the data. They were willing to pay the price of a normal data transfer. Had they deployed to their DC first, the situation would have been so much better. They'd have to pay egress costs to their DC's ISP, and they'd still be saving money on splitting workloads across AWS and DC.

----

Real-world use-case #2:

1. While building the above, I gave every developer on my team a dedicated environment to work on, with a beefy shared server in our office; all this for low-latency dev work.

2. To mimic production scale and behaviours as closely as possible, I set-up a live sync of (anonymized) production data (back when it was small) to our dev environments.

3. Going from beta (~ 100-1000 devices) to GA (>70k devices) necessitated cutting off the umbilical cord, because our calculations showed the cost would be ridiculous.

----

Real-world use-case #3:

Same client as above. Wanted to keep their devices' data around for seven years, for auditing purposes. Didn't know if they'd still be on AWS in seven years. It appeared ending the contract would've been costlier than just paying for another year, because they weren't large enough to justify the expense of Snowmobile and Snowball didn't exist back then.

----

You see, AWS knows that if you can get your data out, you can get your compute out too. Of the two things keeping compute on any platform — data and code (including platform features like IAM) — only one is sticky. Other vendors can catch up on the code front, but if a customer even thinks about a partial migration, they'd see that the costs can get high enough that they might as well go for a complete migration, which is a significantly higher expense and much harder to justify.


I think you had difficulty understanding my point. People were effectively locked in until cloudflare and others had enough comparable offerings that Amazon had to actually compete.

A massive drop in prices like that means people were locked in and the switching cost in the market just dropped. So Amazon had to adjust to keep the customers.


It turns out other tech companies are actually doing business as usual, not just the ones we love to hate such as Microsoft and Oracle?

I feel like scales have dropped from your eyes. :( Sorry buddy.


I can't agree with this at all. It is entirely practical to configure bare metal or virtualized systems yourself. There are so many wonderful tools to do whatever you want. It will cost less and is usually more performant, especially when you avoid the biggest cloud players.


Have worked on a few SaaS apps that are built on just a couple of Linode/DigitalOcean VMs and bare metal DBs. Costs on AWS would have been 3-4x more. It isn't difficult writing Chef scripts or building a postgresql cluster with auto failover.

Unfortunately, engineers who can work outside of the big 3 cloud providers are now rare to find. The lock-in is actually industry-wide, not just from a resourcing standpoint but also if you want to be taken seriously from an investor perspective. And of course with VC money you can just throw a chunk of it at AWS, but if you are bootstrapped it's a different story.

Having said that, eventually there will be a need to move to the cloud once you are about to hit a certain scale or have more compliance and security requirements.


> And of course with VC money you can just throw a chunk of it on AWS

Don't think VCs are much enthused about AWS either: https://a16z.com/2021/05/27/cost-of-cloud-paradox-market-cap...


That explains a lot of blockchain enthusiasm. Get somebody else to worry about paying for the cloud costs


Enterprises are also doing this to themselves with the usual trend to hire only in X, so if they want people skilled in cloud native with 5 years experience in production, that is what they will get.


What about training people to get better?


Only if they can get away with putting that as project expenses, in most cases.

Fortunately there are still some that do care about training, without going through such actions.

But they are the minority either way, when training is considered a perk on job offers.


I'd expect anyone that can work well inside of the big 3 cloud providers to be able to work outside of them too. The drop-off is that the developer experience is poorer when you don't use them. There is more rote work to be done to accomplish an equivalent end.


Good lord, I could not more strongly think the opposite. It's a massive resource sink for anything of even modest scale, and in my experience it's literally been 10x the overall cost, with many, many fewer 9s of uptime guaranteed.


I think the truth is somewhere in-between (hence the article's proposal of a hybrid approach).

That said, it's not just the budget for hardware to take into consideration, but also the often-ignored costs like man-hours, support contracts, etc.

It also ignores security. Many security departments would much prefer the risk of having a dedicated cloud provider take that on, rather than a bunch of local sysadmins who will come and go.


That's where hybrid comes in, so you don't make yourself completely dependent on big cloud. Of course it's cheaper, because it's mass production.

With hybrid you can use any cloud you want, but when it starts doing things you don't agree on, you can revert back to your own systems.

Use the force, but have alternatives ready.


Just learn Ansible already, it will take you at most one day.


I'm not sure how learning Ansible (which I already do know) is going to help troubleshoot or actually maintain the deployed services.

The problem is not deployment or orchestration (though that is one of them), the problem is building a team with the necessary expertise to support what could be a gigantic variety of tools and technologies at an expert level.


Isn't that true of cloud providers, too? I'll use AWS as an example because that's where I have the most experience. There's a dizzying number of services, some of which have a high degree of overlap. Someone needs to understand them to make an informed choice as to what to use. Every place I've been at using AWS has had to deal with undocumented API issues and responses that the docs say are impossible, but happen. There are performance pitfalls that require in-depth knowledge of the product to avoid, but usually you don't find out until it's too late. Adopting many of the cloud native services isn't just a technology swap (your RDBMS knowledge isn't going to help much with DynamoDB). There's a cottage industry helping people just understand their bills and how to reduce costs. And for non-trivial deployments, you often have to learn orchestration tools like CloudFormation or Terraform.

I'm at a place that uses GCP now and very little of that AWS knowledge I've accumulated is applicable here.

Cloud platforms/services have a lot going for them, but I think they've become complex monsters that don't really save people as much time as they think. If you're just pushing buttons on the EC2 console, then it's faster than provisioning an Ubuntu server with Ansible for sure. But, those toy deployments aren't indicative of the actual effort to manage a production system. At least in my (mostly start-up) experience with AWS going back to 2009.


> Isn't that true of cloud providers, too? I'll use AWS as an example because that's where I have the most experience. There's a dizzying number of services, some of which have a high degree of overlap. Someone needs to understand them to make an informed choice as to what to use. Every place I've been at using AWS has had to deal with undocumented API issues and responses that the docs say are impossible, but happen. There are performance pitfalls that require in-depth knowledge of the product to avoid, but usually you don't find out until it's too late. Adopting many of the cloud native services isn't just a technology swap (your RDBMS knowledge isn't going to help much with DynamoDB). There's a cottage industry helping people just understand their bills and how to reduce costs. And for non-trivial deployments, you often have to learn orchestration tools like CloudFormation or Terraform.

It's true to some extent but I would typically characterize it as having DIY add additional layers you have to be concerned with. For example, I've seen more problems with SANs in some weeks than I've had total using EBS since the late 2000s (repeat for dodgy network cabling, server firmware causing issues, etc.).

The other thing to consider is what the alternatives are: for example, the reason why some of your RDBMS knowledge isn't relevant to DynamoDB is because it's a NoSQL database – if you wanted a SQL database, RDS is going to be at least an order of magnitude less time than running your own, especially if you are concerned with performance or reliability. If you've determined that NoSQL is a good fit for your application, that means that the relevant comparison is how much it costs to run DynamoDB versus, say, MongoDB.

> I'm at a place that uses GCP now and very little of that AWS knowledge I've accumulated is applicable here.

My experience has been the opposite. There are some differences but an awful lot of the principles transfer well to the other major cloud platforms and a solid foundation will make it easier for you to transition between the major cloud providers more easily than on-premise.

> If you're just pushing buttons on the EC2 console, then it's faster than provisioning an Ubuntu server with Ansible for sure. But, those toy deployments aren't indicative of the actual effort to manage a production system.

It's definitely not trivial but I certainly have no desire to go back to running a VMware cluster, either. I still have memories of being on the phone with their support team explaining how the flaws in the design of their cluster health-check system led to a big outage.


Yeah. I remember years ago working at a shop that had hundreds of bare-metal hypervisors in a bunch of colos. We had a problem where clocks would sometimes skew massively (maybe dodgy hardware?). The thing was, ntpd would be able to correct it sometimes, but not always. It has a cut-off point where it throws its little arms up and says "I dunno!", and the skew just gets larger until someone manually fixes it. So we added a remediation step to the Sensu time check if it exceeded a threshold. Then one time we got blacklisted by the public NTP servers we used in one region because we had a shitload of servers hitting them directly with ntpd and Sensu checks. So we had to set up our own servers (and monitor them). And we still had occasional skew even though the remediation action would (eventually) force a correction, and these would cause odd failures with authentication or database replication. We eventually ditched ntpd and moved to chrony (which will continually adjust the clock regardless of the drift). But that took research, testing, Puppet code, scheduled deployment, documentation etc. The whole episode was boring and stupidly time-consuming and wasn't even some cool thing that "moved the needle forward" for the company. It's just the fucking time on your servers. Now, take any number of stupid little things like this and sprinkle them over every single sprint, and see how the infrastructure/sre/devops/dogsbody team's promotion cycle works. "Why was the database upgrade delayed this time?"
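
(For the curious, the remediation was conceptually something like the sketch below; the server name, threshold, and chrony-based correction are placeholders rather than what we actually ran.)

    # Illustrative only: the kind of drift check + forced correction described above.
    # The NTP server, threshold, and chrony-based remediation are placeholders,
    # not the actual Sensu check we ran.
    import subprocess
    import ntplib

    MAX_OFFSET_SECONDS = 0.5  # past this, ntpd may refuse to slew on its own

    offset = ntplib.NTPClient().request("ntp.internal.example", version=3).offset
    if abs(offset) > MAX_OFFSET_SECONDS:
        # chrony can step the clock on demand; the ntpd-era equivalent was a
        # stop ntpd / ntpdate -u / start ntpd dance.
        subprocess.run(["chronyc", "makestep"], check=True)
        print(f"stepped clock, offset was {offset:.3f}s")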


I've never witnessed that kind of hardware problem precisely, but now just think about it: what would happen if the same situation happened on an AWS instance? How would you go about debugging and/or fixing it? It's not even certain you could diagnose the problem in the first place, let alone deploy a workaround. You'd have to send dozens of emails to tech support, who of course would say there is no problem because their machines have 999999999999999999% uptime and nothing could be wrong on their side, but hey, they can sell you advanced support/engineering if you've got way too much money to help you find the problem in your code.

Some commenter mentioned the dark days of Oracle/Microsoft/SAP ruling over the server market. But at least these companies had the most basic decency to let you house/access your own hardware and diagnose things yourself if you had the skills. Now in the "cloud" you can just go to hell and suffer all the problems/inconsistencies, or rather your users can suffer, since you as a sysadmin have zero control/monitoring over what's happening. Oh and bonus point: if users report some problems, they will be reproducible 0% of the time since there's a good chance your users are connected to a different availability zone: yeah, it's easy to reach more than 5 9's when "ping works from a specific location" counts as uptime for a global service.

So in the end, is it better to have a silly answer to "Why was the database upgrade delayed this time?"? Or is it better for the answer to be "i don't know, but upgrading the database cost us 37k€ in compute, 31k€ for storage and 125k€ for egress for backup of the previous db" ? I much prefer silly answers, but maybe that's because i don't have dozens of thousands of euros to be shorted of even if i wanted to :-)


> It's not even sure you could diagnose the problem in the first place, let alone deploy a workaround.

It's easy to detect: ntpdate -q will tell you the drift, and your logging would tell you when ntpd gave up because the skew was too large.

Correction would depend on why it was happening: you might be able to tell ntpd/chrony to adjust more frequently or to accept larger deltas, but at that point I'd also say that the best path would be pulling that instance from service and replacing it so it's not critical.

> You'd have to send dozens of email to tech support who of course would say there is no problem because their machines have 999999999999999999% uptime and nothing could be wrong on their side, but hey they can sell you advanced support/engineering if you've got way too much money to help you find the problem in your code.

Has that been your experience? Based on mine it would be more like “we confirmed the problem you showed and have contacted the service team” followed by an apologetic call from your TAM. I've opened plenty of issues over the years but have never had someone insist that something is not a problem because their systems are perfect — they'd basically have to be saying you faked the evidence in their report.


> ntpdate -q will tell you the drift, your logging (...)

Sure, that's if you got root on the machine. I was not talking about VPS hosting which we've been doing for a long time, but rather so-called "serverless" services.

> Has that been your experience?

That's been my experience with most managed hosting i've had over the years, though i've had some good experiences too. I've never dealt with Amazon but i'm assuming, since you can't run diagnostics (since you have no shell) and like any other business it's likely their first level of customer support is inexperienced and reads from a script, i'm guessing you're gonna have a bad time if you encounter weird/silent errors from your cloud services.

Some hardware errors are hard-enough to diagnose with root and a helpful customer support, i can't imagine without those.


While it's true that you can't just shell into a managed host, you can run diagnostics in many cases[1]. I would also say that it's _far_ less common for those services to have hardware issues — part of what they're doing is optimizing to make things like health checks and rebuilds easy, since the places with durable state are very clearly identified and isolated.

I've never seen a hardware problem in those environments that the platform didn't detect first (e.g. you see some requests have a latency spike for a minute prior to a new instance coming online — it doesn't say "underlying hardware failure" but clearly something changed on the host or perhaps a neighbor got very noisy) and that includes things like the various Intel exploits, where you could see everything rotate a week before the public advisory went out. I will say that I've had a few poor run-arounds with first-level support but I've never had them refuse to escalate or switch to a different tech if you say the first response wasn't satisfactory.

1. e.g. on AWS “Fargate” refers to the managed Firecracker VMs, as opposed to the traditional ECS/EKS versions using servers which you can access, but you can use https://docs.aws.amazon.com/AmazonECS/latest/developerguide/... to run a command inside the container environment.


I've seen Lambdas run into that aplenty on AWS. The clock skews between the Lambda and S3, resulting in measurable signature mismatch errors. There are even settings on the S3 client in JS to sync times to account for the skew.
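
For reference, in the v2 JavaScript SDK that setting is correctClockSkew, which makes the client compensate for local clock drift when a request fails signature validation instead of just surfacing the error. A minimal sketch; the bucket and key are placeholders:

    import * as AWS from "aws-sdk";

    // correctClockSkew tells the v2 SDK to adjust for clock offset and retry
    // instead of just returning the signature/skew error.
    const s3 = new AWS.S3({ correctClockSkew: true });

    // Hypothetical bucket/key, just to show a call that would otherwise hit
    // RequestTimeTooSkewed-style errors on a badly drifted host.
    s3.putObject({ Bucket: "example-bucket", Key: "hello.txt", Body: "hi" })
      .promise()
      .then(() => console.log("uploaded"))
      .catch((err) => console.error("upload failed:", err.code));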


> the problem is building a team with the necessary expertise to support what could be a gigantic variety of tools and technologies at an expert level.

That is just as valid for cloud solutions, though.


Does that include your salary? How about power, and cooling, and all the other wonderful things it takes to run a data center?


At some level of scale, it makes sense to hire your own platform team and do it all in house. At the other end, it also works for turtle mode / lifestyle / small businesses.

The place for expensive cloud is when you're growing fast and need to spin up infra in a hurry. When you can't be bothered to configure software outside of your core competency.


First off, the Cloud isn't expensive. Cloud misuse and bad architecture are expensive. It's a lack of skill (and to some degree tooling as well) that causes this to happen; expensive is not some inherent property of the cloud. Also, unless you are in some sort of business that has suddenly found itself without the need to innovate, or you are building a very commoditized thing, there is no reason to ever go in-house.

Second, the reason people go cloud is not because they can't be bothered to configure things. It's because they have realized that any time spent doing undifferentiated work like that is a complete and utter waste of time and capital.

Third, the cloud's power (and ultimately why it's less expensive) is not its ability to spin up infra in a hurry; the cloud's power is elasticity. Properly architected cloud applications spin resources down as freely as they spin up, without intervention. You don't pay for idle, and you certainly don't have to pay an army of people to maintain things that are now completely automated and realized in code, or at the very least managed for you (without you having to think about it).

Going multi-cloud, and forcing yourself to use the lowest common denominator services that are common across cloud providers is a really great way to miss out on all of those things.


So have Netflix, Disney+, Intuit, and Capital One not reached that scale?


Netflix has quite a bit of their own hardware at the edge [1].

Disney literally just started and are rapidly growing.

I don't know anything about Intuit or Capital One. Do they make you wear a suit and tie while working on Oracle databases? (I kid.)

[1] https://www.theverge.com/22787426/netflix-cdn-open-connect


Netflix has their CDN at the edge; their core infrastructure is on AWS. Disney+, technologically, is an outgrowth of the BamTech acquisition, which has one of the best-regarded streaming technologies.


You wouldn't consider a CDN core infrastructure for a streaming service?


Considering that the hardware Netflix uses for its CDN is physically located in the network providers' network centers, they can't very well use AWS for that.


For 5G networks they can (Wavelength) and maybe do.


At a previous gig, they were moving from on-prem to cloud for accounting purposes.


It is not a binary choice between AWS/GCP/Azure and running your own data center.


In the context of this article, and I presume the parent comment, “bare metal” refers to a hosted VM (like EC2 instances), not running your own data center.


Bare metal definitionally means not a VM.


> especially when you avoid the biggest cloud players.

I find it hard to strike a balance between the OVH model of 'rent a server, with discounts for a 12/24-month commitment' and AWS, where you can purchase a single hour of bare-metal compute for cheap.

I would say DigitalOcean fills this gap, but it's not like they've committed to bare metal, and I imagine they'll continue to look into expanding their suite of managed offerings like Apps[0] (which I think is Heroku-esque autoscaling for containers), since the margins on these have to be way higher than their VPS offerings.

0: https://www.digitalocean.com/products/app-platform


It's certainly possible, but it requires a level of discipline and commitment that's usually missing at larger organisations.

In some firms they still struggle with the basics, like staff on 200k salaries getting terribly slow PCs, crap monitors, etc.


There are no slow desktop computers in 2022. There is a lot of slow software running on fast computers.


Haha, you ain't seen nothing yet! A major accounting firm still has some devs on ancient MacBooks with HDDs!


It's a golden era for tooling, if we could do it before it should be even easier now. I think we are in a risk avoidance rut. "No one ever got fired for picking AWS".


First, anyone with under $100K per year of cloud expense should probably just suck it up and keep going. You have more important fish to fry than to worry about cloud lock-in.

Second, if you've got more than $10M per year of cloud expense you most likely should be thinking about switching to a hybrid model. You have the resources to manage your own computers as well as analyze your expenses, and should be using the cloud companies when it saves money. The biggest problem at this level is that your employees probably use "cloud" as a way to bypass your IT department, which isn't responsive enough because everybody with an MBA thinks of IT as a "cost center" and shortchanges it.

Third, I think this gives WAY too much credit to the competence of modern large enterprises. The fact that Oracle isn't bankrupt demonstrates that competence and technical execution really don't dominate the enterprise business cycle. And Microsoft really didn't have all that much control until Office365--I've never heard anybody talk about losing control of their desktop spend as all they had to do was slow down the refresh cycle on their desktops to control that cost. It's only been since Office365 that things have changed--and the mid-tier businesses love Office365 because now they don't have to deal with managing Outlook anymore.

In between $100K and $10M cloud spend, however, is a tough spot. I suspect that answer is still "just keep going" and that delivering features requested by marketing and sales is more important than reducing cloud spend.


> The fact that Oracle isn't bankrupt demonstrates that competence and technical execution really don't dominate the enterprise business cycle.

So... it's marketing, licenses and lawsuits? (I'm serious; I only know Oracle from its bad reputation and Java.)


There are a ton of small time complaining blogs about all the clouds, but practically none of them are at a scale where it really matters anyway.

A recent one was about how clouds should be dumb pipes, but the blog forgot about the 30 FTEs you'd be hiring just to maintain your 'own' IAM, API and on-demand CRUD of generic resources (like compute, storage, networking, and higher-level resources like databases, object stores, queues, buffers, pub/sub). It also forgot about the economies of scale you would never be able to match as a single customer versus a multinational specialised company that has handled practically every possible service scenario under the sun...

Once you are beyond about 200K, with enough in-house dedication you can start thinking about higher-level abstractions that make you less bound to a single cloud, or less bound to a single implementation (which can still be on a single cloud vendor's offering). You'd end up going that way anyway, because at scale some facilities start to need adjustments to keep up with demand, like the creation and lifecycle management of object stores; you really can't be managing those manually or through the bare API anymore when you have a ton of them.

At that point you make processes, policies and automation a thing, and if you implement them with some abstraction in mind they can suddenly be made to work everywhere... This is also where the bulk of the cost and ROI comes in: say you have your internal adaptation in place for S3; adding B2 or R2 is really not that hard. Bam, object egress solved (for non-IAM scenarios; signed URLs would still work, though).

But maintaining and owning this when you aren't big enough is just a drag your customers (internal and external) aren't going to profit from, and you really should be spending your time and resources elsewhere. You should probably focus on your business first, on what your customers want (so you can sell more in a commercial setting), and then on optimising your processes, at which point you might be getting to the size where more vertical integration starts making sense.


My dream scenario:

Several open source vendors get tired of Amazon taking their innovation, repackaging it, and reselling it. They band together to create the strictest open source license ever. Stricter than AGPLv3: "If you run this as a PaaS, your entire PaaS down to the billing layer must be open. Every ounce of your code. Even the internal employee management bits. Maybe you're allowed a budget of 30 kLOC of secret sauce blobs, but that's it."

It should also say: "All billing reports must be instantaneous, easy to understand, and attributable to the products and the teams that use them." And: "Egress should never cost more than ingress, and high quality export tools must be provided for every service offered."

Use this as the foundation to build a fully open source cloud platform with the tools anyone could use to stand up their own data centers.

It would be a dream to see cloud reduced to commodity. Then open source vendors could make good money on support contracts, focusing on the software they love building rather than being eaten out of house and home by the giants.


In the first paragraph, you are describing the SSPL license, adopted by MongoDB and Elasticsearch.

So far, it has not been used to build a fully open source cloud platform. Instead, it has been used (with limited success) to ensure that people who want to run Elasticsearch in the cloud have only one choice: going to Elastic NV.


You are describing the SSPL, which people on this website (not me) have loudly cried about as being a proprietary license


It is infuriating how these companies have managed to appropriate open source for profit. I would be pissed if my unpaid hard work and community contributions were being appropriated to make Bezos richer while harming the OSS ecosystem that I love. That said, I don't think that more restrictive licensing is the answer. I don't know the answer, TBH.


Why not? Licensing is designed exactly for that purpose. When you choose an MIT/BSD license, you are giving express permission for others to profit from your code and give you nothing back.

It's not like this is new or anything... I remember reading the same discussions back in the 2000s, when people discovered that Windows used parts of BSD's network stack.


Historically, when businesses adopted open source software they would contribute upstream. Cloud vendors changed that model. It's interesting that attitudes about this behavior have changed.


They did? What fraction of bash users (for example) contributed anything upstream? I am betting significantly less than 1%.

I am not sure why people suddenly get outraged about the cloud. Most webhosting providers in the last 20 years had businesses critically dependent on the Linux kernel and Apache. Almost none of them contributed anything to those projects. And people were not complaining.


> I am not sure why people suddenly get outraged about cloud

Because now we are in the part of the business cycle where companies merge into a few huge sellers.

I couldn't name most of the webhost providers from 20 years ago, but Amazon is literally a household name.

So now it's not "A few Geocities knock-offs are using PHP for free and making a few bucks off a few people" it's "One of the biggest companies ever is making (seemingly) tons of profit off of almost everyone in the software industry."

There's also the fact that in the last 20 years FOSS really succeeded in its mission: proprietary software on the desktop is a joke now. The frontline has been pushed back to network apps, where the backend is proprietary and the frontend is throw-away code that's useless without API keys.

So "stuff on the web" has gone from a low-profit novelty to 90+% of all software revenue.

I won't cite anything because we're talking about feelings. This is probably what the average 30-ish programmer feels about the situation, and if I got the facts wrong, I doubt anyone else is looking at these facts before deciding how to feel about AWS' marketshare.


I suspect there's an ideological influence to this weird outrage. But I'm probably wrong.


Back then OSS was more volunteer-labor based, or built on Red Hat-style business models of support contracts, etc. So there was maybe less expectation of monetizing OSS directly.

These days the conflict seems to be about big cloud vendors productizing OSS code made by SaaS vendors, thus directly competing with them. It seems more like an economic argument (we want to get paid for the OSS we create) dressed up as an ideological one (OSS is about community, or whatever).


Come on, we are programmers. Tradition is undefined behavior and out of spec.

If you expect contributions, _get it in writing_. Copyleft was always the way to assert what you expected without this passive-aggressive "The user MAY contribute code to us or MAY just reap huge profit and contribute nothing."


> Historically, when businesses adopted open source software they would contribute upstream.

This is completely inaccurate. The vast majority of changes to non-Free open source code, let alone to the wider software that has subsumed it, do not get contributed back.


Why would you be pissed if you gave away your work? Don't give it away under a license that permits commercial use if you don't want it to be used (freely) to make money by others.


Because they are selling your work for profit without giving anything back to the community.


I sense that you haven’t fully internalized the idea of some of the popular open source licenses.

Such licenses are a bit like letting go of a balloon. Most times, the balloon will float away and pop on some tree with no benefit to anyone. In a few rare occasions, your obnoxious neighborhood kid will grab that balloon and play with it all afternoon.

You might not like that the kid is enjoying your balloon but no right minded person would expect that the kid’s parents recompense you for the balloon you let go willingly.

TBF, not long ago, someone posted on HN that AWS simply used their open source ware without even a thank you. It created quite a bit of noise on HN, so your opinion isn’t too far outside the norm.


I won't comment on the other licenses, but the point of the GPL is _user empowerment._ The origin is that Stallman had a broken printer but was not able to fix it because it had a proprietary driver.[1] It's not at all about ensuring support (or preventing profit). It's all about ensuring the end user (even if that end user is 37 steps away from the original author!) has _all_ the rights to do anything with the software that they would like.

Quite a lot of the time the user may not be able or want to use the rights granted them, but that doesn't make those rights useless. Indeed, they serve an important function, see the various offshoots of that early Linksys WRT code!

It saddens me greatly that so many have lost sight of this.

[1] https://www.fsf.org/blogs/community/201cthe-printer-story201...


It's more like setting up a little reading library. Have you seen those? People build little dollhouses and fill them with books with the idea that anyone can grab one and donate back when they can. It's mostly to encourage children to read.

Now imagine Amazon comes by and lists the books in the reading library on its website for sale. You offered them for free, after all. Further, they are knocking on your door obnoxiously asking you when you are going to get a particular book in stock, since they have a customer who wants it.


TBH if it's not in the contract I would not really blame them. Change your license then.


None of that is commercially viable (so far), and commercial activity really doesn't benefit from this at all.

What matters is the 'stuff' you get for your money. AWS knows this (as does Azure and GCP), as do the B2B customers. It's just a simple sum. Either the customer does more of the hard but generic work, or the provider does more of the hard (generic) work. You make two calculations (including potential benefits of vertical integration) and you pick what works best.

"Inventing" generic compute with an API, web console and service-wide IAM is possible, but adds no value at all to say, e-commerce. So you might as well just pay some vendor to already do that, and if possible to do it at a scale and quality you will never be able to reach as a single business anyway. So you go to a cloud since that already has it and you can't replicate it anyway. If it turns out to be expensive but still cheaper and better, that's a win, OSS/SSPL be damned.


My dream scenario is a community-supported Kubernetes "distro" that is:

1. Controlled by the community of users, not a big provider.
2. Optimized to be run multi-cloud and multi-region.
3. Built on the assumption that you will use SaaS services for common tasks (Replicated, Nobl9, Tackle, EdgeDelta, etc...).
4. Paired with a fully hosted solution for CI/CD (with integrated security offerings) that is, again, not controlled by a big provider.
5. Opinionated enough about "sane defaults" that services are well tested and the service and operator boilerplate can guide developers to best practices.

My view is that K8s was designed to attack AWS hegemony by providing a cloud abstraction layer that assumes that a company will have a single cluster in which they will run multiple relatively trivial applications (see Helm), but for some of us, the reality is that we have many clusters which each run the same mission critical application.


What does this all mean from IaC perspective?

Let’s say you can define all your containers, orchestration, database and middleware in a compact way (Dockerfiles, Terraform, etc) with cloud vendor independent open source tools, using managed OSS.

How much of that code would be dependent on your cloud provider, really? Maybe a thousand lines of hopefully modularized platform dependent configuration code?

Migration from one system to another would still be an effort, but not that big a one?


You can modularize the code that goes into something like Lambda as well, and distance your code from proprietary APIs through interfaces. Also under 1,000 LoC.
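
A rough sketch of that pattern in TypeScript; the names (ObjectStore, S3ObjectStore, the bucket) are hypothetical. The point is just that the handler depends on the interface, so swapping providers means writing another small adapter rather than touching business logic:

    import { S3Client, PutObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";

    // Provider-neutral port that the business logic depends on.
    interface ObjectStore {
      put(key: string, body: string): Promise<void>;
      get(key: string): Promise<string>;
    }

    // One adapter among many; a GCS or filesystem adapter would implement the same interface.
    class S3ObjectStore implements ObjectStore {
      private client = new S3Client({});
      constructor(private bucket: string) {}

      async put(key: string, body: string): Promise<void> {
        await this.client.send(new PutObjectCommand({ Bucket: this.bucket, Key: key, Body: body }));
      }

      async get(key: string): Promise<string> {
        const res = await this.client.send(new GetObjectCommand({ Bucket: this.bucket, Key: key }));
        return await res.Body!.transformToString();
      }
    }

    // The Lambda handler (or any other entry point) only knows about ObjectStore.
    export async function handler(
      event: { orderId: string },
      store: ObjectStore = new S3ObjectStore("example-bucket")
    ) {
      await store.put(`orders/${event.orderId}.json`, JSON.stringify(event));
      return { ok: true };
    }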


While multi-cloud interest and vendor lock-in are organizational concerns at the enterprise level, there are also larger nation-state macro concerns around mitigating the risk of concentrated influence and control from any given cloud provider. This is best embodied in the idea of a cloud exit strategy requirement (other related concepts include reverse migration and cloud portability) to manage concentration risk and regulatory demands [0].

"Several regulators, mainly in the EU and focused on financial services, now mandate an exit strategy. Those organizations that do look at exit strategies often focus on contracts — aspects such as terms and conditions, and service-level agreements." [1].

[0] https://www.bestexecution.net/esma-warns-financial-service-f...

[1] https://www.itconvergence.com/blog/importance-of-cloud-exit-...


Didn't read the article, but we migrated our k8s stuff from AWS to self-managed Hetzner Cloud and save roughly 66%. We don't manage that much: we have Helm charts, a bunch of bash scripts, etc. So it's easy to manage; even I can do it.


For compute:

1. In the beginning (new startup), you don't know what usage looks like. It may not even be used at all at night / during the day. If you're not all-in on managed services, you're burning money on stuff you're not using.

2. Eventually you start to see consistent use. Run back-of-the-envelope numbers on using something less managed, e.g. Lambda -> Fargate -> EC2 (a rough sketch of that math follows after this list). If it makes financial sense to switch, do it. If not, stay where you are.

3. Continue to optimize over time in the direction of bare-metal as your cloud spend hits seven figures and higher.
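
A sketch of what those back-of-the-envelope numbers look like. Every rate below is a made-up placeholder, not current pricing; substitute real numbers for your region and workload:

    // Rough monthly-cost comparison: Lambda vs. an always-on instance.
    // All prices are placeholders for illustration only.
    const LAMBDA_PER_GB_SECOND = 0.0000167; // placeholder $/GB-second
    const LAMBDA_PER_REQUEST = 0.0000002;   // placeholder $/request
    const INSTANCE_PER_HOUR = 0.02;         // placeholder $/hour for a small VM

    function lambdaMonthly(requests: number, avgDurationSec: number, memoryGb: number): number {
      return requests * avgDurationSec * memoryGb * LAMBDA_PER_GB_SECOND + requests * LAMBDA_PER_REQUEST;
    }

    function instanceMonthly(instances: number): number {
      return instances * 730 * INSTANCE_PER_HOUR; // ~730 hours in a month
    }

    // Example: 5M requests/month, 200ms average, 512MB functions vs. one small instance.
    console.log("lambda   ~ $", lambdaMonthly(5_000_000, 0.2, 0.5).toFixed(2));
    console.log("instance ~ $", instanceMonthly(1).toFixed(2));

The crossover point depends entirely on your traffic shape; spiky or idle-heavy workloads favor the pay-per-use side, steady load favors the instance.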

For data:

1. If you have a small amount of data, then managed services are a trap, as services make implicit dependencies on the managed service. Stick to open core. If the amount of data is small, then it won't be too expensive to export it off the environment where it is to a new environment where it will be cheaper.

2. If you have a large amount of data, then your data is probably locked into whichever environment / provider is hosting it anyway, due to egress issues (either via network or via logistics shipping drives). You might as well use a managed service and reduce talent dependencies, as described in the article.


I believe this all depends on how you're using the cloud. You have to pick the right tools for the job at the end of the day, and my choice is: all of them.

I have a small book I'm going to be releasing soon. It's going to be self published as it's an entirely online mixed media book (text, videos, quizzes, etc.) and I'm going to be packing it all into a Go binary (minus the videos, those will be on YouTube) to make deployments insanely easy.

(The binary will be less than 50MB in size - smaller than most Docker images. It can also be made to run on any OS I care about without a third party tool. Just sayin'.)

I have a 100/100 fiber line here at the office with an SLA. I have plenty of hardware too. I could self host it here, but instead I'm going hybrid.

It's simply too easy to spin up an EC2 instance fronted by an ALB, TLS being handled by ACM and DNS by Route53, all "as code" (Terraform is my weapon of choice.) I can buy that instance as a reserved instance and pay up front for the year too, reducing its cost by 45-50%.

Then I can attach that instance to an Auto Scaling group with a minimum of '1'. If I get a traffic influx it will scale out to meet the demand at per-hour costs. It's more expensive, but it shouldn't be for long.

If the (static) instance goes down the ASG brings one back up at "full fat" per-hour fees... but at least I'm up and running.

The landing page is static and delivered via CloudFront, and that's what most people will be hitting. The application won't even see that traffic at all.

As for on premise: I'll be self-hosting analytics here and also logs from the application and servers. Why put them on expensive compute in Cloud when I can just do it here locally on a cheap quad core with 16GB of RAM and TBs of NVMe storage? It's a no brainer really.

So you have to pick your weapon correctly. For this project, here are the weapons I'm selecting: AWS EC2, ACM, ALB, Route53, CloudFront, S3; YouTube; on-premise hardware for analytics and logs.
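
For a sense of scale, that whole front end (ALB, TLS from ACM, an Auto Scaling group with a minimum of one, a Route53 alias) is only a few dozen lines in whichever IaC tool you prefer. A rough sketch in AWS CDK/TypeScript rather than the Terraform mentioned above, with the certificate ARN and hosted zone as placeholders:

    import { App, Stack } from "aws-cdk-lib";
    import {
      aws_ec2 as ec2,
      aws_autoscaling as autoscaling,
      aws_elasticloadbalancingv2 as elbv2,
      aws_certificatemanager as acm,
      aws_route53 as route53,
      aws_route53_targets as targets,
    } from "aws-cdk-lib";

    const app = new App();
    const stack = new Stack(app, "BookSite");

    const vpc = new ec2.Vpc(stack, "Vpc", { maxAzs: 2 });

    // One instance always on; scale out briefly if traffic spikes.
    const asg = new autoscaling.AutoScalingGroup(stack, "Fleet", {
      vpc,
      instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MICRO),
      machineImage: ec2.MachineImage.latestAmazonLinux2(),
      minCapacity: 1,
      maxCapacity: 3,
    });

    const alb = new elbv2.ApplicationLoadBalancer(stack, "Alb", { vpc, internetFacing: true });

    // Placeholder certificate ARN and hosted zone; substitute real values.
    const cert = acm.Certificate.fromCertificateArn(
      stack, "Cert", "arn:aws:acm:region:123456789012:certificate/placeholder"
    );
    const listener = alb.addListener("Https", { port: 443, certificates: [cert] });
    listener.addTargets("App", { port: 80, targets: [asg], healthCheck: { path: "/healthz" } });

    const zone = route53.HostedZone.fromHostedZoneAttributes(stack, "Zone", {
      hostedZoneId: "PLACEHOLDER",
      zoneName: "example.com",
    });
    new route53.ARecord(stack, "Alias", {
      zone,
      target: route53.RecordTarget.fromAlias(new targets.LoadBalancerTarget(alb)),
    });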


That seems overkill for just hosting two static files (the Go binary and the HTML file). Why not just use GitHub or something to host those two files for you for free?


It's not quite two static files.

The Go binary is a custom web server for handling access to the book. This will be similar to how a few other book authors have handled access: a link you click in an email (like a one-time sign-in token) that gets you access. The Go binary is also responsible for receiving and resolving Stripe Links webhook calls.

The book itself is made up of hundreds of pages, images and more. It's all built using MkDocs + Material.

The custom workflows meant I needed a small, custom app.


Oh, from what you explained I thought it was a desktop application written in Go.


Sorry my bad. No, it's a web app/server :)


I note that open source companies like Collabora go all-in on Plan B, self-hosted OSS, but owned hardware instead of rented. Not sure if RedHat is the same though.


Does anyone have any good articles about AWS (or other) lock in? I have Matt Prince's article about egress fees, but I'm looking for something more thorough.


Most (on HN) are just people who misunderstand the cloud business badly or think their case of reinventing the wheel can be applied globally.

You can do a site search for hacker news with just one of the cloud names in it to find a ton of blogs about it.

Most of them say "lock-in" but aren't actually about lock-in because you're not locked in, just in a bad position (power vs. utility).


I came across Cross-Cloud Development the first time when I found http://klo.dev ... Haven't tried them yet but the idea is intriguing.


> Microsoft and Oracle calibrate their pricing in a way that bears no relationship to its cost of production, but is carefully calculated to extract the maximum revenue without provoking the customer to consider alternatives. Also they design their platforms so that migration to any alternative is complex and painful.

Not everything is priced using bottom-up pricing, where you calculate the cost to produce something and then add a profit margin. There is also top-down pricing, where you price based on the value the product provides or on what your competitors are selling at.



