Deactivating an API, one step at a time (apichangelog.substack.com)
55 points by bpedro 4 months ago | 57 comments



> It might be that you want to replace it with a new, more capable version

If you're truly replacing your API with a new, more capable version there's a much better option, in my experience.

Roll out your new API, and replace your old API's implementation with a proxy that calls through to the new API.

The proxy will need very little maintenance, as all it's doing is connecting one fixed, stable API (your old one) to another fixed, stable API (your new one). Lock it down to only your old customers, if you want.

The support costs will be basically zero, and your existing paying customers will thank you for respecting their time with a dependable, low-churn API.
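
To make the idea concrete, here's a minimal sketch of what such a shim could look like. It assumes a Flask app and entirely hypothetical v1/v2 endpoint paths and field names; the point is only that the old contract stays frozen while the implementation calls through to the new API:

    # Hypothetical v1 -> v2 shim: the old /v1/orders/<id> contract is kept
    # stable while the implementation just calls through to the new API.
    import requests
    from flask import Flask, jsonify

    app = Flask(__name__)
    V2_BASE = "https://api.example.com/v2"  # assumed location of the new API

    @app.route("/v1/orders/<order_id>")
    def get_order_v1(order_id):
        resp = requests.get(f"{V2_BASE}/orders/{order_id}", timeout=5)
        resp.raise_for_status()
        v2 = resp.json()
        # Map the (assumed) v2 response shape back onto the fields v1 promised.
        return jsonify({
            "id": v2["id"],
            "status": v2["state"],            # renamed in v2 (assumption)
            "total": v2["pricing"]["total"],  # nested in v2, flat in v1 (assumption)
        })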


And voila! Just like that, you now have 2 systems to maintain, support and debug.

Do we redirect server-side, use a 302, or use JS to redirect?

Docs and API don't match, users don't RTFM, but lament that the engineering quality is low.

Support now needs to know which API endpoint the user with problems is hitting, and why they can't repro when hitting the new endpoint.

Auth will be fun too: is it a domain redirect? A subdomain? Cert rotations, anyone?


> And voila! Just like that, you now have 2 systems to maintain, support and debug.

It's even worse: the "new" system is a layered mess where logic is spread across multiple systems and services, and it is bound to accommodate all the original scenarios and ensure new changes remain compatible across multiple versions.

It's a far worse situation than keeping the old APIs in. It's a proposed solution from someone who clearly did not even read the problem statement.


This sounds like a good idea in theory, but I'm not sure about the real world.

Consider having 4 versions of the API: will you create 3 levels of proxies, or rewrite your old proxy each time?

Sometimes, "more capable" means additional side effects that didn't exist in the old contract (e.g. sending an email about an order). In that case your new API has to be flexible enough to make those side effects configurable, adding more maintenance cost to your eng org.


Also, the assumption here is "more capable = same features + something new (no removals)". Sometimes you have to delete some features as part of the new API (e.g. stop creating an audit log entry for an action).


Conceptually, we should see deprecating a feature as a separate question from moving to a new API. So in this scenario, you would manage deprecating functionality in the same way regardless of whether you're also rolling out a new API at the same time.


> This sounds like a good idea in theory, but I'm not sure about the real world.

> Consider having 4 versions of the API: will you create 3 levels of proxies, or rewrite your old proxy each time?

In the real world if you insist on breaking your paying customers' processes and products over and over again, pretty soon you won't have any paying customers any more, and you'll be free to make incompatible changes as often as you want.


> In the real world if you insist on breaking your paying customers' processes and products over and over again, (...)

You do not break anyone's experience by releasing a new version that's voluntarily adopted by your paying customers.

You are also somehow inexplicably assuming that your paying customers didn't end up being paying customers because you adapted to customer demands and added the value that was important enough to them that they decided to pay for it.

I can tell you for a fact that there are paying customers for APIs that pay to use both the latest API and older versions. More often than not, they still consume old versions because they cannot spare the manpower to update some clients/services.

On top of that, the problem grows a few orders of magnitude when you open your APIs to the public. Even when you control the only clients to your APIs, it's impossible to get some end users to upgrade their software. Some still run old OSes that have long been unsupported, and some are even physically unable to run updates.


> In the real world if you insist on breaking your paying customers' processes and products over and over again, pretty soon you won't have any paying customers any more

Business 101: your profit comes from higher margins or from lowering your spending/costs. If your business is stable and making money, your customers feel safe and are happy.

Deprecation lowers your cost (system complexity, less maintenance, dev time -> new features, new revenue streams). If you have almost infinite resources then of course, why not: you can support all possible API versions (only a few companies can afford to do that at scale, e.g. AWS and the Win32 API).


What if the new API is more capable precisely because the parameters have been completely redesigned?


The presumption would be that there is a mapping from the old parameters to the new? Not necessarily a simple shift/rename, but you aren't doing your users any favors if they have to speak an incompatible vocabulary to make use of the new system.


> The presumption would be that there is a mapping from the old parameters to the new?

"Presumption" is a funny word when used to say you didn't consider the part which happens to be the cornerstone of the whole proposal.

You can't just hand-wave over the core part of the whole proposal and say "well, I expect it to somehow already be working and serving its purpose", especially when the question is what happens when this mapping is either practically or theoretically impossible.

Just think through the problem and consider small mundane details such as the fact that the decision to create a new API version is never taken lightly, and different API versions are created to accommodate breaking changes. Think about problems such as: what if resource IDs are incompatible and entirely different? On top of that, what if the old API returns references to different resources than the new API? How do you solve that with "mapping"? Are you suggesting we roll out a mapping service?


If you are making a new API that cannot serve the needs of older users, then you are basically saying to your old users that you would rather they leave. At the least, you are signalling that you are inclined to give them more work. Don't be surprised when they decide that their migration isn't from your old service to your new one, but from your old service to a competitor.

Note that this is not the same as saying, "if you want to use the new features, you have to do the work to tell us more about what you are doing by using the new API." Moreover, you can have some restrictions. But, at large, the view you are espousing reminds me of the blog post by Yegge about Google's deprecation policy. (Link: https://steve-yegge.medium.com/dear-google-cloud-your-deprec...).

I'm sympathetic to the idea that some APIs are terrible and are best redone. I have seen far too many redone APIs that do not serve existing users' needs and that devolve into a complete disaster. In almost all cases, it is because the team doing the work thought it was best to offload all thoughts of how to migrate onto every other team that was calling them. You think it would be a lot of work to make a facade/proxy/whatever-you-want-to-call-it layer. You aren't wrong. But largely what you have done is shift that work onto every single caller you have.


It depends... let's say you're the leading product in the area, your users are devs, and there's a good reason behind the change, then you're probably okay to deprecate APIs. For example, NUnit has now deprecated the "Assert.AreEqual" method: https://docs.nunit.org/articles/nunit/writing-tests/assertio... . For me this meant running through hundreds of tests and rewriting them using "Assert.That" or sometimes using "ClassicAssert.AreEqual". I guess I see it as part of the job of being in an evolving industry.


Library deprecations are fairly different from service deprecations. That said, same points generally stand. I know MANY people that no longer use Guava because they got tired of how rapidly they had to update their code due to changes in the library. Similarly, I know very few people that have gotten upset with the deprecation policy of Java core.


Surely you can keep it backwards compatible AND give people access to the new good stuff?


> Surely you can keep it backwards compatible AND give people access to the new good stuff?

New API versions are not rolled out lightly. You roll them out only when you cannot give access to the new stuff while preserving compatibility.

Otherwise you'd just update/extend the old version.


Not if the API reflects a fundamental change in flows. E.g. not fun to proxy a sync API into the new async one which splits a single operation into three.


That's a reason not to make fundamental changes to your flows, but the example given isn't that difficult. It's probably how the developer on the customer's side is going to deal with the issue, so you may as well do it for them. Trying to convince any meaningful percentage of customers to rewrite fundamental aspects of their existing integration to support your product roadmap isn't going to happen, so don't expect it.


> That's a reason not to make fundamental changes to your flows (...)

You're somehow assuming the engineering teams involved in making these decisions don't think through these issues and just jumpstart changes to critical parts of their business without any thought or consideration.

In the meantime, engineers need to work in the real world, with real-world requirements and constraints. Often enough, these include introducing breaking changes, such as changes to workflows.

It is entirely unhelpful to make grand claims based on your ability to wish whole problems away. Don't like catching the flu? Well, that's a reason not to get sick. Insightful.


> Roll out your new API, and replace your old API's implementation with a proxy that calls through to the new API.

I don't think you understood the problem you're commenting on.

The problem is not that you have a new fancy API version you expect users to consume. The problem is that you need to shut down the old API without causing customer or business impact.

It makes little to no sense to pile up technical debt and extra maintenance work to keep a specific configuration around in an API Gateway if your goal is to get people to stop using it so that you don't have to maintain N+1 versions of your service. Your vague observation of how much work an API Gateway config takes is meaningless, because you have no idea what impact any change to the API will have on the older version fed through adapters. I mean, even changes in performance can be disastrous. And even in the unlikely case that an API Gateway applying a transformation is enough to sunset the old version, how is that an improvement on maintenance? You now have a new abstraction layer that requires testing and maintenance, and a multitude of scenarios you need to validate independently whenever a change is made. And what answer do you have to the question of what to do if a commit works in the new version of the API but introduces a regression in the old version?

The truth of the matter is that there are only a couple of sunsetting strategies that work, which are:

* In internal service calls, negotiate a sunsetting strategy with other internal teams. The deadline will invariably be pushed forward each and every single time.

* In external service calls, you can announce to the world you're sunsetting the API and advertise it on Times Square, and still a hefty share of your customer base won't know and will be caught by surprise. The only strategy that works is graceful degradation: start returning 404s and 410s periodically and dial it up until clients feel the need to move on. Dial it back when appropriate to get your point across without denying service, but understand that it will be impossible for some clients to change.
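
For what it's worth, the "dial it up" part can be a very small amount of code. A rough sketch, assuming a Flask service and made-up dates and failure rates:

    # Sketch: probabilistic degradation of a deprecated API. The share of
    # requests answered with 410 Gone ramps up between two (assumed) dates.
    import random
    from datetime import date
    from flask import Flask, jsonify

    app = Flask(__name__)
    RAMP_START = date(2024, 6, 1)   # assumed announcement date
    RAMP_END = date(2024, 9, 1)     # assumed final shutdown date
    MAX_FAILURE_RATE = 0.5          # never deny more than half the traffic

    def failure_rate(today):
        if today <= RAMP_START:
            return 0.0
        progress = (today - RAMP_START) / (RAMP_END - RAMP_START)
        return min(MAX_FAILURE_RATE, MAX_FAILURE_RATE * progress)

    @app.before_request
    def maybe_degrade():
        if random.random() < failure_rate(date.today()):
            return jsonify(error="This API version is being retired"), 410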


The post you're replying to is describing a straightforward implementation of the strangler pattern.

It's a valid and useful technique. Even if you disfavor it, I don't see why you feel the need to judge the comprehension skills of the commenter.

https://www.redhat.com/architect/pros-and-cons-strangler-arc...


> The post you're replying to is describing a straightforward implementation of the strangler pattern.

I'm not sure you are aware that the strangler pattern is completely irrelevant to the problem of shutting down old versions of APIs still in use, especially when the goal is to shed maintenance needs.

Just because you are confident you can serve the same requests through an adapter, that does not mean you have less code to maintain and less work to do. Quite the opposite. And now your services also have to deal with more constraints, as you're now bound to comply with far more scenarios, some conflicting, from the same code base.

Talking about strangler fig patterns makes as much sense as blabbering about interfaces and dependency injection when talking about deleting old code. Irrelevant.


> Your vague observation of how much work an API Gateway config takes is meaningless, because you have no idea what impact any change to the API will have on the older version fed through adapters.

I don't think you've understood what I'm proposing here.

When I release the V2 public API and invite my customers to code against it, they're going to start calling the methods I offer and expecting the response fields I promise. If I make incompatible changes to the API I'll break their stuff and they'll be understandably mad at me.

All I have to do, therefore, is write an adaptation layer that adapts the V1 public API (which is stable and never needs any new features or fields, because it's deprecated) to the V2 public API (which is also stable, because fields and methods do not disappear from public APIs).

I don't know what an "API Gateway" is - given that you seem to think it's a lot of work to set up and maintain and that it can't accomplish this task, it's probably not the right tool for the job?


I don't think you figured out the point I presented you.

The whole point is that new API versions are rolled out because the changes that need to be introduced are incompatible with old APIs. Otherwise there wouldn't be a need for a new API, would there? The whole reason there are different APIs is that they are already incompatible. If you could handle new use cases with adapters, you'd already be using them. But you aren't.

> I don't know what an "API Gateway" is

Right, and yet here you are, lecturing on API design and presenting solutions to API problems.


An assumption you're making here is that you can even write an adapter layer over the V2 API. There may be things you wish to _remove_ in the V2 API. V2 does not always mean "add more stuff"; it can often mean rethinking "what stuff" is provided.


Yes - that's why I included the proviso "if you're truly replacing your API with a new, more capable version"

If your intention is to remove features and fire the customers who are relying on them, that's a different matter.


> If your intention is to remove features and fire the customers who are relying on them, that's a different matter.

No. That is not a different matter. That's basically the whole problem being discussed, and the topic of the blog post you're discussing.


Other strategies that could come in handy before completely shutting down the API:

- Rate-limit the API, with increasing aggressiveness until you're down to 0 requests per unit of time;

- Introduce latency in serving the requests (assuming your edge can handle the increased volume of open connections).

Both of these introduce gradual degradation of the old API, without outright killing the business functionality that recalcitrant customers are nevertheless reliant upon. It helps spur a bit of urgency to switch to the new API, while remaining nice. Many (enterprise) customers will wait until the last minute to switch: essentially they are having to put in work without any tangible feature gain—at least from their perspective.

Regardless of the strategy, one other point I'd add is to monitor actual use of the API; if important customers are still actively using the old API, it would be unwise to shut it off.
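
For the latency idea in the list above, one possible shape (the dates, step size, and cap are all assumptions) is a per-request delay that grows week by week:

    # Sketch: artificial latency that grows over time on the deprecated API.
    import time
    from datetime import date
    from flask import Flask

    app = Flask(__name__)
    DEGRADE_START = date(2024, 6, 1)  # assumed start of the degradation window
    MAX_DELAY_SECONDS = 3.0           # assumed cap so the edge isn't exhausted

    @app.before_request
    def inject_latency():
        weeks_elapsed = max(0, (date.today() - DEGRADE_START).days // 7)
        # 250 ms of extra latency per week since the window opened, capped.
        time.sleep(min(MAX_DELAY_SECONDS, 0.25 * weeks_elapsed))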


Man, I really hate the idea of "let's make a thing that works shittier so that people switch to the new thing". If you're in a situation where you have customers, just be honest about the changes coming and give them enough runway to get the changes made.


Yep. If you gradually degrade performance, and word of your intentions fails to make it to the right people, the victim may well have to devote significant effort to trying to work out what's going on. Boy will they be pissed when they find out.


This is standard practice though. You can only communicate it so much. If your customers ignore it, that's on them. You either have a hard cutoff date or you introduce failures or throttling. The latter is friendlier.


> Man I really hate the idea of "Let's make a thing that works shittier so that people switch to the new thing".

You're conflating sunsetting old services with forced upgrades. Degrading services when preparing for sunsetting the service is a surefire way to convince product managers to bump up migrations in their priority queue. Announcing sunsetting doesn't work. Reaching out directly to stakeholders doesn't work. What works is the urgency of avoiding downtime. The alternative is simply pulling the plug on a fixed date, which is not preferable.


We have customers using a client that was officially deprecated in 2020 and stopped working altogether for six months in 2022, along with as much announcement noise as we could make. A year later I was surprised to realize the environment change that had blocked it was gone…and people were still using it.


Yes, and you should do this after being honest that it's being sunsetted

Set a date it'll be 100% available until, then begin degrading it. After all, you didn't guarantee 100% availability/usability after said date.

All this to say: if you release both versions, I'd monitor adoption and keep v1 alive if a lot of customers (esp. big spenders/enterprises!) depend on it.


Yeah, you do that and you'll still have customers that ignore it. That's why you have to slowly degrade it at a communicated point in time. You're not being sneaky about it; it's planned. It's that, or you dedicate a ton of manpower to holding everyone's hand through a hard cutoff, which will always be painful.


I think GitHub Actions causes errors for a few minutes as an alert; one could do that with an API as well.


Something that can also be extremely useful in situations like this is doing API brownouts toward the later stages of the process. Disable the API for short stretches of time, on the way to disabling it entirely, to give consumers who might not be keeping up with changes a way to be alerted (they'll notice the downtime).
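
A brownout can be as simple as a request hook that checks a published schedule. A sketch assuming a Flask service; the window times are just examples:

    # Sketch: scheduled brownouts. During each announced window the deprecated
    # API answers 503 so that consumers' own alerting notices the outage.
    from datetime import datetime, timezone
    from flask import Flask, jsonify

    app = Flask(__name__)
    BROWNOUT_WINDOWS = [  # (start, end) in UTC, published ahead of time
        (datetime(2024, 8, 1, 14, 0, tzinfo=timezone.utc),
         datetime(2024, 8, 1, 15, 0, tzinfo=timezone.utc)),
        (datetime(2024, 8, 15, 14, 0, tzinfo=timezone.utc),
         datetime(2024, 8, 15, 18, 0, tzinfo=timezone.utc)),
    ]

    @app.before_request
    def brownout():
        now = datetime.now(timezone.utc)
        if any(start <= now < end for start, end in BROWNOUT_WINDOWS):
            return jsonify(error="Deprecated API brownout; see migration guide"), 503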


Or instead of losing paying customers:

1. design API accounts to include a preferred server and default server (handy to explicitly load balance, or dynamically bounce users to specified servers.)

2. design clients to have a timed service lifecycle (expiry 2 weeks prior to cert expiry). Then enter a semi-dormant mode until valid signed updates succeed.

3. add a random timed daily update check, and begin reassigning the users to the new API after updates install properly. Also, warn users the migration will happen 2 months before it is set to launch (do a few random A/B tests the first day).

4. Never rely on people to act, or not act, to keep infrastructure running. You are not going to be able to manually update 30000 legacy hosts with a single team. Worst case scenario, you must auto-reconfigure the clients for a standalone offline mode... so the next user of the IP doesn't get hammered by failed connection retry attempts.

Brownouts won't work for cached-edge systems designed to reconcile month long intermittent outages. i.e. systems that were designed to handle DDoS, worms, and acts of clod...

Have a nice day, =3


Yes, I completely agree. The story sounds like a fairytale: the migration was announced to happen in 3 months, and in 4 months it was done by removing the old API.

Where are all those clients who are happy with the current API's performance and do not want to spend their money on making the API owner's life better? What happened to them? Did the company just decide to let those clients go?


Collecting clients other people hosed is an easy business. Except, entrenched incompetence may still pine for the convenience of a quick sometimes-broken kludge (some folks expect everything to be glitched half the time).

I definitely understand why some techs just stop caring about customer opinions. You'll know you're in a senior role when you start to fantasize about being a Plumber. =3


Yeah, one of the biggest problems with API deprecation is that you have zero control over the roadmap of your clients.

If they can't spare the engineering time in the next six months to carry out the upgrade (and you aren't 100% mission critical to their business) they're not going to do that no matter how much you bug them.

Depending on how old the integration is they may not even still employ the engineers who built the first version, which makes it even harder for them to roadmap the work.


In general, most managers just hire external firms to attempt a new version:

1. if it succeeds, the manager looks smart given your services are no longer relevant to their operations

2. if it fails, the manager looks smart as they are not responsible for the external firm's business operations

It is a win-win situation for the client, but 100% bad for your business... =3


I really like the API brownouts trick. GitHub have been doing this for years, a few examples:

- https://developer.github.com/changes/2018-11-05-github-servi...

- https://github.blog/changelog/2021-08-10-brownout-notice-api...


I was thinking about this kind of thing. Another idea might be to introduce artificial latency and gradually increase it over time? Maybe dial up rate-limiting? I'm not really sure if this is a better or worse idea, though.


That is a good idea, but I don't think it does enough to serve the purpose - the point of brownouts is to trigger their error/alert system. If the API is just being slower than usual, it won't trigger anything. Even if a human was reviewing it manually (which is quite unlikely), they would only think "oh, their API's really getting slower these days, sad".

There'd be nothing that indicates "The API is going to get shut down in a month and I need to move off of it ASAP!". Random, intermittent API failures would lead you to go check the API status out, and in the process you'd find out "oh, this API is going away".

Edit: On the point of rate limiting, I think the problem with it is that it'll affect everyone using the API all the time, not just during the brownout period. It effectively shuts the API down for everyone still using it (if the rate limit is too low, and if it isn't, then it won't be noticed by low-use consumers).


Hey, some of us look at P95 latency.


It would be really cool to see a graph of the API usage over time with markers showing when each "stage" was occurring. I'm wondering if there were significant dips shortly after each stage, or if it was more of a gradual decline?

It would also be interesting to know if the API was still being used up until the final stage. Were there any ramifications / angry customers at the door after that?


Given the timeline of just 4 months, I would expect 90% of customers not to migrate in time. The percentage of those who migrate to the new API after that depends on how vital this service is to them; it would be interesting to see the rate of lost business. I know that if a service provider did this to me, I would prefer migrating to their competition, all things being equal.


These are great suggestions.

I'll write about it the next time I'm in a similar situation.


> In addition to offering human-understandable communication, I asked the API producer to add the Deprecation HTTP header field to all responses

Cute, but I question the value.


> Cute, but I question the value.

I came here just to say that. What a half-baked idea. It might be trivial to mindlessly bolt on response headers, but if the goal is for the mechanism to be impactful and have consequences, then the response header is a big red herring, as you're actually relying on clients to implement support for sunsetting the endpoints. If that's the case, then you already have meaningful mechanisms, such as passing this sort of metadata in responses to requests for the root resource. This is something that pretty much any HATEOAS spec already supports.


> If that's the case, then you already have meaningful mechanisms, such as passing this sort of metadata in responses to requests for the root resource. This is something that pretty much any HATEOAS spec already supports.

Yes, they support it with headers.


Great question.

As a consumer, you can set up an alert when any of your API requests has a deprecation or sunset HTTP header. You'd know immediately if any of the APIs you depend on is about to be deactivated.
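
Something along these lines on the consumer side, for example (a sketch using the Python requests library; how you wire up the logging/alerting is up to you):

    # Sketch: warn whenever a response carries a Deprecation or Sunset header,
    # so deprecated dependencies surface in your own logs/alerts.
    import logging
    import requests

    log = logging.getLogger("api-deprecation")

    def get_with_deprecation_check(url, **kwargs):
        resp = requests.get(url, **kwargs)
        if "Deprecation" in resp.headers or "Sunset" in resp.headers:
            log.warning(
                "Deprecated API in use: %s (Deprecation=%s, Sunset=%s)",
                url,
                resp.headers.get("Deprecation"),
                resp.headers.get("Sunset"),
            )
        return resp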


You're assuming downstream companies will parse & utilize this header

If they were competent enough to do that, they'd probably pay attention to updates sent via, say, email. Brownouts would make people notice, like "oh shit, $API doesn't work!"


I think one thing the author did not mention is telemetry. It is very dangerous to deactivate the service when there is still incoming traffic. Normally people only deactivate the service when there have been absolutely zero incoming requests for an extended period of time.

Using the old API as a proxy to the new API has its merit in that the callers need to do little to pick up the logic change. However, the old API is often badly designed. Now you basically have two sets of APIs to maintain. When business requests come in, you might need to enhance both the old API and the new API to support them. Of course, one could push the callers to switch to the new API, but in a large organization this often involves a lot of politics.

The difficulty of migrating/deactivating an API also depends on the implementation. If the API is a C++ library, it can be hard to tell how often the API gets called, as you cannot tell whether the code path that contains your API is executed or not. For an HTTP or RPC service, at least you can put in some logs/metrics to check who is actually sending requests.
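
For an HTTP service, that telemetry can be as little as a request hook that tags each call with the version and the caller. A sketch (the header used to identify callers is an assumption):

    # Sketch: per-version usage telemetry, so you can see who still hits v1
    # before pulling the plug.
    import logging
    from flask import Flask, request

    app = Flask(__name__)
    usage_log = logging.getLogger("api-usage")

    @app.before_request
    def record_usage():
        version = "v1" if request.path.startswith("/v1/") else "v2"
        caller = request.headers.get("X-Api-Key", "unknown")  # assumed auth header
        usage_log.info("api_call version=%s caller=%s path=%s",
                       version, caller, request.path)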


This kind of thing is important to consider, and could really change things for the better in terms of a more reliable world... Should we consider adding "user suggestion" as a feature to all APIs whenever possible? Clients could support it by showing arbitrary messages to users; servers could say "this API will be deprecated on 2024-07-10".

But capitalism crushes this to hell. Imagine all the extra work to anticipate future problems, when "good enough, barely works" triggers an immediate SHIP IT BOYS. Then most engineers will just choose "someday shit stops working, oh well, it's not like this will be deployed in a nuclear reactor, we hope they read the EULA about that not being allowed".



