I discovered caching CDNs were throttling my everyday browsing (abctaylor.com)
220 points by arcza 11 months ago | 150 comments



I think this is just what you see with typical ISP "traffic shapers".

They try to limit bandwidth to video sites, but since most video traffic is transferred over HTTPS these days they end up just making a massive list of IPs which look like they might be sending video data and dropping some percentage of traffic to those IPs. Most CDNs are probably on the list.

End result is most video sites drop back to SD rather than HD.

If you do a speedtest, it will come out as fast. If you VPN, that will also be fast.

The IP range has nothing to do with it - it is the route the packets traverse and what the packets look like when they pass the shaper device that matters.

You could theoretically find out which device on the path is doing the dropping by manipulating the TTL of packets in a live TCP session and seeing when you get back TTL exceeded messages.
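
A rough sketch of that idea (not a polished tool) using scapy: ramp the TTL on TCP SYN probes towards the suspect address and log which hop answers with ICMP Time Exceeded. The target IP below is just a placeholder, and it needs root to run:

    # Rough TCP-traceroute sketch of the idea above (placeholder target, needs root)
    from scapy.all import IP, TCP, ICMP, sr1   # pip install scapy

    TARGET = "203.0.113.10"   # hypothetical CDN address pulled from your capture

    for ttl in range(1, 31):
        # SYN probe towards port 443 with a deliberately small TTL
        reply = sr1(IP(dst=TARGET, ttl=ttl) / TCP(dport=443, flags="S"),
                    timeout=2, verbose=0)
        if reply is None:
            print(f"{ttl:2d}  *")
        elif reply.haslayer(ICMP) and reply[ICMP].type == 11:
            print(f"{ttl:2d}  {reply.src}")           # router at this hop
        else:
            print(f"{ttl:2d}  {reply.src}  reached")  # made it to the target
            break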


There’s fast.com and speed.cloudflare.com to get realistic speed measurements, since there’s no way* for the ISP to tell apart traffic to these sites from video traffic served by Netflix/Cloudflare.

*Technically some ISPs can (and probably do) look at the Server Name Indication (SNI) to determine the hostname, and therefore whether flows are video or not, but I’ve not heard of any real-world examples of this being done.
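
For anyone wondering what "looking at the SNI" actually involves: the hostname sits in cleartext inside the TLS ClientHello, so a middlebox only needs a few byte offsets. A best-effort sketch for the common case (illustrative, not production-grade parsing):

    def extract_sni(record: bytes) -> str | None:
        """Best-effort SNI extraction from a raw TLS ClientHello record."""
        if len(record) < 44 or record[0] != 0x16 or record[5] != 0x01:
            return None                                        # not a ClientHello
        pos = 44 + record[43]                                  # skip headers, random, session id
        pos += 2 + int.from_bytes(record[pos:pos + 2], "big")  # cipher suites
        pos += 1 + record[pos]                                 # compression methods
        pos += 2                                               # extensions total length
        while pos + 4 <= len(record):
            ext_type = int.from_bytes(record[pos:pos + 2], "big")
            ext_len = int.from_bytes(record[pos + 2:pos + 4], "big")
            if ext_type == 0:                                  # server_name extension
                name_len = int.from_bytes(record[pos + 7:pos + 9], "big")
                return record[pos + 9:pos + 9 + name_len].decode("ascii", "replace")
            pos += 4 + ext_len
        return None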


Modern deep packet inspection gear looks at packet size and pacing to distinguish video traffic patterns. It is possible to fool these systems, but it is not trivial.


Any reason fast.com couldn’t stream a video as the test content?


Yes, this is exactly what it does, and it's a very clever trick by Netflix to discourage their traffic from being throttled!


I think it does pull a real video; I'm just not sure it's pulled in the same way the actual Netflix video player would do it.


You wouldn't pull gigabits per second of bandwidth when casually watching Netflix.


Dunno about Netflix, but YouTube usually does. They send you a chunk of data as fast as possible, and then send nothing until your client requests the next chunk.

By doing that, they can get a good estimate of your connection's available bandwidth, which is needed to decide whether to automatically switch to a higher/lower quality feed.

It also means they can use 'dumb' servers which don't do any application-specific logic for throttling.
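
Rough illustration of that estimate - the same idea an adaptive player uses to pick a quality level. The chunk URLs and bitrate ladder here are made up, and real players use fancier smoothing than a plain median:

    import time
    import urllib.request

    CHUNK_URLS = [f"https://cdn.example.com/video/seg-{i:03d}.mp4" for i in range(1, 4)]
    LADDER_KBPS = {"480p": 2_500, "720p": 5_000, "1080p": 8_000, "2160p": 16_000}

    def measure_kbps(url: str) -> float:
        start = time.monotonic()
        body = urllib.request.urlopen(url).read()   # chunk arrives as one burst
        return len(body) * 8 / 1000 / (time.monotonic() - start)

    samples = sorted(measure_kbps(u) for u in CHUNK_URLS)
    estimate = samples[len(samples) // 2]            # median, to ignore one-off stalls
    quality = max((q for q, k in LADDER_KBPS.items() if k * 1.5 < estimate),
                  key=LADDER_KBPS.get, default="480p")
    print(f"~{estimate:.0f} kbit/s available -> play {quality}")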


Recently I've noticed my phone disconnecting from my Wi-Fi router (a relatively underpowered device running OpenWRT and WireGuard) when starting YouTube videos. I guess this could explain it, if the router is hiccuping on that initial blast of data. I also use a Safari extension that lets me choose the YouTube video quality, so that might affect it if I'm not giving the feedback they expect for "auto" quality selection.

(The issue is not the router, though - this doesn't happen on my laptop. I think my phone is just more aggressive in assuming the Wi-Fi is down and it's better to switch to cellular. But couple that with my phone's policy to use VPN on cellular, and the switch becomes much higher friction. I tried simply disabling cellular data, but then it's even worse because every time the phone disconnects from Wi-Fi it pops an alert telling me to enable cellular data.)


Makes sense, but in that case it's quite easy to distinguish between a burst of traffic of <1s and a speed test that lasts tens of seconds.


Are you kidding?

Snooping like this is EXACTLY why we still have unencrypted ServerNameIndication, even to this day in ECH (which still leaves the outer SNI in the clear).


What does the outer SNI tell a hypothetical filtering/traffic shaping ISP that the IP address doesn't already?


Lots of sites these days are often just connections to Cloudflare/Akamai/Fastly, i.e. one IP, millions of websites. The ISP/feds need SNI to tell what you're up to.


Yes, but the outer SNI (with ECH) in that case would just say "Cloudflare", "Akamai" etc., if I understand it correctly.


ESNI to the rescue?


> ESNI to the rescue?

IIRC, it's now called ECH, because AFAIK the focus changed from "encrypt the SNI" to "encrypt the whole Client Hello".


Computer Scientists pick the best names for things


Uh, no, the focus changed to don't encrypt the outer SNI but pretend we're encrypting something that will prevent filtering, even though it won't.


Isn't the outer SNI something that can be deduced from the IP address anyway?


Can you elaborate? I have not kept up with the eSNI/ECH stuff... How will filtering still be possible?


Do you have more details on that?


I got called paranoid a couple of years ago because I said that my provider slows down long downloads: every download starts fast, but after a few megabytes it becomes slower and slower.


Are you using cable by any chance?

This is a pretty standard feature of DOCSIS, I believe. It allows bursting above the provisioned connection speed for a few seconds when there's capacity on the shared network segment.


That’s not paranoia, it’s bog standard traffic shaping that ISPs have been doing since the 90’s or even earlier.


Eh, others in the discussion claimed it would be completely impossible for an ISP to do this thing, due to how network protocols work.

Anyway, annoying… at least HTTP supports resuming, so it is possible to kill and restart manually, or use aria2c to circumvent it.


The ISP-controlled modem counts your incoming data in a short rolling window. If it exceeds a value, it throttles below your normal speed and buffers or drops incoming packets to slow things down until the bandwidth budget is compensated for (and the dropped packets are all counted, so it can be stuck there longer than anticipated), then switches to normal speed until traffic goes below a threshold, then switches back to full-speed burst. Worst case, the flow control reacts more slowly than the modem's burst switching and you end up in an oscillating pattern.

The trick is just to do your own rate limiting, without burst, at the right limit; that bypasses whatever shitty burst implementation your ISP modem has, and the weird shit won't happen anymore.
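
If anyone wants to see what "rate limiting without burst" means concretely, here's a toy token bucket: set the rate to your real provisioned speed and keep the bucket tiny so the modem's burst logic upstream never kicks in. Numbers are illustrative, not any particular ISP's config:

    import time

    class TokenBucket:
        def __init__(self, rate_bps: float, burst_bytes: float):
            self.rate = rate_bps / 8        # refill, in bytes per second
            self.capacity = burst_bytes     # maximum burst allowance
            self.tokens = burst_bytes
            self.last = time.monotonic()

        def allow(self, nbytes: int) -> bool:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return True
            return False                    # queue or drop this packet

    # "No burst": rate at the real line speed, headroom of only a couple of packets
    own_limiter = TokenBucket(rate_bps=80_000_000, burst_bytes=3_000)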


If you set your router to limit bandwidth below that initial burst speed, to your actual connection speed, you'll get consistent fast speed.

The burst mechanism of your provider possibly squeezes below your normal speed to compensate for the burst, but ends up confusing TCP flow control.

Those burst and squeeze cycles can cause some undesirable bandwidth oscillations.


Any TCP congestion control algorithm worth its salt should be able to adapt to varying path capacity these days, especially given how common traffic shaping with bursting is in residential connections.


You can safely assume the bursting itself is badly implemented on shitty ISP modems and causes spikes of high packet loss and latency when it switches.

Say you're bursting 1Gbps on a connection with 100ms latency. That connection suddenly squeezes to 20Mbps. In the best case, in those 100ms during which the sender cannot even react to the change, we receive ~10MB. Those 10MB that now have to be throttled away either get dropped or end up in a buffer. Now you either have a 5-second backlog latency or massive packet loss for 100ms while this resolves. And things overcompensate. And worst case, you end up in an oscillation pattern: the throttle goes off, and you burst again.

This happens and is measurable.
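
The arithmetic, for anyone who wants to check the scenario (numbers taken from the comment above; the in-flight figure comes out slightly above the rounded 10MB):

    burst_bps, squeezed_bps, rtt_s = 1e9, 20e6, 0.1

    in_flight = burst_bps * rtt_s / 8        # bytes already in flight before the
    print(in_flight / 1e6)                   # sender can react: ~12.5 MB

    drain_s = in_flight * 8 / squeezed_bps   # time to drain that at 20 Mbit/s
    print(drain_s)                           # ~5 s of buffer latency (or heavy loss)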


>but since most video traffic is transferred by HTTPS these days they end up just making a massive list of IP's which look like they might be sending video data and dropping some percentage of traffic to those IP's.

You really don't need to when every modern browser sends SNI, which exposes the full domain name it's trying to connect to.


In most cases an ISP is also doing DNS resolution so would already be able to pretty accurately reverse map IPs to hostnames without reverse records.


That's much easier to fix (by using DoH or DoT) than getting ECH widely deployed, though. Importantly, it can be done unilaterally by each user and doesn't need cooperation on both sides of the connection.

Many browsers these days support using DoH without changing any OS settings.
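
For example (a minimal sketch using Cloudflare's JSON DoH endpoint; any DoH provider you trust works the same way), the lookup travels over HTTPS so the ISP's resolver never sees the name:

    import json
    import urllib.request

    def doh_lookup(name: str, rrtype: str = "A") -> list[str]:
        url = f"https://cloudflare-dns.com/dns-query?name={name}&type={rrtype}"
        req = urllib.request.Request(url, headers={"accept": "application/dns-json"})
        with urllib.request.urlopen(req) as resp:
            return [rr["data"] for rr in json.load(resp).get("Answer", [])]

    print(doh_lookup("example.com"))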


That's not actually a fix. As long as some other subscriber does the DNS lookup, the IP address will be associated with the name that was looked up. If you happen to use a different resolver and get the same address, there's no difference.


If the IP address belongs to a cloud load balancer or CDN, that’s still much better than hostname-level detail. Even for subdomains it helps (think somebody.bloghoster.tld).

And ECH will still reveal the outer SNI, apparently.


It doesn't have to be traffic shaping, it can be something as simple as limited peering capacity. Hanlon's Razor applies, this isn't necessarily malicious capping.

Most ISPs don't have one fat pipe to this magical "internet" place, they have multiple pipes going different places and not all of them are even listed in peeringdb, so their route to a particular CDN might be over private peering which is limited in size (e.g. 1-10Gb). Sure, then excess traffic might go over public peering but that might also then get saturated, or the ISP might force traffic over the private peering to keep the public peering clear for general traffic.

If you use a VPN then the traffic will come from a different route and different peering, thus be subject to the capacity of that route. What the OP probably wants is for Zen to have better peering with Akamai (they already have some announced routes).


You might want to consider switching from Zen to AAISP[1].

Zen used to be decent; now pretty much the only residential ISP left that offers quality support is AAISP. They also have all sorts of stats monitoring[2] on all customer lines by default, which they expose to customers on the portal.

AAISP will also, unlike many ISPs, not shy away from giving Openscreech a strong poke with a very sharp stick if the underlying issue with your line is due to Openreach. They know how to play the BT game.

No affiliation with AAISP other than knowing a number of their customers, not a customer myself due to completely unrelated reasons which are entirely beyond their control.

They are not the cheapest ISP in town, but it very much is a case of you get what you pay for.

[1]https://www.aa.net.uk/ [2]https://support.aa.net.uk/Category:Diagnostic_Tools


As a smug AAISP customer, that was my reaction on reading the title. They’re one of the few honest ISPs. If they lose connection to your house they notify you immediately, and then again once the connection is back up, telling you how long it was out for.

Have also seen the openreach situation play out. I tried to get one of the other providers initially and they said that it would take almost a month for openreach to do the install. Contacted AA and they had it done in a couple of days. They actually show the logs of their comms with openreach in your online account so you can follow along.


As someone living in rural America- it is amazing there are so many providers! I literally have a single broadband provider available to me.


> As someone living in rural America- it is amazing there are so many providers!

Sort of.

Certainly when it comes to home internet connections, there is a little bit of smoke and mirrors because BT have their horrid sticky fingers in many pies.

So the reality for many home internet connections in the UK is that it's more a case of finding an ISP who's on your side and is willing to pick a fight with BT when needed. This also remains the case with the alt-net home fibre providers, because they will typically just aggregate your building onto a BT connection back to their PoP.

Well, that's slightly unfair, because ISPs clearly also vary on their own commercial decisions such as how "hot" they run their network, whether or not you can get static IPs and whether or not they run CGNAT. And there are some very noticeable quality differences on this level.

The absolute independent choice is only there for businesses in (some parts of) major cities where you have genuinely independent operators running their own fibre end-to-end, never touching the BT network.


I have circuits which run long distance over the BT network as wavelength services. You could argue that's a "BT network", but it's just a piece of glass and some filters/multiplexers. Other providers are available - Network Rail, I believe, offers the service, for example.

Wavelength services are very different to a "network" though.


The line is usually owned by either Openreach, Virgin, CityFibre, or a smaller local provider. The "ISP" is a reseller that gives you a modem, router, etc. It's not really that competitive in reality.


I'm not sure I agree, the ISP rents the bandwidth off the "main" provider on equal terms so the competition stems from the quality of the support and service that comes with that... there's a reason why AA have such an outstanding reputation and also why they aren't the cheapest.


That is how it should be: infrastructure is government funded, and people compete to provide services on it, just like most roads/highways are. Asking 4-5 different ISPs to build the infrastructure independently would increase costs by 4-5 times, and then they would be competing for customers, so the payback period for the infrastructure investment is up in the air. This is the main reason the USA has problems with monopolistic ISPs: who the fuck invests in infrastructure when another ISP is already there? So they all build in their own areas, and the cost of infrastructure becomes the moat that keeps others out.


We had fibre laid a couple of years back, but for some reason I can choose CityFibre-based ISPs OR Virgin Media on the same line.

I don't think I've ever seen that before, they usually don't share their infrastructure like this.

I can't get Openreach-based FTTC ISPs though.


As someone living in New York City it's amazing there are so many providers.

Having lived in ~5 apartments over the past ten years, I've only ever had a single broadband provider available as well.

Different providers cover different parts of the city, and even if you're lucky enough to be in a neighborhood that gets two, your landlord will only have wired the building for one of them.


Sadly A&A don't offer an unlimited bandwidth plan, which in 2023 seems absolutely ridiculous if you ask me.


> A&A don't offer an unlimited bandwidth plan, which in 2023 seems absolutely ridiculous if you ask me

What is ridiculous to me is that in 2023 we have someone on a techie forum who actually believes in the unlimited fairy.

Go read the small-print of your "unlimited" contract. Especially any sections entitled "acceptable use policy","fair use policy" or similar.

"unlimited" only means two things : traffic shaping and rate limiting.


That may be true where you live, but down here in NZ, unlimited means unlimited (on fibre). Our competitive ISP market means that all major ISPs have completely ditched their fair usage policies.

I regularly use well over 10TB a month and have never had any issues from multiple ISPs. We have a regulation-enforced split between ISPs and the line owners, meaning anyone can set up an ISP for a relatively low capital cost by leasing access to the last mile fibre lines to customers. As such, there are plenty of nation-wide ISPs here.


> here in NZ...all major ISPs have completely ditched their fair usage policies.

I'm quite fond of NZ and I've heard good things about the quality of your telecoms regulation, but ....

     We manage traffic which may influence your broadband performance. This means we might have to pause, restrict, end or slow the performance of your service if it’s necessary for us to protect our networks or manage traffic over our networks. See clause 6.6 of our General Terms for more details.
and

     You need to use our services fairly – we’ve set out our rules on this at clause 2.6 of our General Terms.
and

     Fair use by you: You must use our services fairly. This means you agree to use them in a way that’s not overly excessive or unreasonable. This policy is based on how most people use the service and helps us make sure everyone using it gets to enjoy it. If we, acting reasonably and in good faith, believe your use is excessive and unreasonable, we might need to restrict the service or stop providing it to you. 

All from Spark NZ's conditions[1][2] for their "unlimited data" fibre product[3].

[1]https://www.spark.co.nz/help/other/terms/personal-terms/esse... [2]https://www.spark.co.nz/help/other/terms/personal-terms/gene... [3]https://www.spark.co.nz/online/broadband/buy-plan?category=f...


Okay, so perhaps 'all' was a little optimistic. Spark is the old government monopoly, once called 'Telecom'. As such, they have inherited a huge base of users who don't pay any attention to which ISP they are using, or don't even know what an ISP is. As such, they're basically non-competitive.

There's rarely any benefit to using them compared to the competition - they are more expensive and refuse to join free internet exchanges, so Spark users frequently experience bad routing to services which refuse to pay money to Spark to peer.

What I can tell you, though, is that I have never once heard of Spark warning/booting a user for excessive usage, and I have my ears to the ground at various NZ tech forums.

Other ISPs, however:

https://www.orcon.net.nz/terms/broadband

    If you are on an Uncapped or Unlimited Plan, the total amount of data you can upload or download is unlimited. We may use traffic prioritisation policies for these Plans to protect our Network and improve the overall performance amongst our customers. 
(This is the Vocus group, including 2degrees, Slingshot, Flip, 2talk and Stuff brands)

https://main.prod.vodafonenz.psdops.com/_document?id=0000018...

    Our policy is to provide you with the best broadband experience possible, so we won’t slow down or throttle your connection.

    One New Zealand does not have a fair use policy for One New Zealand Fibre, HFC, VDSL or ADSL broadband
(One NZ, old Vodafone)

https://care.zeronet.co.nz/hc/en-us/articles/7436185566863-N...

  Zeronet does not enforce a fair use policy when your connection is used for standard home use.


In the UK, because ISPs "abused" the fact that there was no formal definition of "unlimited" (NTHell kneecapping your speed if you downloaded too much in a given period, for example), ISPs now have to disclose up front in plain language things like traffic shaping and speed limiting (and adding them after the fact is grounds to exit your contract early - heck, my last ISP increased their standard pricing (which would affect any discounted pricing they offered) by a couple of quid a month and had to offer anyone who wanted it the ability to exit their contract early at no charge).

Once a few ISPs started offering "truly unlimited broadband" where they couldn't hide anything, the (big) ISPs that did shape traffic and limit speeds were now fighting on their back foot, so most of them stepped up and started offering "truly unlimited broadband" too.

Some networks/ISPs may lower speeds in an area due to capacity on a backhaul, but if your speed drops below the minimum outlined at the beginning of your contract for longer than 30 days you can exit your contract early.

So most of the ISPs (in the UK) who offer unlimited contracts these days will have a FUP, but they mainly focus on a) not reselling their services, b) not running open proxies, c) not sending unsolicited bulk emails, spam emails, calls, SMS, etc. (because the FUP also covers the landline bundled in with the internet connection, it covers things like 118/0871/2/3 service limits and mass calling). They tend not to fret about download caps in their FUPs. Heck, I was on an unlimited 4G connection for a while and easily used over 1TB per month and didn't even get a text message asking me to "tone it down".

EDIT: Just checked, and my provider's FUP is 3 (well, 2 and a third) pages long, written in easy-to-understand language (no legalese), and hasn't even been updated in the past 5 years.


Sure, but it's also the colloquial name for a fixed-price, variable-usage service with no hard shutoff, which is the plan everyone wants.

Like I can't understand why so many people get butthurt about the word unlimited when I can play hours and hours of video games, have multiple 4k streams constantly going, casual piracy and two adult fully remote workers in video calls all day for like $80/mo and not a penny more.


The standard monthly quota plans are 1TB and 10TB for residential, and 2TB and 20TB for SOHO/business, which is pretty workable for most use cases.

Compared to some residential ISPs where I've seen caps as low as 15GB/month..


> The standard monthly quota plans are 1TB and 10TB for residential, and 2TB and 20TB for SOHO/business, which is pretty workable for most use cases.

How much does it cost to upgrade from 1TB to 10TB?

I don't care how common a cap like that is, or what abominable caps some other service might have. 1TB for a whole month is only 3 megabits per second average, and that is not a good amount. And it should definitely be possible to buy more than 30 megabits per second average for a residence. For a business, 20TB in a month is good for a few employees but breaks down very fast if they start working remotely.
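
Back-of-envelope, if anyone wants to check those averages:

    seconds_per_month = 30 * 24 * 3600
    print(1e12 * 8 / seconds_per_month / 1e6)    # 1 TB/month  ≈ 3.1 Mbit/s sustained
    print(10e12 * 8 / seconds_per_month / 1e6)   # 10 TB/month ≈ 31 Mbit/s sustained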


> How much does it cost to upgrade from 1TB to 10TB?

£10/month


Okay, that's pretty reasonable then.


This is what made them a no-go for me - they may well have superb service, but they would be expensive even without a bandwidth cap. And 1TB of bandwidth in this day and age is nothing.


>> Starts with ADSL and 1TB/month download for £25.00

Wow.


How much do you need? 10TB/month, which is their higher quota, is a lot.


More than 1TB a month. Just downloading one game from Steam can easily be more than 100GB these days. Downloading ML weights churns through gigabytes easily, as does downloading or streaming 4K movies.

What's the point of having gigabit download speed if you have to constantly worry about using up your cap? And paying an extra £5 for every 250GB extra seems like some arbitrary punishment for power users.

They don't pay for infrastructure based on bytes transferred so why should I as a customer pay this random made up number?


They say:

> The second reason, which mainly makes sense with our terabyte services, is to have a high limit but one that deters the really heavy users, i.e. the people that literally use hundreds of times the typical usage. By excluding such customers from these services we can provide a faster and better services to our customers


That’s still not going to use 10TB. Maybe 2 or 3.


I already have an AAISP line coming on Monday! Very excited. AAISP = the new Zen.


The biggest difference you'll find is that when you phone support you'll probably be having a conversation with someone who knows more about networks and routing than you do. Doing first line support.

I figure the only way this is economic is because they have a technical customer base who phone support maybe twice a decade.

Years back I had a conversation with someone who had done a couple of ISP mergers, and his opinion was that techie-focused ISPs look fantastic on paper because their support costs are really low. Then someone buys them, tries to expand their userbase to "regular" people, and their support costs fall in line with the rest of the industry (eg Demon -> Thus -> Vodafone).

So far, A&A have avoided that fate.


So many people rave about the support A&A provides, but I don't really understand why support is so important to people.

Isn't a stable connection more important?


Not much they can do to improve that, as they're relying on Openreach same as everybody else. But when you phone them up because your copper pair dies every time it gets soggy and they say "Yeah, I can see it in the logs. That's a bit crap. Nothing I can do from this end but I'll get on to Openreach and email you in about ten minutes with an appointment" instead of "Ok, I'm going to talk you through rebooting your BT Hub", it leaves a positive impression.

Oh, and on the subject of stability, they send me an SMS every time my line goes up/down. Even if it happens at 4AM. They don't have to, other ISPs hope you won't notice, it's just a nice feature.


> I don't really understand why support is so important to people.

Support is not important, until you need it.

There are alt-net providers out there (who shall remain nameless) whose engineers are trained to practically breathe down the necks of their customer just after installation in order to get them to post a 5-star review on Trustpilot. However the reality is when the customer has an issue down the line, they discover to their horror that the post-sales support is shit.


It is, but in many parts of the UK BT's equipment is knackered. So you rely on support to poke BT with a big stick to get things fixed quickly.


> Isn't a stable connection more important?

Support is for the times where the connection isn't stable, and it appears AAISP handles these situations better.


>Zen used to be decent,

I remember they were often referred to as one of the best on ADSLguide.uk. At least in the late or early 00s.

It is sad to see that the state of UK ISPs has not improved much after 20 years.


> They know how to play the BT game.

Ever since the introduction of the "Automatic Compensation Scheme", the larger ISPs (Zen being one of them; the scheme covers about 80% of UK customers) have had a bigger stick to whack Openreach with, as Openreach have to contribute to the payments to end customers. (Some altnets such as CityFibre have also started signing up to the scheme, but iirc CityFibre don't pay per day for faults, only missed appointments and delays to connecting a new line.)

One thing ISPs can do (but it increases their costs) is increase the service level on your line. Standard service is, iirc, a fix by the end of 2 full business days, which most residential lines are on. The next level is end of the next working day, the next is end of the same day (if reported before 1PM, and next day if not) including Sundays/bank holidays, and the next is a 6-hour fix around the clock. But the higher the service level, the more the ISP has to pay OR for the line. I can see "more specialised" ISPs such as A&A increasing the service level of their lines but passing that cost on to the customer (one of the reasons for their increased cost, but as you said, you get what you pay for; this is just speculation, as I have no idea if they actually do, it just seems like something they would do - A&A are known not to be fobbed off by Openreach).

However the "Automatic Compensation Scheme" and increased service levels still doesn't stop Openreach taking their time. Recently had an FTTP outage which took a week to clear, Blinking PON, first tech who turned up determined there wasn't an issue between the pole and the premises but as he was "first level support tech" there wasn't much else he could do, he did however call someone at the exchange to poke around which a few things (which he told me was "above that persons pay-grade", so he shouldn't really be doing it but it was worth a shot as its fixed issues in the past) but no dice and the call would have to be escalated (They said it was a one-way light issue).

The next callout was a missed appointment. Booked the callout, waited in until an hour after the time, called the number I got in the text, and it was the same person as the callout before: OR had screwed up and put it back on his job list instead of escalating it. Not the end of the world, I just wish they had told me so I didn't waste most of the day waiting for someone who wasn't going to turn up.

3rd callout, and this tech re-splices every splice between my house and the exchange; he told me it was four splices (not including the splice in the "customer service point"). Still no dice. At which point they believe it's a problem with a fibre card in the exchange; it will need resetting, which they don't want to do in the middle of the day because it will knock everyone connected to it offline while it resets. Fair enough. He tells me they will get it done overnight. Next morning, still flashing PON; I call the number for the 3rd tech and they confirm that the card wasn't reset and they will get it done that night.

Next day, still blinking PON. I'm just about to call the tech again (this is about 8am), but as I'm pulling out my phone the PON goes static green. I start rebooting and reconfiguring kit (I was using my own 4G backup; I hadn't configured auto failback, so I was killing one connection and restarting the other) when my phone rings. It's the 3rd tech, telling me that they hadn't reset the card overnight again, so he did it himself - something he wasn't supposed to do, but heck, it was getting silly at this point as it had been a week. From his end it looked like my ONT had connected, and he was calling to make sure I was connected (which I was).

So even Openreach have to take the advice of The IT Crowd every once in a while, because it turns out my week-long outage was cleared by turning it off and on again.


Huh, I was with Zen and they got Openreach to fix a crap cable; connectivity went up from 20Mbit to 70 on my last ADSL connection.


This behavior would be fully explained if the Akamai <-> Zen interconnect is simply overloaded.

Internet connectivity is not transitive, throwing a VPN into the mix changes the A <-> B scenario to A <-> C <-> B, which can have very different properties, since the paths may have very little in common. For multihomed A and B, the paths may in fact have nothing in common at all.

Same applies to IPv4 vs. IPv6, the routing may be entirely different, especially with a CDN you might even straight up get a different CDN instance.


Exactly what I was thinking reading this. A CDN has no interest in throttling things; it would hurt their performance metrics. And if they think your IP is "bad", you'll get a straight-up error or captcha, not just some packet loss.

But a link between an ISP and a CDN provider being overloaded is quite common. The ISP is trying to get away with the minimal infrastructure investment possible, and good interconnects are expensive.


Time for a soapbox rant about interconnect prices, which are based on "how much ya got" pricing instead of "how much it actually costs to hook up".


To be fair this is mostly a problem between ISPs, or with ISPs fleecing their (business) customers. A restrictive peering policy for a CDN is just batshit stupid, and the CDNs know this and will peer with anything that doesn't run away fast enough. You do have to meet the CDNs at some PoP though, and that getting-there part is generally the real issue. Anything last mile is just… ugh.


Many ISPs have been lobbying for a tax on "content generators" which they say they would use to pay for infrastructure upgrades (or line the pockets of their shareholders who knows).

The EU has a consultation on it, although I think it'll fail to get traction.


> The EU has a consultation on it, although I think it'll fail to get traction.

Thank god it'll fail to get traction.

Many ISPs have been receiving boatloads of government subsidies to build out infrastructure, and they have very little to show for it. There is no reason to believe this would be any different.


I had an issue recently where my Spotify playback kept pausing due to being unable to download the songs quickly enough on my 75Mbps fiber connection.

My ISP has a strong presence on a local forum where I posted my issue.

Long story short, despite my ISP actually having an Akamai cluster on their own network, Akamai’s DNS was resolving my ISP’s customers to a cluster on a different ISP’s network.

That different ISP either had terrible peering, or the theory is they were throttling their Akamai cluster’s IPs to other ISPs.

Fortunately my ISP managed to convince Akamai to fix the DNS resolution.

Needless to say, I’m super impressed I can actually get the attention of the right people at my ISP to resolve this kind of issue.



Perhaps the blog could use a CDN? ;)


* UPDATE * pride has been swallowed and Cloudflare is fronting the site. Yes, it's ironic. Still sort of self-hosted and not on EC2 :)


To get more out of it, don't forget to create a "page rule" to cache all content (including the page). By default Cloudflare only caches things like images, js, css, etc.


don't even :) well, a second copper connection is getting installed on Monday with a less awful provider.


Just curious if MSS or PMTU blocking has anything to do with the problem.

In the 2 different Wireshark dumps, a relevant difference is MSS=1460 and MSS=1380 in the second one.

I'd recommend setting the local NIC MTU to a low value just to see if it has an impact. However, the Wireshark dump doesn't show packet fragmentation, so perhaps this isn't a problem at all?
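
A crude way to test that from the client side (a sketch assuming Linux iputils ping; 28 bytes covers the IP + ICMP headers): binary-search the biggest payload that still goes through with "don't fragment" set.

    import subprocess

    def ping_df(host: str, payload: int) -> bool:
        r = subprocess.run(["ping", "-c", "1", "-W", "2", "-M", "do",
                            "-s", str(payload), host],
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return r.returncode == 0

    def path_mtu(host: str) -> int:
        lo, hi = 500, 1472                    # 1472 + 28 = 1500, the usual ceiling
        while lo < hi:
            mid = (lo + hi + 1) // 2
            lo, hi = (mid, hi) if ping_df(host, mid) else (lo, mid - 1)
        return lo + 28

    print(path_mtu("example.com"))            # 1492 would point at PPPoE overhead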



This is quite a common issue with PPPoE connections like the one OP seems to use with his own router. You need to increase the MTU of the physical underlying ethernet connection to 1508 to allow a 1500 MTU for the encapsulated packets inside the PPPoE tunnel. Otherwise you'll run into weird issues and unreachable websites.


You also need to make changes on the PPPoE server, which is hard because if a provider is running PPPoE in 2023, they probably don't care about doing things well (but maybe I'm just bitter about CenturyLink)

I have a browser based mtu test http://pmtud.enslaves.us/

Currently IPv4 only, requires a somewhat recent browser, and client to server testing is iffy, but if you start the test and get OK in the notes field for both directions, your MTU settings are probably fine (or something is doing proper mss clamping between your client and my server, my server is limited to 1500 MTU so problems with jumbograms can't be detected)


I think Openreach requires all their customers to use PPPoE - certainly I've had it with both BT and A&A.


Increasing the MTU at the sender side to something >1500 is not a great idea. It’s unlikely that the path will support 1508 byte end-to-end.

A better idea would be to reduce the MSS inside the tunnel.


The MTU is only increased between the router and DSL modem to account for PPPoE overhead, so that the MTU inside the PPPoE tunnel (and thus to the internet) can be a standard 1500 (otherwise it would be 1492).


+1 ^^ This

Set MTU on affected systems to 1400 or implement MSS clamping via firewall, etc.


So I have seen this before - a lot of ISPs nowadays are using "optimiser" boxes that are designed to throttle Elephant Flows (https://en.wikipedia.org/wiki/Elephant_flow) to reduce overall consumption. Usually they add a little bit of buffering or the occasional TCP congestion notification to cause a client to back off and (for example) reduce the streaming video bitrate. But I've also seen bad configuration that can cause this sort of issue - e.g. a misconfiguration that limits you to 2kbps vs 2mbps. The reason the Wireguard tunnel works fine is because it's UDP-based and you can't trigger the same congestion notification behaviour over UDP. These boxes are usually inline to your traffic and are often referred to as "middle-boxes" - more commonly they're used in mobile (4G/5G) RAN aggregation networks where bandwidth is more scarce, but they're now being sold into fixed-line network providers as a cost-cutting measure.
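
For the curious, the detection half of those boxes is conceptually simple; this is just a toy heavy-hitter tally over parsed flow records (the record format and threshold are made up). The throttling/marking half is where the vendor magic, and the misconfigurations, live:

    from collections import defaultdict

    ELEPHANT_BYTES = 50_000_000          # ~50 MB within the window => "elephant"

    def find_elephants(packets):
        """packets: iterable of (src, dst, sport, dport, proto, length) tuples."""
        totals = defaultdict(int)
        for src, dst, sport, dport, proto, length in packets:
            totals[(src, dst, sport, dport, proto)] += length
        return {flow: b for flow, b in totals.items() if b > ELEPHANT_BYTES}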



Looks like net neutrality is going down the shitter. Except it's not the way ISPs would have originally wanted, with CDNs taking stewardship of what's allowed and what's not.


The whole net neutrality argument was made by people who've never operated a large scale network. I've shifted Tbps for a big content provider for some years now and I also worked for two ISPs.

The idea of paying for premium access and it negatively affecting the competition is looking at the challenge wrong. It also presumes that ISPs have one fat pipe that gets divided up, which is not usually the case unless you're a tiny ISP.

What actually affects performance, and is probably the case with this user, is that ISPs and CDNs need to come to an agreement over what connectivity they peer with. At scale they don't do that over public peering; it's private peering, either through a third party (like Equinix or Digital Realty) or by directly patching fibres within the major data centres and linking their networks together. New and unusual services will likely use public peering instead of private peering, or they won't use a tier 1 CDN like Akamai who an ISP would peer with, instead using someone like Bunny CDN - a fine CDN, but not peering on the same scale.

The fairness risk comes not from the CDN or content provider 'paying' for priority, but from the ISP not investing in public peering. That's not the content provider's fault, it's just bad operational practice. You could say it's the content provider's fault for subsidising the route that gets their traffic through, but it's really the ISP's poor infrastructure investments playing out. There's a small risk from content providers doing deals with ISPs to "zero rate" traffic where that ISP (or more usually a cellular provider) charges users for bandwidth (or caps it), where the big content provider can use their leverage to make their service cheaper to the consumer. But the reality is that zero rating isn't particularly commercially popular; I've seen it once or twice in my career.


The business of a premium CDN is providing the bits as fast as possible (to legit users). This is what the customer pays for. If the experience is worse on a CDN, the customer will quickly put their traffic and money elsewhere.

Strategy might be different for a free-tier/cheap CDN.


HN's hug of death

https://web.archive.org/web/20231123142332/https://blog.abct...

Funny thing is the author apparently doesn't use a caching CDN, thus users are not getting throttled but are getting 503s...


the author actually _is_ using a CDN now :D


https://web.archive.org/web/20231123121535/https://blog.abct...

In case anyone else wants to read it while it's hugged to death.


I’m with zen. This lines up with my experience. Time to move to AA.

Except farnell.com which is shit everywhere because their entire platform is a turd.


I'm with AA, had problems very early on, traffic was really slow.

Hopped on IRC and nobody was talking slowness, so I was assuming it was my end still, but then somebody mentioned sluggishness, so I spoke up.

Quick traceroute later, and within 5 minutes I had a new PPPoE user to try, which moved my routing to a different router in Docklands, and all was good. 10 minutes later they'd shifted everyone to that and taken the router out for investigation.


> Except farnell.com which is shit everywhere because their entire platform is a turd.

Farnell the company is also a turd, another example of a once great company that has gone to the dogs.

You can't even trust the stock numbers on the Farnell website anymore.


Ah yeah. Place an order, find out an hour later only 75% of it is actually in stock. The 75% comes in 5 boxes from various locations at random times over the next week.

At least it's not CPC. They sent me an empty box once.


There are a lot of problems reported on thinkbroadband forums for zen about some new routing configuration they are doing, which is meant to be better but is in reality a lot worse.

I wouldn't be surprised if this is related to that someway!


I'm with Zen and don't have the same issues, even farnell.com works fine


I thought farnell was an electronics store.


> I thought farnell was an electronics store.

They are. The name only comes up here because the blog post used their website as a test target.


I had a weird one on my network I never managed to solve before I moved:

I had symmetrical 1gbps up and down. When wired, I could get nearly the full amount on the WAN. When wireless, I could only get 300mbps to the WAN.

However, when wireless, I could get ~800mbps to another device on the LAN. I could also get 800mbps to the internet if I proxied from my wireless devices to my wired device before going to the WAN.

My router company sent me two additional routers, one with a similar chipset and one with a chipset from a different vendor and this persisted. I checked it with a competing router and it persisted.

It did not matter what the wireless device was - Mac, Windows, phones, or tablets - and it persisted.

Moved somewhere else with a different ISP and it immediately stopped. I still don’t know how an ISP would identify and throttle a wireless device, but that was pretty much the only explanation I could come up with.


I wonder if this was some weird coupling/interference between wireless and the actual modem/NIC in the path to the internet? Bouncing off a separate machine would've desynced the two streams just enough that the wireless interference doesn't clash with the outbound NIC (or the other way around - the outbound NIC of the modem was causing interference which was slowing down your wireless).


The router was plugged into a Cat6 cable which ran to a switch in a closet in the building’s networking room, which was connected by fiber to the network.

No modem within 15 floors.

Interesting guess though, wouldn’t have thought of that.


- Could it be that there was a difference in IPv6/IPv4 presence/preference between the two? E.g. your wired connection acquired an IPv6 address while the wireless was IPv4 and had to go through NAT.

- Did you make sure to compare results for non-concurrent speed tests? I.e. maybe those ~800 were actually 4x~200 - many speed tests open parallel connections by default.


- All IPv4 and NATd.

- The speed test difference was consistent no matter the measurement tool - speedtest.net, fast, actual file transfers, etc.

We got pretty far down the rabbit hole with diagnostics. TP-Link actually spent a significant amount of effort supporting me - had a debug firmware doing packet captures, testing different hardware acceleration settings, sent me multiple routers with different chipsets, etc.

I brought the hardware with me when I moved and I do not have the issue.


Aside from TP-Link's consumer gear being unreliable and buggy - I wonder if TTL comes into play here. I know some mobile ISPs have been throttling or blocking traffic based on packet TTL (as a proxy for tethering), but it's not something I've heard of otherwise...


I haven’t had issues with my hardware from them, and I confirmed the behavior with non-TP-Link hardware as well, for what it is worth. I was actually impressed I ended up talking to an actual product engineer.


Oddly enough

Pinging abctaylor.com [82.71.78.1] with 32 bytes of data:

Request timed out.

Reply from 82.71.78.1: bytes=32 time=186ms TTL=55

Reply from 82.71.78.1: bytes=32 time=208ms TTL=55

Request timed out.

Reply from 82.71.78.1: bytes=32 time=200ms TTL=55


I got blacklisted by Akamai once for some very lightweight automation of web page screenshots (once every ~5 seconds, different sites). Do not recommend the experience.

The ironic thing was I was blacklisted from loading Akamai's help pages about what to do if you are blacklisted. I never did find their tool, I wonder if it would have been blocked too. https://www.akamai.com/us/en/clientrep-lookup/

The ban expired after about 3 days.


You should look into image optimization, especially when you are self-hosting. Use thumbnails for big images; WebP looks decent enough and files are smaller than PNG. Prefer system fonts - why should you serve those too, when each visitor usually has dozens of them available on their device already? Oh, and the favicon really can be smaller.


Yes, what's up with that 2048x1536 image in the header that gets loaded with the page without a thumbnail?


I had a problem with Zen recently-ish too. Ultimately it was an Openreach thing at the local exchange, apparently. The good Zen support was still ultimately there, but it took a little time for things to fall into place. Standard L1 checklist inflation. Thankfully, though, Zen are one of the few ISPs where I felt like it was worth it to send packet traces, because a decent chunk of folks there would know what they are.

On the other hand, I think any ISP at the mercy of Openreach is doomed to have limited support.

I have fibre to the property, and was having periods of 1-2 hours a day of my gigabit speeds dropping to 4-5MB. Openreach themselves were blindly sending engineers to look for an issue that couldn’t physically be at my house.

Not much you can do there either as an ISP or as a customer besides wait for Openreach to figure out they’re wasting their own time.


How odd, I'm also with Zen, albeit on 900/100 FTTP and have no such issues, but then again, I also have a /48 IPv6 prefix delegation and so whatever wants to use IPv6 uses that.

BBC, Farnell, everything else - just works, and works fast.


I guess there is congestion somewhere in the path, maybe between your ISP and the CDN. I worked at an ISP for a while and this was the root cause of problems like yours.


I'm with Zen and have had a good experience over the last few years; they're one of the only companies where customer service has been decent. When I was on ADSL, the fault finder on their router helped identify a nearby problem, and once BT Openreach replaced the cable, connectivity was really good. I probably would have been given the runaround by another provider and had to live with a flaky connection.


A short story on how I diagnosed half the Internet being broken for me, but the rest would be perfectly fine. And no, it's not DNS for once.


It's probably not traffic shaping; it's almost certainly a peering capacity issue between Zen and the major CDNs. Zen probably needs more/better peering with Akamai and the other major CDNs than they have. I've written more complete answers about this elsewhere in this thread. Hanlon's Razor applies in this case.


Where is the discovery though? I don't follow how you got to the point where you think it's CDNs that are throttling you. For all I know it could be something like a faulty router, right?


It's not, because the same infrastructure doesn't crap out when going through a tunnel (same routers, same ISP network, same MTU on my VDSL router, etc).


>> Stable 6ms ping to 1.1.1.1

Please note, pinging public DNS servers is a useless metric, because you would never know if your provider hijacks your DNS packets or even all traffic to those public servers.


Good point. What's the alternative? I like 1.1.1.1/8.8.8.8 etc. as it's memorable.


> I like 1.1.1.1/8.8.8.8 etc as its memorable.

Yes.

> what's the alternative

a) some ISP targets, eg mailcluster.zen.co.uk

b) lg.he.net and bgp.he.net


thanks


I doubt it is global throttling by IP address. The most common reason is ISP traffic shaping and the ISP's ingress/egress deals with the networks it is connected to.


Why don't you set up IPv6 if that solves the problem for IPv6-enabled sites?

You seem to know your way around networking as well; genuinely curious.


If this didn't work for you earlier, the blog is now behind a CDN. Any good technologist would put practicality before pride :)


I presume you have a cell phone? Could you run all these tests hotspotting and contrast them?


The good ol' Hug of Death.


Seems like this site could use a caching CDN lol


The author seems to be serving it from their home network at 7Mbps upstream, which probably isn't quite up to handling HN front page levels of traffic


Correct, my crappy VDSL2 connection is not cut out for this level of traffic. I am grateful for the traffic from HN nonetheless :)


I didn't know people still did that.


I would love to, but I'd very much prefer a static IP for that (instead of reverse proxy / wireguard shenanigans) but getting one is prohibitively expensive where I live. Basically, I'd need to purchase the big business package from my ISP.


Lots do. I have since 1998. It's only become easier and better with time. Join us.


Home hosting is neat but I'm not going to use the phrase "easier and better" unless I'm talking to someone with a much faster upload than single digit Mbps.


Single-digit Mbps is fine 99.9% of the time. The slashdot effect would take down shared hosting just as easily.


With image-rich content, 5Mbps will be visibly sluggish with only one visitor, and even a few people you know poking it at the same time will have a bad experience.

Judging by https://unixism.net/2020/05/what-kind-of-traffic-does-hacker... and https://news.ycombinator.com/item?id=30481230, surviving a couple loads per second up to 25 will get you through many slashdottings, and with a solid symmetrical home connection you have a very good chance.

If you have video, you're not going to survive a slashdotting, but 5Mbps will let you have about one viewer with a smooth experience, while 20-30 viewers could watch the same content on 100Mbps. Or maybe you want to deliver 4k and it's zero versus several peak viewers.
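
Quick numbers behind that (illustrative page weights, 5 Mbit/s upstream):

    upload_mbit = 5
    page_mbytes = {"text-only post": 0.1, "image-heavy post": 3.0}
    for kind, mb in page_mbytes.items():
        print(kind, round(upload_mbit / (mb * 8), 2), "full loads per second")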


The best part about personal websites is that you don't have to survive 99.99% of the time. It's okay if people can't access it for a day. No big deal.


If I want to tell my friends about a new post, I want several of them to be able to click the link at the same time! And not feel like they're walking through mud.

This isn't about getting tons of nines of uptime, this is about people enjoying the page a strong majority of the time they're visiting. That needs a certain amount of speed unless it's a super lightweight page.


It's not like when you post a link in a chat they all load it at the exact same time. It's spread out over a minute or few. I'm currently on a relatively slow Comcast connection with 5 megabits/s upload and it works just fine for hosting and posting links for several (or more) people to look at.


I still do that.


503 error

Oh the irony



