I've been having major internet issues lately (Seattle area), have had 4 techs c...

texthompson · on March 19, 2020

If you didn't know, that 80% number is probably the result of Little's Law. That's the result where if your demand is generated by a Poisson process, and your service has a queue, 80% utilization of the service is where the probability of an infinite queue starts to get really high. People

Here's a nice blog post about the subject:

https://www.johndcook.com/blog/2009/01/30/server-utilization...

lxgr · on March 19, 2020

This law does not apply to queueing as encountered in routers. It assumes unbounded queues and a poisson arrival process (i.e. a memoryless channel); both assumptions don't hold for packet routers and senders using congestion control (TCP or otherwise).

There is, however, a high chance of encountering buffer bloat if countermeasures are not taken at the chokepoint: https://en.wikipedia.org/wiki/Bufferbloat

Modern cable modems, for example, are required to implement such countermeasures. My ISP is at over 90% capacity and round trip times are still mostly reasonable. (Bandwidth is atrocious, of course.)

drewbailey · on March 20, 2020

How do you monitor this? The 90% over capacity, would like to see where mine is at

lxgr · on March 20, 2020

There might be a way using a cable TV receiver (see my other comment on this thread), but in my case, a sales rep of my ISP just told me on the phone.

mikepurvis · on March 20, 2020

I have an older modem (DCM476) and it definitely doesn't have this or doesn't have it enabled. I have to use/tune queue management myself on the router side.

lxgr · on March 20, 2020

Yes, it's mandatory only as of DOCSIS 3.1, and yours seems to be 3.0. (Supposedly it has been "backported" to 3.0, but that obviously would not apply to existing devices certified before that amendment to the spec.)

eru · on March 20, 2020

To add:

If you have more control over or knowledge of your load, you can safely go higher than 80%.

Eg when I was working at Google we carefully tagged our RPC calls by how 'sheddable' they were. More sheddable load gets dropped first. Or, from the opposite perspective: when important load is safely under 100%, which it is almost all the time in a well-designed system, we can also handle more optional, more sheddable load.

As a further aside, parts of the financial system work on similar principles:

If you have a flow of income over time, like from a portfolio of offices you are renting out, you can take the first 80% of dollars that come in on average every month and sell that very steady stream of income off for a high price.

The rest of income is much choppier. Sometimes you fail to rent everything. Sometimes occupants fall behind on rent. Sometimes a building burns down.

So you sell the rest off as cheaper equity. It's more risky, but also has more upside potential.

The more stable and diversified your business, the bigger proportion you can sell off as expensive fixed income.

stjohnswarts · on March 20, 2020

I've noticed that above 70-80% it gets pretty hard to insure that interrupt timing can be met and balanced with low priority main looping in a lot of my bare metal embedded projects.

op00to · on March 19, 2020

The tech was full of shit. This happens literally all the time. You probably won’t get a “node split” unless more people loudly complain. It’s cheaper for them to roll a tech and hope you get fed up than it is to actually fix the problem.

lxgr · on March 19, 2020

My ISP has been playing the same game with me for months. I finally cancelled the contract when it was about to renew, and I got a very interesting winback call from sales:

Not only did the rep freely share the utilization numbers with me (80% during the day and 90% at night), he also mentioned that things would not get better until end of the year when they would do a node split.

As consolation, they offered me 10x the download speed for half the price. I'm not really sure how that would help congestion...

spaniard89277 · on March 19, 2020

I work in this field in Spain. Margins in this sector are slim, deployment is expensive. EVERYONE works with simultaneity rates, it's the only way to have cheap connections.

In fiber connections is actually not that expensive to split a fiber after a CTO, you can actually sort of daisy chain it, but you want to keep everything as standard as possible.

VHRanger · on March 20, 2020

Margins are not slim at all in the USA

ThePowerOfFuet · on March 20, 2020

You think they're fat in the US? Look north.

richjdsmith · on March 20, 2020

Shh, you'll upset the Great Robelus[1] and they may start euthanizing animals....

[1] https://www.thebeaverton.com/2020/03/telus-threatens-to-euth...

VHRanger · on March 20, 2020

I'm Canadian, trust me I know and hate it.

Some EU relatives of mine keep their phone plans living here because it's cheaper with the overseas rate than paying Canadian plan rates (!!!)

djannzjkzxn · on March 20, 2020

Maybe being in the system with a higher speed tier gets you higher priority?

lawnchair_larry · on March 19, 2020

I don't see what motivation a tech would have for lying about this.

mulmen · on March 19, 2020

I asked a Comcast tech when IPv6 would be available and he said “IP v what?”. Don’t attribute to malice what can be explained by incompetence.

kitteh · on March 20, 2020

That's like asking a telephone lineman about IPv6. Diff layer in the OSI stack.

p410n3 · on March 20, 2020

My 67 year old grandpa has vague idea what ipv6 is.

mcv · on March 20, 2020

He probably was around when the standard was defined. It's amazing this is taking 30 years to replace IPv4.

bitlash · on March 20, 2020

The transition is definitely taking a long time, are there additional reasons for delaying the switch to IPv6 other than the mitigation of the problem with NAT/private networks?

Arnt · on March 20, 2020

It requires cooperation from perhaps fifty thousand organisations (there are 45k ASes that announce more than one prefix, and I'm guessing that there may be 5k software vendors). Some of those have orgcharts that aren't very friendly to this kind of change.

Adding to that, even clueful places may be held back by one or more vendor or provider, all of which need to have working v6 support before you yourself can deploy it.

setr · on March 20, 2020

I thought ipv4 and ipv6 addresses could be provided simultaneously (or rather, ipv6 has provisions to be mapped to/from ipv4); you just wouldn't see any real benefits until you could switch wholesale (because you'd still be limited to whatever ipv4 can do)

That is, it was my understanding that there was no real blocker to supporting it in the interim, except for the lack of any immediate benefit. Though I'm also not clear on whether supportinf both introduces any significant complexity

Arnt · on March 21, 2020

They can be provided simultaneously, that's the normal case.

Suppose an ISP wants to provide IPv6 besides v4. What does that ISP need? Well, first, v6 from the upstreams, that's simple, and v6-capable name servers, routers, that's simple too nowadays.

But there's more. Suppose that the ISP has some homegrown scripts connected to its monitoring or accounting, written by a ninny years ago, uncommented, and some of those assume IPv4, and noone wants to touch them.

Suppose that ISP outsources its support, and the outsourcing company promises to do the needful regarding IPv6 support but never actually does it.

Suppose that that ISP is in a country where ISPs have to answer automated requests from the police or courts, and one of the software packages involved in that has a v6-related bug. Or the ISP worries that it's poorly tested and the ISP's lawyer advises that if there are any bugs, the ISP will be criminally liable.

And so on. Enabling IPv6 may need a fair number of ducks lined up.

loeg · on March 20, 2020

Did you ask them ten years ago? Comcast has had v6 for ages.

PaulDavisThe1st · on March 20, 2020

the point was, i believe, that the techs frequently don't know what they are talking about.

madaxe_again · on March 20, 2020

A lot of techs for large orgs don’t. I had a grid electrician in a while ago, replacing unshielded triple phase from the pole, who was convinced that they only use AC in the US, and that here in Europe it’s all DC, so safer, and this is why I can work on it without shutting it down, mate.

The mind boggles. These people maintain our infrastructure.

magduf · on March 20, 2020

Wow, that's wrong on several different levels. I can't even begin...

I understand that you don't need an electrical engineering degree to be an electrician, but still, these are some fairly basic concepts in the electric power industry, especially the safety aspects, so you'd think someone working on live wires would know better.

Honestly, any halfway-intelligent person who travels internationally should know that Europe runs at 240VAC/60Hz, because this is really important if you want to use your American electronics there without a transformer. (When I went to Europe last, I brought my laptop, and an adapter which does not convert voltage, only the prongs, but that's OK because the laptop's power brick says it works on everything from 100VAC to 240VAC, as do a lot of electronics these days. But you have to check this first, you can't assume! Plugging a 120V-only device into this adapter could cause a fire.)

raarts · on March 21, 2020

Europe runs on 230V 50Hz.

magduf · on March 23, 2020

Yep, you're right.

Luckily, the 50/60Hz stuff really doesn't matter these days except maybe for some digital clocks on appliances.

Godel_unicode · on March 20, 2020

It's instructive I think to look at the job ads for these technicians. It's frequently something on the close order of: can be professional, knows how to drive, can handle close proximity customer service, knows some handyman skills, and oh by the way maybe has seen an Ethernet cable before.

Not that there's anything wrong with that, everyone was entry-level at some point, but engineers who do capacity planning and traffic engineering they are emphatically not.

josefresco · on March 20, 2020

To contrast this, every Comcast tech (3) that's been in my home has been very knowledgeable. Once they see I'm a "geek" they unload with technical knowledge and generally talk my ear off. That's how I learned my town has less nodes/per subscriber than any of the surrounding towns which is why my Internet speed is frequently ass.

op00to · on March 20, 2020

Because he wanted you to believe they were going to fix the problem at a later date so he could go to the next job (paid by the gig) and get you to close the ticket (improve his metrics).

martin_bech · on March 19, 2020

I’ve worked at a major ISP, for a decade, and spotting something like this should be so easy to spot. There are tools on monitoring of load all the time, and areas are routibely getting split etc. to improve bandwith, so I think your ISP are basicly amateurs..

kitteh · on March 19, 2020

The problem is that most companies aren't going to tell you that their peering circuits are running hot or that their internal network or access layers to the end user are running warm at peak. ISPs all do stat muxing and the line is "we make money when customers don't use the service".

They'll be happy to deal with the last mile segment, but anything beyond that is murky and most companies I know aren't going to share much. Helps to have friends on the inside leak some graphs, though.

notyourday · on March 19, 2020

> I’ve worked at a major ISP, for a decade, and spotting something like this should be so easy to spot

MRTG graph, ISP circa 1995. Colorized.

See a flat line? that's congestion. Now figure out where it is coming from. Sorry, we have been doing this for thirty years so I'm kind of cranky. It is not a rocket science.

cpitman · on March 19, 2020

Alternatively, load has gone up across the board in a short period of time, so that preventive scaling has fallen behind and are in recovery mode.

martin_bech · on March 19, 2020

Yes it can, but why would it take several techs, to spot something like load, which is the first thing you would do, it should take no more than 10s to look it up in a tool.

sulam · on March 19, 2020

A "last foot" tech might not even have access to those tools, much less know how to use them.

rwbhn · on March 19, 2020

Rolling out that tech has got to be more expensive than checking the load first.

toyg · on March 19, 2020

Dunno how it is in the States, but here in UK rolling out the tech is basically the first thing they do after the unavoidable "have you tried turning it on and off again" phone call. They just don't trust the customer to have any clue and maybe don't want to waste time doing troubleshooting at their end when it's "probably" a downstream issue.

TheSpiceIsLife · on March 19, 2020

Network Operations should be raising known problem issues to front line call centre staff.

Network congestion issues shouldn't be handed off to field techs to check local loop (last mile) and CPE (Customer Premises Equipment.

wtallis · on March 19, 2020

I'm pretty sure it's standard practice at these companies to never let front line call center staff acknowledge known problems. Sometimes, the automated phone menu will give you a recorded generic message that they are currently experiencing a service issue, but that's intended to convince you to hang up and patiently wait for them to sort their shit out. I've never had a front-line rep be at all useful in diagnosing a real problem.

TheSpiceIsLife · on March 19, 2020

Yeah true.

I guess I need to remember the ISP I work for here in Australia (front line tech support, and then network operations physical security and infrastructure) was widely recognised as the best ISP in Australia multiple years running, so I shouldn't use it as a baseline expectation.

tacticus · on March 20, 2020

So how was life at internode or aussie?

TheSpiceIsLife · on March 20, 2020

Internode.

Yeah, was a good place to work. I was in their Adelaide data centres when iiNet acquired the company.

robocat · on March 20, 2020

In NZ you sign up with an ISP, but your local connection is usually handled by the same physical equipment (DSLAM for ADSL, etc) which is owned by a single network provider.

I’m not sure what the incentives are for an ISP to try to get the provider to fix issues, or even if they would e.g. https://company.chorus.co.nz/what-we-do is notoriously bad for service and the copper network is being deprecated. Locally https://www.enable.net.nz/about-enable/ are doing a good job of service, because they are well subsidised by the government and seem to be effectively operated.

the8472 · on March 20, 2020

> There are tools on monitoring of load all the time

On some days my connection resets 5 times within an hour, which is quite annoying since retraining the connection takes a minute or two. When I call support about it they have zero monitoring in place that would let them know about the recent history of the connection quality, they can only do spot tests of SNR on demand, which of course doesn't show any transient events. According support forum posts of other users they'd have to explicitly enable "long term monitoring" based on user input to get that information.

Of course SNR line quality is an issue separate from congestion, but still, automatic monitoring appears to be limited.

fibers · on March 19, 2020

how can i as a subscriber find out whats the capacity?

lxgr · on March 19, 2020

It used to be possible to determine the downlink capacity and even current usage with a DVB-C receiver and some Linux software, since DOCSIS is essentially just IP encapsulated in MPEG transport streams on a digital TV channel.

More recent versions of DOCSIS have moved away from that layer of backwards compatibility, so you would probably need some specialised equipment, if it is possible at all (I don't know at what layer exactly encryption happens).

op00to · on March 19, 2020

Not amateurs, liars.

dawnerd · on March 19, 2020

So Frontier?

sph · on March 19, 2020

A free alternative to PingPlotter: https://www.thinkbroadband.com/broadband/monitoring/quality

My connection: https://www.thinkbroadband.com/broadband/monitoring/quality/...

In case anyone is shopping for broadband in the UK, I only have great things to say about Zen pictured above. It's so good I just called to upgrade my 80 Mb to a 300 Mb just for fun, meanwhile my quarantined Italian friends are suffering awful internet now that everybody's at home streaming Netflix.

I used to have Virgin fibre and my average ping was 80ms with a ton of jitter. The plot above is my internet while downloading at about 2MB/s average over the past 24 hours, and surprisingly stays the same even at peak download.

nvarsj · on March 20, 2020

I’m being pedantic, but that’s not really Zen, it’s the BT Openreach backend which has really great stability and latencies. I tracked my BT Openreach connection for many years and I never got more than a few ms of jitter, really amazing. However the speeds are not great (70/20), and the coverage is also fairly poor - I'm in a dead zone right now between two local exchanges. So unfortunately I'm forced to use Virgin, which has gotta be the worst ISP in the history of the world (and I have had Comcast!). Terrible network and terrible customer service - I don't know how this company exists.

collinmanderson · on March 20, 2020

That's a neat alternative to PingPlotter. I like that it pings from outside, so no client required. I'll check it out, however, I'm in the US, so I bet it's always going to be high latency.

wh1t3n01s3 · on March 20, 2020

Not friend of yours italian quarantined enjoying 1Gb/s here. Never used Netflix ;-)

abjKT26nO8 · on March 20, 2020

You're describing an issue specific to US ISPs. It doesn't apply to Europe. From what I read even before the pandemic the US ISPs offered rather crappy services. In Europe, particularly in Poland, I don't have and haven't heard about anyone having any issues with connectivity right now, even though the country is in lockdown, schools and universities are closed, restaurants work only in delivery/take-out mode, companies switched to remote work, ... And still no issues at home nor at work.

Don't make decisions about the European infrastructure based on American problems.

v77 · on March 20, 2020

This article is literally about the EU asking Netflix to reduce bandwidth in Europe.

abjKT26nO8 · on March 21, 2020

And the comment I replied to was "literally" about problems encountered in US (Seattle is in US FYI).

nicoburns · on March 20, 2020

Having issues with the internet here in the UK today. Unsurprising given that half of the world has suddenly discovered video calling. Mobile network seems more stable.

Cthulhu_ · on March 20, 2020

In my country (NL), a lot of the backbone of both cable TV and internet on a street level has been replaced with fiber already; I can imagine that in the US, due to the scale, this process is lower. Doesn't have to be fiber-to-the-home, 20mbit should be enough for everyone for example.

I haven't had any problems with my internet (I do have fiber straight into my house, wired network on my laptop, fast.com reports 600 Mb/s), but Skype, which we use for meetings, has been pretty shit in terms of sound quality.

slac · on March 20, 2020

20mbit is not enough if you have kids with retina display ipads looking at youtube!

abraae · on March 20, 2020

I work virtually from New Zealand with my colleague in Lombardy Italy. Today I noticed some more serious degradation in video call quality for the first time.

But mostly I'm amazed how well the internet is working given the circumstances.

pawelk · on March 20, 2020

I'm in Poland as well and I've been working remotely for over three years. Since the lock down started I feel that everything is a bit slower and less stable, but I haven't experienced major issues during usual work hours doing work-related things (maybe except MS Teams acting up). However Netflix is broken most of the time during afternoon hours (when I want to keep kids occupied with cartoons for an hour or so to get things done). Luckily other streaming services work fine.

terramex · on March 20, 2020

In contrast, my internet connection finally started working great since lockdowns started. I suspect my ISP (small local company in central Poland) got some additional bandwidth or somehow finally fixed their infrastructure when they saw increased internet usage among their clients.

op00to · on March 20, 2020

It depends. If there is competition, things can be good. I live in a place in the US where there are 3 broadband providers, and I pay far less than $100 for a symmetric gigabit connection, and I get it too.

rkeene2 · on March 20, 2020

FWIW, the reason nodes typically don't get to 100% is due to something called WRED (Weighted Random Early Detection). As the outbound/inbound queue on your "node" approaches fullness, it randomly selects packets to drop. This signals TCP on the sender to back-off. The closer to full-ness it gets, the higher the probability (weight), so the sender knows to slow down to the slowest link's speed.

I've written more about this problem here [0].

[0] https://rkeene.org/projects/info/wiki/176

eru · on March 20, 2020

Thanks for the write-up!

I wonder how TCP BBR would react here. If I understand it right, it wouldn't need RED to back off: the increased latency of buffers filling up would do that automatically. But BBR also wouldn't let the occasional dropped packet make it back off.

rkeene2 · on March 20, 2020

From what I understand about TCP BBR from reading about it the past few minutes, it would compute a new link speed as a result of impacts from WRED and then use that for the connection baseline speed.

TCP BBR would still rely on RED/WRED to compute the connection rate estimate initially, then it would attempt to send below that rate to avoid packet loss. If packet loss is detected it would recompute the estimated connection rate.

I found this page [0] useful, especially the graphs.

[0] https://blog.apnic.net/2017/05/09/bbr-new-kid-tcp-block/

chrisseaton · on March 19, 2020

> have had 4 techs come try to figure it out

Doesn't sound ideal for distancing.

martyvis · on March 20, 2020

The way the story is written it sounds like their attendance was there was serial temporal distance involved (they didn't come at the same time)

chrisseaton · on March 20, 2020

I mean inviting four different people into your home sounds silly if they're there at the same time or not! I guess people need internet to earn a living though.

kitteh · on March 19, 2020

Comcast's last mile network in Seattle has been struggling in some areas from the morning until around 4 to 5 PM. It's not massive loss, but enough to disrupt video conference. Run a mtr towards an Internet dest and you'll see loss at the first hop and everything behind it.

op00to · on March 19, 2020

Mtr isn’t a reliable measure of packet loss. Routers drop “extra” packets like ping before they drop “paying” packets.

kitteh · on March 19, 2020

Yes I'm well aware of routers policing TTL=1 packets, but if you see consistent loss all the way down it's usually a sign. This compared to seeing individual spikes on intermediate routers which are usually control plane policing.

lxgr · on March 19, 2020

mtr uses UDP data packets, as far as I am aware.

Yes, the ICMP response packets could still be skewed, and the effect you mention is definitely real, but on a good connection, usually there should not be much to drop at all, neither TCP/UDP traffic nor ICMP packets.

cthalupa · on March 20, 2020

>mtr uses UDP data packets, as far as I am aware.

Doesn't matter what it uses (though by default MTR does use regular old ICMP Echo - you have to specify -u or -t to get it in UDP or TCP mode). When TTL expires it still requires an ICMP TTL Exceeded be sent, regardless of whether or not you were sending ICMP through it.

Traceroute implementations in general are probably telling most everyone in this thread a lot less than they think, even without icmp deprioritization being taken into account.

https://archive.nanog.org/meetings/nanog47/presentations/Sun... is worth a read for most anyone that's ever attempted to use traceroute to troubleshoot networking, because they're almost certainly doing it wrong.

jtokoph · on March 20, 2020

This happened to me years ago near the University of Illinois campus (UIUC) with Comcast. I had multiple techs come out but they would only come in the morning when the connection was fine. I finally escalated to corporate who finally told me they needed a node split. I made them give me 100% free internet until the split was complete about 6 months later.

teekert · on March 20, 2020

Since I have been at home I practically live in MS Teams, with constant video chats. Yesterday I did a presentation with 140 people connecting watching my ppt and camera. That's got to be unusual. I imagine most of my colleagues going through this routine daily.

the8472 · on March 20, 2020

> I've been tracking the performance with PingPlotter, if you're curious how bad it is right now here's the last 10 minutes: https://i.imgur.com/AnUqv3j.png

Is your own connection idle though? Pings are also affected by the congestion on your own router†, especially if you don't have good AQM (such as CAKE). Dumb queues will just drop all packets equally, smart queues will do flow isolation and penalize the bulk flows first while keeping the trickle ones (ping, ssh, voip, ...) untouched.

† and anything else along the path to your ping target

illiilliiililil · on March 20, 2020

When I have connectivity issues during a pandemic I make sure at LEAST 6 techs come to make sure I have perfect connectivity to Netflix and chill.

cmauniada · on March 20, 2020

Ho lee sh, that is absolutely crazy.

I am sure its affecting you internet speed, what sorts of tasks are you generally doing now that the entire is state is pretty much on lockdown?

Here in Alberta, although we are told be socially distant, there is no full lockdown and I want to know what kind of issues would I be expecting to run into in the up coming weeks/months?

C1sc0cat · on March 20, 2020

Makes me glad I went with the Business version of Vodaphone in the UK - which is ironically £1 cheaper a month than the consumer.

I suspect its the services that relay on super low prices and don't have excess capacity Talk Talk etc that are really going to feel the pressure in the UK

mulmen · on March 19, 2020

What ISP? I’m on Comcast “Business Class” in Seattle and experiencing occasional slowdowns as well.

dottenad · on March 20, 2020

hendry · on March 20, 2020

Ping plotter looks like a SaaS https://hub.docker.com/r/linuxserver/smokeping/

catalogia · on March 20, 2020

Around the time you posted this, my internet in Seattle was down for near around 12 hours yesterday. I'm not fond of my ISP, but that's unusual even for them.

TheSpiceIsLife · on March 19, 2020

How does it come about that the ISPs Network Operations team didn't know they were saturating a link?

Last ISP I worked at would have email and SMS notifications going to On Call staff.

kitteh · on March 19, 2020

Because the NOCs may not be all that competent. I remember talking to the Cablevision IP NOC back in the mid 2000s about their internal backbone circuits they were running hot that went to a POP we peered with them. I had Cablevision at home and the congestion was breaking my VPN to work. The NOC said "an OC45 was down" (no such thing, it's an OC48) and that congestion is okay because TCP will work with it okay and there won't be a problem. I shutdown the peering session with them force traffic around a diff city (sent it to Chicago). I remember talking to the eng team at Cablevision about their NOC and they had a good chuckle and admitted they're only good for the simplest of operations (link down, go fix).

In some parts of the world running links at 95 percent is okay because look 5 percent left (totally ignorant of buffers or microbursts etc.).

solsticedev · on March 20, 2020

Curious, what ISP do you have? Currently moving to a new place in Seattle and have to decide between Wave G or Atlas Networks.

geniium · on March 20, 2020

Thanks for mentioning PingPlotter, I'll try it out to monitor our connexion.