Why Does It Take So Long to Connect to a WiFi Access Point? (arxiv.org)
218 points by 2dvisio on Jan 22, 2017 | 60 comments



And why is there no proper 'invalid password' response? It always seems like if you have the correct password the connection succeeds, and if you have the wrong password the connection just mysteriously fails, leaving the computer to guess that maybe it was because the password was wrong. In fact I've had a MacBook tell me the password was wrong when it was definitely correct. I imagine that's because they have to guess the failure reason.


If you're using a pre-shared key, the password is verified during the 4-way handshake. The thing is, if your password is wrong, then the Message Authentication Code (MAC) of the messages you are sending is wrong. The AP will simply drop frames with a wrong MAC, and will not respond to them. The problem is that as a client you do not know whether the AP is not responding because (1) the MAC was wrong, and hence the password was wrong; or (2) the message did not arrive at the AP (or you did not receive the response of the AP).

tl;dr: you can't tell the difference between messages dropped due to a failed authentication check (i.e. a wrong password) and messages dropped due to a bad connection.
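
To make that concrete, here's a rough Python sketch of why a wrong passphrase only ever shows up as a bad MAC on the AP's side. It collapses the real PTK derivation into a single HMAC, so only the PBKDF2 step matches what WPA2-PSK actually does; the SSID, passphrases and frame contents are placeholders:

  import hashlib, hmac

  def derive_pmk(passphrase, ssid):
      # WPA2-PSK: PMK = PBKDF2-HMAC-SHA1(passphrase, ssid, 4096 rounds, 32 bytes)
      return hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(), 4096, 32)

  def mic(key, eapol_frame):
      # Stand-in for the per-session key + MIC actually used on EAPOL message 2
      return hmac.new(key, eapol_frame, hashlib.sha1).digest()[:16]

  def ap_handles_msg2(ap_pmk, frame, client_mic):
      # The AP verifies the MIC and silently drops the frame on a mismatch;
      # no error is ever sent back to the client.
      return hmac.compare_digest(mic(ap_pmk, frame), client_mic)

  frame = b"EAPOL message 2 body"
  ap_pmk = derive_pmk("right password", "HomeNetwork")
  client_pmk = derive_pmk("wrong password", "HomeNetwork")
  print(ap_handles_msg2(ap_pmk, frame, mic(client_pmk, frame)))  # False -> silence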


I think this is half right. There is still an L1 ACK, so the STA doesn't have to retry sending the packet; it knows it was received.

I believe what happens is the AP sends a Nonce to the STA, and the STA uses the PSK to send Message 2 back to the AP. It will receive an '802.11 Ack', but then no Message 3 of the 4-way handshake will ever come from the AP.

Good drivers see this and flag an invalid password warning back to the user within milliseconds. But bad drivers... sure, they will just keep assuming magic dust got in the way and if they just keep retrying the handshake enough maybe they will see a Message 3.

I'm not sure why, from a security hardening perspective, it's better not to specify that the AP should send an '802.11 Disassoc' with a proper error code immediately after receiving an invalid Message 2, so that the driver can tell the UI instantly that the password is wrong.


Not really. The STA may know that its message was received, but can never be sure whether the AP replied. The reply from the AP could have been missed due to noise, or maybe it didn't reply at all. You cannot be sure. There are only heuristics.

Good drivers indeed tell you whether the message arrived or not. But it's up to the client to decide what to do with that information. And again, it's just a heuristic. I've read and messed with the code of four different Wi-Fi clients, and none of them attempt to detect a bad password this way. Most simply report an error after trying to retransmit message 2 multiple times (e.g. if wpa_supplicant got message 1 from the AP, but didn't get a reply to message 2, it warns that maybe the password was wrong).
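
Roughly, the heuristic looks like this (not wpa_supplicant's actual code; the retry count, timeout and wording are made up):

  import time

  def run_handshake(send_msg2, wait_for_msg3, retries=4, timeout_s=1.0):
      for attempt in range(retries):
          send_msg2()                      # the 802.11 ack only says the AP heard it
          if wait_for_msg3(timeout_s):
              return "connected"
          time.sleep(0.1 * attempt)        # small back-off between retries
      # The AP stayed silent after acking message 2: could be noise, could be a bad MIC.
      return "4-way handshake timed out - passphrase may be incorrect"

  print(run_handshake(lambda: None, lambda t: False))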


Well that is a stupid design!


Depends on the operating system. For example on an Android phone it will usually say 'authentication problem'


Not always. It often just disconnects without any message.


This is great work but I think the machine learning part is just thrown in for trendiness reasons. More rigorous statistical analysis should lead to a cleaner algorithm without relying on pretrained classifiers.


I'm a little more bothered by the glibness of moving from 'we have a good prediction/regression algorithm' to 'we can causally control and optimize connections using it'. When they claim in the abstract that

> Based on the measurement analysis, we develop a machine learning based AP selection strategy that can significantly improve WiFi connection set-up performance, against the conventional strategy purely based on signal strength, by reducing the connection set-up failures from 33% to 3.6% and reducing 80% time costs of the connection set-up processes by more than 10 times.

They don't actually demonstrate this and don't know that it does improve WiFi connection performance, because they don't benchmark it on any real-world devices. It's pure extrapolation from the algorithm's predictive performance on the dataset. (And there are some suspicious inputs to the algorithm like time of day.) When they write on pg9 about where they are pulling these numbers from:

> To evaluate, we first divide our connection log dataset into two parts, each subset contains 50% of the overall data. ...This fresh dataset ensures that we can accurately evaluate the performance of our algorithm if we deploy our algorithm in the wild, where many of the APs will not be seen by the mobile devices before.

They're just wrong. Splitting like this only ensures good out-of-sample performance when drawing from the same distribution, but when you use the algorithm to make choices, the distribution is different. Correlation!=causation; it's no more guaranteed to help than data mining hospital records and finding that antibiotics apparently are killing patients and so hospitals should stop using them.
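
A toy simulation of the hospital analogy shows how a 50/50 split can happily "validate" a backwards conclusion (all numbers invented):

  import numpy as np

  rng = np.random.default_rng(0)
  n = 100_000
  severity = rng.uniform(0, 1, n)                            # hidden confounder
  treated = rng.uniform(0, 1, n) < severity                  # sicker patients get antibiotics
  p_death = np.clip(severity * 0.8 - treated * 0.1, 0, 1)    # treatment actually helps
  death = rng.uniform(0, 1, n) < p_death

  train, test = np.split(rng.permutation(n), 2)              # the paper-style 50/50 split
  for name, idx in (("train", train), ("test", test)):
      t, u = death[idx][treated[idx]], death[idx][~treated[idx]]
      print(name, "death rate: treated %.2f, untreated %.2f" % (t.mean(), u.mean()))
  # Both halves agree that antibiotics look harmful, yet the causal effect is the opposite.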


In general, any opinions about where we are on the hype curve for machine learning (as a stand-alone concept separate from traditional statistical inference)?

https://en.wikipedia.org/wiki/Hype_cycle


It's not the first hype cycle, so we've already been through all the phases, but I guess we are still rising towards peak hype. The fact is that this time AI has real-world applications and benefits, so it's not just a bunch of hot air. It will not crash at this level of hype yet, because we now have lots of results that were unimaginable even 5 years ago.

We're surpassing human-level accuracy in vision, speech, text and behavior - on specific tasks, not in general yet. In the last 3 years neural nets have become creative - now they can create paintings (neural style transfer), images (GANs), sounds (WaveNet), text (seq2seq, translation, image captioning) and gameplay (Atari, AlphaGo).

All of these are complex forms of creativity, as opposed to simple classifiers. So we have recent progress; there is no long stretch of missing results behind us. That's why we're still on the rise.


Well said. I would agree that we're just to the left of the peak, but that applies more to the areas adjacent to the problems where deep learning has shown some real value.

I remember the previous AI peaks in the 80s and 90s, as well as the neural net and fuzzy logic euphorias. The problem back then was that the results were not that exciting. Now we have some really impressive applications searching vast datasets and recognizing useful features. However, I notice everyone is trying to apply the same algorithms to problems not well suited for that type of approach.


AI has had big-money real-world applications for well over a decade, e.g. web search.

(Or stock trading, if you believe in that sort of thing.)


If the objective is only to improve performance, a black-box machine-learning model should outperform a hand-tuned algorithm. The generalisation (out-of-sample) performance will depend on how representative the training and evaluation sets are, but this is also true for an algorithm with human-selected features.


This reminded me of this explanation of how MacBooks (and presumably other devices) manage to reconnect to Wifi so quickly after waking from sleep: http://cafbit.com/entry/rapid_dhcp_or_how_do
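
If I recall, the trick is that the MacBook ARP-probes routers it has seen before and, if the expected router answers from the expected MAC, reuses its cached DHCP lease instead of doing the full exchange. A very rough scapy sketch of the idea (the interface name, cached addresses and timeout are placeholders, not Apple's actual implementation):

  from scapy.all import Ether, ARP, srp

  CACHED_GATEWAY_IP = "192.168.1.1"
  CACHED_GATEWAY_MAC = "aa:bb:cc:dd:ee:ff"

  # Ask "who has <gateway IP>?" and check whether the reply comes from the MAC we remember.
  ans, _ = srp(Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst=CACHED_GATEWAY_IP),
               iface="en0", timeout=0.2, verbose=False)

  if ans and ans[0][1][ARP].hwsrc == CACHED_GATEWAY_MAC:
      print("Same gateway as last time: reuse the cached lease immediately")
  else:
      print("Unknown or silent gateway: fall back to a normal DHCP exchange")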


And in the process walking all over various security best practices?

And wasn't there a similar security snafu involving iPhones broadcasting past SSIDs every time they tried to connect to an access point?


Don't most (mobile) devices broadcast all known SSIDs just in case a network is available and hidden? I don't really know a lot about the subject matter, but as I recall this does not apply to just iPhones.

I also remember an installation at the Datenspuren in Dresden with a monitor showing all of the SSIDs it intercepted, with people walking past and being astounded that the device knew their home network name^^


Almost all Android devices do the same: they send probe requests for all known networks. That is why you can automatically connect to "hidden" networks. Just run kismet or wireshark and see for yourself.
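
For example, a minimal scapy sketch that prints them (the card has to be in monitor mode, and the interface name is a placeholder):

  from scapy.all import sniff, Dot11, Dot11Elt, Dot11ProbeReq

  def show_probe(pkt):
      if pkt.haslayer(Dot11ProbeReq):
          ssid = pkt[Dot11Elt].info.decode(errors="replace") or "<broadcast>"
          print(pkt[Dot11].addr2, "is probing for", repr(ssid))

  sniff(iface="wlan0mon", prn=show_probe, store=False)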


And it can be used to uniquely identify devices and track them with fairly standard hardware


Doesn't a MAC address get broadcast anyway? Unless wifi devices are randomizing their MAC addresses that seems like a fairly trackable thing.


IIRC newer iOS devices do randomize the MAC address used for probes.


Well, this traffic is sent after link establishment, so at least in a WPA network it's already encrypted (hopefully it doesn't do the same in unencrypted networks). Still, any member of the network could sniff the MACs of the last few APs the device was connected to, which, as you said, is a security/privacy leak.


    > we develop a machine learning based AP selection strategy that can
    > significantly improve WiFi connection set-up performance, against the
    > conventional strategy purely based on signal strength, by reducing the
    > connection set-up failures from 33% to 3.6% and reducing 80% time costs of the
    > connection set-up processes by more than 10 times.

    > The correlation analysis finds that though the signal strength is important,
    > knowing the AP model and mobile device model has great help to predict the
    > connection set-up time cost.
Neat!


> [Abstract] "we develop a machine learning based AP selection strategy that can significantly improve WiFi connection set-up performance, against the conventional strategy purely based on signal strength, by reducing the connection set-up failures from 33% to 3.6% and reducing 80% time costs of the connection set-up processes by more than 10 times."

> [Conclusion] "The correlation analysis finds that though the signal strength is important, knowing the AP model and mobile device model has great help to predict the connection set-up time cost. To the best of our knowledge, we are the first to add AP model and mobile device model as features which greatly increases the accuracy to predict the connection set-up time cost."

_____

(NOTE: Please do not use block quote formatting, it's unreadable on mobile.)


@HN please add wrappable quoting markup.


I requested this a while ago and was told that, although they like the idea, adding standardized blockquote markup would somehow break HN's spam filter.


It's not uncommon to use italics for quoted paragraphs, optionally prefixed by a single greater than, or surrounded by quotes.

Like this.

> Or this.


Why are the quotes like that on mobile anyway? It's quite irritating.


Because it is wrapped in a pre tag (aka preformatted text).

And it is not so much a quote as a way to show code segments.


Correct, block quotes use a monospace font by default, and the use case was largely for expressing code snippets.


You can add wrapping to pre... though maybe for pure formatting's sake it's not a good idea.


I was hoping for an explanation of: why doesn't it take a fraction of a second to connect?


The paper says that the biggest delay is in the "scan" phase, just getting a list of all the available APs in the area. This would be the same problem addressed by Apple's "try to associate to all known SSIDs on powerup" approach.

Maybe I'm missing something, but their actual machine learning model seems to address a different problem:

  The final features we choose to train: the connection
  time cost includes hour of day, RSSI, mobile device
  model, AP model, Encrypted
Given this, I think they're basically giving lower priority to known incompatible device & AP pairs based on their BSSIDs.

Not sure I like that approach much; it seems an AP that's now running a recent OpenWrt build and is very reliable would still be penalized for its model's buggy factory firmware.
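
For reference, something in the spirit of what the paper describes fits in a few lines of scikit-learn; the feature values, the "slow set-up" label and the choice of a random forest are my own stand-ins, not the paper's exact setup:

  import pandas as pd
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import train_test_split

  logs = pd.DataFrame({
      "hour_of_day":  [8, 13, 22, 9, 18, 23],
      "rssi_dbm":     [-45, -70, -60, -80, -55, -72],
      "device_model": ["phoneA", "phoneB", "phoneA", "phoneC", "phoneB", "phoneA"],
      "ap_model":     ["apX", "apY", "apX", "apZ", "apY", "apZ"],
      "encrypted":    [1, 1, 0, 1, 1, 0],
      "slow_setup":   [0, 1, 0, 1, 0, 1],   # label: set-up cost above some threshold
  })

  X = pd.get_dummies(logs.drop(columns="slow_setup"))
  y = logs["slow_setup"]
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

  clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
  print(clf.predict(X_test))   # rank candidate APs by predicted risk of a slow set-up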


Seems like something simple like "look for the last N and the most common {M0, M1, M2} in the past {week, month, year} before doing a full scan" should hit the vast majority of use cases, no?
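
Something like this, presumably (the history format and cut-offs are invented):

  from collections import Counter
  from datetime import datetime, timedelta

  history = [("HomeWifi", datetime(2017, 1, 20)), ("Office", datetime(2017, 1, 19)),
             ("HomeWifi", datetime(2017, 1, 18)), ("CoffeeShop", datetime(2016, 12, 1))]

  def candidates(history, last_n=2, window=timedelta(days=30), top_m=3):
      recent = [ssid for ssid, _ in history[:last_n]]                 # the last N
      cutoff = max(ts for _, ts in history) - window
      common = [ssid for ssid, _ in
                Counter(s for s, t in history if t >= cutoff).most_common(top_m)]
      return list(dict.fromkeys(recent + common))   # de-duplicate, keep priority order

  print(candidates(history))   # try these before falling back to a full scan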


Apple devices seem to aggressively cache the last known IP address on each wireless network, rather than issue a new DHCP request.

At home this results in duplicate IP addresses when the kid with the iPhone gets home after being away and meanwhile another device has started using that IP address. This tends to bork up the entire network on my cheap Netgear router and I usually have to reset it at that point.


This might explain what I've seen, which is that whenever I go home for the weekend and join my parents' wifi it somehow screws up the internet for everyone else in the house.


Naive question...why does that mess with the network? Your router has the correct MAC <-> IP mapping, and iPhone kid is the only one losing out.


Not sure, but the correlation between an Apple device walking in the door and the entire home network hanging is pretty consistent.

I presume the router periodically issues "who has 192.168.1.10" or whatever, and upon getting responses from two different MAC addresses just gives up.


It should be noted that many routers (my Netgear not being an exception) reserve private IPs for saved MAC addresses, which makes this a non-issue. Every device on my LAN has a static private IP (I haven't used my full address space yet, so I'm curious what will happen when I do).


I believe most of them also derive IPs from MAC addresses, because I always get the same addresses, even with the router's NVRAM set to read-only and with different OSes.


Aren't there TTLs on those leases? Without having read up on it just now, I would assume that if the lease is granted for 24 hours, the device is within its rights to assume it can re-use that IP address for the 24 hours to come?


I would be very interested in seeing the correlations between set-up time and total devices in the area (or, to be more precise, total channel utilization). This paper studied devices associated per access point, which is a separate metric. They mentioned the issue in a paragraph at the end of a section, probably dug up in a post-study literature review.

From my work in wi-fi router development, high channel utilization is often a much bigger determinant of packet loss than either router overload or RSSI. Hour of day is probably just a good proxy for this.

802.11 is pretty good at handling the shared medium when a single access point can do traffic control, but multiple access points in a crowded city or office building get you into all kinds of problems. Usually you can get big performance gains in large deployments just from making sure nearby access points are on different channels (a toy version of that channel scoring is sketched at the end of this comment).

Between the other two major factors the paper looked at:

1) The number of clients a single router can handle is mostly limited by CPU power (and the failure mode there is typically not association request drops, since those are usually processed pretty early in the packet pipeline, without much queuing). So I'm not surprised that they saw very little effect of the number of associated clients on connection time.

2) RSSI is more important in low-interference environments, where the ability to hear packets over the noise floor is a big limiter. In dense, high-interference environments, it helps a bit in terms of being able to shout over the noise of very distant interference sources, but for the most part a collision is a collision even with substantial magnitude differences between the colliding packets.
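
Here's the toy channel-scoring sketch mentioned above; the 2.4 GHz overlap rule of thumb and the scan data are simplifications:

  nearby_aps = [("ap1", 1), ("ap2", 6), ("ap3", 6), ("ap4", 11), ("ap5", 5)]

  def congestion(channel, aps):
      # 2.4 GHz channels closer than ~5 apart overlap significantly.
      return sum(1 for _, ch in aps if abs(ch - channel) < 5)

  scores = {ch: congestion(ch, nearby_aps) for ch in (1, 6, 11)}
  print(scores, "-> pick channel", min(scores, key=scores.get))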


Since you work in WiFi, I was wondering if you could comment on the linked chart of SNR to modulation rates [0]. Obviously the exact values are going to depend on the hardware to some degree.

[0]https://dl.dropboxusercontent.com/u/8644251/Revolution%20Wi-...


What exactly do you mean by comment? Explanation of what it represents?

Basic overview, with the caveat that I haven't done wifi stuff in a couple of years and was not doing straight-up RF stuff like the hardware folks: it's basically saying that as signal strength above the noise floor (in dB) (i.e. RSSI, kind of) increases, you can get to higher bit rates. Let me know if you want more info on what exactly is being described.
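
The trend in that kind of chart can be illustrated with the Shannon limit, which is an idealised ceiling rather than the actual 802.11 MCS table:

  import math

  def shannon_capacity_mbps(bandwidth_hz, snr_db):
      return bandwidth_hz * math.log2(1 + 10 ** (snr_db / 10)) / 1e6

  for snr_db in (5, 15, 25, 35):
      print("SNR %2d dB -> ~%.0f Mbit/s ceiling on a 20 MHz channel"
            % (snr_db, shannon_capacity_mbps(20e6, snr_db)))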


There's a new tech called MulteFire, which is LTE over unlicensed spectrum. Everybody could buy an access point (although those will be more expensive than wifi for some time, at the very least).

One of the biggest selling points for it is a simpler, faster handoff than WiFi. Is that a big enough problem that people will be willing to buy access points?


Interesting to see this researched, but I think it isn't really applicable to most people's situation at home, where there are 1-3 access points and most often you just want to connect to a specific one. Then you don't really need software telling you that connecting will be slow, because you already know; you just want to connect.

Besides, in my experience, having a lot of wifi interference is a huge PITA when connecting to or communicating with an AP. Maybe they were not able to include that factor in their dataset for some reason, but I think you'd find a strong (negative) correlation.


Probably not at home, but that's not when I want to connect faster. I use the Fonera network when I'm in the street, and there are often three or four APs near me, so this could be helpful. Another case was campus wifi when I was in college. In both of these cases, I want to connect to a specific SSID, but not to a specific AP.


Here's an MIT Technology Review article covering this research, "Data Mining Solves the Mystery of Your Slow Wi-Fi Connection":

https://www.technologyreview.com/s/603414/data-mining-solves...


That article is crap because it never says why connections are slow. Lots of words and no substance.


I would ask if you're missing a bit, but that would be like saying something is "crap" and materially misrepresenting its value and substance.


Seems to me like a "problem" only in the sense that one wants to leech data connectivity while out and about.


There are many cases where one wants to use wifi while out and about without "leeching". In my city alone, we have areas with paid networks, tit-for-tat networks (Fonera) and public networks (as in paid by taxes).


Interesting, I'd never heard of Fonera. Trying to find hotspots in my area has simply emphasized to me how dusty my laptop screen is (the closest hotspot seems to be several hours' drive away).


Fonera is nice in my country (and a few other European ones) because one of our largest ISPs made a deal with them, so every client of theirs is also a Fonera member, and therefore a hotspot I can use.


It's a problem I deal with daily, as my personal notebook takes 5-10 seconds to connect to my home network - annoying when I expect near-instant connectivity to a known network.


Could someone with permission to do so please fix the title? It either needs the question mark removed, or to be written as a question (e.g. "Why Does It Take So Long…")


Only the paper authors can fix their PDF, I guess.


Is there some policy on HN that the title of the submission always matches the title of the thing that is linked to? I'm sure I've seen situations where it doesn't.


The verbatim title is preferred:

"... please use the original title, unless it is misleading or linkbait."

https://news.ycombinator.com/newsguidelines.html


Though it's more like: the verbatim title is preferred except when it isn't, and there's a good chance an editor will change it to the verbatim title, or something worse, or maybe better - and an equally good chance people will complain and it'll be changed [back].

I've always said we should have the verbatim title and an editorialised one and let people choose in config which one[s] they want.

Current system mostly works though.



