Web Browser Telemetry (sizeof.cat)
66 points by freediver 10 months ago | 52 comments




I'm not sure you can derive any useful conclusions from this post. Various browsers invoke various numbers of connections to remote peers upon initial startup.

There is no technical reason that a browser (or any software) couldn't open a single connection, and funnel every piece of data it can extract from your machine to a remote host. That remote host could then distribute that data to any number of other hosts/services without your awareness.

For the purposes of privacy, all connections numbering more than 0 are functionally identical. At a minimum it might be helpful to include some information on the amount of data transmitted over these connections.


> I'm not sure you can derive any useful conclusions from this post.

A striking one is that many browsers continue to claim to be privacy respecting when in reality they are not, even though this is easily verifiable.

You are correct in saying that from the standpoint of privacy there is either zero telemetry or everything else. There are other drawbacks from having many unnecessary 'phone home' requests, like performance in slow connection environments.

And may I add: telemetry on by default is just not decent. A browser is supposed to be my ("user") agent, not somebody else's.


Privacy is not a goal, but just a marketing tool these days.


Another striking conclusion is that even browsers that value privacy need to have some level of understanding about what features their users are using and what they are not.


Nothing wrong with that; it just needs to be opt-in, otherwise you cannot claim to be a privacy-respecting browser by default.


That assumes an objective definition of "privacy respecting". I am of the opinion that telemetry can be privacy respecting.


Sure, but only if you opt in to it. If a browser sends your private information, like your IP address, to a third party without your consent, that is by default the opposite of privacy-respecting (there is no privacy being respected there).


I disagree.

1. I don't think an IP is particularly identifiable information for a browser vendor - the information they have is "this device currently associated with this IP uses our browser", which is not significant.

2. Just because the IP is sent doesn't mean it's collected and stored. They may drop it as soon as the data gets to the server, meaning that the IP may have been transferred but is in no way analyzed to attribute any information to you.

That is entirely respectful of user privacy.


And I disagree.

- If it's not opt-in, it's not privacy respecting.

- IP is a significantly identifiable piece of information.

- Regardless, we have seen that as few as 4 pieces of individual data collected about a user is enough to identify someone 90% of the time (https://archive.nytimes.com/bits.blogs.nytimes.com/2015/01/2...)

- Browsers collect far more than 4.

User privacy is no longer something you can tiptoe around - it *has* to be informed and opt-in; i.e. if you slipped into someone's bed at night and had sex with them without consent, was it okay?


The point is that we disagree, not that one of us is right. Mozilla isn't using some objective term, they're using one that's highly subjective.

You didn't address the major point I had, which is that they can just drop your IP and not store it. Soooooo, anyway, I'm ignoring the rest of your post because I don't care about the debate, I was just trying to explain that telemetry is privacy respecting.


> You didn't address the major point I had, which is that they can just drop your IP and not store it.

The point is that you don’t know that and you cannot guarantee it. You just assume the best case.

As an analogy: is sending your bank password a privacy problem? You argue no, because they could just throw it away immediately, instead of going on a Christmas shopping spree. That's an insane argument to make.


> For the purposes of privacy, all connections numbering more than 0 are functionally identical.

Even a browser that launches zero connections upon first startup could start doing so after a random delay and/or while you're connecting to a legitimate website. And if you visit any website from any entity that produces the browser, it can smuggle out data without opening any new connections, e.g. Chrome could backchannel data up to the mothership when you access any Google-owned website. And on desktop, who's to say that the installer itself didn't already send up data, or otherwise install a separate binary that will? And on mobile, Google and Apple already have half the world's data, because they own the OS!

So in effect, this benchmark shows nothing at all. It's a web browser, at some point you're going to access the web with it or else you wouldn't have installed it, so demonstrating that it has network connectivity is not particularly interesting.


A network connection that the user expects is fine. If a browser is privacy-protecting, it may even proactively block connections related to ads/trackers.

However, if a browser claims to be privacy respecting, then it has to be zero telemetry by default; otherwise it isn't. If you do not care about it being privacy-respecting, then sure, why not relay information to the browser vendor and/or 3rd parties on startup.


Sure, and yet this comment has zero relevance to the content of this post, because this method cannot disprove that any of these browsers have telemetry, nor do any of these connections imply telemetry on their own. Read the source code if you want to prove or disprove the absence of telemetry. And if you don't control the source code or if you can't build reproducibly, you've already lost.


How so? This method (monitoring connections through a network proxy) is perfectly viable for detecting outbound connections from a browser. If a browser makes an unwanted connection to the browser vendor and/or a third-party site, that is unwanted telemetry. Regardless of what the purpose of that request was, the user's IP address has been transmitted (hence telemetry from the user's standpoint) and the user's privacy was not respected. Client source code is irrelevant from the standpoint of privacy, as such requests are processed server side and, to my knowledge, no browser with telemetry has open-sourced its server-side data processing code.


> I'm not sure you can derive any useful conclusions from this post.

Sometimes a conclusion isn't necessary. Just having that information written down and spelled out is useful in terms of informing people. There are a lot of things we take for granted, and some heinous things that are happening that shouldn't be but are allowed to happen because the consequences don't feel real or tangible enough.


I think it has a point until it becomes a benchmark, because as you said it's easy to only have one connection if you want to.

Before it becomes a benchmark, I think it kind of leaks a measure of the amount of things that are being done on the first run, all possibly exposing yourself in various ways.


>I think it kind of leaks a measure of the amount of things that are being done on the first run, all possibly exposing yourself in various ways.

Not really, because "thing" is a totally arbitrary concept, and can be implemented as "service1.example.com/api/" or "example.com/api/service1". Whether the former or latter gets chosen and with what probability is dependent on organizational factors, so comparing between companies makes little sense.
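
To make that concrete, here is a minimal Python sketch. The URLs are the hypothetical ones from the comment above (extended to three services), and `count_hosts` is an illustrative helper, not anything taken from the article. Both layouts expose the same three services, yet a per-hostname count scores them very differently.

```python
from urllib.parse import urlsplit

def count_hosts(urls):
    """Return the number of distinct hostnames in a list of URLs."""
    return len({urlsplit(u).hostname for u in urls})

# Layout A (hypothetical): one subdomain per service.
layout_a = [
    "https://service1.example.com/api/",
    "https://service2.example.com/api/",
    "https://service3.example.com/api/",
]

# Layout B (hypothetical): the same services behind a single host.
layout_b = [
    "https://example.com/api/service1",
    "https://example.com/api/service2",
    "https://example.com/api/service3",
]

hosts_a = count_hosts(layout_a)
hosts_b = count_hosts(layout_b)
print(hosts_a, hosts_b)
```

The number of requests is identical in both layouts; only the hostname count changes, which is why it says more about how a company organizes its infrastructure than about how much it collects.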


"239.255.255.250 on UDP port 1900" is listed for several browsers. This is not telemetry; it is a UPnP (SSDP) discovery multicast on the local network. It is not a "connection", either.
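
A quick way to filter such entries out of a capture, sketched in Python with only the standard library. The address and port are the ones quoted above, and 239.255.255.250 falls inside the IPv4 multicast range; the function name and the `(ip, port, protocol)` input shape are assumptions about how a log might be represented.

```python
import ipaddress

# Values quoted in the comment above.
SSDP_GROUP = ipaddress.ip_address("239.255.255.250")
SSDP_PORT = 1900

def is_ssdp_discovery(dest_ip, dest_port, protocol):
    """True if a log entry looks like local UPnP/SSDP discovery traffic."""
    addr = ipaddress.ip_address(dest_ip)
    return (
        addr.is_multicast            # multicast group, not a single remote host
        and addr == SSDP_GROUP
        and dest_port == SSDP_PORT
        and protocol.lower() == "udp"
    )

print(is_ssdp_discovery("239.255.255.250", 1900, "UDP"))
```

Entries that pass this check can be dropped before counting anything as a remote connection.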


Seems like every new update, Edge gets more enabled-by-default 'telemetry' options. Here's one of the latest:

>Include related matches in Find on page

>When enabled, your Find on page search and the webpage contents will be sent to Microsoft to help find better results, including synonyms, alternate spellings, and answers to questions


The addition of new options that are default-on is an incredibly user-hostile pattern, and it is depressingly common:

- Modern Windows versions introducing new telemetry options (to say nothing of existing disabled options being toggled back on...)

- LinkedIn with new notification types that are turned on by default, with no option to disable all notifications entirely (there are now so many notifications that the notifications settings page has a nested page)

- My former University's newsletters mailing list, which keeps adding new types of newsletter that I am default subscribed to, despite previously unsubscribing from every other newsletter (I finally just created a mail rule to block the entire subdomain.domain.edu).

And the list goes on...


Yeah, it's incredibly invasive. The US desperately needs privacy laws governing these mass violations of privacy.


> ocsp.pki.goog

You know your org has too much money and/or power if you don't even bother with domains anymore but just request another custom TLD for your infrastructure hosts.


The metric of “hostnames contacted” isn’t a valuable pressure point: each browser can simply replace all hostnames with one, at which point they’ll look better than their competitors without having made any material changes whatsoever in behavior.

The correct metric here is “number of requests initiated”, but that’s harder to collect than copy-pasting logs from LittleSnitch and gesturing theatrically at them as this post is doing.


>gesturing theatrically

Agreed that this post isn't particularly illuminating, but where are the theatrics?

It's just a list of connections, and the author even refuses to put a conclusion.


They claim to refuse to draw any conclusion, but then in contradiction to that statement, state their personal belief that any connection number greater than zero is harmful elsewhere in a later paragraph; and present the appearance of taking the high ground by not making claims, in contrast to journalists that do.

This document as presented is only meaningful to two audiences: those who already agree with the author’s viewpoint on connections made by browsers when no user action has occurred, and those who have already formed viewpoints on the concerns around browsers and outbound connections.

This is theatrics. The author clearly has opinions, and has crafted it to be attractive and interesting solely to those who have already formed an opinion on these issues — as is evident from the context that leads to such a post existing at all, much less reaching the HN front page.

A less theatrical post could have stated why counting these connections matters to anyone, and plainly stated the author’s view that browsers with a count of zero are preferable.

Instead, they buried their opinion in a later paragraph, declared it absent when it’s present, and left things vague enough that any supporter of their viewpoint can argue to any detractor of their viewpoint that they never stated an opinion at all. Instead of supporting productive and nuanced conversation, they fan the flames of belief with data structured to promote their view, while attempting (and failing) to claim the neutral high ground.

A useful datapoint for evaluating their viewpoint here would have been, “Are all of the browser’s outbound connections first-party to the browser’s author and/or the site navigated?”. For many mobile browsers embedded in mobile apps, this is wholly untrue: they monitor and report on your page views to themselves, which is a gross violation of privacy. Another would have been, “Do any connections occur before site navigation?”, to which a simple Yes or No suffices, since for tracking purposes it doesn’t matter whether it’s 1, 2, or many, only whether it’s zero or non-zero. Those two questions would lead to productive discussion and debate, in a way that lists of hostnames with a statement of neutrality do not.
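
Those two questions are easy to compute once you have a capture. A rough Python sketch follows; the `Connection` record, the `evaluate` helper, and every hostname are hypothetical, and real first-party detection would need a proper list of vendor-owned domains.

```python
from dataclasses import dataclass

@dataclass
class Connection:
    host: str                 # destination hostname seen by the proxy
    before_navigation: bool   # True if made before the user visited any site

def belongs_to(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)

def evaluate(connections, first_party_domains):
    """Answer the two questions for one browser's captured connections."""
    all_first_party = all(
        any(belongs_to(c.host, d) for d in first_party_domains)
        for c in connections
    )
    any_before_navigation = any(c.before_navigation for c in connections)
    return all_first_party, any_before_navigation

# Hypothetical capture: one vendor update check and one third-party request,
# both made at startup before any navigation.
capture = [
    Connection("updates.browser-vendor.example", True),
    Connection("widgets.third-party.example", True),
]
all_first_party, any_before_navigation = evaluate(
    capture, first_party_domains=["browser-vendor.example"]
)
print(all_first_party, any_before_navigation)
```

The output is two booleans per browser rather than a count, which is the Yes/No framing argued for above.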


you could just tunnel everything over a single long lived request


Purple scrollbar on purple background... And this is why website authors should not be allowed to style browser controls.


That's magenta on purple with decent contrast. To my eye that's vastly more usable than those San Francisco Thin fonts.


I had to install Custom Scrollbars for this reason.


Thanks! This info is always nice to have when setting up firewalls. Every application should come with a list of connections made, what the purpose is and what the consequences are if you disallow it.


I guess Windows does ask if you want to let a program connect to the internet. Of course that pop-up occurs so often that everyone just hits “go for it,” right?


Why does Brave try to talk to ftx.com? The website's defunct anyway.


I believe it's from the new page widgets: https://community.brave.com/t/what-is-ftx-and-why-did-it-sud...


The blog post is from 2021


I'm guessing it's leftovers from their whole "reimburse websites via crypto" strategy.


I couldn't even get their webpage to load until I turned off scripts and unblocked check.ddos-guard.net.

That said, it is hard to understand how much stuff is going on in the background unless you have good browser extensions and proxy or outgoing firewall.

I use firefox and just block mozilla.[net,com,org] firefox.* etc.
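
For anyone who wants to do the same, here is a rough Python sketch of how that kind of rule could be matched. The patterns are my reading of the shorthand above, and treating subdomains as blocked too is an assumption; an actual firewall or DNS blocker will have its own rule syntax.

```python
from fnmatch import fnmatchcase

# "mozilla.[net,com,org] firefox.*" expanded into glob patterns (assumed reading).
BLOCK_PATTERNS = ["mozilla.net", "mozilla.com", "mozilla.org", "firefox.*"]

def is_blocked(hostname):
    """True if the hostname or any of its parent domains matches a pattern."""
    host = hostname.lower().rstrip(".")
    labels = host.split(".")
    # Check the full name, then each parent domain, so subdomains match too.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if any(fnmatchcase(candidate, p) for p in BLOCK_PATTERNS):
            return True
    return False

print(is_blocked("www.mozilla.org"), is_blocked("example.com"))
```

Matching whole labels rather than raw suffixes avoids blocking an unrelated name that merely ends in the same characters, such as `notmozilla.org`.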


I agree with everyone else that the number of hosts reached at startup, by itself, is absolutely useless. There are better articles; personally I like https://digdeeper.neocities.org/articles/browsers, which goes into greater detail, examining the purpose of the connections, and not only at startup. It does tend to label even some requests (such as update checks) as spyware, but everything is explained so you can still come to your own conclusions.


It would be interesting to go over each connection and figure out what it really does. E.g. "contile" from Firefox: "The goal of this service is to pass tiles from partners along to Firefox for display while ensuring customer privacy and choice". "detectnetwork": tests whether the network connection requires you to log in or accept the network's terms before permitting you to use the network.
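
As a rough illustration of what a network-detection check like that does (this is not Firefox's actual code): the browser fetches a small known resource and compares the response with what it expects, and a redirect or an altered body suggests a login page sitting in between. The expected body and the function name below are assumptions for the sketch.

```python
EXPECTED_BODY = "success"  # assumed content of the known test resource

def behind_captive_portal(status_code, body):
    """Guess whether a login/terms page intercepted the test request."""
    if status_code != 200:
        # Redirects (e.g. 302 to a login page) or errors mean interception.
        return True
    # A 200 with unexpected content usually means an injected portal page.
    return body.strip() != EXPECTED_BODY

print(behind_captive_portal(200, "success\n"))
print(behind_captive_portal(302, ""))
```

A request like this sends no user data in its body, but it still reveals the client's IP address to whoever hosts the test resource, which is the point argued over elsewhere in this thread.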


From the article, number of requests on start:

Mullvad, Orion, Tor, Ungoogled Chromium - 0 (zero telemetry)

LibreWolf - 3

Safari - 6

Brave - 7

Chrome - 9

Chromium - 12

Vivaldi - 13

Firefox, Yandex - 15

Edge, Opera - 21

Arc - 44


For an overview on Android there's this great website: https://divestos.org/pages/browsers

I'm using Cromite at the moment.


An issue I've found with Cromite is that it doesn't pass the magic "are you a bot?" tests that various websites perform.

For example, I can't log into the Linode console using Cromite, but I can with Chrome.

Has anyone encountered anything similar or know how to fix it? I'd much rather not be using Chrome.


Did you complain to Linode to fix it?


Privacy aside, Arc seems to be doing too much work.


Tor browser connects to the Tor network - which isn't telemetry but the post said not everything listed is telemetry.


The post is about unsolicited requests at startup, Tor Browser does not connect to Tor unless you click Connect or enable the auto-connect option.


> # Brave

> Version: 1.33.106

> […]

> - ftx.com on TCP port 443

Hopefully that one is no longer present…


Website blocked due to riskware from Malwarebytes


This is heresy! Lynx isn't listed!


DDOS Guard is blocking my Safari




