There's already a comment here, and I've also personally found, just from talking with others offline, that a lot of people don't even know VS Code has telemetry, despite it being mentioned several times in the product itself and its documentation. They (and remember, these people are supposedly developers) either ignored it or just didn't read the docs.
I guess a big WARNING banner might scare users away, but it's still a bit disturbing to see such a lax attitude towards tools which developers use to work with a software company's most valuable assets. A lot of people, developers included, don't really read EULAs, and it's the same reason (traditional) spyware could thrive: no doubt they all specify in their license agreements the fact that they all collect information, but approximately no one reads those.
To put it another way: would you want your compiler or other parts of your toolchain sending information about all the source files it processed? I wouldn't consider myself particularly paranoid when it comes to security, maybe even looser than the average on HN, and even then I wouldn't use such tools. I wonder how many companies have already banned their use...
It may be disturbing, but it is entirely understandable that no one reads the EULA. If I read all the terms and policies for every piece of software and site I use, I would be doing that full time.
Companies have basically DoS attacked the public by attaching pages of legalese to anything and everything.
Telemetry is all about how it’s presented to the user. It should simply be presented at install/first start and users should be shown the checkbox (it shouldn’t be hidden away in settings) and asked nicely to leave it on in order to improve the product. Checked by default is a hot topic but I think it is OK so long as it’s presented clearly to the user.
I mean, obviously I wouldn’t like any of the information I handle to be sent somewhere, but I also don’t mind statistics about that info (number of files, file sizes, feature use counts) being sent to Microsoft. If I found out that file contents were transmitted then obviously I’d be outraged - but I’m rational and assume Microsoft is too.
> I mean, obviously I wouldn’t like any of the information I handle to be sent somewhere, but I also don’t mind statistics about that info (number of files, file sizes, feature use counts) being sent to Microsoft. If I found out that file contents were transmitted then obviously I’d be outraged - but I’m rational and assume Microsoft is too.
Exactly, telemetry on a broad level is important to find out if your software has problems. The problem is what kind of telemetry you have.
Precisely! Although I am a bit surprised our infosec team seemingly hasn't cracked down on, or taken a stance on, Kite and other telemetry-laden software for developers just yet. That day will come soon, I am sure :)
Sure. I think all the info you need is in this screenshot [0].
According to "Matt" [1], it scans the folders related to Chrome: "Please do not worry, as your data wouldn't be affected by the cleanup tool." ... I think reality speaks for itself. My Steam library just isn't related to Chrome.
> They (and remember, these people are supposedly developers) either ignored it, or just didn't read the docs.
Or they forgot and their stance on it changed. I vaguely remembered (or thought I did) having seen it. Just checked my settings and it's disabled. I've gotten way more privacy minded in the last 24 months, so I figured I might have said "yeah, you can collect data" back when I first installed it, but now I'd prefer they didn't.
A middle way would be to periodically remind people, "hey, you've consented to sending anonymous usage data to us so we can make better decisions. I just wanted to confirm that you're still cool with this" once a year or something, because at least for me, "does this app collect telemetry data?" is not something I constantly have in my working memory.
> A middle way would be to periodically remind people ...
Almost nobody will do this because, if they do, they know people will turn it off. It's the same reason commonly given for making tracking (or "telemetry", as they call it) opt-out and on by default, instead of requiring users to opt-in -- "if we have it off by default and ask users to enable it, pretty much nobody will".
They may very well be correct but, to me, that doesn't make it okay.
Yeah, I believe you're right. It might be a matter of framing. If you show me the value of collecting telemetry data, what good decisions you've made because of it, how you got rid of that annoying thing plenty of users complained about because you saw that it affected a large group of users and wasn't just a loud dozen etc, you might sell it to me.
Most data collection for "analytics" is just about you as the product owner optimizing the product for whatever goal you want to, engagement, sales, ad clicks, retention or what have you. I as a user that provides the data gain nothing from that and, being the target of those manipulations, I might actually lose (by spending more time on your site than I wanted to etc), but if you can show that I give data = I get better product, I don't think I'd be that hard to convince.
I don't know if VS Code does something in that regard (if they do, they don't communicate it effectively enough to reach me), but I do like their approach in general - I once read through an issue where they discussed changing some component for a more advanced one and pretty much everybody agreed that the newer one was better, but they still had a few words on "how can we measure that? how can we make sure that it's not just us devs that work on this thing all day and it won't be the same for the people using it in their daily life on all kinds of experience levels". I liked that a lot (so much so that I still remember it months after reading), but apparently not quite enough to enable telemetry data ;)
I have it enabled on my personal computer, but disabled on my work computer. I like them improving their product, so I want to support that, but at work it's a bit different.
Off topic but from a podcast I was listening to recently: your most valuable asset is your team, not your code. This is evidenced by exits - without the team the codebase ain’t worth much.
1. One less setting to configure, lest the Default monsters get you
2. VSCodium is just a FLOSS VSCode binary. You could build it yourself from the available VSCode source. However, the VSCode binary is not FLOSS so you cannot be sure what it is running.
It's not like VS Code is the next PRISM - I'm sure MS has better ways to spy on users ;). The real pull is whether you prefer FLOSS by default.
> However, the VSCode binary is not FLOSS so you cannot be sure what it is running.
That's equally true of the VSCodium binary. It's not a reproducible build, I have no way of knowing from which source code the binary was generated.
Of course I could build from source, but VSCodium is just VSCode built from source with a build flag set. So in this regard it's not contributing anything notable (and doesn't claim so either).
How do we know the build of VSCodium is what they say it is? Aren't we in the exact same situation as with using the MS version, except we know even LESS about the maintainers?
It's kind of funny because now you're trusting some third party not to inject unwanted code when you just blindly download binary on every release anyway.
Seems to just bring another problem by trying to solve one.
Very few things, if any, REQUIRE an installer on Windows; it's just common (and annoying) that most software is distributed this way for that platform.
I for one prefer that the developer of an application takes responsibility for its installation. That way you can blame them if it does not work. On Linux distros many packages are close to useless because they are outdated or have been modified in weird ways, incompatible with tutorials and documentation. This is especially true for all scientific software.
I prefer that all my software is installed and updated in one fell swoop, with one single package manager, instead of requiring myriad installers, uninstallers, and update daemons that run in the background for practically each and every program, just sitting there wasting CPU cycles.
If you dislike outdated software, try switching to a Testing or Unstable version of your operating system, or choosing a distro that packages less conservatively (Arch or Fedora, for instance).
Distribution-maintained packages are a really bad idea. You start to realize this once you are not in control of the box you are working on (no root access). Suddenly you are faced with CentOS 6.5 because some proprietary piece of software depends on a specific gcc or libc version. Or you can't upgrade the cluster easily because parts of the legacy code depend on specific package versions (boost, gcc). So people invent Spack, and before that modules, just so that they don't have to depend on the specific package versions they are stuck with. macOS and Windows got it right: distribute all the dependencies you need and make no assumptions about the underlying system. Everything else sucks, and Linux is a prime example of that.
In short: once you've worked with a large enough number of nightmarish Linux installations, you treat them as adversarial systems and wish you could install software just by clicking through a few screens.
I understand that you feel powerless and miss your own home PC where you are free to do what you like. However, the same could be true for click-to-install software, if someone locked the machine so that you don't have administrator access and then neglected the system.
I don't know about macOS, but doesn't most software on Linux require installation? I think the most common ways to distribute software are (1) as a package for your distribution's package manager (which the package manager installs, typically requiring superuser privileges), (2) as source code that you make & install, (3) as prebuilt binaries that you are meant to unpack and install. "Unpack and run from where they are" binaries do of course exist but aren't they less common than those 3 other types?
Installation yes; installer no. To install VSCode on Debian, you would add the apt repository to your apt configuration, run apt update && apt upgrade, then apt install ______. On Windows, most software comes in the form of an installer instead of being installed with a system package manager.
The difference is whether you have to click through one of those wizard things to get the program you want.
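For concreteness, the Debian route described above looks roughly like this. The repository URL and the package name (`code`) follow Microsoft's published install instructions at the time; treat the details as a sketch and verify against the current docs before use:

```shell
# Add Microsoft's signing key and the VS Code apt repository
# (URLs per Microsoft's published Linux install docs; may change).
wget -qO- https://packages.microsoft.com/keys/microsoft.asc \
  | gpg --dearmor > /usr/share/keyrings/ms-vscode.gpg
echo "deb [signed-by=/usr/share/keyrings/ms-vscode.gpg] https://packages.microsoft.com/repos/vscode stable main" \
  > /etc/apt/sources.list.d/vscode.list

# Refresh package lists and install - no wizard, no clicking.
apt update
apt install code
```

Updates then arrive through the same `apt upgrade` that updates everything else, which is the whole point of the package-manager approach.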
Not for me. I remember disabling it after first install, now I checked (panicking after reading your comment) and it's still disabled even after many updates in between. Ubuntu.
It clearly states in the article that though VSC is open source and you can build it from source, there are opaque components that are part of the VSC installer. The article claims that the installer installs telemetry systems, though they can be opted out of. However, the project owners do not trust Microsoft to honor the 'opt-out' part if not now then in the future.
Why would you trust them? Honest question. VSCode is as transparent as any other open source project can be, yet people still carry that Microsoft stigma from years ago. To me it's just blatant paranoia.
I would not trust an installer from a third party without knowing what was really changed, that's scary as hell if you ask me. Just look at the bootstrap 4 backdoor that was introduced but luckily was caught.
Ah, that's because you are on an untrusted platform in the first place. There is no "installer" on Ubuntu, so it's actually easy to verify whether the package we got really came from the sources we see on GitHub. But VSCode packages are explicitly different from the code, because of those proprietary additions during the build. We have to trust them that they only add telemetry. One very important feature of open source is that you don't have to trust. Adding this proprietary black box completely removes that factor. Not to mention, I really don't want telemetry on my code, and this wouldn't be the first time Microsoft let you disable something only for it to be later revealed that it was still sending data.
That's mighty dissonant and hypocritical of them to use and benefit from Microsoft's work and investment, and claim that their telemetry is the part they don't trust.
VSCode is OSS right? So microsoft’s Work and Investment was specifically designed to be reused without any direct financial compensation.
I don’t quite understand your point. Do you think that it’s hypocritical to use and benefit from OSS work and investment? It seems important to respect the creator’s intent, and if software is released under an OSS license it’s not dissonant or hypocritical or bad to reuse it, or even make money, under many licenses. It’s a feature, not a bug.
In the 90s there used to be these companies that sold “internet in a box” in Waldenbooks and other stores in the US. It was about $70 but it was all just OSS stuff - Trumpet Winsock, Eudora, Mosaic, etc. I thought it was really crazy because it’s all available for free. One day I was in line behind an old guy who was buying it and he was so happy because it conveniently packaged everything together and he had no idea how to bootstrap all these tools onto his PC. OSS is designed to allow this.
This is a clear case of people exploiting and perpetrating FUD (fear, uncertainty and doubt), riding on Microsoft's historic reputation as not being trustworthy, and turning it to their private benefit.
If you have trust issues with Microsoft, you shouldn't be using software authored by them, as software can have backdoors and security issues hiding in plain sight (the Heartbleed bug, for instance).
Atom has/had the exact same kind of telemetry, but hasn't attracted this type of hysteria, because GitHub wasn't Microsoft. This is all plain and simple dogma.
I think this is why OSS helps to reduce FUD. Clearly showing the source that produces telemetry, and providing a distribution that doesn’t use it and can clearly be built from source, is, I think, the best way to reduce FUD. MS gets more than others because of their history and size.
You do realize VSCode's built on top of things like Electron, Node, V8 etc. It's a lot of work done by other people to enable MS to build VSCode in this way.
So why is it problematic now for other people to build on top of what Microsoft added?
I hope you see that writing millions of lines of original source code on top of Electron, Node, V8, etc., and investing tens of engineers at millions of dollars per year in cost to create VSCode, is not equivalent to disabling telemetry and slapping a non-Microsoft label on it.
There's absolutely zero problems with people using OSS as intended. But for me personally, as I stated in the original parent comment, value obtained by persisting with Microsoft's VSCode, despite telemetry, is worth continuing to use it, as against an entity's fork whose USP is furthering unfounded FUD.
They're not trying to hide/deny it's Microsoft's VSCode. It's called VSCodium, inspired by Google Chrome/Chromium, after all. As for it being FUD, being concerned over OPT-OUT telemetry is for many not FUD, but to each their own.
The Americans with Disabilities Act, which can be used to sue developers of inaccessible websites, has made it clear that in some cases serving only the majority can be a bad thing.
Less severely, there's an awful lot of long tail business productivity served by obscure software features that is very difficult to satisfy with modern hyper-engagement-optimized tools.
I'm pretty sure there are false dichotomies going on here.
Software products may be developed for a mass audience or they may be developed for a narrow niche.
In either case, making the product accessible to people with disabilities is something developers should try to do.
And in every case, having data on user behaviour, software performance, bugs, crashes, etc, will enable the developer to do a better job of catering to their users' needs.
Have I missed something about how these objectives must be mutually exclusive?
> I'm sorry but "less effectively" and "greater expense" is just your opinion
No need to be sorry, it is pretty obvious that it is my opinion, since I made no attempt to support the statement, that is all it could be. That said, it would have been more polite to ask me why I believe what I wrote rather than being dismissive.
> "less effectively" ... "greater expense"
I feel that in order for me to provide satisfactory support of my claim to you, we'll have to first agree upon a strict definition of these terms. You're right to call them out in quotes as my use of them was intended to be qualitative and informal. How about more "effective" development being development that is more focused on serving the needs of its users & the goals of its developers, and "expense" include direct monetary cost, manpower, and any other resources whose use incurs an opportunity cost?
> There is NO telemetry in Linux (the kernel) and many other great software.
1. As I mentioned above, my comment was informal. It was not intended to make a strong claim about all software, without exception.
2. Unless a majority of pre-internet software was developed as effectively and cheaply as Linux and the other software you were thinking of, it is possible that my claim is still correct in the general case.
3. Software used primarily by those who actively contribute to it (such as Linux during its early development) has a very different communication dynamic from other software.
4. Linux is not representative of software developed pre-internet, considering the project was first announced by Linus in the comp.os.minix newsgroup in 1991 [1]
* To be clear, any additional claims I have made above are _also_ my opinion.
It doesn't work; you can find proof in the VSCodium issues (keyword: little snitch). It's the main reason regexes are used in CI pipelines to strip Microsoft telemetry servers from the code.
Telemetry not being disabled
I have disabled telemetry as described in the FAQ. I have set the following properties in the settings:
[..]
Now despite this, when I log my network traffic (with Wireshark) I can see that Visual Studio Code periodically contacts vortex.data.microsoft.com.
For the record,
Even with:
[..]
I still see connection attempts by Visual Studio Code to marketplace.visualstudio.com and vortex.data.microsoft.com at startup.
Response is:
I'm pretty sure we [VS Code Core] are doing the right thing here and we would love any pointers (all code is OSS).
Which almost sounds like a complete deflection of the issue. The ticket was locked.
I just read through that issue, I see nothing wrong with the responses. First was an acknowledgement that when opted out they were still sending an unnecessary "user opted out" telemetry event and that it would be removed. Then indicating that subsequent network connections were for update checking, and extension update checking, and finally an explanation that while they can disable telemetry in the app, they can't control any telemetry that extensions may be doing, and promised that they're working hard to ensure any extensions created by their team and other teams in general do the right thing, but as they don't control that code there's not a whole lot they can do.
I suppose it's not clear why vortex.data.microsoft.com is still on the list of servers being contacted at this point, but it seems quite plausible that this is coming from an extension.
I used past tense, and the parent comment I replied to writes "just disable the telemetry", linking to the same FAQ the OP of the linked issue tried to follow to disable telemetry.
The VSCodium docs (linked in an above comment) mention baked-in telemetry. Presumably this refers to the binaries rather than the source code, but then the issue is two-fold: telemetry in the source, and telemetry Microsoft may insert into the binaries. So following some FAQ to disable telemetry will not address the latter if you're not building from source.
> and telemetry Microsoft may insert into the binaries.
This is the main problem as I see it, and MS hasn't been transparent about what can actually be added to the binaries, which brings us to the necessity of a fork such as VSCodium.
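For anyone wondering what the FAQ-based opt-out being discussed here actually involved: it amounted to a few entries in the user-level settings.json. The snippet below is a sketch based on the VS Code documentation of that era; the exact key names have changed across releases (for example, telemetry settings were later consolidated), so check the current docs rather than copying this verbatim:

```jsonc
// settings.json - user-level VS Code settings (historical key names)
{
  // Disable usage telemetry and crash reporting
  "telemetry.enableTelemetry": false,
  "telemetry.enableCrashReporter": false,

  // Optional: stop background update checks, which also phone home
  "update.channel": "none",
  "extensions.autoUpdate": false
}
```

As the thread notes, this only governs what the editor's own open-source code does; it says nothing about telemetry in extensions, or about anything added to the official binaries at build time.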
How much data does it send before giving you the opportunity to opt-out? I'm assuming a lot more than any privacy respecting application sends. If they were serious, it would be opt in.
I suspect the tracking in VSCode is mainly used to improve the product. It's probably in my best interest, and the interest of the community as a whole to leave tracking on. I mean, I get it, HN is usually a more skeptical and security-focused crowd. At the same time, it's likely MS will just take that information and tailor their bugfixes and features to the things I need most, so by all means I want them to have it.
I think you’re right, but it needs to be a user choice. If presented with an option, I would likely leave it turned on 90% of the time.
I hope this project motivates MS to open source the runtime and make tracking a user-selected option.
I really like Code and think MS has really helped the dev community by making it so great and free. But I would like to see them embrace f/l/oss for all their non-core products and stay on track for customer/dev-friendliness.
Tracking should always be opt-in, not opt-out. I base this on expected user preference and trying to optimize for user happiness and functionality. I don’t think this is as true for non-OSS software where the purpose is likely more toward revenue generation than community and user functionality.
First, please let's draw a distinction between different kinds of tracking. We're smart enough to appreciate the nuances here. Some information is more valuable than others.
* How many times you clicked Edit -> Paste vs Ctrl/Cmd + V. This is not personal information. It cannot be used in any way to identify you.
* Your location/medical information. Guard this with your life if you must. Share it with literally no one. If it's uploaded somewhere, delete it asap.
All our information falls on a spectrum between these two extremes. Let's please acknowledge that not all information is super sensitive, that it's ok to share information like the paste example.
Second, this information can be useful, and can change the direction of the products we're building. For example, the Office team wanted to re-design the old menu bar interface to surface the features people were using the most. You know what people used the menu bar for the most? EDIT -> PASTE. You'd think everyone knows about Ctrl + V, but apparently not. The vast majority of the world used to click Edit, then click Paste. Knowing that users were doing this allowed the interface to be moulded to suit the needs of the silent, non shortcut using majority instead of HN power users - https://imgur.com/a/WLm4UJd.
Third, if you acknowledge that this information is useful, then it follows that it's only useful when it's opt-out. If it's opt-in and only 0.01% of your users opt-in, you can't make any reasonable conclusion from the data because it wouldn't be representative. If you tried to convince folks that no one uses keyboard shortcuts to paste based on this tiny data set, they'd laugh you out of the room. Collecting opt-in data in this case is almost useless.
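The representativeness point is worth making concrete. Here is a toy simulation (all numbers are invented for illustration) of what happens when the people who opt in are not typical of the user base - say, power users are far more likely to leave telemetry on:

```python
import random

random.seed(0)

# Hypothetical population: 30% are "power users" who paste with Ctrl+V;
# the rest click Edit -> Paste. Proportions are made up for illustration.
N = 100_000
users = [{"power": random.random() < 0.30} for _ in range(N)]

# Assume power users are far more likely to opt in to telemetry.
def opts_in(user):
    return random.random() < (0.20 if user["power"] else 0.001)

sample = [u for u in users if opts_in(u)]

share_power_population = sum(u["power"] for u in users) / N
share_power_sample = sum(u["power"] for u in sample) / len(sample)

# The population is ~30% power users, but the opt-in sample is almost
# entirely power users, so conclusions drawn from it misrepresent the base.
print(f"population power-user share: {share_power_population:.2f}")
print(f"opt-in sample power-user share: {share_power_sample:.2f}")
```

The exact percentages don't matter; the point is that a self-selected telemetry sample tells you about the people who opt in, not about the users you are designing for.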
Fourth, products that collect this info become better. If you're ok with webapps collecting this but not native apps (like VS Code), then the consequence might be that web apps end up being much better than native apps. Whether you prefer that or not is moot, but personally I like having options. Let's not force native app makers to stop collecting non-identifying information when the downside is minimal/non-existent.
With machine learning they might only need to know your pasting habits, what function names you use most commonly, and maybe a bit of information about how fast you type, and this will be enough to identify you based on your behaviour.
Search engines are recording really detailed information as you type, as you browse other websites using their invasive trojan-like analytics. With this data you can probably identify anonymous users from their behaviour (even if the ipv4 address is "encrypted" hah). All it takes is for data from one company to end up at another company and I believe this happens very often.
I think the problem is that you need less and less information to personally identify someone as you have more data. We are not intelligent enough to know what data is identifying or not so therefore the only option we have is to stop giving out our data.
Tell me, who exactly is running ML on this data? They're going to great lengths to anonymize this data. If they wanted identifying information, they'd just upload your unixname, they wouldn't anonymize and then de-anonymize with ML.
I don't even know how you conflated the metrics on menu clicks with uploading paste data. That's ridiculous. Did you even notice you made that leap?
If the menu click data is detailed enough then it may already be personally identifying information. You could take tracking data from another piece of software that collects similar information, match the two together using ML, and suddenly you can get a full name from a bunch of seemingly innocent data (if the latter software is associated with a name, of course).
I don't know what data they collect now or what they will collect later so I am just speculating. But I am sure the ToS has a section about how they can change the terms however they wish.
Please tell me about the great lengths they go to anonymize this data because I believe it to be very difficult to do, and absolutely not in their interest.
Edit: With pasting habits I meant if you paste with shortcuts or go through the menu, not the actual paste content. Sorry for the confusion.
> Fourth, products that collect this info become better. If you're ok with webapps collecting this but not native apps (like VS Code), then the consequence might be that web apps end up being much better than native apps.
So what? VScode can also be used as web app already, it's not about what others do or don't with their online services. The question is about the expectation and respect to the end user, having telemetry on by default is disingenuous to say the least.
In theory, it kind of is. As far as I’m aware (I haven’t delved deeply into it, so grain of salt), it doesn’t send data until you make that decision. And it isn’t like you have to dig for it: it’s presented to you explicitly. Yeah, it asks if you wish to opt out of “making the product better” or whatever the verbiage, but I don’t think it is going to confuse many. Especially within the target user demographic.
That is exactly right. Almost no one would turn it on if it were off by default, which makes data collected from those few users strongly biased against general users and their usage patterns. (Meaning that the data you do collect will be useless.)
This is literally the only reason that telemetry is almost always on by default. There is no illuminati secretly buying developers to learn your secrets via application telemetry, and it's laughable to think that's the case, in my mind.
If people turn it off when they have a choice, and only leave it on when they are unaware it exists then obviously there is a problem. I don't see how it is laughable to be paranoid in this case.
Opt out is when the tracking is on by default and the user has to take action to stop it. Opt in is off by default and action must be taken to enable it. What happens before the popup decides which one it is. If the user closes or ignores the message about tracking there should be none and there should have been none before the popup.
You can't have your cake and eat it too. Telemetry in this case helps in improving the product; it's good for you as an end user. If you don't like it, please switch to another editor. You paid zero, yet you're setting requirements for how it should and shouldn't behave. Good luck with the utterly ugly but tracking-free GNU Emacs, with its ancient Lisp and garbage packages.
When I was using Azure Data Studio, the opt-out was to visit a link, read the wiki page, open the settings.json file using your file manager, paste in the tracking opt-out key, and restart the program. Or you could just close the message and be tracked. This is absolutely unacceptable; opting out of tracking should take exactly the same effort as opting in, or less.
Wait a minute, there is tracking involved in a code editor? No, it's not in anyone's best interest. It can't be. Any info collected with tracking could also be used for other, malign purposes, covertly and illegally. I understand your perspective, I wish it was true as a fact. But it isn't. What I've learned in the recent years - don't trust anything that tracks and collects data unless it's your own thing.
There's a pretty wide area between a skeptic and a conspiracy theorist, but this kind of mentality certainly leans toward the latter. Logic dictates it's vastly more likely the data collected is to improve the editor, than a multi-national multi-billion dollar company collecting data to use it illegally.
No, it's an issue of trust. I wasn't implying that Microsoft does such things. Such things could happen without them even knowing about it. With security in mind I just can't take the risk myself or expose my clients to such risks. VSCode is obviously a consumer product, not an enterprise solution, so I advise my clients to use more trustworthy software, without built-in tracking capability.
Great-grandparent OP was most accurate in that this crowd cares most about privacy regardless of what it is used for, and that it would have little utility outside of this crowd, but your dismissive response is as if you've been in a coma for 10 years.
As grandparent OP said, in recent years this has moved waaaaay beyond theory territory and been shown time and time again that <Corporate Sector> + NSA + FBI + CIA + intelligence agencies around the world all employ different ways of collecting analytical data broadcast over the internet.
NSA just taps the servers without telling anyone.
FBI sends National Security Letters containing gag orders preventing companies from telling you that the federal government is now a data sharing partner.
CIA just pays companies for it.
FISA court issues secret rulings justifying the legality of it all.
Whether that bothers you or not is up to you. Most people don't care. I usually don't. I wouldn't say "logic dictates this won't happen" when it is pretty much only the multi-national multi-billion dollar companies subject to this kind of tampering, and most incentivized to monetize the analytics by allowing these and unknown third parties in.
To avoid the security exhaustion, some people would simply prefer their text editor not be "smart", which is a euphemism for internet connected.
One of the more flabbergasting comments I've seen is when Mozilla removed ALSA support from Firefox because nobody stepped up to maintain it, and the metrics showed that it wasn't widely used. There were people complaining about it being removed, and stating that the people who did use it were more likely to have turned off tracking, and thus did not show up.
I mean, sure, it might be the case that there were indeed tens of thousands of ALSA users with tracking turned off, but... from my perspective, it seems more likely that it was just a handful, and really there's no way to tell the difference. If you turn off telemetry, be aware of and accept the downsides.
I trust Mozilla, so my telemetry is on, but for many other applications, I often opt to turn off tracking - with the understanding that it's harder for them to tell what my needs are.
No, you don't need to do that, but if you don't, you should realise that that results in biased data, and thus influences the focus of development. If you have an alternative other than "the developers should just magically guess what would make users happy", then I'd be happy to hear it, but otherwise, that's just the way the world works, and thus that's the trade-off you will have to make.
It maybe starts with simple statistics. But then you want to know which features the users use, then you want to know what other programs they have installed. Then you want to know what the users search for on the web, etc. It's a slippery slope.
Why would Microsoft gathering usage data about VSCode turn into spying on the user's other apps and web searches? There's no connection between the two. Tracking usage of VSCode features has a clear connection towards improving VSCode. Spying on the user's activity outside of VSCode has no connection at all to improving VSCode.
Every time I hear the phrase "it's a slippery slope" uttered by someone arguing against something, I am immediately suspicious of the argument that person is making.
There really isn't such a thing, in the way you've used that phrase.
Capturing telemetry on how I use a tool from within that tool is perfectly fine, to me. Collecting telemetry on my search history in the browser by that same tool isn't. THERE ARE NO INTERIM STEPS that make the second of those ok. There is no slope. If there is, it isn't slippery. There is a series of discrete decisions and at some point (which is different for everyone) a line is crossed. There was no slope or slip that brought you there, only a series of mostly unrelated decisions.
To think that Microsoft's long-term goal is to install a keystroke logger via a multi-decade and multi-phase plan that begins with application usage telemetry in a free developer tool thanks to "a slippery slope" is just simply not realistic.
When you walk in the wrong direction, the final step off the cliff is only the last one. Better to get off the slope early, because the choices get fuzzier the closer you get to the edge.
It's not a fallacy. When a printer driver reporting stats back to the manufacturer made national news in the early 2000s and prompted calls for laws regulating privacy, the argument was that it was a slippery slope fallacy to assert that privacy violations would get worse. Look at where we are now.
By itself no, but it is often used fallaciously. Such as in this case, when someone is opposing some good thing on the basis that that good thing might, some day, lead to bad things.
Not realizing that many things actually ARE a slippery slope is what has gotten the world into a lot of messes. Nearly every legitimate privacy concern that we have today started out with a legitimate and well-meaning purpose.
Our entire legal system is predicated on common law precedent. So it is very valid in many cases to argue that allowing something good now, might set us up for something very bad later.
What you’ve described there is exactly how bad things come to be.
You think people elect for bad things? Bad governments, bad software or privacy violations? They choose things that are full of rhetoric and promises of good things, then those bad things get snuck in off the back of relaxed regulation, or existing software adoption, etc.
Take Facebook as an example: people didn't sign up to it thinking "I want to be tracked around the internet so I can have personalised adverts", nor did Zuckerberg think "Wouldn't it be good to create a platform that could later be used for rigging elections". No, instead we got there because of a series of good ideas that slowly got abused.
There is a saying that goes "The path to hell is paved with good intentions." I'm not a religious man but I think that beautifully illustrates how slippery slopes are not a logical fallacy.
No, good things such as telemetry and automated error reporting so that bugs can be fixed effectively and efficiently and everyone is better off for it.
That's true. However, we also don't collect history for the fun of it.
Over the course of the last 20 years, we've seen that once a data collection and digital surveillance framework is put in place, the surveillance tends to expand.
Slippery slope arguments, sans good reasoning, tend to be fallacies. However, don't fall into the trap of thinking that an argument backed by historical record is a slippery slope just because it's predicting an outcome. We might call that the "history is all slippery slopes" fallacy. Stating "this has happened multiple times before, and each time has led to x" is a very different argument to stating "this has happened, so the logical extrapolation is x".
Exactly, which is why I'd be more interested in concrete commonalities of slippery slopes that were actually realized, and not just a general "history has plenty of slippery slopes".
The fallacy is to assume, without evidence, that the slope is slippery. There are plenty of slopes that aren't. Probably most, just you don't think about those because you know they aren't slippery already.
For example, I slept in 'til 8:30 today. OMG, a slippery slope. Next thing you know, I'll be sleeping until 3PM. Til midnight! I will never wake up again. But as it happens, sleeping in isn't a slippery slope. I don't think there's any solid evidence that telemetry is either.
Maybe not understanding the problem with a tool that may be used to work on proprietary code containing trade secret information silently and unexpectedly sending information out to the Internet is the reason for software becoming spyware...
Starting with "Lol, probably just an oversight" and ending with an emoticon is excessively dismissive and flippant... that's how you lose customers, or potential ones.
You store trade secrets in your configuration file? I still don't understand what the big issue is. Can you clarify that instead of criticizing my lack of knowledge?
It's a shame we have to assume the worst. What would be better is for Microsoft to be more transparent about the telemetry and to enable more granular control. That said, I wouldn't bother checking as I don't really care. I rub shoulders with infosec issues daily, as most IT folk do these days. When I think about the perceived risk of telemetry from VSCode, now and in the future it's a negligible risk that I accept.
They are actually very transparent about the telemetry they collect, and they offer granular control. There is a log of all events sent to MS, and a page in settings dedicated to the different types of telemetry that can be enabled.
Where is the log stored? All I can find online are some instructions that seem out of date as they don't refer to the interface I see (or I don't understand them): "You can inspect telemetry events in the Output panel by setting the log level to Trace using Developer: Set Log Level from the Command Palette." [1]
The "page" in settings consists of just two options, and the complete descriptions of the types of information they collect are "crash reports" and "usage data and errors". That seems the opposite of transparent and granular. Am I missing something?
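For reference, those two toggles correspond to a pair of keys you can also set directly in settings.json — a sketch based on the older releases I've seen; the exact key names vary by version:

```json
{
  // Both keys are from older VSCode releases; newer builds collapse
  // them into a single "telemetry.telemetryLevel" setting.
  "telemetry.enableTelemetry": false,
  "telemetry.enableCrashReporter": false
}
```

So the granularity really is just "usage data" and "crash reports" — nothing finer.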
Windows has an overarching telemetry collection system and VSCode might use that on Windows.
There is an application in the Microsoft store you can install that lets you view all telemetry collected by Microsoft, and the actual data is encrypted. The metadata (which tells you WHAT is being collected) is not. Knowing the people I have worked with in the past, this is to securely prevent modification by users before the data is actually sent. I've worked with lots of people who would modify that data to attempt to get a feature added that they wanted or just to screw with MS.
That tool also gives you the option to delete all telemetry sent from that machine in Microsoft's possession.
The command palette is what you see when you press CTRL+SHIFT+P, and typing 'set log level' in that dialog will bring up a search result you can click to set the log level (the level of stuff that gets shown in the Output pane).
Setting that to "Trace" will show the telemetry being sent in the Output pane of the interface, amongst a bunch of other stuff, I am sure.
One could be totally uncritical about the current or future intentions of Microsoft and still prefer a version with no telemetry. Data breaches are very common. Employees, hackers, or governments could all gain access to the data.
The data could be accidentally broadcast or left in a vulnerable place. Even the payroll data for the national security establishment was once reported to be compromised. Everything is vulnerable. Computer science is in such an abysmal state.
Even with properly configured servers, OSes, and databases that are up to date, they are still vulnerable to zero-day attacks because they are not formally verified and have enormous and largely unnecessary complexity. Then throw in the crazy complexity of processor instruction sets, creative side-channel attacks, and stuff which exploits the physical properties of the hardware (rowhammer).
It is for reasons like these that we should never really trust transmission of sensitive data over the internet. The concept of secure voting systems, for instance, is literally a joke. insert obligatory xkcd here
"Logic and critical thinking textbooks typically discuss slippery slope arguments as a form of fallacy but usually acknowledge that "slippery slope arguments can be good ones if the slope is real—that is, if there is good evidence that the consequences of the initial action are highly likely to occur. The strength of the argument depends on two factors. The first is the strength of each link in the causal chain; the argument cannot be stronger than its weakest link. The second is the number of links; the more links there are, the more likely it is that other factors could alter the consequences.""
Where's the link between "collecting data on VSCode feature usage" and "gathering a list of all other apps the user has installed on their system and all web searches the user does"?
I mean, I can definitely see “all other apps” being collected as a way to check if there are apps conflicting with or interfering with VSCode. Maybe it’s only collected for a small subset of users with particular issues - but it would still be tempting for a dev to try and collect.
I can also see them collecting code searches done within the app as a way to check if their search system is working well for real use-cases.
Neither is outside the realm of possibility - you just have to put yourself in the mindset of a dev who is assigned to track down a rare crash or to “improve the search experience” who might want a little more data to work with.
Not saying I agree with any of this collection - it’s terrible and definitely falls under “the road to hell is paved with good intentions”. Companies should be extremely clear about what they will and won’t collect - and never cross the line even if it would be useful.
Even with the best intentions at heart, do you trust them not to accidentally leak sensitive information about you through their telemetry? There are so many places a "phone home" system could either be compromised, or accidentally send more than it should.
Telemetry is used either with naivety or malice. There is always some risk to the user.
I completely agree and I’m in the same boat (I’ll continue to use VS Code).
That being said, it's somewhat amazing we are now in a time when a Microsoft product can have aspects users don't approve of, and the community can rebuild it without them: everyone wins.
To gain cheap trust from people by saying "trust us, the data shows it"
I can think of two reasons for the data to be made public :
- for trust and transparency purposes : It is for the same reason that an election system should be observable and reproducible, from data collection to final decision, including counting methods, etc. Otherwise it would be like "trust us, the data shows it" and you don't show the data to anyone to prove it.
- for coordination and sharing knowledge : some people might interpret the data differently, choose to focus on a niche market by making different bets than MS, and create an editor complementary to the one from MS. MS has no obligation to support minorities, but someone else might be interested, and those minorities are detectable in the data
You can view a real time log of all telemetry events in-editor. Or do you mean you'd like them to share with you all the telemetry from all their users?
Well, why not? A bunch of people here are more than happy to defend Microsofts tracking. If telemetry really isn't a big deal, make all that data public. It's our data anyway, collected from how we use their software.
That would be like publishing the recipe to the secret sauce that gives them their competitive advantage. I doubt that they want to share that insight.
It’s possible that MS only uses it for bug fixes. But it’s also known that MS shares data with the NSA and other government agencies. It’d be very unlikely that the NSA would say, “yes, we want your data, but not Visual Studio data. That’s private. :)”
> It's probably in my best interest, and the interest of the community as a whole to leave tracking on.
I hope this is sarcasm. If not, what did I miss? This is the same empty phrase that Facebook, Google, etc. use. Why is Microsoft more trustworthy in that regard? I am 100% sure they use the data to make money, in the short term or in the long run. They for sure use it to make VSCode better, but only to get more people using VSCode and make them dependent on it. VSCode is a prime example of the Embrace, Extend and Extinguish strategy. I already see them grasping for the Python community.
The telemetry source code is not secret. In fact, it is annotated throughout the code, in order to comply with certain GDPR requirements. Try searching "GDPR" globally in the vscode source.
I think I would be okay with companies like Microsoft collecting data on me if they made it more clear what data they are collecting, what they're using the data for, gave the ability to disable data collection (defaulted to disabled, preferably), and let me download and have a guide to understanding my own data.
As a dev myself I know all of that is difficult and sounds ridiculous, but I really do think we have a right to the data collected on us and on our behavior. Transparency, ownership, and access, that’s all I ask.
But, speaking as a developer of an open source software product that includes telemetry[1], I expect they're tracking really basic stuff, like: DAU, MAU, edited file types, project size, crash reports, etc. Basically, information that helps to internally justify the continued existence of the project, and data that lets them better prioritize resources on the project.
> I expect they're tracking really basic stuff [...] information that helps to internally justify the continued existence of the project, and data that lets them better prioritize resources on the project.
This is my big issue with almost all analytics setups. Sure, you expect that they're probably tracking stuff like that, which is perfectly reasonable and benefits everyone. But when you look at the privacy policy inevitably you find that the data they "may" collect is incredibly vaguely defined, wide-ranging, and generally not actually limited (they use phrases like "we collect data such as..." and "some examples of data we collect are...").
And then what they do with the data is also left unrestricted by phrases like "ways we use the data include..."
And then when you point this out, everyone tells you that you're being paranoid and they're just covering themselves and don't be silly.
And then when they do precisely what their policy legally enables them to do (cough Facebook cough Cambridge Analytica cough) everyone is aghast.
I wouldn’t (unfortunately) ever bother with what terms say. They are written by a legal person who worries about future legal issues.
This is basically down to the reputation of the vendor and what information I can guess they gather based on what sort of outrage they would face if they cross a line.
There are two messages here:
1. Legal. Basically a catch all that says they might sample your blood in the future
2. Non-legal e.g developers. Says they gather harmless statistics.
Obviously #1 smells. But that’s how US corporate legal culture works. The judgement I have to do is whether the vendor can be trusted to do only what they say in message #2. I wouldn’t trust all companies in this respect, especially not those that trade in information like Facebook, but I do give Microsoft the benefit of the doubt.
Thank you! I can always use more hands with it, too. I’m essentially the only person working on the iOS version right now, and I do it pretty much entirely on a volunteer basis (ie nights and weekends, when I’m lucky)
I derive immense value from MSFT's VSCode. I don't think the telemetry in VSCode is similar to cookie tracking by Facebook et al. Frankly, I do not understand the fiscal benefit to Microsoft from VSCode, but I'd like to continue supporting Microsoft's version, as the telemetry from an IDE doesn't seem to be all-encompassing privacy-wise, like web tracking.
Meh. I just set the environment variable in my dotfiles which opts me out of telemetry. I can't recall a case where MS would outright lie about stuff like this. I.e. with Win 10 they flat out tell you you can't disable all telemetry, and in the case of VSCode they're targeting a real hard-ass demographic, so I trust them to not track me when I told them not to. I also don't use VSCode very often, though.
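For what it's worth, a sketch of what such a dotfiles opt-out can look like. The variable names below are real opt-out switches for the tools named in the comments, but whether any given VSCode build honors an environment variable (rather than its settings toggle) is an assumption worth verifying yourself:

```shell
# Telemetry opt-outs collected in a dotfile (e.g. ~/.profile).
# Each variable is documented by the tool named in its comment;
# VSCode's own opt-out is a settings toggle, so check your build
# before relying on an environment variable for it.
export DOTNET_CLI_TELEMETRY_OPTOUT=1   # .NET CLI telemetry off
export HOMEBREW_NO_ANALYTICS=1         # Homebrew analytics off
export SAM_CLI_TELEMETRY=0             # AWS SAM CLI telemetry off
```

The nice thing about the env-var approach is that it travels with your dotfiles to every machine you set up.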
This is nice and all but, from my perspective, the biggest issue with VSCode isn't the telemetry but the battery drain. Even though I prefer VSCode to Sublime the negative effect on battery life, particularly for sizeable projects, forced me back to Sublime for most use cases.
If you are concerned with privacy and use any Microsoft products (you're already making a mistake by doing so in the first place but) I would recommend, by default, blocking all Internet access to/from the host (edit: by default; make exceptions as needed, obviously).
I've got one Windows machine here, just so I can run one specific client that I have to use for an internally-hosted application. That machine doesn't have a default route, just a single static route that lets it communicate with the (internal) things it needs to and, just for good measure, there are firewall rules (on the router connected to my upstream) that block any traffic to/from this machine and the Internet. (Sadly, I would not be surprised to learn that it can "fallback" to using DNS queries or some such to report back to the mothership.)
I think we'll eventually get to the point where, in general, devices won't have a default route. It might take a while, though -- currently, way too many people are still completely okay with every device and application they use spying on them and reporting back on what they do.
So-called "default deny" firewall policies for incoming traffic are pretty common nowadays. I can't wait for "default deny" policies for outbound traffic to become standard as well.
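As a sketch of what a "default deny" outbound policy looks like in practice — this assumes a Linux host, requires root, and the allowed subnet is a placeholder you'd replace with your own internal network:

```shell
# Default-deny outbound firewall sketch (iptables, run as root).
iptables -P OUTPUT DROP                      # drop all outbound by default
iptables -A OUTPUT -o lo -j ACCEPT           # allow loopback traffic
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -d 10.0.0.0/8 -j ACCEPT   # allow internal hosts only
```

This mirrors the no-default-route setup described above: the machine can reach internal services, and everything bound for the Internet is silently dropped.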
Some developers are a funny bunch, always chasing after the next cool thing: either a language, an editor, a stack or a methodology. I’ve watched the RoR hype, the Agile frenzy, the Sublime Text popularity, the Angular and React mania... The truth of the matter is that while a developer needs to keep up with new tech, often “new” doesn’t mean better, just different. Choosing the right tools for the job at hand from all the available ones, not just the “cool” ones, is an often missed quality of a good developer.
VSCode seems rather monolithic. Skimming it briefly, there seems to be code which might be usefully repackaged as npm packages for use elsewhere. (I didn't check whether npm packages exist providing similar functionality.)
This can certainly be the right call for a project. But maybe it's an untapped opportunity for the broader community?
Has anyone looked at running a tracking fork, say mechanically massaging vscode into a monorepo?
As I explore opportunities for coding inside VR, having a more integrated ecosystem for creating IDEs would be nice.
I'll still prefer to go with the usual editors like Sublime and vim. At the very least I can be certain that vim is not reading the files I am editing.
That sounds like they legally released it under MIT. They legally can't just revoke a license grant, even completely unpublishing it by rewriting git history wouldn't solve that.
I want Microsoft to share the vscode extensions, but that reasoning seems weird to me. MS is hosting the extensions, so they could decide what applications get access to them. But also, saying anything published by mistake immediately becomes part of the current state of an open source project seems too extreme. Software authors should decide the terms of their software; maybe past versions retain the license included in them, but those terms are not eternal, should the author decide so.
If at some point I want to turn my application more commercial friendly by amending the license or changing it completely, and people tell me "No haha sorry you released as MIT at some point so it's now free forever mate" I would get pissed off and stray away from open source altogether. At least this is how the "you can't revoke a license" argument feels to me. But like I said, past versions that include a specific license should still be governed by that license.
> MS is hosting the extensions, they could decide what applications get access to them.
A better way to do that would be to add some trivial access protection (like a password they didn't accidentally publish). No matter how weak, any attempt to circumvent it would violate anti-hacking laws in most jurisdictions.
> "No haha sorry you released as MIT at some point so it's now free forever mate"
For you existing code that's exactly how it works. Otherwise the concept of licensing something becomes close to meaningless. Imagine Google releases Kubernetes as open source, you build your business on it, and suddenly Google turns around and says "just kidding, everyone who wants to use Kubernetes after next monday has to pay us absurd licensing fees". Using anything open source would be an insane risk if that was possible.
Instead what people usually do is to say "everything I do from now on is closed source. You can maintain a fork of the old version, but good luck keeping up with my version". Or alternatively "everything I do from now on is under [GPL/AGPL/similar restrictive license], if you want to use it beyond that contact me for a more permissive license deal". You can give people more permissions on things you own, or attach fewer permission to new things than you did in the past, but you can't take permissions you already gave away.
When something was obviously committed in error such as in this case (I say obviously because it is well documented that the URLs are not to be made public, and the commit was reverted quickly), I think it is in bad faith to take that mistake and use it against the company giving you the free product. Sure, it might be legally fine, but from an ethical standpoint (and even from a "let's not make open source sound so dangerous to companies that they never want to contribute again" standpoint), I personally don't think this is in good faith at all.
>Instead what people usually do is to say "everything I do from now on is closed source. You can maintain a fork of the old version, but good luck keeping up with my version". Or alternatively "everything I do from now on is under [GPL/AGPL/similar restrictive license], if you want to use it beyond that contact me for a more permissive license deal".
Yes this is what I was describing as reasonable. "Everything after this is governed by X terms" is reasonable. But the whole thing can sound like even if you change terms, previous licenses would still apply, which would be wrong.
To circle back to the specific case of the VSCode extension URLs, at the time they were "new stuff" they were published in a repository with an MIT license notice, effectively publishing them under MIT. For that version that license applies forever. If they change the URLs and keep the change proprietary that's fine, they just can't take back the past.
Though it should be added that I'm just expressing the common understanding, barely anything surrounding open source licenses was ever actually tested in court. There are also some obvious legal positions that would completely change this: does every change need to state the license, are open source licenses actually legally binding etc. However nobody would ever argue those positions because they are detrimental for everyone (ok, the latter one was once argued in a GPL trial, but the court decided not to decide on that)
this is great. we could make it a proper fork and get rid of the constant nagging of notifications wanting to manage your git repo, install plugins, etc...
I do not trust Facebook with my personal data on their site, but I would absolutely trust them to turn off arbitrary telemetry data if in fact it said it was 'off'.
I just don't think there's a big evil conflict of interest or whatever for this stuff to get slippery.
It doesn't help anybody except probably the one guy who 'Showed Microsoft'.
Telemetry helps improve products and opt-out means the product company will miss out on the behavior of power users thereby not being able to optimize their software for their usage.
Did you mean to write "opt-in", or are you suggesting it's OK for software to refuse the choice to opt out of data tracking? "Power users" simply do not use such software or block connections via firewall.
GPS-related telemetry, say in modern cars or smartphones, is akin to being followed. The latter isn't even anonymized, e.g. recently police identified suspects by issuing a warrant to Google for devices in the vicinity, or Tesla having all the car data in case of an accident. Web tracking such as analytics or canvas, things like "advertising tags" being assigned to your profile, is you being digitally followed across various sites you visit. "Usage-data" decided by developers, malicious or otherwise, may not be just usage data from the user's perspective.
You can hardly dismiss all telemetry concerns as invalid when "telemetry" is a catchall term for most any data collection. And the other side of it is the user and being in control of the software they run, which entails opt-in or at the minimum leaving the option to opt-out.
The main version of VSCode gives me micro stutters when I use WSR + Vocola. _So far_ there's been none of that with VSCodium, however this could be because, with this fresh install, there are no plugins.
I gave up on vscode when it locked up on a 32gb machine. Seeing that there's analytics code inside clears that right up. I assumed the problems I had with electron based apps were the overhead of running a browser on top of the application. The real problem is that it's running 1990s omniture code. Thankfully they haven't added auto-play video to the search yet...
Regular VSCode takes about 100MB Ram on my Laptop. And I can't remember the last time it crashed on a 32GB machine. I think you're talking out of your a.