I used to work for Crittercism (now Apteligent), but I don't claim to speak for ...

stan_rogers · on Feb 8, 2019

That steps way the hell and gone past my creepiness threshold. Being able to justify it in your own head as being "for [the users'] benefit" doesn't change the fact that there's a shitload of raw data there that the users wouldn't voluntarily provide if they knew about it.

chillacy · on Feb 8, 2019

I like what Apple does: they ask whether you'd like to automatically submit usage data and crash reports, and you can opt out without any loss of functionality. I have no idea what percentage of users opt in, but as long as some people do, it's a win-win for users and developers.

kstrauser · on Feb 8, 2019

In this specific case, given what I've told you about the kind of data they collected, how it was used, and how it was surfaced to the consumers of the data, what specifically about that data flow bothers you?

pdkl95 · on Feb 8, 2019

> I've told you about the kind of data they collected, how it was used

No, you've only talked about what currently happens when everything is working properly. What happens if the company ends up in financial trouble? Do they have a Ulysses contract[1][2] on record that binds their future ability to monetize all of this data? Without legal enforcement, we just have to hope this company will somehow resist a temptation that most other companies can't.

> what specifically about that data flow bothers you?

> it generates a UUID

That's obviously personally identifying, especially since it's a header on all of the analytics you describe. Just because it's synthetic doesn't make it anonymous. Once it's mapped back to other information (which is trivial if you correlate IPs[3] or event timestamps[4]), this type of analytics is only an INNER JOIN away from being merged into someone's pattern-of-life[5].
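To make the INNER JOIN point concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. The tables `analytics_events` and `account_logins`, their columns, the IP addresses (documentation ranges) and the 60-second window are all hypothetical; the only claim is that a synthetic UUID plus an IP and a timestamp can be joined against a dataset that does know who you are.

```python
import sqlite3

# Hypothetical example data: an "anonymous" analytics table keyed by a synthetic
# UUID, and a table from some other service that knows who each user is.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE analytics_events (uuid TEXT, ip TEXT, ts INTEGER);
    CREATE TABLE account_logins (email TEXT, ip TEXT, ts INTEGER);
    """
)
conn.executemany(
    "INSERT INTO analytics_events VALUES (?, ?, ?)",
    [
        ("uuid-1234", "203.0.113.7", 1549275082),  # ts is Unix seconds
        ("uuid-5678", "198.51.100.9", 1549275500),
    ],
)
conn.executemany(
    "INSERT INTO account_logins VALUES (?, ?, ?)",
    [
        ("alice@example.com", "203.0.113.7", 1549275100),
        ("bob@example.com", "192.0.2.44", 1549275510),
    ],
)

# Re-identify UUIDs: same source IP and timestamps no more than 60 seconds apart.
rows = conn.execute(
    """
    SELECT DISTINCT a.uuid, l.email
    FROM analytics_events AS a
    INNER JOIN account_logins AS l
      ON a.ip = l.ip AND ABS(a.ts - l.ts) <= 60
    """
).fetchall()

for uuid, email in rows:
    print(uuid, "->", email)  # prints: uuid-1234 -> alice@example.com
```

Note that neither table alone links a name to the UUID; the link only exists once the two are joined.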

The problem isn't what happens when everything works as intended. You also need to prepare for when (not if) your data is merged into other databases, and for what others might do with it in the future.

[1] https://en.wikipedia.org/wiki/Ulysses_pact

[2] https://www.youtube.com/watch?v=zlN6wjeCJYk

[3] https://news.ycombinator.com/item?id=17170468

[4] Take a set of "UUID 1234... launched app" events for a common app that is regularly launched, e.g. when someone wakes up (or at any other routine time). Correlate those times with other apps that also happen to be launched (or webpages/emails visited) at similar times. What are the odds that two unrelated people just happened to open different apps at [..., 2019-02-04T10:11:22, 2019-02-06T10:17:44, 2019-02-07T10:14:52, ...] (+/- maybe 30 seconds)? A unique identifier plus a few high-resolution (to the second) timestamps can easily single someone out once you have enough data points.

[5] https://en.wikipedia.org/wiki/Pattern-of-life_analysis
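The timestamp correlation in footnote [4] can be sketched in a few lines of Python. This is a naive illustration, not any real tracker's method: the first list reuses the footnote's timestamps, the other two lists are made up, and `match_fraction` is a hypothetical helper that checks what share of one ID's events have a partner event from another source within ±30 seconds.

```python
from datetime import datetime, timedelta


def parse(timestamps):
    """Parse ISO 8601 strings like '2019-02-04T10:11:22' into datetimes."""
    return [datetime.fromisoformat(t) for t in timestamps]


def match_fraction(events_a, events_b, tolerance=timedelta(seconds=30)):
    """Share of events in A that have at least one event in B within +/- tolerance.

    Naive O(len(A) * len(B)) scan; fine for an illustration.
    """
    if not events_a:
        return 0.0
    matched = sum(
        1 for a in events_a if any(abs(a - b) <= tolerance for b in events_b)
    )
    return matched / len(events_a)


# "UUID 1234 launched app" times from footnote [4].
uuid_1234 = parse(["2019-02-04T10:11:22", "2019-02-06T10:17:44", "2019-02-07T10:14:52"])
# Hypothetical events from another app/site: the same person, and an unrelated one.
user_x = parse(["2019-02-04T10:11:40", "2019-02-06T10:17:30",
                "2019-02-07T10:15:10", "2019-02-07T18:02:00"])
user_y = parse(["2019-02-04T09:02:10", "2019-02-06T10:30:00", "2019-02-07T10:14:40"])

print(match_fraction(uuid_1234, user_x))  # 1.0
print(match_fraction(uuid_1234, user_y))  # 0.3333333333333333
```

With only three data points the unrelated user already gets a chance match; the signal comes from every event lining up, and it sharpens as more days are added.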

kkarakk · on Feb 8, 2019

Literally every app that uses Firebase can do this, and Firebase is pretty much the standard for scalable apps for indie devs.

saagarjha · on Feb 7, 2019

The problem with analytics data is that it can look harmless, but it's easy to convert seemingly trivial data into tracking points. For example, consider this: UUID 1234 always watches video between 16:00 and 17:00 UTC, and then again after 02:00. This gives you a pretty good idea of where UUID 1234 lives, as well as their daily schedule (perhaps they have a job during that period? Maybe the fact they don't watch video at 21:00 means they went out to eat that day?)
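As a rough sketch of that inference, assuming the only data is the UTC hour of each viewing event: find the longest stretch of hours with no activity, treat it as the likely sleep window, and assume (a big, hypothetical assumption) that sleep is centred around 04:00 local time. The helper names and the 04:00 figure are illustrative only.

```python
from collections import Counter


def longest_quiet_window(active_hours):
    """Return (start_hour, length) of the longest run of UTC hours with no
    activity, treating the 24-hour clock as circular."""
    counts = Counter(h % 24 for h in active_hours)
    best_start, best_len = 0, 0
    for start in range(24):
        length = 0
        while length < 24 and counts[(start + length) % 24] == 0:
            length += 1
        if length > best_len:
            best_start, best_len = start, length
    return best_start, best_len


def estimate_utc_offset(active_hours, assumed_sleep_midpoint_local=4.0):
    """Guess a UTC offset by assuming the quiet window is centred on
    `assumed_sleep_midpoint_local` (a hypothetical heuristic, not a fact)."""
    start, length = longest_quiet_window(active_hours)
    midpoint_utc = (start + length / 2) % 24
    offset = (assumed_sleep_midpoint_local - midpoint_utc) % 24
    return offset - 24 if offset > 12 else offset


# UTC hours of viewing events over several days: 16:00-17:00 and just after 02:00.
hours = [16, 2, 3, 16, 2, 16, 3, 2]

print(longest_quiet_window(hours))  # (4, 12): nothing between 04:00 and 16:00 UTC
print(estimate_utc_offset(hours))   # -6.0
```

Under that assumption the 16:00 UTC viewing is mid-morning local time and the 02:00 UTC viewing is evening. Shift workers and travellers would break the heuristic for individuals.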

kstrauser · on Feb 7, 2019

That's true if you actually collect any identifying information about UUID 1234. If I, as the consumer of the data, can't tell whether they're in Los Angeles or Beijing, then that information doesn't tell you much.

I can say that at the time I was there, it was not possible for a developer logged into our system to suss out any fine-grained information about a particular user. They just got aggregated data like "92% of people who experienced this symptom did this other thing right before it happened".
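A report like "92% of people who experienced this symptom did this other thing right before it happened" can be produced from per-session event trails without handing any IDs to the developer. Here is a minimal Python sketch, assuming each session is just an ordered list of hypothetical event names and the symptom is itself logged as an event; it is not Crittercism's actual pipeline.

```python
from collections import Counter


def preceding_event_shares(sessions, symptom="crash"):
    """For each session containing `symptom`, take the event logged right before
    its first occurrence; return {event: percent of affected sessions}.
    Only these aggregate percentages are returned, never session or user IDs."""
    preceding = Counter()
    affected = 0
    for events in sessions:
        if symptom not in events:
            continue
        affected += 1
        idx = events.index(symptom)
        if idx > 0:
            preceding[events[idx - 1]] += 1
    if affected == 0:
        return {}
    return {event: 100.0 * n / affected for event, n in preceding.items()}


# Hypothetical breadcrumb trails, one list per session.
sessions = [
    ["launch", "open_camera", "rotate_device", "crash"],
    ["launch", "rotate_device", "crash"],
    ["launch", "open_settings", "crash"],
    ["launch", "open_camera", "rotate_device", "crash"],
    ["launch", "open_camera", "close"],
]

shares = preceding_event_shares(sessions)
for event, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{pct:.0f}% of crashing sessions did '{event}' right before the crash")
# 75% of crashing sessions did 'rotate_device' right before the crash
# 25% of crashing sessions did 'open_settings' right before the crash
```

The raw per-session trails still exist upstream of this function, which is exactly the part the other commenters are worried about.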

wtmt · on Feb 8, 2019

Wouldn’t IP addresses be routine information collected by apps that collect information over the network? That by itself would give away the location to a good degree of accuracy in most cases.

scrollaway · on Feb 8, 2019

On mobile, no, it wouldn't. Often it barely even gives away the country: when roaming with Fi in Europe, I have a US IP.

AlexandrB · on Feb 8, 2019

> So sure, apps were gathering a lot of information about what you were doing, but it really was genuinely for your benefit.

For my benefit without giving me a clear understanding of what was being collected or the option to opt out. Gee, you really shouldn't have.