I used to work for Crittercism (now Apteligent), but I don't claim to speak for them in any way. I can state that while I was there, I never saw anything even remotely creepy being done with the data that was sent to them. As a baseline, I'm a card-carrying member of the EFF and my definition of "creepy" is a lot easier to meet than most people's.
A typical setup would work like this: when you launch an "instrumented" app, it generates a UUID. Then whenever a user interacts with the app, it would send messages like "UUID 1234... launched app version 3.14. UUID 1234 clicked the 'home' button. UUID 1234 viewed their profile page. UUID 1234 searched for a video. UUID 1234 played a video. UUID 1234 experienced an OutOfMemory exception in module foo, line 942." These were aggregated together so that you could run reports like "among people who experienced the OutOfMemory exception in module foo, line 942, how many viewed their profile page first?" That allowed developers to very quickly focus on the exact steps required to reproduce a specific problem.
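Here's a minimal sketch of that kind of report, assuming events arrive as (uuid, seconds-since-launch, event name) tuples. The event names and the `viewed_profile_before_crash` helper are hypothetical and aren't Crittercism's actual schema or API. Note that the only output is an aggregate count, not anything about an individual UUID.

```python
from collections import defaultdict

# Hypothetical event log: (uuid, seconds since launch, event name).
EVENTS = [
    ("1234", 0, "launch:3.14"),
    ("1234", 5, "click:home"),
    ("1234", 9, "view:profile"),
    ("1234", 20, "crash:OutOfMemory:foo:942"),
    ("5678", 0, "launch:3.14"),
    ("5678", 4, "search:video"),
    ("5678", 11, "crash:OutOfMemory:foo:942"),
    ("9abc", 0, "launch:3.14"),
    ("9abc", 3, "view:profile"),
]

def viewed_profile_before_crash(events, crash="crash:OutOfMemory:foo:942"):
    """Among sessions that hit `crash`, count how many viewed their profile first.

    Returns (sessions_that_viewed_profile_first, sessions_that_crashed).
    """
    sessions = defaultdict(list)
    for uuid, ts, name in events:
        sessions[uuid].append((ts, name))
    crashed = viewed_first = 0
    for evs in sessions.values():
        evs.sort()  # order each session's events by timestamp
        names = [name for _, name in evs]
        if crash not in names:
            continue
        crashed += 1
        # Only look at events that happened before the first crash.
        if "view:profile" in names[: names.index(crash)]:
            viewed_first += 1
    return viewed_first, crashed

print(viewed_profile_before_crash(EVENTS))  # → (1, 2)
```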
So sure, apps were gathering a lot of information about what you were doing, but it genuinely was for your benefit. There was no way for customers to run queries like "what video was heywire watching?" or the like. Everything was 100% focused on being able to quickly and accurately identify the cause of crashes. Now, that was just one company and it was several years ago. Maybe every other company was creepy? Maybe Apteligent is, too, now? I don't know. I don't have any insider knowledge of the current state of things. But at the time I personally witnessed it, I would have felt very comfortable at an EFF meeting explaining how every byte of metrics information was being used.
That steps way the hell and gone past my creepiness threshold. Being able to justify it in your own head as being "for [the users'] benefit" doesn't change the fact that there's a shitload of raw data there that the users wouldn't voluntarily provide if they knew about it.
I like what Apple does: they ask you whether you'd like to automatically submit usage data and crash reports, and you can opt out without any loss of functionality. I have no idea what percentage of users opt in, but as long as some people do, it's a win-win for users and developers.
In this specific case, given what I've told you about the kind of data they collected, how it was used, and how it was surfaced to the consumers of the data, what specifically about that data flow bothers you?
> I've told you about the kind of data they collected, how it was used
No, you've only talked about what currently happens when everything is working properly. What happens if the company ends up in financial trouble? Do they have a Ulysses contract[1][2] on record that binds their future ability to monetize all of this data? Without legal enforcement, we just have to hope this company will somehow resist a temptation that most other companies haven't been able to resist.
> what specifically about that data flow bothers you?
> it generates a UUID
That's obviously personally identifying, and it's a header on all of the analytics events you describe. Just because it's synthetic doesn't make it anonymous. Once it's mapped back to other information (which is trivial if you correlate IPs[3] or event timestamps[4]), this type of analytics is only an INNER JOIN away from being merged into someone's pattern-of-life[5].
The problem isn't what happens when everything works as intended. You also need to prepare for when (not if) your data is merged into other databases, and for what others might do with the data in the future.
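To make the "INNER JOIN away" point concrete, here's a sketch using Python's built-in sqlite3 module. Both tables, their columns, the sample rows, and the assumption that the analytics events carry a client IP are all hypothetical; the join key is the IP plus a 5-minute timestamp window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE analytics (uuid TEXT, ip TEXT, ts INTEGER, event TEXT);
CREATE TABLE accounts  (email TEXT, ip TEXT, ts INTEGER);  -- e.g. some other service's login log
INSERT INTO analytics VALUES
  ('1234', '203.0.113.7',  1549275082, 'played video'),
  ('5678', '198.51.100.2', 1549275100, 'viewed profile');
INSERT INTO accounts VALUES
  ('alice@example.com', '203.0.113.7', 1549275050);
""")

# Link "anonymous" UUIDs to named accounts seen from the same IP within 5 minutes.
rows = conn.execute("""
SELECT DISTINCT a.uuid, c.email
FROM analytics AS a
INNER JOIN accounts AS c
  ON a.ip = c.ip AND ABS(a.ts - c.ts) <= 300
""").fetchall()
print(rows)  # → [('1234', 'alice@example.com')]
```

Once a single row links a UUID to a name, every other event tagged with that UUID comes along with it.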
[4] Take a set of "UUID 1234... launched app" events for a common app that is regularly launched, e.g. when someone wakes up (or whenever). Correlate those times with the times other apps are launched (or webpages/email visited). What are the odds that two unrelated people just happened to open different apps at [..., 2019-02-04T10:11:22, 2019-02-06T10:17:44, 2019-02-07T10:14:52, ...] (+/- maybe 30 seconds)? A unique identifier and a few high-resolution (to the second) timestamps can easily identify someone uniquely when you have enough data points.
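A sketch of the correlation described in [4]. The first list uses the timestamps from the footnote; the second stream (an identified user's activity in some other dataset) is made up for illustration, and the ±30 second tolerance is the one suggested above.

```python
from datetime import datetime, timedelta

def parse(stamps):
    return [datetime.fromisoformat(s) for s in stamps]

# Launch times for "UUID 1234" in app A (from the footnote).
app_a = parse(["2019-02-04T10:11:22", "2019-02-06T10:17:44", "2019-02-07T10:14:52"])
# Hypothetical launch/visit times for an identified user in another dataset.
other = parse(["2019-02-04T10:11:40", "2019-02-05T08:02:10",
               "2019-02-06T10:17:30", "2019-02-07T10:15:05"])

def matches(xs, ys, tolerance=timedelta(seconds=30)):
    """Count events in xs that have at least one event in ys within +/- tolerance."""
    return sum(any(abs(x - y) <= tolerance for y in ys) for x in xs)

print(matches(app_a, other), "of", len(app_a))  # → 3 of 3
```

Each additional match makes "two unrelated people" a less plausible explanation.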
The problem with analytics data is that it can look harmless, but it's easy to convert seemingly trivial data into tracking points. For example, consider this: UUID 1234 always watches video between 16:00 and 17:00 UTC, and then again after 02:00. This gives you a pretty good idea of where UUID 1234 lives, as well as their daily schedule (perhaps they have a job during that period? Maybe the fact that they didn't watch video at 21:00 means they went out to eat that day?)
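A rough sketch of how a schedule falls out of nothing but timestamps: bucket one UUID's activity by UTC hour and find the longest stretch of hours with no activity. The sample hours mirror the example above. Reading that gap as sleep or work hours, and from there guessing a time zone, is an assumption the code itself can't verify.

```python
from collections import Counter

def quiet_window(hours):
    """Return (start_hour, length) of the longest run of UTC hours with no
    activity, treating the 24-hour clock as circular."""
    active = Counter(hours)
    best_start, best_len = 0, 0
    for start in range(24):
        length = 0
        while length < 24 and active[(start + length) % 24] == 0:
            length += 1
        if length > best_len:
            best_start, best_len = start, length
    return best_start, best_len

# UTC hours at which UUID 1234 was seen watching video over several days.
HOURS = [16, 16, 2, 3, 16, 2]
print(quiet_window(HOURS))  # → (4, 12): nothing from 04:00 to 16:00 UTC
```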
That's true if you actually collect any identifying information about UUID 1234. If I, as the consumer of the data, can't tell whether they're in Los Angeles or Beijing, then that information doesn't tell you much.
I can say that at the time I was there, it was not possible for a developer logged into our system to suss out any fine-grained information about a particular user. They just got aggregated data like "92% of people who experienced this symptom did this other thing right before it happened".
Wouldn’t IP addresses be routine information collected by apps that collect information over the network? That by itself would give away the location to a good degree of accuracy in most cases.