Hacker News
Discord has been using ML to determine the gender and age of some of its users (twitter.com/discordpreviews)
126 points by davikr on May 13, 2024 | 119 comments



At some point they'll just resort to calling the metric "person type" and the gender and age will be baked into it. They'll then use that "person type" metric to determine what you want to see, hear and consume.


Discord says I'm a 21ecaf26-4e27-477d-a6c0-45c2c69a7645 but I identify as a 076db833-35be-4fa8-856e-fe2fa4128cee.


People already do this on their own and advertisers already use it.

The Venn diagram of people who subscribe to 'The Economist' minimally overlaps with subscribers to the 'Razorcake' zine and very few companies would want to have an ad in both.


Cosine similarity is low.
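The quip treats each readership as a vector of interests and measures the angle between the two vectors. Below is a minimal Python sketch of that calculation. The four interest dimensions and every number in it are invented for illustration, not real subscriber data:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0 = unrelated, 1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical audience profiles: share of each readership interested in
# [finance, geopolitics, punk music, DIY zines]. All values are invented.
economist_readers = [0.8, 0.9, 0.05, 0.02]
razorcake_readers = [0.05, 0.1, 0.9, 0.8]

similarity = cosine_similarity(economist_readers, razorcake_readers)
```

A value near 1 means the two audiences look alike. A value near 0 means they barely overlap, which is the point about few companies wanting the same ad in both.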


> the metric "person type"

"demographic"



Yes but demographic has the unfortunate problem of needing to be connected to real world observable traits. Clustering people into as many groups as naturally fall out can capture way more.

The downside is that you don't know what the clusters actually mean precisely but you figure that out broadly by running tests.
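One common way to let groups "naturally fall out" is k-means clustering. The sketch below runs plain k-means in Python on invented per-user features (share of activity at night, in gaming servers, and in study servers). Nothing in the thread says Discord uses this algorithm or these features:

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means clustering. Returns (centroids, labels). Illustrative, not production code."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # start from k random users
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical features per user: [night activity, gaming servers, study servers].
users = [
    [0.9, 0.8, 0.1], [0.85, 0.9, 0.05], [0.8, 0.75, 0.2],
    [0.1, 0.2, 0.9], [0.15, 0.1, 0.85], [0.2, 0.05, 0.95],
]
centroids, labels = kmeans(users, k=2)
```

The cluster ids (0 and 1) carry no meaning on their own. As the comment says, you work out what each cluster represents afterwards, for example by running tests against it.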


There's no specific requirement for this, it's just a word. In this case it's being connected to the real world traits of language use though.


That's existed forever, it's customer segmentation. It's the underlying reason people want this info, age etc is just a proxy. From an advertising perspective nobody cares how old you actually are, they want to know if you're likely to respond to an ad.


Customer segmentation is based on user-submitted info like name, zip code and product SKU.

This is using typed communications by anonymous users to make a guess about those attributes.


>nobody cares how old you actually are, they want to know if you're likely to respond to an ad.

Sadly, people will pay a lot to know someone's age precisely because they just wanna sell you stuff, which is a problem in and of itself. I'm glad governments are slowly starting to clamp down on this.


An unsupervised approach makes way more sense in some ways but advertisers often like to pick demographics manually so the categories need to be understandable by a human.


If the advertisers still use A/B testing and WANT to learn from said tests. Sure.

If the advertisers use multi-armed bandits, you couldn't care less about the segments themselves.
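To illustrate the bandit point: an epsilon-greedy bandit shifts impressions toward whichever ad variant gets clicked most, without ever naming a segment. Below is a minimal simulation in Python. The click-through rates, `epsilon`, and round count are arbitrary assumptions:

```python
import random


def epsilon_greedy(true_rates, rounds=10_000, epsilon=0.1, seed=42):
    """Simulate an epsilon-greedy bandit choosing between ad variants.

    true_rates are hypothetical click-through rates. The algorithm never
    reads them directly; it only observes the simulated clicks.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms   # impressions shown per variant
    clicks = [0] * n_arms  # clicks observed per variant
    for _ in range(rounds):
        if rng.random() < epsilon or 0 in pulls:
            # Explore: pick a random variant (also guarantees every arm is tried).
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the variant with the best observed click rate so far.
            arm = max(range(n_arms), key=lambda a: clicks[a] / pulls[a])
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            clicks[arm] += 1
    return pulls, clicks


pulls, clicks = epsilon_greedy([0.01, 0.03, 0.15])
```

With these settings most of the 10,000 impressions should end up on the third variant. The remaining exploration keeps sampling the others in case their estimates were off.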


So, is there a reason they could be doing this?

Like from Discord's perspective, what would this actually accomplish business wise?

I'm kinda drawing a blank here.


Demographics are used pretty heavily in market research, etc. This could be used both in targeting ads to specific people, and also in providing aggregate data to business and marketing partners -- being able to say they have approximately x 18-24s is valuable to pitching partnerships, etc.


Selling ads for Red Bull to high school and college-age boys.

Selling ads for paroxetine to women aged 40-50.


Market research and marketing. Given their rollout of "quests", it's probably another marketing dimension that someone could tailor their ads for.

E.g., maybe don't show the "ad" during the day because they're likely at work if they're >24, or show a "quest" that's targeted towards nostalgia.


> Demographics are used pretty heavily in market research, etc. This could be used both in targeting ads to specific people, and also in providing aggregate data to business and marketing partners

I'm thinking about the workflow here.

First, Discord profiles you based on your behavior. They conclude you are a 20-year-old male.

Second, they show you ads appropriate for a 20-year-old male.

Third, these ads do better than average because they match your behavior.

We already have behaviorally-targeted ads. We've had them forever. How is introducing a level of indirection, where we infer age and sex from behavior and then decide on appropriate ads based on the age/sex construct, supposed to improve over deciding on the ads based on behavior?


I think the world is being segmented into those who are exposed to targeted advertising and those who never or almost never see any advertising because we use uBlock Origin.


Oh, you. Thinking uBlock Origin actually blocks advertising. It doesn't block sponsored blog posts, sponsored comments, paid reviews, conflicts of interest, marketing copy on landing pages, etc. Advertising is everywhere.

And uBlock can't protect you from anything not in your web browser.


I don't think I use any proprietary web services outside of either my web browser (90%) or command-line CURL (10%).


I have 3 PC game platforms, Discord's desktop client, 4 cloud services, and probably some company hubs for drivers (Nvidia, Logitech, Razer, etc) that phone home. You can definitely access all these through a browser intermediary, but many don't. Steam sort of needs to stay on your device for DRM purposes anyway.


What? You mean it doesn't block billboards or radio ads either? It doesn't block my friends when they start talking about their iPhones or being Vegan?


There's just no avoiding those astroturfed marketing campaigns being pushed by Big Vegan.


Minimalism is a scam created by Big Small to sell more less.


Inferences have uses. I’ll always try to target marketing based on user segments that are deterministic and I’ve read how the data is collected. If I can’t find that, I’ll look for modeled/etc audiences that include inferred data points. It’s likely discord would have targeting/data for self-reported demographics at a smaller scale and higher price point. Then the inferred demographics at a larger scale and lower price point.


That means your strategy is to make some assumptions about where your marketing will be well received, and insist on only showing it there.

This limits you to the quality of your assumptions. It's always going to be better to show your marketing where it actually works than to show it where you believe it works.


Well yeah, you start with an educated guess and optimize once it’s live. So budget etc. moves fluidly, and if Discord audience 1 performs well it gets more budget. The DSP I use doesn’t have Discord inventory (if it exists?), but pretty much everything is optimized by deep learning outside of moving branding budgets around.


This isn't just useful for their own marketing as some may imply. My guess is building out an ad network/platform for 3p advertisers to run ads on it.


Discord has nearly finished killing IRC, so it must be time to enshittify!


Until Discord makes it impossible to run an IRC server or client, I don't see IRC going anywhere.


You, Me, and the 12 other IRC users remaining aren't going anywhere.

But denying that Discord has effectively killed IRC is just self-delusion in my opinion.


Yeah, protocols are unkillable, but the interest in IRC was definitely hurt by Discord. Had they started putting ads in the client from the start, I doubt so many groups would have migrated to it.


A non-marketing answer is determining whether users are under age for regulatory purposes. Discord was and still is a gaming chat platform so the draw of young children can be dangerous.

Could be used to trigger a verification workflow in the event certain municipalities start requiring it.


How does that explain the focus on gender? Admittedly there are countries like Iran and Saudi Arabia where there might be laws about what you're allowed to show women.


It'd be more remarkable for a Western company to not be focused on gender.


I recall hearing that they added some sort of advertising into the streaming built into discord. Maybe they're intending to gradually expand that and make them more targeted.

https://www.cnet.com/tech/discord-is-adding-ads-but-with-a-g...


In the best case it helps them identify under-18s to keep them away from adult content/creepers.


Maybe finding out if you've got underage users in order to prevent them from using the service or accessing resources they should not?

The obvious answer is marketing but I think there "could" be a real use case for:

"We want to know if we've got under aged users accessing what they shouldn't and they're not honest about their age so we have to figure it out somehow."


Age and gender are super important for marketing but I'm not sure why it's a big deal. Your (inexact) age and your gender aren't private.


I think the "big deal" about it comes from the fact that users did not give that information willingly to Discord. If I did not provide you that information, why would I expect you to know it? And in a state like California (where Discord is based), I have the right to know how my data is used/shared, and to request that it be deleted. But I can't request information to be deleted if I don't know you have it. I can't know how you're using/sharing my data if I don't know you have it.


I guess they are saying they don't need to ask you anymore - if you willingly give your actions (servers you browse, things you say, chats you react to, etc.), and they can infer from that, then where's the foul?

I think it comes before that - the first pieces of data they are scraping. I don't buy that the problem is that they can and will infer your age/gender/etc. based on that.


The foul is that I don't want to share my age and gender with every online service that I interact with. I'm bewildered so many people defend Discord's actions here. If you give a company your email address, are they allowed to fully doxx you (your real name, home address, job history, age, gender, friend list, phone number)? Even though you've shared your email address and they just used it to infer the rest (using publicly available sources).


> If you give company your email address are they allowed to fully doxx you

I'm not sure but I'd also be interested to know why information that used to be printed in any phone book for the world to see is suddenly the most private information to hold dear.

I might have my hottest take of all: Anonymity is bad for the web and (ironically) makes us less safe. It creates a power imbalance. I'm only "anon" here because others are. If nobody was anon (somehow) we could know our attackers, trolls, and harassers and take action.


At least with a phone book you were able to opt out of your information being included in it. As of right now, there's no way to opt out of Discord.

Anonymity on the internet protects a lot of us from state actors, governments, and fringe extremist groups. We're already losing that anonymity with many state laws (submitting IDs to view adult content is one of those). Imagine if the President had a list of every person worldwide who said anything bad about him. Name, number, gender, age, address, list of friends on every network, etc.

Having to deal with trolls and harassers that can be blocked or removed from the platform for gross violations is a much better option than everyone in the world being able to find out who I am, who my family is, who my friends are, etc.


Yeah you wish that would work.

All the bad actors will do is steal someone else's credentials. Just like they do now. You still won't figure out who they are. They'll set up networks of fake names, fake friends, fake businesses, fake pages, fake comments, fake posts. They're not going to just roll over and start doing shady shit in their own names.


Note my (somehow). I'm aware it wouldn't be possible.


The foul is that, under California's privacy law, the user has the right to know what PII is collected and how it's used/shared, and to request that the data be deleted. Discord is based in California and is subject to these laws.

Looking through Discord's ToS and Privacy Policy, gender isn't mentioned once. And age is mentioned once in the Privacy Policy in regards to account creation.

This practice circumvents the law by gathering this information without the user's express permission.


They aren't collecting gender. They're guessing at gender. If they're doing it right, they're just storing a score representing likelihood. I don't think it's a given how the courts would rule on that... even in California.


They are spying on you, looking for clues to your gender. That is the same as collecting the data, but more insidious and automated.


If you talk about cookie recipes, you might be the type that buys baking sheets. Discord knows you talk about cookie recipes. Presumably it's in their TOS or something.

To me, this strikes me as someone walking in the park with a stroller getting indignant about someone assuming they might be a parent.


Sure, but now consider that the government also has access to this data, and pair that with recently passed laws around gender & reproductive rights. It gets ugly fast. Discord is holding a dangerous piece of data, here.


> request it be deleted

You hear that, HN? Not providing an option to delete our comments is against the law (in California and probably in EU as well). No, anonymizing the comment doesn't count.


> No, anonymizing the comment doesn't count.

It would be nice if HN, or literally anyone, offered that option.


The fact that your online behaviour outs you as gay to an algorithm might be something that you want to keep private though.


>Your (inexact) age and your gender aren't private.

"On the internet, no one knows you're a cat". Most people probably don't care enough to conceal such factors, but maybe we should be more careful divulging that information on a semi-anonymous platform.

Age is an especially dangerous thing to share if you're young.


Does the EU require it to track for underage users?


Not tracking, but the EU and its member states have strong laws to protect children, and the Digital Services Act raised the bar across the whole EU. So identifying the users they are supposed to protect might help them avoid high penalties. And the DSA was finalized in October 2022, so the date matches.


Ads. These apps have no other way to monetize.


Under the UK Online Safety Act pretty much every website going forward is required to figure out the age of every user - https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-re...

This is almost certainly a first step in implementing that, designed to reduce the number of users they have to ask for harder age checks.


> So, is there a reason they could be doing this?

Ummm... like, business?


I would be more surprised if a social app wasn't doing this


I would assume the same. Even if just to address issues with underage users.


As long as they are accurate, I don't mind. I was annoyed that, as a woman who is also an engineer, Google's ad profile always insisted I was a single man, despite my many skincare, makeup, and women's clothing/shoe purchases. I guess they weighted reference schematic and datasheet searches more heavily than other things. If you are going to show me targeted ads, please make them ads I want to see.


I think that in Europe, that will soon be prohibited without declaration and explicit consent.


If that's true, it's pretty ridiculous. I wish legislation would get at the core issue, which is not processing, but that collection happens at all. Normalizing surveillance, but asking companies to please not do anything with that data for minors, is such a far cry from what we deserve.


But Discord must store the messages, to make searching possible. And I assume the model works based on the writing style, topics discussed, etc. What would you propose, automatically remove all messages after a few weeks? Even then Discord technically can process the messages and build the same model. So yeah, in this case the processing is the problem since I don't want Discord to know my gender and age (unless I explicitly tell it).


> What would you propose, automatically remove all messages after a few weeks?

This but unironically.

If people or bots or the NSA want to keep records, let them scrape and cache 'em.

I actually think it should all be P2P, but that would kill the thin shell of a business model that exists.


I would hate such a model; having persistent history is one of the killer features of Discord vs, say, IRC.


I dunno, the search on Discord is trash. The one time I thought it would be useful to go back and find something, it was completely incapable.


This is not my experience, I use the search heavily and find it an invaluable tool to prevent asking repetitive questions.


At least under GDPR, collecting personal data is already considered processing. Processing is only allowed under specific conditions and purpose.

It would be perfectly fine if Discord stored messages from users (for brevity, assuming that's personal data), made it searchable and available as long as the user exists on the platform, provided that they specify this as the purpose of processing, and the user explicitly gave informed consent. Should Discord want to perform any other processing besides that designated purpose, they would (again) have to receive the user's explicit, informed consent. Otherwise, it's simply not allowed.

No need to remove messages in such a case. Of course, technically they could process that data for a different purpose than specified, but that's illegal and may eventually result in a fine that can be based on a percentage of total global revenue.

Not profit or loss, revenue. Then suddenly it becomes a really interesting business decision to do other than defined things with the data at hand.

(and something something purpose limitation, data minimization etc.)


If you suspected you had underage users using your service illicitly. Could you not use such methods to find them?

I wonder at what point analytics, queries, or ML to determine age gets in the way of actually keeping them from getting into things they shouldn't...

Under age users aren't going to just tell you their age honestly, so you'd have to use some method to derive their age.


> If you suspected you had underage users using your service illicitly. Could you not use such methods to find them?

You could use something like that and it would be counted as “legitimate interest” under GDPR, but:

- you wouldn't need to (and would therefore be forbidden to) segment users into more categories than “too young”/“old enough”. This is not what they are doing here.

- you'd still need to tell the users that you're doing it, and provide a way for the users to modify that data upon request

> Under age users aren't going to just tell you their age honestly, so you'd have to use some method to derive their age.

You don't have a legitimate interest of knowing their age, only for knowing if they are underage or not.
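A sketch of the data-minimization point: derive only a yes/no flag and never retain the age estimate. The threshold of 13 is an assumption (it is the minimum age in Discord's terms of service, but the age that matters legally varies by jurisdiction). `AgeGate` and `to_age_gate` are hypothetical names, not anything Discord is known to use:

```python
from dataclasses import dataclass

# Assumed threshold; the legally relevant age depends on jurisdiction.
MINIMUM_AGE = 13


@dataclass(frozen=True)
class AgeGate:
    """The only record kept about a user's age: a single yes/no flag."""
    user_id: str
    old_enough: bool


def to_age_gate(user_id: str, estimated_age: float) -> AgeGate:
    """Collapse a (hypothetical) model's age estimate into one boolean.

    The estimate itself is neither returned nor stored, so no finer age
    bracket (18-24, 25-34, ...) can be derived from what is retained.
    """
    return AgeGate(user_id=user_id, old_enough=estimated_age >= MINIMUM_AGE)


records = [to_age_gate("u1", 11.5), to_age_gate("u2", 27.0)]
```

This is the "too young"/"old enough" split from the comment. An advertising demographic would need the finer brackets that this design throws away.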


Under the DSA's requirements to protect children, I think it will be basically mandatory to do this.

In the UK it already is under the Online Safety Bill.


I wonder if Discord may end up using this in some capacity to determine whether someone may be under the age of 13, which is the minimum age you need to be in order to have an account according to their terms of service.


. . . with the added bonus of culling folks who are over 13 but can't act like it.


Social networks for 13-year-olds, enforced by ML. This is genius. It’s like IRL groups who recognize each other’s age by looking at each other.


The beginning of the enshittification of Discord (while 100% expected) for some reason hits harder than with any other service I've used throughout all these years. It has entirely replaced social media for me. It just felt more organic to me than anything else.

So... since I've heard about the ads coming to discord, I have looked into alternatives.

They do exist, in varying quality, and there are programs for some of them that make a "bridge" from a Discord server to your new platform. It's possible to have your cake and eat it too.

Matrix looks really close, but it needs a lot of love. I have high hopes for Matrix because of the idea behind it. Perhaps my lackluster experience was because I didn't choose the best client available. I'm still trying new things here.

https://matrix.org/

Revolt is the more complete product as of today.

https://revolt.chat/

Spacebar I have yet to try but it's on my radar.

https://github.com/spacebarchat

I just wanted to see if anyone else had other recommendations for me to try. I really would appreciate help here. Hopefully something open source and self hosted so I can stop migrating and sink some roots in somewhere.

Oh crap, is it IRC?


Been following Mikoto (https://github.com/mikotoIO/mikoto), which seems like a Discord replacement on steroids.


I'm just happy they resisted the buyout by Microsoft.

> Matrix looks really close, but it needs ALOT of love.

Matrix is just a protocol. Clients competing with Discord would be things like Element. https://element.io/

It's no Discord (which itself is no WLM, by the way -- rip to oldschool fat clients which had even less of a business model and whose features you can't have at all or have to pay for now), but I sorely hope it or something open source gets there one day.


I'm a big fan of Matrix, and run a small homeserver for my family and friends. But if you really want to explore the frontiers, peer to peer seems really intriguing because you don't need any server. https://tox.chat/ just to name one.


Quiet, Briar, and Keet are some other p2p chats


Revolt's UI looks super nice, I just wish it was connected to Matrix on the backend so I could use it with all the communities I participate in on matrix.


I understand and sympathise, but Discord is one service I just stayed away from because I could see the enshittification coming a mile off. And I'm disappointed every time I see an open source project house its community on Discord. I hoped that people would learn from the painful experience with Reddit's enshittification but apparently not.


I've been hosting a Mattermost server as an invite-only chat board for about six years. Slack is the direct comp. Mattermost is not immune to being enshittified either, but at least I get some warning: I can read the patch notes before updating the server, and when I see they've turned fully evil, I just won't upgrade the server, and we can start looking for an escape route before the clients break.

Since I own the server, I have some confidence I'm the only person who can possibly read the user data too. That is to say, I know that I won't do it, whereas I believe Discord and Slack would.


Discord last month announced they would be adding advertising to their app.

https://news.ycombinator.com/item?id=39903541

This would make sense with regards to this. It's essential data to be able to sell ads to companies.

If you are an open source or community project on the platform you should probably set up a Libre alternative...


Can you read your own Discord stats, or do you have to be an advertiser to get this info?


you can get this data from a compliance export


Well... This might hit hard a large slice of legally adult users who are still mentally children... Some might even trigger scandals, given that they hold significant positions in the social pyramid...


This is really smart. You can get around privacy laws and still derive useful data without having to ask the user personal information.


If someone asks, i'm 16/f/cali!


I actually was that at one point, but I would tell people 53/eunuch/Afghanistan. It did a pretty good job of culling the people I didn't want to talk to and keeping the people I did want to talk to.


your inbox must be an interesting place


does anyone know the name of the monospaced font being used in this screenshot


Later replies say it's Miracode; https://github.com/IdreesInc/Miracode


more surprised that they made it this obvious tbh, somebody is probably getting shafted for not obfuscating this


Probably down to the developers' literal reading of, and desire to comply with GDPR-like laws.

Or, something similar is in various companies' data takeout packages, but nobody's intently looking for it or thinks it's newsworthy. Personally, the only time I've looked through one was from an amazon account because I couldn't find where on the website to read past customer support chats.


Discord is also using ML to determine the topic of conversation in a voice channel and share it with potential listeners, so they can decide whether to join the conversation.

Frankly, I find that unacceptable: if no one is in the channel except the person you’re talking with, you wouldn’t expect randoms to be pseudo-listening in.

Edit: Not certain why I've been downvoted, I can easily provide proof in the form of screenshots


i know the feature you mean and... it only sends it to people who are in that server. i dislike it because an ML model is looking at my text and trying to regurgitate it, but there's no expectation that the people who get those notifications were never able to see it. people lurk all the time, and discord messages aren't ephemeral


Yeah that’s true, but if they’re not there. They're not there. The content of the conversation isn’t meant for them.


Upvoted, I'd like to see screenshots.



Thanks. Do you know if it works the same in desktop app + web browser?


And another privacy violation to be prosecuted under GDPR: go NOYB, go!


y'know, the GDPR has included a Right to Be Forgotten since 2018, and yet Discord offers no way to delete your messages in bulk (instead they're pseudonymized upon account deletion)


I'm not sure about this but I think pseudonymization is fine from GDPR perspective, you are effectively forgotten, but not your writings.


I've never put any stock in this point, your writings cannot be made distinct from you.

How many messages do you need before you accrue enough personally identifying information to reveal who that person was? Every time they mention where they live or how old they are, what happens to that?

None of that goes away if you only soft delete the user's profile, leaving all the content and context for anyone to rebuild at their leisure.


Pretty sure the original account id is still tied to the message and sent to clients even after deletion. People can easily correlate it if they can prove just one of the messages belonged to you.
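Why pseudonymization doesn't break the link: if every message by an account carries the same stable pseudonym, attributing one message attributes them all. Below is a minimal Python sketch. It assumes an HMAC-SHA256 pseudonym, and the key, account ids, and messages are all invented. How Discord actually handles deleted accounts isn't established here. Per the comment above, it may simply keep the original id, which links messages even more directly:

```python
import hashlib
import hmac

# Hypothetical server-side key; any stable pseudonym scheme behaves the same way here.
SECRET_KEY = b"hypothetical-server-side-key"


def pseudonymize(account_id: str) -> str:
    """Replace an account id with a stable keyed hash (HMAC-SHA256, truncated)."""
    return hmac.new(SECRET_KEY, account_id.encode(), hashlib.sha256).hexdigest()[:12]


# Invented messages; the first two belong to the same (deleted) account.
messages = [
    ("1234", "I moved to Portland last year"),
    ("1234", "just turned 30 today"),
    ("5678", "anyone up for a raid?"),
]
published = [(pseudonymize(author), text) for author, text in messages]


def linked_messages(published, known_text):
    """Given one message someone can attribute to you, return every message sharing its pseudonym."""
    pseudonym = next(p for p, text in published if text == known_text)
    return [text for p, text in published if p == pseudonym]
```

Proving authorship of "just turned 30 today" exposes the location message too. Every additional personal detail in the history makes re-identification easier, which is the point made about soft-deleting only the profile.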


...but why?


Moneys.


didyoujustassumemygender.png

The data analytics work, that's why companies use them.


Since discord doesn't serve ads, I assume this is just for data collection that then gets sold? Like an analytics company?


They're increasingly interested in ads[1] because, somehow, $600M revenue doesn't make them profitable.

1. https://arstechnica.com/gadgets/2024/04/discord-starts-down-...


The commentary here shows how ignorant people are about things like mandatory reporting and protecting underage children from online predators. Discord is most assuredly a vector for attack, and someone pretending to be some other age or being under age is very much a real issue that is not uncommon on even smaller social networks.

While I'm sure some of this is related to advertising and marketing, the simple fact is that any social network (and Discord counts as one) will already be doing things similar to this, even if not for advertising. If not, they are willfully ignorant.

This isn't some "pretend" problem either. Anyone suggesting otherwise is ignorant. It's that simple.


If they really gave a shit, then why didn't they just ask for age on sign up? Or require people to enter that information to keep their account open? Why use some hocus pocus shit to try to guess people's ages?


they do and people lie. regardless, i strongly dislike the idea of an ML model being used to try to guess this. can't wait for one to hallucinate that i'm 10


People lie! Once again, ignorance rears its head. This isn't some hocus pocus shit. This is stuff that actually works. Not doing this at this point is practically being complicit in child trafficking. If you think this is hyperbole, you are ignorant and need to stand back and listen and learn.

Hocus pocus? Seriously, this is programming 101. Talk to a programmer, it's not complicated.


In this thread, people don't understand the size and scope of the Pig Butchering problem or Sexploitation scam problem or the Underage User problem or the Revenge Reporting problem or how ML can be used for the user's benefit and safety or for anti-crime. But you know, go on and assume any and all technology is used for $$$$ and never to you know, stop grandmas from losing their life savings or kids from being talked to by pedophiles.



