
I remember seeing some machine learning algorithm posted here a while back that did an amazing job fingerprinting writing samples. It could use that fingerprint to match up accounts from multiple different sites. Some people here had it correctly find their Reddit profile based off nothing but their public HN data.

If you are writing extensively both anonymously and non-anonymously, you should probably assume that someone motivated enough could match the two together either presently or in the near future as such technology becomes more widespread.




My name here is my name on Reddit is my name on GitHub is my name in real life. I’ve never believed it was possible to be totally anonymous on someone else’s server, so it’s good to have the reminder that I’m absolutely not every time I post anything.


This evidence won't hold up with the general public. You can't go on Twitter, say that the writing styles of these two people are 98% similar per the state-of-the-art SGERT model, and convince the public with that. It's also trivial for a doxing target to dismiss this evidence as another flaky piece of software that's confusing writing styles. This kind of software is useful, but for more specialized purposes: forensics, intelligence.


You don't need any evidence at all to convince the online public of anything it wants to believe, especially if it gives it a target for vitriol. I have to agree with my sibling commenter that believing otherwise is naive given the wealth of evidence to the contrary.


I'd really hate to ruin your naïveté...but sufficiently riled-up mobs have gone after individuals for much, much flimsier evidence than "because this AI says so".


In this case made-up evidence would suffice.


After they're identified, keen humans will go looking for stronger clues. Maybe the different accounts told the same anecdotes or show they have the same set of opinions or knowledge or the posting times are similar but never coincident or whatever other human-readable evidence.


Yes. The time of day the user posts can reveal their time zone. People often leave comments that reveal their age or gender. If the user mentions a business or product name, it might only be available in certain parts of the world. Many people reuse account names on other services and their content there may have more clues... etc.
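
To make the time zone point concrete, here's a rough sketch in Python. Everything in it is an assumption for illustration, not how any real tool works: the function name `guess_utc_offset`, the idea that the quietest 8-hour stretch of posting is sleep, and the guess that sleep starts around 23:00 local time.

```python
from collections import Counter

def guess_utc_offset(utc_hours, sleep_len=8, assumed_local_sleep_start=23):
    """Guess a UTC offset from the hours (0-23, in UTC) at which someone posts.

    Assumption: the quietest sleep_len-hour window is when the person sleeps,
    and that window starts at assumed_local_sleep_start in their local time.
    """
    counts = Counter(h % 24 for h in utc_hours)

    def posts_in_window(start):
        # Number of posts in the window starting at `start`, wrapping past midnight.
        return sum(counts[(start + i) % 24] for i in range(sleep_len))

    # UTC hour at which the quietest window begins (first one wins on ties).
    quiet_start = min(range(24), key=posts_in_window)

    # local = utc + offset, so offset = local - utc (mod 24).
    offset = (assumed_local_sleep_start - quiet_start) % 24
    # Fold into the range -11..+12.
    return offset - 24 if offset > 12 else offset
```

Someone who only posts between 07:00 and 22:00 local time on the US west coast would show a quiet window starting at 07:00 UTC, which this maps to an offset of -8. Night owls, shift workers and scheduled posts break the assumption, so treat it as one weak clue among many.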


> some machine learning algorithm

A friend of mine asked GPT-3 to mimic a text written by "nindalf". It was so good that I thought it had plagiarised something I had actually written on HN. But it was only mimicking the style of my comments.


I purposefully adopt different writing styles and spellings and misspellings on different platforms to thwart this.


Can you find that?


I don't know the link but it probably used features from a field called stylometry.

https://en.m.wikipedia.org/wiki/Stylometry

It's been used to discover the author of disputed novels from the past.
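
For a flavor of what a stylometric feature looks like, here's a minimal sketch in Python. It is not the tool from that post. It assumes a tiny, made-up list of function words (`FUNCTION_WORDS`) and compares two texts by the cosine similarity of how often they use each one; real systems use far more features and a lot more text.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny feature set; chosen only for illustration.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but", "so", "i"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity(text_a, text_b):
    """Score from 0.0 to 1.0; higher means more similar function-word usage."""
    return cosine_similarity(profile(text_a), profile(text_b))
```

With short samples, almost any two English texts will score high on a feature set this small, which is one plausible reason a naive similarity score can look impressive while telling you very little.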



Yeah, I tried that and it completely struck out. Gave a "similarity score" of .992 or .993 for ten other accounts that weren't mine. Detected a big fat zero of my old accounts (I rotate them regularly).

AI is hype deep-fried in hypesauce.


Did not work for me either.

We might be the edge cases though and in general it works. In the way autonomous driving works in general, just not in unexpected situations ...


> In the way autonomous driving works in general, just not in unexpected situations ...

I see what you did there :)


That wasn't the one I was referencing. I was talking about a post that was at least a couple years old. I can't seem to find it at the moment.


ah okay I remember that, it was hugged to death when it was trending, nice to be able to check it out


The author almost immediately took the site down and packaged it as part of some social media analysis tool. I can't seem to find the actual post at the moment.



