Sorry, I don’t buy it. Hundreds of millions of people use Twitter, and we are to understand that there are enough people scraping that they had to suddenly take drastic action by shuttering unauthenticated access? Any dev would have told him that those supposedly scraping could simply set up Selenium or some other headless browser to log in before scraping.
This smells of another failed Musk experiment at twiddling with the knobs to increase engagement, to me.
A bot scraping content will tend to go deep into the archives and hit all content systematically. Caching isn't as effective if you hit everything, whereas real users will tend to hit the same content over and over again.
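To put rough numbers on the caching point, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the catalogue size, the cache size, the 1/rank popularity curve for "real users", and the LRU eviction policy. None of it reflects Twitter's actual caching setup.

```python
import random
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Return the fraction of requests served from an LRU cache of the given size."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for key in requests:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / total if total else 0.0


# Hypothetical sizes, chosen only to keep the example fast.
N_ITEMS = 100_000
CACHE_SIZE = 1_000
N_REQUESTS = 100_000

rng = random.Random(0)

# Assumed "real user" traffic: popularity falls off as 1/rank,
# so a small set of hot items gets most of the requests.
weights = [1.0 / rank for rank in range(1, N_ITEMS + 1)]
user_requests = rng.choices(range(N_ITEMS), weights=weights, k=N_REQUESTS)

# Assumed crawler traffic: walks the whole catalogue once, in order.
crawler_requests = range(N_REQUESTS)

user_rate = lru_hit_rate(user_requests, CACHE_SIZE)
crawler_rate = lru_hit_rate(crawler_requests, CACHE_SIZE)

print(f"user hit rate:    {user_rate:.2%}")
print(f"crawler hit rate: {crawler_rate:.2%}")
```

The crawler never asks for the same item twice, so every one of its requests misses the cache and falls through to the backend, while the skewed user traffic keeps hitting the same hot entries.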
You don't generally need to accept licenses in order to scrape something, only if you want to distribute it.
The legal ambiguity comes from the question of whether LLM outputs are a derivative work of the training data. I expect that they aren't, but anything can happen.
> Hundreds of millions of people use Twitter, and we are to understand that there are enough people scraping that they had to suddenly take drastic action by shuttering unauthenticated access
Suppose 1 million people are accessing Twitter at any given time. An actual person might only be making 1 request / second. That's 1 million requests / second.
Suppose there are 100 AI companies scraping Twitter. A bot like this can make thousands to tens of thousands of requests per second. At the upper end of that range, that's an additional million requests / second.
There are probably more than 100 "AI" companies now, trying to train their own bespoke LLMs. They're popping up like weeds, so I can totally see Twitter's load doubling or tripling recently. So sorry, I just don't get the skepticism. Sure, it could be a cover for something else, but his actual stated reason seems totally possible.
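The back-of-envelope numbers above can be written out as a quick calculation. All inputs are the guesses from this comment (1 million concurrent users, 1 request per second each, 100 scrapers at 1,000 to 10,000 requests per second), not measured figures.

```python
# Guessed inputs from the comment above; none of these are measured values.
concurrent_users = 1_000_000
requests_per_user_per_sec = 1
scraping_companies = 100
bot_rps_low, bot_rps_high = 1_000, 10_000

human_load = concurrent_users * requests_per_user_per_sec
bot_load_low = scraping_companies * bot_rps_low
bot_load_high = scraping_companies * bot_rps_high

# Total load relative to human-only traffic.
multiplier_low = (human_load + bot_load_low) / human_load
multiplier_high = (human_load + bot_load_high) / human_load

# Scrapers needed at the high rate for total load to reach 3x human traffic.
companies_for_triple = 2 * human_load // bot_rps_high

print(f"human load: {human_load:,} req/s")
print(f"bot load:   {bot_load_low:,} to {bot_load_high:,} req/s")
print(f"total load: {multiplier_low:.1f}x to {multiplier_high:.1f}x human traffic")
print(f"scrapers needed for 3x at the high rate: {companies_for_triple}")
```

Under these guesses, 100 scrapers only double the load if every one of them runs at the top of the range; tripling needs roughly twice as many.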
It’s not a little “use selenium” switch you can click, but it absolutely is an option (and there are others) if the barrier is simply to have an authenticated account and be logged in.
If these data scraping operations are as sophisticated and determined as he claims, this measure is insufficient, and it actually hurts Twitter far more than it helps. Case in point: we stopped sharing Twitter links because when you click them in most iOS apps, it opens an unauthenticated web view and presents you with a login screen. So we just collectively decided “ah ok no sharing Twitter” and moved on.
I’m sure there are companies scraping Twitter. I just don’t buy that it’s as big of an issue as he claims it is, or that preventing people from viewing tweets without logging in is a way to mitigate it (personally, I’d first look at banning problematic IP addresses).
To me it’s either:
1) a very poor and very temporary mitigation against scraping that could be bypassed with a bit of effort
2) an experiment in optimising metrics - Musk sees lots of unauthenticated users consuming Twitter, tries to steer them into signing up
3) it’s all just a big mistake
Option #2 makes the most sense to me, but frankly none of them are good.