What's particularly insidious about a lot of these link shorteners is their use of non-semantic redirects. That is, redirects implemented not with an HTTP Location: header but with things like meta http-equiv="Refresh". I assume this is done to allow these pages to be loaded with tracking scripts.
Of course this is a completely broken way to implement a link shortener since it won't work with non-browser tools such as curl. I tried a t.co URL with curl and it returns a Location: header, which means they're doing user agent sniffing. If you need to use user agent sniffing to make something practical, it's generally a good sign you shouldn't be doing that thing.
$ curl -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Safari/605.1.15" https://t.co/88MpPkUoJg
I assume it’s to remove the t.co page from the browser history, which of course is not relevant or useful for curl. There’s nothing in that response that looks malicious.
They already return different results based on the User-Agent header; they could just as easily return different results based on other HTTP headers, the client's IP address, etc.
Arguments that implicitly assume everyone receives the same data from a server are frighteningly common. This is extra strange when it happens on forums like HN that also regularly assume the same server might be A/B testing or providing "targeted" advertising, or prices, that are unique for most users.
Any discussion about data from an unknown server should include some sort of checksum. Without verifying that everyone is receiving the same data, statements about a server's responses don't mean much.
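For example, two observers could each publish a digest of the exact bytes they received and compare; a sketch (SHA-256 truncated to 16 hex characters is an arbitrary choice here):

```python
import hashlib

def response_fingerprint(body: bytes) -> str:
    """Return a short hex digest of a response body so that two
    observers can cheaply check whether they received identical bytes."""
    return hashlib.sha256(body).hexdigest()[:16]
```

If the two digests differ, the server (or something in between) served different content to the two observers.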
Couldn't any site be sending different results based on any header? I guess I don't get how "they could easily be returning different results based on other HTTP headers, IP headers, etc" doesn't apply to literally every site
As others have pointed out, the same thing can be accomplished with an HTTP redirect. The only purpose this kind of intermediate page serves is to hide the original HTTP Referer field and make it look like the request came from t.co. This ensures that only Twitter knows which tweet someone came from.
Yup. The RFC is literally full of phrases like "status code indicates that the target resource has been assigned a new permanent URI and any future references to this resource ought to use one of the enclosed URIs. Clients with link-editing capabilities ought to automatically re-link references to the effective request URI to one or more of the new references sent by the server, where possible." (emphasis mine).
What bothers me, as someone who works with web standards, is that these URL tracking services weren't fundamentally rejected: they exist only for tracking and are completely unnecessary.
Additionally, they're not actually part of web technology due to Twitter's ToS...
I run a web crawling company (http://www.datastreamer.io/) and we license data to other companies based on what we crawl.
This really opens up some weird situations for us...
If a URL is copied and shared OUTSIDE of Twitter but behind a t.co URL, you can't access it without agreeing to their ToS, even though the link might be to the nytimes or some other service.
I was initially upset about the GDPR but I'm starting to see the light of day here.
You can't have your cake and eat it too. You can't both be on the Internet but then put up an insane ToS claiming you have rights that restrain Internet users.
It's like standing on the street corner and yelling and then saying everyone around you owes you royalties because they're hearing your copyrighted speech.
They might be unnecessary in this case, but not always. For example, I used to work with e-learning materials that linked out to other materials, which might change or be on services not under our control. Being able to manage the link endpoint without having to republish the materials is a big win for time/effort, and sometimes republishing just isn't possible.
As an end-user of similar types of materials: no, I will emphatically state that using a link shortener makes the material worse. If you don't update the link and it points to a broken page, at least the URL normally has enough information for me to Google the underlying material. A shortened link loses all of that context.
> You can't have your cake and eat it too. You can't both be on the Internet but then put up an insane ToS claiming you have rights that restrain Internet users.
Can you explain that further? You pretty much have to have a ToS if you don't want to get sued to death for running any moderately sized website. The WWW is not a complete free-for-all.
I just wish these awful link shorteners/trackers were faster; on lower-end network connections you have to sit there staring at "waiting for t.co" for two or three seconds before you actually get the link you want.
I think the http-equiv="Refresh" redirect is done so that the HTTP Referer header comes from t.co, and not twitter.com (or wherever the user clicked the link from).
(I don't think rel='noreferrer' is fully supported by all browsers)
I remember reading a HN thread a few years ago where this was suggested as the cheapest way to create short links. Instead of running a server with routes on it, you just need to generate 1 static page with this meta tag per each link, then it's always there. Could it be that Twitter folks were simply trying to be efficient?
Um, they have to run a lookup to find the 'value' for the given 'key' regardless...? I cannot think of any positive value for the user here -- it's non-standard & slower. 3XX redirects have been around a long time and basically every single client out there knows how to use them, and those that don't can look the status codes up to see how they should handle them if they want to.
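For comparison, the entire server-side logic of a standards-based shortener is a lookup plus a 301. A sketch (the table contents are made up; a real service would back this with a datastore):

```python
# Hypothetical lookup table mapping short codes to destinations.
SHORT_LINKS = {"88MpPkUoJg": "https://example.com/article"}

def resolve(code: str) -> tuple[int, dict[str, str]]:
    """Return the status code and headers a standards-based shortener
    would send: a 301 with a Location header, or a 404."""
    target = SHORT_LINKS.get(code)
    if target is None:
        return 404, {}
    return 301, {"Location": target}
```

Every HTTP client ever written already knows what to do with that response, no user agent sniffing required.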
AFAICT this is purely to allow for the 'pseudo injection' of the third-party JS, presumably for tracking purposes...
Only question I'd have is why they can't read the cookie server-side instead, but I'm guessing there are cookies on other domains that their JS is looking for? Haven't done web stuff in a few years so I'm behind on CORS-ish pros/cons/knowledge.
> I'm guessing there are cookies on other domains that their JS is looking for?
Nah, the browser doesn't let you do that. This SO answer suggests it's to pass the Referrer header so that the destination site knows the user came from Twitter:
I really hate it when websites use shortened links instead of real ones. Twitter’s not the only website that does this; everything from Google to Discourse seems to be doing this these days. Not only is this horrible for privacy, it also makes copying links really annoying.
This is the theory, but I've never seen it in action. Even those cryptocurrency scam bots that reply to high profile accounts with wallet stealing sites have their links working for hours / days.
You don't need to do a giant grep. You could put all links in a database, which contains pointers to where the links are used. Then if you want to delete a link, you can delete all those places.
This seems equal or less effort than making a url shortener.
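The reverse index described above is tiny to sketch (names here are made up for illustration):

```python
from collections import defaultdict

# Hypothetical index: target URL -> set of tweet IDs that contain it.
link_index: defaultdict[str, set[str]] = defaultdict(set)

def record_link(url: str, tweet_id: str) -> None:
    """Remember where a URL was used, at the time the tweet is posted."""
    link_index[url].add(tweet_id)

def takedown(url: str) -> set[str]:
    """Return (and forget) every tweet that used a now-malicious URL,
    so those places can be edited or removed directly."""
    return link_index.pop(url, set())
```

No redirect hop is involved at click time; the bookkeeping all happens at write time and takedown time.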
Nope, I mean Discourse, which does some sort of stupid link tracking so that it can display a count of how many people have clicked on a link. In doing so, they somehow break Force Touch in Safari, which makes it doubly annoying.
Ah, weird. I never noticed until I used Firefox Developer Tools and saw the 302 redirect. When you hover over URLs it works normally, but when you click, I guess JS hijacks everything... Does it do this on any Discourse-hosted forum with their own URL tracker, or is it just part of the forum software itself? I wouldn't want them tracking all my users' URL traffic; in that case I wouldn't want to use this forum script at all. I often wondered how they track the clicks; it seems to me it would have been easier to just increment on click: do a POST, and once it's fulfilled, redirect the user with JS.
I have a Firefox plugin that removes the tracked URLs, this must be a Chrome only thing, and considering they track you as it is, I'm not surprised if they do something special for Chrome to hide the tracking internally (who knows what they bundle with their proprietary rendition of Chrome?).
It's called "Google search link fix" and comes from Wladimir Palant. I use it on Firefox. AFAIK it works with multiple search engines, despite the name.
In Firefox, when I hover over the links in Google results I see the original URL in the tooltip at the bottom left, but when I copy the address I get the redirect link. Then, when I go back and hover over the link, the URL shown in the tooltip has also changed to the redirect link.
When I copy it I get a lengthened link. It has the destination URL in a query string, along with a "ved" value which includes a bunch of information about the link that you clicked on:
If you're looking in your browser's status bar when you hover the link, Google is manually displaying the end destination URL. The link doesn't actually go there directly.
This is using Firefox while not signed in to a Google account. If you're really getting direct links, perhaps if you're signed in or using Chrome they give you real links and track you by other means instead.
I forgot why we even needed URL shortening until I remembered I used them specifically for Twitter, due to the character limits. It's odd that people here are surprised by the analytics and tracking behavior used by t.co links. Bit.ly is another example of this, and they have quite an extensive data science team devoted to it. That being said, bit.ly does use a standard HTTP redirect.
I remember the first URL shorteners I found; the primary purpose seemed to be to make life easier to each other when chatting on IMs, as long and crazy URLs (like Google's) tend to break in chat windows. But then someone added tracking clicks. Suddenly, people would send shortened URLs just to track how many people visited the resource. And then adtech took off and everything went to shit. I don't use link shorteners anymore.
> It's odd that people here are surprised by the analytics and tracking behavior used by t.co links.
For EU citizens, collecting data requires either a very strong reason (like not being able to operate the service otherwise) or opt-in consent.
You can absolutely operate a URL shortening service without massive data collection, which means they need opt-in for data storage from every EU citizen clicking on such a link; otherwise they are in huge trouble with the GDPR.
So yes, I can absolutely be surprised that they don't seem to care about the law.
Link shorteners in general are still handy, especially when communicating the link in a medium where you can't click on the URL: TV, radio, saying the link in a YouTube video (though that does have the description), saying the link during a live stream (though you have chat), and for just pure branding.
How so? By shortening the link, you're hiding where the link goes to. bit.ly/12345 could go to amazon.com or big-scam-with-a-virus.com, and until you click on it you'd never know.
With bit.ly specifically, add a "+" at the end of the url to see what it points to. It also shows you some stats like creation date and number of clicks over time.
It's useful to know the domain of the link before you click because some people might not want to navigate to unknown sites at work, or at least don't want to navigate to certain sites at work (Facebook, Instagram, YouTube, pornhub, etc, etc.)
You don't need a link shortening service for that. The website and API can just start changing the URL it includes in the tweet if it determines the original URL is a scam.
They can redirect you anywhere. They can also rewrite anything in the URL, like add affiliate IDs or whatever. I'm sure some of them do that, because why not.
That's a decent point about email, but there is nothing they're doing on the website that couldn't be done without a link shortener. And even within the context of email it doesn't really make sense, because email clients can just do the same thing without rewriting the URL.
Every time a link is clicked, send an event to the server with the URL so that it can be tracked. If the URL is already known to be malicious when the page is generated, either don't include the URL or use javascript to intercept the click event and display the interstitial. If links need to be checked for validity at the moment the user clicks them, then just wait for the 200 response and do the same thing, the performance would be identical either way.
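The server-side bookkeeping for that approach is tiny; a sketch (the blocklist contents and function names are made up):

```python
from collections import Counter

# Hypothetical blocklist of known-malicious destinations.
MALICIOUS = {"https://big-scam-with-a-virus.com/"}
click_counts: Counter[str] = Counter()

def handle_click_event(url: str) -> str:
    """Record a click reported by the page's JS, then tell the client
    whether to navigate directly or show a warning interstitial."""
    click_counts[url] += 1
    return "interstitial" if url in MALICIOUS else "navigate"
```

The URL in the page stays real and copyable; only the click event, not the navigation, goes through the tracking endpoint.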
> And you think running that type of JS on the page is more secure than a simple redirect?
It's not more secure, but it's not less secure, and it doesn't break the web. It also shouldn't add an appreciable amount of complexity, given that most of the heavy lifting to sanitize, parse, and format UGC already happens on the server. E.g. if you're already turning UGC snippets into an AST on the server so that you can cleanly syndicate them in different formats, having the AST generate some JS around URLs isn't a big lift.
> I still don’t understand why you think url shorteners break the web.
How do you know where the links resolve to once FB goes out of business?
Given the fact that there are still lots of people whose entire job is translating 6,000-year-old grocery receipts from Sumeria, it's not at all unlikely that tweets being written today will still be widely studied and considered important 10,000 years from now. But those short links are unlikely to resolve for even the next 20 years.
Also, adding js should no longer add more attack surface now that we have things like subresource integrity in addition to CSPs.
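Subresource integrity works by pinning a script to a digest of its exact bytes; the value for the HTML integrity attribute can be generated like this (a sketch using SHA-384, the algorithm commonly used for SRI):

```python
import base64
import hashlib

def sri_hash(script_bytes: bytes) -> str:
    """Compute a value for an HTML integrity="..." attribute:
    base64-encoded SHA-384 of the resource, prefixed with the algorithm."""
    digest = hashlib.sha384(script_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")
```

If the fetched script's bytes don't hash to this value, the browser refuses to execute it, which is what limits the added attack surface.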
TL;DR: clicking on their shortener can trigger a just-in-time malware scan; they can retroactively block links already sent to people; they can strip away the Referer; and they can inject their own analytics.
That sounds like the same authoritarian justification for hiding URLs in browsers and such --- "we'll tell you if it's safe, you don't need to know"...
It's not like you can't see the original URL and manually skip the redirect if you wanted to. It's just that most users won't do that which limits the ROI of spam and phishing campaigns.
Is there a way to disable Twitter's awful auto-linking behavior? It's extremely annoying to have an example or templated URL become a shortened link[0].
> "claimed that it was technically within the company’s aim to determine someone’s approximate location"
What does this even mean? It's a weirdly formatted sentence that makes it sound like Twitter has the magical capability of determining your location... just like everyone else on the internet can with a geoip database.
It strikes me that companies are being squeezed from both ends by government. On one hand they are getting lambasted for too much data collection. On the other they are being sued because they don't collect enough data, as in the case of Apple not being able to unlock an iPhone, for example.
Different data and different governments? They're not really the same issue at all, there's no government level campaign for privacy in the US that corresponds to the EU approach.