Twitter is being investigated over data collection in its link-shortening system (fortune.com)
243 points by Korosh on Oct 14, 2018 | 115 comments



What's particularly insidious about a lot of these link shorteners is the use of non-semantic redirects. That is, redirects which are not based on HTTP Location: headers but things like meta http-equiv="Refresh". I assume this is done to allow these pages to be loaded with tracking scripts.

Of course this is a completely broken way to implement a link shortener since it won't work with non-browser tools such as curl. I tried a t.co URL with curl and it returns a Location: header, which means they're doing user agent sniffing. If you need to use user agent sniffing to make something practical, it's generally a good sign you shouldn't be doing that thing.


You are correct:

$ curl -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Safari/605.1.15" https://t.co/88MpPkUoJg

  <head><noscript><META http-equiv="refresh" content="0;URL=https://bbc.in/2yDY0F5"></noscript><title>https://bbc.in/2yDY0F5</title></head><script>window.opener = null; location.replace("https:\/\/bbc.in\/2yDY0F5")</script>
I had no idea they were doing it that way. How gross.
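For contrast, plain curl (default User-Agent, no -A) appears to get a standard redirect instead; roughly:

    $ curl -sI https://t.co/88MpPkUoJg
    HTTP/1.1 301 Moved Permanently
    location: https://bbc.in/2yDY0F5

(Exact status code and headers may differ; the point is that non-browser clients get a Location: header.)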


I assume it’s to remove the t.co page from the browser history, which of course is not relevant or useful for curl. There’s nothing in that response that looks malicious.


They already return different results based on the user agent header; they could easily be returning different results based on other HTTP headers, IP headers, etc.

Arguments that implicitly assume everyone receives the same data from a server are frighteningly common. This is extra strange when it happens on forums like HN that also regularly assume the same server might be A/B testing or providing "targeted" advertising - or prices - that is unique for most users.

Any discussion about data from an unknown server should always include some sort of checksum. Without verification everyone is receiving the same data, statements about a server's responses don't mean much.


Couldn't any site be sending different results based on any header? I guess I don't get how "they could easily be returning different results based on other HTTP headers, IP headers, etc" doesn't apply to literally every site


As others have pointed out, the same thing can be accomplished using an HTTP redirect. The only purpose this kind of intermediate page has is to hide the HTTP Referer field and make it look like the request is coming from t.co. This ensures that only Twitter knows which tweet someone was coming from.


Of course for that there is also a standard-compliant way: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Re...
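A rough sketch of what that looks like, either as a response header or in markup (the "origin" value here is just illustrative):

    Referrer-Policy: origin

    <meta name="referrer" content="origin">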


That doesn't work for part of Twitter's audience, e.g. Edge users.


A normal HTTP redirect would accomplish the same thing.


if you enable '-v' you can see they set a cookie:

set-cookie: muc=4673c8f0-5aef-45eb-8e4b-ab06bc59944c; Expires=Wed, 14 Oct 2020 10:10:19 GMT; Domain=t.co


location.replace removes the back button interaction, i.e. history.

I have always preferred to use it (location.replace) within the same site. It also allows better control of browser cache policies.

Although it has been almost a decade since I last looked into this, I doubt much has changed.


> location.replace removes the back button interaction, i.e. history.

A 301/302 redirect works just fine for this.
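i.e., instead of a 200 page with a meta refresh, something along these lines (a sketch, not Twitter's actual response):

    HTTP/1.1 301 Moved Permanently
    Location: https://bbc.in/2yDY0F5
    Content-Length: 0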


Yup. The RFC is literally full of phrases like "status code indicates that the target resource has been assigned a new permanent URI and any future references to this resource ought to use one of the enclosed URIs. Clients with link-editing capabilities ought to automatically re-link references to the effective request URI to one or more of the new references sent by the server, where possible." (emphasis mine).


What bothers me, as someone who works with web standards, is that these URL tracking services should have been rejected outright: they're used only for tracking and are completely unnecessary.

Additionally, they're not actually part of web technology due to Twitter's ToS...

I run a web crawling company (http://www.datastreamer.io/) and we license data to other companies based on what we crawl.

This really opens up some weird situations for us...

If a URL is copied and shared OUTSIDE of Twitter but behind a t.co URL you can't access it without agreeing to their ToS even though the link might be to the nytimes or some other service.

I was initially upset about the GDPR but I'm starting to see the light of day here.

You can't have your cake and eat it too. You can't both be on the Internet but then put up an insane ToS claiming you have rights that restrain Internet users.

It's like standing on the street corner and yelling and then saying everyone around you owes you royalties because they're hearing your copyrighted speech.


They might be unnecessary in this case, but not always. For example, I used to work with e-learning materials that linked out to other materials, which might change or be on services not under our control. Being able to manage the link endpoint without having to republish the materials is a big win for time/effort, and sometimes republishing is just not possible.


As an end-user of similar types of materials: no, I will emphatically state that using a link shortener makes the material worse. If you don't update the link and it points to a broken page, at least the URL normally has enough information for me to Google the underlying material. A shortened link loses all of that context.


> You can't have your cake and eat it too. You can't both be on the Internet but then put up an insane ToS claiming you have rights that restrain Internet users.

Can you explain that further? Because you pretty much have to have a ToS for any moderately sized website if you don't want to get sued to death. The WWW is not a complete free-for-all.


I just wish these awful link shorteners/trackers were faster; on lower-end network connections you have to sit there and stare at "waiting for t.co" for two or three seconds before you actually get the link you want.


In my experience it's the target that takes the most time to load; the shortener itself is usually quick.


I believe that some of the weirder redirect methods are aimed at preventing the browser from forwarding “Referrer” headers to the destination site.


That can be achieved with a much simpler <a rel="noreferrer" ...>
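For example (a sketch; noopener added since the t.co page also nulls window.opener):

    <a href="https://example.com/article" rel="noreferrer noopener">example</a>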


Not for IE 11 for Windows versions below 10, or IE versions below 11 for all OS versions. Source: https://caniuse.com/#feat=rel-noreferrer


This works with JS disabled:

https://caniuse.com/rel-noreferrer


I think the http-equiv="Refresh" redirect is done so that the HTTP Referer header is from t.co, and not twitter.com (or wherever the user clicked the link from).

(I don't think rel='noreferrer' is fully supported by all browsers)


I don't understand...

I used wget to get a t.co and original link (from the sibling comment) and diff showed no differences in the fetched pages.

--edit--

So HN is not a discussion site then?


They detect curl (and wget) and serve up a "real" redirect. You need to spoof a real browser user agent, like I did in my comment above.


I remember reading a HN thread a few years ago where this was suggested as the cheapest way to create short links. Instead of running a server with routes on it, you just generate one static page with this meta tag per link, and then it's always there. Could it be that Twitter folks were simply trying to be efficient?
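Such a static page could be as small as this (the file name and target here are made up):

    <!-- served as, say, /Ab3xK9.html -->
    <html><head>
      <meta http-equiv="refresh" content="0; url=https://example.com/some/long/target">
      <title>Redirecting...</title>
    </head></html>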


Um, they have to run a lookup to find the 'value' for the given 'key' regardless...? I cannot think of any positive value for the user here -- it's non-standard & slower. 3XX redirects have been around a long time and basically every single client out there knows how to use them, and those that don't can look the status codes up to see how they should handle them if they want to.

AFAICT this is purely to allow for the 'pseudo injection' of the third-party JS, presumably for tracking purposes...

Only question I'd have is why they can't read the cookie server-side instead, but I'm guessing there are cookies on other domains that their JS is looking for? Haven't done web stuff in a few years so I'm behind on CORS-ish pros/cons/knowledge.


> I'm guessing there are cookies on other domains that their JS is looking for?

Nah, the browser doesn't let you do that. This SO answer suggests it's to pass the Referrer header so that the destination site knows the user came from Twitter:

https://softwareengineering.stackexchange.com/a/343667


> Could it be that Twitter folks were simply trying to be efficient?

That wouldn't explain the User-Agent sniffing (curl gets a proper HTTP redirect).


I really hate it when websites use shortened links instead of real ones. Twitter’s not the only website that does this; everything from Google to Discourse seems to be doing this these days. Not only is this horrible for privacy, it also makes copying links really annoying.


There is one reason for this: anti-spam/anti-malicious links

If a problematic link is shared, it can be pulled from the platform without "doing a gigantic grep"


On the other hand, link shorteners are also great for hiding malicious links because you can't see where it's going before you click on it.


Just a note:

On bitly, you can add a + to the end, to get to the stats page for that link; it also gives the destination.

On the goo.gl links, add .info to the end.


This is the theory, but I've never seen it in action. Even those cryptocurrency scam bots that reply to high profile accounts with wallet stealing sites have their links working for hours / days.


I saw some links being pulled already, but I won't click the crypto scams to see if they were or not.


You don't need to do a giant grep. You could put all links in a database, which contains pointers to where the links are used. Then if you want to delete a link, you can delete all those places.

This seems equal or less effort than making a url shortener.


> Not only is this horrible for privacy

Those websites are not funded by respecting their users' privacy. Although I think you mean Discord and not Discourse?


Nope, I mean Discourse, which does some sort of stupid link tracking so that it can display a count of how many people have clicked on a link. In doing so, they somehow break Force Touch in Safari, which makes it doubly annoying.


Ah, weird, I never noticed until I used Firefox Developer Tools and saw the 302 redirect. When you hover over URLs it looks normal, but when you click I guess JS hijacks everything... Does it do this on any Discord hosted forum with their own URL tracker, or is it just part of the forum itself? I wouldn't want them tracking all my users' URL traffic, or else I wouldn't want to use this forum script at all. I often wondered how they track the clicks; it seems to me it would have been easier to just increment on click: do a POST and, once fulfilled, redirect the user with JS.


Discord doesn't host forums.


When I hover over links, or right click and click "copy link address" in a google search result page, I get the real link, not a shortened link.


I have a Firefox plugin that removes the tracked URLs, this must be a Chrome only thing, and considering they track you as it is, I'm not surprised if they do something special for Chrome to hide the tracking internally (who knows what they bundle with their proprietary rendition of Chrome?).


Which plugin? Does it work on more than just Google?


It's called "Google search link fix" and comes from Wladimir Palant. I use it on Firefox. AFAIK it works with multiple search engines, despite the name.


In Firefox, when I hover over the links in Google results I see the original URL in the bottom-left tooltip, but when I copy the address I get the redirect link. And when I go back and hover over the link, the URL shown in the tooltip has also changed to the redirect link.


When I copy it I get a lengthened link. It has the destination URL in a query string, along with a "ved" value which includes a bunch of information about the link that you clicked on:

https://moz.com/blog/inside-googles-ved-parameter

If you're looking in your browser's status bar when you hover the link, Google is manually displaying the end destination URL. The link doesn't actually go there directly.

This is using Firefox while not signed in to a Google account. If you're really getting direct links, perhaps if you're signed in or using Chrome they give you real links and track you by other means instead.


Hm, you are right.

Did they fix that recently? Because I am sure that wasn't always the case.


I think it's been this way for the last year or so. I definitely remember links becoming mangled when clicked sometimes in the past.


Maybe only in Chrome, because it definitely does the wrong thing in Firefox.


LinkedIn is also quite good at shortening.


And they are good at being intrusive, too...


I forgot why we even needed url shortening until I remembered I used them specifically for Twitter due to the character limits. It's odd that people here are surprised by the analytics and tracking behavior of t.co links. Bit.ly is another example of this, and they have quite an extensive data science team devoted to it. That being said, bit.ly does use a standard HTTP redirect.


I remember the first URL shorteners I found; the primary purpose seemed to be making life easier when chatting on IMs, as long and crazy URLs (like Google's) tend to break in chat windows. But then someone added click tracking. Suddenly, people would send shortened URLs just to track how many people visited the resource. And then adtech took off and everything went to shit. I don't use link shorteners anymore.


> It's odd that people here are surprised by the analytics, and tracking behavior used by t.co links.

For EU citizens, collecting data either requires a very strong reason (like not being able to operate the service otherwise), or opt-in.

You can absolutely operate a URL shortening service without massive data collection, which means they need to get an opt-in for data storage from every EU citizen clicking on such a link; otherwise they are in huge trouble with GDPR.

So yes, I can absolutely be surprised that they don't seem to care about the law.


On some boards there are word filters turned on, so you couldn't post a link to (example only):

https://www.hirokomatsushita.com/

as it would come out as:

https://www.hirokomatsu****a.com/


We don’t _need_ shorteners. Twitter could exclude a URL’s length from the limit. Etc


For some URLs Twitter does exclude it. For example using a url parameter with the intent API or attaching a gif or video.


Link shorteners in general are still handy, especially when communicating the link in a medium where you can't click on the URL: TV, radio, saying the link in a YouTube video (though that does have the description), saying the link during a live stream (though you have chat), and for just pure branding.


> I forgot why we even needed url shortening until I remembered I used them specifically for Twitter due to the character limits.

That's the cover story.


By obscuring the real destination, it's also terrible for security.


> By obscuring the real destination, it's also terrible for security.

Ah yes, I remember when Tinyurl first came into play - people were extremely hesitant to click anything behind one because so often it was a goatse.


That's why they added the preview.tinyurl.com feature.


That’s completely the opposite of reality. The whole point of link shortening on a social network is to improve security and reduce abuse.


How so? By shortening the link, you're hiding where the link goes to. bit.ly/12345 could go to amazon.com or big-scam-with-a-virus.com, and until you click on it you'd never know.


With bit.ly specifically, add a "+" at the end of the url to see what it points to. It also shows you some stats like creation date and number of clicks over time.

https://bit.ly/19y8wyr+


I also didn't know about that, so thanks. But - how on Earth was I to know? How are all my non-tech friends to figure it out?


> But - how on Earth was I to know?

from a Don Norman design-of-everyday-things perspective the design is completely non-discoverable https://en.wikipedia.org/wiki/Affordance#As_perceived_action...


What does that matter? Once they've clicked they'll see the URL in the location bar


It's useful to know the domain of the link before you click because some people might not want to navigate to unknown sites at work, or at least don't want to navigate to certain sites at work (Facebook, Instagram, YouTube, pornhub, etc, etc.)


It also works for goo.gl links. [0]

Also note that a ".info" suffix might sometimes be easier to type. [1][2]

Too bad most URL shorteners don't support them. :(

[0]: http://goo.gl/vulnz+

[1]: https://bitly.com/19y8wyr.info

[2]: http://goo.gl/vulnz.info


Fun fact: Google is shutting down their shortener.

https://developers.googleblog.com/2018/03/transitioning-goog...


This is an awesome thing I will never remember to use.


TIL. Thank you.


Once the link shortening service knows it's a scam they can redirect you to a "saved you from being scammed" page.

(Evidence of this happening in practice hasn't crossed my radar, but that's probably because I just don't click those links in the first place.)


You don't need a link shortening service for that. The website and API can just start changing the URL it includes in the tweet if it determines the original URL is a scam.


They can redirect you anywhere. They can also rewrite anything in the URL, like add affiliate IDs or whatever. I'm sure some of them do that, because why not.


> The whole point of link shortening on a social network is to improve security and reduce abuse.

How does link shortening do that?


See this great post by Matt Jones (from FB antispam/security team) about Facebook's link shortener https://www.facebook.com/notes/facebook-security/link-shim-p...


That's a decent point about email, but there is nothing they're doing on the website that couldn't be done without a link shortener. And even within the context of email it doesn't really make sense, because email clients can just do the same thing without rewriting the URL.


How would you show an interstitial without rewriting the url?


Every time a link is clicked, send an event to the server with the URL so that it can be tracked. If the URL is already known to be malicious when the page is generated, either don't include the URL or use javascript to intercept the click event and display the interstitial. If links need to be checked for validity at the moment the user clicks them, then just wait for the 200 response and do the same thing, the performance would be identical either way.
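A rough sketch of that approach (the logging endpoint name is hypothetical):

    document.addEventListener('click', function (e) {
      var a = e.target.closest('a');                 // the clicked link, if any
      if (!a) return;
      navigator.sendBeacon('/log-click', a.href);    // hypothetical tracking endpoint
      // if a.href is already known to be malicious, call e.preventDefault()
      // here and show the interstitial instead of navigating
    });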


And you think running that type of JS on the page is more secure than a simple redirect? What benefit do we get by adding all of this complexity?

Also -- anyone who views a copy/pasted version of this content won't get this protection.


> And you think running that type of JS on the page is more secure than a simple redirect?

It's not more secure, but it's not less secure and it doesn't break the web. It also shouldn't add an appreciable amount of complexity, given that most of the heavy lifting to sanitize, parse, and format UGC content already happens on the server. E.g. if you're already turning UGC snippets into an AST on the server so that you can cleanly syndicate them in different formats, having the AST generate some js around URLs isn't a big lift.


Requiring js for your security features to work adds more attack surface area but yes, it can be mitigated. But so much extra complexity!

I still don’t understand why you think url shorteners break the web.


> I still don’t understand why you think url shorteners break the web.

How do you know where the links resolve to once FB goes out of business?

Given the fact that there are still lots of people whose entire job is translating 6,000-year-old grocery receipts from Sumeria, it's not at all unlikely that tweets being written today will still be widely studied and considered important 10,000 years from now. But those short links are unlikely to resolve for even the next 20 years.

Also, adding js should no longer add more attack surface now that we have things like subresource integrity in addition to CSPs.


onclick handler and event.preventDefault


Replacing links with onclick handlers breaks "open in new tab".


You can use window.open to simulate that. If you're fb, you're probably already whitelisted in the popup blocker.

Though I agree it's not ideal.


I'd like to read this but I have facebook blackholed and refuse to change that. Do you have another link?



TL;DR: clicking on their shortener can trigger just-in-time malware scan; they can retroactively block links already sent to people; they can strip away the Referer; they can inject their own analytics.


That sounds like the same authoritarian justification for hiding URLs in browsers and such --- "we'll tell you if it's safe, you don't need to know"...


It's not like you can't see the original URL and manually skip the redirect if you wanted to. It's just that most users won't do that which limits the ROI of spam and phishing campaigns.


Link shortening makes links easier to brute force.

Shortened links become trackable by a third-party (less secure), obfuscate the real URL (less secure), and can be brute forced easier: https://www.schneier.com/blog/archives/2016/04/security_risk...
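For a sense of scale: a 6-character, case-sensitive alphanumeric token has only 62^6 possible values, which is a small enough space to sample at scale:

    $ python3 -c 'print(62**6)'
    56800235584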


The point of link shortening was to allow links within the constraint of 140 characters.


Is there a way to disable Twitter's awful auto-linking behavior? It's extremely annoying to have an example or templated URL become a shortened link[0].

[0]: https://twitter.com/8x5clPW2/status/1043236568394280961


It will be interesting to see what they are gathering.

My Pi-Hole blocks Twitter's analytics endpoint, so I get an annoying name resolution failure when clicking t.co links.


All these deceptive practices seem to be done by Silicon Valley. The attitude/approach to people there must be a little... lacking.


That's because they aren't doing these things for people.


Wish I could recommend, as an alternative, the (satirical, and now defunct) URL shortening service by David Rees: http://urlshorteningservicefortwitter.com

Who is David Rees? Glad you asked...

http://www.mnftiu.cc

https://motherboard.vice.com/en_us/article/vvvve8/motherboar...


I'm a fan of the spaaaccccce.com URL lengthener

http://spaaaccccce.com/Gotta_go_to_space_Theres_a_star_There... (link to HN homepage)

Full URL since HN abbreviates it:

    http://spaaaccccce.com/Gotta_go_to_space_Theres_
    a_star_Theres_another_one_Star_Star_star_star_
    Star_Space_Are_we_in_space_Oh_oh_oh_This_is_space_
    Im_in_space


Shadyurl is funny: http://www.shadyurl.com



I use uBlock Origin to block those.


> "claimed that it was technically within the company’s aim to determine someone’s approximate location"

What does this even mean? It's a weirdly formatted sentence that makes it sound like Twitter has the magical capability of determining your location... just like everyone else on the internet can with a geoip database.


Most journalists don't understand how the web works.


"Yet Another Twitter Link Expander " is a Firefox extension that expands shorted t.co links so you can see the destination URL inline in the tweet:

https://addons.mozilla.org/en-US/firefox/addon/another-twitt...


It strikes me that companies are being squeezed from both ends by government. On one hand they are getting lambasted for too much data collection. On the other they are being sued because they don't collect enough data, as in the case of Apple not being able to unlock an iPhone, for example.


Different data and different governments? They're not really the same issue at all, there's no government level campaign for privacy in the US that corresponds to the EU approach.


This entails that any link-shortening service should be investigated; there's no reason why the others wouldn't be doing data collection.

It's also interesting to think about why Google shut down goo.gl, in light of this and the Google+ story.


Why does Twitter even need the redirect links when they could just track what you click with JS?


Because these links are shared off Twitter.


Is there a reason why Twitter doesn't just support plain web links? (Currently, that is; not the third-party history of how we got here.)


Url changed from https://theblogroom.com/twitter-being-investigated-collectio..., which mentions the original source but doesn't link to it.



