Twitter is being investigated over data collection in its link-shortening system (fortune.com)
243 points by Korosh on Oct 14, 2018 | 115 comments



What's particularly insidious about a lot of these link shorteners is the use of non-semantic redirects. That is, redirects which are not based on HTTP Location: headers but things like meta http-equiv="Refresh". I assume this is done to allow these pages to be loaded with tracking scripts.

Of course this is a completely broken way to implement a link shortener since it won't work with non-browser tools such as curl. I tried a t.co URL with curl and it returns a Location: header, which means they're doing user agent sniffing. If you need to use user agent sniffing to make something practical, it's generally a good sign you shouldn't be doing that thing.


You are correct:

$ curl -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Safari/605.1.15" https://t.co/88MpPkUoJg

  <head><noscript><META http-equiv="refresh" content="0;URL=https://bbc.in/2yDY0F5"></noscript><title>https://bbc.in/2yDY0F5</title></head><script>window.opener = null; location.replace("https:\/\/bbc.in\/2yDY0F5")</script>
I had no idea they were doing it that way. How gross.
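For contrast, plain curl (default User-Agent, no -A) appears to get a standard redirect instead; roughly:

    $ curl -sI https://t.co/88MpPkUoJg
    HTTP/1.1 301 Moved Permanently
    location: https://bbc.in/2yDY0F5

(Exact status code and headers may differ; the point is that non-browser clients get a Location: header.)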


I assume it’s to remove the t.co page from the browser history, which of course is not relevant or useful for curl. There’s nothing in that response that looks malicious.


They already return different results based on the user agent header; they could easily be returning different results based on other HTTP headers, IP headers, etc.

Arguments that implicitly assume everyone receives the same data from a server are frighteningly common. This is extra strange when it happens on forums like HN that also regularly assume the same server might be A/B testing or providing "targeted" advertising - or prices - that is unique for most users.

Any discussion about data from an unknown server should always include some sort of checksum. Without verification everyone is receiving the same data, statements about a server's responses don't mean much.


Couldn't any site be sending different results based on any header? I guess I don't get how "they could easily be returning different results based on other HTTP headers, IP headers, etc" doesn't apply to literally every site


As others have pointed out, the same thing can be accomplished using an HTTP redirect. The only purpose this kind of intermediate page has is to hide the HTTP Referer field and make it look like the request is coming from t.co. This ensures that only Twitter knows which tweet someone was coming from.


Of course for that there is also a standard-compliant way: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Re...
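A rough sketch of what that looks like, either as a response header or in markup (the "origin" value here is just illustrative):

    Referrer-Policy: origin

    <meta name="referrer" content="origin">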


That doesn't work for part of Twitter's audience, e.g. Edge users.


A normal HTTP redirect would accomplish the same thing.


if you enable '-v' you can see they set a cookie:

set-cookie: muc=4673c8f0-5aef-45eb-8e4b-ab06bc59944c; Expires=Wed, 14 Oct 2020 10:10:19 GMT; Domain=t.co


location.replace removes the back button interaction, i.e. history.

I have always preferred to use it (location.replace) within the same site. It also allows better control of browser cache policies.

Although it has been almost a decade since I last looked into this, I doubt much has changed.


> location.replace removes the back button interaction, i.e. history.

A 301/302 redirect works just fine for this.
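i.e., instead of a 200 page with a meta refresh, something along these lines (a sketch, not Twitter's actual response):

    HTTP/1.1 301 Moved Permanently
    Location: https://bbc.in/2yDY0F5
    Content-Length: 0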


Yup. The RFC is literally full of phrases like "status code indicates that the target resource has been assigned a new permanent URI and any future references to this resource ought to use one of the enclosed URIs. Clients with link-editing capabilities ought to automatically re-link references to the effective request URI to one or more of the new references sent by the server, where possible." (emphasis mine).


What bothers me, as someone who works with web standards, is that these URL tracking services should have been rejected outright: they're used only for tracking and are completely unnecessary.

Additionally, they're not actually part of web technology due to Twitter's ToS...

I run a web crawling company (http://www.datastreamer.io/) and we license data to other companies based on what we crawl.

This really opens up some weird situations for us...

If a URL is copied and shared OUTSIDE of Twitter but behind a t.co URL you can't access it without agreeing to their ToS even though the link might be to the nytimes or some other service.

I was initially upset about the GDPR but I'm starting to see the light of day here.

You can't have your cake and eat it too. You can't both be on the Internet but then put up an insane ToS claiming you have rights that restrain Internet users.

It's like standing on the street corner and yelling and then saying everyone around you owes you royalties because they're hearing your copyrighted speech.


They might be unnecessary in this case, but not always. For example, I used to work with e-learning materials that linked out to other materials, which might change or be on services not under our control. Being able to manage the link endpoint without having to republish the materials is a big win for time/effort, and sometimes republishing is just not possible.


As an end-user of similar types of materials: no, I will emphatically state that using a link shortener makes the material worse. If you don't update the link and it points to a broken page, at least the URL normally has enough information for me to Google the underlying material. A shortened link loses all of that context.


> You can't have your cake and eat it too. You can't both be on the Internet but then put up an insane ToS claiming you have rights that restrain Internet users.

Can you explain that further? Because you pretty much have to have a ToS for any moderately sized website if you don't want to get sued to death. The WWW is not a complete free-for-all.


I just wish these awful link shorteners/trackers were faster; on lower-end network connections you have to sit there and stare at "waiting for t.co" for two or three seconds before you actually get the link you want.


In my experience it's the target that takes the most time to load; the shortener itself is usually quick.


I believe that some of the weirder redirect methods are aimed at preventing the browser from forwarding “Referrer” headers to the destination site.


That can be achieved with a much simpler <a rel="noreferrer" ...>
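For example (a sketch; noopener added since the t.co page also nulls window.opener):

    <a href="https://example.com/article" rel="noreferrer noopener">example</a>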


Not for IE 11 for Windows versions below 10, or IE versions below 11 for all OS versions. Source: https://caniuse.com/#feat=rel-noreferrer


This works with JS disabled:

https://caniuse.com/rel-noreferrer


I think the http-equiv="Refresh" redirect is done so that the HTTP Referer header is from t.co, and not twitter.com (or wherever the user clicked the link from).

(I don't think rel='noreferrer' is fully supported by all browsers)


I don't understand...

I used wget to get a t.co and original link (from the sibling comment) and diff showed no differences in the fetched pages.

--edit--

So HN is not a discussion site then?


They detect curl (and wget) and serve up a "real" redirect. You need to spoof a real browser user agent, like I did in my comment above.


I remember reading a HN thread a few years ago where this was suggested as the cheapest way to create short links. Instead of running a server with routes on it, you just generate one static page with this meta tag per link, and then it's always there. Could it be that Twitter folks were simply trying to be efficient?
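Such a static page could be as small as this (the file name and target here are made up):

    <!-- served as, say, /Ab3xK9.html -->
    <html><head>
      <meta http-equiv="refresh" content="0; url=https://example.com/some/long/target">
      <title>Redirecting...</title>
    </head></html>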


Um, they have to run a lookup to find the 'value' for the given 'key' regardless...? I cannot think of any positive value for the user here -- it's non-standard & slower. 3XX redirects have been around a long time and basically every single client out there knows how to use them, and those that don't can look the status codes up to see how they should handle them if they want to.

AFAICT this is purely to allow for the 'pseudo injection' of the third-party JS, presumably for tracking purposes...

Only question I'd have is why they can't read the cookie server-side instead, but I'm guessing there are cookies on other domains that their JS is looking for? Haven't done web stuff in a few years so I'm behind on CORS-ish pros/cons/knowledge.


> I'm guessing there are cookies on other domains that their JS is looking for?

Nah, the browser doesn't let you do that. This SO answer suggests it's to pass the Referrer header so that the destination site knows the user came from Twitter:

https://softwareengineering.stackexchange.com/a/343667


> Could it be that Twitter folks were simply trying to be efficient?

That wouldn't explain the User-Agent sniffing (curl gets a proper HTTP redirect).


I really hate it when websites use shortened links instead of real ones. Twitter’s not the only website that does this; everything from Google to Discourse seems to be doing this these days. Not only is this horrible for privacy, it also makes copying links really annoying.


There is one reason for this: anti-spam/anti-malicious links

If a problematic link is shared, it can be pulled from the platform without "doing a gigantic grep"


On the other hand, link shorteners are also great for hiding malicious links because you can't see where it's going before you click on it.


Just a note:

On bitly, you can add a + to the end, to get to the stats page for that link; it also gives the destination.

On the goo.gl links, add .info to the end.


This is the theory, but I've never seen it in action. Even those cryptocurrency scam bots that reply to high profile accounts with wallet stealing sites have their links working for hours / days.


I saw some links being pulled already, but I won't click the crypto scams to see if they were or not.


You don't need to do a giant grep. You could put all links in a database, which contains pointers to where the links are used. Then if you want to delete a link, you can delete all those places.

This seems equal or less effort than making a url shortener.


> Not only is this horrible for privacy

Those websites are not funded by respecting their users' privacy. Although I think you mean Discord and not Discourse?


Nope, I mean Discourse, which does some sort of stupid link tracking so that it can display a count of how many people have clicked on a link. In doing so, they somehow break Force Touch in Safari, which makes it doubly annoying.


Ah, weird, I never noticed until I used Firefox Developer Tools and saw the 302 redirect. When you hover over URLs it looks normal, but when you click I guess JS hijacks everything... Does it do this on any Discord hosted forum with their own URL tracker, or is it just part of the forum itself? I wouldn't want them tracking all my users' URL traffic, or else I wouldn't want to use this forum script at all. I often wondered how they track the clicks; it seems to me it would have been easier to just increment on click: do a POST and, once fulfilled, redirect the user with JS.


Discord doesn't host forums.


When I hover over links, or right click and click "copy link address" in a google search result page, I get the real link, not a shortened link.


I have a Firefox plugin that removes the tracked URLs, this must be a Chrome only thing, and considering they track you as it is, I'm not surprised if they do something special for Chrome to hide the tracking internally (who knows what they bundle with their proprietary rendition of Chrome?).


Which plugin? Does it work on more than just Google?


It's called "Google search link fix" and comes from Wladimir Palant. I use it on Firefox. AFAIK it works with multiple search engines, despite the name.


In Firefox, when I hover over the links in Google results I see the original URL in the bottom-left tooltip, but when I copy the address I get the redirect link. And when I go back and hover over the link, the URL shown in the tooltip has also changed to the redirect link.


When I copy it I get a lengthened link. It has the destination URL in a query string, along with a "ved" value which includes a bunch of information about the link that you clicked on:

https://moz.com/blog/inside-googles-ved-parameter

If you're looking in your browser's status bar when you hover the link, Google is manually displaying the end destination URL. The link doesn't actually go there directly.

This is using Firefox while not signed in to a Google account. If you're really getting direct links, perhaps if you're signed in or using Chrome they give you real links and track you by other means instead.


Hm, you are right.

Did they fix that recently? Because I am sure that wasn't always the case.


I think it's been this way for the last year or so. I definitely remember links becoming mangled when clicked sometimes in the past.


Maybe only in Chrome, because it definitely does the wrong thing in Firefox.


LinkedIn is also quite good at shortening.


And they are good at being intrusive, too...


I forgot why we even needed url shortening until I remembered I used them specifically for Twitter due to the character limits. It's odd that people here are surprised by the analytics and tracking behavior of t.co links. Bit.ly is another example of this, and they have quite an extensive data science team devoted to it. That being said, bit.ly does use a standard HTTP redirect.


I remember the first URL shorteners I found; the primary purpose seemed to be making life easier when chatting on IMs, as long and crazy URLs (like Google's) tend to break in chat windows. But then someone added click tracking. Suddenly, people would send shortened URLs just to track how many people visited the resource. And then adtech took off and everything went to shit. I don't use link shorteners anymore.


> It's odd that people here are surprised by the analytics, and tracking behavior used by t.co links.

For EU citizens, collecting data either requires a very strong reason (like not being able to operate the service otherwise), or opt-in.

You can absolutely operate a URL shortening service without massive data collection, which means they need to get an opt-in for data storage from every EU citizen clicking on such a link; otherwise they are in huge trouble with GDPR.

So yes, I can absolutely be surprised that they don't seem to care about the law.


On some boards there are word filters turned on, so you couldn't post a link to (example only):

https://www.hirokomatsushita.com/

as it would come out as:

https://www.hirokomatsu****a.com/


We don’t _need_ shorteners. Twitter could exclude a URL’s length from the limit. Etc


For some URLs Twitter does exclude it. For example using a url parameter with the intent API or attaching a gif or video.


Link shorteners in general are still handy, especially when communicating the link in a medium where you can't click on the URL: TV, radio, saying the link in a YouTube video (though that does have the description), saying the link during a live stream (though you have chat), and for just pure branding.


> I forgot why we even needed url shortening until I remembered I used them specifically for Twitter due to the character limits.

That's the cover story.


By obscuring the real destination, it's also terrible for security.


> By obscuring the real destination, it's also terrible for security.

Ah yes, I remember when Tinyurl first came into play - people were extremely hesitant to click anything behind one because so often it was a goatse.


That's why they added the preview.tinyurl.com feature.


That’s completely the opposite of reality. The whole point of link shortening on a social network is to improve security and reduce abuse.


How so? By shortening the link, you're hiding where the link goes to. bit.ly/12345 could go to amazon.com or big-scam-with-a-virus.com, and until you click on it you'd never know.


With bit.ly specifically, add a "+" at the end of the url to see what it points to. It also shows you some stats like creation date and number of clicks over time.

https://bit.ly/19y8wyr+


I also didn't know about that, so thanks. But - how on Earth was I to know? How are all my non-tech friends to figure it out?


> But - how on Earth was I to know?

from a Don Norman design-of-everyday-things perspective the design is completely non-discoverable https://en.wikipedia.org/wiki/Affordance#As_perceived_action...


What does that matter? Once they've clicked they'll see the URL in the location bar


It's useful to know the domain of the link before you click because some people might not want to navigate to unknown sites at work, or at least don't want to navigate to certain sites at work (Facebook, Instagram, YouTube, pornhub, etc, etc.)


It also works for goo.gl links. [0]

Also note that a ".info" suffix might sometimes be easier to type. [1][2]

Too bad most URL shorteners don't support them. :(

[0]: http://goo.gl/vulnz+

[1]: https://bitly.com/19y8wyr.info

[2]: http://goo.gl/vulnz.info


Fun fact: Google is shutting down their shortener.

https://developers.googleblog.com/2018/03/transitioning-goog...


This is an awesome thing I will never remember to use.


TIL. Thank you.


Once the link shortening service knows it's a scam they can redirect you to a "saved you from being scammed" page.

(Evidence of this happening in practice hasn't crossed my radar, but that's probably because I just don't click those links in the first place.)


You don't need a link shortening service for that. The website and API can just start changing the URL it includes in the tweet if it determines the original URL is a scam.


They can redirect you anywhere. They can also rewrite anything in the URL, like add affiliate IDs or whatever. I'm sure some of them do that, because why not.


> The whole point of link shortening on a social network is to improve security and reduce abuse.

How does link shortening do that?


See this great post by Matt Jones (from FB antispam/security team) about Facebook's link shortener https://www.facebook.com/notes/facebook-security/link-shim-p...


That's a decent point about email, but there is nothing they're doing on the website that couldn't be done without a link shortener. And even within the context of email it doesn't really make sense, because email clients can just do the same thing without rewriting the URL.


How would you show an interstitial without rewriting the url?


Every time a link is clicked, send an event to the server with the URL so that it can be tracked. If the URL is already known to be malicious when the page is generated, either don't include the URL or use javascript to intercept the click event and display the interstitial. If links need to be checked for validity at the moment the user clicks them, then just wait for the 200 response and do the same thing, the performance would be identical either way.
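A rough sketch of that approach (the logging endpoint name is hypothetical):

    document.addEventListener('click', function (e) {
      var a = e.target.closest('a');                 // the clicked link, if any
      if (!a) return;
      navigator.sendBeacon('/log-click', a.href);    // hypothetical tracking endpoint
      // if a.href is already known to be malicious, call e.preventDefault()
      // here and show the interstitial instead of navigating
    });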


And you think running that type of JS on the page is more secure than a simple redirect? What benefit do we get by adding all of this complexity?

Also -- anyone who views a copy/pasted version of this content won't get this protection.


> And you think running that type of JS on the page is more secure than a simple redirect?

It's not more secure, but it's not less secure and it doesn't break the web. It also shouldn't add an appreciable amount of complexity, given that most of the heavy lifting to sanitize, parse, and format UGC content already happens on the server. E.g. if you're already turning UGC snippets into an AST on the server so that you can cleanly syndicate them in different formats, having the AST generate some js around URLs isn't a big lift.


Requiring js for your security features to work adds more attack surface area but yes, it can be mitigated. But so much extra complexity!

I still don’t understand why you think url shorteners break the web.


> I still don’t understand why you think url shorteners break the web.

How do you know where the links resolve to once FB goes out of business?

Given the fact that there are still lots of people whose entire job is translating 6,000-year-old grocery receipts from Sumeria, it's not at all unlikely that tweets being written today will still be widely studied and considered important 10,000 years from now. But those short links are unlikely to resolve for even the next 20 years.

Also, adding js should no longer add more attack surface now that we have things like subresource integrity in addition to CSPs.


onclick handler and event.preventDefault


Replacing links with onclick handlers breaks "open in new tab".


You can use window.open to simulate that. If you're fb, you're probably already whitelisted in the popup blocker.

Though I agree it's not ideal.


I'd like to read this but I have facebook blackholed and refuse to change that. Do you have another link?



TL;DR: clicking on their shortener can trigger just-in-time malware scan; they can retroactively block links already sent to people; they can strip away the Referer; they can inject their own analytics.


That sounds like the same authoritarian justification for hiding URLs in browsers and such --- "we'll tell you if it's safe, you don't need to know"...


It's not like you can't see the original URL and manually skip the redirect if you wanted to. It's just that most users won't do that which limits the ROI of spam and phishing campaigns.


Link shortening makes links easier to brute force.

Shortened links become trackable by a third-party (less secure), obfuscate the real URL (less secure), and can be brute forced easier: https://www.schneier.com/blog/archives/2016/04/security_risk...
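For a sense of scale: a 6-character, case-sensitive alphanumeric token has only 62^6 possible values, which is a small enough space to sample at scale:

    $ python3 -c 'print(62**6)'
    56800235584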


The point of link shortening was to allow links within the constraint of 140 characters.


Is there a way to disable Twitter's awful auto-linking behavior? It's extremely annoying to have an example or templated URL become a shortened link[0].

[0]: https://twitter.com/8x5clPW2/status/1043236568394280961


It will be interesting to see what they are gathering.

My Pi-Hole blocks Twitter's analytics endpoint, so I get an annoying name resolution failure when clicking t.co links.


All these deceptive practices seem to be done by Silicon Valley. The attitude/approach to people there must be a little... lacking.


That's because they aren't doing these things for people.


Wish I could recommend, as an alternative, the (satirical, and now defunct) URL shortening service by David Rees: http://urlshorteningservicefortwitter.com

Who is David Rees? Glad you asked...

http://www.mnftiu.cc

https://motherboard.vice.com/en_us/article/vvvve8/motherboar...


I'm a fan of the spaaaccccce.com URL lengthener

http://spaaaccccce.com/Gotta_go_to_space_Theres_a_star_There... (link to HN homepage)

Full URL since HN abbreviates it:

    http://spaaaccccce.com/Gotta_go_to_space_Theres_
    a_star_Theres_another_one_Star_Star_star_star_
    Star_Space_Are_we_in_space_Oh_oh_oh_This_is_space_
    Im_in_space


Shadyurl is funny: http://www.shadyurl.com



I use uBlock Origin to block those.


> "claimed that it was technically within the company’s aim to determine someone’s approximate location"

What does this even mean? It's a weirdly formatted sentence that makes it sound like Twitter has the magical capability of determining your location... just like everyone else on the internet can with a geoip database.


Most journalists don't understand how the web works.


"Yet Another Twitter Link Expander " is a Firefox extension that expands shorted t.co links so you can see the destination URL inline in the tweet:

https://addons.mozilla.org/en-US/firefox/addon/another-twitt...


It strikes me that companies are being squeezed from both ends by government. On one hand they are getting lambasted for too much data collection. On the other they are being sued because they don't collect enough data, as in the case of Apple not being able to unlock an iPhone, for example.


Different data and different governments? They're not really the same issue at all, there's no government level campaign for privacy in the US that corresponds to the EU approach.


This entails that any link-shortening service should be investigated; there's no reason why the others wouldn't be doing data collection.

It's also interesting to think about why Google shut down goo.gl, in light of this and the Google+ story.


Why does Twitter even need the redirect links when they could just track what you click with JS?


Because these links are shared off Twitter.


Is there a reason why Twitter doesn't just support plain web links? (Currently, that is; not the third-party history of how we got here.)


Url changed from https://theblogroom.com/twitter-being-investigated-collectio..., which mentions the original source but doesn't link to it.



