
What's particularly insidious about a lot of these link shorteners is the use of non-semantic redirects. That is, redirects which are not based on HTTP Location: headers but things like meta http-equiv="Refresh". I assume this is done to allow these pages to be loaded with tracking scripts.

Of course this would be a completely broken way to implement a link shortener on its own, since it wouldn't work with non-browser tools such as curl. And sure enough, when I tried a t.co URL with curl it returned a Location: header, which means they're doing user agent sniffing. If you need user agent sniffing to make something work, that's generally a good sign you shouldn't be doing that thing.
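
For reference, with curl's default user agent you just get an ordinary redirect back. Roughly like this (output abbreviated and from memory, so the exact status line may differ; the short link is the one from the reply below):

  $ curl -sI https://t.co/88MpPkUoJg
  HTTP/2 301
  location: https://bbc.in/2yDY0F5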




You are correct:

$ curl -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Safari/605.1.15" https://t.co/88MpPkUoJg

  <head><noscript><META http-equiv="refresh" content="0;URL=https://bbc.in/2yDY0F5"></noscript><title>https://bbc.in/2yDY0F5</title></head><script>window.opener = null; location.replace("https:\/\/bbc.in\/2yDY0F5")</script>
I had no idea they were doing it that way. How gross.


I assume it’s to remove the t.co page from the browser history, which of course is not relevant or useful for curl. There’s nothing in that response that looks malicious.


They already return different results based on the user agent header; they could easily be returning different results based on other HTTP headers, IP headers, etc.

Arguments that implicitly assume everyone receives the same data from a server are frighteningly common. This is extra strange when it happens on forums like HN that also regularly assume the same server might be A/B testing, or providing "targeted" advertising or prices that are unique for most users.

Any discussion about data from an unknown server should always include some sort of checksum. Without verification that everyone is receiving the same data, statements about a server's responses don't mean much.
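
As a sketch of what that could look like, each person posts the digest of the bytes they actually received and compares (the short link is the one used elsewhere in the thread, "Mozilla/5.0 ..." stands in for a full browser user-agent string, and on macOS the command is shasum -a 256):

  $ curl -s -A "Mozilla/5.0 ..." https://t.co/88MpPkUoJg | sha256sum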


Couldn't any site be sending different results based on any header? I guess I don't get how "they could easily be returning different results based on other HTTP headers, IP headers, etc" doesn't apply to literally every site.


As others have pointed out, the same thing can be accomplished with an HTTP redirect. The only purpose this kind of intermediate page serves is to hide the original HTTP Referer field and make the request look like it came from t.co. This ensures that only Twitter knows which tweet someone was coming from.


Of course there is also a standards-compliant way to do that: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Re...
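
i.e. something along these lines, either as a response header or a per-document meta tag (no-referrer is just one possible value; pick whatever policy you actually want):

  Referrer-Policy: no-referrer
  <meta name="referrer" content="no-referrer">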


That doesn't work for part of Twitter's audience, namely Edge users.


A normal HTTP redirect would accomplish the same thing.


If you enable '-v' you can see they set a cookie:

set-cookie: muc=4673c8f0-5aef-45eb-8e4b-ab06bc59944c; Expires=Wed, 14 Oct 2020 10:10:19 GMT; Domain=t.co


location.replace removes the back button interaction, i.e. history.

I have always preferred to use it (location.replace) within the same site. It also allows better control over browser cache policies.

Although it has been almost a decade since, I doubt much has changed.
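
To illustrate the difference (placeholder URL):

  location.assign("https://example.com/next");   // pushes a new history entry; Back returns to this page
  location.replace("https://example.com/next");  // swaps out the current entry; Back skips this page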


> location.replace removes the back button interaction, i.e. history.

A 301/302 redirect works just fine for this.


Yup. The RFC is literally full of phrases like "status code indicates that the target resource has been assigned a new permanent URI and any future references to this resource ought to use one of the enclosed URIs. Clients with link-editing capabilities ought to automatically re-link references to the effective request URI to one or more of the new references sent by the server, where possible." (emphasis mine).


What bothers me, as someone who works with web standards, is that these URL tracking services should have been rejected outright: they're only used for tracking and are completely unnecessary.

Additionally, they're not actually part of web technology due to Twitter's ToS...

I run a web crawling company (http://www.datastreamer.io/) and we license data to other companies based on what we crawl.

This really opens up some weird situations for us...

If a URL is copied and shared OUTSIDE of Twitter but still behind a t.co URL, you can't access it without agreeing to their ToS, even though the link might be to the nytimes or some other service.

I was initially upset about the GDPR but I'm starting to see the light of day here.

You can't have your cake and eat it too. You can't both be on the Internet but then put up an insane ToS claiming you have rights that restrain Internet users.

It's like standing on the street corner and yelling and then saying everyone around you owes you royalties because they're hearing your copyrighted speech.


They might be unnecessary in this case, but not always. For example, I used to work with e-learning materials that linked out to other materials, which might change or live on services not under our control. Being able to manage the link endpoint without having to republish the materials is a big win for time/effort, and sometimes republishing just isn't possible.


As an end-user of similar types of materials: no, I will emphatically state that using a link shortener makes the material worse. If you don't update the link and it points to a broken page, at least the URL normally has enough information for me to Google the underlying material. A shortened link loses all of that context.


> You can't have your cake and eat it too. You can't both be on the Internet but then put up an insane ToS claiming you have rights that restrain Internet users.

Can you explain that further? You pretty much have to have a ToS for any moderately sized website if you don't want to get sued to death. The WWW is not a complete free-for-all.


I just wish these awful link shorteners/trackers were faster. On lower-end network connections you have to sit there staring at "waiting for t.co" for two or three seconds before you actually get the link you want.


In my experience it's the target that takes the most time to load; the shortener itself is usually quick.


I believe that some of the weirder redirect methods are aimed at preventing the browser from forwarding “Referrer” headers to the destination site.


That can be achieved with a much simpler <a rel="noreferrer" ...>


Not for IE 11 on Windows versions below 10, or for IE versions below 11 on any OS. Source: https://caniuse.com/#feat=rel-noreferrer


This works with JS disabled:

https://caniuse.com/rel-noreferrer


I think the http-equiv="Refresh" redirect is done so that the http referer header is from t.co, and not twitter.com (or wherever the user clicked the link from).

(I don't think rel='noreferrer' is fully supported by all browsers)


I don't understand...

I used wget to fetch both a t.co link and the original link (from the sibling comment), and diff showed no differences between the fetched pages.

--edit--

So HN is not a discussion site then?


They detect curl (and wget) and serve up a "real" redirect. You need to spoof a real browser user agent, like I did in my comment above.


I remember reading a HN thread a few years ago where this was suggested as the cheapest way to create short links. Instead of running a server with routes on it, you just generate one static page with this meta tag for each link, and then it's always there. Could it be that Twitter folks were simply trying to be efficient?
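
i.e. one tiny static file per short code, something like this (hypothetical shortener domain and destination):

  <!-- served at e.g. https://short.example/Ab3xYz -->
  <meta http-equiv="refresh" content="0;URL=https://example.com/destination">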


Um, they have to run a lookup to find the 'value' for the given 'key' regardless...? I cannot think of any positive value for the user here -- it's non-standard & slower. 3XX redirects have been around a long time and basically every single client out there knows how to use them, and those that don't can look the status codes up to see how they should handle them if they want to.

AFAICT this is purely to allow for the 'pseudo injection' of the third-party JS, presumably for tracking purposes...

Only question I'd have is why they can't read the cookie server-side instead, but I'm guessing there are cookies on other domains that their JS is looking for? Haven't done web stuff in a few years so I'm behind on CORS-ish pros/cons/knowledge.


> I'm guessing there are cookies on other domains that their JS is looking for?

Nah, the browser doesn't let you do that. This SO answer suggests it's to pass the Referrer header so that the destination site knows the user came from Twitter:

https://softwareengineering.stackexchange.com/a/343667


> Could it be that Twitter folks were simply trying to be efficient?

That wouldn't explain the User-Agent sniffing (curl gets a proper HTTP redirect).



