From some quick googling, it seems that a ~ in a URL is not a standard; it is only a coincidence that it is mostly used where the content is user-uploaded (e.g. a user's home directory on a UNIX machine).
It's not a coincidence. Universities and other content providers follow this pattern specifically to signal that the company/institution doesn't vouch for whatever is being hosted.
Interesting. This makes sense from a technical perspective.
Nevertheless, it's clearly associated with user-generated content (UGC), and as far as I know no major site has ever hosted non-UGC content using this scheme. (E.g. there is no history of its use for things like Amazon product pages or whatever.)
And in school we were always taught that content coming from user pages on university systems shouldn't be cited as if it were academic content being published or endorsed by the university. I'm sure others were taught the same.
One of the biggest reasons GitHub (for example) uses a separate subdomain is so that a persistent XSS exploit on their UGC domain cannot access HttpOnly cookies or other information from their real domain. Impersonation is a bigger problem for a government, but the separation is also a defense against how broken the current security model for things like cookies is.
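To make the cookie point concrete, here is a rough Python sketch of the domain-matching rule browsers apply when deciding whether a cookie's Domain attribute covers a host. It is simplified from RFC 6265 (the IP-address exclusion is left out), and the function name and the example.org hosts are made up for illustration, not anything GitHub actually runs.

```python
def domain_matches(request_host, cookie_domain):
    """Simplified RFC 6265 domain-match: exact host or a dot-separated suffix."""
    host = request_host.lower().rstrip(".")
    # A leading dot in the Domain attribute is ignored by the spec.
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


# Hypothetical hosts: a subdomain is covered, an unrelated domain is not.
print(domain_matches("pages.example.org", "example.org"))
print(domain_matches("example-usercontent.org", "example.org"))
```

Under this rule a cookie set with Domain=example.org is also sent to pages.example.org, so a subdomain only isolates UGC if the main site's cookies are host-only. A separate registrable domain avoids the question entirely.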
I get why subdomains can be useful for preventing XSS and CORS issues, especially for non-API-driven sites. The problem is that subdomains are used for all sorts of things, so the mere fact that something is on a subdomain doesn't signal that it's UGC. Even if you hosted each person's content on its own subdomain, it would still be useful to have a standard way to signal that this content wasn't created by the organization who owns the domain.
The benefit of the tilde is that, at least as far as I know, it has never been used for anything other than signaling that something is UGC content. (Even if that was a technological accident and not its original intent.)
> it would still be useful to have a standard way to signal that this content wasn't created by the organization who owns the domain.
Do you think the assumption that the content was created by the organization who owns the domain should be default? Wouldn't it be better for the organization to provide a signature for the content it did create?
> Do you think the assumption that the content was created by the organization who owns the domain should be default?
Sure, and I think that is the current assumption. What's missing is a way to specify UGC content that wasn't created by the organization.
> Wouldn't it be better for the organization to provide a signature for the content it did create?
I don't think this would be viable for two reasons:
- It would require people to do the work to opt in, without any obvious incentive for doing so.
- There is no obvious way to differentiate UGC from JavaScript dependencies, fonts, ad trackers, etc.
By contrast, there are good use cases for allowing folks to mark content as being UGC. For example, let's say the game Draw Something wanted to let users upload their creations. There are no security issues, since the images are created through their own app, but they don't necessarily want everyone thinking that they're spending all day creating and uploading millions of dick drawings either.
So the original question was how to signal that files uploaded by users to the FCC were not created by the FCC. Some people suggested entirely different domains, but that's hacky and doesn't really solve the issue. And maybe the content should be hosted on a subdomain for security reasons, but that still doesn't solve the issue of signaling that it wasn't created by the FCC.
I suggested just adding a ~ to the URL, because when you see a URL like this:
It's universally recognized that the content on that page was not created by columbia.edu or cs.columbia.edu in an official capacity.
Other people said we can't do this because it's not an official standard, so I said let's just make it a standard. I think that is good because codifying it keeps an important piece of Internet culture alive and lets people rely on it when designing new systems. And ultimately it should work because there is no history of this URL pattern being used for non-UGC content.
They use it because universities traditionally exposed per-user web pages through the Unix shorthand for a user's home directory. The tilde has no other significance and doesn't make much sense outside of a Unix system.
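For anyone who hasn't seen the convention, here is a rough sketch of the mapping in Python, assuming the layout Apache's mod_userdir uses by default (a public_html directory inside each home directory). The function name, the /home root, and the username pattern are my own assumptions for illustration; a real server does much more checking.

```python
import re
from pathlib import PurePosixPath

# Hypothetical username pattern; real systems have their own rules.
_USER_PATH = re.compile(r"^/~([A-Za-z0-9_][A-Za-z0-9_.-]*)(/.*)?$")


def resolve_user_path(url_path, home_root="/home", web_dir="public_html"):
    """Map a /~user/... URL path to a file path under that user's web directory.

    Returns None when the path does not follow the tilde convention.
    """
    match = _USER_PATH.match(url_path)
    if match is None:
        return None
    user = match.group(1)
    rest = match.group(2) or ""
    # Refuse parent-directory segments rather than trying to normalize them.
    if ".." in rest.split("/"):
        return None
    return str(PurePosixPath(home_root, user, web_dir, rest.lstrip("/")))


# Hypothetical user "alice".
print(resolve_user_path("/~alice/index.html"))
print(resolve_user_path("/about"))
```

The point is only that /~alice is a thin alias for a directory the user controls, which is why the path says nothing about who at the institution reviewed the content.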