[EDIT: I assume the main point was to make documents readable, but hard to copy. My comment is about breaking this trivial copy protection scheme. I see that other comments read the purpose of the font differently.]
Given a long document, one could easily crack this simple substitution cipher with frequency analysis. Of course, one could also run OCR on the rendered characters to learn the mapping.
Tangentially, these thoughts remind me of a cool Master's thesis [1] where they did basic OCR by clustering blobs of ink and solving the resulting substitution cipher, rather than trying to recognize the shapes of the letters! That approach can be used to adapt a real OCR system to the current document.
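As a rough illustration of why a long document gives the game away: a one-to-one substitution relabels the letters but leaves the shape of the frequency distribution untouched. The sketch below is not the demo's code; `make_key`, `encipher`, and `sorted_counts` are hypothetical names, and the random key merely stands in for whatever mapping the font encodes.

```python
import random
import string
from collections import Counter

def make_key(seed: int) -> dict[str, str]:
    """Random one-to-one letter mapping (hypothetical stand-in for the font's mapping)."""
    rng = random.Random(seed)
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_lowercase, shuffled))

def encipher(text: str, key: dict[str, str]) -> str:
    """Replace each letter via the key; leave everything else alone."""
    return "".join(key.get(ch, ch) for ch in text.lower())

def sorted_counts(text: str) -> list[int]:
    """Letter counts, largest first, ignoring which letter is which."""
    counts = Counter(ch for ch in text if ch in string.ascii_lowercase)
    return sorted(counts.values(), reverse=True)

plaintext = "the source looks jumbled but the font renders it readable"
key = make_key(seed=1)
ciphertext = encipher(plaintext, key)

# The frequency profile survives the substitution, so an attacker can
# line it up against known English letter frequencies.
print(sorted_counts(plaintext) == sorted_counts(ciphertext))  # prints True
```

With enough text, the most common ciphertext symbol is very likely "e", the next few are likely "t", "a", "o", and so on; the rest falls out by trial and error.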
This is a really neat demo! That said, wouldn't this be extraordinarily inaccessible to users employing screen-reading software? They'd "hear" a jumble of characters.
This would be horrible for anyone using screen readers. You could use something like longdesc, but then you've ruined your obfuscation. Really not something one should apply in production.
The only saving grace for this might be internal sites where you want to display information to the user but do not want that user copy/pasting sensitive information into emails, chat, etc.
Could be made considerably stronger by using a homophonic substitution cipher. Given that there are lots and lots of Unicode characters it would be pretty easy to flatten the distribution to make letter frequency analysis hard.
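A minimal sketch of the homophonic idea, under some loud assumptions: the weights in `WEIGHTS` are illustrative rather than measured English frequencies, the symbols are taken from the Unicode Private Use Area starting at U+E000, and `build_pools`, `encipher`, and `decipher` are hypothetical names. In the font version, `decipher` is what the font does implicitly: several code points draw the same letter shape.

```python
import random

# Illustrative weights only: more frequent letters get more symbols.
WEIGHTS = {"e": 6, "t": 4, "a": 4, "s": 3, "r": 3, "l": 2, "d": 1, "b": 1}

def build_pools(weights, first_code_point=0xE000):
    """Give each letter a pool of Private Use Area code points sized by its weight."""
    pools, next_cp = {}, first_code_point
    for letter, n in weights.items():
        pools[letter] = [chr(next_cp + i) for i in range(n)]
        next_cp += n
    return pools

def encipher(text, pools, rng):
    """Pick a random symbol from the letter's pool; pass other characters through."""
    return "".join(rng.choice(pools[ch]) if ch in pools else ch for ch in text)

def decipher(text, pools):
    """Many symbols map back to one letter, as the glyph table of a font would."""
    reverse = {sym: letter for letter, syms in pools.items() for sym in syms}
    return "".join(reverse.get(ch, ch) for ch in text)

pools = build_pools(WEIGHTS)
plaintext = "letters at rest are sealed"
ciphertext = encipher(plaintext, pools, random.Random(42))
print(decipher(ciphertext, pools) == plaintext)
```

Because the occurrences of "e" are spread over six symbols, no single ciphertext symbol stands out the way "e" does in plain English.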
If you did this as a polyalphabetic substitution then, provided you limited the length of the ciphertext to the number of available letter slots in the font file, couldn't you avoid frequency analysis completely? For example, each letter s in the plaintext would be represented by a different ciphertext symbol.
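One way to read this suggestion, sketched under assumptions: every character occurrence gets its own unused code point, and the generated font carries the table from code point to letter shape. The Private Use Area range and the name `encipher_unique` are choices made for this illustration, not anything from the demo.

```python
import random

PUA_START, PUA_END = 0xE000, 0xF8FF  # Basic Multilingual Plane Private Use Area

def encipher_unique(text, rng):
    """Assign every character occurrence its own code point.

    Returns the ciphertext and the glyph table a generated font would need.
    """
    capacity = PUA_END - PUA_START + 1
    if len(text) > capacity:
        raise ValueError("text is longer than the available code points")
    # sample() draws without replacement, so no code point repeats.
    code_points = rng.sample(range(PUA_START, PUA_END + 1), len(text))
    glyph_table = {chr(cp): ch for cp, ch in zip(code_points, text)}
    ciphertext = "".join(chr(cp) for cp in code_points)
    return ciphertext, glyph_table

text = "sassafras"
ciphertext, glyph_table = encipher_unique(text, random.Random(0))
print(len(set(ciphertext)) == len(text))                   # no symbol repeats
print("".join(glyph_table[c] for c in ciphertext) == text)  # font still renders it
```

Every symbol appears exactly once, so the ciphertext alone carries no frequency information; the cost is a font with one glyph entry per character of text, and the font file itself still reveals everything to anyone who inspects it.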
That was another thing I thought about. There are so many possible characters that you do not have to map them to regular ASCII, but I wanted to make the demo easy to understand.
How about having a scrambled page and using a separate channel to distribute the font? Yes, it's still just a substitution cipher, but various tricks could be added to make it more interesting: you could add steganography (e.g. via a particular font characteristic), use a sparse font file generated for each paragraph, or treat the font file as a one-time pad (each ciphertext letter is replaced by a word).
I've been playing with a few ways to abuse the browser. You /might/ be able to hide the font in an image file[1] so that it's harder to see. I have a few more tricks but can't show them at this moment.
Try copying and pasting the "ciphered" text into a text editor and see what you get. You can take it a step further and generate font files on the fly to make the mapping even harder to predict; you can even map to hard-to-see Unicode characters. By no means is it secure, but I thought it was cute.
Why would a lyric site want to obfuscate lyrics in the source code? It seems like that'd be counterproductive since their traffic is largely driven by search engines.
Ideally you would store the original in an unciphered format and, when a spider such as Googlebot hits your site, serve it the unciphered version. If the visitor is not a spider, you serve the jumbled one.
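A minimal sketch of that branching, assuming the decision is made purely on the User-Agent header. `CRAWLER_TOKENS` and `choose_body` are hypothetical names, and the token list is only an example.

```python
# Hypothetical allow-list of substrings that identify search engine spiders.
CRAWLER_TOKENS = ("googlebot", "bingbot")

def choose_body(user_agent: str, plain: str, scrambled: str) -> str:
    """Return the readable text for known spiders, the jumbled text otherwise."""
    ua = user_agent.lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        return plain
    return scrambled

spider_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser_ua = "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"

print(choose_body(spider_ua, "plain lyrics", "qmbjo mzsjdt"))
print(choose_body(browser_ua, "plain lyrics", "qmbjo mzsjdt"))
```

The User-Agent header is trivially spoofed, so a scraper that claims to be a spider gets the readable version too.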
As for lyric sites, they A) don't want people stealing content and B) are subject to strict rules when publishing lyrics.
I think you're thinking of it backwards. The source looks jumbled, but the font that the browser renders the page with is readable because the font acts as a substitution cipher.
Still pretty pointless though, since substitution ciphers are child's play to break.
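To make the direction of the mapping concrete, here is a small sketch in which a dictionary plays the role of the font. `make_font_mapping`, `scramble`, and `render` are hypothetical names, and a real implementation would build an actual font file rather than a Python dict.

```python
import random
import string

def make_font_mapping(seed):
    """Character in the page source -> letter shape the font draws for it."""
    rng = random.Random(seed)
    shapes = list(string.ascii_lowercase)
    rng.shuffle(shapes)
    return dict(zip(string.ascii_lowercase, shapes))

def scramble(text, font):
    """Produce the page source: the characters whose glyphs spell out the text."""
    inverse = {shape: ch for ch, shape in font.items()}
    return "".join(inverse.get(ch, ch) for ch in text)

def render(source, font):
    """What the reader sees once the browser applies the font."""
    return "".join(font.get(ch, ch) for ch in source)

font = make_font_mapping(seed=7)
readable = "never gonna give you up"
page_source = scramble(readable, font)

print(page_source)                # what view-source and copy/paste yield
print(render(page_source, font))  # what is displayed on screen
```

Copying from the page yields `page_source`, because the clipboard receives the underlying characters and not the shapes drawn for them.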
>It's an interesting result that probably hasn't been considered by most people.
Is it? I mean it's cute to see it actually implemented, but this is something I thought of the first time I read about substitution ciphers. I suppose it is, if nothing else, an interesting way to introduce someone to some basic cryptography concepts (assuming they already know about typefaces).
The first time you saw substitution ciphers you thought "hey how about using webfonts as a one time key"?
I guess because I learnt about such ciphers when our primary school had only just got its first computer, running at 4MHz, the conflation of webfonts and substitution ciphers never struck me before.
Before today I'd seen this done with JavaScript and, TBH, I think that would have been the first method that sprang to mind, but I've never really bothered to think about it.
Trivial, I'll give you, with reservations, but I found it interesting as a concept to develop. I think it's got more possibilities than just the simple font, particularly the idea of creating the webfont on the fly.
Perhaps you could show the ciphertext page and then have a script operate when the password (or button-click) is entered that alters the style of the ciphertext parts so that they use the key font instead. Extra marks if you use jQuery to fade through several font 'keys' before presenting the plaintext. Also preload the font files.
[1] http://www.cs.toronto.edu/~scottl/research/msc_thesis.pdf