Fun with fonts. Rendering obfuscated HTML.

imurray · on April 11, 2012

[EDIT: I assume the main point was to make documents readable, but hard to copy. My comment is about breaking this trivial copy protection scheme. I see that other comments read the purpose of the font differently.]

Given a long document, one could easily crack this simple substitution cipher with frequency analysis. Of course one could also do OCR on the characters to learn the mapping.
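
A minimal sketch of the first step of such an attack, assuming the scrambled text is English and long enough for letter frequencies to be meaningful. The frequency order and the function names are illustrative assumptions, and a real attack would refine this first guess with digram statistics or a dictionary:

```python
from collections import Counter

# Approximate English letters from most to least common (an assumption;
# the exact order varies by corpus).
ENGLISH_BY_FREQUENCY = "etaoinshrdlcumwfgypbvkjxqz"


def guess_mapping(ciphertext):
    """Pair the most common ciphertext symbols with the most common English letters."""
    counts = Counter(ch for ch in ciphertext if not ch.isspace())
    ranked = [symbol for symbol, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_BY_FREQUENCY))


def apply_mapping(ciphertext, mapping):
    """Replace every symbol that has a guess; leave everything else untouched."""
    return "".join(mapping.get(ch, ch) for ch in ciphertext)
```

On short texts the guess will be mostly wrong; it only becomes a useful starting point as the document grows.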

Tangentially, this reminds me of a cool Master's thesis [1] that did basic OCR by clustering blobs of ink and solving the resulting substitution cipher, rather than trying to recognize the shapes of the letters! That approach can be used to adapt a real OCR system to the current document.

[1] http://www.cs.toronto.edu/~scottl/research/msc_thesis.pdf

AdleyEskridge · on April 11, 2012

This is a really neat demo! That said, wouldn't this be extraordinarily inaccessible to users employing screen-reading software? They'd "hear" a jumble of characters.

woodall · on April 11, 2012

This would be horrible for anyone using screen readers. You could use something like longdesc, but then you've ruined your obfuscation. Really not something one should apply in production.

The only saving grace for this might be in internal sites where you want to display information to the user but do not want that user copy/pasting sensitive information into emails/chat/etc.

jgrahamc · on April 11, 2012

Could be made considerably stronger by using a homophonic substitution cipher. Given that there are lots and lots of Unicode characters it would be pretty easy to flatten the distribution to make letter frequency analysis hard.
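
A rough sketch of that idea, assuming the homophones are drawn from the Unicode Private Use Area starting at U+E000 and that a matching font would draw the right letter shape for each of them. The weights are illustrative guesses, not measured frequencies, and every name here is hypothetical:

```python
import random

# Rough relative weights for common letters (illustrative assumption).
WEIGHTS = {"e": 12, "t": 9, "a": 8, "o": 8, "i": 7, "n": 7, "s": 6, "h": 6, "r": 6}
DEFAULT_WEIGHT = 2
PUA_START = 0xE000  # first code point of the BMP Private Use Area


def build_homophones(alphabet, rng):
    """Give each letter a pool of code points sized by its weight."""
    total = sum(WEIGHTS.get(ch, DEFAULT_WEIGHT) for ch in alphabet)
    code_points = [chr(PUA_START + i) for i in range(total)]
    rng.shuffle(code_points)
    table, pos = {}, 0
    for ch in alphabet:
        n = WEIGHTS.get(ch, DEFAULT_WEIGHT)
        table[ch] = code_points[pos:pos + n]
        pos += n
    return table


def encode(text, table, rng):
    """Pick a random homophone for each letter; pass other characters through."""
    return "".join(rng.choice(table[ch]) if ch in table else ch for ch in text)


def decode(ciphertext, table):
    """What the font effectively does: many code points, one letter shape."""
    reverse = {sym: ch for ch, syms in table.items() for sym in syms}
    return "".join(reverse.get(ch, ch) for ch in ciphertext)
```

Because common letters are spread over more symbols, the symbol counts in the page source come out flatter than the underlying letter counts.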

pbhjpbhj · on April 11, 2012

If you do things as a polyalphabetic substitution then, provided you limit the length of the ciphertext to the available letter spaces in the font file, couldn't you avoid frequency analysis completely? So, for example, each letter "s" in the plaintext would be generated from a different ciphertext symbol.
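
A sketch of the "every occurrence gets its own symbol" variant, assuming the symbols come from the 6,400 code points of the BMP Private Use Area (U+E000 to U+F8FF) and that a per-document font is generated from the returned table. The function name and the choice to leave whitespace untouched are assumptions:

```python
import random

PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF  # BMP Private Use Area, 6400 slots


def encode_unique(plaintext, rng):
    """Use each code point at most once, so no ciphertext symbol ever repeats."""
    slots = [chr(cp) for cp in range(PUA_FIRST, PUA_LAST + 1)]
    needed = sum(1 for ch in plaintext if not ch.isspace())
    if needed > len(slots):
        raise ValueError("plaintext is longer than the available glyph slots")
    fresh = iter(rng.sample(slots, needed))
    glyph_table = {}  # code point -> letter shape the font must draw
    out = []
    for ch in plaintext:
        if ch.isspace():
            out.append(ch)  # whitespace is passed through unchanged
        else:
            symbol = next(fresh)
            glyph_table[symbol] = ch
            out.append(symbol)
    return "".join(out), glyph_table
```

The length limit mentioned above shows up as the `ValueError`: once the slots run out, symbols would have to repeat.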

woodall · on April 11, 2012

That was another thing I thought about. There are so many possible characters that you do not have to map them to regular ASCII, but I wanted to make the demo easy to understand.

Paul_S · on April 11, 2012

Is this meant for copy protecting text on the website?

If I can see it I can copy it regardless of how fun you make the process.

You'd be better off just releasing it under a license of choice. Technical means of protection are pointless.

pbhjpbhj · on April 11, 2012

How about having a scrambled page and using a separate channel to distribute the font? Yes, it's still just a substitution cipher, but various tricks could be added to make it more interesting: you could add in steganography, for example (e.g. via a particular font characteristic), or use a sparse font file generated for each paragraph, or have the font file act as a one-time pad (each ciphertext letter is replaced by a word).

woodall · on April 11, 2012

I've been playing with a few ways to abuse the browser. You /might/ be able to hide the font in an image file[1] so that it's harder to see. I have a few more tricks but can't show them at this moment.

[1] http://jsfiddle.net/a2zK5/

MindTwister · on April 11, 2012

This would be decoded in about 5 seconds, since the first thing I'd do would be copying the text into my editor of choice...

woodall · on April 11, 2012

Try to copy and paste the "ciphered" text into a text editor and see what you get. You can take it a step further and generate font files on the fly to make the mapping even harder to predict; you can even map to the harder-to-see Unicode characters. By no means is it secure, but I thought it was cute.
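
A small sketch of the "on the fly" part, assuming lowercase ASCII only and leaving out the actual font generation. `new_page_cipher` and `scramble` are hypothetical names; `glyph_for` is the table a font generator would need:

```python
import random
import string


def new_page_cipher(rng):
    """Build a fresh random letter permutation, e.g. one per page load."""
    letters = string.ascii_lowercase
    shuffled = list(letters)
    rng.shuffle(shuffled)
    # source_for[p]: character written into the HTML source for plaintext letter p
    source_for = dict(zip(letters, shuffled))
    # glyph_for[s]: letter shape the generated font must draw for source character s
    glyph_for = {s: p for p, s in source_for.items()}
    return source_for, glyph_for


def scramble(text, source_for):
    """Produce the text that ends up in the page source (and on the clipboard)."""
    return "".join(source_for.get(ch, ch) for ch in text)
```

Copying from the rendered page yields the scrambled characters, because the clipboard receives the underlying code points rather than the shapes the font draws.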

est · on April 11, 2012

> ... would be copying the text into my editor of choice...

Or copy into the address bar.

rblatz · on April 11, 2012

The reverse of this is actually the interesting part. Give scrambled HTML to the browser, use a font to convert it to human readable text.

Obviously this wouldn't stop anyone willing to put a bit of effort into decoding it, but I bet lyric sites start using this soon.

Edit: Apparently I was confused like several others, but came to the right conclusion anyways.

benholmen · on April 11, 2012

Why would a lyric site want to obfuscate lyrics in the source code? It seems like that'd be counterproductive since their traffic is largely driven by search engines.

woodall · on April 11, 2012

Ideally you would store the original in an unciphered format, and when, say, Googlebot hits your site, you just serve it the unciphered version. If the visitor is not a spider, you serve the jumbled one.

As for lyric sites, they A) don't want people stealing content and B) have strict rules to follow when publishing lyrics.
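
Roughly, that selection could be sketched as below, assuming the only signal is the User-Agent header, which the client supplies and can spoof. The token list and function names are hypothetical:

```python
def is_search_spider(user_agent):
    """Treat any user agent that mentions a known crawler token as a spider."""
    crawler_tokens = ("googlebot",)  # assumed list; extend as needed
    ua = user_agent.lower()
    return any(token in ua for token in crawler_tokens)


def pick_variant(user_agent, plain, scrambled):
    """Return the unciphered page for spiders and the jumbled one for everyone else."""
    return plain if is_search_spider(user_agent) else scrambled
```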

TazeTSchnitzel · on April 11, 2012

Eh, they could render it differently for Googlebot.

nekitamo · on April 11, 2012

Then they would get hit with a cloaking penalty when Googlebot figures out it's being cloaked.

ecesena · on April 11, 2012

Apart from the utility, it's really cool! Looking forward to seeing a Vigenère cipher ;)

themstheones · on April 11, 2012

I'm missing the point. Who wouldn't view source / inspect element to see the real text?

mistercow · on April 11, 2012

I think you're thinking of it backwards. The source looks jumbled, but the font that the browser renders the page with is readable because the font acts as a substitution cipher.

Still pretty pointless though, since substitution ciphers are child's play to break.

pbhjpbhj · on April 11, 2012

>Still pretty pointless though, since substitution ciphers are child's play to break. //

It's an interesting result that probably hasn't been considered by most people.

Also, child's play is big business.

mistercow · on April 11, 2012

>It's an interesting result that probably hasn't been considered by most people.

Is it? I mean it's cute to see it actually implemented, but this is something I thought of the first time I read about substitution ciphers. I suppose it is, if nothing else, an interesting way to introduce someone to some basic cryptography concepts (assuming they already know about typefaces).

pbhjpbhj · on April 12, 2012

The first time you saw substitution ciphers you thought "hey how about using webfonts as a one time key"?

I guess because I learnt about such ciphers when our primary school had only just got its first computer, running at 4MHz, the conflation of webfonts and subst. ciphers never struck me before.

Before today I've seen this done with JavaScript and, TBH, I think that would have been the first method that sprang to mind, but I've not really ever bothered to think about it.

Quick search only found one other example of this technique: http://eligrey.com/blog/post/tag/rot13 (rot13 fonts, lol) as well as this http://jsfiddle.net/QQ9WQ/ from the current author. It's hard to search as there is a font called "cipher font", which of course is available as a webfont! Of course there are lots of pages using js, like http://rumkin.com/tools/cipher/substitution.php which does some funky substitutions.

mistercow · on April 12, 2012

Don't be glib now; of course I didn't think "webfonts". I thought "fonts". The addition of "web" to the concept is trivial and uninteresting.

pbhjpbhj · on April 12, 2012

Trivial I'll give you with reservations but I found it interesting as a concept to develop. I think it's got more possibilities than just the simple font, particularly the idea of creating the webfont on the fly.

woodall · on April 11, 2012

Sorry, from the comments it's apparent that this was a horrible demo.

pbhjpbhj · on April 11, 2012

Perhaps you could show the ciphertext page and then have a script operate when the password (or button-click) is entered that alters the style of the ciphertext parts so that they use the key font instead. Extra marks if you use jQuery to fade through several font 'keys' before presenting the plaintext. Also preload the font files.

IMO that'd be cool.