PaperBack – How to store data on a single A4/Letter sheet (ollydbg.de)
255 points by jguegant on Sept 20, 2015 | 114 comments




Thanks. I was searching for it on the description page, but then rightly thought someone would surely post it on HN :).


I wonder, why so many alignments (or, why such small blocks)? Is this the most efficient size?


Probably to give the scanning software a way to re-sync and correct for skew, etc.


With HCCB (High Capacity Color Barcode), of which Microsoft Tag is an implementation, you can get 1.5 MB with 600 dpi color on an A4. If you add the best compression techniques (cmix v7), depending on the source material you could get 8 times as much English text, easily fitting the pearls of religion: Tenach, New Testament, Quran, Mahabharata, the Dhammapada, the Atthakavagga and the Parayanavagga, Rhinoceros and Lotus Sutra, the Book of the Dead and the Sri Guru Granth Sahib, plus explanatory texts and commentary, on one single A4.

Or you could get 4 minutes of stereo music, Opus@40Kbps or half an hour of speech Opus@6Kbps - (Vinyl LP seems to be more efficient and simpler for that).
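A quick sanity check of those audio figures, as a rough sketch: it assumes the 1.5 MB per sheet quoted above, 1 MB = 10**6 bytes, and a constant bitrate with no container overhead. The function name is made up for illustration.

```python
# Rough check: how much constant-bitrate audio fits in 1.5 MB?
# Assumes 1 MB = 10**6 bytes and ignores container overhead.
SHEET_BYTES = 1.5e6

def seconds_of_audio(kbps, capacity_bytes=SHEET_BYTES):
    """Seconds of audio that fit at the given bitrate (kilobits per second)."""
    return capacity_bytes * 8 / (kbps * 1000)

music_minutes = seconds_of_audio(40) / 60   # stereo music, Opus at 40 kbps
speech_minutes = seconds_of_audio(6) / 60   # speech, Opus at 6 kbps
print(f"music: {music_minutes:.1f} min, speech: {speech_minutes:.1f} min")
```

Under those assumptions the sheet holds about 5 minutes of music or about 33 minutes of speech, so "4 minutes" and "half an hour" leave a little headroom.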

Or the best 25 classical pieces in MIDI, for example Beethoven Symphony #5 in C minor is a 322 KB midi file (compressed 60 KB). With compressed ABC or Alda notation you could even fit 10 times as much. Or you can just read the ABC and automatically 'hear' the music in your head (is there a similar notation for human movement or dance, I wonder...).

Even so, perhaps a better and more cost-effective system is microfilm; here is an example of the Bible printed on a 2 inch film: http://www.amazon.com/Microform-Bible-Much-Like-First/dp/B00... - you don't need a computer, just a lensing system. It has one downside: the blind cannot use this unless you make an OCR-to-TTS or braille display converter on top of it.

Or, as described in the book Mindhacker, you could use Dutton Speedwords/Briefscript word abbreviations, preferably tailored to Esperanto, to double the text on the microfilm or on your handwritten A4. https://github.com/leonardoce/bumot


  > is there a similar notation for
  > human movement or dance ...
And for dance there's "Dance Notation"[0], most specifically Labanotation[1], but there are others, such as Benesh notation[2].

For juggling[3] there is SiteSwap[4] for vanilla flavored juggling, and now it's been extended in different directions, perhaps most completely by BeatMap[5].

[0] https://en.wikipedia.org/wiki/Dance_notation

[1] http://dancenotation.org/lnbasics/frame0.html

[2] https://en.wikipedia.org/wiki/Benesh_Movement_Notation

[3] https://en.wikipedia.org/wiki/Juggling_notation

[4] https://en.wikipedia.org/wiki/Siteswap

[5] http://juggle.wikia.com/wiki/Beatmap


I might like to save my most important source code as microfilm. So it would be future-proof. No fancy encoding - just text.


I imagine someone in a post-apocalyptic world carrying around a single sheet like this without any way to replay it, but taking solace in knowing it exists somewhere in there.


Ha, yes, I don't think an Android phone or e-reader will last very long in a post-apocalyptic world. For a surprisingly slight trade-off in capacity one could opt to carry around an A4-sized microfiche (or 4 stacked microfiches) with 500 pages on it, at 10K ASCII a page (multi-columned), which results in 5 MB of uncompressed text. You can add a braille grade 2 or Briefscript type of compression (which stays readable) and get 1-2 MB of extra data. Contrast that with all the trouble of getting 8 MB of data onto the same 600 dpi surface through multi-colored 2D codes and compression. In this case you just need a good portable magnifying lens. It uses even less power than an e-reader. http://www.rootschat.com/forum/index.php?topic=58324.0


I commend your enumeration of religious and mystic texts.


Yes, I was also impressed. But in making the point, the poster surely went overboard: the Mahabharata (I just read about it on Wikipedia) is the "longest poem ever written", ten times as long as the Iliad and Odyssey combined, and at 1.8 million words it is going to be hard to fit into 1.5 megabytes even by itself. Unless it is massively redundant I don't see how you can do it. The entropy per word would have to be about 1 byte, which is cutting it very, very close. Add in all of the other mentioned texts and commentary and I don't buy it.

The mentioned compression (cmix v7) surely won't fit it all http://www.byronknoll.com/cmix.html
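To put a number on that budget, here is the arithmetic using only the figures in this comment (1.8 million words, 1.5 megabytes, taking 1 MB = 10**6 bytes as an assumption):

```python
# Budget per word if 1.8 million words must fit in 1.5 MB (1 MB = 10**6 bytes).
words = 1_800_000
capacity_bytes = 1_500_000

bytes_per_word = capacity_bytes / words
bits_per_word = bytes_per_word * 8
print(f"{bytes_per_word:.2f} bytes/word, {bits_per_word:.1f} bits/word")
```

That works out to roughly 0.83 bytes, or under 7 bits, per word for the Mahabharata alone, before any of the other texts are added.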


Yes, I must confess it was an educated guess and I was not precise enough. The Mahabharata is huge, so I use an abridged translation (the one by Ramesh Menon). The complete translated Mahabharata (by Bibek Debroy) is 12 Mbyte uncompressed. Still, it's a challenge to get my pearls of wisdom collection down to 1.5 Mbyte (with either an abridged Mahabharata or the Bhagavad Gita section) even without the commentaries, as the combined English plain text files are 1.7 Mbyte compressed - 200 Kbyte too much - (from 11 Mbyte uncompressed ASCII with the Mahabharata epos substituted by the Bhagavad Gita). So I went a bit overboard, but I still think it's possible to get there: some redundant lines, tabs, introductions and spaces can still be removed. Perhaps I can remove and include certain texts to conform more to what I personally consider the religious highlights of humanity: removing the ceremonial descriptions, Revelation, chronicles and battles, and including the Tao Te Ching, the Upanishads, the Gospel of Thomas or the Vedas, or making a more contemporary selection including The Dark Night of the Soul, The Imitation of Christ, Rumi and certain hadiths on the life of Mohammed. 10 Mbytes of ASCII should be enough for every spiritual adept (and fit on 1 A4 of multicolor matrix barcodes); it for sure fits on a 2M formatted floppy.


And it would also require Unicode, so it gets even harder, since you have to go outside the ASCII range.


Yes, it would require Unicode if I included the native-language versions. But in this example I meant the English versions I have for personal use, so it's pure ASCII markdown. Except for Greek, I don't understand Arabic, Hebrew, Sanskrit, Pali or the Gurmukhi script.


Regarding notation for human movement, there's Siteswap (https://en.wikipedia.org/wiki/Siteswap) for juggling :)


This brings back memories.

In the late eighties, I wrote a program similar to this one for archiving one Apple II floppy disk on a sheet of paper.

One floppy disk is 140 kilobytes (35 tracks of 16 sectors containing 256 bytes each), so a sheet of paper with a 7-inch by 10-inch printed area needs a data density of just 2 kilobytes per square inch to fit a floppy disk, which comes out to a linear resolution of 128 dots per inch. This was (barely) within the specs of my dot matrix printer.
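The arithmetic in that paragraph can be reproduced directly. This sketch assumes one bit per printed dot (the straight bit-to-pixel mapping) and no margins inside the 7 x 10 inch area:

```python
import math

# Apple II floppy geometry as given in the comment.
tracks, sectors_per_track, bytes_per_sector = 35, 16, 256
disk_bytes = tracks * sectors_per_track * bytes_per_sector   # 143,360 bytes = 140 KB

# Printable area of 7 in x 10 in, one bit per printed dot.
area_sq_in = 7 * 10
bits_per_sq_in = disk_bytes * 8 / area_sq_in                  # 16,384 bits = 2 KB
dots_per_inch = math.sqrt(bits_per_sq_in)                     # linear resolution
print(disk_bytes, bits_per_sq_in, dots_per_inch)
```

The numbers come out exactly: 2 KB per square inch and 128 dots per inch.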

I was a teenager at the time so I had no clue about error correcting codes and the like, it was a straight bit-to-pixel mapping.

At the time, this was a write-only medium, as I did not have access to a scanner. No matter, I used it to archive my most important data disks, figuring that after a few years, technology would have improved enough to allow easy scanning. I eventually lost the sheets of paper, but the floppy disks themselves were robust enough to last until 1994, at which time I wrote a program to send them to my PC through a null-modem connection.


Reminds me of this https://en.wikipedia.org/wiki/Cauzin_Softstrip.

Magazines would print out their source code and you could scan it in. Of course I had no money to buy such an expensive machine so was relegated to mind numbing typing and the inevitable typos that came with it.



So I'd like this on a medium that's a bit more permanent than paper, but with similar information density per kg, and similar re-creatability without having any of the original tools available. I'd like to store data for, say, 100 thousand years.


When I was in college I worked in a lab making tiny circuits used for testing a nanorotor they were working on.

First we would put a small piece of glass in a sputter deposition machine, which would deposit a thin film of gold on the surface. Then in a clean room, the gold film would be covered by a photoresistive chemical and placed in a machine similar to an enlarger in a darkroom, except it would shrink the image instead of enlarging. An A4 sized transparency containing the circuit design would be placed in the machine, which was optically shrunk to the (~2cm x 2cm) size of the glass and exposed with UV light. Finally, it was processed with chemicals to etch away the exposed gold, leaving the gold microcircuit traces.

I'm certain you could use this process to encode data with a scheme like PaperBack, and if you could find a substrate lighter than glass that both resisted delamination and could survive the etching chemical bath, the gold film could probably last for 100k+ years.


Etching the data into silicon wafers has been done before:

https://en.wikipedia.org/wiki/Teeny_Ted_from_Turnip_Town

It does require quite a bit more equipment to read back, however.


One nice thing about the gold-on-glass idea is that it can be read with a simple slide projector or similar. It's small but not ridiculously so.


Gold itself may not be a very good medium for storage, because in the meantime (e.g. some intermittent dark ages) someone may decide that the gold is worth more than the bits they encode, and scrape them off.


The Church of Scientology is transcribing L. Ron Hubbard's writing onto stainless steel plates, encased in titanium capsules and buried in the desert[0]. So in the event of a global apocalypse any future survivors or aliens may think that Dianetics was the pinnacle of our culture!

[0] https://en.wikipedia.org/wiki/Trementina_Base


You'll want something that an unsophisticated civilization can make copies of. Might be interesting to have a stylized form of a modern language with layout that incorporates simple integrity checks. The monks who copied documents in the middle ages made a lot of errors.


The Jewish scribes who copied the Masoretic text of the Hebrew bible did use a system of integrity checks, in a traditional apparatus called the 'Masorah parva'. It wasn't perfect, and didn't have checks throughout, but as a computer scientist, I was fascinated to see how they'd dipped their toes into the idea.

I wonder if there are other texts that were copied by hand that had similar approaches. The Qur'an, perhaps?


You could laser burn it onto the rubber material stamps are made of. That would allow for easily making copies on paper.

Not sure whether "stamp" is the right word. I'm talking about this: http://www.stempel-wolf.de/de/img/stempel/herstellung/gravur...


Clay tablets? They survived from Sumer, Assyria and Babylon thousands of years ago.


True, some of them did. Any idea what fraction survived?

I quite like the idea of being able to frame the backup of an important piece of text and put it on my wall.


A significant number. The British Museum alone has 30,943 tablets. The study of Mesopotamia is relatively new and as far as I know not all of the tablets were translated.


Archiving data to 35mm motion picture film has been done, including putting the source code to the decoding software on the leader of the film.



This is both a great summary pitch and intriguing storage system!

I found it interesting. Thanks.


I'm not sure how long it would last but how about something like a bubblegram [1] (laser crystal)... You could use a flatter plastic block and create a 2D image inside. Also, maybe other materials can be used.

1. https://en.wikipedia.org/wiki/Bubblegram


Concrete? Perhaps a 3D printer could make a mold from a data format similar to this one, even one in 3D..

http://www.instructables.com/id/How-to-3D-Print-Molds-for-Sm...


Plastic may already be sufficiently inert. Maybe compose/print sheets of plastic using two kinds of plastics, which will always have different colors.


How would you read it back?


* https://en.wikipedia.org/wiki/Optical_braille_recognition

* http://www.cs.huji.ac.il/~springer/DigitalNeedle/index.html

* 3D Scanning At Home! (Using an xbox Kinect) : https://www.youtube.com/watch?v=_cKb3oEM47E

... If a picture is worth a thousand words, how much could we encode in a statue?


Tyvek, perhaps? I'm not sure how long-term it would stick around, but it wouldn't decay as fast as paper in an unprotected environment.


Not 100,000 years, but if you're looking for something that will last longer than paper, I recommend... paper. Rag paper. 100% rag, acid free copier/laser paper should be available at most office supply stores.

I'd be more worried about toner de-lamination than paper deterioration in that case.


Paper's pretty good. Not sure it's 100K years good, but you're definitely better with paper than almost anything else in the modern world. People underestimate paper.

It's vulnerable to moisture, mold and insects, though.

If you can solve problems with breakage, very little is going to compete with glass or ceramic. So, something it's hard to apply breaking force to: Maybe etch glass balls, add some impurities for visibility, then re-coat with more glass for protection. A handful of beads will survive nearly anything.


In a glass bead you could arrange the data so that the bead acts like a lens, i.e. the data is small and when you shine light through it, it projects the information onto a wall. On the other hand, spheres may have the problem that they roll away, and you always need another medium (like a box) to keep them together.


Tyvek is basically HDPE plastic, and relative to paper it has much poorer temperature resistance:

https://en.wikipedia.org/wiki/High-density_polyethylene


An IBM researcher did this with a Tyvek-like material in the 90s, a polyester made by DuPont if I remember correctly. Just tried to google a reference but no luck.


You beat me to it. Tyvek would be an excellent candidate for its amazing durability and that it behaves much like paper (at least in terms of texture).



A company called Norsam (norsam.com) can engrave text on a 5" nickel or gold disk using ion deposition. They get 20,000 pages per disk. Nickel will not corrode. All any future civilization would need to read it back is a microscope and OCR.


DNA? You didn't mention price :)


See the front page right now [1]. Price of DNA is currently dropping!

[1] https://news.ycombinator.com/item?id=10246514


>a) The key used for (en|de)cryption in version 1.00 provides at most an effective key strength of less than 50 bits (and likely far less, perhaps on the order of 15-25 bits, depending on password quality) instead of the expected 256 bits. Version 1.10 derives the encryption key from the password via key stretching, significantly improving key strength. This change causes a small delay in the encryption step.

>b) PaperBack version 1.00 implements ECB mode symmetric encryption. This mode is subject to a watermark attack and leaks information about the encrypted data. Version 1.10 changes the encryption mode to CBC, which mitigates this attack.

These are classic newb mistakes, and great reasons to not trust any encryption on any product unless it has been vetted by others.
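To make the two quoted fixes concrete, here is a minimal sketch. It is not PaperBack's code: the block cipher is a toy Feistel construction standing in for a real cipher such as AES, and the function names, salt, and iteration count are all assumptions chosen for illustration. It shows key stretching via PBKDF2 and why ECB leaks repeated plaintext blocks while CBC hides them.

```python
import hashlib

BLOCK = 16  # bytes per block

def stretch_key(password: str, salt: bytes, rounds: int = 100_000) -> bytes:
    """Key stretching with PBKDF2-HMAC-SHA256 (iteration count is an arbitrary choice)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, rounds)

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel cipher on one 16-byte block. Illustration only, NOT secure."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:8]
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right

def ecb(key: bytes, data: bytes) -> list:
    """ECB: every block is encrypted independently."""
    return [toy_encrypt_block(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

def cbc(key: bytes, data: bytes, iv: bytes) -> list:
    """CBC: each block is XORed with the previous ciphertext block before encryption."""
    out, prev = [], iv
    for i in range(0, len(data), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(data[i:i + BLOCK], prev))
        prev = toy_encrypt_block(key, mixed)
        out.append(prev)
    return out

key = stretch_key("correct horse", salt=b"demo-salt")
plaintext = b"SAME 16B BLOCK!!" * 3          # three identical plaintext blocks
ecb_blocks = ecb(key, plaintext)
cbc_blocks = cbc(key, plaintext, iv=bytes(BLOCK))  # a real system would use a random IV
print(len(set(ecb_blocks)), len(set(cbc_blocks)))
```

Under ECB the three identical plaintext blocks produce a single repeated ciphertext block, which is the pattern leak the quote describes; under CBC the chaining makes the ciphertext blocks differ.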



Note, Paperback has over twice the density of Optar.

Not that I'm complaining. I bookmarked both because I think they're nifty, despite "media abuse" being the best tag I could think of.


Note: it's extremely easy to increase the density of optar (by editing two constants in the source). optar is configured for 200dpi by default (based on real-world testing).


The name of the project is unfortunate, as it makes it very difficult to search the web for related projects such as Linux or OS X ports.


Many projects fall in this trap. I use Awesome as my window manager, and the Awesome theming library is called Beautiful. Now try googling "awesome beautiful wallpaper"...


UPD: http://i.imgur.com/TVLZgqw.png

I don't know how Google did it, but the first result for exactly that query of yours is: http://awesome.naquadah.org/wiki/Beautiful

What surprises me is that it works in Firefox private mode, where none of the search customizations should work, in theory (fresh clean cookies and local storage, right?). However, yeah, I do indeed google awesome-related stuff from time to time.

UPD #2: it works from a different browser and a different IP too. So it seems that your example is not relevant anymore :)


If you search for, and click on, a lot of tech/linux related stuff, Google likely bubbled you such that those kinds of results come out on top.


Yeah, I think Awesome WM has reached enough users that the specific phrase is now recognized by google. But I remember this was a pain about five years ago.


Or the file manager in Fedora being called "Files".


Its real name is nautilus


Which is not present in the icon, the menu bar, the about dialog or the help file.

http://imgur.com/v4jH7Yf

At least they've managed to be consistent with the rename. Compare this to Packages, which is called "packages", or "software", or "software install" or "packagekit".

http://imgur.com/dFidCWL


Gnome does that with a crap load of their software; I guess they think it is "user friendly".

For example, the binary for their file archive program is file-roller, but the entry in XDG-compatible menus is Archive Manager.

Frankly, I find that Freedesktop has been another example of "the road to hell is paved with good intentions". More and more Freedesktop projects remind me of the kind of Windows stuff I moved to Linux to get away from.


'Files' is a much better name than 'Nautilus', IMO.

If you've ever spent some time teaching Linux to a non computer literate person, you come to appreciate sensible does-what-it-says GUI program names and the non-obvious names start to stick out.

And a special curse for all those people who decide to name their programs in part by the language or toolkit that it uses. What on earth are you thinking? Users don't care about your little details, you are just making things awkward for them.


But this sub-thread is about searching for help.

How is a new user supposed to search for help with Files? Including the distro name doesn't help. And if the rename is complete then no-one should be calling it nautilus in the forums either, so knowing that it used to be called something else doesn't help.

Call it something like Nautilus File Manager or Gnome File Manager or Fedora File Manager.

But "files" on its own is a useless name.


I converted the code to compile with Visual Studio and ran it. PaperBack was able to input a text file, create a "BMP", and read it back in. The destination file was the same as the source.

However the BMP files that PaperBack generates are not recognized and cannot be read by other programs. PaperBack can't read BMP files created by other applications.

I didn't try scanning the printed page directly with PaperBack. I used a separate computer to scan the page and create a BMP.


What does the output look like? There's no example of it on that page...


Trivia: If the printer can also output white ink, and the encoding algorithm knows where the dirt on the paper is, then the capacity of the paper is the same even if the paper is dirty. http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=105665...


This inspired me to prototype a system which would use QR codes and a smartphone camera for recovery, rather than a flatbed scanner.

Here's a demo PNG file which contains 48 QR codes (6x8) at V23 level with L correction (~7% recovery). That's 1091 bytes per code, or just over 51KB per page when printed at 300 dpi. This level of density seems to be very reliable using ambient lighting and an iPhone 5c.

I was also able to read a QR code which was 66% of that size, but it required turning on the iPhone's flash in order to be reliably read. That level of density would allow fitting 9x12 codes on a single page at 300 dpi, which would yield 115KB of data per page.

(At first I tried using 600 dpi, but at 1 dot per pixel the codes were totally unreadable by a cell phone. The above 115KB/page density is the equivalent of 4 dots per pixel at 600 dpi. The good news is this means the system doesn't require a 600 dpi printer. In fact, at 1 dot per pixel, the 51KB/page density is equivalent to 100dpi, and the 115KB/page density is equivalent to 150dpi).
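The per-page figures above follow from simple multiplication. A small sketch, using the 1091 bytes per code quoted in the comment; the helper names are made up, and KB is taken as 1024 bytes:

```python
# Per-page capacity for a grid of QR codes, using the figures quoted above.
BYTES_PER_CODE = 1091          # version 23, error correction level L, byte mode

def page_capacity(cols, rows, bytes_per_code=BYTES_PER_CODE):
    """Total payload bytes for a cols x rows grid of codes."""
    return cols * rows * bytes_per_code

def modules_per_side(version):
    """QR symbols are (17 + 4 * version) modules wide, excluding the quiet zone."""
    return 17 + 4 * version

low = page_capacity(6, 8)      # 48 codes
high = page_capacity(9, 12)    # 108 codes
print(low, round(low / 1024, 1), high, round(high / 1024, 1), modules_per_side(23))
```

This gives 52,368 bytes (about 51.1 KB) for the 6x8 grid and 117,828 bytes (about 115.1 KB) for the 9x12 grid, with each version 23 code being 109 modules on a side.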


Oops, forgot to link to the 51KB/page PNG file: https://raw.githubusercontent.com/cellularmitosis/QR_playgro...


This is a revolution for transporting confidential information through customs :) If it's not electronic they won't even bother checking it!


No, sorry, it's not. That's already trivial: Take a microSD card with your info. Put it together with a couple of tablespoons of oatmeal into a tightly and well wrapped plastic package. Swallow it. Travel. Arrive. Poop. Voila.

Also, if you bring big stacks of paper it will often be inspected, since a common way of smuggling currency is putting hundred-dollar bills between the pages of books.


I guess it's better to upload a TrueCrypt container somewhere in the cloud and transfer just the password/URL. The password/URL may be derived from the first letters of each sentence in the U.S. Constitution or another similar public source of static random data.
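A sketch of that derivation, with a made-up sample text rather than the actual Constitution; the function name and the sentence-splitting rule are assumptions:

```python
import re

def first_letter_password(text: str, length: int = 16) -> str:
    """Take the first letter of each sentence, in order, up to `length` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    letters = [s.lstrip()[0] for s in sentences if s.strip() and s.lstrip()[0].isalpha()]
    return "".join(letters[:length])

# Made-up sample text; any agreed public document would take its place.
sample = "Paper is cheap. Ink fades slowly. Glass lasts longer."
print(first_letter_password(sample))
```

Since the source text is public, the only secret is which text and which rule were chosen, so this is probably best treated as a memory aid rather than a source of real entropy.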


Or just remember password and URL. It's not that hard if you really need to.


Why not public key cryptography?


Because you can't carry a key.


All the keys that are exchanged can be exchanged in plain text.


I think you (unintentionally?) are proving the OP right. Because it's far easier to swallow and poop out a plastic bag with a microSD card in it than it is to put a piece of paper in your carry-on.</sarcasm>

Plus--wouldn't full-body scanners catch something like this?


Airport/security full body scanners would not catch something like this. They only image through clothing for the most part. They only image through the top layers of skin. See figure 4 in the linked to document:

http://www.tek84.com/downloads/Holt-Letter2010-12-2.pdf


It depends on the amount of information you want to transfer, and how likely it is that you'll get a full customs inspection. A single microSD card can hold 512 GB of data; try bringing that in paper format through customs.


Obviously this technology is not for bringing terabytes of information with you. But you can fit an awful lot of information in .txt format into something like 3 megabytes.


In a slightly similar vein, there was a device that stored data on VHS cassettes as black and white blobs - https://web.archive.org/web/19970110151745/http://www.danmer...


It could store up to 3 gigabytes per VHS tape! Wow!

Someone else in the thread posted a link to screenshots of a Linux port: https://news.ycombinator.com/item?id=10246531


Couldn't you store more data in a color image? Just curious.


Probably, Microsoft did something like that (and failed to gain popularity) http://tag.microsoft.com/what-is-tag/home.aspx


I seem to recall it was used to print some HDCP keys on t-shirts.


There are likely many ways to increase the data capacity of a piece of paper.

The information capacity of the paper would depend highly on the printer, the environment the paper had to survive, and the camera.


Yes, around 3 times as much if the table on this Quora post is correct https://www.quora.com/Which-2D-barcode-format-has-the-highes...


Except most people's color printers are inkjet instead of laser, so the DPI penalty more than compensates for the 3x gain with color.


Colour ink has the tendency to fade with time.


This reminds me of the Danmere Backer: http://linbacker.sourceforge.net/screenshots.html


It would be nice to see a detailed comparison of alphanumeric efficiency, licenses, mobile readers, and error correction (or the lack of it) between the options below, as Voiceye claims 3M uncompressed per A4, which is twice the efficiency of Paperback or QR codes. The mobile app is usable and supports MIDI encoding as well, but to make Voiceye codes you have to buy $500 software. Are they using a form of compression?

* QR version 40 (177x177) tiled

* Voiceye

* Paperback without compression

* Grade 1 braille

* Grade 2 braille

* Morse code

* Optar

Braille would be suitable for people who are deaf and blind as well.


Has anyone actually used this? I'm intrigued by the idea and wonder if this is a viable backup option; and if so, are there more modern, maintained, and cross-platform open source solutions that do this?


tar and split combined with a QR encoder would solve the problem just fine.
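A rough Python equivalent of the tar-and-split idea, as a sketch only: it builds the archive in memory, cuts it into numbered, checksummed chunks, and reassembles them in any scan order. The chunk size, record layout, and function names are assumptions, and turning each record into an actual QR image is left to whatever QR library you prefer.

```python
import hashlib
import io
import tarfile

CHUNK = 1000  # payload bytes per code; an assumed size that leaves room for a header

def make_archive(files: dict) -> bytes:
    """Pack {name: bytes} into an in-memory gzip-compressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def split_chunks(blob: bytes, size: int = CHUNK):
    """Yield (index, total, sha256-prefix, payload) records, one per printed code."""
    total = (len(blob) + size - 1) // size
    for i in range(total):
        part = blob[i * size:(i + 1) * size]
        yield i, total, hashlib.sha256(part).hexdigest()[:8], part

def join_chunks(records) -> bytes:
    """Reassemble records scanned in any order, verifying each checksum."""
    parts = {}
    for i, total, digest, part in records:
        if hashlib.sha256(part).hexdigest()[:8] != digest:
            raise ValueError(f"chunk {i} is corrupt")
        parts[i] = part
    return b"".join(parts[i] for i in range(total))

archive = make_archive({"notes.txt": b"hello paper" * 500})
records = list(split_chunks(archive, size=32))   # tiny chunks just for the demo
restored = join_chunks(reversed(records))        # simulate scanning out of order
```

The sequence number and checksum on each chunk are what a bare `split` lacks: they let you scan pages in any order and spot a bad read before trying to untar.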


I'm pretty sure systemd has this feature in there somewhere


To back up my drive took 2216 meters of paper, writing front and back.


Just don't use an inkjet (at least, with the normal cheap ink), any excess humidity will make the ink run, ruining your archive.


there's also http://www.jabberwocky.com/software/paperkey/, for backing up private gpg keys on paper (text only).


Has anybody encoded midi with voiceye or know how to do it?


this has some pretty powerful applications for example for politics or mass dissemination of paper based digital files for the public to quickly consume.

Imagine, if students could start posting free textbook pdf files just by handing out a piece of paper which you could scan, or maybe with a phone one day, and instantly receive the file.

Why not QR code? QR code needs connectivity while this solution would just require you to get your hands on the paper.


You most certainly can use QR codes to store data, using Version 40 QR codes. I'm sure PaperBack and QR codes have much in common. Also, QR codes let you adjust the level of error-correction that you want.

Even if a QR code has a maximum size, you could use more than one per page.

https://en.wikipedia.org/wiki/QR_code#Design

I'm sure there are other 2D barcodes that could be used for this application. E.g., Data Matrix (also in the public domain).

https://en.wikipedia.org/wiki/Data_Matrix


Maybe I'm just slow tonight but I don't see the usefulness in the applications you're describing.

>for politics or mass dissemination of paper based digital files for the public to quickly consume

> Imagine, if students could start posting free textbook pdf files just by handing out a piece of paper which you could scan, or maybe with a phone one day, and instantly receive the file.

With both of those, how is this more efficient or easier to pass around than a cheap micro sd card, or just emailing a pdf?


You can't do anonymous mass dissemination with SD cards because of the cost. You can do it with CD-ROMs however. In China, people actually do this and sometimes you can find a CDROM with dissident political messages lying on the ground.

It's also common to print anti-government messages on the money, which might be a great application of paperback/QR - higher density public messages via circulating currency.


I wouldn't trust a CDROM lying on the ground. How could you know it is not malware?


Just don't run autoplay.exe


Standing outside with a stack of paper, being able to give them to passers-by, staying anonymous and free of network surveillance.


Always doubt your anonymity. Many color laser printers output hidden, self identifying serial codes.


I did not know this, do you have a source?



Fascinating, thanks for the links!


Digitally anonymous and free of network surveillance, sure. Physically anonymous and free of a cop simply walking up and arresting you, not so much...


Definitely not free of digital surveillance

http://www.pcworld.com/article/118664/article.html


QR code doesn't actually need connectivity, but its standard capacity is limited, so it's usually only used for small data like URIs and serial numbers. In principle, though, there's nothing stopping anyone from encoding a PDF as a big array of QR codes.


On that note, I seem to recall that certain encryption algorithms were transported out of the USA by way of a book back in the day.

Back then (and I think the law is still in place) anything beyond something like a 40-bit key was considered a military weapon, and thus needed relevant export licenses.

But this only applied to working binaries, not the code printed on paper. So what some researchers did was to get the code printed in a book, using an OCR-friendly font, and then ship the book to something like Switzerland, where it was scanned, compiled, and put online.


Yup, PGP source code, and I think it had checksums on the lines too to avoid errors. Didn't the book with the EFF's DES-cracking project (source included) get published the same way?
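For a feel of how per-line checksums catch scanning errors, here is a minimal sketch. It does not reproduce the format used in the printed PGP source; the CRC-32 prefix and the function names are assumptions for illustration.

```python
import zlib

def add_checksums(source: str) -> list:
    """Prefix each line with its CRC-32 (8 hex digits) so OCR errors can be located."""
    return [f"{zlib.crc32(line.encode()):08x} {line}" for line in source.splitlines()]

def bad_lines(printed: list) -> list:
    """Return 1-based numbers of lines whose checksum does not match their text."""
    bad = []
    for number, entry in enumerate(printed, start=1):
        digest, _, line = entry.partition(" ")
        if f"{zlib.crc32(line.encode()):08x}" != digest:
            bad.append(number)
    return bad

page = add_checksums("int main(void) {\n    return 0;\n}")
page[1] = page[1][:-2] + "O;"   # simulate an OCR slip: zero read as the letter O
print(bad_lines(page))
```

Only the damaged line is flagged, so a human proofreader knows exactly which line to re-check against the paper.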



