
Converting to and from base 36 (or 32) would probably do more to help the problem than any heuristics. Compare:

  66c22ba6fbe0724ecce3d82611ff0ec5c2b0255f
to:

  c04bo5604v5qsp6asgasjp9y4paxu8v
That's 31 characters instead of 40, roughly a 22% gain in compactness.
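For anyone who wants to try the conversion, here is a minimal Python sketch. It is only an illustration, not something git offers: `to_base36` and `from_base36` are hypothetical helper names, and the digit alphabet (0-9 followed by a-z) is an assumption.

```python
import string

# Assumed alphabet: digits first, then lowercase letters (36 symbols).
ALPHABET = string.digits + string.ascii_lowercase


def to_base36(hex_hash: str) -> str:
    """Re-encode a hex string as base 36 (leading zeros are not preserved)."""
    n = int(hex_hash, 16)
    if n == 0:
        return "0"
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def from_base36(text: str, width: int = 40) -> str:
    """Decode base 36 back to hex, zero-padded to the given width."""
    return format(int(text, 36), "x").zfill(width)


sha = "66c22ba6fbe0724ecce3d82611ff0ec5c2b0255f"
short = to_base36(sha)
print(len(sha), len(short))
```

Because the base 36 form drops leading zeros, the decoder pads back to a fixed width; a real scheme would need to settle on a fixed length in both directions.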




Interesting. I was hoping for an example at the end of the README.


(I'm the author.) That is an excellent suggestion. Now that you've mentioned it, it seems like a glaring omission. I'll try to fix that up once I get home.

In case the other parts of the README weren't clear, the concept was to use any Unicode character. I was even thinking of (eventually) getting it to encode data with combining accents. Note that it was intended to optimize the string for screen display space (pixels), not storage size.

I'm not sure it'd be a good fit for git hashes, simply because sometimes you need to type or speak a git hash, and the output from baseunicode was definitely not intended to be pronounceable. (Especially since I was thinking of using CJK characters, but trying to weight them down for their wider screen area; imagine trying to describe that to a co-worker who might only speak English.)
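To make the "screen area" idea concrete, here is a rough Python sketch of estimating how many terminal columns a string occupies. This is not baseunicode's actual weighting; `display_columns` is a hypothetical helper, and treating wide/fullwidth characters as 2 columns and combining marks as 0 is a simplifying assumption.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Estimate columns: wide/fullwidth = 2, combining marks = 0, others = 1."""
    total = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining accents stack on the previous character
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


print(display_columns("abc"))           # three narrow characters
print(display_columns("\u4e2d\u6587"))  # two CJK characters
print(display_columns("e\u0301"))       # "e" plus a combining acute accent
```

Under this estimate a CJK character has to carry at least twice the information of a narrow character to pay for its width, while a combining accent adds information without using any extra columns.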

I wrote it mostly for fun, after I had difficulty transferring a couple of files between machines in the cloud. I find myself ssh'd into weird places, and scp'ing is sometimes trying. (I do machine-to-machine, so I almost always need -3, and I don't know why that isn't the default; scp doesn't deal well with a file that is only accessible by root, not your user; scp has the weirdest arg syntax if you have ill-advised characters in your filenames, like spaces…) So I was cat'ing files, copying them from one window, and pasting into another window: base64 for binary data, tar/gzip for making it smaller. But for the copy/paste, scrolling is a pain, and heaven forbid you're in screen/tmux.
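The compress-then-base64 round trip described above can be sketched in Python as a stand-in for the shell tools. The helper names `to_pasteable` and `from_pasteable` are hypothetical, and the 76-column line wrap is an assumption chosen to keep the pasted text narrow.

```python
import base64
import gzip


def to_pasteable(data: bytes, width: int = 76) -> str:
    """Gzip the bytes, base64-encode them, and wrap into short lines."""
    encoded = base64.b64encode(gzip.compress(data)).decode("ascii")
    return "\n".join(encoded[i:i + width] for i in range(0, len(encoded), width))


def from_pasteable(text: str) -> bytes:
    """Reverse to_pasteable; whitespace added by the terminal is ignored."""
    return gzip.decompress(base64.b64decode("".join(text.split())))


payload = bytes(range(256)) * 4
pasted = to_pasteable(payload)
print(len(payload), len(pasted.splitlines()))
```

base64 spends four characters on every three bytes, which is the overhead a denser alphabet would be trying to reduce.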

(Also, if you find yourself really stuck with a file that you can't scp, you can "re-implement" scp with `ssh $hosta sudo tar -cz <stuff> | ssh $hostb sudo tar -xz`; see also the -C flag, and don't forget you can also `ssh $host "sudo bash -c 'cd /where && tar -cz <stuff>'"`)


That's an interesting thought, and I don't feel there's any advantage in the text being in hex.

The only problem is that by now too much code probably expects hex, so I'm not sure the gain is big enough to justify the pain of switching.


A nice side effect of hex is that you can pick it out of a text commit message with higher accuracy (e.g., to turn it into a hyperlink). Tools like `gitk` and sites like GitHub do this using a regex.
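As a rough illustration in Python: the patterns below are assumptions made for the sake of the example, not the actual regexes used by `gitk` or GitHub, and the 7-character minimum is likewise assumed.

```python
import re

# Assumed pattern: 7 to 40 lowercase hex digits standing alone as a word.
HEX_HASH = re.compile(r"\b[0-9a-f]{7,40}\b")
# Hypothetical base 36 equivalent, shown only for comparison.
BASE36_HASH = re.compile(r"\b[0-9a-z]{7,31}\b")

message = (
    "Revert 66c22ba (see 66c22ba6fbe0724ecce3d82611ff0ec5c2b0255f); "
    "it broke the facade."
)
print(HEX_HASH.findall(message))

# With a base 36 alphabet, ordinary words start to look like hashes.
print(BASE36_HASH.findall("fix regression in parser"))
```

The hex pattern finds both the abbreviated and the full hash and ignores the prose, while the base 36 pattern also matches the word "regression". Hex is not immune either (a word like "defaced" is seven valid hex digits), but such collisions are much rarer.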



