Git is as revolutionary as Unix pipes (2008)

tzury · on March 14, 2011

This is off topic a bit, but it tells my story about building something on top of git.

Just several weeks before dropbox came to light, I had finished building a prototype of what turned out to be a dropbox clone on top of git (in fact, I had it working with Mercurial and Bazaar as well; I designed it to be platform independent) and thought I had in my bare hands the potential for a great startup (demos were working smoothly: auto-syncing files between clients, a web viewer, etc.). It was a side project that I worked on late at night and on weekends, and I got what I considered great results without much effort.

Yet one morning I opened my browser, pointed it, as usual, to HN and saw a post saying dropbox had raised this and that money from Sequoia Capital. I was eager to know what this dropbox was and what they did, and was terribly shocked to find out they were doing the same shit I was, but for a longer time, with more and smarter people and funding.

Soon after, I dropped my project (BTW, it was named StoreAge), and yet could not use dropbox or hear anything about it for a very long time.

Today I am a happy dropbox user and git user, waking up every morning wondering what my next startup will be.

andywhite37 · on March 14, 2011

I think Git owes a ton of its success to github.com. Without this extremely well-designed, central repository for repositories, the uptake of Git would have been much slower, and faced much more resistance in the wild. Git is a great tool on its own, but having a centralized place for people to learn and use Git has been huge.

imr · on March 14, 2011

In addition to github, you have the Linux kernel's switch to git, which really put the project ahead of the other distributed revision control systems such as Darcs and Mercurial.

j_baker · on March 14, 2011

On the other hand, one could also attribute github's success to git's success. I suppose it's most likely symbiotic: github contributed to git's success and vice versa.

michaelbuckbee · on March 14, 2011

That's undoubtedly true, but I wonder if the GitHub guys had picked up Mercurial instead if it would have grown as fast.

lurker19 · on March 14, 2011

Is it git or is it github?

ggchappell · on March 14, 2011

There are two kinds of articles about DVCSs: (1) git, hg, bzr are all better than whatever you're using now; (2) git is completely revolutionary, a whole new paradigm.

This is, of course, the latter sort. I'm wondering why hg, etc. are never called "a whole new paradigm". I've used all three. Clearly, hg & bzr are extremely similar, while git works a little differently (the staging area, the differing semantics of "add", etc.). But I don't see that git has significantly more awesomeness than the other two.

Am I missing something?

jmillikin · on March 14, 2011

The second type of article is usually written by converts from legacy VC systems (cvs/svn) who have never experienced a DVCS before. They attribute the obvious improvement in usability to Git, never venturing to try anything else; possibly due to the same fundamental lack of curiosity or self-improvement that leads to using CVS/SVN at all (post 2005).

What's more, the migration from a single-branch to multiple-branch model is so empowering that such users tend to view all of Git through rose-tinted glasses forever after. Compared to that experience, relatively minor usability improvements like Bazaar's "every working copy has its own tree" or Mercurial's queues seem insignificant and/or unimportant.

nupark · on March 14, 2011

> ... possibly due to the same fundamental lack of curiosity or self-improvement that leads to using CVS/SVN at all (post 2005).

I (and my successful small business) still use SVN, and have done so since 2007. This choice has nothing to do with "fundamental lack of curiosity or self-improvement." It's a choice born of careful reasoning regarding the suitability of using DVCS in a centralized organization.

dexen · on March 14, 2011

I'd been toying around with Hg for some time before Git. Indeed, DVCS by itself implies a high degree of flexibility and whatnot.

However, I'd still like to single Git out, because of its storage format. In Hg, for every file in your repo, you get one storage file holding revisions -- plus one master file with changelogs. It's just boring files all the way down.

In Git, every object (a file, a tree of files, a commit with a tree of files and parent commit(s)) is identified by its hash. The data itself is stored somewhere, in an interchangeable format (currently two formats are used: loose objects, and packs with an index). Storage is somewhat decoupled from the toolkit. You can even fix a broken repo by literally copying in file(s) with the proper content.
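As a rough illustration of objects being named by hash, here is a minimal Python sketch of how git derives the id of a file's content: the SHA-1 of a short header plus the raw bytes. The `blob_id` and `put` helpers and the in-memory `store` dict are hypothetical names for illustration; they are not part of git.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Return the object id git assigns to a blob with this content."""
    # git hashes a header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# A toy content-addressable store (hypothetical): the key is derived
# from the value, so the same content always lands on the same key.
store = {}

def put(content: bytes) -> str:
    oid = blob_id(content)
    store[oid] = content
    return oid
```

For example, `blob_id(b"")` gives `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the id git reports for an empty file. Because the name is a function of the content, a missing or corrupted object can be restored by supplying a file with the right content.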

But the true power comes from somewhere else: you can envision defining your own data types, beyond files, trees and commits, and plugging them into Git. And having them play along with the usual ones.

A general content-addressable storage :D

(Too bad Venti [1] was invented a bit earlier)

I believe Fossil (the DVCS [2], not the FS [3]) comes pretty close to that, too.

--

[1] http://en.wikipedia.org/wiki/Venti

[2] (the DVCS) http://en.wikipedia.org/wiki/Fossil_(software)

[3] (the FS) http://en.wikipedia.org/wiki/Fossil_(file_system)

zb · on March 14, 2011

He doesn't seem to be claiming that Git is more revolutionary than Mercurial/Bazaar/&c. as a VCS.

I think it's fair to say that Git makes it easier to build some system other than a VCS from it, since all of the low-level commands for directly manipulating the repository are exposed in the shell. Contrast that with Mercurial, which doesn't even have a Python API.

(For the record, I have actually built such a system on top of hg, and I'm pretty familiar with git.)

jimmyjazz14 · on March 14, 2011

It's kinda sad that people have kinda forgotten about darcs (at least in the mainstream) which is still an interesting project despite some of its shortcomings (and one of the first good DVCSs). That said I still prefer git.

j_baker · on March 14, 2011

I think it boils down to difficult to quantify factors: most notably, love. In spite of all of git's warts (or perhaps even because of them), it's very clear that git is a tool that was created by the people who would be stuck using it. I'm not saying that isn't true about the other ones. I'm just saying it just doesn't shine through as much.

sliverstorm · on March 14, 2011

Well, it's been two years and git still isn't making me sandwiches (unlike Unix pipes), so I think he's been proven wrong.

Honestly, I like git, and I like the 'local copy' model, but I feel people go a little overboard sometimes with their enthusiasm for git.

neutronicus · on March 14, 2011

What he's going on about is that git is an efficient implementation of a purely functional data structure on disk. He's advocating using it as one if you ever need one. This is tangential to its role as a version control system.
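To make "purely functional data structure" a bit more concrete, here is a minimal Python sketch of a hash-addressed tree where an update never modifies anything in place. The names `objects`, `put`, `get`, `write_tree` and `set_path` are hypothetical, and the JSON-plus-SHA-1 encoding is an assumption chosen for brevity; it is not git's on-disk format.

```python
import hashlib
import json

objects = {}  # hypothetical in-memory object store: hash -> encoded bytes

def put(obj) -> str:
    """Store a JSON-serializable object under the hash of its encoding."""
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    oid = hashlib.sha1(data).hexdigest()
    objects[oid] = data
    return oid

def get(oid: str):
    """Load and decode the object stored under oid."""
    return json.loads(objects[oid].decode("utf-8"))

def write_tree(tree: dict) -> str:
    """Recursively store a nested dict; leaves are strings (file contents)."""
    entries = {}
    for name, value in tree.items():
        if isinstance(value, dict):
            entries[name] = ["tree", write_tree(value)]
        else:
            entries[name] = ["blob", put(value)]
    return put(entries)

def set_path(root_oid: str, path: list, content: str) -> str:
    """Return the id of a NEW root in which path holds content.

    Only the nodes along the path are re-created; every untouched
    subtree is shared with the old root by id.
    """
    entries = get(root_oid)
    name, rest = path[0], path[1:]
    if rest:
        # Assumes an existing entry at this name is a tree, not a blob.
        child = entries[name][1] if name in entries else put({})
        entries[name] = ["tree", set_path(child, rest, content)]
    else:
        entries[name] = ["blob", put(content)]
    return put(entries)
```

Calling `set_path(root, ["src", "a.py"], "changed")` yields a second root id while the first one still resolves to the old contents, and any directory that was not on the path has the same id under both roots. That sharing is what keeps many versions cheap to store.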

apenwarr · on March 14, 2011

Wow, you just put exactly what I was trying to say into a single simple sentence. Thanks!

erikpukinskis · on March 14, 2011

The vast majority of these comments seem to misunderstand the article as saying that Git is revolutionary for managing code.

That's wrong. The article is about using Git for managing data. The examples cited are using it as a backend for a distributed filesystem or a wiki. The OP is talking about Git as a revolutionary new kind of datastore and network protocol, not a revolutionary new kind of VCS.

j_baker · on March 14, 2011

I say this as someone who absolutely loves git and thinks it's the best thing to happen to VCSes in a long time: no it isn't. This kind of advocacy is worse than useless: when people realize that git isn't a revolutionary concept that will change computing and is really "just" a VCS (albeit a very good one), there will be a big backlash.

apenwarr · on March 14, 2011

I'm the author of the OP. The comments here make me think (again) that the article wasn't clear enough: admittedly, when I first wrote it, I was just discovering git, as some of the comments said. The difference between me and perhaps many recently-baptized git fanboys is that now, three years later, I still believe exactly what I wrote. I just now also know why it came across the wrong way.

Here's what I was trying to get across at the time: git creates a whole new set of nouns and verbs for computer science that almost none of us have experienced before. Yes, it steals a lot of concepts from programs like darcs and monotone, and there are other things that do the same things that git does from a VCS point of view - but my focus is on the nouns and verbs. git exposes the plumbing of these new concepts directly to you, which is both scary and intensely powerful.

git isn't the next Unix because it will replace Unix: git is the next Unix because its concepts represent the next mind-shifting change in computer science. I mean that git is the next Unix in the same way you could say "Unix is the next Lisp" or "Dynamic Languages are the next Static Languages." Not that the new thing replaces the old thing: they have totally different uses. But that's the point: the new thing's uses are really new. Stuff that was hard is now easy.

It's hard to imagine the world before Unix pipes (and the Unix sh in general) were invented, but I used it, and IT SUCKED. The whole Unix paradigm (yikes, now I've used that word) really changed the face of computing. Even if you don't use Unix, you got changed by Unix.

git's new nouns are blobs, trees, commits, and refs. The new verbs are push, pull, merge, tag, etc. You can apply these nouns and verbs to a lot more than just source code version control.
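Here is a minimal Python sketch of those nouns and one verb applied to something that is not source code (wiki page revisions, say). Everything in it is hypothetical: the `Repo` class, the JSON encoding, and especially `push`, which is far simpler than git's (it copies every object the destination lacks instead of only those reachable from the ref, and it never refuses a non-fast-forward update).

```python
import hashlib
import json

class Repo:
    """Toy store: immutable objects keyed by hash, plus mutable named refs."""

    def __init__(self):
        self.objects = {}  # hash -> encoded object, never changed once written
        self.refs = {}     # name -> commit hash, the only mutable part

    def put(self, obj) -> str:
        data = json.dumps(obj, sort_keys=True).encode("utf-8")
        oid = hashlib.sha1(data).hexdigest()
        self.objects[oid] = data
        return oid

    def get(self, oid: str):
        return json.loads(self.objects[oid].decode("utf-8"))

    def commit(self, ref: str, payload, message: str) -> str:
        """Record payload as a new commit on top of whatever ref points at."""
        parent = self.refs.get(ref)
        oid = self.put({
            "payload": self.put(payload),
            "parents": [parent] if parent else [],
            "message": message,
        })
        self.refs[ref] = oid
        return oid

    def log(self, ref: str):
        """Yield commit messages, walking first parents back to the root."""
        oid = self.refs.get(ref)
        while oid:
            c = self.get(oid)
            yield c["message"]
            oid = c["parents"][0] if c["parents"] else None

def push(src: Repo, dst: Repo, ref: str) -> int:
    """Copy objects dst lacks, then move dst's ref. Returns objects copied."""
    missing = {k: v for k, v in src.objects.items() if k not in dst.objects}
    dst.objects.update(missing)
    dst.refs[ref] = src.refs[ref]
    return len(missing)
```

Because objects are immutable and named by hash, synchronizing two stores reduces to "send what the other side does not have, then move a pointer", whatever the payload happens to be.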

The naysayers in this thread all sound like 1990's programmers who don't understand the value of higher-order functions or dynamic typing or macros. You can survive without those things, but some problems are just so much easier with them than without them. git is like that. If you don't get it, you're living in the past.

One final clarification: my article was written to talk about git, but it's not about git's code or API or repo format at all. bup, the backup software I started writing about two years after that article, doesn't share any source code with git, but the amazing things it does are possible because it uses the new nouns and verbs popularized by git. When new distributed filesystems and databases and social networks and wikis and massively distributed collaborative text editors arrive, they will all be using these new nouns and verbs. If you don't care about that, then yeah, git isn't the next Unix for you. But if you want to build the next generation of networks in real life, then you'll either be taking advantage of the new nouns and verbs or you'll be painstakingly building the Windows of distributed systems.

haberman · on March 14, 2011

> You can apply these nouns and verbs to a lot more than just source code version control.

Do you have any examples of people (other than you) who have actually done this? It would make your argument much more convincing.

Also, the fact that other DVCS's have different, incompatible models underlying them suggests that Git's nouns and verbs are not nearly as universal as Unix pipes. If Git's nouns and verbs were universal, Git could subsume other DVCS's (ie. you could implement other DVCS's semantics on top of Git with performance as good or better than what they have already).

apenwarr · on March 14, 2011

Many of the new "nosql" databases use a lot of similar concepts. But it's a new thing: I'm trying to see into the future here, not tell you what's already happened :)

As for different DVCSs, I think you're exactly backwards. Almost everyone commenting on these things seems to believe that git isn't special, it really does the same thing as every other DVCS, etc. And on a fundamental level this is true: it's very easy to convert from one DVCS to another, because fundamentally the models are so similar. git just exposes the model in a more obvious way. The insides are the same, the outsides are different.

eru · on March 14, 2011

The Mercurial guys even go out of their way to show how they are like git (http://mercurial.selenic.com/wiki/GitConcepts).

levesque · on March 14, 2011

Out of their way? What do you mean? They also have a CvsConcepts page....

eru · on March 14, 2011

Bad choice of words.

But--try to find a sentence like "Mercurial and Git differ only in nomenclature and interface." in the CvsConcepts page.

Montagist · on March 14, 2011

I thought I was with you but I was just made a little more confused by that comment :\

I've yet to do a lot with Git, but your post has made me a lot more interested. After wikipediaing the features, I'm starting to see how one could save a lot of time by building an app on top of Git rather than recreate all this functionality from scratch.

bobds · on March 14, 2011

Git core has support for CVS, SVN, Perforce, Arch and Quilt.

There are plugins for Mercurial, Bazaar and DARCS.

https://git.wiki.kernel.org/index.php/Interfaces,_frontends,...

Git to Mercurial.

http://hg-git.github.com/

Git to Bazaar.

http://doc.bazaar.canonical.com/migration/en/foreign/bzr-on-...

It doesn't seem other DVCS are that incompatible.

sedachv · on March 14, 2011

http://camlistore.org/

SeveredCross · on March 14, 2011

You had me until dynamic typing. ;) And I really hope you mean Lisp-y macros, and not CPP macros.

patrickg · on March 14, 2011

Does anyone have experience with this setup: install your software in a git repository. That way you can say something like "version=1.23" at the top of the control file (such as a source code file for a scripting language) and the software system checks out that particular version, which is on your hard drive. There are obvious drawbacks to such a system, such as where to put the intermediate working directory, but there might be solutions for these problems.

That way updates may be much less hassle. If you have, for example, a python script running on your server doing important things, an update to python might break the script and could cause some trouble. But if you could say, for example, "uses python=x.y" and the system silently falls back to that version even if a newer version of python is installed, the script is more likely to keep running even on upgrades.

neutronicus · on March 14, 2011

http://nixos.org/nix/

davidu · on March 14, 2011

Whoever this guy is, his blog has been one of my favorites for quite some time now.

Adaptive · on March 14, 2011

And for those that don't realize, the OP author also wrote up git subtree, which pretty much caused me to take git seriously and change from hg to it.

njharman · on March 14, 2011

Binaries work across pipes. With git, not so much.

look_lookatme · on March 14, 2011

I remember reading this and being blown away by his enthusiasm.

What are some interesting non-SCM projects that use git? (Like wikis, etc)

beagle3 · on March 14, 2011

apenwarr's own "bup" http://github.com/apenwarr/bup - it's the ultimate backup system on one hand, and it's just a git repository on the other.

Way better deduplication than any other backup system you've used, every backup is at the same time incremental, differential and (looks) complete.

It's magic.

cperciva · on March 14, 2011

> Way better deduplication than any other backup system you've used, every backup is at the same time incremental, differential and (looks) complete.

You just described Tarsnap.

beagle3 · on March 14, 2011

Cool.

When I last looked at tarsnap, I got the impression that it's a smart compressed rsync - apparently I was wrong.

bup has a FUSE frontend, that exposes every backup set as a complete file system (also through http and ftp, but the file system angle is the most useful in my opinion). Does tarsnap have something comparable? I'm going to look at it again.

cperciva · on March 14, 2011

No, Tarsnap is designed for backups rather than random access -- it does things like cryptographically signing archives, which is obviously only possible if you have a concept of "this archive" vs. "that other archive".

(You can extract subsets of files, as per normal tar functionality, though, and Tarsnap only downloads the files you want plus the 512-byte tar headers it needs so that it can figure out which files match your specifications.)

beagle3 · on March 16, 2011

In that case, I did not just describe tarsnap, or at least - did not intend to.

bup _is_ designed for backup rather than random access. But it is easy enough to make those backups look like read-only file systems (bup includes an FTP server, HTTP server and FUSE module that expose a backup set through the respective protocol or as a filesystem).

But since bup builds on git, and a bup backup set is actually a git repository, you get all the git related stuff for free - e.g. bup supports cryptographic signatures in the repository by way of git's signing support -- although, for now, the "bup" command does not implement them (so, if you want to sign or verify the signature, you'll have to use git on the repository rather than bup)

Bup's deduplication is comparable to rsync's (and it reuses rsync's main tool for that). If you change a byte in the middle of a 100MB file, you'll likely need to transfer ~16k to or from backup (compared to the other version of the same file). That's also true if a byte was inserted in the middle of the file. And if you back up 100 copies of a 100MB file, with just a few bytes changed in each copy compared to every other copy, you'll need less than 150MB of space/storage, rather than the 10GB or so without deduplication.
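A rough Python sketch of why an inserted byte costs so little: chunk boundaries are chosen by a rolling checksum over the last few bytes, so they move along with the content, and chunks are stored under their hash. This is not bup's code. The checksum is a simplified rsync-style sum, and `WINDOW`, `MASK`, `split`, `backup` and `restore` are assumptions picked for illustration.

```python
import hashlib

WINDOW = 64            # rolling window size in bytes (assumed value)
MASK = (1 << 11) - 1   # cut when the low 11 bits are all ones (assumed value)

def split(data: bytes):
    """Yield chunks whose boundaries depend only on the last WINDOW bytes."""
    s1 = s2 = 0  # s1: sum of the window's bytes; s2: position-weighted sum
    start = 0
    for i, add in enumerate(data):
        drop = data[i - WINDOW] if i >= WINDOW else 0
        s1 += add - drop           # slide the window forward by one byte
        s2 += s1 - WINDOW * drop
        if (s2 & MASK) == MASK:    # content-defined cut point
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]

def backup(data: bytes, store: dict) -> list:
    """Store unseen chunks by hash; return the recipe (list of chunk ids)."""
    recipe = []
    for chunk in split(data):
        cid = hashlib.sha1(chunk).hexdigest()
        store.setdefault(cid, chunk)  # a chunk seen before costs nothing
        recipe.append(cid)
    return recipe

def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original bytes from a recipe."""
    return b"".join(store[cid] for cid in recipe)
```

With fixed-size blocks, inserting one byte would shift every later block and defeat deduplication. Here only the chunk around the edit (and possibly a neighbour) changes, because the cut points after it are found again at the same content.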

vinc · on March 14, 2011

I don't want to spam but I did this : http://www.wipigi.com/ a wiki service on top of Git (and Django)...

But with GitHub's wikis it's a little useless for anyone but me. Nonetheless it was fun to build and an opportunity to learn a little more about Git (and Django).

hasenj · on March 14, 2011

github wikis come to mind.

Whitespace · on March 14, 2011

I've been hacking at gollum (github's FOSS git-based wiki) for some time, and I might have some interesting uses for it in the education space that go well past a basic wiki.

entropie · on March 14, 2011

I'm writing a gollum-like wiki atm. It's not finished yet, but it already works and is not far from a (stable) release: http://github.com/entropie/oy

My problem was that gollum did not work behind Apache/mod_proxy, so I was forced to write it myself ;)

Montagist · on March 14, 2011

Yea that seems like a pretty powerful concept in particular. I wonder, though, if people'd really care about the ability to edit something offline when they could just go with some wiki that has private restrictions on it.

I do, however, really like the idea of downloading a complete wiki, knowing that I can resync at a later date if content in my version becomes irrelevant.

hasenj · on March 14, 2011

I've been thinking about this idea actually. Sometimes, it'd be really handy to have blog posts (aka essays) under version control so you can edit them in your favorite text editor offline. At the same time, that could be very restricting; imagine if you were required to use git + text editor to edit/publish all of your posts.

Montagist · on March 14, 2011

Having the wiki paradigm in mind, it didn't occur to me that one -would- use their preferred text-editor, but that's maybe a big benefit right there. What if it were a desktop app running in the background that monitors these text files and is smart enough to handle all git operations behind the scenes? And what -is- it about long form writing that is just inconvenient on the internet? haha

happypeter · on March 14, 2011

I love git, and use it to back up my life. It's very interesting to know that you will still be able to check out what you do today many years later. CVS can do this as well; it is just not as attractive, being much slower.

tybris · on March 14, 2011

I used to feel this way about git. Then I got dropbox and became too lazy to use git for anything personal.

hasenj · on March 14, 2011

I don't remember if I've read this before, but I feel the same way.

The great thing about git is that it's easy to understand. Once you understand the concepts (and they're really (relatively) simple) and learn the vocabulary, it gives you tremendous power. At least, it makes you feel that you have power, hence it empowers you.

> Git is actually the missing link that has prevented me from building the things I've wanted to build in the past.

Totally agree with this one.

mberning · on March 14, 2011

I'm having a hard time understanding the second point. I have never felt that my VCS prevented me from doing ANYTHING, even when I was forced to use completely shit systems like Visual Source Safe 6.0.

Are you saying that you wanted to build systems that integrated very tightly with version control, or that the difficulties you experienced with older version control tools prevented you from being more adventurous in your coding?

hasenj · on March 14, 2011

It's not the presence of a bad VCS. It's the lack of a decent one.

Without git, it's really hard to take bold steps in redesigning the project, because one would be afraid it won't work, and then you'll lose all the progress you had so far.

Git solves this problem. Just start a new branch and work in it. If it works, great, merge back to the master branch. If it fails, no big deal, discard this branch and go back to master.

Basically git encourages experimenting in a way no other system does.

georgieporgie · on March 14, 2011

> Just start a new branch and work in it. If it works, great, merge back to the master branch. If it fails, no big deal, discard this branch and go back to master.

How does that not describe more traditional VCSes like cvs and subversion?

eru · on March 14, 2011

Merging used to be harder in them, and forking was a big deal.

aerique · on March 14, 2011

Even moving directories was a pain in the ass in CVS.

georgieporgie · on March 14, 2011

Having read through this: http://book.git-scm.com/3_basic_branching_and_merging.html

I don't see how branching or merging is any easier than I recall from Subversion or cvs.