Git from the inside out (recurse.com)
284 points by luu on March 26, 2015 | 96 comments



> from inside out

As a physicist, it always surprises me how the thinking of physicists and programmers is, most of the time, kind of reversed. Most git tutorials seem like this to me (mechanics analogy):

   1. Slopes
   2. Springs and gears
   3. Horrendous contraptions
   4. Ropes and pulleys
   ...
   9. Newton I., II. and III.
This kind of "inside out" tutorial is natural for me, and recently I taught git basics to my SO in a similar way. It worked out well (she is a physicist too). I don't want to generalize though; it's maybe rooted in the common ways of teaching programming and physics.


Imagine if everyone learning physics came in with an attitude of "I need to learn and use the rocket equation as quickly as possible" (or substitute some other high-level problem for "the rocket equation"). You'd end up with strange backwards tutorials for physics that start out with that specific model, then a handful of related models, then how to abuse that model to handle things it doesn't really apply to, and much later the underlying physics and mathematics to solve arbitrary generalized problems.

Many people start out trying to figure out either "how do I use git exactly like (svn, cvs, vss, ...)" or "how do I commit and push my changes", so tutorials start there. Most people don't approach git by learning its underlying data model. Arguably people should, because it's a rather simple data model, and then all the commands become simple applications of that model.


Most people don't start learning Word by understanding the data structures used to store the text and modifications either.

For 95% of developers, git is a tool that is incidental to their primary task (developing software). Having to have a deep understanding of the underlying data structures in order to use it effectively is the antithesis of how most "utility" software is designed.

When I am in the flow of coding some part of my project, my head is full of the data structures, object models, databases, algorithms, requirements, etc. that are immediately relevant to that task. If I have to do a context switch and pull the git data model front and center into my thinking to know what to do to get my work into the repository, that is a serious break in flow and has always been a problem for me whenever I've had to use git.


> Most people don't start learning Word by understanding the data structures used to store the text and modifications either.

And there should be a warning when your document gets above 5 pages long that you should learn the structural side of the program at hand. Unless you like spending your weekend crafting a TOC line by line.


The world would be a much sadder place if that were the case. I do not understand how the infrastructure that powers world trade, banking, public construction, etc. works. I am grateful that it is packaged into easily understandable interfaces so I can still benefit from it.

The power of technology is compounded when it can empower even non-technologists to use it. The more sophisticated tasks they can accomplish with it, the better.


It's a double-edged sword. By crafting interfaces so simple anyone can use them, we forbid them to understand what's really happening. That's how people end up thinking the blue 'e' icon on the screen is the Internet.


But is 'forbid' really the right word here? I certainly agree that such interfaces encourage people to "accept without questioning", but it seems that there is no necessary obstacle to making something that's easy to use but also permits you to dive under the hood and see the details. (I think particularly of Mac OS before it started becoming all iOS'd. Even now, when Apple's 'just work'ing settings don't just work for you, you can often fix it by diving into the command line.)


You're right, 'forbid' isn't appropriate.

IMO, no mainstream OS gives you the ability to understand anything, even the UNIX-based ones. And I don't believe command lines are a way to understand either, or at best a very low-efficiency one (you've got to read a lot, understand complex context, and try error-prone commands).

You need virtual, mockable, undo-able environments to understand. You need ways to decode the data and metaphors used by computers.


> For 95% of developers, git is a tool that is incidental to their primary task (developing software).

Ehh, either those 95% of developers are working on small projects, or they're not working as effectively as they could. Version control is just as much a part of software development as your compiler or your text editor. Incidental, sure, but how you use those tools is critical to your final output. Being able to quickly and easily bisect the tree is critical, and this requires that your commits are small and self-contained. Being able to do development and backport patches to older versions of your software is similarly critical, and so it has to be super easy to make a new branch and port the patch across to that branch.

Just like I couldn't do my job without a text editor or without a compiler, I couldn't develop complex software without version control. I know my text editor very well so that I can write efficiently. I know my compiler well enough to use it to build my programs in the various configurations I need. I don't see why the version control software should be any less important to master.
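A minimal sketch of the bisect workflow mentioned above, in a throwaway repo (the file names, commit messages, and the "bug" marker are all made up for illustration):

```shell
# Toy history: five small commits, with a regression landing in the third.
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
for i in 1 2 3 4 5; do
  echo "change $i" >> app.txt
  if [ "$i" -ge 3 ]; then echo bug > marker; fi     # the regression lands here
  git add . && git commit -q -m "commit $i"
  if [ "$i" -eq 3 ]; then first_bad=$(git rev-parse HEAD); fi
done

# Small, self-contained commits let bisect pinpoint the culprit mechanically:
git bisect start HEAD HEAD~4 >/dev/null             # bad = tip, good = first commit
git bisect run sh -c '! grep -q bug marker 2>/dev/null' >/dev/null 2>&1
found=$(git rev-parse refs/bisect/bad)              # the sha of "commit 3"
git bisect reset >/dev/null 2>&1
echo "$found"
```

Backporting is the same idea in reverse: something like `git checkout -b fix-1.x v1.x && git cherry-pick <sha>` (branch and tag names hypothetical).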


>I know my text editor very well so that I can write efficiently. I know my compiler well enough to use it to build my programs in the various configurations I need.

Aren't these somewhat different details to know, though, compared to knowing the insides of git? Knowing the commands and options available in an editor is familiarity with an interface. You don't need to know exactly how it stores the buffer so it can stay responsive with large files, you don't need to know the rendering algorithms involved, etc. etc. Just the same, I don't think being able to use a compiler effectively depends on knowing how it does register allocation, or knowing if the parser is LALR or recursive descent.

These are all implementation details. Good interfaces generally abstract over them, and present the user with what they need to use it. Version control being necessary doesn't change how people should approach learning it. At least, no more than we should expect good drivers to understand piston engines. (Though, drivers of manual transmission cars should probably understand clutches. Maybe Git is comparable to that?)


I know it was just an example, but the rocket equation is just one assumption (constant exhaust velocity) and a simple integration away from the force balance coming out of Newton's second law.

All credit to Tsiolkovski though, the guy was a true visionary.


Oh, absolutely. But imagine the effect if almost every physics book and curriculum taught the rocket equation in chapter 1, and either didn't teach Newton's laws or the general equations of motion at all, or only taught them in the last chapter/appendix/advanced class.


Tutorials are specifically "learn to get something done, RIGHT NOW, working on the shoulders of giants".

Education from first principles is common in computer science, but tutorials aren't the place.

Analogous in physics would be, "I need to figure out exactly how much force is going to be applied through this climbing harness and pulley system lest I fall and die", and learning how to plug your variables into a standard set of equations without understanding anything but basic calculator operation.


All the replies are missing the fundamental difference between learning physics and learning git: every animal on the planet is intuitively schooled from birth—even before birth by their DNA—in the daily practice of mechanics.

Learning git is a comparatively abstract intellectual pursuit. You can't assume people know how to practice version control or even what its purpose and utility is, therefore learning this way is going to be very hit or miss depending on the preparedness of the audience.


But having decades of subjective perception of physics is very far from understanding the underlying laws found by scientists. It may even be counter-productive.


What does that have to do with my point?


Experience doesn't necessarily help understanding.


It gives you something to relate dry academic equations to though.


Very fair point. You have to throw both against each other and see what sticks.


So, maybe this is more useful?

Unlike more primitive version control systems, git repositories are not linear, they already support branching, and are thus best visualised as trees in their own right. Branches thus become trees of trees. To visualise this, it’s simplest to think of the state of your repository as a point in a high-dimensional ‘code-space’, in which branches are represented as n-dimensional membranes, mapping the spatial loci of successive commits onto the projected manifold of each cloned repository.

(Quoted from tartley.com)


It has to do with whether the audience already has a solid theoretic / symbolic / formal background. Do they already think in terms of models? For graduate-level physicists, this is almost surely the case.

Most git tutorials are for people who need to get up to speed quickly, having had git imposed on them from above, mapping surface area concepts from the prior version control system. Hard core engineers with an interest find their way to the theoretic center.


The reason is that programmers try to create new things, whereas physicists try to figure out how things work which are already there, waiting to be discovered (or they are really engineers).

Also, programmers create abstractions such that one does not need to know about the nitty gritty in order to understand how to use something.


I think this is generally because people want to start Doing Things right away with what they're learning, so most teaching attempts revolve around giving the student something they can Do and then explaining it afterward.


Okay Johnny, here is how we drive a car. First, put on your seat belt. Next, we discuss Newton I, II and III. We really should have done that right away, but they say, "safety first!"


This is disingenuous. As a child, you _did_ see slopes and springs and gears and ropes and pulleys before Newton, just not in physics classes but in everyday life. Physics classes start from first principles, maybe, but in terms of what you get exposure to, Newton's three laws show up pretty late in one's understanding of how things happen.


People knew about inclined planes and pulleys before Newton postulated I, II and III.


And you can learn git using only commits, without ever branching, merging, or rebasing. You don't need to understand git for that.


Regular pitch for http://www.sbf5.com/~cduan/technical/git/

> you can only really use Git if you understand how Git works. Merely memorizing which commands you should run at what times will work in the short run, but it’s only a matter of time before you get stuck or, worse, break something.

It's concise (the linked article is gigantic) and allows for an understanding of this overhyped, user-hostile DVCS. I know that git won, but don't expect me to be happy about it.


I found this tutorial (the one you recommend) to be, generally, terrible. It doesn't define terms, but rather just jumps into the git terminology without explaining things. For example the section on "Commit Objects" - what actually is a commit object? Is it a directory, is it a special file, is it .. something else? The term is used in an abstract manner without actually explaining what the term means - sure, it explains what is associated with the term, but why is it an object?

The same goes for the "Heads" explanation: it's a reference to a commit object. Is this a softlink, is it a field in a special file that contains a reference... what "is it", and why is the word "head" used?

Whereas in the tutorial referred to in this article, "Git From the Inside Out", the terms are actually defined before they're used in any significant fashion. "Commit Objects = After creating the tree graph, git commit creates a commit object. This is just another text file in .git/objects/:" (with example), and .. "Heads = Which is the current branch? To find out, Git goes to the HEAD file at .git/HEAD and finds: ref: refs/heads/master This says that HEAD is pointing at master. master is the current branch."

I just want to point this difference out, because it actually is endemic in all Git tutorials I've found - either you define the basic terms, such that an association can be made in the mind of the reader, allowing them to grasp the abstractions... or you don't. This appears to be a common problem with Git tutorials, in my opinion - the terms, or rather the taxonomy of the Git abstractions, are quite unclear. Why on earth the term "Head" is used to refer to a commit object is quite unintuitive... at first. Git really requires a deeper dive into the abstractions before surfacing for true understanding.
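For what it's worth, both definitions can be verified directly in a scratch repo; everything below is stock git plumbing (name/email are placeholders):

```shell
# Scratch repo to look at the raw pieces the tutorials describe.
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
echo hello > readme && git add readme && git commit -q -m first

cat .git/HEAD          # e.g. "ref: refs/heads/master" -- HEAD is a tiny text file
git cat-file -t HEAD   # "commit" -- a commit object is literally a typed object
git cat-file -p HEAD   # ...whose content is plain text: tree, author, message
```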


What alternative do you prefer?


bzr. The CLI is much nicer and more consistent. It has bound branches, which are really useful when you are doing something small (see bsder's comment above). You are not forced to use a staging area. There is no need to squash commits just to have a clean history; instead you commit what you have and limit the depth of the log on view. Also, bzr is really easy to extend in Python. And so on and so on.

The git command line interface is the exact opposite of user-friendly: some commands have options that fundamentally change the operation, so much so that it should be a different command. git reset removes changes from the index, leaving your files intact, and if you do not have anything staged then it does nothing. Now, git reset --hard simply throws away your changes. This does not end here: git reset 'HEAD^' will remove a commit. How mad is that - git reset operates on the staging area, but git reset something operates on commits??

git checkout filename is almost completely unpredictable as it might get the file from the index if it exists there or from HEAD if not. There is no indication at all what happened. And so forth.
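For the record, both behaviors are easy to reproduce; a minimal sketch in a throwaway repo:

```shell
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
echo one > f && git add f && git commit -q -m c1

echo two > f && git add f    # stage a change
git reset -q                 # unstages it; f still says "two"
git reset -q --hard          # now the edit itself is gone; f says "one" again

echo three > f && git add f  # index holds "three"...
echo four > f                # ...working tree holds "four"
git checkout -- f            # restores from the *index*, not HEAD: f says "three"
cat f
```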


Interesting, I've not ever seen a recommendation for bazaar over git or mercurial. I have heard that mercurial's CLI is a bit more sane than git, and I've also read that it's easier to extend in Python. How would you compare bazaar to mercurial?

I've actually only ever used git for professional and personal development, but I think I'll want to play around with both mercurial and bazaar now. Git's CLI is definitely a bit of a PITA.

I find this video summarises git's frustrations pretty well :) https://vimeo.com/60788996.


Bazaar is basically unmaintained at this point. If you look on Launchpad, the trunk branch hasn't seen a commit since December 2014 [0]. There hasn't been a release since August 2013 [1].

Mercurial is very actively developed, with dozens of patches a month [2]. They also keep to a planned rolling release cycle [3] along with deep commitments about backward and forward compatibility [4].

New features like the experimental evolve extension [5] are also really cool. I use it for day-to-day work and find it incredibly useful for editing work-in-progress commits into nice self-contained patches.

[0] https://code.launchpad.net/bzr/trunk

[1] https://launchpad.net/bzr/

[2] http://selenic.com/pipermail/mercurial-devel/2015-March/thre...

[3] http://mercurial.selenic.com/wiki/WhatsNew

[4] http://mercurial.selenic.com/wiki/CompatibilityRules

[5] http://mercurial.selenic.com/wiki/ChangesetEvolution


Yes. git won, bzr is dead. As I said: I am aware and I switched to git because everyone else did. Just don't expect me to be happy about it.


I'm trying to say if you want a git alternative that's not dying and isn't infuriating, check out mercurial.


I can't, the world (open source projects and work both) is using git.


I will be ready to admit I haven't used mercurial much. But ... how does mercurial help the "small" use case?


Mercurial.


Thanks for replying to a question not asked to you, with an answer that doesn't explain your position at all.


Don't ask questions in an open forum if you aren't willing to accept with grace answers to your question from others, use email instead. Besides, his answer explained his position perfectly.


No it didn't, it was just one word! It gave no value to the discussion at all!


See also "Git from the Bottom Up": https://jwiegley.github.io/git-from-the-bottom-up/

(originally a PDF in 2008)


Much better article IMO. Introducing the low level commands that the higher level ones wrap around is a much more fun and interactive way to understand the .git schema to me.


Wish `git` didn't have annoying special cases in it. For instance

   git rebase -i HEAD~2
won't work if there are only two commits, because HEAD~2 refers to a nonexistent commit after the first two.

There should be some friggin' NIL terminator there which takes the HEAD~2 reference.

Imagine having a function to, say, delete characters from a string which takes an open-ended range [from, to). Then imagine that the index to has to exist in the string; it must not point one element past the end! Oops, you cannot delete from a position to the end of the string.

The garbage-collected object graph is nice and "Lisp-like" in some ways, but silly in others.

Oh, and in case you're thinking "just make an empty initial commit, and it will effectively be your NIL terminator". No can do; git doesn't allow empty commits. Of course, you can make a file called ".nil" and add it and commit. Use "()" as the commit comment. :)
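For what it's worth, git does have an escape hatch for exactly this case: `git rebase -i --root` starts from before the first commit, so no HEAD~2 arithmetic is needed. A sketch (GIT_SEQUENCE_EDITOR=true just accepts the generated todo list so the example runs non-interactively):

```shell
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
echo a > f && git add f && git commit -q -m one
echo b >> f && git add f && git commit -q -m two

git rebase -i HEAD~2 2>&1 | head -n 1      # fails: HEAD~2 points past the root
GIT_SEQUENCE_EDITOR=true git rebase -i --root >/dev/null 2>&1   # covers all commits
git rev-list --count HEAD                  # still 2 commits, rebased from the root
```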


I feel like the lack of an initial empty commit is really a failure to match the intuitive graph model. Clearly there should be an arrow corresponding to "adding the first files". And that arrow needs somewhere to go from and to. Hence, `git init` should always start by creating an initial commit object referring to an empty tree.

A side bonus of this would be that since the initial commit is empty it has a fixed id. Suddenly, all git repositories have a common ancestor, and you can merge any two random projects together without losing history!


I echoed this exact idea in another comment in the thread, right down to that commit having a fixed ID. (I proposed the all-zero SHA (which is not really a SHA)).


git commit --allow-empty


Thanks; that is perfect. I just tried it, and it works on the root commit in a newly initialized repo.

I will always do this from now on:

  $ git init
  $ git commit --allow-empty -m NIL
Maybe the git maintainers can be persuaded to make a "git init --nil" which does this in one step.


Eh, just make an alias in ~/.gitconfig; no need to add more options to git.

That is:

    [alias]
      nilinit = !sh -c 'git init && git commit --allow-empty -m NIL'
And now git nilinit does what you want.


The problem is that what I want is for "git init --nil" to do that, not for some "git nilinit".

Actually, I want "git init" to do that by default!

Every git repo should begin with an empty commit.

In fact, there should be a special SHA built in representing the empty commit; perhaps an all-zero SHA. This object is understood not to occupy any space in the object store beyond its own SHA: you don't look for an all-zero SHA in the object store; it just stands for itself.

Then this same SHA can be used as a parent in any orphan branch.

One of the benefits is that this zero SHA is global. And so all git repos have a common ancestor automatically.

If you and I take some code tarball (the same one) and start git repos individually, then we implicitly have a common merge-base: the NIL commit. If I pull from your repo, then we have this:

          your-import <--- your-hacks
        /
     NIL
        \
          my-import <--- my-hacks
your-import and my-import are the same since we imported the same thing. I can just do a "git rebase your-hacks-branch" to transplant my-hacks over your-hacks, implicitly using NIL as the merge-base.
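git is actually halfway there already: the empty *tree* (not commit) has a fixed, repository-independent id, and a root commit built on it becomes reproducible across repos once author, committer, and dates are pinned. A sketch (the `nil` names and the date are arbitrary placeholders, just held fixed):

```shell
cd "$(mktemp -d)" && git init -q
tree=$(git mktree </dev/null)    # write the empty tree; its id is the same everywhere
echo "$tree"                     # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (SHA-1 repos)

# Pin every input and the root commit's id is deterministic across repos too:
export GIT_AUTHOR_NAME=nil GIT_AUTHOR_EMAIL=nil@nil GIT_AUTHOR_DATE='1112911993 +0200'
export GIT_COMMITTER_NAME=nil GIT_COMMITTER_EMAIL=nil@nil GIT_COMMITTER_DATE='1112911993 +0200'
git commit-tree "$tree" -m NIL   # prints the same commit id in any repo built this way
```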


git-subtree already allows for embedding one git repo in another.

https://github.com/git/git/blob/master/contrib/subtree/git-s...

I see the point but not sure I agree that we should treat all git repositories as having a common base commit. Not thought it through however.


I actually put a basic description of what I intend to do with the repo in the message of my empty root commit these days. Helpful for remembering if I lose momentum and come back to it later.

I'd actually love to start doing that with starting a new branch, but I use rebase -i too much and it really really dislikes empty commits in the set you're working on (it just throws them out).


For grokking git, an indispensable resource is the early git development mailing list and the corresponding code base (the first couple of months after the project started). Linus explained it in a very clear and precise way on the mailing list and in the related code. The initial code base is surprisingly small (around 1200 LOC of clear and precise C). The data structures used are simple and self-explanatory. Although most of the original code is not in the git code base anymore, the data structures and main design ideas have stayed intact so far.


I'm going to have to go through the archives later (anyone got an mbox?) but it's tricky to follow the early development. The archives seem to start at http://marc.info/?l=git&r=20&b=200504&w=2 and I don't see many design messages from Linus.


I actually found gitcore-tutorial(7) to be a really great resource when I was learning git.


Interesting article.

Also wasn't aware that Hacker School changed their name.

https://www.recurse.com/blog/77-hacker-school-is-now-the-rec...


Don't worry, you aren't very far behind the times. It was announced yesterday.


Huh, another git explanation.

Either the thing is so easy to understand that everyone can do it and is then compelled to write about it, or it's so difficult to understand that everyone feels the compulsion to explain it to everyone else.


Maybe a quip, but that dichotomy is so false it's not even funny. Git is tough to wrap one's head around at first. It's not intuitive unless one already has a deep background in this space. Hence, there are lots of attempts to explain it in order to bring more into the fold of intuition. It seems perfectly natural and good, and your contemptuous tone puzzles me.


Right. It's simple but it's unfamiliar.

Confusing command parameters aside.


> it's so difficult to understand that everyone feels the compulsion to explain it to everyone else.

All successful religions include proselytization in their basic tenets.

Ahem. Anyhow. Let me be more charitable. git is so hard to learn because it has an impedance mismatch with the primary user's use case.

git clearly works in the large--that's what Linus designed it for. The problem is that git forces you into that workflow immediately and has no intermediate steps. The problem is that most of us use version control to coordinate less than 10 people, and git forces way too much mental energy on top of something which is very simple in CVS or SVN or ... any other version control system, really (maybe arch had a worse mental model ... that's not a compliment).

Use Mercurial for a while, and then use git. You will find yourself saying things like: "Why should I need to even care about X?" and the answer is always "Well, if you had 100 committers and 40 branches ..."

Things like: "Why would I need to name a branch?" "Why wouldn't I just sync a repository completely?" "Why not just clone the whole repository?" etc.

The problem is that everybody is forced into "git in the large" in order to contribute to open source projects. The tutorial that needs to be written is "git in the small", but I'm not sure that the design of git actually allows that tutorial to be written.


Maybe it's because Git was my first VCS, but I find SVN extremely frustrating to use at work. I'm having trouble understanding the examples you gave, though. Is coming up with (presumably topic) branch names really that hard? Just name it "foo" or "woot" temporarily until you have the energy to come up with a more-friendly one. What do you mean by "Why wouldn't I just sync a repository completely?" Is this supposed to mean that the concept of private, local branches is confusing? And "Why not just clone the whole repository?" really baffles me, because that's exactly what cloning already does...


> Just name it "foo" or "woot" temporarily until you have the energy to come up with a more-friendly one.

But how do I merge your "foo" and my "woot"? Um ... why should I even have to care? Why aren't we just working on the same branch by default?

> Is this supposed to mean that the concept of private, local branches is confusing?

Yes. What happens when I type "git push"? Which of my named branches got transferred? Um ...

> "Why not just clone the whole repository?" really baffles me, because that's exactly what cloning already does...

So, is that before or after having been gc'd? Which branches did you get? etc.

If there are 3 people generating code artifacts in an office, one of which is an artist, I DON'T CARE about any of these things. Worse, they are wasted mental energy for people who don't like this kind of stuff (the artist, for example, likely doesn't enjoy graph theory). Worst, they are ways that you can corrupt your repository or blow your code away.

The fact that I have to think about graph theory to use a DVCS in daily use is unacceptable. It means I can't explain this to writers, artists, and other non-programmers who I would really like to have in source control, too.

I explain mercurial vs git like the difference between a high-end chainsaw and a logging chainsaw. A high-end chainsaw has things like safeties, a blade clutch, electric starters, comfortable handles, etc. It will go through 99% of the trees anybody will use it for. A logging chainsaw doesn't have lots of amenities, but the motor is bigger and it will blow through 100% of trees. It's also a hell of a lot heavier which means it's harder to control and more likely to seriously hurt you when something goes wrong.


> But how do I merge your "foo" and my "woot"?

`git checkout woot; git merge his/foo`?

> Um ... why should I even have to care?

Because it helps encourage separating work logically, and gives you quick way to reset your working tree to a known state should merging upstream changes go awry?

> Why aren't we just working on the same branch by default?

You are, you just create topic branches off the development one to focus a series of commits on some goal. When you're ready to share it, just merge it into the development branch and push it. If you and your colleague are working on a topic together, you do the same, just push to some common remote branch for that topic.

> What happens when I type "git push"?

In Git 2.0, it pushes the current branch iff it tracks a remote upstream one; otherwise it prints an error explaining how to set the upstream branch for the current branch.
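That Git 2.0 behavior corresponds to push.default=simple; you can set it explicitly, and wire up the upstream once with -u (the remote and branch names in the comment are placeholders):

```shell
cd "$(mktemp -d)" && git init -q
git config push.default simple   # the Git 2.0 default, spelled out
git config push.default          # prints: simple
# First push of a new branch sets up tracking; afterwards bare 'git push' suffices:
#   git push -u origin my-topic
```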

> So, is that before or after having been gc'd?

GCing only removes garbage. Since you're creating branches to keep track of your topics, this shouldn't ever be a problem. If you had no references to a commit, it must not have been important!

> Which branches did you get? etc.

As far as I know, cloning gets every branch from a remote; not all branches will automatically have tracking branches, but making those is easy enough.

> Worst, they are ways that you can corrupt your repository or blow your code away.

Do you mean making incompatible commits via rebasing by "corruption," or are you referring to actual repository data loss due to bugs? I'm unaware of the latter. If you consistently make temporary private branches, it should be impossible to lose work.

Is graph theory really that complicated? I always thought it was one of the more intuitive compsci topics, since it's very visual. Maybe that's my bias, though.

I try Mercurial every once in a while, so I'm not basing my opinion only on old versions (in fact, I just played around with it again today.) I always feel constrained--like it's trying to prevent me from doing what I want to do. A lot of it is familiarity, I'm sure, and differences in nomenclature (though, I'll say, "checkout" makes a lot more sense than "update" to me--I would assume that "update" would be analogous to "fetch").


> > So, is that before or after having been gc'd?

> GCing only removes garbage. Since you're creating branches to keep track of your topics, this shouldn't ever be a problem. If you had no references to a commit, it must not have been important!

Coming from Mercurial to Git, this has been the single hardest thing to wrap my head around (so far). Why should I have to tell my version control system twice not to throw away my work? Committing a change should be sufficient to let the VCS know that I want to save it. That's what that word means.


Actually, I thought of an analogy that helps me reason about this.

If we think of the VCS like a text editor, then a branch is like a file that you can write changes to. However, if you're working on an unnamed file, then the editor (or its designers) has to make a choice about what to do with that unnamed document when you're done.

Git treats an unnamed branch like the scratch buffer in Emacs. You can make all the changes you want, but once you close it, it's gone (in this case, eligible for GC). This seems reasonable, because if you cared about it, you would have given it a name.

Mercurial's designers work off a different assumption: all the work that you do is important (and it provides other mechanisms to allow rough work). So if you start working in an unnamed branch, that's no problem. You can have as many unnamed branches as you like, but it's up to you to remember your way around. Going back to the filesystem analogy, it becomes like navigating between your unnamed text files by their address on the disk, rather than a nice human readable name.

Does this analogy make sense to anybody else?


> actual repository data loss due to bugs?

Not bugs. Actual repository loss due to commands functioning as intended. "git push" tends to be the culprit. I have seen more people blast repositories with it than you would believe.

> Is graph theory really that complicated? I always thought it was one of the more intuitive compsci topics, since it's very visual. Maybe that's my bias, though.

Repeat after me: "Not everybody is a programmer."

Okay? Really. I want my artists, composers, writers, etc. to put their stuff in version control. They can't do that if a third or fourth year course in CS is a prerequisite.

And that's my biggest counterpoint. I can and have taught people who have basic computer literacy how to use CVS, svn, hg, and even bzr. git is completely opaque to those same people.


"git push" does not cause data loss. Whoever told you it did was incorrect.

Also, a "third or fourth year course in CS" is not a prerequisite for understanding git. I am proof of that :-)


Dude, this is NOT theory. I watched it happen. Multiple times in multiple companies.

Sure, maybe the data was actually there and a git god could have unwound whatever brain damaged state "git push" stuck the repo in. Maybe. However, it's not like everybody didn't try. The only thing that saved them was that I take snapshots of git repositories via rsync every hour because I know this is going to happen. Not might--will.

I have never had this happen with Mercurial. Ever. I don't even think about it. I wouldn't dream of rsync'ing a Mercurial repository because I've never put a repo in a state that I can't find the data.

Everybody apologizes for git. I don't have to apologize for Mercurial.


The only way I know of that "push" can cause problems is with "--force"... so just tell people not to do that! If you attempt to push a non-fast-forwardable commit to a remote branch, Git will complain and stop with a message like

     ! [rejected]        master -> master (non-fast-forward)
    error: failed to push some refs to 'https://github.com/USERNAME/REPOSITORY.git'
    To prevent you from losing history, non-fast-forward updates were rejected
    Merge the remote changes (e.g. 'git pull') before pushing again.  See the
    'Note about fast-forwards' section of 'git push --help' for details.


Sorry for the delay in replying, I was away.

> Dude, this is NOT theory. I watched it happen.

No, you thought you saw it happen. "git push" does not remove data, it can only add new commits (and their contents) and update refs (branch names and tags) so you were mistaken in what you saw.

> Sure, maybe the data was actually there

Yup, exactly. No data was lost. It was actually there.

> and a git god could have unwound whatever brain damaged state "git push" stuck the repo in

I like the "git god" moniker. Thanks! But it's not really that complicated. What probably happened was that "git push" added a new line of commits, and updated your branch name (let's say it was "master") to point at these new ones. The old ones were still there though, they just weren't an ancestor of your current "master" so you couldn't see them immediately with "git log".

If you wanted to get your repository back to its previous state, you just needed to look with "git reflog" and see what commit master used to point to, and then set it back to that. That's the only change you needed to make. The repository wasn't broken in any way, it just had a few more commits in it.
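For the curious, that recovery can be sketched in a throwaway repo (the setup and commit messages here are illustrative, not anyone's actual repo):

```shell
# Sketch: a "destroyed" branch tip is still in the reflog and can be restored.
set -e
cd "$(mktemp -d)"
git init -q
g() { git -c user.email=a@b -c user.name=a "$@"; }  # helper: commit with a fixed identity
g commit -q --allow-empty -m "first"
g commit -q --allow-empty -m "second"
old=$(git rev-parse HEAD)
# Simulate a destructive update: move the branch back one commit.
git reset -q --hard HEAD~1
# The old tip no longer shows in "git log", but the reflog remembers it...
git reset -q --hard "HEAD@{1}"
# ...so the branch is back where it was. Nothing was ever deleted.
test "$(git rev-parse HEAD)" = "$old" && echo "restored"
```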

Now, you might say that "git push" shouldn't make it so easy to do this: and you'd be right, so it doesn't. You need to have used "--force" to get it to behave in this way. Really, if this is happening regularly, you clearly have a broken workflow. Perhaps introducing some code review in there would help? Gerrit maybe?


I am pretty sure that your conclusion is wrong. You can push 'wrong' things if you want to, e.g. just stamp over other people's changes because you didn't feel like merging conflicts properly, but that is a user issue, not a tool issue. The repo is in a correct state as per your actions; if you don't understand the implications of what you are doing, that is a different problem.


Its concepts are easy to understand and git is very powerful, but the interface is horrible and a bitch to learn. Git does not need another tutorial, it needs an alternative porcelain.


git has been out how long now? How many people bitch about its bad user interface? And yet nobody has stepped up to write something better?

Perhaps this is pointing to the fact that there is something fundamental about git that prevents such a thing from happening?


Prevents? There's nothing like that. The only thing about git that hampers comprehensive porcelain alternatives is the lack of orthogonality in git's plumbing.

People spent years bitching about CVS before a few Apache-related people got together and wrote Subversion. The long time it took didn't point to anything fundamental about CVS preventing anything.


Or, alternatively, git's building blocks are elegant, powerful, and interesting and writing about them makes for natural blog posts/guides/tutorials.


>"This essay explains how Git works. It assumes you understand Git well enough to use it to version control your projects."

so...the opposite of Bjarne Stroustrup's maligned "The C++ Programming Language", which fails to explain how C++ works, after assuming you don't know it. :)

seriously though no need for the second sentence. this article is a great intro!


Footnote 3: git prune deletes all objects that cannot be reached from a ref. If the user runs this command, they may lose content.

In what cases would a user lose content? When something is added but not committed only?


When you've committed something, but then rebased or reset the branch position so the commit is no longer in the history of any branch or tag. This usually isn't a problem, because when you rebase work you are making a copy of the commit, so the copies still reference the same underlying data.

I also think it's worth noting that `git gc`, which is occasionally triggered automatically, actually runs `git prune`.
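To make the footnote concrete, here's a sketch in a throwaway repo of a commit becoming unreachable and then being deleted by `git prune`. Note the reflog normally still protects such a commit, so the sketch has to expire the reflog first; the commit messages are made up:

```shell
# Sketch: an unreachable commit survives a reset, but not a reflog expiry + prune.
set -e
cd "$(mktemp -d)"
git init -q
g() { git -c user.email=a@b -c user.name=a "$@"; }  # helper: commit with a fixed identity
g commit -q --allow-empty -m "keep"
g commit -q --allow-empty -m "doomed"
doomed=$(git rev-parse HEAD)
git reset -q --hard HEAD~1             # "doomed" is now unreachable from any ref
git cat-file -e "$doomed" && echo "still in the object store"
# The reflog still references it, so expire that before pruning.
git reflog expire --expire=now --all
git prune
if git cat-file -e "$doomed" 2>/dev/null; then
  echo "still there"
else
  echo "gone after prune"
fi
```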


What tool did the poster use to create the diagrams?


OmniGraffle. I really enjoy using it.


A great title would have been: The Guts of Git.



I regularly read articles about Git's inner workings and I always seem to forget them. :-(


Yet another attempt to explain the incomprehensible.

Why does such a popular version control system find itself in need of so many explanations?

Any startup attempting to market something that required a user to understand concepts such as this...

https://codewords.recurse.com/images/two/git-from-the-inside...

...would be laughed out of the room in any other context.


A version control system with the feature set git has is necessarily complicated, this is not a bad thing. Forcing the complexity on the user is a bad thing, but git does not do this except when it is necessary to. Using git at a basic level is not hard.

This essay is not attempting to explain how to use git, it is explaining how git itself works and how changes are physically tracked. There's no need for a basic user to know this information, but if someone wants to dig into git and understand how it works, this essay is a nice guide. Explaining how SVN or any other version control system works at this level of detail would also be complicated.


Git isn't exactly a startup (or a company, or even a product), nor did it ever intend to be.

Lots of great software happens to be difficult for a lot of people to intuitively understand.


> "Lots of great software happens to be difficult for a lot of people to intuitively understand."

For real. Maybe it is just me, but I find programs like Photoshop and nearly every CAD program I've ever encountered to be bewilderingly complicated. I don't use any of those sorts of software professionally, but have found myself needing them numerous times for hobby reasons. Every time I try to learn them I become frustrated with just how steep and tall the learning curves are.

Git though? I felt pretty confident with how it worked and basic command line operation after just a weekend.

Maybe git's command line is more inconsistent than hg's or subversion's, but in the grand scheme of software difficulty? I just don't get the complaints. "Incomprehensible"? Give me a break. It does not hold a candle to most commercial professional software.


You are not required to understand this concept. But if you're working on a large team and your branch history gets this complicated, you can still effectively use git to manage it. Try that with less-popular version control systems. (Mercurial comes to mind, actually.)


What is it that you can do in git, that you can't do in Mercurial?


I believe one piece still missing is the full functionality provided by the index in Git. There is the record extension, but IIRC that doesn't emulate one of the index's greatest features: during a merge/rebase, non-conflicting changes are already staged, so "git diff" shows only conflicts (and "git diff --cached" shows those changes that merged successfully).
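A quick way to see the behavior you describe, in a throwaway repo (file and branch names here are made up for illustration):

```shell
# Sketch: during a conflicted merge, "git diff" surfaces the conflicted path,
# while "git diff --cached" shows the cleanly merged change that is already staged.
set -e
cd "$(mktemp -d)"
git init -q
g() { git -c user.email=a@b -c user.name=a "$@"; }  # helper: commit with a fixed identity
echo base > conflicting; echo base > clean
g add .
g commit -q -m "base"
g checkout -q -b side
echo side > conflicting; echo side > clean
g commit -qam "side"
g checkout -q -                        # back to the original branch
echo main > conflicting
g commit -qam "main"
g merge side >/dev/null 2>&1 || true   # "conflicting" conflicts; "clean" merges and is staged
git diff --name-only                   # the conflicted path shows up here
git diff --cached --name-only          # the cleanly merged, already-staged path shows up here
```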


Mercurial does this, and you only have to manually resolve conflicting changes during a merge/rebase. See here

http://mercurial.selenic.com/wiki/MergeToolConfiguration

I think you might have tried this on an instance of Mercurial where the "premerge" option of the merge tool has been turned off for some reason.


Git won because of Github, it's an unpopular opinion but I stand by it. If Github did not exist, Git wouldn't have been adopted by so many projects.


> Git won because of Github, it's an unpopular opinion but I stand by it.

Quite often you will find the exact opposite statement: Github succeeded because it rode on the success of Git.

I'm not saying your position is wrong, but they cannot both be right.


Sometimes technical things are complicated, because people don't work or communicate in tidy ways.

You could make similar complaints about many of the foundations that make technology possible. Take for example a protocol that enabled you to read this message: http://en.wikipedia.org/wiki/Transmission_Control_Protocol#/...


So by your logic, we shouldn't have helicopters? Using them arguably requires a lot more understanding of complicated things than git does.



