A suggestion for anyone hoping to write accessible tutorials:
> Hg folks should read this section carefully. Among various crazy notions Hg has is one that encodes the branch name within the commit object in some way. Unfortunately, Hg's vaunted "ease of use" (a.k.a "we support Windows better than git", which in an ideal world would be a negative, but in this world sadly it is not) has caused enormous takeup, and dozens of otherwise excellent developers have been brain-washed into thinking that is the only/right way.
If this is one of the most important concepts, starting it off with a slew of negativity about things the reader may be currently using (windows, hg) is probably not the best way to get them to keep reading.
There's also a bit of "people in glass houses..." to this comment. I don't think git really wants to start a fight when it comes to poor design decisions.
I'm a huge fan of git (but became one the hard way) and I think that most of the poor design decisions are in the command layer: largely the tendency for the same command to do several things that, to the user, are wildly different (often because they map to the same fundamental operation on the DAG).
As the simplest example, git add both adds a new file to the index and adds changes to an existing file to the index. They are both fundamentally the same operation, but your intentions are different.
Checkout, likewise, both changes HEAD's symbolic reference and changes the working copy to match the index or a commit. This makes it both a means of full tree changes and local reversion.
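To make the overloading concrete, here is a small sketch in a throwaway repo (the file name, branch name, and commit message are made up for illustration):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "first" > a.txt
git add a.txt                 # add: stage a brand-new file
git -c user.name=u -c user.email=u@e commit -qm "add a.txt"

echo "second" >> a.txt
git add a.txt                 # add: stage an edit to an already-tracked file

git checkout -q -b topic      # checkout: move HEAD to a new branch
echo "oops" >> a.txt
git checkout -- a.txt         # checkout: revert the file to what the index has
```

Same two command names, four user-facing intentions.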
And then there's rebase...
I don't think these are huge issues, but I do think they're barriers and a source of common early issues with git for new users.
This is exactly what I was getting at. git's command layer is much like the "new" keyword in Javascript -- it's a false abstraction of a relatively elegant underlying system [1]. As it is, the command layer encourages a naive mental model that is both inaccurate and leaky. You end up having to learn both the abstraction layer and the actual system underneath in order to actually understand what any of the commands do.
[1] I assume this was originally done to make git seem more approachable to SVN or CVS users.
I find that SVN users have the most trouble with git, so I don't think that's true. Mercurial has a similar underlying model but has a porcelain layer that svn users find far more comfortable.
The branch switching behaviour definitely seems taken from cvs, but somehow works much much better than it ever did in cvs.
I assume you're referring to the use of the different flags?
For the latter two, no disagreement there. For the former two, there is a difference between returning a verbose response within the context queried and expanding the inclusive parameters of the query. 'git remote -a' makes no sense as all remotes are equivalent (there is no such thing as a local remote).
If you're curious, "git remote", "git branch", and "git tag" all present a list without supplying those options. You only need "git branch -a" if you want to see remote branches.
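A quick sketch of those defaults, in a throwaway repo (the tag name is made up; output is what each command lists):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=u -c user.email=u@e commit -q --allow-empty -m "init"
git tag v0.1

git branch       # lists local branches by default
git branch -a    # -a additionally includes remote-tracking branches
git tag          # lists tags, no flag needed
git remote       # lists remotes, no flag needed (empty here)
```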
"git stash" is certainly the outlier; the reason is that when "git stash" was first added it was meant to be a short-and-sweet command. Otherwise, you'd be forced to type "git stash save <name>", which is not so short and sweet. "git stash list" was added later.
I agree the CLI can be a pain, but it's slowly improving. It's in a better state now than it was a few years ago. For example
git push origin :topic-branch
can now be replaced with
git push origin --delete topic-branch
This still has issues, there's now multiple ways to do the same thing. There's also different commands (git push vs git branch) for the same action, deleting a branch (regardless if it is remote or local).
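A sketch of that redundancy, using a throwaway local bare repo as a stand-in for "origin" (the branch name is made up):

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/clone"
cd "$work/clone"
git -c user.name=u -c user.email=u@e commit -q --allow-empty -m "init"
git push -q origin HEAD

git checkout -q -b topic-branch
git push -q origin topic-branch           # publish the branch

git checkout -q -                         # back to the original branch
git branch -d topic-branch                # delete the LOCAL branch
git push -q origin --delete topic-branch  # delete the REMOTE branch
# (older equivalent of the last line: git push origin :topic-branch)
```

Conceptually one action ("delete this branch"), spelled two different ways depending on where the branch lives.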
Not sure what ender7 refers to. As far as I know, there are a few things in git that its maintainers think are missing - e.g.,
Linus mentioned that having a "generation" in a commit, which is 0 for the empty commit and 1+max(generation of ancestors), would have sped up some merge operations.
Finding the history of a single file requires traversing the entire commit tree.
But there is no "fundamental" problem or bad design choice - and in fact, this data can easily be cached without adding it into the protocol.
EDIT: saw some complaints listed in this thread - they all have to do with UI, not with a design choice that is limiting use. Wrappers like "eg" and "legit", and macros, can be used to fix the UI (except no one agrees on what a better UI looks like).
"saw some complaints listed in this thread - they all have to do with UI, not with a design choice that is limiting use."
For many people, UI issues carry way more weight as "design choices that limit use" than issues such as speed, efficient disk usage, data normalization, etc.
"Wrappers like "eg" and "legit", and macros, can be used to fix the UI"
I disagree. There may be some that _could_ be used, if 'the internet' weren't filled with helpful comments using the 'real' UI. As it stands now, there is no replacement that has sufficient market share to give users a good chance of finding answers to questions they may have. It is a bit like the early days of Windows, where "how do I..." questions got answered by "that's easy. Exit to DOS..." (Linux on the desktop has a bit of a similar problem, but I think it is slowly outgrowing it)
"except no one agrees on what a better UI looks like".
That may be an indication that it really is hard (maybe even impossible) to find a good alternative UI that does not leave out git features.
The commands, my god there are numerous commands that have no rhyme or reason.
Want to checkout a branch?
git checkout <name>
Want to make a branch?
git checkout -b <name>
Want to reset a file?
git checkout -- <file_name>
Want to make a sandwich?
git checkout ++D@A#32
Want to rebuild universe from Big Bang?
git checkout 0 --rebuild-universe
Add to this, duplicated commands, arcane commands (hello fsck --lost-found), commands that seem deceptively similar but aren't, doesn't work out of the box like hg, doesn't work on Windows etc.
I love git, but its CLI is horribad. And that is after most of weird/confusing stuff was removed...
Git, like Linux is a power tool. It's like those drills that can drill through solid rock easy, but if they jam, they'll spin you around.
There is an inverse relationship between the number of articles titled "x explained simply" and the actual simplicity of x. I honestly don't understand why the developer community refuses to admit the obvious: that git is an unholy clusterfuck of a product. It has a nice data structure inside it? Name another end-user product for which you are even vaguely aware of what data structures were used.
> Name another end-user product for which you are even vaguely aware of what data structures were used.
Unix.
You are operating primarily on a tree of files and streams of text. To operate on these you have a wide array of utilities that perform simple tasks (and a handful that perform complex tasks as well) that, when composed, allow you to perform any transformation you want. You can get a freshman CS student off the ground with Unix like systems in what, one lecture?
Because the (bleedingly simple) data model is the focus when learning Unix, you don't need to memorize every single little edge case of the system. Knowledge of the data model alone is enough to tell you what sort of things can or cannot be done, and general knowledge of the sort of thing that a few utilities do is enough to bootstrap yourself. If you want to list out some files in some particular way, you may not know immediately what exactly to type, but you probably do know that ls or find is a decent place to start looking.
Technically there isn't a single version of Git either, though (thankfully) they can all operate on the same repositories (whereas in UNIX land, you'll find file systems that some systems support and others don't). They do however have some different capabilities. jGit for instance can push to S3, which is pretty neat.
Linus just created a kernel that can be used with a Unix-like system. The difference here isn't academic, what I am talking about above was created before Linus was born. (mildly interestingly, he apparently missed Unix Epoch by only a few days)
After looking at Git I wonder how Linus could ever constrain himself to POSIX. How come Linux system calls don't have ten optional parameters each, some of them actually mandatory, some changing the meaning of the whole call?
Why go with boring open, creat, read, write when you can have rerere and prune and annex and reflog?
Not sure what you're getting at here, git developers got to choose their own names for git commands because there was not an existing standard that they were trying to implement.
Do you really think that 'prune' is a worse name than, say, 'fcntl'?
Word. I have a theory that while git has completely taken over the software industry and most users can "get shit done" with it, very few of them actually understand what the fuck is going on. All those "simple explanations" are really nice theoretical pieces about the merits and design of git itself and most of the time tend to completely ignore (or dodge) the clusterfuck that every day git can be, especially for people starting up with it.
I'm talking about the botched merges, the hour-long rebases with 156 git rebase --skip, the "your branches have diverged" mysteries, the subtle differences between fetch and pull, the fact that all the GUIs I've seen so far, far from being tools, are actually complicating the task with their own little syntax.
If the underlying data structure is so beautiful, then how come there's no UI where I can simply drag branches around, have an actual "OOPS, MISTAKE, LET ME UNDO" button, reorder my commits with the mouse, something that holds my hand and actually cares about ALL git users, not the 1% connoisseur elite?
Don't get me wrong, I have zero doubt that git is in fact an absolutely great tool with a very intelligent design and that the users are to blame for not understanding it, I'm just thinking that if after more than 5 years of git-as-a-dominant-dvcs I keep seeing the same puzzled faces looking at a series of SHA-1 like it's the answer to the universe over and over again, there's something that's not quite right.
That being said, this is indeed one of the most comprehensive articles I've ever read about git.
While I agree with some of your criticisms, as the resident 'Git guy', I've found no difficulty in explaining to people that pull is a convenience alias for 'fetch followed by merge'.
True, but to understand that you also need to understand what fetch and merge do respectively. Not so easy for the layperson/beginner. You're correct that this one is a bit of dishonesty on my part though :)
I think part of the problem is that git, like other version control systems, does not enforce any particular way of working (workflow).
When authors try to explain how to use git, they often have a very particular workflow in mind and don't necessarily describe exactly what that workflow is. This can cause major problems when somebody tries to apply advice to their own workflow.
I like the graphical approach. It helps to conceptualize what is going on so that someone can apply it to their own situation.
That's not it. Following any reasonable workflow still requires a set of arcane commands and flags that only make the slightest bit of sense if you know how git is implemented. I was able to use SVN successfully without ever knowing a thing about the implementation. I've generally had the same experience with HG and even CVS and VSS back in the day. Git adds sophistication over a product like SVN, but adds vastly more complexity.
It's that its porcelain is a clusterfuck. The staging area is a hack that greatly convolutes the UI, and it would have been better if it were left out and you just had a way to cherry-pick what you want to commit at commit time (if it wasn't everything).
There's no symmetry in commands. The opposite of "git commit" is "git reset --soft HEAD^", not "git uncommit". "git reset" is three commands in one. I could go on and on.
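The asymmetry in practice, sketched in a throwaway repo ("git uncommit" doesn't exist; this is what you have to type instead):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
g() { git -c user.name=u -c user.email=u@e "$@"; }   # helper for identity

echo one >  f.txt; git add f.txt; g commit -qm "first"
echo two >> f.txt; git add f.txt; g commit -qm "second"

git reset --soft HEAD^   # the "uncommit": rewind the branch one commit,
                         # leaving the change staged in the index
```

Nothing in that last line's spelling suggests "undo my last commit" until you've learned what the three modes of reset do.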
git was designed as a data model and then a series of slapdash commands that enabled manipulation of that data model. It wasn't designed from the end-user perspective backwards, and this really, really shows.
I love the staging area, and I miss it every time I have to use SVN at work.
I actually use the GIT-SVN bridge to work with git locally, pushing my changes up to SVN when I've resolved a topic. I do this in part because of the staging area. I have accidentally included changes in SVN commits so many times that I try to avoid SVN altogether.
I agree though, so many of the commands are just plain painful, the reset command(s) is a perfect example.
You have it right that it was designed as a data model. Linus has said a few times that it wasn't originally intended to be the Source Control Management tool itself, but more like a kit for building an SCM. Sadly, it took off so quickly simply because Linus built it and was adopted as the end user solution.
I have really mixed feelings about the index, personally. It is nice to be able to incrementally build up a change, but I do think that it's a thing that complicates learning for new users.
I'm curious, as someone who actually loves it, what if there were just better tools for incrementally altering the last commit instead? I'm not sure that a more robust git commit --amend couldn't achieve the same goals as the index with less conceptual overhead.
You're basically railing against orthogonality. I don't think anybody disagrees that some flags could be made more consistent, but creating a separate git-uncommit when that functionality is fundamentally encompassed by the purpose of the third invocation of git-reset is a mistake.
If you make a specific case for 'git reset --hard/--soft HEAD^' then you would either be left without the more general (and more useful) 'git reset --hard/--soft <commit>', or you would have a situation where you have to use 'uncommit' to (for example) re-commit something, or you would be left with a git-uncommit command and still have git-reset for related operations. Why would any of that be better?
You could split the third invocation of git-reset off into its own command, but honestly I don't see the utility in doing that.
Disagree. A git-uncommit command tells you by its name what it's going to do, even if you know nothing about git or even if you don't know what a commit is, you can assume that git-uncommit will undo it.
git reset --hard tells you absolutely nothing obvious about what it's going to do unless you understand git and its specific incantations. You have to learn git commands, rather than intuit them.
-p, --patch
Interactively choose hunks of patch between the index and the work
tree and add them to the index. This gives the user a chance to
review the difference before adding modified contents to the index.
This effectively runs add --interactive, but bypasses the initial
command menu and directly jumps to the patch subcommand. See
“Interactive mode” for details.
I might not [EDIT: in fact, was not: see reply below] be appreciating the way that the staging area convolutes the UI for you, but you can pretend it doesn't exist by always just doing commit -a. And, if you want to "cherry-pick what you want to commit at commit time", well, that's what the staging area is for, isn't it?
Pretending it doesn't exist won't unconvolute the UI. I'm talking about things like "git checkout -- <file>" and "git reset", which could just be one command if they didn't have to manage getting things in and out of the staging area.
I much prefer bzr's porcelain, it's much better designed and more sane.
I had to use svn in college for some projects and we frequently ran into terrible merge problems that we simply couldn't figure out because we didn't know what was actually going on behind the scenes.
git is more complex on the surface, but I really find it to be so much simpler when you end up in real nontrivial use cases, especially when something goes wrong. I would agree that the commands are sometimes too memorization-intensive, but I've never been in a situation with git, even as a beginner, that I couldn't figure out and resolve relatively easily, especially with all the online resources available. I can't say the same for svn. Maybe I was just an idiot when using svn, but if I was an idiot there, I don't see why I wouldn't be an idiot with git as well unless there was just something fundamentally more useable and flexible and understandable about git.
The underlying data structures do matter, even with mercurial. One workflow I immensely appreciate with Git is locally committing some series of messy changesets and commit messages and then afterwards, when I have finished the feature, tidying everything up by rewriting the history (git rebase -i) before I push my commits.
When I tried the same approach with mercurial I found that rewriting the history is not reasonably supported by mercurial. They use these append-only data structures that sound nice, because they assure stability (no already written data is ever in danger because files are only appended) and fit with the general concept of commit->pull->merge. But when you want to change the commit history you clash with the append-only concept. The Histedit extension which is meant to provide this feature has warnings all over the place that what I do is dangerous and might result in data loss with changeset backups written to locations in the working tree. It's horrible compared to Git.
Now don't misunderstand me, Git has a lot of problems UI wise (for example I still don't really grasp what is going on with detached heads). But I find the fundamental design choices more sound than with any other VCS that I have tried.
> Ignore them — you do know, what are you doing, right?
No, I don't :). I heavily rely on my VCS to never lose any data and to be able to go back to a previous state whenever I need to. This works great with Git's reflog.
> That's not true. Backups are written to .hg/strip-backup directory, which isn't tracked.
I stand corrected then. Maybe I'm mixing that up with amend or revert? I last used mercurial 9 months ago, but believe to remember there was some command that left .orig files lying around and that if you applied a history rewrite command several times those backups were overwritten with the new backups and you weren't able to come back all the way.
> there was some command that left .orig files lying around
Only way .orig files may pop up is failure to replay rebased commit(s). No way these are backups — their purpose is to make user able to fix things and continue.
I used SVN for years without understanding it and that caused me to get weird merge errors I never managed to figure out. Git on the other hand I understood after a few weeks of using it.
Since I failed to understand SVN after years of use I would say it is more complex.
That is where I am at. Being able to use something for years while never actually understanding it is not a desirable position to be in, and really does not equate to any meaningful sense of simplicity. It particularly is not something that developers should strive for in a system built for developers.
I'm curious: did you use branches and labels in SVN? I've come across many svn repositories that don't use the trunk/branches/tags layout, and as a result the developers keep completely separate repositories for slightly different versions of their projects instead of creating branches. I've even seen new repositories created for each release version of the project.
If you're using svn you need to understand the implementation to get why copies are cheap, so that you can understand how to use branching and tagging appropriately.
The problem is that with SVN, you can get in gradually. To start, the branchless approach still gets you a great service - obviously an incomplete one, but better than not having source control at all. And you can have leaders who are the only people who have to think about branching and merging and have people very gradually move into that role.
Git seems to force you to dive straight into the deep-end, since everything is a branch, even your own local working folder.
Branches are fundamentally little more than text files in .git/refs/heads that contain the sha of a commit object. Don't let the idea of a branch frighten you, there isn't much complexity behind the idea.
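You can see this for yourself in a throwaway repo (this assumes git's default loose-ref "files" backend; the branch name is made up):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=u -c user.email=u@e commit -q --allow-empty -m "init"
git branch demo

cat .git/refs/heads/demo   # the branch: one small file containing one sha
git rev-parse HEAD         # ...which is just the sha of the current commit
```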
That argument doesn't really make sense. If x were fundamentally simple, why would I need a simple explanation for it? Clearly git is, or can be, complex, and a simple, simpler or simplified explication would be helpful.
None of that implies that "git is [an] unholy clusterfuck of a product," it means that git is complicated. For 99% of the work you do with git, it isn't even that complicated and you don't need to be aware of the data structure. As for the last 1%, well, that's what separates git from other (D)VCSs. Git gives me the power to do a lot, and incidentally, it gives me the power to shoot my foot off too...I still prefer it over say svn or hg, my personal opinion.
It is simple because a simple explanation is possible. Simple doesn't mean "intuitive to proverbial grandmother".
Git is clever, moderately novel and therefore unfamiliar (depending on your background), and simple. There are not many concepts present, and the concepts that are there are not difficult to understand, but those concepts need brief introduction because they are concepts that many will be unfamiliar with.
If you buy a checkers board it will come with a (very simple) rulebook. You aren't born with some sort of natural checkers ability, you have to learn it. Nobody would claim that checkers isn't simple though.
That's not the strongest argument. If you want to get anything done at a reasonable level, checkers is hard (American checkers is easier than international checkers, but neither is really simple).
Git may well be similar: relatively simple rules, yet hard to use proficiently.
Checkers is a very easy game, but with those incredibly simple rules you can get complex behavior. For an even more extreme example, you can look at Go. Git is similar, except there is no competition/competitor there to befuddle you. From the simple components/rules you've got, you can perform incredibly complex operations that are infeasible with lesser VCSs.
I think in many ways it comes down to the way people think about and use the product. For me svn and perforce are way more of unholy messes than git is and I find them considerably more frustrating to use.
Gitolite (where this is hosted) is actually pretty cool too. For those who don't know what gitolite is, it is software that works in tandem with git-daemon, basically allowing you to run a centralized git server with access rules.
Moreover, if you integrate gitolite with redmine (easy to do with a plugin [1]), you get a great corporate-level, self-hosted, easy-to-use team management tool for programmers and the like.
There's also gitlab, which is a github style app you can run locally. It used to use gitolite internally for access control, but it is now using its own access control system.
I've used both. Gitolite does everything on the command-line, including access management.
Gitlab is essentially an open-source clone of Github's web UI. Of the two projects, I think Gitlab is harder to deploy and far more resource-intensive on the server, but easier for users.
I really did not expect that. That's one of the best unexpected funny things I've experienced this week. I don't think I would have found it as funny if I had stumbled on it randomly; the discussions here provided the perfect context for this link.
Great article. Just one rather glaring omission: only a single mention of the index, and that only in passing. I have used git for years, and I still don't understand what the fleeping index is supposed to be for. What can you do with the index that you can't do with a branch? And why is it called the index? (And why is it git add -a but git commit -A? Or maybe it's the other way around?)
Not to be cruel, but I don't understand how you've used Git for years without understanding what the index is. There's more than a few learning resources for Git online to satisfy your curiosity. Anyway, to give an abbreviated explanation:
The index is a staging area for your commits. When you use `git add`, changes in the working directory are staged (prepared) for the next commit. If you pass the `-a` flag to `git commit`, Git will stage all changes to files that it is already aware of. (Recall that new files are untracked and must be manually added to the index the first time they're committed; `-a` won't add those files because Git doesn't already know about them.)
Why have a staging area instead of just creating a commit directly from all the changes in the working directory? It's basically a sanity measure for organizing commits if you're ever anything less than a perfect developer. If you make a bunch of changes and later realize that there's more than one "unit of work" represented in those changes (however you choose to define those units), you can selectively add files to the index to create commits that make sense. You can even use the interactive mode of `git add` to selectively stage changed sections within a single file. If you care about the benefits of sensible commits -- bug hunting with bisection, ability to run `git revert` to undo a logical unit of work -- then the index is your friend.
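A minimal sketch of the staging area as a commit-organizing tool, in a throwaway repo (file names and commit messages are made up):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
g() { git -c user.name=u -c user.email=u@e "$@"; }   # helper for identity

# Two unrelated edits sit in the working directory at once...
echo "bug fix"     > fix.txt
echo "new feature" > feat.txt

# ...but the index lets you commit them as two sensible units:
git add fix.txt
g commit -qm "fix: correct the widget"
git add feat.txt
g commit -qm "feat: add the gadget"
```

With `git add -p` the same splitting works even within a single file.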
Well, I'm being a little facetious. I do understand what the index is and what it's used for (but not why it's called the "index" instead of, say, the "stage"). What I don't understand is why the index exists as a separate abstraction. You could have the exact same effect by, for example, doing a git stash, and then popping changes out of the stash into your (now clean) working directory. The WD in effect plays the role of the index, and you get the same result, but with fewer abstractions, fewer commands, and less confusion.
But I hold Linus in high enough regard to take very seriously the possibility that the index is a reflection of some deep wisdom that I have missed. That's the real reason I raise this every now and again.
I'm not sure I see how you could use `git stash` to accomplish the same thing. Running `git stash && git stash pop` is almost a no-op, so you don't get the benefits described above. Am I missing something?
I thought you could pull individual files out of a stash, but it seems I was wrong about that. But it's not hard to imagine a variation of git stash that allowed you to select individual files from a stash to pull back into your working directory.
True, but even after travelling along the RCS->VSS->CVS->SVN graph for the past 20 years I still find git baffling sometimes. I like it, but the question is whether I want to invest my brain resources, which are scarce at age 47, in learning a revision control system that can drive you into a cul-de-sac even in the most basic workflow.
The model behind Git, the way the repository works/conceptual model, is incredibly simple. The only trick is learning which commands are used to make the changes you want, and that can take a little time (and Googling).
I thought I would try that, but I'm probably still not really thinking in Git. I never branch or merge for example, since I never think of doing that. I understand this isn't really normal git usage.
Me neither. But I think I should. Every time I want to try something new, take a new direction in code or add a new feature I should branch, so I can safely add or remove pieces of code. Things not related to that new feature, like bug-fixes, should be done in the main branch and then get pulled into the feature branch. I'm just too messy with my commits to do that, because it requires making frequent smaller commits, instead of huge ones once in a while.
I am familiar with that approach, but it's really not how I work. If I'm not done with a feature, I don't check in. And I've never needed to work on two features at the same time.
I think that Google is probably the best way to go, because not only will you get the man pages as high ranking hits, but you'll also get great hits from sites like StackOverflow and blogs that can explain things better.
I don't get this attitude. The DAG is exceedingly simple and I think can be taught reasonably in only a few minutes. From there you really only need to teach a very minor amount of the UI, and teach the user how to perform "I want to do this to the DAG"=>"This is what I type" translations on their own. I've seen all of this done well in sub-hour presentations.
Mercurial on the other hand has a pain-in-the-ass data model (so much so that most introductions to it that I have seen do not even approach the topic), so you actually do have to learn all of the UI commands to get an idea of what can be done and what cannot be done. It is far more complex than git.
I really cannot think of a simpler VCS than git. I've used plenty, but never got off the ground faster with anything else.
I agree with the model being great. The git command itself is abominably baroque in its user interface (inconsistencies and strange defaults abound), but I've gotten over that with more effort than I'd like to admit.
People who refuse to see the ugliness in git are the same people who think it's manly to live on the command line. Using git through Eclipse looks a lot like using SVN through Eclipse. If I right-click on a file and go Team > Replace With > Remote and select 'origin master' is saves me from trying to remember the obscure list of flags I need to pass to the command line to do the same thing.
Refuse to see the ugliness? No, I really don't see it. Sure, a few flags could be cleaned up, but the beauty of the rest of git more than offsets a few weirdly named flags.
Meanwhile git through eclipse causes nothing but trouble as far as I have seen. Making it seem like SVN is exactly the problem, git isn't like SVN so if it seems that way, something is going wrong. Pretending git is something that it isn't will bite you in the ass sooner rather than later. The disappointing part is that there isn't any technical reason why git integration in eclipse couldn't be good, it just isn't currently.
The fact that I can select files to add/commit with checkboxes and permanently check a box that will push every commit is exactly what I want 99% of the time. I know that isn't the "git way", but the "git way" has no value for the kind of projects I work on.
The "git way" is just whatever way you want. If you want a central server that you always push to, that is fine; there is no problem with that.
The problem is with that particular tooling. That tooling presents a workflow that is perfectly fine (though it is problematic that it does not facilitate alternative workflows, which becomes particularly problematic when working on a team with other users), but it obscures what is actually going on and executes that workflow imperfectly, generally falling over in rather novel ways. When it fucks something up, and it eventually will, you will need an understanding of the basic concepts underlying git to figure out what went wrong. I'm not saying you need to know how to use the default git porcelain, I'm saying you need to be aware of the concepts underlying git.
I find your comment about mercurial's data model a bit ridiculous.
The reason why most introductions to mercurial do not mention its data model is because it is _not_ important. You really do not need to care about it at all on your day to day use. I've been using mercurial for years and I've never had to ask myself what is mercurial's data model.
IMHO the reason why git forces you to understand its data model is because its UI is terrible. It is a failure of the tool when you need to understand how it works internally to use it. If git's UI were better its awesome data model would be something that only git devs would need to understand.
Git is simpler than any other VCS, but version control itself is not a simple concept to understand. That's what he's saying, and I agree. I use Git all the time, but it's not easy to understand the whole version-control thing as a beginner. Also, Git is not just about the DAG; you need to understand the decentralized nature, etc. In that sense, some people may find SVN easier to understand.
Exactly what I meant. If you're working with people who need to learn what a DAG is in the first place, that adds some overhead. After that you still need to understand that your working directory is just a scratchpad, and branches aren't actually like real-life tree branches but instead they're pointers to the ends of those real-life branches, etc. Not exactly simple.
Conceptually speaking, a centralized VCS -- one where you check out files, make changes, and check files back in when you're done -- is vastly simpler than Git. Sure, it's also much less powerful (and I would choose Git a million times over such a system), but it's definitely simpler.
I have a much easier time explaining Mercurial. And I will keep using Mercurial and keep improving Mercurial until we all realise how much better it is than git because it's easier to explain, it's just as fast, and no less powerful.
really? it's not that complicated, it just has lots of bad defaults in my experience
(i use mercurial when i start new projects, not because it's fundamentally better - but because it has sensible defaults and i don't need to configure it or get bitten by gotcha x for the nth time - e.g. i can revert a merge without reading a document or configuring anything - a vital feature of source control imo)
This has been danced-around in the comments, so I'm just going to say it: I don't need the concepts of git simplified, I need a better explanation of how git's bizarre command set maps onto the obvious DAG/filesystem operations.
... that is not simple. But I'm figuring it out anyways.
I commit my changes to my own repo and keep building changes under HEAD, committing as I go. If I get a branch from a buddy and I want to add it to my code, I either merge or rebase depending how I want his commits to be intertwingled.
Because Git is decentralized, there's no difference between merging in a buddy's branch and importing the latest changes from origin into my branch. So I fetch the changes from origin/master and then rebase or merge my repo on top of that. That's my "Get Latest" command, basically, right? Assuming I'm working on "master", I fetch, then rebase or merge origin/master.
To check in, I tell the origin server to take my stuff and then rebase or merge its master with that.
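For what it's worth, that "Get Latest" flow can be sketched in a throwaway repo. The `origin.git`/`me`/`buddy` names and the branch-name handling are inventions of this demo, not anything from the thread:

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/me"
cd "$work/me"
git config user.email me@example.com
git config user.name Me
echo base > f.txt; git add f.txt; git commit -qm "base"
git push -q origin HEAD
br=$(git symbolic-ref --short HEAD)
# meanwhile, a buddy lands a commit on origin
git clone -q "$work/origin.git" "$work/buddy"
git -C "$work/buddy" config user.email buddy@example.com
git -C "$work/buddy" config user.name Buddy
echo theirs > "$work/buddy/g.txt"
git -C "$work/buddy" add g.txt
git -C "$work/buddy" commit -qm "their change"
git -C "$work/buddy" push -q origin HEAD
# my "Get Latest": commit locally, fetch, rebase on top of origin
echo mine > h.txt; git add h.txt; git commit -qm "my change"
git fetch -q origin
git rebase -q "origin/$br"
git log --oneline   # my change now sits on top of their change -- a straight line
```

Swap the `git rebase` line for `git merge "origin/$br"` and you get the merge flavor of the same "Get Latest".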
I still feel like this is a rather baroque approach to the problem... managing oodles of local commits separate from rebase/merges seems bizarre, above and beyond the decentralized approach that makes my own repo, my peer, and the "origin's" stuff all equivalent.
The decision of when to merge vs. rebase is still confusing to me.
> The decision of when to merge vs. rebase is still confusing to me.
I'll take a stab at this. When you merge, you're merging their branch into yours. At the end of you have both pieces of code, but all the history is still there. You can see that their branch started 20 commits ago, that two things were done in parallel, and that they came back together with the merge.
Rebase is different. Rebase changes history. Rebase works like this:
1. Find the common ancestor between the branches
2. Extract each commit from their branch, turning each into a patch (patch1 through patchN)
3. Switch back to your branch
4. Apply patch1, patch2, ... patchN in order
Now it looks like all the work was done at the end of your branch, as if it were based on your latest commit. Instead of being based on the common-ancestor commit, it is now based on your commit. It's been rebased.
It makes sense if you think about Git coming from kernel development. Developers tended to work in patch sets, a long series of patches to a common base. Rebasing is just applying that series of patches to a different base.
Merging merges branches, but rebase is more like moving branches.
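The difference is easy to see in a disposable repo. A minimal sketch (branch and file names are made up for the demo; assumes git >= 2.28 for `git init -b`):

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q -b main
git config user.email you@example.com
git config user.name You
echo base > base.txt; git add base.txt; git commit -qm "base"
git checkout -qb feature
echo f > f.txt; git add f.txt; git commit -qm "feature work"
git checkout -q main
echo m > m.txt; git add m.txt; git commit -qm "main work"
# merge: both lines of history survive, joined by a merge commit
git checkout -qb merged main
git merge -q --no-edit feature
git log --oneline --graph
# rebase: "feature work" is re-applied on top of "main work"
git checkout -q feature
git rebase -q main
git log --oneline   # a straight line, as if feature started from main's tip
```

The `--graph` log on `merged` shows the fork and the join; the log on the rebased `feature` shows no fork at all.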
One small point: on check-in, I don't think it's exactly that origin is also rebasing/merging just like you did. It's more like origin is just taking whatever you have and copying it exactly. The merging/rebasing process itself only happens locally.
Regarding merge vs. rebase, here's my approach: rebase to keep history a straight line when it's just your changes and it's just a few commits. If it's too many commits you tend to have more conflicts and it's usually easier to merge.
> One small point: on check-in, I don't think it's exactly that origin is also rebasing/merging just like you did. It's more like origin is just taking whatever you have and copying it exactly. The merging/rebasing process itself only happens locally.
But then what happens when I merge/rebase locally and publish at the same time as somebody else? I assume the origin doesn't keep a lock on the whole mess while I'm doing my local merge/rebase. That sounds like it would lead to a "last-one-wins" conflict-resolution or a complete reversion of the origin if I publish without rebasing on the origin's version first.
> I assume the origin doesn't keep a lock on the whole mess while I'm doing my local merge/rebase.
You're right that the remote server has no idea what you're doing on your local machine.
> . That sounds like it would lead to a "last-one-wins" conflict-resolution or a complete reversion of the origin if I publish without rebasing on the origin's version first.
Unless you use a --force option, Git will refuse to do anything that will lose information. When the second person pushes, Git will say "You can't get here (new commit) from there (their commit). Nope." In that case it's up to you to pull their changes and either merge or rebase your changes. At that point, when you push, your new commit will be a descendant of the server's latest copy, and it will happily accept it.
Now, Git's messages aren't that friendly. The message you'll get is something like 'Unable to perform a fast-forward merge'.
So yes, if you try to push without pulling down changes, then you will get an error about the histories having diverged. Sometimes you don't care about wiping out history, and can just push with the -f parameter, but most of the time that's your cue to rebase.
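A sketch of that rejected non-fast-forward push, using throwaway local repositories (the alice/bob names are invented for the demo):

```shell
set -e
lab=$(mktemp -d)
git init -q --bare "$lab/origin.git"
git clone -q "$lab/origin.git" "$lab/alice"
cd "$lab/alice"
git config user.email alice@example.com
git config user.name Alice
echo base > f.txt; git add f.txt; git commit -qm "base"
git push -q origin HEAD
br=$(git symbolic-ref --short HEAD)
# bob clones, commits, and pushes first
git clone -q "$lab/origin.git" "$lab/bob"
git -C "$lab/bob" config user.email bob@example.com
git -C "$lab/bob" config user.name Bob
echo bob > "$lab/bob/b.txt"
git -C "$lab/bob" add b.txt
git -C "$lab/bob" commit -qm "bob's change"
git -C "$lab/bob" push -q origin HEAD
# alice committed in parallel, so her push is non-fast-forward
echo alice > a.txt; git add a.txt; git commit -qm "alice's change"
if git push -q origin HEAD 2>/dev/null; then
  echo "unexpected: the push should have been rejected"
else
  echo "push rejected: histories have diverged"
fi
# pull bob's commit, replay alice's on top, and the push now fast-forwards
git pull -q --rebase origin "$br"
git push -q origin HEAD
```

No lock is ever held on origin; it simply refuses the second push until it would be a fast-forward.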
Branches are cheap in git, and there's no real advantage to doing dev work in the master branch. You should create a branch at least as often as you develop a new feature. In git, history isn't necessarily a static thing. Sometimes you need to mangle it to achieve your goals, and sometimes you fat-finger the merge and do time in History Hell. In either case it's only a huge flaming deal if you're working on master. Ditto with the problem you describe.
I recommend the O'Reilly book on Git. The simple explanations are nice in theory, but this is a complicated subject and deserves a full explanation. There are very sound reasons for the how and why of git, and they should be within the grasp of any aspiring programmer.
I'm not using git at all, I'm trying to understand how it works before I start. Every guide to git I read before was a completely opaque mess of terminology, so this is the first time it's starting to make a lick of sense.
I think Git is much easier to understand if you understand the underlying data model. The core object in Git (as far as you need to care) is a commit. Each commit holds a link to the commit(s) it was based on. If two commits have the same parent, you have two branches. If one commit has two parents, it's a merge. Individual commits don't know what branch(es) they are a part of.
Branches are just pointers to commits. When you make a new commit on a branch, the new commit is saved and the branch pointer is moved forward to point to the new commit.
Tags also point to commits, but they don't get updated. They're supposed to be permanent (though they can be moved if you force it). If a tag points at a commit and you make a new commit, the tag still points to the original commit.
The only other odd term would be HEAD, which is like a branch that ALWAYS points at what you have checked out at the moment. You'll see this if you make commits without being on a branch (say you checked out an old commit and just started working).
Since branches are just pointers at commits, you can move them around easily. If you make 6 new branches, they all just point at the same commit to start, which is why it's so amazingly fast to do. If you want to undo a commit (that you haven't pushed), you can move the branch pointer to a previous commit. After that when you make new commits it will be like the mistake never existed. (Note: If you accidentally do this, it can be fixed if you catch it soon enough).
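The branches-as-pointers idea, including the "move the pointer back to undo a commit" trick, can be sketched in a scratch repo (names are made up; assumes git >= 2.28 for `git init -b`):

```shell
set -e
d=$(mktemp -d); cd "$d"
git init -q -b main
git config user.email you@example.com
git config user.name You
echo 1 > f.txt; git add f.txt; git commit -qm "one"
echo 2 > f.txt; git commit -qam "two"
echo oops > f.txt; git commit -qam "mistake"
git branch before-undo        # a new branch is just another pointer to HEAD
git reset -q --hard HEAD~1    # move main back one commit
git log --oneline             # "mistake" no longer appears on main's history
git reflog -n 3               # ...but the commit still exists, recoverable for a while
```

Note how `git branch before-undo` is instantaneous: no files are copied, only a pointer is written.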
When it comes to using branches, "A Successful Git Branching Model" [1] is very commonly used, and works fantastically. I had almost no trouble getting the other developers in my company on the model, and it makes it very easy to keep things straight.
If you'd like help understanding Git, I'd be glad to try to help you. My email address should be in my profile. You may find this kind of thing much simpler if you look at the graph of the repository. My company uses SourceTree[2], which is a pretty great GUI and makes it easy for me to see how the various branches I've got relate.
It notices that origin has commits that you don't have and rejects your push. Then you pull the change that happened in the meantime, merge/rebase your change again on top of that, and push again.
If we're recommending books on Git, I recommend Pro Git. It's free online. http://git-scm.com/book
It's possible, at least after I see the concepts this way, that git has the simple design Hoare was talking about when he said:
“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”
― C.A.R. Hoare
I don't use Git and would like to know how the following problem is solved in Git.
Say you have a project which is a hundred megabytes in size. And you have to develop three or four "generations" of the project almost in parallel -- let's say v1, v2, and v3. In parallel means you'd like to be able to build any of the three versions without having to take the version out of the repository first. You can't say that v1 is obsolete; as soon as some bugs are reported in v1 you have to fix them in v1, v2, and v3. And every bigger version is "newer", but some features are added in v2 and v3, some just in v3, etc.
How can you work on such a big project and have a single repository where all three versions are present, and work on these three versions in parallel (having sources which are compiled in different base directories)?
What does "take version out of the repository first" mean?
You can do a "git checkout" to get a copy out. On a modern drive, checking out a hundred meg history takes a few seconds.
You do not need multiple copies on the disk at the same time - "git checkout v1" when you are working on v2 will do only the changes necessary to make your directory into "v1", and then you can do "git checkout v2" or "git checkout v3" to get another version.
Alternatively, you can just mount your git repo as a filesystem, e.g. https://github.com/davesque/gitfuse (there are other projects - this came up on search, never used it myself).
And anecdotally, my git repos with tens of branches tend to take much less space than one checkout. git is super efficient about storage.
Just make a branch for each v1,v2,v3. Instead of having a single main, you have three. Unless you are literally typing with both hands on two different keyboards, you just checkout whichever branch you want to work on at the moment. That is a very lightweight operation.
Note that 100MB is not really big in Git; you don't really run into issues until you are in the low GB range (at which point you should probably be considering if perhaps you actually have multiple different projects in the same repository. If so, that is where tools like 'repo' can step in: http://en.wikipedia.org/wiki/Repo_%28script%29)
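One way the branch-per-version setup might look, sketched with a bugfix made on v1 and carried to v2 and v3 via cherry-pick (the branch names come from the question; file names and the cherry-pick approach are assumptions of this demo):

```shell
set -e
vdir=$(mktemp -d); cd "$vdir"
git init -q -b v1
git config user.email you@example.com
git config user.name You
echo v1 > app.txt; git add app.txt; git commit -qm "v1 base"
git checkout -qb v2; echo v2 >> app.txt; git commit -qam "v2 features"
git checkout -qb v3; echo v3 >> app.txt; git commit -qam "v3 features"
# a bug shows up in v1: fix it there, then carry the fix forward
git checkout -q v1
echo fix > fix.txt; git add fix.txt; git commit -qm "bugfix"
fix=$(git rev-parse HEAD)
git checkout -q v2; git cherry-pick -x "$fix" >/dev/null
git checkout -q v3; git cherry-pick -x "$fix" >/dev/null
```

Each `git checkout vN` rewrites only the files that differ between versions, which is why switching is so cheap even in a large tree.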
> (having sources which are compiled in different base directories)
With my nascent Git understanding, I think you would just have multiple branches for v1, v2... and then clone the repository multiple times so you have multiple working copies.
Check out v1 in the first one, v2 in the second one.
Although changing between related branches is usually quite quick in Git. Also, a fresh checkout of ~100mb is not a lot. At least for an SSD.
This also relies on having a centralised Git repository for you to push/pull changes to. But I believe Git allows you to synchronise multiple repositories on disk.
You're rarely developing two things at once in any given instant of time... why not just quickly check out the branch you want?
> and then clone the repository multiple times so you have multiple working copies.
This is probably not what you want. First, you should know that switching between branches in Git is insanely fast. In general, it won't get in your way.
If you clone the repository, each one is a full Git repository, so you'll multiply the space used on disk (local clones can hardlink the object store, but every working copy is duplicated). Worse, you'll have to do 3x as many pulls to keep all 3 repositories up to date.
> You're rarely developing two things at once in any given instant of time... why not just quickly check out the branch you want?
It often comes up, but that's what we do. We may have a dozen branches on our machines (the thing(s) we're working on, recent things we worked on, the one that's been sitting for a while we're waiting on an answer to pick up again) and we can switch our project within a second or two on a simple rotating hard drive.
I know this, but GP was asking about how to do it with "different base directories" which I'm assuming is asking for an analogy to Subversion's multiple working copies (i.e. check out this location from the repository to this location on disk).
Yes, you're right. The thing is, the project produces different binaries which should be accessible for all the different versions, which was solved by having different base directories: the configuration files build to the same subdirectories no matter which version. If I switched to always having only one version in a single base directory, then I'd have to maintain different temporary-output and binary-output names in all the configuration files in every version, which is quite ugly. Then I couldn't use any "known fixed subdirectory names" in the projects.
A Successful Git Branching Model [1] is a very common way to handle development in a good-sized project. You could consider the bug fixes to v1, v2, and v3 to be hotfixes. You could choose to keep a single running branch (the equivalent of master in the model given) for each of the older versions if you keep adding occasional features.
I know I've seen guides online of how to deal with the exact problem you're describing, but I can't remember where to find any of them right now.
This is very nice as a review of Git, but probably best in that context rather than an initial presentation of the concepts. I really enjoyed the Source Control Made Easy series by Jim Weirich. It presents the same information in an easily digestible, step-by-step approach. Highly recommended for those trying to understand how Git works and how to best make use of it.
Git's learning curve feels similar to the learning curve of a programming language like Python.
Once you understand you're not going to pick it all up in an afternoon (just like a language) and that there will be lots more to learn down the road (like a language), git feels great.
Git's learning curve is far worse than Python's, simply due to the limitations of its abstractions - e.g., a branch isn't really a first class entity in Git, yet most people using the centralized server model for Git will tend to think about their daily Git work in terms of branches.
The problem only manifests when you're trying to do things like "show me only commits from branch <X>". Or, "show me when branch <X> was created from master."
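For what it's worth, both questions can be approximated with rev-range notation, even though branches aren't first-class. A small sketch (names invented; assumes git >= 2.28 for `git init -b`):

```shell
set -e
rr=$(mktemp -d); cd "$rr"
git init -q -b main
git config user.email you@example.com
git config user.name You
echo a > shared.txt; git add shared.txt; git commit -qm "shared"
git checkout -qb feature
echo b > feat.txt; git add feat.txt; git commit -qm "feature only"
git checkout -q main
echo c > main.txt; git add main.txt; git commit -qm "main only"
git log --oneline main..feature   # commits reachable from feature but not main
git merge-base main feature       # the commit feature forked from
```

Note that `merge-base` only approximates "where the branch was created": after merges or rebases the original fork point is genuinely not recorded, which is the limitation being described.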
Programming languages are 80% of my work effort, but version control is more like 5%. It should be a utility; I don't want it to be a world of its own. I have no need for an insanely powerful version control system because my needs are sane and limited - and that's where git is not so cool.
Version control is social and there you can see a few maniacs ruining it for the rest of the team.
Debuggers aren't much harder than pour and drink.
Dependency managers are pain in the ass (unsolved problem in CS) but you don't wrestle with them every day.
I don't use very many features of my Eclipse and I don't use terribly many commands in vim. I also use arrows, I kid you not.
It's the stuff behind the curtain that makes me love git. It doesn't just seem like version control software - it seems like the software is an interface to a much more powerful version control engine. Git just makes sense under the covers.
Nice. I'd love to see more details about the index, the way to see differences between branches with log and diff, and I think stash should be mentioned.
a lot of this is generally (d)vcs and applies equally to mercurial or even svn...
also if you think x pages of anything is a simple explanation then you missed a trick or two.
e.g. if you have to explain why your arrows are pointing backwards, you are doing it wrong: instead of using the standard notation for graphs and lists and such, which is not generally well known, use what most people will understand on inspection.