Torvalds: git vs CVS

ableal · on April 16, 2010

Exercise: what are the downsides of the current generation of DVCS ?

(Assume you may be something other than a coder that carries all files in a laptop.)

amethyst · on April 16, 2010

Poor or inefficient support for large binary assets, such as compiled objects, images, or video. Especially in the context of video game development, binary assets that constantly change are highly painful to deal with in Git or Mercurial.

lnguyen · on April 16, 2010

It's not limited to DVCS or open source. Commercial tools such as ClearCase, AllFusion/Harvest, and Dimensions still handle large binary assets poorly.

I don't think you'll find a satisfactory tool that has as its primary use the management of texts and deltas of texts. You'd probably need to use a combination of tools which has its own set of frustrations.

smallblacksun · on April 16, 2010

Perforce works well with large binaries and text (costs a fortune, though).

sandGorgon · on April 17, 2010

Bitkeeper (the SCM that git was "inspired" from), claims good support for large binary assets

BAM solves this problem with a hybrid approach. BAM adds the concept of one or more BAM servers. Each BAM server contain all BAM data, but clones of the server (work spaces) contain only that data that the user requests. Typically this is the most recent version of each binary but in some cases it may even be less than that.

BAM has been successfully deployed in the game development space with great success. Game developers continue to enjoy the benefits of distributed work flow without the penalty of carrying all versions in every workspace.

naner · on April 16, 2010

How is it possible that nobody has dealt with this problem yet?

pmjordan · on April 16, 2010

Active open source development rarely meets "AAA" game development.

sshumaker · on April 16, 2010

One could argue that the traditional source control model (where everyone has a copy on their own machine) isn't a great way to deal with huge binary assets. I've worked on game projects where the binary churn was on the order of gigabytes a day. Even though everyone was physically co-located, when everyone arrived at 10am and started to update to head, a hundred computers all trying to download the days worth of 2 gigs of assets over the network didn't make for a pretty picture.

You can try to cronjob the updates, staggered around 4am. Or you can realize that the majority of people are only modifying a subset of these assets, and roll your own link-based asset manager for large binaries.

pmjordan · on April 17, 2010

Yes, absolutely, you wouldn't want everyone to keep pulling all data, git-style. The programmers usually only need the assets in the game-ready format, not the source models, textures, etc. which are typically much larger, and only needed by the artists/level designers working with them.

blasdel · on April 17, 2010

Because anyone cleverer than a doorknob uses the right tool for the job: rsync. They're almost always artifacts, not assets.

You handle versioning either through filesystem snapshots, or by just using different paths for each version.

ableal · on April 17, 2010

So, git/hg/etc are doorknobs ?

Of course not, it's not their problem. What annoys me is that the "You must use DVCS" memo is being sent around uncritically, and anyone who does not fit the use case is left out.

I think that people who deal with images/spreadsheets/data would also benefit from a good VCS tool. If the coders, having scratched their itch, did not declare the problem solved and move on ...

P.S. thanks for the polite answer. So, there is either no problem or no solution ... I think we're done here.

blasdel · on April 17, 2010

git/hg/etc are not doorknobs. My simile was comparing you, and people like you, to doorknobs. http://www.google.com/search?q=dumber+than+a+doorknob

No version control system, centralized or distributed, actually handles blobs better than just treating them as artifacts and using rsync.

The only way to actually solve the problem is to make the data not be blobs anymore, where the diff/merge/serialization code all understands at least the structural container format and can render it usefully. There's never going to be a general purpose tool that does that outside of a live-in smalltalk image (or similar).

The best you're going to get is tools vertically-integrated with the application, and all the ones I've seen to date (MS Office, Adobe Version Cue, etc.) all do a terrible job of even doing diffs/RCS, much less actually implementing a real VCS.

stevelosh · on April 17, 2010

Greg Ward and some others are working on it with the bfiles extension for Mercurial. It's a work in progress but at least it's getting the ball rolling: http://mercurial.selenic.com/wiki/BfilesExtension

ableal · on April 18, 2010

Thank you very much. I liked what I read in the design doc (http://vc.gerg.ca/hg/hg-bfiles/raw-file/tip/design.txt), which seems to be trying to automatically do the best thing possible under the hg constraints.

It's interesting that the names are "Distributed Version Control Systems", or "Revision Control", but the ghost of the word "source" (meaning code text, in 1972's SCCS) is still read into it.

Surprisingly, when DVCS are shouted as the bees' knees for everybody, people who have images, documents and data may think they were included. Apparently not, those benighted heathens should use rsync or whatever manually, and not soil the "source control" systems with their binaries. Or go find not-included plug-ins, if they can.

(Those people are apparently too dumb to understand that a binary "artifact" produced with an image editor, spreadsheet, or data analysis tool is fundamentally different from a blessed code file written with a text editor, and thus is not entitled to be under real version control, or have a DCVS tool automatically do something smart about it.)

The Perforce guys must be laughing their asses off.

ableal · on April 16, 2010

I have been trying to point it out [1], without much luck. Today I got in early by chance, and decided on the Socratic approach ;-)

[1] http://news.ycombinator.com/item?id=1201559, http://news.ycombinator.com/item?id=1219082, http://news.ycombinator.com/item?id=1222905, http://news.ycombinator.com/item?id=1242374

prog · on April 17, 2010

This is my pet peeve with all DVCSes. I tried this with bzr, hg and git and all fail. Not sure if any other DVCSes handle it.

I don't see it happening anytime soon as long as it justified using "its _source_ control". There are programming activities that require large binary files to be versioned along with source and this is a clearly a limitation of DVCSes IMO.

Apart from that I am quite happy moving to bzr from svn. Life is much easier now.

ajb · on April 17, 2010

Apparently vesta (http://www.vestasys.org) can handle large binaries (it's from the EDA world, so this is plausible). It's open source, but the extent to which it's distributed is debatable - you need a server, but anyone can install a server and they form a distributed network like a dvcs.

ableal · on April 18, 2010

Thanks, looks interesting, and it came from DEC via Compaq. It's billed as a Configuration Management System (i.e. managing builds). They do talk of managing tens of GB of derived data, which is also important.

The hardware part seems limited: "the Alpha group ... hardware description language files into Vesta's source code control", so not schematics or layout binaries as 'source'.

That site seems to be resting since 2006, but there is some activity at http://sourceforge.net/projects/vesta/ (releases in 2009, mailing lists with 2010 messages).

ajb · on April 18, 2010

Last time I looked, the IRC channel was somewhat more active than the list.

sshumaker · on April 16, 2010

With binary files, you really want an explicit checkout (exclusive lock) system - since there's typically no way to merge them. This makes the commit/merge style SCSes a bad fit for binary assets (and partially explains the popularity of Perforce, which is a lock-based SCS).

dalore · on April 17, 2010

That and Perforce is a commercial company that provides support with their product. By default perforce doesn't lock, but it knows when a user has a file checked out for "edit".

By default all files are set read only. This was a big downside when you had no internet connection.

koenigdavidmj · on April 17, 2010

Yeah, but 'chmod u+w filename' and you're back in business. The GUI has a "resolve offline work" feature that will find everything.

Of course, if you don't mark it writable (and just use :w! in Vim, for instance), then a 'p4 sync' may eat your work if anyone's changed the file.

dalore · on April 26, 2010

Ahh the resolve offline work feature must be new. It wasn't around when I was using perforce.

koenigdavidmj · on April 27, 2010

Sadly, it has no direct CLI equivalent. You may find http://p4delta.sourceforge.net/ useful, however; it's like `svn up` for Perforce.

chrisbolt · on April 16, 2010

Inability to check out a subset of a repository.

snprbob86 · on April 16, 2010

There are two good reasons to checkout only a subset of the repository:

1) Large binary or data files -- For this, there are various solutions of varying hackiness.

2) Your project has grown into several sub-projects -- The best solution here is to clone the repository to new names and then refactor and trim the respective project trees with a lot of deleting. You'll thank yourself later for improving your architecture and there will no longer be a partial check-out problem.

The real issue for both of these, is handling unversioned or independently versioned bits with respect to all the other bits. You might want some unversioned binary blobs, but you might also want some versioned giant csv files without having to download all the old versions all the time. Source code has one versioning strategy, data files another, photoshop files yet another. No version control system today really lets you pick which versioning strategy to use for which files. And none of the existing "sub-module" type extensions are really great either.

alextp · on April 17, 2010

Actually, the worst problem with submodules in modern DVCSs is handling dependencies. If project A depends on project B and project B and the API exposed by project B varies over time it is pretty easy to keep both in sync and working. Except, when someone needs to check out an older version of A they need to figure out which version of B to check out (and it might not be as easy as checking out the latest B in the date that that version of A was committed). Git allows you to mark a specific revision of a submodule, but then updating the whole project is no longer a single command, and there are some problems with not pushing everything at the same time.

sandGorgon · on April 17, 2010

http://mercurial.selenic.com/bts/issue105

Two of the features that made it difficult to 'sell' Hg for the main SCM of FreeBSD were the lack of Partial checkouts and Partial history

viraptor · on April 17, 2010

3) You have a small patch and want to update it to the HEAD version before submitting. You worked with the source you obtained in some other way (from a source package for example). Now you have to wait 10+ minutes on a large project, downloading all the changes since 1970, just to change the newest version of a single file.

4) The project has grown into several sub-projects - but you don't control that repository.

cma · on April 17, 2010

re #3:

"shallow repository A shallow repository has an incomplete history some of whose commits have parents cauterized away (in other words, git is told to pretend that these commits do not have the parents, even though they are recorded in the commit object). This is sometimes useful when you are interested only in the recent history of a project even though the real history recorded in the upstream is much larger. A shallow repository is created by giving the —depth option to git-clone(1), and its history can be later deepened with git-fetch(1)."

There are still issues; you can't commit for instance, but you can update to HEAD to update your patch, with a depth of 1 you only get the most recent changes.

dasil003 · on April 17, 2010

I agree this is a definite weakness compared to svn, but the problem is that svn gains this facility by having no concept of a project. Let me repeat that: in svn it is technically impossible to identify the project tree.

The problem with that, of course, is that merging only makes sense if you are merging branches that occur at the same tree-level in a project. This is why subversion merge tracking is so buggy and half-baked... because any given directory could be a project tree, a subtree, a branch, or a tag, and you could merge any of them. Sure if stick to certain conventions it works pretty well, but technically the whole implementation is a minefield, which is why svn will never be as robust as other systems that don't make this mistake.

Even though git's submodules and subtrees leave a lot to be desired, and have significant room for improvement, they will never be as convenient for partial checkouts, because the requirements for svn-style partial checkouts require crippling the entire system.

prog · on April 17, 2010

bzr has support for filtered views[1] if thats what you mean.

[1] http://doc.bazaar.canonical.com/latest/en/user-guide/filtere...

janitha · on April 16, 2010

For this reason, I am loving svn-git

koenbok · on April 16, 2010

- When you work with large binary files that don't diff well (psd), it's a pain to have the entire repository history local.

- Hard to explain to even fairly technical designers/copywriters.

- With all that easy branching, I sometimes just have a hard time getting a decent version of the codebase together from everyone.

dustingetz · on April 16, 2010

i used SVN to track large binaries and it was morbidly slow. maybe i was doing it wrong. i thought the solution was to keep binary assets under a separate version control better suited for files that don't diff; a nightly snapshotted filesystem seems just fine.

ableal · on April 16, 2010

Seems that's why Perforce is in business (e.g. http://news.ycombinator.com/item?id=1242374 )

th · on April 16, 2010

Complexity (in terms of the operations that need to be performed regularly). At least with Git, simplicity seems to be traded in favor of flexibility.

prog · on April 17, 2010

bzr, hg and darcs are not too complex to use.

ionfish · on April 16, 2010

No way to move files between repositories and retain their history. Luke Palmer talked about this a little here:

http://lukepalmer.wordpress.com/2008/11/12/sketch-of-udon-ve...

blasdel · on April 17, 2010

It is absolutely possible to do this in git. It helps if you forget about "files" as something to "move".

I did it in the project I'm working on now! It originally had one main repository, and another for an extension that was used as a git submodule in the main one. Active development proceeded in both, with shitloads of commits in the main one just to update the reference to the submodule.

I decided this was retarded, so I took a clone of the extension repository, used filter-branch to rewrite it's entire history so that all the paths were always prefixed with "vendor/extensions/project_name/", and pushed it as an unrelated branch into the main repository.

Then in the main repository I made a new branch, removed the old submodule junk there, did a nice clean merge that melded all the commits from the extension branch going back in time, and made that the new master branch.

windsurfer · on April 16, 2010

  git filter-branch --subdirectory-filter foodir -- --all

_delirium · on April 16, 2010

Well, a big one is probably the same one: politics. Instead of politics over commit privileges and commit rules, you have politics over who controls mainline, who controls which staging areas, what their rules are for pulling or accepting pushed patches, etc.

bdonlan · on April 16, 2010

The point is with centralized systems, only committers can do things such as branching or preparing commits. With a DVCS, everyone gets the same tools, so the rights granted by having the patch accepted are simply having it merged, nothing more or less.

_delirium · on April 17, 2010

That's true, and I think it is an improvement when it comes to the ability to do local forks. But most of the big fights I've seen in CVS/SVN-based projects ultimately come down to who controls mainline, and a DVCS doesn't solve that problem at all. If anything, it makes it easier to put even more politics on top of it, where instead of having a fight over getting into the single repository, you now have to navigate whole cascading set of repositories (e.g. Linus doesn't generally pull directly from you, so you have to deal with sub-gatekeepers before you get to the main gatekeeper).

I suspect Linus likes the approach not because it has no politics, but because it encourages the politics that match the way he'd like to manage the Linux kernel development, with this style of hierarchical, cascading commit approval, which is pretty hard to set up in CVS (though Mozilla's sort of grafted it on via their Bugzilla, which keeps track of cascading approvals of patches based on who owns which areas and sub-areas). Not that that's necessarily a bad thing, it's just different (and for a project as large as Linux, probably necessary).

yummyfajitas · on April 16, 2010

Lack of a good way to edit recent history.

Suppose I sit down for a collaborative session with someone. Over the course of a few hours, we might generate 10-100 commits between the two of us, and several branches, all to implement a single feature. I.e., I commit, say "hey, pull from me, you had an off by one a few minutes ago."

At the end of the session, we only want to generate a single commit to the public repositories.

Mercurial queues are a tolerable way to handle this, but they aren't great.

amethyst · on April 16, 2010

Git makes this easy. It supports "rebasing" commit trees, and for what you want, a "squashed" rebase is absolutely trivial to do.

vtail · on April 16, 2010

git rebase --interactive <commit> should help

http://stackoverflow.com/questions/598672/git-how-to-squash-... and

http://stackoverflow.com/questions/435646/how-do-i-combine-t...

dalore · on April 17, 2010

I do this all the time with git. I create a branch, do all my incremental commits, get it all working. Then I merge it into the mainline as one squashed commit. There a few guides on how to do it with the command line but I'll admit I've gotten lazy and now just use smartgit which lets me do it easily without having to remember the commands.

prog · on April 17, 2010

git already has this in core and bzr has a rebase plugin[1]. I think this problem has been solved for quite some time now :)

[1] http://doc.bazaar.canonical.com/plugins/en/rebase-plugin.htm...

cool-RR · on April 17, 2010

I would want something pretty superficial: An ability to organize branches in folders.

maw · on April 17, 2010

bzr can do that with its shared repos. It's not too hard to fake it with git, either. (This assumes you know how to use git, which without a doubt is hard, despite what its apologists may tell you.)

jedediah · on April 17, 2010

Terrible submodule support.

AngryParsley · on April 17, 2010

But if you have hundreds of developers, and you have a dynamic trust network (I trust some people, they trust others, and we all tend to trust people more or less depending on what they work on), the CVS model is absolutely HORRID. It just doesn't work.

I can count on one hand the number of projects described by that sentence. No surprise, the Linux kernel is one of them. Linus built a tool to satisfy his needs. But most developers work in smaller groups, and these groups have explicit trust. Working at a company, the trust network isn't dynamic. Even in large open source projects, most of the commits are by a handful of individuals. It's not a big deal if the occasional one-time contributor e-mails a patch.

But most of my gripes with git don't have to do with its ideas. Although it's distributed revision control, in my experience everyone designates one repo as authoritative. Like centralized RVCs, certain users are explicitly granted write access to said authoritative repo. So git ends up working like an svn repo with a ton of branches.

My complaints about git have to do with its interface. Coming from svn, git is very frustrating to use. Certain benign commands in svn will erase your data in git-land. For example, "git checkout filename" is the equivalent of "svn revert filename"; it erases any uncommitted changes. Of course, git has a revert command as well, but it doesn't behave like other RVCs. Git checkout can bite you if you have a branch with the same name as a file or directory in your source tree.

My biggest annoyance is if I accidentally commit and push something. Usually it's when I forget which branch I've checked out. Undoing a commit/push means rebasing or resetting, and that's where git drives me insane. I have used subversion, CVS, and even Visual SourceSafe, but only in git have I lost previous commits. Again with the misleading terminology. Why call them commits if you can destroy them with a single command?

Deestan · on April 17, 2010

> Undoing a commit/push means rebasing or resetting, and that's where git drives me insane.

To look on the bright side first, Git is the only(+) DVCS in which you can clean up these kinds of mistakes without cluttering the revision history with "revert" and "oops sorry" changesets. :)

> I have used subversion, CVS, and even Visual SourceSafe, but only in git have I lost previous commits. Again with the misleading terminology. Why call them commits if you can destroy them with a single command?

The changesets are not really destroyed; you have just redeclared the official revision history not to include them, so they aren't visible. The changesets are still alive within the database until the garbage collector picks them up at a later date (then they will be destroyed), and you can access them via looking up the revision ID in the revision log (or sometimes just by scrolling up the terminal window). Once you have the revision ID, you can tag it (or declare it a new branch) so you don't lose it until you're done with the cleanup. Then you can cherry-pick, rebase, merge or do whatever you want to fix the erroneous commit.

This bit of Git causes a lot of headache among new Git users for the first few weeks (me included), since you need to understand the underlying database and really toy around with it a lot to understand how to do stuff like this properly. Though once I finally got it, Git replaced HG as my favourite VCS.

(+) That I know of at least. :) HG supports a "rollback" command, but that only covers one changeset, and you can't really use it if you have pushed your changeset or if it has been pulled by somebody else.

parbo · on April 17, 2010

It sounds like you'd like Mercurial better. The interface is much more SVN-like.

AngryParsley · on April 17, 2010

I totally forgot about Mercurial, probably because I have no bad experiences associated with it. I haven't used it much, but the times I have used it have been forgetful. I think that's a good thing when it comes to revision control.

ido · on April 17, 2010

I tried git only to be frustrated by its interface and come back to svn.

A few months after that I tried hg and never looked back.

cschep · on April 16, 2010

This is a super level headed response from Linus on a topic that he usually rants about. That's pretty cool.

rabidgnat · on April 17, 2010

I follow the Git mailing list, and he's even-keeled and surprisingly helpful in discussions. The tone of the original post is representative of what he normally writes.

I'm sure he's no saint, but his most famous rants were responses to sniping. The Minix/Linux debate was started by Tanenbaum, Bram Cohen picked a fight over merge strategies in a list discussion, and the 'C++ sucks' rant was in response to some flamebait.

Auzy83 · on April 17, 2010

We just made a selection of version control system at http://getnightingale.org, and Git isn't that great either (it's over-hyped). CVS does suck (and SVN does too for our purposes). But Git requires that users either use cygwin or install half a linux environment in Windows. Just because Git is coded by Linus, doesn't instantly make it a good product. In the case of Git, I REFUSE to force newbie windows developers who want to mess around with our project to be forced to install 2 linuxy environments, or have to integrate it into mozilla build (every other system is just a simple file you can add to path).

We settled with Mercurial, because hgweb isn't that memory intensive and our 512MB RAM prgmr VPS can handle it (although, we hope to upgrade the VPS, to allow more checkout's simultaneously). SVN/CVS also consume little ram on the server too though.

People who wish to make a selection should try them all out, and ask around. Because whilst Git users are very passionate about Git, I couldn't find a single one on IRC who had recently tried mercurial or Bazaar. Furthermore very few (if any) actively used Git in Windows

But that's just what I found. I didn't run proper benchmarks and things would be different if we had a better server (Loggerhead for bazaar wanted 2gb when running).

0x44 · on April 17, 2010

  But Git requires that users either use cygwin or install half a linux environment in Windows.

That was true, but is no longer. There is a Windows port of git called MSysGit available on google code: http://code.google.com/p/msysgit/ .

Auzy83 · on May 3, 2010

MsysGit is still a huge collection of random packages, many of which you don't need for other revision control systems. No it isn't cygwin, but you require 130mb of linux packages still to install it.

It's nowhere near self contained. Bzr,Hg,cvs,svn (and others), are just a small directory, and need none of those. MsysGit is overrated too. Usable yes, but ideal? Hardly..

ido · on April 17, 2010

Off topic, but I suggest you add some 'about' page to your website.

After clicking around for a couple of minutes the only thing I learned about it was that it has something to do with Songbird (which google tells me is a media player).

Auzy83 · on May 3, 2010

That's in progress still.. Voting on website designs still..

mgunes · on April 16, 2010

This is effectively a summary of his Google Tech Talk on git.

http://www.youtube.com/watch?v=4XpnKHJAok8

clemesha · on April 16, 2010

"So one of the worst downsides of CVS is _politics_. People, not technology."

Oxryly · on April 16, 2010

Although it's not as if switching to distributed revision control will eliminate politics. It just transforms them and pushes them around. Who owns the mainline of a project? Why won't s/he accept my patches? etc.. etc...

thristian · on April 17, 2010

Of course it doesn't eliminate politics, but it cleanly separates technical concerns from political ones. With a centralised version control system, technical problems can have political ramifications ("$LEADER said she'd give me commit access three weeks ago and she still hasn't; sure she claims she's suffered a server crash but I think she just doesn't like me") and vice-versa ("$COMMITTER didn't like $OTHER_COMMITTER's contributions, so they started a revert war").

stromhold · on April 16, 2010

i feel thats true for most software failures, bugs not getting fixed, priorities being shuffled, blame being shifted etc etc.

ableal · on April 18, 2010

I'll leave here a short note on a test I ran with mercurial v1.31, it may give someone ideas. The document was a 45 slide OpenOffice presentation (.odp, OO.org v3.1, Linux). Large font text, nearly all pages had one or more images. Did not delete/replace pics more than a couple of times, but did reorder a bit, especially at the end.

Surprisingly, reordering did not affect much the size of the hg store, which is only 60% over doc size. That may be either because hg is being extremely smart about the content, or because OO doesn't move binary chunks around after they are inserted in the file (more likely).

On each commit, a script noted changeset number and output of 'ls -s' on the doc file and the .hg/.../_file.d store. Only started at 11, and trimmed most of the lines.

    c.set  file   file.d
    11     320     632
    15     488     944
    20     736    1336
    25    1056    1804
    26    2044    2800
    29    2336    3260
    30    2336    3480
    31    2356    3560
    35    2688    4104
    38    2852    4388
    39    2848    4468
    40    2856    4540
    41    2924    4676

nearestneighbor · on April 16, 2010

I program solo, and for fine-grained version control, I use comments and conditional compilation, which I remove when the time comes. I also take periodic tarball snapshots of my code just in case.

Before you downmod me to -∞ for my uncouth approach, consider this:

According to Linus, Git > tarballs > CVS > SVN (he made a statement about tarballs being better than CVS somewhere else). That leaves Git and tarballs.

Now, Visual Studio is my primary development environment, and it does not integrate with Git (as far as I know), and Git just isn't well-supported on Windows (there are some fragile solutions). Secondly, I probably spent a whole day playing with Git where it's supposed to shine (OS X with GitX), and I just find it kind of awkward and unintuitive for no benefit.

solutionyogi · on April 16, 2010

I use Git on Windows with Visual Studio and I don't face any issues whatsoever.

Yes, Git is not integrated with VS but I have no trouble switching to explorer and commit my changes using TortoiseGit. TortoiseGit scans my working directory and presents list of all the changes made in a session.

And yes, to be on safer side, I copy and store my working directory + Git to a different location.

nearestneighbor · on April 16, 2010

Check out these replies:

http://stackoverflow.com/questions/1500400/is-tortoisegit-re...

It's not just the fragility and lack of integration, but also I'm just not seeing much of a benefit to counterbalance the complexity and awkwardness.

solutionyogi · on April 16, 2010

Well, I have been using it for 'production' work for last 6 months and I have not found any issues. I am not suggesting that everyone will have the same experience but I think you should give it a try before dismissing it outright based on someone else's experience.

Also, if you could provide some real details on the 'complexity and awkwardness', I could share my experience which could be helpful.

nearestneighbor · on April 16, 2010

> give it a try before dismissing it

Giving it a try can only prove the presence of fragility, not its absence :-)

Generally, people are very reluctant to criticize "hip" tools like e.g. Clojure, Haskell, Google Go, Git or its accessories. So, when 3 out of 4 people say they had problems with it, to me it weighs very heavily on the negative side.

wvenable · on April 17, 2010

For the record, I agree with you on Git (on Windows at least). It's improved at a rapid pace but it's still a bit awkward.

However, that's no reason to dismiss all version control.

nearestneighbor · on April 17, 2010

I'm glad you agree, because for the rest of the equation (Tarballs > CVS > SVN), Linus agrees with me. So you see, I embody the combined wisdom of both of you, as far as version control goes.

wvenable · on April 17, 2010

You have his equation wrong. He says in this article that SVN is still better than CVS. And really, I would agree that tarballs are better than CVS. So the equation is more likely SVN > Tarballs > CVS. Don't put words in Linus's mouth.

Also Subversion no longer stores it's data in a database so Linus's objection in this article has been resolved.

Finally, Linus's needs are pretty unique in the world. Linus isn't satisfied with Subversion for the same reasons it might work perfectly well for you.

nearestneighbor · on April 17, 2010

> He says in this article that SVN is still better than CVS.

He says SVN is better, but is more fragile (which for source control, I interpret as being worse):

    SVN fixes (supposedly) those "implementation 
    suckiness" issues. ... 
    I think it's also a much more fragile setup and 
    there's apparently been people who lost their 
    entire database to corruption

Even if SVN = CVS, clearly Tarballs > SVN, according to him. His actual quote was Tarballs >> CVS. I can dig it up if you can't.

wvenable · on April 17, 2010

Thousands of companies (and millions of developers) use Subversion. It's 10 year old. It's an Apache project now. It's solid. It's a simple and easy to use tool that will make your life better. That's all.

If you find that personally offensive, so be it.

nearestneighbor · on April 17, 2010

Why do you say that I find this personally offensive?! You are not making a lot of sense. I'm just quoting Linus who probably had more experience with various VCSs than any of us.

koenigdavidmj · on April 17, 2010

Linus was more ranting that Subversion uses a binary database. If it gets corrupted, you're screwed.

CVS's database is just RCS files, plus a little. Nice and easy to restore if Bad Things start to occur.

wvenable · on April 17, 2010

Subversion now use text files for storage too. You have the option of that or the Berkley DB.

ableal · on April 16, 2010

I'd say that mercurial ( http://tortoisehg.bitbucket.org/ ) or bazaar ( http://doc.bazaar.canonical.com/migration/en/why-switch-to-b... ) might be good for you. Or even plain old tortoise-svn.

There's a certain peace of mind to be had knowing you can roll back to what was working yesterday at 17h30.

(P.S. it's easier than the tarball snapshot - right click on folder, commit, type a note. Been down that path ;-)

wvenable · on April 16, 2010

Rather than being a competition -- each version control system has it's own advantages and disadvantages. Now on the scale of version control, you're at a zero. Almost anything is better than what you're doing now. Even a solo developer should be using version control. For you, I would highly recommend moving to Subversion. It's relatively simple and has great (and free) integration with Windows Explorer and Visual Studio. And if you want to move up to Git in the future, it's not a terribly difficult transition.

nearestneighbor · on April 16, 2010

> Now on the scale of version control, you're at a zero.

After I went to all the trouble to explain that my system is the best one for me? :-)

What problem that I have will switching to subversion solve? Suppose I have two versions of a procedure and I can't decide whether the new version is faster and just as correct as the previous one. I keep both with

    #if 0
      // old one
    #else
      // new one
    #endif

And it's easy and intuitive to see them side by side and switch back and forth between them until I'm sure. The VCS just don't give me this simplicity, convenience and intuitiveness.

wvenable · on April 16, 2010

> What problem that I have will switching to subversion solve?

You already have the problem (backups, multiple versions of code) you're just doing it the hard manual way. You could say the same thing about Visual Studio over Notepad -- what problem does it really solve? One is just a superior way to work. You're using stone knives and bearskins.

Version control doesn't prevent you from using conditional compilation. You really wouldn't have to change your work flow at all. But you already make tarball backups -- taking the two clicks to commit your code is going to be much simpler. And if you ever screw something up, you can always go back to a working version.

If you get around to branching and merging the full power of version control reveals itself. I'm currently working on a development branch of my software while the production branch continues to get bug fixes. When I'm ready to deploy, I just merge all those changes together.

nearestneighbor · on April 16, 2010

> But you already make tarball backups -- taking the two clicks to commit your code is going to be much simpler.

And you are saying I'm not doing it right?

I have a 1-line script that does that, automatically adding time stamps to the name of the tarball. How is checking in your version simpler?

> If you get around to branching and merging the full power of version control reveals itself.

This attitude is unproductive, as I witness from other programmers' experience: they branch and then they branch - stuff gets inconsistent, bugs get fixed in one version, but not another, and the same bugs that were fixed before, get magically released to public 2 versions later (REAL STORY).

My philosophy is MAKE BRANCHING HARD.

wvenable · on April 17, 2010

The extra click is so that you can add a description of the changes, but if you want to resist even that level of organization you could script it down to one click as well. Your system just gives you a big dumb useless tarball. For the exact same effort, you can view changes to past revisions, revert any single file to a previous version, see exactly what you changed and when. Why you wouldn't want that power for the same effort, I don't know.

> This attitude is unproductive, as I witness from other programmers' experience:

You said your are a solo developer so you're telling me if you had the power to branch and merge you wouldn't be able to control yourself? You'd just branch and branch and never merge and make a mess of the whole thing? Even though you could do that right now just using the file system?

The whole point of tracking your changes in version control is to prevent the very thing that you describe. I simply wouldn't be able to function without that ability. Our stable production version is live and we have big changes in development (over 5 months now). Without version control the bugs fixed in production would likely never make it into the new version.

nearestneighbor · on April 17, 2010

> For the exact same effort, you can view changes to past revisions, revert any single file to a previous version, see exactly what you changed and when.

Tarballs let you do that as well.

> You said your are a solo developer

Right. Those people are working on their own code base (total disaster, much of it due to branching and following the VCS "methodology". I feel like slapping their lead whenever he mentions tagging or branching. That stuff ain't free! You only have one brain! /rant)

wvenable · on April 17, 2010

> Tarballs let you do that as well.

No, they don't. I can right-click on any file and view all the changes as a diff, I can see exactly what I changed and when, and revert that file back to any previous version. You can't do that with a tarball -- at least not without significantly more effort. It's just better all around.

> Those people are working on their own code base (total disaster, much of it due to branching and following the VCS "methodology".

You're not required to use any particular methodology. I've seen people screw up with every technology in existence -- that's hardly a reason to head back into the woods and live like a cave man.

There are very few things in the field of computing science that are universally agreed on. There are dozens of development methodologies, thousands of different programming languages, IDEs, etc. The closest thing we have in this business to consensus is the use of version control (even if the exact tool to use is still under debate).

You're simply mistaken to assume not using version control is superior in any way to using it. There's nothing wrong with your own methods of development and backup but that isn't version control and it isn't incompatible with it either.

weavejester · on April 17, 2010

> My philosophy is MAKE BRANCHING HARD.

You outline problem with branches that are long-lived, but the majority of branches created in a typical Git workflow are temporary, often lasting only a few days or less before they are absorbed into master.

FooBarWidget · on April 17, 2010

So with your system, how do you view the changes between one version and another? It's possible to view changes within a single file with diff, but what if you want to see the entire set of changes, e.g. in look for a regression?

What if you're working on an experimental feature that makes your program unstable while it's still being developed, and the feature spans multiple files? If you decide later on that the feature is useless, do you modify each of those files to remove all the #ifdefs? I make a temporary branch for this and delete the branch if I decide it's a failure.

prog · on April 17, 2010

> I have a 1-line script that does that, automatically adding time stamps to the name of the tarball. How is checking in your version simpler?

The important question is, do you have unit tests for that script :)

pgbovine · on April 16, 2010

what if you have more than 2 versions, or if your #ifdef's get nested and more hairy? slippery slope to preprocessor hell :)

with git, there's a nice 'bisect' feature that lets you quickly jump back-and-forth between different versions of your code (in a binary-search-like way), so that you can debug performance issues like the one you're using #ifdefs to manually do. just check in a bunch of versions of your code and use 'git bisect' to jump between them and test each out for performance (or correctness)

nearestneighbor · on April 16, 2010

Realistically, you wouldn't normally have more than two versions: one "solid" and one "experimental". But if you do, there's #elif. I repeat: I see conditional compilation as a temporary thing. My code does not end up littered with them.

> with git, there's a nice 'bisect'

How does "bisect" know where the boundaries are? What if you change the original code a bit, like re-indent it or make another trivial change? How can you look at both versions, preferably right in the editor? What happens to time stamps when you switch between the versions? Versions cached in the IDE? Directories? (You may be surprised that Git leaves them around from previous versions)

It's all doable, but not very intuitive. Why bring complexity where there is enough of it already?

wvenable · on April 16, 2010

If you're say, changing an API, and then you've modified all the code that uses that API that's a large number of files with defines in them, right?

Your work flow is limited to the method you've chosen not the other way around. To say it works for you sort of misses the point. Version control can free you to work in ways you can't yet imagine.

> How does "bisect" know where the boundaries are?

Clever algorithms.

> What if you change the original code a bit, like re-indent it or make another trivial change? How can you look at both versions, preferably right in the editor? '

The file gets flagged in your editor as having a conflict. Inside the file, any code parts that cannot be merged are included the file (both versions) and you pick which one you want (or edit the changes together manually). It's actually very easy, very intuitive, and works with your editor. In most cases, you won't have conflicts.

> Versions cached in the IDE? Directories?

I've never had a problem with versions cached in the IDE -- probably because almost everyone uses version control it's not something that usually goes wrong. With IDE integration, it's even better. Directories are handled pretty sanely in Subversion, at least.

nearestneighbor · on April 17, 2010

> If you're say, changing an API,

You obviously take the snapshot before. No different from the more over-engineered approaches.

>> How does "bisect" know where the boundaries are?

> Clever algorithms.

Really?! You change two methods in a class, and "bisect" knows how to undo only one? No, you have to spoon-feed it, "staging" your changes (I did use GitX for a day). So it's not as simple as taking snapshots after all, is it?

Edit: typo, formatting

blasdel · on April 17, 2010

Yes, git asks that you make your commits actually make sense, instead of "oh I'm about to leave, better check in all the unrelated shit I did today"

Sounds like you'd fit in just fine with ClearCase users.

nearestneighbor · on April 18, 2010

Maybe you should have instead posted your retort when someone claimed that Git's commits are as simple as taking tarball snapshots?

FooBarWidget · on April 17, 2010

Git bisect is a binary search. You tell it what the last known good revision is, and then it'll perform a binary search for you between that revision and the latest revision. In each step of the binary search Git will ask you whether the current revision is good or bad, and in about O(log N) steps you'll know which exact revision introduced the regression.

> (You may be surprised that Git leaves them around from previous versions)

Git leaves directories around only if they contain files that are not checked in version control.

nearestneighbor · on April 18, 2010

You misunderstood the (already answered) question. I wasn't asking about binary search and its complexity.

suraj · on April 17, 2010

I have had wrong code delivered on more than one occasion because of workflow of using tarballs and/or backup folders.

If you are working on windows, there is no GUI better than TortoiseSVN. Give TortoiseSVN a try, it won't replace the #ifdef's; but it will certainly be better that tarballs for solo development.

RyanMcGreal · on April 17, 2010

>Git just isn't well-supported on Windows

Msysgit is stable and fully functional. You can even install it so that the git commands are available on the 'DOS' command line.

prog · on April 17, 2010

Irrespective of what Torvald says, I would probably choose svn over tarballs. But you should look also at bzr and hg. I don't think they integrate with Visual Studio but have good guis.

Locke1689 · on April 16, 2010

Git is not supposed to shine on OS X. Git was written with the Linux kernel and subsystem in mind. OS X is UNIX, but it is not Linux.

Try Mercurial. Git's piss-poor cross platform support has all but removed it from my development routine. Well, that and I find Mercurial's interface far more comfortable to work with.

pak · on April 16, 2010

How does git not shine on OS X? Installed via Fink or DarwinPorts, I can't find any area in which it doesn't perform equivalently to the linux builds. There is also the lovely GitX on OS X for visualizing repos and staging commits.

bsergean · on April 16, 2010

Yep, and compiling git is a piece of cake (on machines with a decent gcc).

pgbovine · on April 16, 2010

agreed, i totally hate going thru dependency hell, but on any OS X, if you install the developer tools from the installation DVD, that will give you gcc and the basic libs you need to compile git with no problems. ./configure; sudo make install

eplanit · on April 17, 2010

The longevity of CVS speaks for itself. It's not perfect, but it has served me and my projects perfectly fine for over 20 years. Anytime I've ventured over to work with some other VCS, it inevitably was a distraction and a waste of time.

Any new VCS goes through these same rants -- not able to distinguish themselves on merit, they simply attack the dominant player. With Git, one learns quickly -- oh, it's all about the Python 'community' and their politics, I see.

When it ain't broke, don't fix it!

prog · on April 17, 2010

I once moved a project from ClearCase to CVS because the performance of our ClearCase setup was _really_ eating into our time. I didn't have any major complaints from SVN as well.

Some reasons I moved to bzr are: - repository backups are free due to the distributed nature - python scripting - fast local operations

So, yes, I don't really have a big issue with svn (or cvs) but given a choice I would go with bzr (or any other DVCS).