Maybe this is too far outside the scope of the question, but particularly if you aren't going to be merging the large files, have you considered mirroring them with rsync (http://www.samba.org/rsync/) instead? I'm not sure tracking really large binary files is something a VCS handles well. Reading between the lines of your question: would periodically making dated snapshots of the binaries, and otherwise automatically mirroring around the newest version, suffice?
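For example, a minimal sketch of what I mean (all paths and hostnames here are hypothetical):

    # Mirror the current binaries to a file server
    rsync -a --delete /builds/binaries/ fileserver:/mirror/binaries/current/

    # Periodic dated snapshot; --link-dest hard-links files that are
    # unchanged relative to the current mirror, so snapshots stay cheap
    rsync -a --delete --link-dest=/mirror/binaries/current/ \
        /builds/binaries/ \
        fileserver:/mirror/binaries/snapshot-$(date +%Y%m%d)/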
Also, has anybody had good experience importing from p4 to hg on Windows? I've tried using tailor and some scripts from the mercurial wiki, but no success yet. One of these days I might write my own importer script (mostly because I need to import from five or six major branches, about 50k commits), but haven't had the time yet. (I'm working on Windows for similar reasons. Mercurial has been great for typical VC usage.)
Our import from p4 doesn't have to work on Windows. We're comfortable in Linux; we just can't develop in it.
You are correct, the binary files will not be merged. Some of them are large encrypted databases and can't be merged. The database is built based on source that may be merged, but the merge will happen in the source, and when the source merge is complete we'd rebuild the databases and check in the result. The largest number of binary files are compiled programs and very rarely change.
You're suggesting something a lot of other people in this topic have suggested. I would love to implement some kind of binary file management system; it just isn't going to happen. These binary files don't need to be merged, but they will be changing semi-frequently and are likely to be different between branches. We don't have the time and manpower to implement a system that would work for us. Either the VCS needs to handle these files or we can't use it.
I think I found a config setting deep in the dark heart of git that can make a repository friendly to large binary files ( http://www.gelato.unsw.edu.au/archives/git/0607/24058.html ). If it works, we could create a set of repositories for these binary files. I'll try it and see how it goes.
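If it pans out, I'd expect the setup to look something like the following. These are real git config options, but whether they're the exact ones that thread recommends is my assumption; the idea is to stop git from delta-compressing and zlib-compressing the big blobs:

    # In a repository dedicated to the large binaries (hypothetical):
    git config core.compression 0   # no zlib compression on objects
    git config pack.window 0        # don't search for deltas when packing
    git config pack.depth 0         # no delta chains
    git config gc.auto 0            # stop git from repacking on its own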
I'm really trying to find a way to use Hg, but if it can't handle our use-case we can't use it. That doesn't mean it's a bad VCS. I really like some of its features and the fact that it is far simpler than git. I also really like that it has file explorer integration. It just needs to handle projects with an obnoxiously large code base and large binary files.
Did you look at rsync? You don't really need to implement anything.
My point, though, was that in some sense you're trying to find a way to bend a VCS into doing something well outside the strengths of VCSs, so it is worth looking into categories of tools better suited to the problem.
Yes, I'm familiar with rsync. We've talked with Perforce Support a fair amount regarding large files, and they remind us that we are abusing their system. It's like using Harley Davidsons on a worksite to move around loads of dirt. It may work, but it's not the intended use.
We might be able to get that to work for some of the binary files relatively easily. The problem is that the majority of these files are build artifacts for our test programs, and the builds happen in the same directory as the source (yes, this is a stupid way to do things, but we need to do what customers do, and our customers do this). It would be very hard to distinguish between build sources and build artifacts. That leaves the issue of branches: each branch would have to 'know' which binaries to grab from the shared space, and update the other branches at integration time.
This could be a valid way of doing things, but it would take a fair amount of effort, because our environment is like the real world: dirty and complicated. Cleaning and simplifying it has been on the backlog for the past couple of years, but something more important always comes up.
How do you distinguish between the build source and artifacts now? It's not difficult to specify (via filename regexes) what should and shouldn't be examined by Mercurial as potential VC files, beyond whether or not you explicitly add them: look at the .hgignore file. (Not sure that's directly helpful, but for the archives.)
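For instance, a small .hgignore (the patterns are hypothetical, just to show both syntaxes):

    syntax: glob
    *.o
    *.exe

    syntax: regexp
    ^tests/.*\.(db|bin)$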
We don't distinguish between them. The test programs were built and all the resulting files were checked in. Unfortunately some of the build artifacts share extensions and directories with build source, and some of the build artifacts have no extension. The only way we could add the build artifacts to .hgignore (or .gitignore) would be to manually add them one at a time, and that would be a huge task.
You can also add * to the ignore file and explicitly specify what to track, either by hand or via "hg add [fname]" in some sort of script (sketched below).
I track/sync my home directory with mercurial, and did that to keep it from scanning most of my drive for updates. (You can probably do the same with git.)
(For the archives as much as you, though I hope it's useful.)
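A minimal sketch of that whitelist approach (the file names are hypothetical):

    # .hgignore: ignore everything by default
    syntax: glob
    *

Then explicitly add what you want tracked; files added by hand override the ignore rules:

    hg add src/main.c src/util.c
    # or add everything matching a pattern you trust:
    hg add 'glob:src/**.c'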
The Mercurial page on binary files (http://www.selenic.com/mercurial/wiki/index.cgi/BinaryFiles) doesn't say anything about especially large ones.