"Nobody really understands git" is the truest part of that. While hyperbolic, it really has a lot of truth.
It's always a bit frustrating when working with a team because everyone understands a different part of git and has slightly different ideas of how things should be done. I still routinely have to explain to others what a rebase is and others have to routinely explain to me what a blob really is.
In a team of even moderate size, teaching and learning git from each other is a regular task.
People say git is simple underneath, and that if you just learn its internal model, you can ignore its complex default UI. I disagree. Even the internal model springs surprises all the time, like blobs: I keep forgetting why they aren't just called files.
The day I got past what felt like the steep part of the learning curve, everything made so much sense. Everything became easy to do. I've never been confused or unsure of what was going on in git since.
What git needs is a chair lift up that hill. A way to easily get people there. But I have no idea what that would look like. Lots of people try, few do very well at it.
The whole point of abstractions is that you shouldn't need to understand the internals to use them. If the best defense of git is "once you understand the inner workings, it's so clear", then it is by definition a poor abstraction.
Who said it's supposed to be an abstraction? The point, theoretically, of something like Git is that the actual unvarnished model is clear enough that you don't need an abstraction. The problem IMO is that the commands are kind of random and don't map cleanly to the model.
There are a couple of projects that try to tackle this problem by providing an alternative CLI (on top of git's own plumbing), like gitless and g2. I haven't used any of them myself, but I'd be interested in others' experience.
Any interface means you'll build a mental model of the system you're manipulating. How else could you possibly know what you want to do and what commands to issue?
So, given that a mental model is inevitable, it seems reasonable that that model should be the actual model.
You don't need to understand how media is encoded to watch a movie or listen to a song. You don't need to understand the on disk format of a Word document to write a letter. When writing a row to an SQL database I don't always understand how that software is going to record that data, but I do know I can use that SQL abstraction to get it back out.
> You don't need to understand how media is encoded to watch a movie or listen to a song.
I recall the time when mp3 was too demanding for many CPUs, so you had to convert to uncompressed formats. Today you do need to know that downloading uncompressed audio will cost you a lot of network traffic. Once performance is a concern, all abstractions have to be discarded.
Exactly. If you stick to the very basics with git, you can live a happy life never caring about the internals. If, however, you want to dig into the depths of Git and use all its power, I don't get why people don't expect an obvious learning curve.
Same exact thing above applies to so many things in software development, from IDEs, to code editors (Vim/Emacs/Sublime/etc), to programming languages, to deploy tools, the list goes on. There’s a reason software development is classified as skilled labor and not a low end job generally. You’re expected to have knowledge of, or be willing to learn a lot, to do your job.
The difference is that the video model abstracts over the encoding, the git model does not abstract over the storage model, it exposes it. git commands are operations on a versioned blob store.
> I think the longevity of SQL has proved there's value in non-leaky abstracted interfaces.
How is SQL non-leaky? To be proficient with SQL you have to understand how results are stored on disk, how indexes work, how joins work, etc. To debug and improve them you need to look at the query plan, which is the database exposing its inner workings to you.
You have to know about the abstractions an SQL server sits on as well. Why is it faster if it's on an SSD instead of an HDD? Why does the data disappear if it's an in-memory DB?
> To be proficient with sql you have to understand how results are stored on disk, how indexes work, how joins work, etc
No, you don’t. As far as I know, the data is stored in discrete little boxes and indexes are a separate stack of sorted little boxes connected to the main boxes by spaghetti. This is the abstraction, it works, and I don’t need to know about btrees, blocksizes, how locks are implemented, or anything else to grok a database.
You've never had to look at a query plan that explains what the database is doing internally? If not then I wouldn't consider you proficient, or you've only ever worked with tiny data sets.
Have you created an index? Was it clustered or non-clustered? That's not a black box, that's you giving implementation details to the database.
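To make that concrete, here's a minimal sketch using SQLite's CLI for illustration (the database file, table, and index names are all made up):

```
# Ask SQLite how it will execute a query against a hypothetical "users" table
sqlite3 app.db "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'x@example.com';"
# Without an index, the plan reads: SCAN users   (a full table scan)

sqlite3 app.db "CREATE INDEX idx_users_email ON users(email);"
sqlite3 app.db "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'x@example.com';"
# Now: SEARCH users USING INDEX idx_users_email (email=?)
```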
I don’t think being a professional DBA managing an enterprise Oracle installation is isomorphic to the general populace that might use git.
There’s no question that knowing more will get you more, but I think for the question of “when will things go sideways and I need to understand internals to save myself”, one would be able to use a relational database with success longer than git, getting by on abstractions alone. Running a high-performance installation of either is really outside the scope of the original point.
Those things don't generally influence how you structure the query, though - you can choose to structure your query to fit the underlying structure better, or you can modify the underlying structure to better fit your data and the manipulations you are trying to perform.
Yes, most of us will have to do both at some point, but they can be thought of as discrete skills.
This isn't a bad analogy, though. Git itself is similar - once you understand the graph-like nature of commits (which isn't all that complicated to begin with), it's generally not hard to skim through a repository and understand its history. Diffing etc. is also simple enough this way.
If, on the other hand, you are working to create said history (and devise/use an advanced workflow for that), it's very helpful if you understand the underlying concepts. Which also goes for designing database layouts - someone who doesn't understand the basics of the optimizer will inevitably run into performance problems, just as someone who doesn't understand Git's inner workings will inevitably bork the repository.
You don't need to know more than SQL to manipulate the data. The semantics of your query are fully contained in the SQL.
You may need to go deeper and understand the underlying model if you want performance, but sticking to normal form can make that unnecessary for a lot of people a lot of the time.
You can have a useful separation of work between a developer understanding/using sql and a DBA doing the DDL part and the optimization when needed.
> Relational databases abstract away the physical nature of the disk, just as file systems do; but instead of storing informal arrays of bytes, relational databases provide access to sets of fixed sized records.
This isn't true. SQLite does not use fixed size records.
This suggests to me that a lot of people who consider themselves proficient with SQL don't know how the results are stored on disk, nor the difference between the SQL model and the actual implementation details, making them not proficient under your definition.
> That is, I am not proficient with relational databases, and I can handwave why an SDD is faster, and why data may disappear from an in-memory DB.
Because you know that information for other reasons as most people would. Just because the information is gained for other reasons does not make it irrelevant when using a database though.
> This isn't true. SQLite does not use fixed size records.
It's actually true of most/all modern databases these days. The point isn't knowing the exact structure the database uses to store its information (even though that can be useful) but knowing how efficiently it can find the information for any given request. Knowing when a database is doing an index lookup versus a full table scan is very important, and I wouldn't consider someone who can't make a reasonable guess to be proficient in SQL. Many of these details are even exposed in the SQL: when you create an index and decide whether it's clustered or non-clustered, you're giving the database specific directions about how the data will be physically stored.
The fact that you need to know anything about how they do their work internally to be reasonably competent at using them makes them a leaky abstraction.
SQL leaks for complex queries and schemas if performance needs to be optimized. I argue virtually all abstractions leak heavily when performance is considered, some more than others. SQL leaks relatively little in comparison to some other technologies IME.
Also, SQL has well-established processes and formalisms to design schemas which generally result in solid performance by themselves. That's what RDBMS are around for, after all: enabling efficient and consistent record-oriented data manipulation. This is quite difficult to do correctly in reality; for example, if you write your own transaction mechanism for disk/solid-state storage, you are going to do it wrong. This is genuinely difficult stuff.
There is a ton of internals that SQL abstracts so well that very few DB programmers know or (have to) care about them. Things like commit and rollback protocols, checkpointing, on-disk layouts, I/O scheduling, page allocation strategies, caching etc.
You seem to be talking about a different kind of leakiness. In my mind, there are two kinds: conceptual and performance leakiness. You are talking about the latter. Pretty much any non-trivial system on modern hardware leaks performance details. From what I understand, git's UI tries to provide a different model than the actual implementation, but still leaks a lot of details of the implementation model.
I disagree with that. The point of an abstraction is not having to know the implementation. Understanding the principles behind it will always lead to much better use of the abstraction.
I'd also say an abstraction could be carrying its weight even if it only reduces the amount you have to think about the implementation details when using it.
To be fair, most ORMs poorly implement the "leaky" principle. When implemented well, like with SQLAlchemy, the end result is a much nicer ORM.
In fact, one of the things in common among the ORMs that have left a bad taste in my mouth is that they all tried to abstract away SQL without leaking enough of it.
Picking the ideal interface to abstract is critically important (and very hard).
In the case of ORMs, available solutions abstract the schema (tables, rows, fields), the objects, or use templates. My solution abstracted JDBC/ODBC. The only leak in my abstraction was missing metadata, which I was able to plug (with much effort!).
My notions for interfaces, modularity, abstractions are mostly informed by the book "Design Rules: The Power of Modularity". http://a.co/hXOGJq1
This might sound a little out of touch, but am I the only one who doesn't think git is that hard? It is a collection of named pointers and a directed acyclic graph. The internals aren't really important once you have that concept down.
That said, I do feel some "porcelain" git commands are poorly named and operate inconsistently -- compared to the plumbing of the acyclic graph concepts which is good but limited.
I mean, one of these looks just a little more straightforward than the other, doesn't it?
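Presumably the comparison was something like this (reconstructed; abc1234 is a placeholder hash):

```
# Fossil: ask directly for the descendants of a check-in
fossil descendants abc1234

# Git: the usual Stack Overflow workaround
git rev-list --all --children | grep "^abc1234"
```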
Also, a cursory test in a local git repo just now showed that command seems to print out only immediate descendants--i.e., unless that commit is the start of a branch, it's only going to tell you the single commit that comes immediately after it, not the timeline of activity that fossil will--and all it gives you is the hash of those commit(s), with no other information.
I use git myself, not fossil, but if this is something you really want in your workflow, fossil is a pretty clear win.
I don't know why they need to retrieve the hash of the descendant commit, but usually what I'm doing is: I use a decent visual tool and just follow the branch (Sourcetree).
`git log` stays in the current branch unless you give it the `--all` option. But when you give it the `--all` option, the limitation by `<COMMIT>..` no longer works. So not a solution.
Getting only the changed filenames is a fairly specialised operation. Often in normal use you can get away with a more generic operation that comes close to what you need, but is way more common, e.g.
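Something like the following, presumably (`<commit>` is a placeholder):

```
# Specialised: print exactly the filenames changed by a commit
git diff-tree --no-commit-id --name-only -r <commit>

# Generic and far more common: close enough for most interactive use
git show --stat <commit>
```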
If you're new to tech or you've got a different mental model of how version control works, getting across the gap to git is a challenge.
My current team are mostly controls engineers, working on PLCs. But the software we're now working with has its configurations tracked in git. These aren't dumb people, they're quite talented, but their education wasn't in CS, and "directed acyclic graph" is not a thing they have a mental model for.
No, you're definitely not the only one. Git is one of the simplest and dumbest tools developers have at our disposal. People's inability to conceptualize a pretty straightforward graph is something no amount of shiny UI can ever fix.
Sure, and a piece table is a simple way to represent a file's contents. But if anyone wrote a shell or a text editor that required you to directly interact with the piece table to edit a file—instead of something sane—then they'd rightfully be called out on it. It wouldn't matter how much you argued about how simple the piece table is to understand, and it wouldn't matter how right you were about how simple the piece table is to understand. It's the wrong level of abstraction to expose in the UI.
The only thing Git can really fix is changing its command flags to be consistent across aliases/internal commands. That's about it. The whole point of an SCM is that graph that you want to move away from. People have asserted your claim many times but can't ever give specific things to fix about the "abstraction."
There are about five or six fundamental operations you do in git/hg. If that's too much then, again, there's not an abstraction that is going to help you out.
See, you're trying to foist a position on me that isn't mine—that I'm scared of the essential necessities of source control. And you act as if source control were invented with Git. Neither of these are true.
> git/hg
Mercurial was a great solution to the same problem that Git set out to tackle, virtually free of Git's foibles. The tradeoff was a few minor foibles of its own, but a much better tool. It's a fucking shame that Git managed to suck all the air out of the room, and we're left with a far, far worse industry standard.
>Mercurial was a great solution to the same problem that Git set out to tackle, virtually free of Git's foibles.
No, Mercurial's design is fundamentally inferior to Git, and practically the entire history of Mercurial development is trying to catch up to what Git did right from the start. For example having ridiculous "permanent" branches -> somebody makes "bookmarks" plugin to imitate Git's lightweight branches -> now there are two ways to branch, which is confusing. No way to stash -> somebody writes a shelve plugin -> need to enable plugin for this basic functionality instead of being proper part of VCS. Editing local history is hard -> Mercurial Queues plugin -> it's still hard -> now I think they have something like "phases". In Git all of this was easy from the start.
Another simple thing. How to get the commit id of the current revision. Let's search stack overflow:
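Presumably the top answer was the classic:

```
# The presumed Stack Overflow answer: print the working directory's revision
hg id -i
```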
The problem is, this answer is wrong! This simple command can execute for hours on a large enough repository, and requires write privileges to the repository! Moreover, it returns only a part of the hash. There's literally no option to display the full hash.
The "correct" answer is `hg parent --template '{node}'`. Except `hg parent` is apparently deprecated, so the actual correct way is some `hg log` invocation with a lot of arguments.
I would not call "hg log -r tip" a lot of arguments.
Also, on the git/hg debate, I feel I've had problems (like stashing your modifications and redownloading everything) more often with git than hg. Perhaps that tells something about my capability to understand a directed acyclic graph, but hg seems less brittle when I'm using it.
I disagree with some of your comments. Is git stash really essential, or unneeded complexity? That's debatable; I never use it personally.
What I don't like in git is the loss of history associated with squashing commits. I would prefer having a 'summary' that keeps the full history but by default would be used like a single commit.
In git you can use merge commits as your "summary" and `--first-parent` or other DAG depth flags to `git log` (et al) to see only summaries first. From the command line you can easily add that to key aliases and not worry about it. I think that if GitHub had a better way to surface that in their UI (i.e., default to `--first-parent` and have accordions or something to dive deeper), there would be a lot less squashing in git life. (Certainly, I don't believe in branch squashing.)
The DAG is already powerful enough to handle both the complicated details and the top-level summaries, it's just dumb that the UIs don't default to smarter displays.
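As a minimal sketch of that (branch and alias names are illustrative):

```
# Summary view: one entry per merge into master, details hidden
git log --first-parent --oneline master

# Make it a habit with an alias
git config --global alias.summary "log --first-parent --oneline"

# Dive into the detail behind one particular merge commit
git log <merge-commit>^1..<merge-commit>^2
```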
(I find git stash essential given that `git add --interactive` is a painful UX compared to darcs and git doesn't have anything near darcs' smarts for merges when pulling/merging branches. Obviously, your mileage will vary.)
>you're trying to foist a position on me that isn't mine
I just said you can't give specifics on what to change, because there isn't much to change.
>And you act as if source control were invented with Git
No I'm not?
>and we're left with a far, far worse industry standard.
Yeah, we definitely should have gone with the system that can't do partial checkouts correctly or even roll things back. Branch name conflicts across remote repositories, and bookmark fun! Git won for a reason: because it's good and sane at what it does.
No, the reason is mercurial sucked at performance with many commits at the time, and was extra slow when merging.
Lacked a few dubious features such as merging multiple branches at the same time too.
It has improved but git is still noticeably more efficient with large repositories.
(Almost straight comparison is any operation on Firefox repository vs its git port.)
Git's main target is Linux. Obviously. Performance on the truly secondary platform (Windows) was not a priority, and the slowness there is mostly caused by the slow lstat call.
Mercurial instead uses an additional cache file, which is slower on Linux with big repos but happens to be faster on Windows.
And the octopus merge is used by kernel maintainers sometimes, if not quite a lot. That feature is impossible to add to Mercurial, as it does not allow more than two commit parents.
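For anyone who hasn't seen one, an octopus merge is just a merge commit with more than two parents (branch names here are illustrative):

```
# Merge three topic branches in one go; the resulting commit has four parents
git checkout master
git merge topic-a topic-b topic-c

# Show the parent hashes of the octopus commit
git log -1 --format=%P
```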
Which reinforces the position that git should have stayed a Linux kernel specific DVCS, as the Bitkeeper replacement it is, instead of forcing its use cases on the rest of us.
...as I get stares (okay, mostly of fear) if I point out that we need a branch in my workplace. What you can/can't do (sanely) with your tool shapes how you think about its problem space.
To emphasize that even more: Try to explain the concept of an ML-style sum type (i.e. a discriminated union in F#) to someone who only knows languages with C++-based type systems. You'll have a hard time to even explain why this is a good idea, because they will try to map it to the features they know (i.e. enums and/or inheritance hierarchies), and fail to get the upsides.
> Many people complain that Git is hard to use. We think the problem lies deeper than the user interface, in the concepts underlying Git. Gitless is an experiment to see what happens if you put a simple veneer on an app that changes the underlying concepts
> The whole point of an SCM is that graph that you want to move away from.
I think that's an exaggeration. For example, Darcs and Pijul aren't based around a "graph of commits" like Git is, they use sets of inter-dependent patches instead. I'm sure there are other useful ways to model DVCS too.
Whilst this is mostly irrelevant for Git users, you mentioned Mercurial so I thought I'd chime in :)
> The only thing Git can really fix is changing it's command flags to be consistent across aliases/internal commands.
I mostly agree with this: Git is widespread enough that it should mostly be kept stable; anything too drastic should be done in a separate project, either an "overlay", or a separate (possibly Git-compatible) DVCS.
>For example, Darcs and Pijul aren't based around a "graph of commits" like Git is, they use sets of inter-dependent patches instead.
I said graph, I didn't say which graph. Both systems still use graphs. And still a graph you have to understand how to edit with each tool. The abstraction is still the same, and if you have problems with Git, you're going to have problems with either of those tools as well. The abstraction is not the problem, it's the developers inability to conceptualize the model in their head.
You said "that graph" which, in context, I took to mean the git graph.
> Both systems still use graphs
True
> The abstraction is still the same
Not at all, since those graphs mean different things. Each makes some things easier and some things harder. For example, time is easy in git ("what did this look like last week?"). Changes are easy in Darcs ("does this conflict with that?"). Both tools allow the same sorts of things, but some are more natural than others. I think it's easy enough to use either as long as we think in its terms; learning to think in those terms may be hard. For git in particular, I think the CLI terminology doesn't help with that (e.g. "checkout").
> if you have problems with Git, you're going to have problems with either of those tools as well
Not necessarily. As a simple example, some git operations "replay" a sequence of commits (e.g. cherrypicking). I've often had sequences which introduce something then later remove it (bugs, workarounds, stubs, etc.). If there's a merge conflict during the "replay", I'll have to spend time manually reintroducing those useless changes, just so I can resume the "replay" which will remove them again.
From what I understand, in Darcs such changes would "cancel out" and not appear in the diff that we end up applying.
> Where is the exaggeration?
The idea that "uses a graph" implies "equally hard to use". The underlying datastructure != the abstraction; the semantics is much more important.
For example, the forward/back buttons of a browser can be implemented as a linked list; blockchains are also linked lists, but that doesn't mean that they're both the same abstraction, or that understanding each takes the same level of knowledge/experience/etc.
>The idea that "uses a graph" implies "equally hard to use".
What I'm getting at is that if you don't understand what the graph entails, and what you need to do to the graph, any system is going to be "hard to use." This idea that things should immediately make sense without understanding what you need to do, or even what you're asking the system to do, is just silly.
I've never seen someone who understands git, darcs, mercurial, pijul, etc go "I totally understand how this data is being stored but it's just so hard to use!" I don't think that can be the case, because any of the graphs those applications choose to use have some shared cross section of operations:
* add
* remove
* merge
* reorder
* push
* pull
I see people confused about the above, because they don't understand what they're really asking the system to do. I don't think any abstraction is ever going to solve that.
Git does have a problem with its command line (or at least how inconsistent and ambiguous it can sometimes be), but you really should get past it after a week or two of using it. The rest is on you. If you know what you want/need to do, getting past the CLI isn't hard. People struggle with the former and so they think the latter is what's stopping them.
Can you tell the other guy to not post false and disingenuous statements? Because I'm pretty sure that is what degrades discussions, not any tone I choose to exhibit. I highly encourage you to read the thread thoroughly. If I switched my position on git we wouldn't be having this discussion, as evidenced elsewhere in the thread where people are taking a notably blunter tone than I am just with the side with popular support on this forum.
I posted a bald statement. He replied directly with snide remarks and fallacies. Look at the timestamps and edits. I have every right to be annoyed and make it known that I am annoyed in my posts when the community refused to consistently adhere to guidelines.
Enforce guidelines that keep discussions rational, not because people don't want to be accosted in public for their misleading, emotionally bloated statements.
>Don't say things you wouldn't say face-to-face. Don't be snarky.
"Every day humans make me again realize that I love my dogs, and respect my dogs, more than humans. There are exceptions but they are few and far between." [2]
>Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize.
"Which sort of doesn't matter since everyone thinks GitHub is source management." [1]
>Please don't post shallow dismissals
"You all lost out on "the most sane and powerful" as a result." [1]
"Calling it a sane and powerful source control tool is just not supported by the facts, calling "the most ..." is laughable." [1]
"Calling Git sane just makes it clear that you haven't used a sane source management system." [1]
"Lots of people are too busy/whatever to know what they are missing, maybe that's you. It's not me" [3]
>When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."
"Arguing with some random dude who thinks he knows more than me is not really fun." [3]
Some of the things you quoted there are admittedly borderline, but you went much further across the nastiness line. Could you please just not do that? It isn't necessary, and it weakens whatever substantive points you have.
>borderline, but you went much further across the nastiness line.
I didn't insinuate that people are worth less than pets that I bought and own (who can't even choose who to be dependent on) because they don't agree with my perspectives over a piece of software. In what context would this be an acceptable statement to make face to face or in a public setting and you go "well, you know, it's kind of okay to say!"
I'm exceedingly interested in where I crossed that line in a considerable manner because that's one distant line to cross. Next time someone says something I perceive to be incorrect, or they get on my nerves for continually disagreeing with me, I'll be sure to tell them my dog is worth more than them since that's actively being allowed and has a precedent of moderator support.
And for the record, my tone is probably "abrasive" in this post because the above actions and outright blind eye towards outright lies and uncalled for statements is aggravating. I have a feeling you're not doing anything just because of who he is, and not because what he is saying is warranted or even accurate (it's definitely not, as I demonstrated across several different posts).
Exactly. It was no longer mysterious for me after I had to prepare a written branching procedure for our team, starting from how to branch off, commit, and rebase, to doing resets and working with the reflog. While doing that I thoroughly read the official docs, examined lots of examples, and created a local repo with a couple of text files to test various commands. And then it became so clear and simple! Especially the reflog – so powerful!
So, my advice is to try to write some instructions for yourself for all the common cases you might run into during your work. It will not only help you realise what you actually need from git, but also will serve as a good cheat-sheet.
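A minimal sketch of what such a cheat-sheet might contain (branch names are illustrative):

```
# Branch off and commit as often as you like
git checkout -b my-feature origin/master
git commit -am "WIP"

# Stay current with master
git fetch origin
git rebase origin/master

# Botched the rebase? The reflog remembers every place the branch has pointed
git reflog my-feature            # find where the branch was before the rebase
git reset --hard my-feature@{1}  # move it back one step (while on my-feature)
```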
I start with simple examples and work up from there. It's based on training I've conducted at various companies, and avoids talk of Merkle trees or DAG.
I am not a git expert or anything, but I have helped resolve weird git issues for my teammates usually using a lot of Google and StackOverflow.
I just know 5 basic commands: pull, push, commit, branch, and merge. Never ran into any issues. People who run into issues are usually editing git history or doing something fancy with "advanced" commands. I have a feeling these people get into trouble with git because they issue commands without really knowing what those commands do, or even what they want to achieve.
I use submodules every day, never had a problem with them. What do people complain about when it comes to them?
My mental model is basically that they're separate repos, and the main repo has a pointer to a commit in the submodule. Do your work that needs to be done for the submodule, push your changes, and then check out that new commit. Make a commit in the main repo to officially bump the submodule to that new commit. Done.
The annoying part is when you do a pull on the main repo, you have to remember to run git submodule update --recursive.
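In other words, something like this (paths and branch names are illustrative):

```
# Inside the submodule: do the work and publish it
cd libs/mylib
git checkout master
# ...edit, commit...
git push origin master

# Back in the superproject: record the submodule's new commit
cd ../..
git add libs/mylib
git commit -m "Bump mylib"

# And after pulling the superproject elsewhere, the easy-to-forget step:
git submodule update --init --recursive
```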
Because you have the .gitmodules file, the .git/config file, the index, and .git/modules directory, each of which can get out of sync with the others.
If, for example, you add a submodule with the wrong url, then want to change the url, then you instinctively change .gitmodules. But that won't work, and it won't even nearly work.
If you add a submodule, then remove it, but not from all of those places, and try to add the submodule again (say, to a different path), then you also get weird errors.
If you add a submodule and want to move it to another directory then just no.
Oh, and also one time a colleague ran into problems because he had added the repo to the index directly - with `git add .`
Oh and let's talk about tracking submodule branches and how you can mess that up by entering the submodule directories and running commands...
But seriously, the fact that there is a .gitmodules file lulls you into a sense that that file is "the configuration file". If you don't know about the other files, then it's natural to edit .gitmodules. When you make errors, fixing those errors is pretty hard. There is no "git submodule remove x" or "git submodule set-url" or "git submodule mv".
For example, do you know, off the top of your head, how to get an existing submodule to track a branch?
How do you think someone who does not quite understand git would do it? Even with a pretty OK understanding of git internals, you can put yourself deep in the gutter. (Case in point: if you enter the submodule directory and push HEAD to a new commit, you can just "git add submodule-directory" to point the submodule to the new commit. But if you were to change the upstream url or branch or something else in the submodule, you're screwed. That's not intuitive by a long shot.)
Edit: git submodule sync is not enough by the way... You can fuck up your repo like crazy even if you sync the two configuration files.
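For the record, one answer to the "track a branch" question above, assuming a reasonably modern git and that the submodule's name matches its path:

```
# Record the branch to track in .gitmodules
git config -f .gitmodules submodule.libs/mylib.branch master

# Move the submodule to the tip of that branch and commit the result
git submodule update --remote libs/mylib
git add .gitmodules libs/mylib
git commit -m "Track mylib's master branch"
```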
Right, it’s not that hard, but there are some gotchas. The most common problem I see is the local submodule being out of sync with the remote superproject. Pushes across submodules are not atomic. Accidentally working from a detached head then trying to switch to a long out of date branch can be an issue, as can keeping multiple submodules synced to the head on the same branch. Recursive submodules are, as you mentioned, even more fun.
What's the alternative? Managing all dependencies by an external dependency manager does not exactly reduce complexity (if you're not within a closed ecosystem like Java + Maven that has a mature, de-facto standard dependency manager; npm might count, too).
It's absolutely not feasible for C++ projects; all projects that do this have horrible hacks upon hacks to fetch and mangle data and usually require gratuitous "make clean"s to untangle.
I use git sub-trees. Actually I love the thing. They give you a 'linear' history, and allow you to merge/pull/push into their original tree, keeping the history (if you require it).
I never had any problems in the past 6 years I've been using Git professionally. But then someone who didn't realize they had unstaged changes asked me what to do when Git prevented them from changing branches. I told them to stash or commit. They stashed, and the changes were gone.
My point is, while your basic commands do the work, your habits and knowledge keep you from losing code like this without you knowing.
I do like Git, most of the time, but really, not a single problem, in six years?
When using Git daily we never really did anything complicated, just a few feature branches per developer, commit, push, pull-request, merge. Basic stuff. We had Git crap out all the time. Never something that couldn't be fixed, but sometimes the fix was: copy your changes somewhere else, nuke your local repo, clone, copy changes in, and then commit and continue as normal.
I’ve been using git since 2007 and never ever even wanted to try nuking a checkout and starting over to recover from anything, much less did so. (Did have a nameless terrible Java ide plug-in do it for me once.)
I think it's one of the most important sources of my cognitive dissonance around git. It strengthens the illusion that a working directory is somehow related to a git store, which it really isn't.
You have a working directory/checkout - that can be: identical (apart from ignored files) to some version in git; or different.
If it's different, some or all changes can be marked for storing in the git repo - most commonly as a new commit.
It's a bit unfortunate that the repo typically lives inside your working directory/checkout - under '.git', along with some files like hooks that are not in the repo at all...
I use `git config pull.rebase true` too, but that doesn't mean you _have_ to stash first, just as rebase manually wouldn't - depends if there's a conflict.
It's quite a considerable saving. I suppose by "fix UX" you mean make it so the saving would be less anyway, but I think really they're just conceptually different:
- branch: pointer to a line of history, i.e. a commit and inherently its ancestors
- stash: a single commit-like dump of patches
If stashing disappeared from git tomorrow, I think I'd use orphan commits rather than branches to replace it.
`cherry-pick` is just plucking a single commit and adding it the commit history of the current branch, and `rebase` is what civilized people use when they don't want merge commits plaguing their entire code base.
merge is what civilized people who care about getting history and context in their repository use ;) ... I worked a lot in git using both rebase and merge workflows and I'll be darned if I understand the fear of the merge commit ... If work happened in parallel, which it often does, we have a way of capturing that so we can see things in a logical order ...
Polluting the master repo with a bunch of irrelevant commits isn't giving you context, it's giving you pollution. There's nothing to fear about merge commits. It's about wasting everyone's time by adding your 9 commits to fix a single bug to the history. I work on teams, and we care about tasks. The fact that your task took you 9 commits is irrelevant to me. What is relevant is the commit that shows you completed the task.
It's not really a fear of the merge commit. In a massively collaborative project, almost everything is happening in parallel, and most of that history is not important. The merge makes sense when there is an "official" branch in the project, with a separate effort spent on it. It's likely that people working on that branch rebase within the branch when collaborating, and then merge the branch as a whole when it is ready to join the mainstream.
Ah, you can learn the beauty of merge AND rebase at the same time then...
Here, to 'present' feature branches, we take a feature development branch with all the associated crud... Once it's ready to merge, the dev checks out a new 'please merge me' branch, resets (or rebase -i --autosquash) to the original head, and re-lays all the changes as a set of 'public' commits to the subsystems, with proper headings, documentation etc.
At the end, he has the exact same code as the dirty branch, but clean...
So he merges --no-ff the dirty branch in (no conflicts, same code!) and then the maintainer can merge --no-ff that nice, clean branch in the trunk/master.
What it gives us is a real, true history of the development (the dirty branch is kept) -- and a nice clean set of commits that is easy to review/push (the clean branch).
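A rough sketch of that flow (the branch names are illustrative):

```
# Start a "please merge me" branch from the dirty feature branch
git checkout -b feature-clean feature-dirty

# Rewind to where the feature began; all the changes stay in the working tree
git reset $(git merge-base master feature-dirty)
# ...now stage and commit the changes in clean, logical chunks (git add -p)...

# Same code on both branches, so this merge cannot conflict
git merge --no-ff feature-dirty

# The maintainer then merges the clean branch into master
git checkout master
git merge --no-ff feature-clean
```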
Sometimes I want to take a subset of the commits out of a coworker's merge on staging to push to production, and then put all non-pushed commits on top of the production branch to form a new staging branch. I find having a linear history with no merges helpful for reasoning about conflict resolution during this process. What advantages do merged timelines give in this context?
What I like about merges is that they show you how the conflicts were resolved. You can see the two versions and the resolution, and you can validate it was resolved properly. With a rebase workflow you see the resolutions as if nothing else existed; you can't tell the difference between an intentional change and a bad resolution...
Yes, my direct team is small of 4 devs but the main repo we work on is used by 100+ devs. We use git workflow (new branch for each feature) for the main repo and github style workflow (clone and then submit PR) for some other repos.
The number 1 reason my team has not moved from Subversion to Git is we can't decide what branching model to use. Use flow, don't use flow, use this model, use that model, no, only a moron would use that model, use this one instead. Rebase, don't rebase, etc. No doubt people will say that it all depends on the project/team/environment/etc., but nobody ever says "If your project/team/environment/etc. look like this, then use this model." So we keep on using Subversion and figure that someday we will run across information that convinces us that it is the one true branching model.
I have another solution: just switch to mercurial. I switched some big projects to mercurial from svn many years ago. Migration was painless, tooling was similar but better, the interface is simpler than git, and haven't regretted it once.
This is the path I took for a few projects years ago when Google Code didn’t support git.
Switched to mercurial from svn and workflow was painless for the team. Interestingly, we slowly started adopting more distributed techniques like developer merges being common. With svn, I think I was the only one who could merge and it would be rare and added product risk.
Then after about a year of mercurial we switched to git and our brains had adapted. Our team was small, 5-10 people.
Somewhat relatedly, in 2002 I worked on a large team of 75 people or so, with a large codebase of a few hundred thousand lines under active development. It used Rational ClearCase and had "big merges" that happened once or twice a release, with thousands of files requiring reconciliation. There was a team who did this, so it was annoying to dev in, but largely I didn't care.
Company went through layoffs and the team was down to one. He quit, the company couldn’t merge, so couldn’t release new software versions.
There was a big crisis so they went to the architects and pulled a few out of dev work. It turns out I was the one who could figure it out and dumb enough to admit it.
That sucked. It took a few weeks to sort out and modify our dev process to make merges easy and common. But it was not fun. The upside is we ended up not having any "non-programmer" ops/configuration-management people, since the laid-off/quit team were ClearCase users who didn't code.
Moral: don't let people know you can do hard, mundane tasks.
> but nobody ever says "If your project/team/environment/etc. look like this, then use this model."
Honestly, it's because a lot of it comes down to preference and what value you gain from using version control. It is very much like code style standards -- it doesn't matter what is in the standard so much as that your teammates are all using the same one.
If part of the blocker for your team is that no one is experienced enough with git to have a strong opinion, I'd be happy to brainstorm with you for an hour to learn about your current process and offer a tailored opinion.
Why not replicate whatever you are doing in Subversion in Git? You'll still be able to take advantage of the better merging algorithms, while maintaining whatever political momentum seems to be driving the team's decisions.
If it is important to switch to Git, I suggest a technical leader, imbued with authority from management, make those decisions and just do it. However, I don't necessarily think a team should switch away from Subversion if it's working for them.
> everyone understands a different part of git and has slightly different ideas of how things should be done
This was a big problem that bugged me too, so for every team I've worked with I've created a few scripts for the team's most common version control operations.
Most devs, including me, are pretty lazy, so they'd all rather run this script than go to Stack Overflow to figure out git arcana.
This helps standardize conventions too: Feature branches/linear DAGs/topic branches/dev branches/prod branches/whatever weird thing a team does they all just do that using the script so it's standardized.
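A minimal sketch of what one of those scripts might look like (the naming convention, remote, and base branch here are just assumptions):

```
#!/bin/sh
# team-feature: start a feature branch the standard way for this team
set -e
name="$1"
git fetch origin
git checkout -b "feature/$name" origin/master
git push -u origin "feature/$name"
```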
Rebase is "rewind local changes" "pull" "replay local chances"
Basically it makes it so that all of the local-only commits are sequenced after any remote changes that you have not seen yet.
[edit]
YZF is correct. In the context of pulling (i.e. "git pull --rebase") my description is correct. However, in general, rebasing branch Y onto branch X, where the two diverged at commit C, is:
rewind branch Y to commit C (call the old tip of Y Y'); then replay each commit from C..Y' on top of X, and point Y at the result.
"pull" might be the first thing I'd throw out, if thought there was any hope of fixing git ux. Then add a working merge --dry-run #do i have conflicts?.
I think a default of --ff-only would be fine for pull. This is great for when I'm merely a consumer of a project, and will never silently perform a merge or rebase.
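That behaviour is already available as an opt-in config, for what it's worth:

```
# Refuse pulls that can't fast-forward, instead of silently merging/rebasing
git config --global pull.ff only
```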
rebase can do a lot more. Try `git rebase -i` to squash smaller commits, edit the commit msg, or even drop a commit before you push it to your colleagues.
Last time our devops guy did 20 commits to get something on Elastic Beanstalk right, I squashed it all into just one clean commit that got merged into the master branch.
It will help you commit more often, without worry, until the moment you have to hand in your work.
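The squash itself is just (the commit count is illustrative):

```
# Rewrite the last 20 commits interactively
git rebase -i HEAD~20
# In the editor: keep "pick" on the first commit and change the rest to
# "squash" (keep their messages) or "fixup" (throw the messages away)
```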
Rebase is a controversial, history-altering operation and makes it easy to paint yourself into a corner and get weird error messages or wrong results. It's very different from pull/merge.
History altering is only controversial on things that are published. There is nothing wrong with reordering, combining or splitting your local commits to give more clarity to what you are doing. Keeping this in mind will give you the freedom to commit frequently.
This confusion happens because many popular SCMs historically have the "commit" and "push" operations in a single step. Git keeps them separate.
There is no tracking by git on what is published, so it's easy to make the mistake of rebasing things that are published and shared by others. Then you will have a bad time later when you try to sync with others, possibly days later.
Um... git kind of does with remote tracking branches. You can also make it very obvious by your workflow? If you use local feature branches (which you should for juggling between development tasks, etc.), what you are working on vs what's upstreamed should be pretty clear. Sounds like you are not using local branches.
Not using local branches is another confusion caused by the perspective of historical/traditional SCMs (people thinking branches are the domain of a centralized server and outside of their control).
Often you want to push changes to a remote, but not yet merge or PR them to upstream.
Keeping "local feature branches" just on your dev machine is bad for many many reasons:
- you want to encourage low barrier cooperation in your team -> sharing changes
- you want changes to the CI pipeline early so the potentially slow testing machinery works in parallel with the developer
- you want to keep the team up to date on what changes you make
- you don't want to lose work if the machine/OS dies, or the developer leaves/becomes sick/goes on a 4 week vacation during which they forget their disk crypto password
So, in practice you can try to use rebase opportunistically, when by chance your WIP work is still unpushed because the change was only made very recently. This is error prone. Or you can rebase published branches explicitly, destroying the original branches in the PR merge phase. But all this is a big bother if the purpose is just to beautify history and at the same time hide the real trial and error that went into making the changes.
Did you notice that y2kenny was talking about how, if you use local feature branches, then the remote tracking branches make it really clear what's been published vs not? The implicit meaning is that we should use local feature branches but also publish them to the repo while we're working on them.
But maybe to you, 'publish' means 'publish to master'? In that case I can assure you, they are not necessarily the same thing. I regularly work on a local feature branch, publish that branch to the shared repo, rebase it on top of master, then force-push to the shared tracking branch. When I'm done I merge it into master and don't rebase master on top of anything.
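Sketched out, with an illustrative branch name (note `--force-with-lease`, which refuses to clobber work you haven't seen):

```
git checkout my-feature
git fetch origin
git rebase origin/master
# The branch is published but not merged, so rewriting it is fine;
# --force-with-lease aborts if someone else pushed to it in the meantime
git push --force-with-lease origin my-feature
```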
I'm not sure if you are being serious? The answer is that published advice on rebase overwhelmingly warns against rebasing published code, and for good reason.
I LOVE rebase, but when I run into merge conflicts I'd rather `rebase --abort` and leave the merge commit as it is.
But those instances are rare and having a merged branch's commits nice and compact in the log makes me happy every time.
What I find ironic is that github is massively popular as a central way to use a distributed version control system. The distributed nature only adds to the complexity and I am sure it is only used by a fraction of git users.
Yes...? What's surprising about using a central repo to collaborate? There needs to be a single source of truth for a coherent project, otherwise you're just going to have chaos.
The distributed nature of git led to the simple and secure contribution model of everyone working on their own repos and not needing to give write access to anyone else. This pretty directly led to an explosion of open source software.
Is there any really good tutorial on git that teaches the internal model? Ideally, it would illustrate each command and show the before and after of the internal objects.
https://learngitbranching.js.org/ is the best guide I've seen. It shows you the complete commit graph and all refs on that graph, and updates the graph when you type in commands. It covers and displays workflows involving remotes as well.
Indeed. When the article said "younger developers only know git" I immediately thought, no, they don't know anything. These people don't even know what a DAG is. Git was made for people who know these concepts. I've tried explaining git to people and they just don't understand. They just don't.
What's annoying is that git is just expected knowledge these days and having a github account is enough to claim it. There's not a good way to sell the fact that you're a bit more into it than that.
I've even said to git "experts" that branches should really be called refs and their eyes glaze over. It's difficult for me to understand what git is in their heads.
I started naming branches 'post-its', as to me that's what they are: labels you place on the real 'branches' (the commit tree). You can take them off easily, move them, discard them, whatever you want. They are just volatile.
A symbolic ref is a ref that points to another ref instead of a ref that points to a commit. `HEAD` is a symbolic ref. (It should be your only symbolic ref.)
Hrm, but a ref is a file containing a hash, right? So if the hash is equivalent to the file, then surely a ref is equivalent to a symlink? A symbolic ref, in turn, would be a symlink to a symlink... Or something like that...
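Pretty much. You can see the analogy right in the repository (ignoring packed refs; the hash here is just an example):

```
$ cat .git/HEAD
ref: refs/heads/master    # symbolic ref: it names another ref (the "symlink")
$ cat .git/refs/heads/master
d670460b4b4aece5915caf5c68d12f560a9fe3e4    # ordinary ref: a file holding a hash
```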
Git is the solution to the problem of doing distributed development on the Linux kernel. People who aren’t doing that, I wonder if they’re entirely clear in their own minds why they use it. I’m certainly not... other than that it’s just the default choice these days, the path of least resistance...
I'm a big fan of Fossil myself. But the SQLite people have something that I don't really have within the teams I operate in: the authority to dare to speak out against Git and not be laughed away like a hipster who is just trying to be different.
Pijul lets you describe your edits after you’ve made them, instead of beforehand.
Pardon my French, but about fuckin time.
On a big product, forensics matter. Not day to day, but often enough and if your metadata is rotten then you’re left with the oral history of the project as your only guide. And even that may not exist, depending on project structure.
Git has something similar called git-notes, but at the time I tried using it, it was really early days. No idea how support is working for that now. You could also make an annotated tag, which has its own "commit message", but it will show up with all the other tags.
Standalone, that sounds like a commit message - which you make after editing the code anyway. (And possibly tweak/update with git rebase before pushing)
In that section's context, it sounds like naming a branch after having already started on it. In which case, that seems to me the tiniest bit less useful than git's ability to rename branches (git branch -m oldname newname).
I haven't used Pijul, but I did use Darcs for several large production projects back when it was still a thing.
Darcs was magical -- in both senses of the word. It was incredible to see it figure out which patches depended on which, allowing a fluid exchange of changes between branches in a way that quickly becomes a nightmare in git. But it was also magical in that nobody really understood the internals. Not in the sense of git, where the underlying data model is pretty simple and the "version control" aspect is a (thin!) UX veneer on top, but in the sense that it was like quantum physics. When something went wrong, it was almost always impossible to fix.

And with Darcs, things did go wrong, because it had bugs, specifically a certain dreaded "exponential conflict" edge case where, if it encountered an identical line change in two patches from different branches (or something like that, it's been more than 10 years), computation time went through the roof and the merge command almost never finished. At several points we had to start history from scratch to avoid spending an entire day fighting the conflict problem.

Another thing with Darcs (and presumably Pijul) was that since it tracks patch inter-dependencies, you can rarely cherry-pick individual patches -- pulling out one patch tends to pull with it a whole string of related patches, all connected. Which is often what you want (git just fails horribly in such cases), but sometimes you do want to "forcibly cherry-pick" and manually fix, change identity be damned. I don't know if Pijul supports this.
It looks like Pijul fixes the conflict problem, but it still seems to keep the "quantum theory of patches" that requires an above-average developer to understand. If it has no bugs, then maybe the problem is moot, but in our industry, transparent, "self-repairable" tech seems to win in the long run over the esoteric, opaque and magical.
That said, it's clear that Darcs/Pijul has a vastly better UX, which I'm all for. Git's data model works remarkably well for what it does, but it's always been obvious to me that its "record snapshots and try to make sense of them after the fact" philosophy is a bit flawed. The article mentions branch history. And rename detection doesn't work well with how most people work, for example; it's a clever kind of lazy evaluation, but probably designed for Linux kernel devs, so not clever enough. Darcs had a patch type specifically for renames, and it worked very well.
Another thing I wish version control systems had is what you might call a high-level changelog. It would let you group and annotate commits after the fact, but without changing them. For example, you might want to group a bunch of patches as a single "feature" commit. Then you could make a "release" group that groups a bunch of feature commits. In other words, several levels of nesting, with each commit containing child commits and so on. Viewing the log would show only the highest-level groups, with the option to expand them visually so you can see what they contain. You should be able to group things like this after the fact without changing commit order, and you should be able to annotate the log (e.g. add more information to a commit message) without mutating the underlying patches.

Git was on the verge of venturing into this territory with its (now discouraged) "merge commits" -- a high-level commit that represents a single logical merge but encapsulates multiple physical patches -- but that didn't go anywhere. The nice thing about a high-level history like this is that you could use it to drive release notes and change logs, and it would greatly aid in project management and issue tracking, because you could manage entire sets of commits by what issues or pull requests or milestones or whatever they relate to.
> It looks like Pijul fixes the conflict problem, but it still seems to keep the "quantum theory of patches" that requires an above-average developer to understand. If it has no bugs, then maybe the problem is moot, but in our industry, transparent, "self-repairable" tech seems to win in the long run over the esoteric, opaque and magical.
The patch theory is complex, but it isn't that complex. Especially since there are plenty of alternate implementations out there of Operational Transforms (OTs) and Conflict-free Replicated Data Types (CRDTs), its relatives/cousins/descendants. In theory, any developer that can grok a blockchain or a Redis cache should be able to grok the patch theory.
Darcs suffered much more from being written in Haskell, I think, than from the actual complexity of its patch theory.
Pijul being written primarily in Rust maybe has a chance of also getting over that hump a bit easier than Darcs had. Though now it also has the uphill climb of competing against git's inertia.
> Git was on the verge of ventured into this territory with its (now discouraged) "merge commits"
Discouraged only by people that don't know `--first-parent` exists as a useful `git log` and other command arguments. The useful thing about a DAG is you can very easily slice it to create arbitrary "straight line" views. You don't have to constantly smash and squash history to artificially force your DAG into a straight line.
Git is excellent for 90% of development use cases. SQLite is just an example where the use case isn't necessarily ideal, not an indicator that it's "better" than git.
I’d say that git is fine for 90% of development (or some arbitrarily large number), but so is fossil. I don’t even think that SQLite-in-git would necessarily be a deal-breaker that couldn’t be worked around (drh ‘sqlite can chime in here). The whole space (from personal projects to global collaboration) is diverse enough that there’s no talking about “better” without qualifying the situation, either.
Fossil is good for a large subset of work that can benefit from source control management, regardless of git.
What git definitely has is
1) scalability, which is probably of no consequence for 99% of the cases in which it is employed
3) `(cd mozilla-central && time hg log dom/base/nsDocument.cpp)`
4) `(cd gecko-dev && time git log dom/base/nsDocument.cpp)`
It's not quite apples to apples because the git repo there has some pre-mercurial CVS history in it. But note that I'm not even using --follow for git and the file _has_ been renamed after the mercurial repo starts, so git is actually finding fewer commits than mercurial is here.
Anyway, if I do the above log calls a few times to make sure the caches are warm, I end up seeing times in the 8s range for git and the 0.8s range (yes, 10x faster) for mercurial.
That all said, most repos do not have millions (or even hundreds of thousands) of changesets or files. So the scalability problems are not problems for most users of either VCS.
I have never once had to use the missing feature that was a dealbreaker for the SQLite guys (find the descendants of an arbitrary commit). I have no idea what they're doing if they super depend on something like that.
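For what it's worth, git can get at least partway there, though it has to scan the graph rather than follow a stored child pointer; a sketch, with `<commit>` and `<sha>` as placeholders:

$ git log --oneline --ancestry-path <commit>..HEAD  # descendants of <commit> that lead to HEAD
$ git rev-list --all --children | grep "^<sha>"     # direct children of <sha>, across all refs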
If you were promoting git back in the days of SVN and are now moving on to something else because git is too popular, just cop to it, you're a hipster. :-P
You want old, ugly, slow, pain-in-the-ass and proprietary? Try ClearCase. Expensive. Drain on productivity (all commits were individual file based, meaning it was extraordinarily easy to miss a change; also had no concept of an ignore, so really easy to miss adding a new file). Also, very. fucking. slow. A moderately sized project (or VOB in ClearCase terminology) could take an hour to update. I've probably lost a year of my life waiting for ClearCase to complete. Also, it had a habit of just royally fucking up even trivial merges (dropping braces in C++ code, for example, or ignoring whitespace changes in python code for another).
I've never used Clearcase, but I worked for Sun around 2000 and one of the things I did was analysing kernel dumps of Solaris.
If we saw the Clearcase kernel module you could be sure that it was going to be the root cause of the crash. That thing seemed to be really terrible, and it wouldn't surprise me if the rest of the product was as bad.
I don't know if things have changed now, but to use Clearcase you needed a kernel module that provided a special filesystem that you did your work against.
The idea is not terrible, but when the kernel module is so unstable it brings down the build machines with a kernel panic, I'd say the execution was somewhat lacking.
In their respective times they were a big improvement. I believe CVS was the first client/server revision control system (why that feature was added was a horror story)
Hahah. Nothing to be sorry for, I was just curious. There are enough source systems out there, I wouldn't be surprised if they were a few I hadn't heard of.
See also RCS - the revision control system you discover when you mistype vi. (This frequently happened to people at school back in the SunOS/Solaris days.)
We still use SVN (but TFS Git for greenfield projects). We are a small company and only 2 or 3 devs work on the same project at a time. Git itself doesn't solve any problems we have. We still deal with merges when someone goes nuts refactoring.
Fossil is a great replacement for that kind of use case. You can fully convert a git or an svn repository with fossil import, so migration is painless. It's also painless for users, since fossil is very easy to pick up compared to git and the basic commands are quite similar to svn's. Branching and merging is a breeze compared to svn, quite like git but easier. It's also nice for companies that want everything in one place, since all developer repos sync with a central repo by default. And it's distributed in the sense that everything gets replicated to every repo in sync with the central one (this happens automatically on fossil update or fossil sync), so each clone contains the full project history.
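If it helps, the git-to-fossil conversion mentioned above is essentially a one-liner (`project.fossil` is a placeholder name):

$ git fast-export --all | fossil import --git project.fossil  # creates project.fossil with the full history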
As a mostly solo developer I haven't found a need to switch away from SVN to git or anything else; doing so would involve work I can't bill for and for no benefit to myself or existing clients. Some things in software and life in general are Fine Just As They Are.
"I also did this until the day came when I wanted to branch and someone showed me how easy it was in git."
Git is an incredibly powerful tool for managing a set of files over time, but if you just use the handful of basic commands, then I agree that the immediate big win is branching.
Personally, I found that moving from Subversion to Git fundamentally changed my work habits (for the better). I was a lone developer at the time, so the collaboration aspect wasn't really important.
I noticed that Git made it so easy to create a repository that I put everything into version control: not just application code, but random scripts and notes.
The other gain was that I learned to work in small, focused commits, because Git is so fast that committing often is not a burden. Once I made that change, the commit history became meaningful and useful in a way that Subversion never was: I could quickly revert code, and look back at individual commits for information.
My company uses SVN. SVN works fine, so there isn't really a reason to spend the man hours migrating a crap ton of projects to Git. Before SVN existed we used CVS, and we migrated to SVN from CVS about, idk, 15-ish years ago?
Yes, there are people who are surprised that many hold the opinion that SVN is fine for a lot of projects and that you don't need to migrate for the sake of it.
So, not recent. But version control is one of the few infrastructural components that's allowed to have decades of churn, in my book. Software lives that long.
Lol, I mean, it's within the realm of plausibility. My secret sauce is the zipping. The SHA integrity and the zippy bits are decentralized separately and n-matrix distributed across multiple cryptos using a randomized forward-time-arrow-only strategy. So, pretty psicc.
See for example: http://endoflineblog.com/gitflow-considered-harmful
"I remember reading the original GitFlow article back when it first came out. I was deeply unimpressed - I thought it was a weird, over-engineered solution to a non-existent problem. I couldn't see a single benefit of using such a heavy approach. I quickly dismissed the article and continued to use Git the way I always did (I'll describe that way later in the article). Now, after having some hands-on experience with GitFlow, and based on my observations of others using (or, should I say more precisely, trying to use) it, that initial, intuitive dislike has grown into a well-founded, experienced distaste. In this article I want to explain precisely the reasons for that distaste, and present an alternative way of branching which is superior, at least in my opinion, to GitFlow in every way."
And a summary of an alternative proposed there: http://endoflineblog.com/oneflow-a-git-branching-model-and-w...
"As the name suggests, OneFlow's basic premise is to have one eternal branch in your repository. This brings a number of advantages (see below) without losing any expressivity of the branching model - the more advanced use cases are made possible through the usage of Git tags. While the workflow advocates having one long-lived branch, that doesn't mean there aren't other branches involved when using it. On the contrary, the branching model encourages using a variety of support branches (see below for the details). What is important, though, is that they are meant to be short-lived, and their main purpose is to facilitate code sharing and act as a backup. The history is always based on the one infinite lifetime branch."
At work, I use fossil to manage my repository of scripts that I edit and use on different machines. (Strictly speaking, that could be done without an SCM, but I find it more convenient this way.)
On Windows, the fact that fossil comes as a single statically-linked executable that works without any special installation procedure[0] is really nice.
Also, I have to appreciate the builtin Wiki. I use it to keep a kind of diary of what I did and why, as well as gather helpful links I have come across over time.
[0] Other than putting the executable somewhere on your %PATH%, of course.
I haven't used Fossil, but just a comment on some of that page, in the order they're presented:
1. It's unclear to me what he means. Yes git doesn't store anything like a doubly linked list of commits, and thus finding the "next" commit is more expensive, but you can do this with 'git log --reverse <commit>..', and it's really snappy on sqlite.git.
It's much slower on larger repositories, but git could relatively easily grow the ability to maintain such a reverse index on the side to speed this up. (A concrete sketch follows after point 3.)
2. Yeah a lot of the index etc. is complex, but I wonder how something like "git add -p" works in Fossil. I assume not at all. Is there a way to incrementally "stage" merge conflicts? Much of that complexity comes with significant advantages.
3. This is complaining about two unrelated things. One is that GitHub by default isn't showing something like 'git log --graph' output, the other is that he's assuming that git treats the "master" branch magically.
Yeah GitHub and other viewers could grow some ability to special-case the main branch and say "..and this was merged into 'master'" at the top of that page, but in any case all the same info exists in git as well, so it's just a complaint about a specific web UI.
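To make point 1 concrete, the "next commit" query looks roughly like this (with `<commit>` as a placeholder):

$ git log --oneline --reverse <commit>..HEAD | head -1  # the oldest descendant of <commit> on the path to HEAD, i.e. its "next" commit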
The "index" is a silly dongle in Git. One way to get rid of it would simply be to make it visible as a special "work in progress" top commit, visible in the history as a commit. "git add -p" would just hoard changes directly into the WIP commit, if it exists, otherwise create it first. Some sort of publish command would flip the WIP commit to retained status; then a "git add -p" would start a new WIP commit. "git push" would have some safeguard against pushing out a WIP commit.
The "--cached" option would go away; if you have a WIP commit on top, then "git diff" does "git diff --cached", and if you want the diff to the previous non-WIP commit, you just say so: "git diff HEAD^".
stashing wouldn't have the duality of saving the index and tree. It would just save the changes in the tree. Anything added is in the WIP commit; if you want to stash that, you just "git reset HEAD^" past it, and later look for it in your reflog, or via a temporary tag.
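You can approximate the proposed WIP commit with today's git, minus the do-not-publish safeguard (a sketch of the idea, not an existing feature):

$ git add -p && git commit -m "WIP"           # start the WIP commit from selected hunks
$ git add -p && git commit --amend --no-edit  # hoard more changes into it
$ git commit --amend -m "Real message"        # "publish": give it its final message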
Everyone thinks that until they need to use it for something. If all you do is a bunch of linear small changes with obvious implications and two-line commit messages, then the index is nothing but an extra step.
But at some point you're going to want to drop a thousand-line change (from some crazy source like a contractor or whatnot) on top of a giant source tree and split it up into cleanly separable and bisectable patches that your own team can live with. And then you'll realize what the index is for.
What I described supports staging small changes and turning them into individual commits. Just the staging area is a commit object rather than a gratuitously different non-commit object with different commands and command options to deal with it.
You'd still need separate commands, though. Commit vs. commit --amend. Add vs. add -p. Diff-against-grandparent vs. diff --cached. You just want different separate commands to achieve your goal, which is isomorphic to the index.
So sure: if you want a not-index which works like the index and has a bunch of 1:1 operations that map to the existing ones, then... great. I still don't see how that's much of an argument for getting rid of the index.
Well; yes; the tool can't read your mind whether you'd like to batch a new change with an existing one, or make a separate commit.
Remember that git initially didn't hide the index in the add + commit workflow! You had to "git add" every change and then "git commit". So the fact that a single "git commit" can now do everything is because they realized that index visibility is an anti-pattern and wanted to hide it from view.
Since the index is already hidden from view (and largely from the user's model of the system) in the add + commit workflow, we are not going to optimize the command set by turning the index into some other representation. That's not what this is about.
The aim is consistency elsewhere.
For instance, if the index is an actual commit, then if we abandon it somehow, like with some "git reset", it will be recorded in the reflog.
Currently, the index is outside of the commit object model, so it gets destroyed.
It's possible for a git index to have content which doesn't match the working tree; in that case when the index is lost with a git reset, that content is gone.
If the index is a commit, it can have a commit message. It can be tagged, etc.
Did you read the parent comment? How is dealing with the thousand-line change not possible with what they described? (Hint: it's totally possible, no index needed.)
Well, that's the weasel word. In my grandparent comment, I proposed the "alike", didn't I?
Nowhere did I say, just remove the index from Git, but don't replace its functionality with any other representation or mechanism.
In git, we can do that today in such a way that the index is only temporarily involved:
$ git commit --patch
... interactively pick out the change you want ...
$ # now you have a commit with just that change
It is not some Law of Computer Science that the above scenario requires something called an "index", which is a big archive holding all of the files in the repo, where these changes are first "staged" before migrating into a commit.
The problem is not that Git supports staging partial changes. The problem is that Git has shoehorned a tool that "at some point you're going to want"—to help you deal with a rare occurrence—into the default workflow, forcing you to deal with the overhead of staging every time.
It's basically the overhead of typing -a, i.e. git commit -a rather than git commit. It's not such a big deal. It does take a while to get used to the git "pipeline", tbh, but when the rare occurrence happens you have this option; on source control systems without it, you just don't.
You're minimizing the overhead. `git commit -a` also won't help you with new or renamed files. So when you write about "the overhead of typing -a", what you really mean is the overhead of
1. typing `git commit`, checking the output, then typing `git commit -a`, or
2. typing `git commit`, then moving on with your life, and realizing minutes, hours, or days later that the changes you meant to include were not actually included, so you have to go back and add them if you're lucky, untangle them from whatever subsequent changes you were trying to make and/or do an interactive rebase if you're unlucky, and maybe face the prospect of doing a `git push --force` if you're really unlucky
Scale that up to several days or weeks to match the learning period and repeat for every developer who has to sit down and interact with it. That's the overhead we're talking about.
The article got it right; this is a monumental waste of human effort.
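To make failure mode 2 above concrete (file names are made up): `git commit -a` stages modifications and deletions of tracked files, but silently skips brand-new untracked ones.

$ $EDITOR main.c newmodule.c     # modify a tracked file and create a brand-new one
$ git commit -am "add feature"   # commits main.c; the untracked newmodule.c is silently left out
$ git status                     # newmodule.c is still sitting there, untracked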
> Every developer has a finite number of brain-cycles
I've never used a version control system that didn't have to be notified about which files you would like to add. "vc commit" cannot simply pick up all files and put them under version control, because you have junk files all over the place: object files, editor backups, throwaway scratch and test files and so on.
But even when we use "git add", we are not aware of the index. The user can easily maintain a mental model that "git add" just puts files into some list of files to be pulled into version control when the next commit takes place. That is, until that silly user makes changes to the file after the git add, and those changes do not make it in because they forgot the "-a" on the commit.
I use an IDE with git integration. So I really never worry about most of this but I do also interact with the command line. When I create a new file in my IDE it asks me if I want to add it ...
I won't argue there is a relatively long learning period with git ... It helps if you have some experienced mentors in this area. But you get a lot of power for this...
At what point do you test these "cleanly separable" and "bisectable" patches? Do you do a second pass where you check out and build/test each of these commits?
It's pretty routine for a CI integration to test every patch, yeah. Not all do. (e.g. Gerrit-based systems generally do because the unit of review is a single patch, github likes to do integration on whole pull requests). It's certainly possible. I don't really understand your point. Are you arguing that it's preferable to dump a giant blob into your source control rather than trying to clean it up for maintainability?
No, I prefer small meaningful commits. I am not for or against the index. I have no problems switching my brain between git add -A or -p as necessary. Like you said, it happens too often that someone sends you a huge pile of code (C code, in my case). My first impulse is to build and run it immediately. For me, just compiling the code can take up to an hour sometimes. Running my full test suite takes even more hours.
At some point I am ready to craft this code into multiple commits. After my first git add -p and git commit, I don't know if HEAD is in a state where it even compiles. It takes further work and discipline to produce a whole series of good commits.
And I was saying that as a practical matter, you don't always get that option. Individual developers working on their own code don't need the index. But then quite frankly they don't need much more than RCS either (does anyone remember RCS? Pretend I said subversion if you don't).
Situations like integrating a big blob of messy changes happen all the time in real software engineering, and that's the use case for the git index.
I split work consisting of multiple changes in the same file just fine under CVS and Quilt.
I would convert the change to a unified diff, remove the change, and then apply the selected hunks out of that diff with patch. ("Selected" means making a copy of the diff, in which I take out the hunks I don't want to apply. Often I'd just have it loaded in Vim, and use undo to roll back to the original diff and remove something else.)
Using reversed diffs (diff -uR), I also used to selectively remove unwanted changes, similarly to "git checkout --patch".
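For the curious, that dance looked roughly like this (file names are hypothetical):

$ cvs diff -u foo.c > all.diff   # capture the whole change as a unified diff
$ patch -R foo.c < all.diff      # back the change out of the working copy
$ cp all.diff part.diff
$ $EDITOR part.diff              # delete the hunks you don't want yet
$ patch foo.c < part.diff        # re-apply only the selected hunks
$ cvs commit -m "first logical change" foo.c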
This is basically what git is doing; it doesn't require the index. The index is just the destination where these selective hunks are being applied.
Personally, Yes. Often I discover a commit won’t build so I do a little bit of interactive rebasing to move some dependent change into the same commit or an earlier one.
You would compose a series of local commits and, if you wanted, test them individually before pushing them. With a bunch of changes made to your files locally, you'd use tools like `git add -i` or `git add -p` to stage subsets of your changes to make those commits. As you finish building these commits, you would be left with a series of commits to your local branch, and no additional unstaged or uncommitted changes. You're "draining" the uncommitted changes into commits, part by part. Commands that manipulate the index are how you describe what you want to do to Git.
"Index" is probably not a helpful term. I think of them as simply "staged changes", that is, changes that will be committed to the repository when I run `git commit`, as distinct from local changes that will not be committed when I run `git commit`. With a Git repository checked out, just editing a file locally will not cause it to be included in a commit made with `git commit`. Rather, `git add` is how you describe that you want a change to be included in the "staged changes" that will be committed. You can add some files and not others, or even parts of a file and not other parts.
The need for this doesn't come up especially often, but it's really helpful when it does. One common case where this can come up is when you've been developing for a while locally, and you realize that your changes pertain to two different logical tasks or commits, and you want to break them up. Maybe one commit is "upgrade these dependencies" and the other is "add feature X". You started upgrading dependencies while building feature X, but the changes are logically unrelated, and now the dependency change is bigger than you expected and deserves to be reviewed on its own.
So with all of these changes in your workspace, you'll stage just the changes for "feature X" or "upgrade dependencies" and then run `git commit`. At this point, maybe you'll move this commit into its own branch or pull request in order to code review and ship it separately. (You might use `git stash` to save the remaining uncommitted changes while you do this.) Then you'll return to the remaining changes, which you will stage and commit as well. You've just gone from a bunch of unstructured, conflated changes to two or more separate commits on different branches/PRs (if that's what you want), that can be reviewed and shipped independently. You've gone from a massive change that's too big to review, to multiple bite-sized pieces.
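In command form, that split might look something like this (branch handling elided; messages are illustrative):

$ git add -p                             # stage only the "upgrade dependencies" hunks
$ git commit -m "Upgrade dependencies"
$ git stash                              # shelve the remaining feature-X changes
# ...move that commit to its own branch/PR and ship it...
$ git stash pop                          # pick feature X back up
$ git add -p
$ git commit -m "Add feature X"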
These tools are also especially helpful if you, for any reason, need to manipulate source control history, such as breaking up one already-made commit into several commits, or simply modifying an existing commit. To do this, you would take that commit, apply it to the local workspace as if it's an unstaged change, and then, starting from the point in history before that commit was made, stage parts of the changes again and check them in. At this point, you can push the changes as a new branch, or even rewrite history by replacing your existing branch.
To give a use-case for this last capability, imagine that a developer accidentally checks in sensitive data as part of a big commit. Before (or even after) shipping the change, you realize this, so you want to go back and edit that commit, to remove the part of the change that checked in data while leaving the rest of the changes. You would describe these manipulations with the index as described in the previous paragraph.
You seem to miss the point that instead of staging changes out of your working directory and then committing them, you could just commit those changes out of your working directory. The extra step of staging is not needed.
Everywhere you said stage you could say commit (or amend) and then not need the extra step of committing afterwards.
> You seem to miss the point that instead of staging changes out of your working directory and then committing them, you could just commit those changes out of your working directory. The extra step of staging is not needed.
How would you describe this? One massive `git commit` with a ton of parameters? I don't see how it could work.
How would you describe "commit the first hunk of fileA (but not the second), and the second hunk of fileB (but not the first), and all of file C?". How do you "just commit those changes"? I believe you are missing how to actually describe this on the command line or with an API.
The index is absolutely needed. It's what allows you to build up a commit through a series of small, mutating commands like `git add fileC`, `git add -p fileA`. The value of the index is that you can build up your pending commit incrementally, while displaying what you've got with `git status`, then adding to it or removing from it.
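Concretely, the question posed above decomposes into a handful of small commands (using the file names from the example):

$ git add -p fileA   # take the first hunk, skip the second
$ git add -p fileB   # skip the first hunk, take the second
$ git add fileC      # all of fileC
$ git status         # review exactly what the pending commit will contain
$ git commit -m "One logical change"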
You can build a commit using multiple small "commit --amend --patch" commands. These use the index, but only in a fleeting, ephemeral way; changes go into the index and then immediately into a commit. They go into a new commit, or if you use --amend, into the existing top-most commit.
The index is a varnish onion.
Git has too many kinds of objects in its model which are all bags of files.
I do not require a staging area that is neither commit, nor tree.
Look at GIMP. In GIMP, some layer operations get staged: you get a temporary "floating layer". This gets commited with an "anchor" operation, IIRC. But it's the same kind of thing: a layer. It's not a "staging frame" or whatever, with its own toolbox menu of operations.
$ git commit --patch
... pick out changes: commit 1 ...
$ git commit --patch
... pick out changes: commit 2 ...
$ git commit --amend --patch
... pick out more changes into commit 2 ...
$ git commit --patch
... pick out changes: commit 3 ...
There, now we have three commits with different changes and were never aware of any "index"; it was just used temporarily within the commit operation.
Oops, the last two should have been one! --> git rebase -i HEAD~2, then squash them.
The index is too fragile. You could spend 20 minutes carving out very specific changes to stage into the index. And then you do something wrong and that staging work is suddenly gone. Because it's not a commit, it's not in the reflog.
You want changes in proper commit objects as early as possible, then massage with interactive rebase, and ship.
Suppose HEAD points to a huge commit we would like to break up. One simple way:
$ git reset HEAD^
Now the commit is gone and the changes are local. Then just do the above procedure: commit --patch, etc.
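Spelled out end to end, the whole break-up is just (a sketch):

$ git reset HEAD^                       # drop the commit, keep its changes in the tree
$ git commit --patch -m "First part"    # pick out the first logical change
$ git commit --patch -m "Second part"   # pick out the second
$ git commit -a -m "The rest"           # whatever remains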
>Rather, `git add` is how you describe that you want a change to be included in the "staged changes" that will be committed. You can add some files and not others, or even parts of a file and not other parts.
That's largely outdated. For years now, git's commit command has been able to stage changes and squirrel them into the commit in (what appears to the user as) one operation.
Only people who learned git ten years ago (and then stopped) still do "git add -p" and then a separate "git commit" instead of just "git commit --patch" and "git commit --amend --patch" which achieve the same thing.
I've had the same feeling about the index, except I want the opposite setup. Instead of the index becoming a commit, I want to always commit my working tree, and for there to be a special sort of stash for the stuff I don't want to commit just yet. I want to commit what's in my working tree so I can test before committing.
The way it would work is that when I realize I want to do a partial commit I'd stash, pare down the changes in my working tree (probably using an editor with visual diff), test what's in my working tree, commit, and then pop the stash.
I had hoped that this would already be doable with git, but it isn't, at least not in a straightforward way. The problem shows up when you try to apply that stash. You get loads of merge conflicts, because git considers the parent of the stash to be the parent of HEAD, and HEAD contains a bunch of the same changes.
I'm sure there's some workaround for this, but every time I've asked people always tell me to not bother testing before committing!
Yes, that works, but it means I'm committing with a dirty work tree, so I can't really test before I commit.
You're probably going to say I shouldn't test before every commit, but I rarely work in a branch where the bar is as low as "absolutely no testing required". I generally at least want my build to pass, or some smoke tests to pass, and I can't reliably verify either of those with a dirty work tree. And actually, the fact that all of the commits on my branch are effectively going to end up in master (unless I squash) makes me want to to have even my feature branches fully tested.
> Yes, that works, but it means I'm committing with a dirty work tree, so I can't really test before I commit.
Ah, I see what you want to do now.
> You're probably going to say I shouldn't test before every commit
I have no business telling you what you should do. If you want to test before committing, your wish is my command:
git stash --patch # Stash the bits you don't want to test
<do your tests>
git commit <options> # Commit the rest when the tests pass
git stash pop
<continue developing>
Interesting! I guess that works because the stash doesn't contain the changes I'm going to commit now, and so it doesn't conflict (at least in simple cases).
I never use `--patch` (even with `git add`). I prefer to use vim-fugitive, which lets me edit a diff of the index and my working tree. It looks like being able to do something similar with stashes is a requested, but not yet implemented, feature for vim-fugitive: https://github.com/tpope/vim-fugitive/issues/236
$ git stash                      # set the whole change aside
$ git checkout stash@{0} -- ./   # copy the stashed change back into the working tree
$ $EDITOR                        # pare your changes down
$ make runtests                  # let's assume they pass
$ git add ./
$ git commit                     # first part committed; tree is clean again
$ git checkout stash@{0} -- ./   # restore the rest of the stashed change
$ make runtests                  # let's assume they pass
$ git add ./
$ git commit                     # second part committed
*(In case it appears otherwise, this isn't actually supposed to be a defense of Git. Originally, this was 15 steps, but I edited it into something briefer and more straightforward.)
What you describe changes the name from "index" to "WIP commit", keeping the same semantics. Along the way, you now have a "commit" that doesn't behave like a commit, further adding to the potential for confusion. I strongly believe that things that behave differently should be named differently.
> What you describe changes the name from "index" to "WIP commit", keeping the same semantics.
I.e. you get it.
Importantly, the semantics is available through a common interface rather than a different design and implementation of the semantics for the index versus commits.
> you now have a "commit" that doesn't behave like a commit
Well, right now you literally have a "commit" that doesn't behave like a commit: the index.
If a real commit is used for staging, it behaves much more like a commit. It's just attributed as do-not-publish so it doesn't get pushed out. Under this model, all commits have this attribute; it's just false for most of them. Thus, it isn't a different kind of commit.
> I strongly believe that things that behave differently should be named differently.
Things that do not behave completely differently can use qualified names, in situations when it matters:
"work-in-progress commit; tentative commit; ...."
For instance we use "socket" for both TCP and UDP communication handles, or both Internet and Unix local ones.
1. I actually ran it, and ironically, `git log --oneline --reverse` runs faster than `fossil descendants` on the sqlite repository for new commits, and just as fast on old commits. Perhaps fossil would do better on a large repository, but I doubt it. git has many flaws, but "insufficient optimization (compared to alternatives)" is not one of them.
That's because commits are not owned by branches. The same commit can be present in two or more separate branches with different successor commits in each. And this is a feature, not a bug! It's what allows you to ask the question "Where did these two branches fork?".
Now, I don't know how Fossil answers that question. Maybe it's got some clever trick (like a separate "universal ID" vs. "per-branch ID" for each commit, maybe). Maybe SQLite doesn't need that and doesn't care. But it's not like this is a simple feature request. Git was designed the way it was for a reason, and some of us like it that way.
> Git was designed the way it was for a reason, and some of us like it that way.
For the record, the initial version of git was developed in haste as a replacement for bitkeeper, after Larry McVoy (who shows up here on HN once in a while, and always has great posts) got a little aggressive about the licensing.
BitKeeper was open-sourced a year or two back: https://github.com/bitkeeper-scm . I haven't had a chance to use it extensively yet, but I hear it still has a good selection of features that git either chose not to implement or didn't properly understand.
Don't get me wrong, I think git is pretty great and have been one of its primary advocates over SVN, etc., at most companies I've worked with (I think 2013 was the first time I came on board a team that was already using it), but its history is informative.
For the record, in my opinion git is the worst SCM among the distributed ones and I truly believe it was only hype that carried it. The user interface is genuinely user hostile with so many commands and yet many commands have switches that change the behavior so much it very well might be a different command. And so forth.
Git is just a patch database management system that tries too hard to become a version control system. It is proof that this approach is fundamentally misguided.
However, mercurial offers both named branches (hg branch) and pointer branches (hg bookmark) and when I last used it, it seemed the consensus was shifting to one of two positions, (a) just use hg bookmark or (b) use named branches for long lived branches only (master/default, develop, release branches etc) and bookmarks for feature branches/dev use.
Have you looked at the --branches option for git log? It annotates which branch(es) a commit is part of unless you're talking about something else. There's obviously some cost but all this talk of "slow" is ignoring the fact that in practice on small repos (& yes - sqlite is small for git) you're not going to notice it. Also, you can limit your branches to those you're interested in to speed things up.
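Two ways to ask git that question directly, for anyone following along (`<commit>` is a placeholder):

$ git branch --contains <commit>          # every branch whose history includes <commit>
$ git log --oneline --branches --source   # annotate each commit with the ref it was reached from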
No, it's not impossible. It's quite easy. What it isn't is immutable. I.e. you can have a repo now with just "master", and I can push new branches, one for each commit in it, and make any such tool output useless garbage.
I.e. in some DAG implementations the branch would be a fundamental property at write time, in git it's just a small bit of info on the side.
What I'm referring to is that for the common case of something like the SQLite repository that uses branches consistently it's easy to extract info from git saying "this commit is on master, and on the LHS of any merge to it", or "this commit is on master, but also the branch-off-point for the xyz branch".
The branch shown in the article is a perfect example of this. In that case all the same info exists to show the same sort of graph output in git (and there's even options to do that), it's just not being done by the web UI the author is using.
The lack of an index (or something like it) was what made hg/svn unusable once I'd moved over to git. The ability to closely review what I'm committing (and not commit "every single change in the repo") is a must.
Some time back, some GUI design firm asked on HN for a suggested open source program that needed a GUI developed. They had some developer time free and wanted something that would get them visibility. I suggested they do a GUI for Git. Several such things exist, but they're just buttons hooked up to the command line; they have no useful visual outputs. A good GUI for Git, where you could look at branches and such graphically, would get attention.
Too hard, they said. They wanted something which just needed to be pretty, not something with tough human interface problems.
GitUp for the mac is an excellent git UI. It is emphatically not just wrapping the command line (it manipulates the repo directly); you can do stuff like look at the timeline and directly edit it ("oh, delete this commit" or "move this branch tip here" or "merge these two commits" or "rewrite the message for this commit"). It's really nice.
Downsides are that it's not well known, the principal dev is only doing bug fixes, and while it's blindingly fast on small to medium repos, it doesn't scale well to larger ones.
Seconding gitup! One of those apps I didn’t mind abandoning my terminal workflow for. Great keyboard shortcuts. Really great and subtle UX for partial staging too.
I found myself using the "network" section of a work project on GitHub on an almost daily basis. It's basically a prettier git log.
I know about git log (and the millions of options it has), but it's ugly as sin, difficult to quickly browse, and just so busy that it's hard (for me at least) to quickly see whether employee x merged branch y into z, or whether v has the latest commits from w, etc...
So I ended up writing a user script to blow up the small window on the GitHub page to a larger size, and get rid of some stuff that we don't care about and made it a dashboard of sorts.
I spent some time one day trying to find something like it, but came up with pretty much nothing that was easily set up and maintained and that didn't require building my own application around it.
Because both of them are basically "git log in a gui".
It's the same overly busy UI, the same lack of easily visible branch names (yes, I know this isn't how git works internally, but it's extremely useful), the same vertical layout on our widescreen monitors, and they are still ugly to me (although this is honestly one of the lowest items on my list of priorities).
I'm looking for something like [0] but with a better UX (GitHub's version doesn't let you scroll with the mouse wheel, for instance), is fast enough to pull up at a glance to quickly see the state of the whole repo, and won't get disabled if the repo has too many forks (and I'm guessing branches, although I've never seen it happen from them alone) like GitHub's does.
It's easy to glance at and see where a branch is (what was merged into it, what it was merged into, etc...), who did the work (in the example it shows in a "tooltip" when you hover over the dot), roughly when the work was done, and what that branch includes.
I've been saying this for a while. SourceTree used to be fantastic a couple years ago. Yet each update since those glory days has brought not just bugs, but a worse overall user experience.
I mostly only ever used it for branches, cherry picking and conflict resolution, and it was awesome for that.
I don't use it now, and I don't use branches or cherry picking. My personal philosophy is that I am not keen on branches: never have been, never will be. I see repositories as a two pizza team concept. If you need branches, you should split the repo, make more iterative changes, or look at establishing or improving a test suite. It keeps things simple and cognitively low overhead for everyone (as per the sqlite team's comments).
I work on a branch, raise a merge request to master, let my colleague do a check and point out any issues and then merge it once we're both happy with it. He does the same thing for me to check.
Whatever works. However, I am not keen on such hoops. Objectively, relative to "just commit to master" that is a pretty high overhead contribution process.
It is not immediate, needs to wait for another human to be around and mentally present, demands some sort of QA standards are created and commonly understood, and creates the need to make and manage a forum for listing and discussing merge requests.
You could get some of the way and retain instant feedback with automated tests, yet remove a lot of the overhead.
I use them a lot and love how it makes some advanced stuff much easier (committing hunks, interactive rebases). They're also cross platform, which is great.
Still, it's incredible how crappier it gets over time. It feels like every year there's some kind of new major refactor or UI framework refresh that just destroys whatever little stability the application had, as if a new lead comes in and decides to start everything from scratch.
As an engineer I kinda understand the reasoning (using a nuclear bomb to get rid of tech debt), but it's pretty baffling from a product lifecycle standpoint.
I wish they had a more stable model and were charging for registration really. I'd gladly pay them $50 or whatever for some modicum of stability.
I work on a huge codebase and SourceTree handles it quite well on macOS. A year ago I'd occasionally see updates that were broken (solvable with a downgrade), but it's been pretty solid lately.
I find it's OK on macOS, but on Windows the latest problem is the delay in displaying modified files. Newly modified files sometimes take minutes to show up.
Minutes? Today I hit F5 to refresh the window, and it still missed a checkin from 10 minutes earlier.
If you're talking about locally modified files, I use the integration with Visual Studio to manage those and it is doing an OK job. The only gripe I have with it is that you can't change an already staged file or those changes won't get checked in. I don't remember it working like that a year ago so something must have changed.
I use SourceTree daily and it regularly has very noticeable lag on most window-related commands: close a window, open a repository, etc. And a lot of spinning beach-ball. Have you seen any of this?
I have experienced this. I think it started when Atlassian switched the GUI over to WPF in version 2 a few months back. The new GUI has introduced a lot of bugs and performance issues that they still seem to be trying to figure out.
The visualization features are amazing, though they could use a little more design polish.
I did comparisons between GitKraken and Tower. I found, at least on the Mac, GitKraken crashed often vs Tower being stable. I was also turned off by the yearly price of $50 vs a one-time $70.
Otherwise they had very comparable features including being able to display the histories, merges, etc.
TortoiseHg has been serving me faithfully for years: no real issues, and big improvements in productivity from being able to actually look at your code evolving.
The best I've ever seen is Tower (mac/windows), it makes most of the tedious steps and brain-cycles described in the authors' list very easy to manage.
I like using a mixture of Tower Beta (for the interactive rebase and visual commit staging edits), command line (faster!) and GitUp (for understanding repo history). I’ve been experimenting with using hub and go-jira command line tools for GitHub and JIRA respectively.
How is your experience of its performance? The last time I tried it I found it made everything unusably slow; disabling that one plugin made an absurd difference in the responsiveness of the machine.
Tons of such GUIs exist already. Branch visualization is excellent in SmartGit (one of the best apps I've ever used, really) and many other apps have similar visualizations.
ClearCase is 20 years old or so, but no git gui comes close.
The reason is git's peculiar object model. It has nodes (commits), edges between nodes (parent references) and references (pointers to commits), but no branches. That means it is not in general possible to tell which branch a particular commit belongs to. Asking a question like that just doesn't make sense in git's world. Therefore visualizing complicated branching scenarios, in ways that make sense to users, becomes almost impossible. This is why people advocate rebasing: https://blog.carbonfive.com/2017/08/28/always-squash-and-reb... The idea is to "lie" to the vcs, because otherwise your history becomes a hodge-podge mess of merges. :)
Btw, all other features of ClearCase were just horrible and brain-damaged. But the branch visualizer kicked ass.
> > That means it is not in general possible to tell which branch a particular commit belongs to.
>
>That's because commits can belong to multiple branches, which is by design.
I could be wrong, but I think what they mean is the fact that once you merge/rebase your dev branch back into master, all of those commits you made in dev are now commits in master.
You could conceivably have a source control system that still allows commits to belong to multiple branches, but bakes the branches of a commit into the commit. (And I believe many/most other source control systems do just that.)
The fact that the branches of a commit can change later on always struck me as being completely inconsistent with the commonly held idea about git that outside of master it's ok to forego testing. That really only works if you squash into master (and test that squash commit). If you don't squash, and especially if you fast-forward, then all of those commits are now "in" master, and so you've dumped a bunch of untested stuff in master.
Damn, that's pretty ruthless how they blew you off like that after you did 90% of the work (coming up with an idea). And one that's already been implemented many times.
In the 90s we had CVS and Perforce... and then SVN. Then there was BitKeeper. From what I can read of history, Linux was under BitKeeper from 2002 to 2005, and then in 2005 that relationship soured and Linus went off to write Git. At the same time there was a large growth of other version control systems. The graphic shows bazaar, darcs, hg, plastic, and then Fossil is also in there.
Even at 2006 when Fossil was released, there wasn't significant mindshare on any of those platforms yet. Why use git? It wasn't even a year old when Fossil was released.
The migration of SQLite to Fossil (from CVS) was done in 2009.
Fossil had existed for several years before SQLite switched to it, so I wouldn't exactly call that dogfooding. It kind of is -- but Fossil was a mature project by the time SQLite shifted its codebase.
SQLite shifted to Fossil in stages. Documentation moved in 2007. The TH3 and sqllogictest test suites started out in Fossil in 2008.
I wrote Fossil specifically to support SQLite development. If Fossil does nothing else other than support SQLite, then it is a success. Any other use of Fossil is just gravy. That we were conservative in moving the main SQLite source code into Fossil does not negate that fact.
Your repo being eaten by some arcane git 'feature' that you had no idea even existed, and then when you ask for help you get a 50/50 split between "that shouldn't have happened" and "well of course, you have to run 'git --unfubar repo' every 431 checkins or it'll corrupt your repo, seriously who doesn't know that?"
That's hilarious given that Fossil was notorious for actually corrupting data, which is one of the few things people hardly ever bitch about with git. The fact that you are running into this frequently enough to rant about it suggests maybe you aren't familiar enough with the tool?
If a version control tool, when used correctly, ever barfs on your data, that's too frequently. And I wasn't praising Fossil, just raising my gripe with git.
I don't know if you're intentionally trolling, but I have been using version control for almost 20 years now and have never used the term check-in. All version control software that I've used in that time used the term commit.
Uh, where? I said "check in" is a generic term (which it is - it's listed as the first synonym for 'commit' here: https://en.wikipedia.org/wiki/Version_control#Common_vocabul... ). I never mentioned the term "commit" (which is obviously also a common term for the same action). Then you suggested I was trolling, for saying it was a generic term, which it is. Now you're putting words in my mouth and calling them nonsense.
> Or that I used the generic term rather than the git-specific term?
In the context of the discussion, generic term refers to "check in", and git-specific term refers to "commit" (which is the only possible alternative term to be using in that context). The obvious reading is that you're calling "commit" a git-specific term.
So I wouldn't say I'm putting words into your mouth, unless you were unaware that "commit" is the correct alternative term, which is of course a possibility that I didn't consider.
Git is much more complicated and harder to grasp, most people I know just remember common operations and commands to issue without understanding what these really do.
Also, unless your organization's project is open source, you don't really need a DVCS. In my organization we use git nearly exclusively, but in the end everything relies on the central repo. We would actually do much better with SVN and have fewer issues. For example, we already had significant mistakes such as someone deleting the main branch or performing a force push (yes, you can restrict that, but with git you need to know what to expect before blocking it). We also have a repo for a CMS where we would greatly benefit from the ability to merge by directory. I also see people trying to check out just a specific subfolder at the latest version, which is difficult with git but trivial and extremely lightweight with SVN.
Yes, git still has very valuable tools for the local developer: stash, staging changes, bisect, local history (great if you work without internet access), and you can actually use git alongside another SCM and get the best of both worlds.
Unfortunately, when I mention that we could have our main repo under SVN, people look at me like I'm some kind of dinosaur who only proposes it because he's confused by git.
Just because git is a great tool for Linux kernel development doesn't mean it is the best SCM for your organization. And if your organization uses something other than git, that doesn't mean you can't use git; in fact the tool of my choice is still git. I just think that most companies don't need a DVCS for their main repo.
I've built two source management systems (NSElite, internal to Sun, and bitkeeper, now open source at bitkeeper.org).
Calling Git sane just makes it clear that you haven't used a sane source management system.
Git has no file object; it versions the repo, not files. There is one graph for all files, the repo graph. So the common ancestor for a 3-way diff is the repo GCA, which could very well be miles away from the file GCA if you have a graph per file (like BitKeeper does).
No file object means no create event recorded, no rename event recorded, no delete event recorded. If you ask for the diffs on "src/foo.c", all Git can do is look at each commit and see if "src/foo.c" was modified in that commit. That's insanely slow in a big repo. And it completely ignores the fact that src/foo.c got moved to src/libc/foo.c years ago and there is a different src/foo.c that is completely unrelated. There is an option to try and intuit the renames when you are spitting out diffs, but no one uses that because it's even more insanely slow.
Git is basically a tarball server. Calling that a source management system is an enormous stretch. Calling it a sane and powerful source control tool is just not supported by the facts; calling it "the most ..." is laughable.
Yeah, I get it, Git won. You all lost out on "the most sane and powerful" as a result. Which sort of doesn't matter since everyone thinks GitHub is source management.
> Git has no file object, it versions the repo, not files.
As someone who has used (in a professional setting) version control systems ranging from RCS to SVN to the current crop (git, mercurial, even darcs), the fact that git versions the repository as a whole instead of individual files is a godsend.
You've not known hell until you've had to deal with an RCS/CVS repository with 15+ years of history and thousands of files, each maintaining its own version history (and associated version number!).
I'd gladly take the comparative "slowness" of git when dealing with large repositories.
> Git is basically a tarball server. Calling that a source management system is an enormous stretch.
What, in your opinion, is the definition of a version control system then?
A version control system is an accurate audit trail of everything that has happened in the repository. Every create, delete, rename, every rwx mode change, every content change.
In BitKeeper files work like they do in Unix, there is a (globally) unique name for each file object. Where the object lives in the repository is an attribute of that object, as are the contents, the permissions, who has changed etc.
Here's a super common workflow that's easy in BitKeeper and miserable in Git. I'm debugging an assert. I want to see who added the assert. I pop into the gui that shows me the per file graph and contents, search for the assert, hover over the rev and see that it was done a long time ago. I look in the area above the assert and I see a recent change, hover over that, see the comments and go "hmm, maybe this". Double click that line and I pop into a different gui that shows me the whole commit that contains that suspect line.
Note that because I have a graph per file, I have checkin comments per file. More work for you poor committers but a godsend for us debuggers. More breadcrumbs are more better.
In Git, less breadcrumbs, single commit message. Git wants to go from the rev to the commit, it's miserable to look around in a file and then go backwards to the commit.
When I was supporting BitKeeper, our average response time to a bug report or a crash report was 25 minutes, 24x7. The only reason it was that long was because we were all in North America, so there was a window where we were all asleep. Response time 6am-6pm PST was typically under 5 minutes. And I credit the fact that the tool accurately recorded everything and you could find the history really easily.
Oh, and it didn't slow down as the repo got big. Git is fine in little repos but it sucks pretty hard in big ones. Sucks even worse if you are on NFS. I can dig up benchmarks, we built up a synthetic 4M file repo and ran a bunch of tests on it (it was a modified version of the facebook repo builder, the facebook one had some stuff in it that made Git look incredibly bad, we looked at that and decided that wasn't real world or fair, we took that part out).
> Here's a super common workflow that's easy in BitKeeper and miserable in Git. I'm debugging an assert. I want to see who added the assert. I pop into the gui that shows me the per file graph and contents, search for the assert, hover over the rev and see that it was done a long time ago. I look in the area above the assert and I see a recent change, hover over that, see the comments and go "hmm, maybe this". Double click that line and I pop into a different gui that shows me the whole commit that contains that suspect line.
I may be misunderstanding that particular workflow (I'm sadly unfamiliar with bitkeeper), but this seems like a workflow that I accomplish relatively often with the use of tig[1].
On a separate note: thank you for the wonderfully detailed reply. It's such a pleasure to have an opposing view be so thoroughly explained.
Indeed, there are many tools to do that with git. I use vim with vim-fugitive (:Gblame, then o to open the commit I want to look at), but most IDEs do that too.
Heh, no worries dude (or dudette), I have dealt with the trolls. You can't post that the sky is blue without getting the trolls telling you are doing it wrong.
Every day humans make me again realize that I love my dogs, and respect my dogs, more than humans. There are exceptions but they are few and far between.
> In Git, less breadcrumbs, single commit message. Git wants to go from the rev to the commit, it's miserable to look around in a file and then go backwards to the commit.
A thought not universally shared. Intel was our biggest customer and when they saw the quality of the breadcrumbs produced by our gui check in tool vs the command line checkins they were smart enough to push hard that everyone used the guis.
I get why, as a dev, you want git commit -m'Fixed bug', but as the debugger guy, the reviewer guy, anyone who reads the code, that's a horrible thing to do to those readers.
Who, if you wait long enough, will be you. And I'll laugh my ass off at all the lazy committers who really could use more breadcrumbs when they have to debug their code later.
Been there, done that, I haven't worked with people that lazy in decades.
edit: since HN won't let me extend the thread, let me reply to the comment below because BK does do something special.
The GUI for checkins presents you with a list of files, a place to type comments, and a big pane that shows the diffs. You type in comments for the first file, go to the next, type in comments (yes, there is a way to say use the previous comments). As you move from file to file, the bottom pane shows the diffs for that file so you can see what changed in that file.
The special sauce, that Git most definitely does not have, is when you get to the last file, which in BK is the ChangeSet file: this is where you would type the commit message. What are the diffs? There aren't any, so we stuff in all the comments you just typed on the individual files. What does that do? Well, on the files you are usually typing in details of how you did this or that; when you get to all of those comments, you naturally uplevel and type in why you were doing all that.
It dramatically increases the usefulness of commit messages. That's why Intel pretty much mandated the use of the gui checkin vs the command line checkin.
To anyone reading, bitkeeper does nothing special in this regard. Enforce commit message standards, plenty of platforms and systems built around git (and literally every other SCM) have support for this.
It's possibly the least interesting and least unique selling point an SCM can have. It's really funny that you keep bringing up this example around the thread.
EDIT: To respond to the above edit:
Again, he's Proving The Point. Commits should be atomic. If you have to individually comment on file changes then the correct thing to do would be to put those in their own commit, no? I'm not really sure what's being described is necessarily a cool feature, but rather a way to avoid making sure your changes are truly related. I honestly don't see the point. This seems like a feature that was written because of the decisions that were made into how BitKeeper works internally, not because it's a fantastic idea. You can get the same thing with atomic commits in git. You comment per file, because BK tracks changes per file. Git does not do this. You should be making your commits atomic because git is tracking the actual content. Atomic commits will accurately describe what's being changed, and then of course those all get lumped together in a patch/PR.
Git isn't lacking the feature you're describing, it just kind of is there without any extra data tracking required, because it's not making up for technical design decisions.
> As someone who has used (in a professional setting) version control systems ranging from RCS to SVN to the current crop (git, mercurial, even darcs), the fact that git versions the repository as a whole instead of individual files is a godsend.
This. A thousand times this. A change isn't a single file, it's likely a changeset of multiple, sometimes hundreds or thousands of files. Most often, you need to know the entire changeset, not just what happened with one file. This is especially true of large-scale refactoring where you're changing the public interface of something. Depending, that can have a far-ranging impact and you want to see that history all together.
You're talking to the guy that created that concept [1]. I get that changesets are cool :) You're right, you do want to see all that info together.
But lots of times you want to look at the file view, find the line of code that looks like the problem, and then zoom out to the changeset. BK makes that trivial, Git makes that miserable to impossible.
[1] Actually there was a little known system, Aide-de-camp, that one of my people told me about that had changesets so I didn't invent it, but I reinvented it. And made the world aware of the concept. Back when you could search Usenet via dejanews you could search for "changeset" and date limit it to before me talking about it. There were maybe 5 hits. A few years after BK came along there were 100's of thousands of hits. So I wasn't first but I am definitely the reason that you know what a changeset is.
Thanks for your contributions to better SCM design through BK. I agree with the potential value of tracking files and branches as well as commits and being able to easily navigate through that space -- as well as the emphasis on adding more metadata to make future understanding easier.
It's unfortunate we don't have a better funding model for FOSS development (or alternatively a basic income) because otherwise BitKeeper might have been open from the start -- and then we might have avoided the limitations of git as a hasty workaround for licensing issues.
> But lots of times you want to look at the file view, find the line of code that looks like the problem, and then zoom out to the changeset. BK makes that trivial, Git makes that miserable to impossible.
This criticism doesn't ring true for me. What are you saying is missing in the Git experience?
In Git, I would use `git blame` to determine which commit contributed the problematic line in question. It displays the file along with the commit that most recently modified each line. At that point, I know which commit last changed the line (and the commit is the changeset). Aren't we done?
If I need more history, I can use `git log` on the filename to see what commits have changed that file over time, and I can inspect how each commit individually changed the file if needed. There are editor-integrated tools to walk back through this history easily.
Looking at a file, and then looking at the commit which last modified a specific line in that file, is a trivial operation in git that I do on a regular basis.
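(For concreteness, a minimal version of that workflow; the filename and line number are hypothetical:)

    # which commit last touched line 120 of parser.c?
    git blame -L 120,120 parser.c       # prints the commit hash for that line
    git show <hash>                     # zoom out to the whole commit/changeset
    git log --follow -p -- parser.c     # or walk the file's full history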
On the largest repos I've worked with `git blame` takes a second or two: long enough to be annoying and make me wish it was a bit faster, but still fast enough for interactive use and well short of the threshold where I would be tempted to context switch after invoking the command. On most repos it's perceptively instant.
I suspect on a HDD `git blame` would be unbearably slow on anything but the smallest of repositories, but it has been many years since I last worked with source code stored on a HDD.
That's true. I'm not the OP, but changesets get in the way when using git for SCM where the typical workflow is to merge between two branches at the level of subdirectories.
In my company people simply use checkout to apply changes from another branch, but that doesn't handle, for example, file removals. It also creates a completely different commit, so the branches diverge more and more, making things like rebase more time consuming.
It is a huge pain, one which would not exist if we used SVN as a backend, for example.
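(For readers following along, the checkout-based approach being described is roughly this, with hypothetical branch and directory names:)

    # copy one subtree's state from another branch; no merge ancestry recorded
    git checkout feature-branch -- src/module/
    git commit -m "Sync src/module from feature-branch"
    # files deleted on feature-branch are NOT removed here, and the new commit
    # shares no merge parent with feature-branch, so the branches keep diverging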
I agree that that's not well supported, because it kind of goes against the grain of git thinking about project versions rather than file or directory versions.
What you should probably do in this case is use git merge nevertheless, but reset all changes outside of the directory of interest before committing the merge. This way, you get the history of the merge in the DAG, which will make git's merge resolution work.
Unfortunately, I'm not aware of a built-in way of doing this.
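(Something like the following should work, though; names are hypothetical, and files newly added by the merge outside the directory of interest may still need to be unstaged by hand:)

    git merge --no-ff --no-commit other-branch
    # revert every path outside the directory of interest back to HEAD
    git checkout HEAD -- . ':(exclude)src/module/'
    git commit   # the DAG records a real merge, so future merges resolve sanely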
What do you mean when you say "look at the file view"? I read that as just looking at an annotate view and then finding the changelist that the change was part of.
However, this is something every single version control system can do, so surely you were referring to something else. Could you explain what operation you were referring to?
So BK is kinda weird in that the metadata that binds all the files together in a commit is just another version controlled file.
We built a GUI tool that lets you look at a versioned file, it shows you the graph in the top pane and either diffs between two versions or the contents of a particular version in the bottom pane.
It's the go-to tool for figuring stuff out. It is not just a GUI version of "git blame". When you use it you can see the history of each line: hover over a line and you get a popup that shows the checkin comments for that line. And it is fast, as in below human reaction time, so you actually use that feature.
And you can double click on any line and boom, you are looking at the changeset that introduced that line.
I'm tired so I'm probably not doing a good job explaining this, but we supported commercial customers for a couple of decades and we had just incredible response time on each issue, and I credit this workflow for that. Someone would call and say "I have this assert", one of us would get into the gui, start looking, and we would know the cause of the problem in seconds or single-digit minutes, and I don't mean 5-10 minutes, I mean 1-2 minutes.
Maybe I'm clueless and there is a way to do this in git, but I haven't found it. When I have to work with git repos I fast-export them into BK just so I can have a saner way to look at the history. It's not great history, but it's better than Git's.
Edit: I didn't explain what that gui does on the ChangeSet file. It's what gitk is: it shows you the repo graph. You can click on a node and see the commit, or left-click and right-click two nodes and see the diffs between those changesets.
So far as I know, BK is the only system that puts the metadata in the same system as the user data.
SVN does version the whole repository, so a revision can contain any number of changes to files, but it tracks each file's history too. So an SVN revision is a collection of changes to any number of files/directories.
When I make a change, I want to record that I changed 3 files simultaneously. Not that I intended the system to work with only the first file changed.
I don't claim that Git is perfect. But most of the problems are solved by better UI (Git does track when files are renamed, created and deleted), not by wrecking what a commit is.
Git understands commits. The CVS-style 'every file has its own history' model is wrong. Git's not perfect, and I would not be too bothered if Mercurial beats it long term, but my god, if I had to go back to a CVS-style flow with its concept of multi-file checkins, I would quit my job.
> Git does track when files are renamed, created and deleted
Actually, no, it doesn't. Git tracks just the before/after state: before the commit these files existed, after the commit those files existed. It infers creation/deletion/rename, when necessary, by comparing these two (or more) states.
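(You can watch that inference happen: the very same commit renders as a rename or as a delete-plus-add depending on detection flags. Filenames here are hypothetical:)

    git mv old_name.c new_name.c && git commit -m "rename"
    git show --stat -M HEAD             # shown as a rename (similarity detected)
    git show --stat --no-renames HEAD   # same commit shown as a delete + an add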
I think it's interesting; it checks off a lot of my boxes. We wanted to do a product that we called "software dev in a box" that was SCM, bug db, wiki, etc. Fossil comes closer to this than we did. So that part is really cool.
I'm not a fan of using a DB to store versions. It's just not the right tool. Before we open sourced, we jealously guarded "the weave", which is how the history data is stored. The weave gives us so much: bk blame is instant, bk grep is instant, there is a "bk grep -R" that will look in all versions of a file that is instant, or you can do "bk grep -R<revs>" and look in just those revs, all instant.
The weave is compact, fast, merges better; it's just a better storage format than a DB. Here's an example. In most version control systems, let's say there is a 100 line file. I clone that repo and I modify the first 51 lines of that file. You clone the same thing I cloned, and you modify the bottom 50 lines of that file. So we have 99 unique lines and 1 line that we both modified. Now Joe Merge clones my repo and merges your repo. He's the guy that closed the DAG. He had to manually merge the one line that we both changed, so when you do $SCM blame the correct answer is: the top 50 lines are me, the bottom 49 lines are you, and the manually merged line is Joe, right?
That's what happens in BK. It's not what happens pretty much anywhere else. Either the entire top chunk or the entire bottom chunk will look like it was done by Joe. Why? Because everyone else passes data by value; BK (the weave) passes data by reference. Everyone else copies the data across the merge point. BK does not: the only new data that will be in the merge node is the one line that Joe had to merge by hand.
This can have some space savings implications, which can be a big deal for big files, but in my opinion the far bigger implication is blame. Joe merged in your stuff and now your stuff looks like he wrote it. Someone is tracking down a bug and they should be talking to you but they are talking to Joe.
> That's what happens in BK. It's not what happens pretty much anywhere else.
I'm not sure I understand your example. At least, the way I understand it, both git and mercurial deal with it the way you say BK does. They attribute the first 50 lines to you, the 51st line to Joe if it looks like neither what was in your version nor the other's, and the last 49 lines to the other.
Didn't intend for it to come across as personal, just pointing out that it would be very hard to make unbiased statements about git when you have a dog in the fight.
It would be like the Myspace founder/creator badmouthing Facebook and claiming Myspace was still superior and that Facebook's way of doing things is insane. Whether or not the claims are legitimate is irrelevant in light of the fact that your competitor squashed you and may have made you bitter.
>Git has no file object, it versions the repo, not files.
Which is the correct thing to do.
>if you have a graph per file (like BitKeeper does).
Which is insane to do because you're basically making things more complex than they need to be, which is why merging in git is not only SUBSTANTIALLY faster than nearly every SCM out there, it also works a good deal of the time without issue.
>No file object means no create event record, no rename event recorded, no delete event recorded.
Most people do not need these. If you're going to choose to use file GCA's for the reasons listed above then you better have a damn fucking good idea of how much these features are actually used, because the trade offs are ENORMOUS.
> That's insanely slow in a big repo.
And also not really done that often. So you know, good job choosing to optimize for things no one is going to use on an astoundingly consistent basis or even needs to be super duper speedy in the first place.
>You all lost out on "the most sane and powerful" as a result.
Your concerns are misplaced? Misdirected? Git does things a certain way and people use it because people were tired of SCM's that decided to fix problems no one really cared about, and then do the things that developers do care about very poorly. You actually demonstrated this pretty thoroughly in your own post while attempting to call, presumably, BitKeeper "sane."
>Calling it a sane and powerful source control tool is just not supported by the facts
I'm sorry, feel free to tell every other developer in the world, including the ones that are involved in far more collaborative efforts than your work requires, that the tool they're using is just not sane. I guess being one of the most used SCM's in the world, on one of the biggest OS projects in the world aren't really relevant facts into how "sane" an SCM is. I guess that's totally why bitkeeper used to be sold and now is open source.
Your claim about merging is demonstrably false, though. Using a much closer file GCA instead of the repo GCA is better. BK does that and automerges, correctly, more frequently, and is way way faster than Git.
I've written two source management systems. I'm confident in my knowledge. Arguing with some random dude who thinks he knows more than me is not really fun. So go enjoy Git. Lots of people are too busy/whatever to know what they are missing, maybe that's you. It's not me, I kinda like my audit trail to be accurate.
Even Linus admitted to me in my kitchen that Git's audit trail is lossy. But go enjoy Git, I'm glad it works for you. Knock yourself out.
That doesn't really explain why Perforce and Mercurial seem more popular, nor the lack of current buzz around the project. I could be wrong, but I can't say the mind share is very high.
>BK does that and automerges, correctly, more frequently and is way way faster than Git.
Demonstrate it. You said it was demonstrable so presumably your Totally Sane SCM project should have evidence to back this up.
>Arguing with some random dude who thinks he knows more than me is not really fun.
Why because you basically made a complete fool of yourself?
>Even Linus admitted to me in my kitchen that Git's audit trail is lossy.
You might describe it as FUZZY, but not "lossy." Nothing is being "lost." My random internet dude, that is actually an insane assertion to make, especially the anecdote about Linus being in your god damn kitchen. Not that it really changes anything about the characteristics of any SCM and why people use it.
>But go enjoy Git, I'm glad it works for you.
Enjoy your dead project. Glad it could be surrounded by a graveyard of other Totally Sane (C) SCM's who are collectively responsible for an untold amount of wasted man hours and licensing fees.
Linus, in my kitchen, surrounded by impressed women who were asking how he lost weight. He had just answered the question "did you give up drinking" with a "hell, no!"
Funny story, I invited Linus to the pig roast and he didn't RSVP so there wasn't anywhere for him to sleep. I ended up sticking him and his daughter in a VW popup van :)
You can kindly wander off now, random internet dude. Especially since the guy who wrote Git agrees that it is lossy, so take your FUZZY and go home.
People used git because it was a tool that solved some SCM headaches. People use it now because it solved some headaches and because of the network effect.
I think Git won because it was superior to CVS and SVN. I don't think it won because it was better than BitKeeper, Fossil or Mercurial.
Lastly, does anybody have experience with git merges and BitKeeper merges? You make it sound slow, but it also sounds like you've never used it.
You can go look at BK's merge alg, it's quite cool. And the basic implementation was done in about 20 minutes. And it is extremely fast.
I should write up a blog post about it; it's pretty complicated to understand because, to get it, you need to understand SCCS's interleaved delta format. If you understand that format, then imagine that you put a line number in front of each data line in the weave. Check out the GCA, local, and remote versions of the file with those line numbers prefixed. Now run a 3-way merge on that.
All the complexity in smerge.c comes from dealing with the cases where that doesn't work, but man, it works great 99% of the time.
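(A very loose way to get a feel for the shape of that with stock tools; purely illustrative, since here the line numbers come from nl rather than from stable per-line serials in the weave, which is the part that makes BK's version actually work:)

    nl -ba gca.txt    > gca.num       # prefix each line with a number
    nl -ba local.txt  > local.num
    nl -ba remote.txt > remote.num
    git merge-file -p local.num gca.num remote.num   # 3-way merge on numbered lines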
I've never used BitKeeper so I can't say if the statements he made are true or not, but arguing that something is good just because a lot of people use it is most definitely not a valid argument.
The computing world is full of absolutely terrible technologies people keep using even though much better alternatives exist. At first I considered listing a few, and then I realised that most readers are likely users of one of them, but I'm sure you can think of a lot.
We took too long to open source it, people didn't like it being closed source and used for the Linux kernel. RMS was hugely butthurt that we had the best system and all open source had was CVS and SVN.
Whatever, it paid the bills nicely for almost 20 years.
>We took too long to open source it, people didn't like it being closed source and used for the Linux kernel.
And for things like this:
> I didn't want to do anything that even smelled of BK. Of course, part of my reason for that is that I didn't feel comfortable with a delta model at all (I wouldn't know where to start, and I hate how they always end up having different rules for "delta"ble and "non-delta"ble objects). [1]
Most sane? That’s a matter of perspective. I’m still a little shocked that git “won out” over mercurial. Even as Subversion was eating CVS, everybody knew distributed revision control was going to ultimately prevail. I was pretty sure Darcs wasn’t going to achieve popularity, but I’d have bet anything that Mercurial would be the successor to Subversion. It was far more natural / similar for anyone who's ever worked with CVS/SVN.
If you don't need a distributed system, I'd argue that Subversion is at least as sane as anything else, at least at the plumbing level, once you realize that it is not so much a VCS as essentially a remote file system with atomic multi-file operations, automatic sequentially numbered annotated snapshots after write operations, and near zero cost copy on write copying.
You establish a naming convention on top of that to use it as a VCS. For example, one common such convention is to have directories named "trunk", "tags", and "branches" at the top level, with your projects living in subdirectories under "trunk". Under this convention the way you represent a tag is by simply copying your project directory from trunk/my_project to a new directory named tags/my_project/tagname. Similar for branching...just copy to branches/my_project/branch_name.
Don't like that convention? Develop your own that fits your work better.
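(Under that trunk/tags/branches convention, tagging and branching are both just cheap copies; project and tag names here are hypothetical:)

    # tag: a copy-on-write snapshot of the project
    svn copy ^/trunk/my_project ^/tags/my_project/1.0 -m "Tag release 1.0"
    # branch: the same operation under a different directory name
    svn copy ^/trunk/my_project ^/branches/my_project/feature-x -m "Branch for feature x"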
> If you don't need a distributed system, I'd argue that Subversion is at least as sane as anything else
Oh my yes. Binary diffs were lovely. Also, support for large assets.
For reproducible data science work, you’re going to need at least code and data. One of those things is a total PIA with git if the data aren’t trivially small.
Seriously, I find git very non-sane if not quite insane. I really liked darcs and still prefer Mercurial but sometimes the value attained by adopting standardisation overcomes the extra value in a "better" alternative, and so… here we are.
This is a great question. Where I work we use TFSVC and the VP of Engineering is willing to green-light a switch to Git as long as we can make a data-driven, objective case for why Git is better. In other words: "the devs prefer it" or "using Git makes me happier" don't wash as valid reasons. Branching in Git is certainly much easier but the counter that "you can branch with TFSVC too" is true, even if it's slower and eats up your hard drive space faster...
That's not necessarily true. Everyone knows at least one dev who would prefer the world's stupidest workflow. And where it is true, it can directly contradict other priorities, like 'the devs prefer to check bug fixes directly into production because it saves them the work of staging'.
I think it's fair for someone to explain why he doesn't use Git when he probably gets that question a few thousand times a year. He's not the one that posted it to HN.
I said the article does, not the HN post. And yes, explain that YOU made Fossil in that case... it's probably one of the biggest reasons SQLite uses Fossil. Way above all others.
Now, the reasons why you created Fossil, that interests me. But this is not the case. It's an ad.
But then, on that page, explain that the author wasn't happy with Git and wrote their own. It's as if I post an article on why LibreOffice is better than Microsoft Word and it turns out I'm the founder of LibreOffice or something. Of course I'd think it's better and of course I know just the right things to mention to convince everyone, because I was there to build them or check them in.
> There is no significant way in which I found Pascal superior to C, but there are several places where it is a clear improvement over Ratfor. Most obvious by far is recursion: several programs are much cleaner when written recursively, notably the pattern-search, quicksort, and expression evaluation.
While the author goes about comparing Pascal to C (and Ratfor), he fails to disclose his affiliation with being the author of C.
"Why Pascal is Not My Favorite Programming Language" is an article by Brian Kernighan. Kernighan didn't create C. Dennis Ritchie did. And between Ken Thompson, Kernighan, and Ritchie, Kernighan had the least to do with the creation of C.
What Kernighan did do is write the book on C. (You might argue that it's a conflict of interest, too, but it's not. It makes a lot of sense that someone would like a programming language so much that they'd decide to write a book on it.)
It seems perfectly reasonable for someone who created an alternative to explain why they did it. Maybe the page should be titled "Why sqlite dev(s) created Fossil", but it certainly doesn't come across as disingenuous to me.
Because they don't want to come across as pushing their own product, I guess. I saw a recorded talk by the Fossil creator a couple of days ago where he talked specifically about git and how it could be improved. He was extremely reluctant to name Fossil explicitly, although he stated upfront that he has developed similar software.
There are reasons to dogfood that aren't 'does it pass integration tests'. Stuff like 'is it too hard to use this feature in a real way', and 'performance bottlenecks that you might not have thought of'.
Good list, but I feel like they bury the lede: integrated wiki, issues, and notes. That doesn't scale to every project (linux kernel for example), but for a lot of projects it's really cool.
Other selling points from the Fossil site:
* Integrated Bug Tracking, Wiki, and Technotes
* Built-in Web Interface _and_ Self-Contained (I combined these)
* Simple Networking (no git://, just HTTP and SSH)
* CGI/SCGI Enabled
* Autosync - "Fossil supports "autosync" mode which helps to keep projects moving forward by reducing the amount of needless forking and merging often associated with distributed projects."
* Robust & Reliable - "Fossil stores content using an enduring file format in an SQLite database so that transactions are atomic even if interrupted by a power loss or system crash. Automatic self-checks verify that all aspects of the repository are consistent prior to each commit."
It would be easy to get into comparing git and Fossil feature by feature. That's interesting. But it's more interesting to compare the philosophies between the two tools.
I think it’s a philosophical difference that the Git maintainers likely have a strong position with. Some people like having one piece of software that does everything in an integrated way, and some people like purpose-built tools that focus on doing one thing well.
Git is a version control system; it’s not a complete project management solution. You can build integrated solutions around it, like GitHub, but if GitHub’s issue tracking is too primitive for you and you want to use Jira instead, that’s super easy.
I actually favor that approach myself. If I wanted to use Fossil but their issue tracker didn’t work for me, the best case scenario is that I just integrate Fossil with Jira or whatever and just haul around this vestigial issue tracking system that I don’t use. Likewise, if I wanted to use Fossil’s issue tracking but not their version control, what then?
> Git is a content-addressable filesystem. Great. What does that mean? It means that at the core of Git is a simple key-value data store. What this means is that you can insert any kind of content into a Git repository, for which Git will hand you back a unique key you can use later to retrieve that content.
That's a great thing for matching the mindset of someone writing operating system software.
Fossil is on top of a relational database... which is also a great thing for matching the mindset of someone writing a relational database.
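(The quoted key-value behavior is easy to see firsthand; run this inside any git repository:)

    # store arbitrary content, get back a key
    echo 'any content at all' | git hash-object -w --stdin   # prints the SHA-1 key
    # retrieve the content by that key later
    git cat-file -p <that-sha>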
Interesting and possibly relevant to note that SQLite is not your typical opensource project:
"Open-Source, not Open-Contribution: SQLite is open-source, meaning that you can make as many copies of it as you want and do whatever you want with those copies, without limitation. But SQLite is not open-contribution. The project does not accept patches. Only 27 individuals have ever contributed any code to SQLite, and of those only 16 still have traces in the latest release. Only 3 developers have contributed non-comment changes within the previous five years and 96.4% of the latest release code was written by just two people. (The statistics in this paragraph were gathered on 2018-02-05.)"
If we accept patches, then the person who has submitted the patch owns the copyright on that patch, and that means the software is no longer completely in the public domain. Unless, of course, the patch submitter has filled out a lot of legal paperwork to dedicate their code to the public domain, which rarely happens.
> In order to keep SQLite completely free and unencumbered by copyright, the project does not accept patches. If you would like to make a suggested change, and include a patch as a proof-of-concept, that would be great. However please do not be offended if we rewrite your patch from scratch.
The argument that "Git lacks native wiki and bug tracking" is actually a very good thing: each of those 3 things (scm, wiki, bug tracking) is very different, and an scm must not bring a process/management system with it. Obviously, it's often nice to have all the things packaged, and GitLab does it amazingly well (maybe does too much these days though), but at some level of process and IT management, wiki and issues have their own tools already and should not depend on the scm.
"The mental model for Git is needlessly complex and consequently distracts attention from software under development. A user of Git needs to keep all of the following in mind:
The working directory
The "index" or staging area
The local head
The local copy of the remote head
The actual remote head
Git contains commands (or options on commands) for moving and comparing content between all of these locations.
In contrast, Fossil users only need to think about their working directory and the check-in they are working on. That is 60% less distraction."
I don't think about any of this when developing. I check out a branch, work on it, commit to it, and push it back up. If the fix is larger than a few commits I'll make a feature branch. What's so hard about that?
Also, everyone using git isn't a bad thing. It means we finally have at least one standard in development.
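(That everyday loop, spelled out with a hypothetical branch name:)

    git switch -c fix/login-timeout        # branch off
    git add -p && git commit -m "Fix login timeout handling"
    git push -u origin fix/login-timeout   # push it back up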
You get git. Good for you. A lot of smart people get git. A lot of smart people don't. I'm a smart person. I use git every day, 7 days a week. On my side projects, it's fine. Never a problem. At work, the workflow is more complicated, due to many branches, and many submodules. Submodules are a pain in practice. I think the mental model is too complex. This manifests as frequent unintended results, from simple operations that in my mind need not have unintended consequences. I never had any such problem with Perforce, even though I have used git twice as long. Your mileage obviously varies.
I'm sorry, I don't buy this at all. Git is one of the most simple source control tools there is. If you can't understand a DAG then there isn't much else you probably can understand in the development world.
>This manifests as frequent unintended results
No it doesn't. Every time I've seen people complain about "unintended results" it's literally been because of the above, and they've been complete morons so I'm never surprised when these people have "trouble" with git. You're building a graph, and you're doing pretty basic manipulation of that graph.
>I never had any such problem with Perforce
Uh, what? Permission issues? Terrible branch performance? Merging between branches is basically a gamble; it's actually insane how much this used to mess up over the most basic of merges. Having to "upgrade" the system? Ever done that? I'm going to guess no.
I can run circles around most of my fellow developers working with Git. I know how to recover from almost anything I do with Git. However, I put in a lot of hours to get there. I have nothing but distaste for Git's command line. It's an abomination. The command line flags are inconsistent and require real work to commit the right combinations to memory. Anyone whose solution to this is "You are an idiot for getting confused by this confusing UI" is just flat out wrong. There are better UIs that are less confusing with the same feature set as Git. Git won not because it was that much better than the others. It won because the Linux kernel uses it.
Right now today, there are developers out there who are putting in the work to learn git. They are smart and dedicated to their craft. They are also losing work because git makes it easy to shoot yourself in the foot and because it doesn't easily surface to you how to recover when you do so. It's not because the developer is an "idiot". It's because git's UI is user hostile and does a poor job of reflecting the core concepts in operation when the user is using it.
To be fair to Git, they've put a lot of effort into fixing this. It used to be far worse than it is now. But due to backwards compatibility and other concerns there are still a lot of confusing and unintelligible options out there, and that is even after you get the whole concept of a DAG and the index vs the working tree.
> Git won not because it was that much better than the others. It won because the Linux kernel uses it.
This is a pretty silly assertion to make. You're actively ignoring the SCM's that were chosen by far larger org's. Git took the dev world by storm because it was better. Not because an open source OS used it. Other SCM's had years worth of developer training and interaction with their systems and STILL lost. Years worth of sales connections, demos, networking, you name it and STILL lost.
>They are also losing work because git makes it easy to shoot yourself in the foot
Most people can get the quickie commands down pretty fast, and as long as there are no complaints or errors, it's all OK. But as soon as you need something that takes several steps, your average git user falls apart. Rebases are the most common example. If you don't block force pushes from the get-go on a project used by more than two devs, you'll learn that you need to pretty quick.
Exactly. Many users get confused, and a quick look at the man page shows them a magic --force option that seems to do what they want. Not to mention that we sometimes encourage the use of --force when we are doing pull requests, since we want them to push a squash of all the different commits. It's like a tragedy of the commons playing out in multiple dev teams the world over, and it didn't have to be that way. The problem is no one took the time to devote some UX attention to the git command line until it was too late.
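(Both halves of that are addressable with stock git, assuming you control the server-side repo; hosting platforms expose the same idea as "branch protection":)

    # in the shared bare repository: refuse history-rewriting pushes
    git config receive.denyNonFastForwards true
    # contributor side: safer than bare --force; fails if the remote has moved
    git push --force-with-lease origin my-feature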
Every one of these is an example of the git UI/UX confusing you about what is happening. Reflog allows you to recover if you know about it, but if you don't, you will feel betrayed by the tooling. Blaming the developers for not knowing what Git was going to do is ignoring git's bad UI/UX decisions. Mercurial, for instance, doesn't have this problem. In fact it tells you whenever you do something that might lose data that it made a backup and where it put the backup. Everything the developer needs to know, right there, after they accidentally shot themselves in the foot trying to do their job.
> Other SCM's had years worth of developer training and interaction with their systems and STILL lost. Years worth of sales connections, demos, networking, you name it and STILL lost.
You are comparing Git to high cost options like Perforce, which lost because they were expensive, and open source doesn't like expensive proprietary solutions in a field where open source rules. I'm talking about solutions like Mercurial, which have a similar underlying model and a better user experience out of the gate. Git beat hg not because it was better architecturally; they have the same underlying data model. It won because the Linux kernel used it and that gave it the necessary cachet and authority by association. This is not to say git is bad: git is worlds better than Perforce or CVS or SVN in many ways. I'd rather use git than most of those in most situations. But I would also rather use hg than git any day, because it's the same data model with a UI that doesn't mislead me.
>Everyone of these is an example of the git UI/UX confusing you about what is happening.
Um, I don't see it. Your first 2 links are the same question. All "3" examples basically give you a single command to run to get your work back. What's confusing?
> Mercurial for instance doesn't have this problem.
WHAT? Mercurial actively changes the underlying data objects it stores and there is NO WAY to get them back. Rollbacks actively delete content and you CAN'T restore them, shuffle them around, or anything like that. You do have the Journal extension, but it's not a part of the core product. And people definitely haven't had problems with extensions before ;). See: https://book.mercurial-scm.org/read/undo.html
>In fact it tells you whenever you do something that might lose data that it made a backup and where it put the backup at.
So does git. Every operation that Mercurial gives you a warning about git does as well. Even better, git still gives you a way out even if you do the thing you shouldn't have done. It warns you in the prompt, it warns you with comments in the text editor of your choice that pops up for input.
Is there some operation you can give an example of that git lets you do that Mercurial somehow magically prevents or gives you a way out of?
>You are comparing Git to high cost options like Perforce
And what about the other opensource SCM's that lost?
>They have the same underlying datamodel.
They do not.
>I'm talking about solutions like mercurial which have a similar underlying model and a better user experience out of the gate.
It definitely does not. Not sure what you want, but I'll take practical solutions, like git being able to handle partial checkouts, branches, and blames, over a nice GUI. Git may have inconsistent command flags, but it still does what you need it to.
>use hg than git any day because it's the same datamodel
Please don't be that guy. You saw people having issues with git. Have you tried explaining why they ran into those problems and tried to find out where the misunderstanding came from? It's easy to criticise what you already know. But everybody's experience is different. There's a practical issue with mapping every command to each copy of the DAG that people sometimes don't get.
If you haven't learned about urbit yet, you could try that for the same "I don't know what you're talking about" experience in your life.
> Have you tried explaining why they ran into those problems and tried to find out where the misunderstanding came from?
Of course? I'm also free to call them idiots online.
>There's a practical issue with mapping every command to each copy of the dag
That is certainly the most valid criticism of git, in my opinion, and certainly the biggest learning curve for most people. But that's not where people tend to have true-blue operational issues, in my experience.
This is why some devs shouldn't design UIs: for many people their source control is simply a UI that they want simple, with simple workflows. I know people with PhDs who just don't understand git, not because they can't understand a DAG; they simply don't want to have to understand it in relation to a tool. The biggest thing with Git is you have to invest time into understanding it. That alone is one of its failings, and hence why there are so, so many guides that try to give advice to people who don't really want to understand it.
>they simply don't want to have to understand it in relation to a tool.
There isn't a way around this with any tool, especially when dealing with SCM. You have to understand how your selected tool is going to perform certain operations, because it's going to dramatically affect your workflow with that tool.
>The biggest thing with Git is you have to invest time into understanding it
Just like you have to do with every other SCM, or just generally any piece of software?
Git doesn't do that. Perforce and TFS definitely will. The very moment I add something to the index it's basically recoverable. I'm not sure what you're really getting at here.
>So I'd like you to prove that git won't lose work.
It does not "actively" lose work. Losing work in Git is a pretty specific operation and generally requires you to spell out what's going on (either via 'reset' or 'checkout'). And also less likely to happen, since you are regularly using the index to stage changes and should be commiting often (since it's cheap to do, unlike trying to commit in BitKeeper, let's say, where locking problems sometime even require to change the window you're committing in [1]). The moment things are in the index/repository, it's pretty unlikely you're going to lose work.
Say, in other systems, me accidentally clicking something is a surefire way to just wipe it out without a second thought. In TFSVC I can easily undo my active changes with a simple click of a button. Again, the contention was that you actively lose work while working with git. It's hard to do, as deleting work in git requires some fairly specific commands.
Even removing a file requires you to stage the deletion, then commit it. And even THEN, even if you amend some other previous commit with that deletion, you can roll the entire thing back with reflog and get your file back. If I took that amended commit and rebased onto another branch, rolled that all into one commit via squashing, I'd still be able to get back to where I was with the reflog.
If however, you edit some lines in a file and then tell git to checkout that file again and they're not staged, then sure, you're going to lose work. But that's not something that's common and it certainly isn't a recipe for having git "actively" delete work.
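(The recovery path being described looks like this; the reflog entry and SHA are hypothetical:)

    git reflog                    # e.g. HEAD@{3}: commit: Fix parser bug
    git reset --hard HEAD@{3}     # jump back to before the amend/rebase/squash
    git branch rescue abc1234     # or pin a lost commit to a new branch by SHA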
> If the repository is locked, and you try to bk commit, the commit will fail. You can wait for the lock to go away and then try the commit again; it should succeed. If the lock is an invalid one (left over from an old remote update), then you can switch to another window and unlock the repository. After it is unlocked, the commit should work.
If you want correctness then you lock the repo when you are updating it. We do that and make it work on NFS, which is not that easy to do.
If you want to live in a world where two people can be whacking the repo at the same time with undefined results, be my guest, you seem like that sort of person. We are not. We like atomic commits.
As for BK being expensive compared to Git, you are so right. 10 years ago. These days we do quite well and we do it correctly.
I dunno why you have a hardon to smear BK, but bring it dude, I'm happy to make you look foolish.
>I dunno why you have a hardon to smear BK, but bring it dude,
Excuse me? You brought up BitKeeper. You're the one that claimed I've never used a sane SCM, because you think BitKeeper is sane and I clearly didn't include that in my list of the World's Most Sane SCM's list. Added bonus for the claim we're all "missing out" on sanity.
You're actively on this forum basically demonstrating why BitKeeper and other SCM's have lost. You bring up silly things like templated commit messages, random anecdotes that don't technically make any sense, and claim annotating/blaming history in files is hard to do in git.
>These days we do quite well and we do it correctly.
Some guy got so pissed off at your SCM that he made a new one for free, without wasting untold amounts of man hours and capital. And he wasn't the only one (Mercurial). He did it so well that other companies now use his SCM as a cornerstone for their platforms (GitHub and BitBucket).
> He did it so well that other companies now use his SCM as a cornerstone for their platforms (GitHub and BitBucket).
I think this is your fundamental misunderstanding. Git would be another esoteric tool for crazy kernel devs without GitHub. Git won because of GitHub. Very simple. Out in the real world of teams of 12 developers rewriting the same business logic over and over until they retire, Git is GitHub. I've worked with several developers who don't understand the difference, and think that they're using the GitHub client whenever they interact with Git locally (and, if they use GitHub Desktop, they are). All your discussion about git's dominance being a testament to the tractability of the git UI is a false equivalence.
GitHub could switch to BitKeeper under the covers overnight, and as long as they branded it in a non-scary way, very few people would know the difference.
When is it ever the case that general acceptance means "objectively the best" instead of "obviously the path of least resistance"?
I think network effect and the fact that the linux kernel was in git. When the kernel was using BitKeeper that was huge for us, people went "welp, if it's good enough for the kernel it's good enough for us".
And I think, might be wrong, but I think github was first.
No it didn't. GitHub was built because of git's popularity. That's a really silly claim to make and doesn't even logically make sense. The tool was popular, someone built a hosted service for it.
Like think about the insanity of what you're saying. A company was built around some "esoteric" tool, despite plenty of alternatives existing at the time and you think people just whimsically invested in this company because of... what exactly?
You're literally just talking from historical ignorance and making up shit. It's so easy to verify any of the claims you decided to type out yet you chose not to anyway. GitHub believed in Git's popularity, and it paid off.
>I've worked with several developers who don't understand the difference
That's a really neat anecdotal story.
>GitHub could switch to BitKeeper under the covers overnight
No they couldn't. They couldn't even switch to it over a two year time frame. You are making absolutely ridiculous claims with nothing to back it up. BitBucket, owned by Atlassian, has hosted Mercurial repositories. Don't see anyone lining up to switch over to Mercurial.
A really easy claim that is obviously negated by a practical example and you still chose to make it? Legitimately silly.
>All your discussion about git's dominance being a testament to the tractability of the git UI is a false equivalence.
That's not even what that word means.
>When is it ever the case that general acceptance means "objectively the best" instead of "obviously the path of least resistance"?
Don't know if you were around for when git was released but there is now a whole graveyard of SCM's that people actively dropped to switch to git. And BitKeeper was definitely one of them. But I guess those were all dropped because of a future platform that no one knew about at the time. Heavy sarcasm by the way.
> Don't know if you were around for when git was released but there is now a whole graveyard of SCM's that people actively dropped to switch to git.
Indeed I was, and FWIW I distinctly remember everyone throwing their weight behind Mercurial, in large part for its superior cross-platform support.
I stand by my position. git usage would be minor without GitHub. GitHub could switch off git if they wanted to and they'd take most of the git user base with them.
I agree with cookiecaper. We went hard (for us) into marketing before giving up and what we learned by going to dev conferences is that people think GitHub is source management. If you talk about any sort of workflow other than what github provides you can just see their brain switch off.
It's sad, because there are other useful work flows, but GitHub is SCM at this point. I agree with what someone said elsewhere, they could swap out git for bitkeeper and nobody would care (well the people that are still butthurt over the licensing would whine but it's apache v2 now, that should be good enough).
>If you talk about any sort of workflow other than what github provides you can just see their brain switch off.
I'm sorry, what dev conferences are you going to? You're kind of just claiming that these same people are too stupid to understand what git allows you to do out of the box so they wouldn't mind BitKeeper's (or any other SCM like it) problems and limitations as long as GitHub hosted it for them with a nice logo (which again, isn't true because other hosted SCM's solutions lost as well). That's just incredibly tone deaf and doesn't make sense from a historical timeline perspective. This is just straight up denial at this point.
>well the people that are still butthurt over the licensing would whine but it's apache v2 now, that should be good enough
It's not just a licensing issue and you know it. You are being dishonest with everyone here and yourself. There is a historical record in the lkml archives that you're choosing to ignore.
This is fine when you're working on your own project.
However, if you want to send a patch to someone else's project on Github, you have to create a remote fork, download the fork, push your changes to it, and then send the pull request. (I have tried other ways but they don't seem to work.)
This is fairly annoying since it's extra steps and it clutters many people's Github accounts with forked versions of projects they contribute to. There's no good reason for these forks to exist.
But this is a problem with Github and not git itself.
On the other hand, do you really want random people on the internet to be able to inject objects into your git repository by creating a branch or whatever? Especially after SHAttered, that's a huge liability that sort-of justifies the "you take yours and go over there" approach.
Of course not. You want to send them a patch and have it show up in their "inbox" on Github as a pull request. There's no reason to put pull requests into the git repo.
I normally use git commit -a. I use git diff to preview my commit. However, recently I had to git add a file. After that, git diff didn't show the file, even though git commit did include it in the commit.
So, it doesn't take long to realise that you need to understand the index after all. You need to learn git add -N, or use "git diff HEAD" instead of git diff.
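(The behavior in question, plus the two workarounds mentioned; filenames hypothetical:)

    echo tweak >> tracked.txt
    git add tracked.txt
    git diff            # empty: compares working tree against the index
    git diff HEAD       # shows the change: working tree against HEAD
    git diff --staged   # shows it too: index against HEAD
    git add -N new.txt  # intent-to-add, so plain `git diff` now includes new.txt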
In particular, I'm interested in the last section "4.2 Features found in Git but missing from Fossil".
Maybe this is some deficiency of my workflow, but both those things make Fossil sound extremely unappealing to me. You're telling me I have to push all local changes every time I want to push anything? What if I was just debugging or experimenting with one branch, and then I had to switch to a "serious" branch to push some critical changes?
And rebasing is fantastic when you have a feature branch workflow.
A "default" user of fossil here - meaning all my personal projects are done with fossil .. only.
I use git professionally (of course), and while I've used rebase to fit into workflows with github (though I dislike the history garbling it does), I've never bothered with rebase in my fossil projects even though I use feature branches. Just normal merges. I let it fork and merge at an appropriate point. (What I actually miss now and then is "git rerere" and a useful GUI like gitk.)
Some notes on this choice focusing on the differences I actually use -
1. Fossil gives me peace of mind that I have everything (all code/notes/bugs/tags/branches) synced to the server with a single command "fossil sync". If you have autosync, then even better. I never get this peace of mind with git.
2. There're only two files to deal with - the fossil repo file and the fossil executable - which functions as the command line tool, a server for cloning and syncing, and minimal GUI.
3. I like the tagging system in fossil more. Since you can reuse tags unlike git, I just use a single "release" tag to mark code points pushed out .. with the commit containing details. I similarly use tags to mark points in the commit tree to revisit later (for example) - across branches. In fact, branches are just recurrent tags in fossil, so one less concept.
4. More peace of mind 'cos I can't leave a "dangling commit" that will be "garbage collected". Since I can't leave an unreachable commit, if I want to stash something, I just commit it with full notes, update to an earlier commit and "let it fork". (fossil does have stash, but I don't bother with it as .. less peace of mind).
5. I can customise the bug system for different uses.
What is wrong with your debugging or experimental branch being pushed to the repository? Someone might find that code useful for something else later. And after all, all the experiments are part of the project history, which fossil strives to keep unchanged and as it actually happened.
The same goes for rebase: the difference is that while git keeps the history "as the developers want it to be", fossil keeps it "as it actually happened".
Fossil is chauvinist if it's asserting that everything I save locally "is part of the project history" and must be pushed. Don't tell me what's important for me to push.
Git preserves history as it happened on the remote, which is what matters for collaboration. Why foul up the canonical history with what amounts to scratch paper? Typically, people only amend local history if they feel it would be easier to follow later. Why take that option out of their hands?
The way I do work and the way I submit it are two separate problems with overlap. I use git for both and it works beautifully. Even for personal non-code projects I never intend to collaborate with others, I still use git because it promotes a workflow I'm comfortable with.
On the team I'm in, the convention is to include code changes in the same commit as any test changes. There are advantages and disadvantages, but it's the convention, and it's best we follow it or else it might lead to CI problems. But in my personal workflow, I tend to change the tests, commit, make the code changes, commit, and then iterate. With git, this is as simple as doing it whichever way I want and then squashing the commits together before pushing (see the sketch below). What do I do with Fossil?
I'm trying to figure out what problem it's trying to solve with this, and it just comes off as hollowly idealistic.
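(The iterate-then-squash step in git terms, with hypothetical commit messages and remote branch:)

    git commit -m "tests: cover timeout case"
    git commit -m "fix: handle timeout"
    git rebase -i origin/main    # mark the follow-up commits as "squash"/"fixup"
    # or, skipping the editor entirely:
    git reset --soft origin/main && git commit -m "Fix timeout handling"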
> Git does not track branch history. This makes review of historical branches tedious.
This isn't true. It's just the Fossil example has a better UI than their github example. Pop open gitk or anything with a nicer UI and you can easily follow branch history.
Yeah, honestly, the other arguments didn't carry much weight with me, but this one I find really annoying. When I'm looking back at history that had a bunch of merges, I don't want to have to guess which branch was which before a merge. It also seems like such an easy thing to add - you can kind of hack it up by just adding the branch name to the commit message with a prepare-commit-msg hook, but it would be really nice if git just kept track of it for you.
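(The hack mentioned above, as a sketch; prepare-commit-msg is the hook that may edit the message, and it receives the message file path as its first argument:)

    #!/bin/sh
    # .git/hooks/prepare-commit-msg: append the current branch to the message
    branch=$(git symbolic-ref --short -q HEAD)
    [ -n "$branch" ] && printf '\n[branch: %s]\n' "$branch" >> "$1"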
Yeah, the git cli in its default settings is very hostile towards that information ^^.
Pull actually works the wrong way around: it merges the remote branch into the local branch and then pushes that as the new remote branch. So if you follow the first parent, you land in the puller's local part and bypass everything he merged.
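(Hence the common advice, if you care about a clean first-parent line on the shared branch:)

    git log --first-parent main   # walk only each merge's first parent
    git pull --rebase             # replay local work instead of merging the remote in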
Don't even need to delete the branch, git reset <newcommit> will reset the branch to the given commit, throwing out whatever was there before. You may want to add --hard to that, depending on what you're doing.
I can easily follow the history of long-deleted remote branches that I never had to begin with... you might need to pull it by ID instead of by name to get it locally, but you can still view the name, history, commits, diffs...
Again, that's not the same. Each commit in Fossil belongs to some branch, because its name and the fact of adding or removing a name is recorded directly in the commit artifact. In Git, branches are local references to commits (local in the sense that they are different for each repository instance, but can be synchronized). In Fossil, you can see which branch name the commit belongs to and whether the parent commit had the same branch name or not; in Git, branches are ephemeral entities which don't belong to commits, and once you delete a branch there's no way to know which branch name a commit belonged to.
i'm just not getting what you mean. if i delete a branch, i CAN still see the branch (and its name) the commits belonged to, even if i never had that branch locally.
i absolutely can, though. i went through this recently, trying to get back to a commit from nearly a year ago in a repo i'd never touched before.
the annoying part of getting it back was that you can't just do "checkout {name}" you have to do it by the SHA ID and then commit again by the original name, but before all that, you can 100% follow any since-deleted branch by its name. there's just a weird disconnect between viewing its history and having it directly in your hands again.
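(The dance being described, with a hypothetical SHA and branch name:)

    git log --all --oneline --graph   # locate the old tip, say abc1234
    git branch alice-fixes abc1234    # re-create the name at that commit
    git switch alice-fixes            # and it's directly in your hands again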
They don't maintain the branch name, or even the history of the branch: they just tell you what the parent commits of a commit are; as mentioned above, this history can be a "lie". Fossil's history is immutable: once a commit is there, it's there forever (although you can prevent an artifact from being synched or displayed with "shunning").
That's not a proper data structure, and you can't point to a random commit after merging and deleting a branch, and ask git to tell you which branch it came from.
Yes, you can. The name is not stored in the history other than the merge commit message but the full history is there and you can ask git to show you the merge commit for any commit.
No, I'm not. You do _not_ need to keep a branch around to have the complete commit history of the branch: the point it diverged from another branch, all the commits that were on it, and the point where it was merged back into another branch. Git doesn't keep the name of the branch in the history outside the merge commit message, but everything else is there.
- Eve has a repository, Alice and Bob have access to it. Eve goes on vacation and nobody commits to "master"/"trunk".
- Alice makes a branch "alice-fixes" and creates commits on it.
- Bob comes and creates a branch "bob-features" from some point from "alice-fixes".
- Bob then merges "bob-features" into "master" and deletes it.
- Alice gets fired and her branch is deleted without merging.
In git, you can't see that some of the commits in the history came from "alice-fixes". Fossil, on the other hand, keeps track of branch names and _changes_ to them in commits:
When Alice created her branch: -trunk +alice-fixes
When Bob created his branch: -alice-fixes +bob-features
When Bob merges his branch: -bob-features +trunk
You can't delete this information, because it's recorded in the commit artifacts. You can't delete branches; you can only close and hide them. (You can also apply edits to commits by adding "edit" artifacts [I don't remember what they are called], but the history is preserved, not modified.)
I think that is just an example of losing the branch name, the commit history is the same in both cases.
If you really want that behavior from git, you can have it, just create a tag for each branch HEAD. Actually I think this is all that fossil does as well.
It's not the same: there's no commit indicating that Bob branched off Alice's branch. It appears as if the whole history before merging belongs to Bob's branch. There is such a commit in Fossil.
> If you really want that behavior from git, you can have it: just create a tag for each branch HEAD. Actually, I think this is all that Fossil does as well.
Fossil has the branch/tag name directly in the check-in manifest (I've linked to the document describing the file format somewhere above). Tag and branch names are embedded directly in the history; they are not separate references to commit hashes as in git.
I'd be interested as well. Of all the reasons, this one (not fit for intended purpose) struck me as the only compelling reason that doesn't come off as an 'and also...' argument (i.e. justifications/rationalisations made after one's mind is already made up).
If I were to hazard a guess, this ability to follow descendants helps you trace the evolution and intent of the current state of the code, given some previous state (how did we get here from there, and what else changed along the way).
In git that's kinda sorta possible with git bisect.
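(For reference, a bisect session looks roughly like this; the tag name is hypothetical:)

    git bisect start
    git bisect bad HEAD       # the current state exhibits the behavior
    git bisect good v1.4      # a known-good older point
    # git now checks out midpoints; mark each one good/bad until it
    # prints the first commit where the behavior appeared
    git bisect reset          # return to where you started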
It seems to be a very common case for Git users too. People often read "git log" or its tree/graph version. Or click on the "commits" tab in Github PRs / branches.
I really like git these days, though I was initially stubborn, having grown up in the age of first cvs, then svn. I design/implement core infrastructure and services for a large emergency-services project. By nature, the team that designs and builds it is especially conservative. It's only this past year I've been able to convince management that we should move to git because it offers advantages over the traditional tool (svn).
What does make me chuckle, however, is seeing git used as a drop-in replacement for svn without any of the advanced branching/merging features it offers. I find it funny to hear a devops youngster espousing the benefits of git after hearing that <n> hip open-source projects use it, only to find that they are using it with a single tree (or a bunch thereof), just continually committing everything to the master branch.
I don't know anything about Fossil, but saying that no one understands Git just makes me think they haven't made the effort to learn Git. I doubt that's the case, but still, why say that?
Obviously it isn't literal. Many of my colleagues and I have had confession sessions in which we all admit to regularly shooting ourselves in the foot with git and needing to have a cheat sheet perpetually open. That should tell you something, and it really shouldn't tell you that we're willfully ignorant, unless you want to sound smug.
Any system that has consistent-ish rules can be learned. The more rules, and the subtler they are, the harder it is to learn. Git's problem is that it is necessary to know how it really works, to avoid getting into a bad state. It's complicated enough that it is easier than it should be to get into a bad state. The extent of the rules you have to know to avoid trouble does not match the simplicity of the day to day operations you do on the repo.
Because the data model of git, and what git actually does, has virtually nothing to do with how most people think it works; conversely, the limitations of the tools are not well understood, and users are seldom able to help themselves. The arcane git options, error messages, and documentation are not overly helpful here.
Because most users of git don’t really understand it. It’s really hard to learn properly and so most users just memorize the commands for their project’s workflow and ask the resident git expert to help (or nuke their local copy) when things go wrong.
It’s not hard to “really understand Git”. If anything, it’s the opposite problem—Git’s implementation is conceptually simple enough that the only good mental model the end user could possibly have is the implementation model. Systems like Subversion or Perforce or Fossil probably have even more complicated implementations than Git, and largely that’s to support a dumbed-down mental model that doesn’t require learning how the tool actually works. With Git, you learn how it works and you’re set, more or less.
You've never botched a merge, tried to undo it, and accidentally lost work? It's happened to me and/or the people I work with roughly once a quarter since I started using git.
You shouldn't be able to lose any committed work in git. If you botch a merge, you can reset to one of the merge parents and try it again. If you botch a "destructive" operation like a reset or a rebase, you can undo it using the reflog, which is essentially a 30-day undo history for every branch in your repository.
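(A sketch of that recovery, assuming the merge hasn't been pushed yet:)

    git merge --abort            # if the merge is still in progress (conflicts)
    git reset --hard ORIG_HEAD   # if the merge commit was already created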
No, I can't say that I've ever lost work in that manner in over a decade of using git. The first few times I made a mess of things while trying to do a nontrivial merge it admittedly took a while to figure out how to get back to my starting point, but I guessed (correctly) that it'd be worth spending that time to learn how to do it rather than giving up and throwing away work.
Indeed, that was the intent. The other points are serious, but then I thought "let's throw in a cartoon!"
I just wrote the referenced article last night. Normally, it takes months or years before something like this gets picked up and discussed on HN, and I have more time to refine the text. This one snuck up on me. Come back in a month or two and the article will probably be much improved. You are reading an initial draft.
On the other hand - it is a funny cartoon, don't you think? And it does kind of capture how most people use Git in a snarky kind of way, doesn't it? :-)
Absolutely, xkcd tends to reflect popular sentiment very well. On a different front, you made my day! One reply, and from the creator of SQLite, no less!
I feel comfortable in git because I've mastered SourceTree. Even 'rebasing' regularly seems to put me in the 1% of users compared to coworkers I've seen over the last two jobs I've worked on. I've seen so many git blunders, and experienced them myself.
It's the only thing I've tried since SVN days, but I can totally see the room for alternatives...
>>> The principal maintainer of SQLite cannot function effectively without being able to view the successors of a check-in. This one issue is sufficient reason to not use Git, in the view of the designer of SQLite
Ok I'll bite. Why?
I can see it's a nice idea - but at some point you add function X, then ten check-ins modify that function. What's the difference between looking back at ten and looking forward at ten?
Perhaps I need an example.
If I remember rightly, Fossil was written by the SQLite people? I tried it out for a while back in the day - I think it had this ability to store the tickets in the branch alongside the code - it was an appealing idea, but it got complicated quickly.
Purely speculative, but I imagine you identify a bug that was introduced in commit X from 2 years ago. You want to answer, "Which supported releases of this product need to be patched?".
Isn't that just a question of looping over your release tags and asking whether each one can reach that commit? It's not too hard to script - I don't really see an advantage either way.
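(It's arguably a one-liner rather than a loop; the commit and branch names here are placeholders:)

    # every release tag whose history includes the buggy commit
    git tag --contains <sha>
    # or test one release branch explicitly
    git merge-base --is-ancestor <sha> release/2.1 && echo "needs the patch"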
> Git encourages a style in which individual developers work in relative isolation, maintaining their own branches and occasionally rebasing and pushing selected changes up to the main repository.
> Fossil, in contrast, strives to keep all changes from all contributors mirrored in the main repository (in separate branches) at all times.
Sounds similar to my experiences with Bazaar many years ago. Ended up converting all projects over to Git once we realised that nearly all external developers we worked with preferred Git and didn't have a clue about Bzr.
Semi-interesting full-circle side note: one of those projects was actually a rudimentary version control system for a report-writing app, which used SQLite as the VC data store.
I switched from bzr to git as well. The most important things Git offered over the alternatives were amending commits to fix mistakes, reordering commits on the local branch with 'git rebase -i', and partial check-ins with the staging area. Git was the first VCS I used which could commit unrelated changes separately with ease. With other tools, you have to manually copy changes around until the working copy contains only what you want to commit.
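(For anyone who hasn't seen that workflow, a minimal sketch; the commit message is made up:)

    git add -p                  # interactively stage only some hunks
    git commit -m "fix parser"  # commit just those hunks
    git commit --amend          # fold a follow-up fix into the last commit
    git rebase -i HEAD~3        # reorder or squash the last three commits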
Despite that, the Git command-line interface is horribly inconsistent. Commands take separate words, `--options`, or one-character flags with no discernible scheme. Having lots of synonyms such as staging area, index, and cache, all meaning the same thing, does not make learning Git any easier.
For new users I would recommend Gitless, which tries to create a better interface to git and to resolve the ambiguity in its commands. As Gitless is a frontend to libgit2, it works with any git repository, and you can also use normal git commands alongside it. The downside is that documentation usually only shows how to do things with 'git'; if you only learned 'gl', you have no idea how to reproduce it.
> Ended up converting all project over to Git once we realised that nearly all external developers we worked with preferred Git and didn't have a clue about Bzr.
Why? There was a time not too long ago where this was true about git. Companies switched to git anyway.
Git is absolutely terrible. Let’s just get that out of the way.
But it ticks the two boxes:
- powerful enough (to e.g. do proper merging, which cvs and svn never could)
- tooling support, meaning it’s supported out of the box in bug trackers, build systems etc.
All others (hg, perforce, svn, cvs, pijul, fossil, ...) fail one or both of the above.
Now, I hope that one day git will be replaced by something nicer. But for the time being it’s what we’re stuck with.
My pet peeve: sequential revision numbers, or rather git's lack of them. Why settle for arbitrary identifiers for commits? Saying "I have bug X in rev 1234 but it's not in 1230" is fantastically powerful compared to "I have the bug in a1b34h but not in 3ae452". These would be a sequence for a particular centralized branch - typically mainserver/master.
Because this is the reality: it's distributed version control, but we almost all use it as centralized version control. This is also why it's so odd that Git LFS took years to make it into git. Why would I want all past revisions of a binary? (Yes, binaries must often be in version control, whether anyone thinks it's a bad idea or not.)
That's why most people tag their builds with git describe. It can easily tell you the most recent tag in the current branch and how many commits away from it the build is, for instance v1.5-15-ga55325.
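(A quick illustration; the tag and hash are hypothetical:)

    git describe --tags
    # v1.5-15-ga55325
    # ^ nearest tag, 15 commits since it, then "g" + abbreviated commit hash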
> Git lacks native wiki and bug tracking. If you want these essential features, you have to install additional software such as GitLab, or else use a third-party service such as GitHub. And even then, the wiki and bug reports are centralized, not distributed.
Are distributed bug reports etc. even desirable? How would they work?
> In contrast, Fossil users only need to think about their working directory and the check-in they are working on. That is 60% less distraction. Every developer has a finite number of brain-cycles. Fossil requires fewer brain-cycles to operate, thus freeing up intellectual resources to focus on the software under development.
Not thinking about the actual remote head and/or the local copy of the remote head seems bad? I'm worried that Fossil is just obscuring information, rather than presenting it.
Is it possible to `fossil rebase origin/master` without an internet connection?
// edit:
Reading more of the docs, what I quoted at the top is definitely, ah, misleading.
> When autosync is turned off, the changes you commit are only on your local repository. To share those changes with other repositories, do: fossil push URL
> When you pull in changes from others, they go into your repository, not into your checked-out local tree. To get the changes into your local tree, use update:
> Fossil stores content using an enduring file format in an SQLite database so that transactions are atomic even if interrupted by a power loss or system crash.
This is why I can't take any of this reasoning from SQLite seriously at all. The feeling I get from this webpage is 'Hey, I made something I think is better than a popular thing, therefore every project I author from now on must use my thing, and here are X refutable reasons why'.
Git is needlessly complex. Absolutely. The solution we've stumbled upon is to use a good GUI (GitKraken). Every developer I've introduced it to was at first intensely skeptical "I always use the command line and it's fine", but their mind was blown once they actually started using the GUI. Our whole team got a significant boost in productivity once we enforced the use of GitKraken, and now I fire it up even for the simplest of commands.
I just attempted to try GitKraken, and while it looks quite good on paper, I find it perplexing (and a deal breaker) that it asks me to login (with Github or Gitkraken) before using the open source and free desktop client.
The only UI I've added to my toolchest in my 11 years of using git is [tig](https://jonas.github.io/tig/) and my default is to browse my repo with `tig --all` to see what everyone is up to.
The list of reasons not to use Git seems as smug as it is uninformed.
> With Git, it is very difficult to find the successors (descendants) of a check-in.
"git log --children" and "git log --reverse" are very difficult indeed.
> Fossil users only need to think about their working directory and the check-in they are working on. That is 60% less distraction.
Since Fossil is a distributed version control system, there is also remote state to keep in mind. I don't know Fossil, so I don't know the details, but simply pretending the remote state does not exist seems at least misleading.
> Setting up a website for a project to use Git requires a lot more software, and a lot more work, than setting up a similar site with an integrated package like Fossil.
Setting up GitLab with Omnibus takes about five minutes. It's not Git itself, but rather a third-party package, but why should I care? (And in general, I wouldn't set up anything at all – I'd just use GitHub or GitLab or Bitbucket for my open-source software.)
It's fine for different people to prefer different tools. Git can be annoying at times. However, it's a blessing that the open-source community is moving towards a standard everyone can work with, and Git is Good Enough to be that standard. Using an obscure alternative and justifying it with a list of downright wrong claims doesn't seem to create a welcoming atmosphere for new contributors.
(And don't get me wrong – SQLite is a great piece of software, and I'm thankful people put in their time to create it.)
I just learned that SQLite doesn't accept any patches, so the argument that Git is the industry standard doesn't matter to them, and they just use what they prefer.
2, 4, 5, and 6 seem pretty weak. Not a particularly interesting article. Mostly comes down to 'we like X and Y about Fossil, and it works for us, so we use it instead'.
2, 3, 5: very valid, and the reason why we use Mercurial instead. We can't afford to hire a Git guru for every team. Instead we use a tool that's easier for everyone.
Once you start adding more than one developer to a Mercurial repo you need to know how Mercurial works, so I don't see how this is much different. In fact adding remotes to Mercurial was a much harder process last time I used it.
The point is, Mercurial is much easier to learn. We are a team of 20 developers distributed across the globe that transitioned from SVN, and it was easy for us to pick up Mercurial and start using it within a couple of days.
We tried git too and inevitably lost our changes. Our git knowledge is to blame, but blame the git CLI too! It is not easy to learn at all. Twelve years into git's existence, I still see articles with tips and guides on using it. That kind of hints at how esoteric the git CLI can be.
I feel like most of the trouble developers have with Git is pretty shallow, and it's a shame. It's not a tool I would want to get non-programmers to use (which may be a problem if you've got design folks on your team as well), but the model should be pretty learnable (in principle) to anyone who can manage to hobble their way through an intro data structures course.
For folks like me, who spent their evenings in high-school screwing around with obscure linux distros and have been scouring badly written man pages trying to fix their computer for half their life, the little stuff is forgivable, and a lot of the big stuff is pretty good. But it's a shame how many inessential hurdles there are. See also:
Biggest piece of advice for not losing data: learn about git-reflog and git-reset before doing anything that modifies history. Nothing is destroyed, even by "scary" operations like rebase, so it's always possible to recover something, but you need to know how to find stuff.
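(A sketch of that recovery path; "mybranch" is a placeholder:)

    git reflog                    # every position HEAD has occupied, newest first
    git reset --hard 'HEAD@{2}'   # jump back two moves, e.g. to before a bad rebase
    git reflog show mybranch      # individual branches keep reflogs too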
I do some contract work for a university, where we have lots of student interns going through. We use GitHub, and we rarely rebase or cherry-pick, basically because while I could spend a bunch of time trying to teach git to new interns who will be gone in 4 months, it seems like a more efficient use of time to just occasionally put up with a messy history.
The only version management model I find intuitive is the wiki model (i.e. wiki articles).
Imagine 3 tasks.
1. Revert 3 git commits. This is a one-click revert in a wiki.
2. Diff 2 arbitrary commits. One click in a wiki.
3. Change 2 lines of code. Edit->Submit in a wiki.
Why does git have to be so arcane, especially for reverting to a specific point in time?
The first version control system that emulates the wikipedia UI model will win the market.
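(For comparison, the three tasks in git, as a sketch; the complaint stands in that none of this is one-click:)

    git revert HEAD~3..HEAD    # task 1: undo the last 3 commits (as new commits)
    git diff <sha1> <sha2>     # task 2: diff two arbitrary commits
    git commit -am "tweak"     # task 3: edit files, then submit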
> The principal maintainer of SQLite cannot function effectively without being able to view the successors of a check-in. This one issue is sufficient reason to not use Git, in the view of the designer of SQLite.
He's basically looking for
git branch --contains $commit_or_branch
isn't he?
I do think it's a thoughtful write-up, hopefully git maintainers and github read it.
Found the following line from the article... "entertaining"
> Every developer has a finite number of "brain-cycles". Fossil requires "fewer brain-cycles to operate", thus "freeing up intellectual resources" to focus on the software under development.
It is useful to have a 'lingua franca' of open-source version control, but I'm not sure git is necessarily the best choice for that (not that network-effects usually result in the 'best' being chosen)
Can someone explain how the problem of not being able to get successors causes issues in practice? I don't doubt it does, but I've never personally had a situation where inspecting the graph did not give the information I required in this regard.
The refactored page brings up good points. Git has undeniably improved the world of software development, and is de facto the version control system to use in the majority of cases today.
That said, that doesn't make the points SQLite makes invalid.
Speaking truth to power! I like it! I was too scared to say that I never understood git and was getting by using only the most basic commands. Now SQLite has called it out!
I am not qualified to judge whether Fossil is better than git, and I can completely acknowledge that git has a steep learning curve (although I feel that a big chunk of that learning curve is unlearning previous VCS experience).
But now that I do know git, the biggest change I noticed from previous VCSes is how much I work on multiple issues in the same repo. Something that was extremely hard with CVS, SVN, and P4 (10 yrs ago).
A friend was struggling with git recently and ranting about it. He didn't get it and didn't understand why anyone would use it compared to what he was used to (non-DVCS). I wrote him this analogy:
> Imagine some one was working with a flat file system, no folders. They somehow have been able to get work done for years. You come along and say “You should switch to this new hierarchical file system. It has folders and allows you to organize better”. And they’re like “WTF would I need folders for? I’ve been working just fine for years with a flat file system. I just want to get shit done. I don’t want to have to learn these crazy commands like cd and mkdir and rmdir. I don’t want to have to remember what folder I’m in and make sure I run commands in the correct folder. As it is things are simple. I type “rm filename” it gets deleted. Now I type “rm foldername” and I get an error. I then have to go read a manual on how to delete folders. I find out I can type “rmdir foldername” but I still get an error the folder is not empty. It’s effing making me insane. Why I can’t just do it like I’ve always done!”. And so it is with git.
> One analogy with git is that a flat filesystem is 1 dimensional. A hierarchical file system is 2 dimensional. A filesystem with git is 3 dimensional. You switch in the 3rd dimension by changing branches with git checkout nameofbranch. If the branch does not exist yet (you want to create a new branch) then git checkout -b nameofnewbranch.
> Git’s branches are effectively that 3rd dimension. They set your folder (and all folders below) to the state of the stuff committed to that branch.
> What this enables is working on 5, 10, 20 things at once. Something I rarely did with cvs, svn, p4, or hg. Sure, once in a while I’d find some convoluted workflow to allow me to work on 2 things at once. Maybe they happened to be in totally unrelated parts of the code, in which case it might not be too hard if I remembered to move the changed files for the other work before check-in. Maybe I’d check out the entire project in another folder so I'd have 2 or more copies of the project in separate folders on my hard drive. Or I’d back up all the files to another folder, check out the latest, work on feature 2, check it back in, then copy my backed-up folder back to my main work folder and sync in the new changes, or some other convoluted solution.
> In git all that goes away. Because I have git style lightweight branches it becomes trivial to work on lots of different things and switch between them instantly. It’s that feature that I’d argue is the big difference. Look at most people’s local git repos and you’ll find they have 5, 10, 20 branches. One branch to work on bug ABC, another to work on bug DEF, another to update to docs, another to implement feature XYZ, another working on a longer term feature GHI, another to refactor the renderer, another to test out an experimental idea, etc. All of these branches are local to them only and have no effect on remote repos like github (unless they want them to).
> If you’re used to not using git style lightweight branches and working on lots of things at once let me suggest it’s because all other VCSes suck in this area. You’ve been doing it so long that way you can’t even imagine it could be different. The same way in the hypothetical example above the guy with the flat filesystem can’t imagine why he’d ever need folders and is frustrated at having to remember what the current folder is, how to delete/rename a folder or how to move stuff between folders etc. All things he didn’t have to do with a flat system.
> A big problem here is the word branch. Coming from cvs, svn, p4, and even hg the word "branch" means something heavy, something used to mark a release or a version. You probably rarely used them. I know I did. That's not what branches are in git. Branches in git are a fundamental part of the git workflow. If you're not using branches often you're probably missing out on what makes git different.
> In other words, I expect you won’t get the point of git style branches. You’ve been living happily without them, not knowing what you’re missing, content that you pretty much only ever work on one thing at a time, or finding convoluted workarounds in those rare cases you really have to. git removes all of that by making branching the normal thing to do, and just like the person that’s used to a hierarchical file system could never go back to a flat file system, the person that’s used to git style branches and working on multiple things with ease would never go back to a VCS that’s designed to work on only one thing at a time, which is pretty much all other systems. But until you really get how freeing it is to be able to make lots of branches and work on multiple things, you’ll keep doing it the old way and not realize what you’re missing. Which is basically why all anyone can really say is “stick it out and when you get it you’ll get it”.
> Note: I get that p4 has some features for working on multiple things. I also get that hg added some extensions to work more like git. For hg in particular though, while they added after the fact optional features to make it more like git go through pretty much any hg tutorial and it won't teach you that workflow. It's not the norm AFAICT where as in git it is the norm. That difference in base is what really set the two apart.
Sorry that was so long, but my question for the Fossil guys would be: which workflow does Fossil encourage? Lots of parallel development like git, or, like many other VCSes, not so much? Are branches light and easy like git, or are they only meant for marking versions like they were in SVN, P4, and CVS? Do branches even need to be related, or can they be completely unrelated like gh-pages, without the VCS complaining that you're "off master" as hg does (did?)
> 1. Git's data model only lets you see ancestor commits, not descendants. The maintainer needs to find descendants.
It's technically true of git's data model, but it's not too hard to reconstruct forward history based on all refs (branches, for this purpose); see "git rev-list".
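(E.g., a sketch of that reconstruction; <sha> is a placeholder:)

    # print every commit reachable from any ref followed by its children,
    # then keep the line for the commit in question
    git rev-list --children --all | grep "^<sha>"
    # everything after the first hash on that line is a child commit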
They don't justify the use-case that demands this information be trivially available, and I've never personally needed it, so I find this a weak argument. I have needed to find what branches contained a given commit (either a regression or a bugfix), and that's available with "git branch --contains <SHA>"
> 2. The mental model of git is complex (working dir, index, local head, local copy of remote head, remote head)
Granted. It's always especially complicated trying to explain to beginners the distinction between local head, local copy of remote head, and remote head. Though it makes sense to me when working out the technical implications of git's distributed data model, it's obvious that many users are overwhelmed. (I think the mental model is actually great, but I won't impose my opinion on others.)
> 3. Fossil's branch history display is better than GitHub's, and git doesn't tell you if a branch has been merged
No mention of "git log --graph" (or any GUI alternative), which goes a long way toward solving their complaint. I agree that GitHub's graphless history is frustrating, but that's a GitHub issue, not a Git one.
However, Git does "swallow" merged branches, as the merge commit may be a property of the receiving branch, not the original branch (depending on your development model). Git doesn't enforce anything there, so I agree with the issue.
> 4. Git lacks essential wiki/bug-tracking. If you use 3rd-party tool for these, they're centralized
Granted, though I'm not quite convinced of how "essential" it is to have that tightly integrated into the version control system, and thus distributed; but that's probably an artifact of me being used to existing git-based systems.
A counterpoint to this is that Git (now) has a healthy ecosystem of alternatives for you to choose from. If you're not satisfied with Fossil's offering, how easy is it to change?
> 5. Git requires administrative support for the extra web tools that Fossil otherwise integrates
This follows from the previous point, so granted. I'd add to this that Git lacks a good access-control/code-review system, hence the rise of so many alternative portals (GitHub/Gitlab, Gerrit...)
> 6. No-one really understands git [w/ XKCD link]
Now you're just trolling.
Overall, I'm surprised at how many points I actually agree with the author(s) on, but the main difference is that git purposefully aims at a more limited feature set that excludes project management (i.e. wiki/bug tracking), which I guess is a consequence of its Torvalds/kernel-originated development history.
This is technically correct, the best kind, but it's not the whole story.
The default merge commit message does contain the branch names.
Unless it's a fast-forward, in which case you maintain linear history, so there are no problems.
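(Which is why some teams force a merge commit even when a fast-forward is possible, so the branch name is always recorded; "feature-x" is a made-up name:)

    git merge --no-ff feature-x
    # always creates a merge commit whose default message,
    # "Merge branch 'feature-x'", records the branch name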
Git is Great! But humans should never interact with it directly. Treat Git only as an API meant for applications to interact with.
If your team uses a common application to write their code which manages Git for them, it will align everyone's method of use.
It's funny watching an instructor training a group on Git. Everyone looks lost because there are too many scenarios being explained, each handled differently depending on what your intent might be. Before there is a solid grasp of the code-management workflow, the lecturing about commit comments begins, shifting clear over to the other side of the brain to guarantee none of it is collectively understood.
Presenting too many options to the collective is counter-productive.
>Every day humans make me again realize that I love my dogs, and respect my dogs, more than humans. There are exceptions but they are few and far between.
You love them more than humans because they're fundamentally incapable of calling you out on your fud?
Please don't post acerbic swipes to HN. That's definitely against the site guidelines: https://news.ycombinator.com/newsguidelines.html. If you'd (re-)read those and take the spirit of this site to heart, at least when posting to it, we'd appreciate it.
I don't want to jump in out of turn here, but as a HN reader, I would ask that you please consider adjusting your tone. The only reason HN is worth anything is because you find the most interesting people here -- and love him or hate him, Larry McVoy has certainly earned a permanent spot.
If you like git as much as you seem to like it, you should be thanking Larry, because my understanding is that it was his decision to play hardball with Linus that led to git's birth in the first place.
If you disagree with him, the beautiful thing about HN is that you can state it politely, and you get the chance to talk to, and maybe even getting a little feedback from, a successful entrepreneur who sold source code management software to SV powerhouses for many years. There aren't a lot of people with that perspective.
Be grateful that Larry is here and willing to share some of his knowledge and experience with you (that's what "I'm retired" means; he has no obligation to evangelize bitkeeper anymore, he is doing it for your information).
Do NOT scare this type of person away from HN or make it too annoying for them to contribute. HN is nothing without such people. Seriously.
So please put on your big boy pants and show some respect and professional decorum. This is a trade outlet, not reddit, and we try to maintain some respect and not to engage in snide attacks and petty accusations.
As a change of pace, I really enjoyed your comments on k8's in the thread you linked in your profile. It truly is a breath of fresh air to see some sanity around that topic, especially with how much favor such platforms have curried on HN.
>I would ask that you please consider adjusting your tone.
I would ask that you go reread the thread where he made a pretty disingenuous claim about my experience with SCM's.
>the beautiful thing about HN is that you can state it politely
Look at the timeline of his posts. I didn't start this. You're just replying to me and making it about tone because you disagree with my positions.
> and maybe even getting a little feedback from, a successful entrepreneur who sold source code management software to SV powerhouses for many years.
And then lost it to someone's side project they made in about a year. The history of SCM's, especially proprietary ones, is essentially a history of failure.
>he is doing it for your information
I'm going to re-ask you to reread this thread and truly ask yourself whether he's just here to share information. He's clearly a little bit jaded by his loss. Allow me to share this choice quote from his first response to a pretty simple claim I made:
>You all lost out on "the most sane and powerful" as a result.
He's referencing BitKeeper. Which definitely isn't sane or powerful.
>Do NOT scare this type of person away from HN or make it too annoying for them to contribute.
I'm sorry, he actively is spreading fud so I don't give a shit who he is.
>So please put on your big boy pants and show some respect and professional decorum.
My tone is blunt. His is disingenuous. Guess which one of these is more harmful to the community.
>This is a trade outlet, not reddit, and we try to maintain some respect and not to engage in snide attacks and petty accusations.
Are you going to tell this cornerstone of HN the same thing? My guess is probably not.
> So, why can't we rebrand fossil as Blockchain-VCS or something and move on toward world domination?
Because Fossil-- like Git-- doesn't solve, attempt to solve, or even advertise itself as solving the problem of decentralized consensus.
All existing blockchain technologies at least claim to be a solution to the problem of decentralized consensus.
Therefore Dr. Richard Hipp is only "technically" right in the ways that do not matter, in the same way that I'm "technically" doing functional programming any time I write a javascript function.
> Given some historical check-in, it is quite challenging in Git to find out what came next. It can be done, but it is sufficiently difficult and slow that nobody ever does it.
Lists all refs (branches, tags) that contain the passed argument.
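(The command itself didn't survive in this comment; one invocation matching that description, offered as an assumption, would be:)

    # list every branch and tag whose history contains the commit
    git for-each-ref --contains <sha> refs/heads refs/tags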
I'd be curious to know the author's need for this. I've used something similar on GitHub to determine when a given commit, typically a bugfix, is released.
> There is no button in GitHub that shows the descendents of a check-in.
There is: go to the commit's URL (https://github.com/$ORG/$PROJECT/commit/$COMMIT) and below the subject & body of the commit, but above the author/commit time, there's a section that will have all branches, tags, and pull requests that the commit is a part of.