
git. The right storage backend is either git, or a git-like Merkle tree. This is small data. Individual files for objects are fine. Compacting is better, but not critical.
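To make "git-like Merkle tree" concrete, here is a minimal sketch. It assumes SHA-256 and JSON rather than git's actual object format, keeps the store as a plain in-memory dict instead of one file per object, and the `put`/`put_tree` helpers and task names are made up for illustration.

```python
import hashlib
import json


def put(store: dict, payload: bytes) -> str:
    """Store payload under the SHA-256 of its bytes and return that key."""
    key = hashlib.sha256(payload).hexdigest()
    store[key] = payload
    return key


def put_tree(store: dict, children: dict) -> str:
    """Store a tree node: a mapping of name -> child hash, serialized with sorted keys."""
    payload = json.dumps(children, sort_keys=True).encode("utf-8")
    return put(store, payload)


store = {}
a = put(store, b"buy milk")
b = put(store, b"file taxes")
root_v1 = put_tree(store, {"task-a": a, "task-b": b})

# Editing one leaf produces a new leaf hash and therefore a new root hash;
# the untouched leaf is shared between both versions.
b2 = put(store, b"file taxes (done)")
root_v2 = put_tree(store, {"task-a": a, "task-b": b2})
```

Both roots stay readable from the same store, so old versions cost only the objects that actually changed.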

You can write git in a weekend or two: https://wyag.thb.lt/

sqlite is the wrong tool for the job, beyond an initial prototype. git and hg figured this stuff out decades ago.

Good job, by the way. This seems like the sort of thing which, if:

* it did store data in git and sync; and

* had a nice Python API

would change how I work.

The Python API, I could probably even handle myself. This is small data, and calls to shell commands are both more than adequate, and easy to script in Python.
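Roughly what I have in mind, as a sketch only: it shells out to the stock `git` CLI via `subprocess`. The `git` wrapper, the `add_task` helper, and the placeholder commit identity are all hypothetical and not part of grit.

```python
import shutil
import subprocess
from pathlib import Path


def git(*args, cwd="."):
    """Run `git <args>` in cwd and return stdout with trailing whitespace stripped."""
    if shutil.which("git") is None:
        raise RuntimeError("git executable not found on PATH")
    result = subprocess.run(
        ["git", *args],
        cwd=cwd,
        check=True,            # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return result.stdout.rstrip()


def add_task(repo: Path, name: str, text: str) -> str:
    """Hypothetical helper: write one task file, commit it, return the commit hash."""
    (repo / name).write_text(text, encoding="utf-8")
    git("add", "--", name, cwd=repo)
    # Placeholder identity so the commit works without any global git config.
    git(
        "-c", "user.name=grit-sketch",
        "-c", "user.email=grit-sketch@example.invalid",
        "commit", "-m", f"add {name}",
        cwd=repo,
    )
    return git("rev-parse", "HEAD", cwd=repo)
```

Usage would be something like `git("init", cwd=repo)` once, then `add_task(repo, "milk.txt", "buy milk")`. Syncing is then just `git("push", cwd=repo)` and `git("pull", cwd=repo)` against whatever remote you already use.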




You aren't technically wrong, I guess, but Fossil would disagree that SQLite is somehow wrong for this use-case.

Fossil does seem to have issues when the repository grows to many gigabytes of data; the entire OpenBSD source tree, for example, didn't work out very well in Fossil. But the chances of a personal task manager ever growing to that size are slim to none, I would imagine.


I guess I'd be okay with a Merkle tree stored in SQLite, although looking over the Fossil documentation, it seems like the developers didn't get git, and what made the git data structure so clever.

I see pieces like this in the Fossil docs:

"Fossil stores its objects in a SQLite database file which provides ACID transactions" -- https://fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki

The whole point of git is that your only operations on the underlying data structure are:

1. writing an object under its hash

2. updating a pointer to the head of a branch, a git tag, or similar, to show which version is current

You can do (1) on an eventually consistent (or even never-really-consistent) backing store, without any sort of strong integrity guarantee, and the git data structure continues to guarantee integrity. If (2) goes out of sync, you're left in a state very similar to what happens if I'm working on my laptop and you're working on yours. We need a merge.
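A toy sketch of those two operations, to show why the store itself needs no transactions. It assumes SHA-256 and in-memory dicts rather than git's real on-disk format, and `ObjectStore` and its method names are invented for the example.

```python
import hashlib


class ObjectStore:
    """Content-addressed store: the key is always the hash of the value."""

    def __init__(self):
        self.objects = {}   # hash -> bytes
        self.refs = {}      # ref name -> hash

    def write_object(self, payload: bytes) -> str:
        key = hashlib.sha256(payload).hexdigest()
        # Writing the same payload twice is a no-op, so retries and
        # duplicate deliveries from a flaky backing store are harmless.
        self.objects[key] = payload
        return key

    def read_object(self, key: str) -> bytes:
        payload = self.objects[key]
        # Integrity is checked on read; it does not depend on the store.
        if hashlib.sha256(payload).hexdigest() != key:
            raise ValueError(f"corrupt object {key}")
        return payload

    def update_ref(self, name: str, key: str) -> None:
        self.refs[name] = key


laptop, phone = ObjectStore(), ObjectStore()
base = laptop.write_object(b"tasks v1")
phone.write_object(b"tasks v1")
laptop.update_ref("main", laptop.write_object(b"tasks v2 (laptop)"))
phone.update_ref("main", phone.write_object(b"tasks v2 (phone)"))

# Copying objects between stores in any order can never conflict...
laptop.objects.update(phone.objects)
# ...but the two refs now disagree, which is the cue that a merge is needed.
needs_merge = laptop.refs["main"] != phone.refs["main"]
```

The only mutable state is the tiny `refs` map; everything else is append-only and self-verifying.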

With DVCS, the whole point is to make ACID irrelevant.

I hadn't heard of Fossil before, but reading over the docs, I see a lot of red flags like that.

If grit were backed by literal git, I'll mention, you'd get a ton of stuff for free, mostly with regard to syncing and finishing tasks across distributed devices (e.g. my cell, my laptop, etc.).


Git and Fossil started at about the same time, and were initially released within a year of each other.

For the record, the people who created SQLite also created Fossil. The SQLite software (and website) was Fossil's initial use-case.

Fossil offers WAY more than Git does: it includes ticketing, a forum, chat, a wiki, etc. It's more on par with GitLab/GitHub/etc. than with 'git' alone.

Fossil syncs, etc, just like Git.

I think they serve different use-cases. git is nothing but the data structure(s). It's very, very hard to use git if you don't understand how the data is stored, because the UI is a very thin wrapper around it, which causes all sorts of issues for people who just want to get work done. That said, it makes Git very flexible, which is great.

Git was built for the Linux kernel way of doing things, and got co-opted into the PR model that GitHub pushed. I'm not saying these are wrong ways to think about the problem, but they are not the only ways. Fossil is a different way.

As for ACID vs. Git's data structure, the whole point is that your data needs to be kept safe; who cares how it's done.

I'm not trying to convert you to Fossil, but I think it's important to recognize git isn't perfect, or even perfect for its use-case. But it clearly won the mindshare of developers because we just blindly followed Linus.

I'm of the opinion that for most projects, Mercurial, Subversion or Fossil would have been the better/easier solution, and while they're arguably not as flexible, the UI is at least 50% better and easier to understand.


If you want to keep your data safe, it matters how it's done. See MongoDB for the story of a system with a friendly UI and a train wreck back-end.

git didn't win developer mindshare because people blindly followed Linus, any more than relational SQL databases held off OODBs, document databases, and all sorts of other technologies because of some dumb, entrenched developer mindset. git (like SQL) won because it has very, very solid theoretical underpinnings which make it work incredibly well across a surprisingly broad set of use cases and handle data robustly.

Subversion is good for virtually no one. It's like using Excel for a database. It's complex, but doesn't get the job done.

Fossil, on a cursory look, looks like nineties mysql, when it was easy to use, but didn't really work, and was kind of a cargo-cult implementation of a proper database. I will admit I could be wrong (perhaps, just lousy docs).

hg, I agree, is better than git. git is a beautiful backend with a horrible front-end. hg has an equally beautiful backend, with a clean front-end and decent libraries. I wish hg had won. It didn't.

Oh well. At the end of the day, though, for this use-case, back-end matters far more than front-end. Competent developers can and do learn to use git, and are equally productive as in hg once they get over the learning cliff.

An upside is you learn a lot by learning git (or hg) internals. If OP knew git, they would have made a better grit. It's something every developer should do. Once you're past that, the front-end, while far from clean, gets the job done.

svn doesn't get the job done.


You seem awfully confident of yourself, but I'd disagree with many of those perspectives. Just because you sound confident doesn't mean you are correct.

There is a reason there are 500,000 git tutorials that start with the basics of how git works, and still practically nobody understands git, even though knowing the underpinnings of how it works is practically required to be useful in git.

I agree the implementation matters from a data safety perspective; I mean, Jepsen made a name for himself proving how badly all the "distributed" databases suck (because it's a very hard problem to solve). But as a user, you don't usually have to care... unless it's git.

FreeBSD successfully used SVN for a long time without any major issues, as one example.


Very few git tutorials explain how git works. That's the underlying problem. Using git correctly requires knowing how it works. You solve that problem by having a reasonable userspace, like hg, or by having a proper course in college. Much like you learn databases, compilers, and operating systems, you should learn about hash trees and similar data structures.

I feel like the statement "FreeBSD successfully used SVN for a long time without any major issues" is a lot like saying "I wrote books on a typewriter without any major issues." Or "Why do I need pointers/references/data structures? I wrote my code in QuickBASIC without any major issues." Or "DOS doesn't have memory protection or multitasking, and I never had any major issues" or "I wrote my system without documentation or test infrastructure, and it's never been a problem."

At the time you switch to correctly using a DVCS, your productivity skyrockets.

If you're using SVN, you DO have a problem. You might not know you have a problem, but you do. If you're using git like svn, you also have a problem.

I agree with you that many programmers have a problem, though. Just not about the root cause or the solution.

Maybe the right solution is to write a Python implementation of git with a sane user-space. The original git was written in a couple weeks, and I don't think it'd be more than a few months of work to have feature parity.



