
Many Rust projects have done cool new stuff. Alacritty is possibly the greatest terminal emulator on the market today. I'm not into cryptocurrencies, but Parity and ZCash are doing cool stuff in that space thanks to Rust.

I'm also the author of Pijul, a much simpler and more scalable (yes, both! why choose?) version control system, and of Sanakirja, an on-disk transactional allocator for writing persistent data structures (like B trees, ropes, radix trees, HNSW…).


This is indeed one of the issues with Pijul, but patch application is extremely fast, which makes this cheap in the vast majority of cases (we don't apply patches to files directly, but to a purpose-built on-disk data structure).
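To give a flavor of what that means (a toy sketch in Rust, not Pijul's actual format): if the file is a graph where each line is a vertex and edges give the order, a patch only ever adds vertices and edges, so applying it never rewrites the surrounding lines:

    use std::collections::HashMap;

    type VertexId = u64;

    struct Graph {
        lines: HashMap<VertexId, String>,
        edges: Vec<(VertexId, VertexId)>, // (u, v) means "u comes before v"
    }

    enum Patch {
        NewVertex { id: VertexId, text: String, after: VertexId },
    }

    impl Graph {
        fn apply(&mut self, p: &Patch) {
            match p {
                Patch::NewVertex { id, text, after } => {
                    // No surrounding lines are rewritten or shifted, so
                    // application cost doesn't depend on the file size.
                    self.lines.insert(*id, text.clone());
                    self.edges.push((*after, *id));
                }
            }
        }
    }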

We also have a tag feature, allowing you to pinpoint versions and go back instantly, and we have plans to make that feature even lighter on disk space.


> The only ones I know of are Pijul and Jujitsu which you mentioned. They're both quite new.

There are other Git frontends like Jujutsu: Gitless, StackedGit, GitButler, Sapling…

Even the idea of "SVN is the next Git" (which is the thing here) isn't quite new; PlasticSCM did it already.

None of them does what Pijul does, though: define the actual problem carefully and rigorously, then actually solve it.

> Sort of... But actually, as soon as you go offline it's distributed.

Even the online case is distributed: Google Docs needs tools from distributed computing such as OTs and CRDTs.
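To make the CRDT term concrete, here is the textbook grow-only counter in Rust (nothing Google-Docs-specific, just the simplest possible example): each replica increments only its own slot, and merging takes the pointwise max, so all replicas converge whatever the message order.

    use std::collections::HashMap;

    #[derive(Default, Clone)]
    struct GCounter {
        counts: HashMap<&'static str, u64>, // replica id -> local count
    }

    impl GCounter {
        fn increment(&mut self, replica: &'static str) {
            *self.counts.entry(replica).or_insert(0) += 1;
        }
        // Merge is commutative, associative and idempotent: applying
        // merges in any order, any number of times, gives the same state.
        fn merge(&mut self, other: &GCounter) {
            for (&replica, &n) in &other.counts {
                let slot = self.counts.entry(replica).or_insert(0);
                *slot = (*slot).max(n);
            }
        }
        fn value(&self) -> u64 {
            self.counts.values().sum()
        }
    }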


One of the motivations behind Pijul was to manage custom versions of Nixpkgs while still benefiting from upstream commits. One thing that's hard with Git is that when you also want to contribute multiple changes back, you need:

1. A branch pointing to the latest nixpkgs head.

2. A branch with commit A (let's say commit A introduces a new package to nixpkgs).

3. A branch with commit B (changing some config file).

4. A branch currently in use for your own machines, with the branches from 2 and 3 rebased on top of branch 1.

Every time you do anything, you'll have to remember the flow for getting the commits fetched/rebased. Which is fine if you have a DevOps team doing exactly that, but isn't too cool if you are anything other than a large company.

In Pijul, you would instead have a single channel (Pijul's rough equivalent of a branch) and two patches (A and B), which you can push independently of each other at any time if you want to contribute them back.
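Schematically (a toy model with made-up names, not pijul's actual CLI or internals), a channel behaves like a set of changes, and contributing one change back never drags an unrelated one along:

    use std::collections::HashSet;

    #[derive(Clone, Hash, PartialEq, Eq)]
    struct ChangeId(String);

    struct Channel {
        changes: HashSet<ChangeId>,
    }

    impl Channel {
        // Pushing change A upstream doesn't require change B, and vice
        // versa: membership is all there is. (The real system also
        // brings along a change's declared dependencies.)
        fn push_to(&self, upstream: &mut Channel, change: &ChangeId) {
            if self.changes.contains(change) {
                upstream.changes.insert(change.clone());
            }
        }
    }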

Darcs does the same but wouldn't scale to Nixpkgs-sized repos.


Also, the conflict resolution is just another patch (Pijul patches aren't just regular diffs, they carry a lot more information), so should you decide to merge it back upstream after all, you can cherry-pick the conflict resolution along with the conflicting patch, without changing either hash.
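Schematically (a deliberately simplified identity scheme using the sha2 crate, not Pijul's real hashing), the difference is what the hash commits to:

    use sha2::{Digest, Sha256};

    // Git-style: the id commits to the parent, so the "same" change
    // cherry-picked or rebased elsewhere gets a new id.
    fn git_style_id(parent_id: &[u8], diff: &[u8]) -> Vec<u8> {
        let mut h = Sha256::new();
        h.update(parent_id);
        h.update(diff);
        h.finalize().to_vec()
    }

    // Pijul-style: the id commits only to the change itself (and its
    // declared dependencies), so it survives being applied on any channel.
    fn pijul_style_id(change: &[u8]) -> Vec<u8> {
        Sha256::digest(change).to_vec()
    }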


IMHO there are different ways to design a version control system:

1. The SCCS/Git way, aka the hacker way: look at what we can do with existing stuff, and use that to build something that can do the job. For example, if you're one of the world's experts on filesystem implementations, you can actually produce a fantastic tool like Git.

2. The mathematician's way: start from a model of what collaboration is, and expand from there. If your model is simple enough, you may have to use complex algorithms, but there is hope that the UI will match the initial intuition even for non-technical users. Darcs did this, using the model that work produces diffs, and conflicts happen when diffs can't be reordered (see the commutation sketch after this list). Unfortunately this is slow and not too scalable. Pijul does almost the same, but doesn't restrict itself to just functions: it also uses points in the computation, which makes it much faster (but way harder to implement and a bit less flexible; no free lunch).

3. The Hooli way: take an arbitrary existing VCS, say Git. Get one of your company's user interviewers, and try to please interviewees by tweaking the command names and arguments.
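Here is the commutation sketch promised in point 2 (an illustration only, with patches reduced to single-line inserts; not Darcs' or Pijul's real patch format):

    // A patch inserts one line of text at a 0-based position, shifting
    // all later lines down by one.
    #[derive(Clone, Debug, PartialEq)]
    struct Insert {
        pos: usize,
        text: String,
    }

    /// Try to swap "p then q" into "q' then p'" with the same overall
    /// effect. `None` is exactly a conflict: the two patches cannot be
    /// reordered. (This toy also treats adjacent inserts as conflicts;
    /// real systems are smarter.)
    fn commute(p: &Insert, q: &Insert) -> Option<(Insert, Insert)> {
        if q.pos < p.pos {
            // q edits strictly above p's line: q keeps its position in
            // the original file, and p shifts down by one.
            Some((q.clone(), Insert { pos: p.pos + 1, ..p.clone() }))
        } else if q.pos > p.pos + 1 {
            // q edits strictly below p's inserted line: q shifts up.
            Some((Insert { pos: q.pos - 1, ..q.clone() }, p.clone()))
        } else {
            None // same or adjacent position: the order is ambiguous
        }
    }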

The tradeoff between 1 and 2 is that 1 is much more likely to produce a new usable and scalable system fast, but may result in leaky abstractions, bad merges and hacks everywhere, while 2 may have robust abstractions if the project goes to completion, but that may take years. OTOH, method 3 is the fastest and safest, but may not produce anything new.

So, I am the main author of Pijul, and I also don't quite see how to do much better (I'm definitely working on improvements, but nothing technically radical). But the causal relationship isn't the one you may be thinking of: it is because I thought this was the ultimate thing we could have that I started the project, not the other way around.


Yes indeed, we would not have started Pijul without the hope that, at least theoretically, patch-based designs could be faster than snapshots. "Patch-based is slow" without any other argument is not a very informed claim; Pijul is actually faster than Git in some cases (in fact, Pijul is faster where it matters most IMHO: large files and/or large repos, conflicts, and blames). Not because we're better at tweaking C code (we're definitely not!), but because we designed our data structures like theorists, and only then looked at how (and whether!) to implement things. One advantage we had over Linus is that we had no time pressure: we could well use Darcs, Git, or Mercurial to write Pijul initially (we used Darcs, actually), and it didn't matter much if we failed.

It took a little bit of work to get that down to actually fast code; for example, I had to write my own key-value store, which wasn't a particularly pleasant experience, and I don't think any existing programming language could have helped: it would have required a full linear logic type system. But at least now that thing (Sanakirja) exists, is more generic and modular than any storage library I know (I've used it to implement ropes, R trees, radix trees…), and its key-value store is faster than the fastest C equivalent (LMDB).

Could we do the same in Haskell or OCaml? As much as I like these two languages, I don't think I could have written Sanakirja in a garbage-collected language, mostly because Sanakirja is generic in its underlying storage layer: it could be mmap, a compressed file, an entire block device in Unix, an io_uring buffer ring, or something else. And the notion of ownership of the objects in the dictionary is absolutely crucial: Sanakirja allows you to fork a key-value store efficiently, so one question is: what should happen when your code drops a reference to an object from the kv store? What if you're deleting the last fork of a table containing that object? Are these two the same thing? Having to explain these to a GC would have been hard, I think.
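To make that genericity concrete, here is a hypothetical backend trait in the spirit of Sanakirja (not its real API): the tree code only ever loads, allocates and frees pages, and the fork semantics boil down to when free_page may legitimately be called.

    /// Hypothetical sketch, not Sanakirja's real API. The tree code is
    /// written against this trait, so the backing store can be mmap, a
    /// compressed file, a raw block device, an io_uring buffer ring…
    trait StorageBackend {
        /// Borrow the bytes of the page stored at `offset`.
        fn load_page(&self, offset: u64) -> &[u8];
        /// Allocate a fresh page, returning its offset.
        fn alloc_page(&mut self) -> u64;
        /// Free a page. With copy-on-write forks, this must only happen
        /// once no fork of any table references the page anymore, which
        /// is exactly the ownership question a GC would struggle with.
        fn free_page(&mut self, offset: u64);
    }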

I wouldn't have done it in C/C++ either, because it would have taken forever to debug (it already did take a long time), and even C++-style polymorphism (templates) isn't enough for the way Pijul uses Sanakirja.

Remember the "poop" paper about mmap for databases? Well, guess what: having a generic key-value store implementation allowed me to benchmark their claims, and to actually compare congestion, throughput, and speed between mmap and io_uring. Conclusion: mmap rocks, actually.
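For reference, the mmap side of such a benchmark is almost trivially short; here's a sketch using the memmap2 crate (the file name and access pattern are made up):

    use std::fs::File;
    use memmap2::Mmap;

    fn main() -> std::io::Result<()> {
        let file = File::open("bench.db")?; // hypothetical benchmark file
        // Safety: the file must not be truncated or resized while mapped.
        let map = unsafe { Mmap::map(&file)? };
        // Touch every byte sequentially; the kernel's readahead does the
        // prefetching that an io_uring version must schedule by hand.
        let checksum: u64 = map.iter().map(|&b| u64::from(b)).sum();
        println!("checksum: {checksum}");
        Ok(())
    }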


Wow, thanks for the detailed and informative comments!

Did you write up the mmap results as a paper? Sounds like it would be quite useful. (Is this the paper you're referring to? https://db.cs.cmu.edu/papers/2022/cidr2022-p13-crotty.pdf)


Not the same category of tools: Pijul and Fossil have radically different designs, whereas jj is a Git frontend, and Sapling a Mercurial fork.


In Pijul the head is a CRDT. Having used Pijul for years to develop itself, I can definitely imagine!


Pijul handles binary files natively, using a number of mathematical tricks that remove the need for extra layers like LFS. If you're interested, come talk to us!

