Difftastic: A diff that understands syntax

vcmiraldo · on March 29, 2022

I really like the idea of focusing on producing patches for human consumption. I studied the problem of merging AST-level patches during my PhD (https://github.com/VictorCMiraldo/hdiff) and can confirm: not simple! :)

stavros · on March 29, 2022

Please tell me the final output of your PhD was a differtation.

vcmiraldo · on March 30, 2022

omg!! I really should have left that typo somewhere in there! What a missed opportunity! xD

bool3max · on March 29, 2022

Should've named that repo "phdiff".

Groxx · on March 29, 2022

I'll vote for "diphph"

wst_ · on March 29, 2022

It's tangential but it reminded me of "lighght" poem by Aram Saroyan. https://en.wikipedia.org/wiki/Aram_Saroyan#Minimalism_and_co...

pdimitar · on March 29, 2022

Best pun I've heard in a long time. Well done. <3

munk-a · on March 29, 2022

To be pronounced "Doctor-iff" in speech?

MonkeyClub · on March 30, 2022

"Doctor if and only if <this works>"

einpoklum · on March 29, 2022

So I looked at the paper and it seems interesting. Basic idea: Instead of the operations to consider being "insert", "delete" and "copy", one adds "reorder" "contract subtree" and "duplicate" (although I didn't quite get the subtlety of copy vs duplicate on a short skim); and even though extra ops increase the search space, they actually let you search more effectively. I can buy that argument.

The practical problem, though, is that the Haskell compiler is limited/buggy, so you couldn't implement this for C, and you settled on a small language like Lua. If you _do_ extend this to other languages (perhaps port your implementation from Haskell to something else?), please post it on HN and elsewhere!

arianvanp · on March 29, 2022

Some of the GHC performance bugs that we ran into during the research have been fixed as far as I know! Though I'd have to double-check

vcmiraldo · on March 30, 2022

Indeed, we also designed a brand new generics library to work around that. Performance was really not an issue! :)

vcmiraldo · on March 30, 2022

Copy just copies once. The need for duplicate is clear if you're trying to diff something like `t = [a]` and `u = [a, a]`. You could copy `a`, but you'd have to decide whether to copy it on the first or second position; the second one would be classified an "insertion" by any ins/del/cpy-algorithm. If you instead opt to NOT make that choice, you can say: pick the source `a` and duplicate it instead
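
To make the `t = [a]`, `u = [a, a]` case concrete, here is a minimal sketch using Python's `difflib` as a stand-in for an ins/del/cpy-style algorithm. It is not the hdiff algorithm and has no "duplicate" operation; it only shows that such a tool must pair the source `a` with exactly one of the two targets and label the other an insertion.

```python
from difflib import SequenceMatcher

t = ["a"]
u = ["a", "a"]

# SequenceMatcher only knows equal/insert/delete/replace (no "duplicate"),
# so the single source element is matched to one target and the other
# target has to be reported as newly inserted.
ops = SequenceMatcher(None, t, u).get_opcodes()
for tag, i1, i2, j1, j2 in ops:
    print(tag, t[i1:i2], "->", u[j1:j2])

tags = sorted(tag for tag, *_ in ops)
```

Which of the two positions ends up as the "copy" is an arbitrary choice of the matcher, which is the decision a duplicate operation avoids.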

narush · on March 29, 2022

Can you give a little color on where the difficulties lie? Is it an efficiency question, or is determining "which changes" hard in the first place?

vanderZwan · on March 29, 2022

Early in the linked thesis there is a one-page argument about the shortcomings of traditional approaches, which technically isn't what you asked but might still answer the side of the question that deals with human usage at least:

https://victorcmiraldo.github.io/data/MiraldoPhD.pdf#page=24

scythmic_waves · on March 29, 2022

Not OP, but the docs call out some "Tricky Cases" [1].

[1] https://difftastic.wilfred.me.uk/tricky_cases.html

teeray · on March 29, 2022

I’d imagine there’s some challenging judgement calls that such a tool would have to make. Like, in Go, you can reorder the members of a struct definition. In many cases this is just diff noise to reviewers. HOWEVER, it does impact the layout of the struct in memory, so it can be semantically meaningful in performance work.
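
A rough illustration of why field order can matter, using Python's `ctypes`, which follows the platform's C layout rules. This is a stand-in and not Go: the struct and field names are made up, and the exact sizes depend on the platform's alignment rules.

```python
import ctypes

class Interleaved(ctypes.Structure):
    # small, large, small: padding is typically needed around the 8-byte field
    _fields_ = [
        ("flag_a", ctypes.c_uint8),
        ("counter", ctypes.c_uint64),
        ("flag_b", ctypes.c_uint8),
    ]

class Grouped(ctypes.Structure):
    # same fields, reordered so the two small ones share one padded slot
    _fields_ = [
        ("counter", ctypes.c_uint64),
        ("flag_a", ctypes.c_uint8),
        ("flag_b", ctypes.c_uint8),
    ]

interleaved_size = ctypes.sizeof(Interleaved)
grouped_size = ctypes.sizeof(Grouped)
print(interleaved_size, grouped_size)
```

On common platforms the first layout is larger, even though a reviewer would see the two definitions as "the same fields, reordered".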

gmfawcett · on March 29, 2022

A wild nitpicker appears. I understand where you're coming from & why this matters. But Go, the language spec, doesn't make any guarantees about struct layout at all. A layout difference may be meaningful, practically, but it's potentially unreliable.

e.g. see https://groups.google.com/g/golang-nuts/c/1BlZDNBLiAM

Having said that: if a Go compiler for a given architecture decided to change its layout algorithm, I'm pretty sure it would earn a changelog entry.

munk-a · on March 29, 2022

PHP long stated that associative array sorting order was unstable and not guaranteed (especially when the union (+) operator or array_merge function were involved) - that doesn't mean ten bazillion websites wouldn't instantly break if they ever actually changed the ordering to be unpredictable.

Language designers need to contend with the fact that the ultimate final say in whether a thing is or not is whether that behavior is observed.

giraffe_lady · on March 30, 2022

Didn't ruby actually do exactly this though? And it broke a million websites and they changed it back in the next version and have made it explicit ever since? To me that is much stronger evidence than what we think would happen if php did it.

vanderZwan · on March 30, 2022

I don't know about Ruby, but one example I can think of where a language made the instability explicit is that early on in the language Go changed the behavior of the select statement:

> If one or more of the communications can proceed, a single one that can proceed is chosen via a uniform pseudo-random selection.

https://go.dev/ref/spec#Select_statements

In an early implementation it would pick in lexical order, IIRC (and the specification did not mention how a communication should be picked). Not only could this lead to bugs, apparently some people were relying on it and they didn't want that.

zukzuk · on March 29, 2022

I wrote a masters thesis about the more general problem here (https://tspace.library.utoronto.ca/bitstream/1807/65616/11/Z...).

The tl;dr is that there's an almost infinite number of ways to atomize/conceptualize code into meaningful "units" (to "register" it, in my supervisor's words), and the most appropriate way to do that is largely perspectival — it depends on what you care about after the fact, and there is no single maximal way to do it up front.

ummonk · on March 29, 2022

I mean to have an improvement over the status quo we need to simply find a conception that works better than lines as units of code. Let’s not let perfect be the enemy of the good.

giraffe_lady · on March 30, 2022

love to tell someone who literally wrote a masters thesis on a topic what we need to "simply do" to solve it lol. I almost want to admire the confidence but

noduerme · on March 30, 2022

>>I’d imagine there’s some challenging judgement calls that such a tool would have to make

Just thinking about it makes my head spin. I spend a lot of time working out font/color hierarchies, supplementary to coding and data viz. Arguably what you're bringing up is a case for a carefully colored diff that visually cues whether something is a true semantic change or indicative of a lower level issue. I'm comfortable with reading a plain ol' diff that just shows me what changed, superficially, and interpreting it. While I think OP's idea is awesome, it also might create more confusion than it resolves; and resolving confusion is the point of a diff.

vcmiraldo · on March 30, 2022

Efficiency is not the issue at this point. My prototype diffing algorithm was linear and there have been improvements on it already (I think something called "truediff" is linear but an order of magnitude better! I could be misremembering the name, don't quote me :) ).

The real difficult part is in how you represent AST-level changes, which will limit what your merging algorithm can do. In particular, working around moving "the same subtree" into different places is difficult. Imagine the following conflict:

([1,3], [4,2,5]) <-- q -- ([1,2,3], [4,5]) -- p --> ([1,3], [2,4,5])

Both p and q move the same thing to different places so they need a human to make a decision about what's the correct merge. Depending on your choice of "what is a change", even detecting this type of conflict will be difficult. And that's without even adding insertions or deletions. Because now, say p was:

([1,2,3], [4,5]) -- p --> ([1,3], [2,5])

One could argue that we can now merge, because '4' was also deleted hence the position in which we insert '2' in the second list is irrelevant.

If we extrapolate from lists of integers to arbitrary ASTs the difficulties become even worse :)
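
The first conflict above can be sketched as a toy check. This is only an illustration on tuples of integer lists with unique values, not the representation hdiff uses; the helper names `locate` and `conflicting_moves` are made up, and deletions are deliberately ignored.

```python
def locate(state, value):
    """Return (list_index, position) of value in a tuple of lists, or None."""
    for list_index, items in enumerate(state):
        if value in items:
            return list_index, items.index(value)
    return None

def conflicting_moves(base, left, right):
    """Values that both sides moved to another list, but to different places."""
    conflicts = []
    for items in base:
        for value in items:
            origin = locate(base, value)
            via_left = locate(left, value)
            via_right = locate(right, value)
            if via_left is None or via_right is None:
                continue  # deletions are out of scope for this toy
            moved_left = via_left[0] != origin[0]
            moved_right = via_right[0] != origin[0]
            if moved_left and moved_right and via_left != via_right:
                conflicts.append(value)
    return conflicts

base = ([1, 2, 3], [4, 5])
q = ([1, 3], [4, 2, 5])
p = ([1, 3], [2, 4, 5])
conflicts = conflicting_moves(base, q, p)
print(conflicts)
```

A purely positional check like this has no way to express the second scenario, where deleting `4` arguably makes the two destinations equivalent, which is part of why the choice of "what is a change" matters so much.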

infogulch · on March 31, 2022

How does your work relate to tree-sitter, which also manages patches (which it describes as "incremental parsing") as well as error states?

emacsen · on March 29, 2022

This looks absolutely amazing.

One thing I do find interesting (and I wish were different) is that only programming languages are supported, rather than data formats as well.

For example, two JSON documents may be valid but formatted slightly differently, or a common task for me is comparing two YAML files.

Comparing config files that have a well-defined syntax and/or can be abstracted into a tree (JSON, YAML, TOML, etc.) would be absolutely lovely, even including (if possible) Markdown and its ilk.

Wilfred · on March 29, 2022

JSON and CSS are supported today, and I'm interested in adding more structured text formats.

If a format has a tree-sitter parser, it can be added to difftastic. The TOML tree-sitter parser looks good, but there isn't a mature markdown parser for tree-sitter. There are other markdown parsers available, so in principle difftastic could support markdown that way.

The display logic might need a little tuning for prose-heavy formats like markdown though. I'm not happy with how difftastic handles block comments yet either.

I'm not sure about formats that contain more prose, such as markdown or HTML.

zmix · on March 29, 2022

I think supporting XML would be something a lot of people would appreciate. That XML is difficult to diff comes up again and again... However, one would need to decide whether one wants to compare by syntax or by meaning. The latter may be preferable, but would require the XML to be canonicalized on both sides first.
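
As a small sketch of the "canonicalize both sides first" idea, Python's standard library (3.8+) ships a C14N 2.0 implementation in `xml.etree.ElementTree.canonicalize`. The two documents below are made-up examples that differ only in attribute order, empty-tag style, and indentation.

```python
from xml.etree.ElementTree import canonicalize

doc_one = '<root><item b="1" a="2"/></root>'
doc_two = '<root>\n  <item a="2" b="1"></item>\n</root>'

# strip_text=True drops leading/trailing whitespace around text content,
# so pure indentation differences disappear as well.
canon_one = canonicalize(doc_one, strip_text=True)
canon_two = canonicalize(doc_two, strip_text=True)

same = canon_one == canon_two
print(same)
```

Whether stripping whitespace is acceptable depends on the document type; for mixed-content XML it can change meaning, so it is a judgement call rather than a safe default.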

simonw · on March 29, 2022

I would naively expect that this problem is easiest to solve for languages like JSON that have an unambiguous way to be pretty printed.

chockchocschoir · on March 29, 2022

Indeed. One could just do `diff $(jq . $fileOne) $(jq . $fileTwo)` and you'll end up with a "nice enough" diff even if $fileOne and $fileTwo were very differently formatted.

lstamour · on March 29, 2022

The problem is when a file also needs to be normalized - e.g. object keys in a different order, YAML syntax expansion. It can be very useful to indicate when a JSON file is identical to another JSON file but some of the properties or array items are out of order and that requires more in-depth knowledge of the data format. Let's not mention that you could UTF-8 encode characters or write out the same character using backslash notation, numeric or boolean data that might be wrapped in a string in one file but not in another, etc. There can still be a lot of modelling and interpretation to consider when comparing data files rather than code files.

autarch · on March 29, 2022

I wrote a tool that tidies JSON and can do things like re-order keys in a fixed order - https://github.com/ActiveState/json-ordered-tidy

chockchocschoir · on March 29, 2022

I'm not too familiar with YAML, so can't answer to that.

But re JSON:

> object keys in a different order

They can't be "in a different order" as JSON keys are not ordered. They can be whatever order, and would still be considered the same.

> array items are out of order

Then it's different, as JSON arrays are ordered. ["a", "b"] is not the same as ["b", "a"] while {a: 1, b: 1} and {b: 1, a: 1} is the same.

> you could UTF-8 encode characters or write out the same character using backslash notation, numeric or boolean data that might be wrapped in a string in one file but not in another

Then again, they are different. If the data inside is different, it's different.

I understand that logically they are the same, but not syntax-wise, which is why I included the "differently formatted" disclaimer. It obviously wouldn't understand that "one" and "1" are the same, but then again, should it? Depends on the use case I'd say, hard to generalize.
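
The distinctions above can be checked by comparing parsed values instead of text. A minimal sketch with Python's `json` module; the helper name `same_json` is made up for illustration.

```python
import json

def same_json(left, right):
    """Compare two JSON texts by parsed value, ignoring formatting."""
    return json.loads(left) == json.loads(right)

# Object key order is not significant once parsed.
key_order = same_json('{"a": 1, "b": 1}', '{"b": 1, "a": 1}')
# Array order is significant.
array_order = same_json('["a", "b"]', '["b", "a"]')
# A backslash escape and the literal character parse to the same string.
escaped = same_json('"\\u0041"', '"A"')
# A number wrapped in a string is a different value from the number.
stringly = same_json('"1"', '1')

print(key_order, array_order, escaped, stringly)
```

So key order and escape style fall out of parsing for free, while string-versus-number equivalence would need an explicit, use-case-specific rule.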

stormbrew · on March 29, 2022

> They can't be "in a different order" as JSON keys are not ordered. They can be whatever order, and would still be considered the same.

This is what GP is saying, I'm pretty sure. Object member order is non-semantic in json, so in order to do a semantic diff (one that understands structure), you need to canonicalize the order of the two sides. Simply diffing the output of jq doesn't do that, because (afaik) jq doesn't alter the order.

Basically, if you want this to come up the same:

    {"a":"b","c":"d"}
    {"c":"d","a":"b"}

you need more than just `diff $(jq) $(jq)`.

Can argue about whether a tool like difftastic should do that, I guess, but I would personally lean towards that it should be smart enough to see this because it's precisely the sort of thing that both humans and line-based diff can be awful at seeing.
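
One way to get there without a structural diff tool is to canonicalize both sides and then run an ordinary line diff. A sketch using only Python's standard library; `canonical_lines` and `json_diff` are hypothetical helper names, and sorting keys is just one possible canonical form.

```python
import difflib
import json

def canonical_lines(text):
    """Parse JSON and re-serialize with sorted keys and fixed indentation."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2).splitlines()

def json_diff(left, right):
    """Unified diff of the canonical forms; empty if only order/format differ."""
    return list(
        difflib.unified_diff(
            canonical_lines(left), canonical_lines(right),
            "left", "right", lineterm="",
        )
    )

reordered = json_diff('{"a":"b","c":"d"}', '{"c":"d","a":"b"}')
changed = json_diff('{"a":"b","c":"d"}', '{"c":"x","a":"b"}')

print(reordered)
print("\n".join(changed))
```

The two documents from the example above produce an empty diff, while a real value change still shows up as a one-line change.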

fwip · on March 29, 2022

Just an FYI, jq has a flag to sort by the name of keys, I believe it's -S (--sort-keys).

stormbrew · on March 29, 2022

Fair enough! I should just never assume jq doesn't have a feature.

ninkendo · on March 30, 2022

Nitpick: diff takes filenames as arguments, so comparing the output of two commands would need the `<()` expansion. So the command would be `diff <(jq . $fileOne) <(jq . $fileTwo)`

Wilfred · on March 29, 2022

https://github.com/andreyvit/json-diff works really well for JSON diffing in my experience.

It's more simplistic than difftastic though: it considers `1` and `[1]` to have nothing in common.

mark_and_sweep · on March 29, 2022

JSON is supported.

HTML and XML are missing, too.

emacsen · on March 29, 2022

You're right. I missed JSON.

Sadly YAML, TOML and the others I mentioned are not there (yet?)

softwarebeware · on March 29, 2022

There’s always room for contributions!

d0gsg0w00f · on March 29, 2022

This is kind of like the problem of programmatically analyzing AWS IAM roles and policies to understand impact of changes. Very difficult to do in JSON format but worth tons of money to CISOs if it can be solved.

alxmrs · on March 29, 2022

Similarly, I would love it if Pandoc’s AST were supported. Or, if this could be extended to compare any documents taking formatting into account, or document-to-document conversions.

paxys · on March 29, 2022

This isn't going to add anything to existing diff tools for JSON or YAML though. Those formats barely have any syntax highlighting or complex structures.

linsomniac · on March 29, 2022

I would love a great XML diff tool, and after seeing the demo of this I was sad to see XML not in there. Would pay for.

tomatowurst · on March 30, 2022

same, I don't know how many times I do a diff and wish there was a smarter solution that could take formatting and whitespace into account. This is it. Wish git diff would incorporate this, would be a real treat.

dools · on March 30, 2022

Funny side note: I had a flat mate once who was on a working holiday from Japan.

He was in love with and endlessly curious about English slang, it’s basically all we talked about.

I remember explaining to him why my uni friends and I referred to things as being “craptastic”, starting with American marketing’s love affair with the portmanteau.

He got it pretty quickly and enjoyed using it in conversation.

The saying that was harder for him to understand was “fuck all”. He always wanted fuck to be the verb, rather than using “fuck all” as the adjective, so he would say things like “I fuck all my money last night at the pub”.

benreesman · on March 30, 2022

I know native English-speakers who would say that s/pub/bar/.

Profanity is just delightful in general, and non-native English speakers come up with some of the best profane idioms in English.

I wonder if it’s the same in other languages?

roeles · on March 30, 2022

Perhaps this book would have helped.

https://www.amazon.com/gp/aw/d/486256139X

db48x · on March 29, 2022

This is written by the same guy who wrote Helpful, an enhancement package for the Emacs Help buffer. I highly recommend checking out Helpful if you haven’t seen it. https://github.com/Wilfred/helpful

CodeIsTheEnd · on March 29, 2022

EDIT: Wilfred IS the original author [3]; my apologies.

Not to discredit Wilfred (it looks like he's taken over the project as the maintainer), but, based on the historical contributions [1], it looks like it was originally developed by Max Brunsfeld, who also created Tree-sitter. [2]

[1]: https://github.com/Wilfred/difftastic/graphs/contributors

[2]: https://github.com/tree-sitter/tree-sitter

[3]: https://github.com/Wilfred/difftastic/commit/958033924a2dea7...

arxanas · on March 29, 2022

I think the contributor graph is misleading, and that he's using git-subtree to vendor tree-sitter, which makes it look like others have contributed more to the project.

CodeIsTheEnd · on March 29, 2022

Oops, I think you're right! Thank you for pointing that out.

My apologies to Wilfred.

maw · on March 29, 2022

He wrote https://github.com/Wilfred/deadgrep too. It's awesome and I don't know how I lived without it for so long.

disgruntledphd2 · on March 29, 2022

Helpful is (pun fully intended) so very, very helpful.

Honestly, I cannot imagine going back to the standard emacs help.

db48x · on March 29, 2022

Agreed. It’s so good it feels like it should have been that way all along. For example, when you view the help for a function Emacs has always given you a link to the source code where that function is defined. Helpful shows you the source code right in the Help buffer, and shows you a list of callers, and gives you buttons that enable tracing or debugging for the function.

Once I discovered Helpful, all of those things seemed so obviously useful that I can’t understand why nobody else thought to put them there, including myself.

disgruntledphd2 · on March 29, 2022

The best part is the forget function, for when functions are incompatible. As an example, lsp won't work for me unless I forget the project-root function from ess-r (I have no idea why this hasn't been fixed) and helpful makes this a two or three key activity.

buu700 · on March 29, 2022

For everyone wondering, it looks like this will work with git diff: https://difftastic.wilfred.me.uk/git.html.

Starcrunch · on March 29, 2022

Exactly what I was looking for. Thanks!

pvg · on March 29, 2022

A previous discussion from 8 months ago, with some comments by the author and authors of other diff tools:

https://news.ycombinator.com/item?id=27768861

loxias · on March 29, 2022

This looks really cool and I can't wait to try it, tho... a bit of a PITA to get running. ;) Took a while to figure out how to build, and had to install 400MB of dependencies first....

Edit: And after installing cargo, watching it fail to build, then determining I must need a newer version of cargo, so I built that from source... it fails. Apparently I need to install `rustc-mozilla` and not `rustc`. "obviously".

This is all a testament to how much I want to try this tool...

MOAR EDIT: even with rustc-mozilla cargo fails to build. running `cargo install difftastic` gives me an error about my version of cargo being too old ;.;

Dear author: Let us run your tool.

gkfasdfasdf · on March 29, 2022

Using ubuntu 20.04, I first installed cargo:

  curl https://sh.rustup.rs -sSf | sh

Restart shell to get $HOME/.cargo/bin in PATH, then did:

  cargo install difftastic

And ~4 minutes later, difft executable is ready.

Agree though that some pre-built binaries would be fantastic!

loxias · on March 29, 2022

Ah, well, if you're willing to accept having a frankensystem with a mix of packaged and unpackaged software, sure. ;) I used to do that, back in Slackware days.

It's considered really sloppy and unmaintainable to admin a system like that. Things quickly get out of hand.

That strategy _does_ work if you isolate it to a chroot or a container, but littering /usr/local with all sorts of locally compiled upstream is just asking for future pain. Security updates, library incompatibilities, &c.

Prebuilt binaries might be nice, but I don't expect them for random projects. (and I wouldn't have used them if offered) I do think it's a reasonable expectation to be able to build software w/o essentially setting up a new userland just for that tool though. :)

gkfasdfasdf · on March 29, 2022

The method I posted above doesn't write anything to /usr/local. Root isn't required. Everything is written under ~.

loxias · on March 29, 2022

Whoa really?

I'm sorry, and retract my ignorant assumption! Going to try it out now.

Wilfred · on March 29, 2022

There are a few packages available, e.g. https://aur.archlinux.org/packages/difftastic and https://pkgsrc.se/wip/difftastic.

I've also had requests from Alpine Linux packagers to allow dynamic linking to parsers. This is something I want to support in future, once I'm happy with the basic diffing logic.

jeremyjh · on March 29, 2022

I agree it leads to problems but isn't the entire purpose of `/usr/local` to be a dumping ground for locally administered (unpackaged) programs?

vlunkr · on March 29, 2022

A huge part of the appeal of Rust and Go tools is that you can just ship a binary, it's frustrating that it's not available here.

easrng · on March 30, 2022

Not sure about Go, but Rust still links against glibc, so I sometimes have to recompile things to make them work on my Debian systems if they're built against newer glibc.

pinsl · on April 2, 2022

Rust can statically link against musl.

https://doc.rust-lang.org/reference/linkage.html#static-and-...

ducktective · on March 29, 2022

Same here. Looked into repo -> no binary in release or Github actions

spun up an Ubuntu 18.04 instance -> git clone, git checkout 0.24.0

installed rust using curl | sh method

build fails:

https://termbin.com/29xy

removed the instance and gonna check it again 6 months later

adwn · on March 29, 2022

In another comment you're asking about vim support. So let me get this straight: You're using vim, yet you're unable to resolve the error message

    = note: /usr/bin/ld: cannot find Scrt1.o: No such file or directory
            /usr/bin/ld: cannot find crti.o: No such file or directory

Have you tried googling for "ubuntu crti.o: No such file or directory" ?

joemi · on March 29, 2022

Using vim has nothing to do with one's ability to troubleshoot compiler/ubuntu issues. Plus both compiler and ubuntu issues can be a massive PITA to solve even if you're familiar with them. Personally, if I'm trying to install something on a whim to try it out and I start getting "no such file or directory" errors I'd be upset that something is going wrong.

ducktective · on March 29, 2022

>Have you tried googling for "ubuntu crti.o: No such file or directory" ?

Depending on the project, there is a certain threshold of trying-to-make-something-work which I'm willing to undertake in order to test an app.

But you are right. I'm sorry if my OG comment came across as arrogant to the devs who do stuff for free. (♥ to the devs)

[edit]: ok, I tried again, `sudo apt update && sudo apt install build-essential` before installing rust and `cargo install`ing.

Error again:

https://dpaste.com/FTG7FSRQF

jbrr25 · on March 30, 2022

The GCC version in Ubuntu 18.04 is too old. I had the same problem, I just installed clang, updated the default c++ and it worked. There is an issue in the repo about that.

estebank · on March 29, 2022

Funnily enough, the error is in a C++ dependency providing Haskell support.

    vendor/tree-sitter-haskell-src/scanner.cc

_pvxk · on March 30, 2022

If you have nix (package manager) installed, it takes like half a second. For tools I want to install through nixpkgs I make a starter like this:

    $ cat /usr/local/bin/difftastic
    #!/bin/sh
    . $HOME/.nix-profile/etc/profile.d/nix.sh
    nix run nixpkgs.difftastic -c difftastic "$@"

and then it'll install on first run:

    $ difftastic
    these paths will be fetched (1.17 MiB download, 9.38 MiB unpacked):
      /nix/store/wn74xn0w60xcwsly6nqaibn205hh2qms-difftastic-0.8
    copying path '/nix/store/wn74xn0w60xcwsly6nqaibn205hh2qms-difftastic-0.8' from 'https://cache.nixos.org'...
    Difftastic 0.8.0
    Wilfred Hughes
    A syntax aware diff.
    
    USAGE:
    [etc.]

YetAnotherNick · on March 29, 2022

Used `cargo install difftastic`? Finished in a minute for me.

lopatin · on March 29, 2022

Build errors for me. Apparently I'm on some nightly build of cargo, but I need 2021 version. The pain begins...

Edit: Reinstalling Cargo worked!

skywal_l · on March 29, 2022

With rustup, it's pretty easy to update/change your cargo version.

loxias · on March 29, 2022

How did you do it? When I tried to rebuild cargo I got build errors. I'm starting to suspect the only way to run this tool is make a chroot tracking sid or something....

lopatin · on March 29, 2022

I just followed the installation instructions here: https://doc.rust-lang.org/cargo/getting-started/installation...

It'll confirm that you want to install it, because it's already installed I think, and I just selected 1. for Yes.

loxias · on March 29, 2022

> curl https://sh.rustup.rs -sSf | sh

hard pass :)

adwn · on March 29, 2022

> hard pass

Why? You're willing to run some random open source project, but you're not willing to run the official Rust installation script?

chlorion · on March 30, 2022

I feel the same way, I am just not willing to pipe curl into a shell blindly.

Even if this specific instance of curl'ing into sh is safe, or if I download and then run it, it's still extremely poor practice and gives me serious doubts about the developers and their security practices in general.

I also do not like when every project decides to poorly reimplement the package manager. If every piece of software used its own package manager my system would be a complete mess with dozens of different package managers fighting each other and it would be a total nightmare to update the system or manage non-trivial dependency chains when installing something new.

Rust is one of my favorite languages but this is definitely my least favorite aspect of it all. It really feels like the developers "optimized" for systems with no package manager.

a_passable_dev · on March 29, 2022

Out of curiosity, what would be an acceptable way for the developers to provide a quick way for users to get up and running?

A get started guide with all the required commands easily copy-pastable? (A popular option these days) Something else?

I don’t mean to be critical, I’m simply curious.

DangitBobby · on March 29, 2022

You could always download it first and eyeball it before running it.

loxias · on March 29, 2022

Sure, but first I had to figure out wtf "cargo" is. :P

Also, `cargo install difftastic` AIUI pulls it from a central location, if I'm gonna poke at software for the first time, I enjoy building it myself first, so I can get my hands dirty in the source. :)

EDIT: Also, the build fails. :(

    error: unexpected token: `include_str`
     --> /home/loxias/.cargo/registry/src/github.com-1ecc6299db9ec823/radix-heap-0.4.2/src/lib.rs:2:10
      |
    2 | #![doc = include_str!("../README.md")]
      |          ^^^^^^^^^^^

    error: aborting due to previous error

    error: could not compile `radix-heap`.

sad trombone

Wilfred · on March 29, 2022

This looks like you're using a version of Rust older than the minimum required (1.56).

Wilfred · on March 29, 2022

The getting started section of the manual should help: https://difftastic.wilfred.me.uk/getting_started.html

I've documented the minimum rust version required today, although I'm looking at lowering the minimum version.

estebank · on March 30, 2022

Honest question: how did you arrive at the conclusion that you needed rustc-mozilla? I would love to make sure whatever flow led you to that is made clearer for other newcomers, because that is definitely not something anyone who isn't working on Firefox should even try.

Wilfred · on March 30, 2022

I imagine it's a misunderstanding with the rustc-hash dependency used in difftastic for faster hashing.

yboris · on March 29, 2022

My favorite dev tool is diff2html - a CLI that opens up your browser with a rich diff. Pro tip: alias `diff` to the command so you can launch it quickly ;)

https://diff2html.xyz/

pabs3 · on March 29, 2022

A related thing is cregit, which does diffs of tokens:

https://github.com/cregit/cregit https://lwn.net/Articles/698425/
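
To show the general idea of diffing tokens instead of lines (this is not cregit's implementation, which targets C), here is a small sketch using Python's own `tokenize` and `difflib` modules on Python source. The helper names are made up for illustration.

```python
import difflib
import io
import tokenize

# Token types that only carry layout or comments, not program structure.
LAYOUT = {
    tokenize.NL, tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT,
    tokenize.COMMENT, tokenize.ENDMARKER,
}

def code_tokens(source):
    """Return the token strings of a Python source text, minus layout tokens."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in LAYOUT
    ]

def token_changes(old, new):
    """List (old_tokens, new_tokens) pairs for every non-equal region."""
    a, b = code_tokens(old), code_tokens(new)
    ops = difflib.SequenceMatcher(None, a, b, autojunk=False).get_opcodes()
    return [(a[i1:i2], b[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]

reflowed = token_changes("x = foo(1, 2)\n", "x = foo(\n    1,\n    2\n)\n")
edited = token_changes("x = foo(1, 2)\n", "x = foo(1, 3)\n")

print(reflowed)
print(edited)
```

Re-wrapping the call produces no changes at the token level, while the edited argument shows up as a single token replaced.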

Wilfred · on March 29, 2022

Ooh, I'd not seen this and I've seen a bunch of diff tools at this point! Thanks for sharing.

_jfib · on March 29, 2022

I would love it if version control stored an AST that also includes comments and dividers (where right now we would leave an empty line) and dev machines rendered it out however they wanted. They could even change the language of keywords in addition to normal formatting.

glenjamin · on March 29, 2022

To do this requires some standard way of encoding an AST which includes comments and dividers.

That standard format is commonly known as source code - although it lacks a normal form.

Tools like prettier, gofmt and black can be thought of as a way to produce a normal form of source code.

This is (IMO) a reasonable incremental approach towards exactly what you describe - if a project checks in only source code that's formatted using a standardised format, then you're free to work on it using whatever equivalent representation you like - as long as it's converted back at commit-time.
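
A quick sketch of the "normal form" idea using Python's `ast` module (Python 3.9+ for `ast.unparse`). It is only an illustration on a made-up function: the two sources below differ in formatting but parse to the same tree, and unparsing gives one canonical rendering.

```python
import ast

original = "def area(w,h):\n    # width times height\n    return w*h\n"
reformatted = "def area(\n    w,\n    h,\n):\n\n    return w * h\n"

# ast.dump omits line/column attributes by default, so layout is ignored.
same_tree = ast.dump(ast.parse(original)) == ast.dump(ast.parse(reformatted))

# Unparsing yields one canonical rendering of the tree.
normal_form = ast.unparse(ast.parse(original))

print(same_tree)
print(normal_form)
```

Note that the comment does not survive the round trip, which is exactly why a stored representation would need to carry comments and dividers in addition to the plain AST.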

Wilfred · on March 29, 2022

FWIW VCS for Smalltalk basically does this.

The challenge for a tool like difftastic is that I can't guarantee that syntax is well-formed. You might be using new syntax that my parser doesn't support, you might have merge conflicts, or you might have a plain syntax error in your code.

Tree-sitter handles parse errors gracefully, so difftastic handles syntax errors pretty well in my experience.

tluyben2 · on March 29, 2022

Yep, I posted this idea on Reddit recently and people said they need a formatted syntax because of diff and version control; we do not; get the ast, reformat in the editor as the particular user fancies and generate diff and version control artefacts also as a particular user sees fit. Our computers are very fast so you can make a lot more different views on your code than we have now by using the ast instead of text and regexps.

pie_flavor · on March 29, 2022

This exact project is called JetBrains MPS.

adolph · on March 29, 2022

MPS seems to be a DSL authoring tool. How would this be used to make an AST diff tool?

https://www.jetbrains.com/mps/

https://en.wikipedia.org/wiki/Abstract_syntax_tree

pie_flavor · on March 30, 2022

Well, the comment I was responding to was about storing the AST, not diffing it. If that's what you meant, the one follows naturally from the other. Once the file format is the AST instead of its visual representation, it makes sense to implement lots of operations as DSL extensions instead of library features, because the language is the library in a sense. MPS is marketed as a DSL tool but what it really is is a projectional language tool.

taspeotis · on March 29, 2022

I paid and used SemanticMerge quite successfully when we had a complex Git workflow with lots of conflicts.

https://semanticmerge.com/

Since moving to short lived feature branches it is less useful to me.

Liquid_Fire · on March 29, 2022

SemanticMerge sounded interesting enough so I wanted to check it out, but to my surprise there is no Buy or Download link anywhere on the site. The only thing that might do it is a Login link, but I don't want to create an account just to see how much the thing costs. Is it only sold in bulk to companies? I find it bizarre that there isn't even a "contact sales" button.

ziml77 · on March 29, 2022

That's incredibly annoying! They must have changed something about their pricing and sales model since the time that I had purchased it. I don't understand why companies think that's a good idea. I guess I can't recommend it anymore.

Aeolun · on March 29, 2022

There is a 'sales' button at the bottom, but it's just a link to an email. I'm really not sure how they're even trying to sell this thing.

Maybe they don't want to any more? And this is just their subtle way of pushing everyone interested in using it away?

ziml77 · on March 29, 2022

I don't need SemanticMerge often, but when I do I'm incredibly thankful that I have it.

goombacloud · on March 29, 2022

For easy git usage I created these two scripts in my PATH instead of using git config:

git-difft:

  #!/bin/sh
  GIT_EXTERNAL_DIFF=difft git diff "$@"

git-showt:

  #!/bin/sh
  GIT_EXTERNAL_DIFF=difft git show --ext-diff "$@"

Then you can run "git difft …" or "git showt …" if you want to use it.

matheusmoreira · on March 30, 2022

This is really nice, thank you!

Pet_Ant · on March 29, 2022

I was interested in SemanticMerge/XMerge but when I looked they didn't have a Mac client and now it looks like they don't have a personal edition. I just want to buy a private license and use it locally. https://semanticmerge.com

password4321 · on March 29, 2022

They are requesting feedback on the pricing model for the latest revision of the technology, maybe HN could change their minds:

https://www.gmaster.io/pricing

Pet_Ant · on March 29, 2022

OS X and Linux are "wait & see" again. That describes half of our dev team and most of the seniors.

cyberge99 · on March 29, 2022

Nice tool! I've used icdiff for this in the terminal, but I'll see how this performs in my workflow.

Since I use VSCode as my editor, I created this oneliner in my .bash_profile:

# VS Code Diff

diffcb () { "/usr/local/bin/code" -n --diff "$@" > /dev/null 2>&1 ; }

With it, I can "diffcb filename1.json filename2.json" to get a visual editor with contextual awareness based on installed lint modules.

aasasd · on March 29, 2022

Personally I long for a syntactic merge-tool. Every time Syncthing hiccups for some reason, I'm up for a merge session with my Org-mode files, in the vein of: ‘These properties look just like those ones, only with a different timestamp... Oh lookie, and the heading is totally changed. Let me merge this new heading all over the old one, and then pop in the old one after it.’ Dammit, it's just a whole new heading added with properties. This happens with every language heavy on markup.

However, I'm not sure if Org markup lends itself to structuring that would allow proper diffing—even with just the headings.

teknopaul · on March 29, 2022

Be good to have different git merge strategies per file type.

e.g. A merge that knows properties files support the same property added in different places but only once is needed. And another strategy if order is significant.

Cool to have an HTML merge that recognises the tree structure and supports merging tags and having the indentation follow some rules.

I believe git supports merge strategies; it's been on my todo list forever.
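
For what it's worth, Git can route particular paths to a custom merge driver through .gitattributes, so a per-file-type merge is possible in principle. A minimal sketch of what a properties-aware three-way merge might do is below. It assumes order is not significant and that each file is plain key=value lines; `parse_properties` and `merge_properties` are hypothetical names for illustration, not part of git or any existing tool.

```python
def parse_properties(text):
    """Parse 'key=value' lines into a dict, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def merge_properties(base, ours, theirs):
    """Three-way merge of key/value dicts. Returns (merged, conflicting_keys)."""
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:        # both sides agree: same addition, same edit, or same deletion
            winner = o
        elif o == b:      # only "theirs" touched this key
            winner = t
        elif t == b:      # only "ours" touched this key
            winner = o
        else:             # both sides changed it differently: leave it out and report it
            conflicts.append(key)
            continue
        if winner is not None:   # None means the key was deleted
            merged[key] = winner
    return merged, conflicts


base = parse_properties("a=1\nb=2\n")
ours = parse_properties("a=1\nnew=x\nb=2\n")      # adds "new" in the middle
theirs = parse_properties("b=3\na=1\nnew=x\n")    # adds "new" at the end, edits b
print(merge_properties(base, ours, theirs))
# prints ({'a': '1', 'b': '3', 'new': 'x'}, [])
```

The same property added in two different places merges cleanly because position is ignored; a format where order matters would need a different strategy.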

29athrowaway · on March 29, 2022

Today in generation Z rediscovers things: semantic patching.

https://en.wikipedia.org/wiki/Coccinelle_(software)

foreigner · on March 29, 2022

I LOL'ed at the first page of the manual: "When it works, it's fantastic."

Aicy · on March 29, 2022

Looks really cool, but there were no instructions on how to install it.

I would recommend putting an installation guide in your readme, and it being a full installation guide.

I followed the link to your manual and then it told me to install your tool using a tool called "cargo" with no reference on how to install cargo. At this point I gave up. Lazy, maybe, but for a convenience tool like this I want a convenient installation.

conradludgate · on March 29, 2022

Cargo is Rust's build tool/package manager and can be installed easily using rustup. But I would probably suggest the difftastic maintainers add some prebuilt binaries to the releases

(I have an example workflow here if anyone from there is interested https://github.com/conradludgate/wordle/blob/main/.github/wo...)

jwilk · on March 29, 2022

What's rustup and how do I install it?

asicsp · on March 29, 2022

See https://rustup.rs/

loxias · on March 29, 2022

I think it's wonderful that there's an explosion of new exciting languages, it can only improve the quality of all our tools. I for one am looking forward to replacing my eons of MATLAB experience with Julia.

But I wish there was more of a convention in the F/OSS community that if your software isn't written in something universal (C, C++, shell and maybe python), then it also comes with a container of all that's necessary to run it.

It's frustrating to pollute my nicely packaged managed system with hundreds of locally installed python modules just to run one tool. Or, in this case, backport and rebuild a language specific build tool simply to compile. :)

simonw · on March 29, 2022

Have you used pipx? I really like it for installing Python tools because it automatically creates a virtual environment for them so that their dependencies don't affect anything else.

https://pypa.github.io/pipx/

andai · on March 29, 2022

>shell

>universal

* laughs in Windows, then cries *

loxias · on March 29, 2022

I used to straddle the two worlds, maintained and supported a multi-site AD domain with AFS integration for user $HOME and some sort of unholy LDAP/kerberos bridge for login. About once every year or two I'll miss something about the way Windows does things, compared to normal (meaning "linux"). Like the NTFS permissions model, that's cool.

But it's just once a year :) And the last time I was deep in windows was win7, whenever that was. I tried to use a win10 machine and gave up.

Besides, I thought the big new feature in modern windows was that WSL improved to the point you can run unix tools! ;)

chungy · on March 29, 2022

> About once every year or two I'll miss something about the way Windows does things, compared to normal (meaning "linux"). Like the NTFS permissions model, that's cool.

FreeBSD would be up your alley. Its native ACLs are NFSv4 format, a superset of NTFS ACLs. You need to enable it explicitly on UFS2, but it's default on ZFS.

Spivak · on March 29, 2022

> and some sort of unholy LDAP/kerberos bridge for login

It's really not that bad, the AD-IPA cross-forest trust is really solid as is the native sssd-ad integration if IPA is too much. Honestly I can't really imagine it any other way now, so much work has been put into AD support that it's actually the best login experience on Linux at the moment. OpenLDAP is definitely showing its age -- dgmr I use it for all my personal infra because it's free and my use-cases are dead simple but we got to delete so much bespoke code after migrating off it at work.

loxias · on March 29, 2022

> AD-IPA

I'm not sure, and you undoubtedly know more and are more up to date than I, but I don't believe any of these things existed in 2005, when I was on the aforementioned team. Or, maybe they did exist but management decided an internal implementation was better.

Getting Windows to accept the user profile in an AFS path I recall being particularly vexing.

Wilfred · on March 29, 2022

FWIW I've had reports of people using difftastic on Windows successfully.

loxias · on March 29, 2022

I agree with all your points.

Only diff is I got to the point where it said I needed "cargo". On a whim, I typed "aptitude install cargo", and it did something. Now waiting for the >1GB source repo to clone to see if it works.... ;)

gkfasdfasdf · on March 29, 2022

This method worked for me. No root required. https://news.ycombinator.com/item?id=30842720

childintime · on March 29, 2022

Looks like you need to install the Rust programming language and compile it. It worked for me. Not sure if I like the installation method. It seems the executable is portable though.

LudwigNagasena · on March 29, 2022

Is there a good reason why diff tools generally don’t use AST?

danbruc · on March 29, 2022

Because it is much easier, you don't have to build and maintain parsers for hundreds of languages. And you don't need just any parser, you need very robust ones that can deal with malformed files well. Or, if you only pick a small set of supported languages, your diff tool will not work on most files or have to fall back to a structure-agnostic algorithm. Also not all text files even follow any useful grammar at all.

Finally, even if you have a syntax tree, that is just part of the solution, probably the smaller one. Detecting three lines of code wrapped in a new if statement is easy but also doesn't benefit much from a syntax-aware algorithm. But once you change names and signatures, extract methods, introduce constants, and so on, it will become progressively harder to match subtrees and one is probably quickly approaching the territory of NP-hard and undecidable problems.

RyEgswuCsn · on March 29, 2022

> And you don't need need just any parser, you need very robust ones that can deal with malformed files well.

I very much agree. I feel there has been a trend recently where people (re)discovered how cool and useful ASTs are and now expect everything be using them. I suspect old-school computer scientists might be secretly laughing at this while programming with some Lisp-like languages they invented for themselves.

Jokes aside, I do wonder how modern IDEs manage to parse broken source code into usable ASTs --- is this trivial (CS theory-wise) or is there a lot of engineering secret sauce involved to make it work?

danbruc · on March 29, 2022

With only basic knowledge in the domain I would assume it is hard and ugly. If the file is malformed, there is almost certainly an infinite number of possible edits to make the file adhere to the grammar, hence there can not be any algorithm that just provides the one and only correct syntax tree. This in turn means that you have to come up with heuristics that identify reasonable changes which fix the file and that is probably not easy. Also, if you do this online in an IDE, the problem becomes probably easier [1] - if you have a valid file and then make it invalid by deleting an operator in the middle of some expression, you can still essentially use the syntax tree from just before the deletion. If, on the other hand, you get a malformed file, you might have a harder time.

[1] And also harder because if you want to parse the file after each key stroke, you have to be fast. This probably also makes incremental updates to the syntax tree the preferred solution and that might align well with using prior results for error recovery.

jhgb · on March 29, 2022

"If the file is malformed, there is almost certainly an infinite number of possible edits to make the file adhere to the grammar, hence there can not be any algorithm that just provides the one and only correct syntax tree. This in turn means that you have to come up with heuristics that identify reasonable changes which fix the file and that is probably not easy."

Don't we call such heuristics "test suites"?

danbruc · on March 29, 2022

I don't understand that question. Given the following source file that does not parse

  var foo = bar baz

there are many ways to change it and make it parse including the following reasonable ones

  var foo = barbaz
  var foo = "bar baz"
  var foo = { bar, baz }
  var foo = bar // baz
  var foo = bar
  //var foo = bar baz
  var foo = bar * baz
  var foo = bar + baz
  var foo = bar.baz
  var foo = bar(baz)

but also unreasonable ones like

  var abc = 123

and therefore a parser that can handle malformed inputs has to make educated guesses what the input was actually supposed to look like. And don't be fooled by this simple example, imagine a long source file with deeply nested code in a language with curly braces and randomly deleting some of the braces. Now try to figure out where classes, methods, if or try statements begin and end in order to produce a [partial] syntax tree better than just giving up at the position of the first error.

jhgb · on March 29, 2022

My point was that test suites should give you a heuristic on what corrections are good and which are bad. A source code change that turns a test fail into a test pass should be considered an improvement.

danbruc · on March 29, 2022

I am still lost. Test suite for what? We have a parser - binary, source code and maybe a test suite if the parser developers decided to write tests - and a random text file that we throw at the parser and for which the parser hopefully generates a useful syntax tree if the content is a well-formed or not too badly malformed program in a language the parser understands.

jhgb · on March 29, 2022

What "test suite for the parser"? Of course a test suite for the faulty program you're trying to correct into a working one.

danbruc · on March 29, 2022

So I can only use the diff tool to compare two non-compiling versions of a source file if I provide a test suite for that file to the diff tool? And how would you want to make use of the test suite? Before you can run the test suite, the source file must already parse and compile which is already more than a diff tool based on a syntax tree requires - it must be able to parse the source code but it doesn't have to compile. Passing the test suite requires even more, not only being able to parse and compile but also yield the correct behavior which the diff tool doesn't care about.

And you actually jumped over the hard part that requires the heuristics, how to modify the input in order to make it parse. Take a 10 kB source file and delete 10 random characters - how will you figure out which characters to put back where? With 100 possible characters, 10,000 positions to insert a character, and having to insert 10 characters, you are looking at something like 10^60 possible modifications. You are certainly not going to try them one after another, each time checking if the modified source file parses, compiles, and passes the test suite.
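
The estimate above is just (characters x positions) raised to the number of missing characters. A quick sanity check of that arithmetic, using the round numbers from the comment (it ignores that positions shift after each insertion and that some orderings produce the same file, so treat it as a rough upper bound):

```python
ALPHABET = 100       # possible characters to insert
POSITIONS = 10_000   # insertion points in a ~10 kB file
MISSING = 10         # characters that were deleted

choices_per_char = ALPHABET * POSITIONS   # 10^6 ways to place one character
candidates = choices_per_char ** MISSING  # independent choice for each missing character

print(f"{candidates:.0e}")  # prints 1e+60
```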

jhgb · on March 29, 2022

> So I can only use the diff tool to compare two non-compiling versions of a source file if I provide a test suite for that file to the diff tool?

Not sure what this whole straw man is about. I definitely didn't suggest anything like that. Of course you can only compare two compiling versions of a source file using a test-suite-based heuristic. I thought this whole thing was about "heuristics that identify reasonable changes which fix the file" mentioned above? "Reasonable changes that DON'T fix the file" are clearly recognizable by NOT passing the test suite, just as if it was a human trying to make those changes and finding out that the change that he just did didn't in fact yield the desired results after running the test suite.

> With 100 possible characters, 10,000 positions to insert a character, and having to insert 10 characters, you are looking at something like 10^60 possible modifications.

If you're working with an AST, you're almost certainly not working with characters. That would be immensely wasteful. In fact working with an AST is pretty much the only way in which the set of changes is sufficiently reduced for almost any change to NOT be rejected outright. With character-level modifications, you're facing the problem that almost every edit will be outright rejected as early as at the stage of parsing.

danbruc · on March 30, 2022

We have obviously been talking past each other. My point was that a parser for a syntax tree based diff tool should probably be able to deal well with files with syntax errors, i.e. it must be able to fix syntax errors. And with fixing syntax errors I did not mean actually fixing the file but being able to construct a reasonable syntax tree even if some subtrees do not adhere to the grammar. Given an input like

  class foo
  {
    function bar() {
    function baz() { }
  }

it should be able to parse the file as if bar() was not missing the closing curly brace. If the parser just gave up or inserted the closing curly brace at the end

  class foo
  {
    function bar() {
    function baz() { }
  }
  }

making baz() a nested function inside of bar() the result would be worse than using a character-based diff algorithm. But I never intended to say anything about making code functionally correct, that is none of the business of a parser or diff algorithm.

db48x · on March 30, 2022

What you do is produce an AST where some nodes indicate syntax errors. This works best in languages where it is easy to resynchronize after an error, of course.
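
A minimal sketch of that idea, assuming a made-up toy language whose only statement is `var NAME = VALUE ;`. On a bad statement the parser records an ERROR node and resynchronizes by skipping past the next `;`. The `parse` function and node shapes are hypothetical, and real parsers (tree-sitter included) use far more sophisticated recovery than this.

```python
import re


def is_var_stmt(stmt):
    """True if the five tokens look like: var NAME = VALUE ;"""
    return (
        len(stmt) == 5
        and stmt[0] == "var"
        and stmt[1].isidentifier()
        and stmt[2] == "="
        and re.fullmatch(r"\w+", stmt[3]) is not None
        and stmt[4] == ";"
    )


def parse(src):
    """Return a flat list of ('var', name, value) and ('ERROR', tokens) nodes."""
    # Tokens: identifiers, numbers, or any other single non-space character.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", src)
    nodes, i = [], 0
    while i < len(tokens):
        stmt = tokens[i:i + 5]
        if is_var_stmt(stmt):
            nodes.append(("var", stmt[1], stmt[3]))
            i += 5
            continue
        # Syntax error: collect tokens up to the next ';' into an ERROR node,
        # then resume parsing right after it (the resynchronization point).
        end = i
        while end < len(tokens) and tokens[end] != ";":
            end += 1
        nodes.append(("ERROR", tokens[i:end]))
        i = end + 1
    return nodes


print(parse("var a = 1; var foo = bar baz; var c = 3;"))
# prints [('var', 'a', '1'), ('ERROR', ['var', 'foo', '=', 'bar', 'baz']), ('var', 'c', '3')]
```

The statement after the broken one still parses normally, which is the property a diff tool cares about. Languages with an unambiguous statement terminator make this easy; brace-nested code with a missing brace is much harder to resynchronize.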

NateEag · on March 29, 2022

This tool is built on tree-sitter (https://tree-sitter.github.io/tree-sitter/), so presumably it doesn't need to maintain parsers at all.

I've thought before this is how diffing should be done, and speculated that tree-sitter would make it more feasible.

At this point, whenever I think some language-aware tool ought to exist, my first thought is "Does the language server protocol or tree-sitter make this more feasible?"

danbruc · on March 29, 2022

Someone still has to build and maintain the parsers, you are just outsourcing this. And I added a bit to my comment, I tend to believe that parsing is the easy part, but that is admittedly more a gut feeling and not based on any real knowledge of that problem space.

NateEag · on March 29, 2022

That's certainly a good point.

Languages usually change slowly, though, so once a good baseline grammar is in place, maintenance is unlikely to be a huge load.

Furthermore, with tools like tree-sitter and the language server protocol, multiple communities benefit from their continued existence, so there's a bigger pool of contributors to the parser.

LudwigNagasena · on March 30, 2022

> And you don't need need just any parser, you need very robust ones that can deal with malformed files well.

But why? Shouldn’t the code you push into a repository be at least syntactically correct? And even if it is not, one can simply fallback to textual diff.

> Or, if you only pick a small set of supported languages, your diff tool will not work on most files or have to fall back to a structure-agnostic algorithm.

I don’t see how it is a blocker.

mekster · on March 29, 2022

> Because it is much easier, you don't have to build and maintain parsers for hundreds of languages.

Seems there's a good open market for such a lazy reason.

Wilfred · on March 29, 2022

It's really hard! :)

(1) Parsing an arbitrary language is hard. Without tree-sitter, difftastic would probably be a lisp-only tool. You also want a parser that preserves comments.

(2) Inputs may not be syntactically well formed.

(3) Efficiently comparing trees is extremely difficult (difftastic is O(N^2) in time and memory).

(4) Displaying tree diffs is equally difficult. Alignment is particularly challenging when your 'unchanged before' and 'unchanged after' are syntactically the same, but textually different.
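
To give a feel for where quadratic time and memory come from in point (3), here is a sketch of the textbook LCS-based diff over token sequences. It fills a full (N+1) x (M+1) table, so cost grows with the product of the input sizes. This is an assumption-laden illustration and not difftastic's algorithm; it also works on a flat token list, so it knows nothing about nesting.

```python
def lcs_table(a, b):
    """Fill the classic (len(a)+1) x (len(b)+1) longest-common-subsequence table."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def diff(a, b):
    """Walk the table backwards, tagging items ' ' (kept), '-' (removed), '+' (added)."""
    table = lcs_table(a, b)
    i, j, out = len(a), len(b), []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            out.append((" ", a[i - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and (i == 0 or table[i][j - 1] >= table[i - 1][j]):
            out.append(("+", b[j - 1]))
            j -= 1
        else:
            out.append(("-", a[i - 1]))
            i -= 1
    return out[::-1]


before = "( foo ( bar 1 ) )".split()
after = "( foo ( if ok ( bar 1 ) ) )".split()
for tag, token in diff(before, after):
    print(tag, token)
```

With 7 and 11 tokens the table has 8 x 12 cells; two 10,000-token files would need about 10^8. A flat diff like this can also pair up the wrong parentheses, which is part of why working on the tree is both more useful and harder.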

skywal_l · on March 29, 2022

Performance is one, I guess.

db48x · on March 29, 2022

Also there are a lot of languages out there, each with their own special and unique syntaxes.

eatonphil · on March 29, 2022

What happens when you have an invalid file/AST?

Wilfred · on March 30, 2022

tree-sitter inserts error nodes and gives you an AST for all inputs. It seems to work well in practice.

krick · on March 30, 2022

Understanding syntax would be really amazing for merges (not sure if it's even possible), but for diffs I don't immediately see why I should use that over simpler syntax-unaware tools. Highlighting the actual change in a string is important, and so is ignoring whitespace, but diffsofancy -w does it just fine. What else would I need? (Well, I guess the only use-case I can see from the demo is 2 compact changes in a single line, but… meh.)

On the other hand, even though my diffs are usually not that huge, sometimes they might be, and I don't want to switch tools every time that happens (I have just git alias and I don't even remember my exact config, nor should I care). So being slow is not great.

SuperSandro2000 · on March 30, 2022

Delta is a pager that does syntax highlighting, better diff highlighting and improved outputs of existing git commands. Highly recommended.

https://github.com/dandavison/delta

goombacloud · on March 30, 2022

I find ydiff more useful, especially for the side-by-side output: https://github.com/ymattw/ydiff

I'm using it like "git-ydiff-s" script in my PATH to use "git ydiff-s":

    #!/bin/sh
    git diff "$@" | ydiff -s --wrap --width=0

Installation is "sudo dnf install ydiff" or:

    curl -fsSL https://raw.github.com/ymattw/ydiff/master/ydiff.py > ~/bin/ydiff
    chmod +x ~/bin/ydiff  # (and change to python3)

rom1v · on March 29, 2022

It might be useful for reviewing merge/pull requests. But is there a way to display the diff "interleaved" instead of 2-columns side-by-side? (when executing `GIT_EXTERNAL_DIFF=difft git log -p --ext-diff` for example)

Wilfred · on March 29, 2022

There's a basic single-column 'inline' display available if you do `INLINE=y`, but it's not as mature as the side-by-side display yet.

DarkPlayer · on March 29, 2022

We are working on a code review tool which supports unified diffs with semantic diffing. If that sounds interesting for you, take a look at https://mergeboard.com

einpoklum · on March 29, 2022

Checked out the repository.

Build instructions? Nope.

Minimum system requirements? Nope. But if you check out Cargo.toml, you'll see it says it needs Rust 1.56.

My system has 1.48.0. And it's the latest Debian release! I don't see how a diff tool can expect you to have a bleeding-edge development environment. I mean, ok, you chose a new language - I can understand that; I won't demand that it build with just a C compiler and Make. But come on, this is not supposed to be just a toy for new systems.

Anyway, I still cloned it, tried to build with "cargo build", and got stuck with:

error: unexpected token: `include_str`

it couldn't even tell me "get Rust 1.56" :-(

rcthompson · on March 29, 2022

I wonder if it would be possible to do this in a one-column format. That would make it more useful in a lot of contexts where a super wide view isn't practical.

synergy20 · on March 29, 2022

I use meld and it seems syntax aware plus it can do merge with a click, how will difftastic diff in that regard?

berkes · on March 29, 2022

I use meld too. But afaics, meld 'syntax aware' is very different from difftastic.

Meld takes a diff, and applies syntax highlighting over the diffed files. It additionally highlights the changed characters in a line. Git diff, vimdiff and probably others, do this as well.

From the demo, I understand that Difftastic first applies syntax and then rebuilds the patch over that. Being aware of line wrapping, changes in nesting, moving codeblocks into functions and so on.

simulate-me · on March 29, 2022

The quick examples seem like they could all be solved using git diff's --ignore-all-space option.

sanity · on March 29, 2022

Unfortunately it's closed source, but https://www.semanticmerge.com/ has been around for a few years and works similarly, but can also merge.

oauea · on March 29, 2022

I just spent a few minutes on that site and I can't even figure out how to try it out, or their pricing, or anything other than some very superficial docs, really.

Is this just a pretty website, or is the software actually available anywhere?

Pet_Ant · on March 29, 2022

That page is just the technology primer. The tools are XDiff & XMerge:

https://www.plasticscm.com/pricing

Looks like no locally-run-binary/non-SaaS version. I was hoping it'd have a SublimeText-like model. I have no interest in trying to get my team to switch nor having to deal with the security team when it turns out I was using a free cloud account.

rcthompson · on March 29, 2022

What does it do for unsupported languages? Just fall back to "regular" diff?

Wilfred · on March 29, 2022

Yep! It does a conventional textual diff: run Myers' diff algorithm on lines, then word highlighting on changed lines.
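
A rough sketch of that two-stage shape (diff by lines first, then highlight words within changed lines). It uses Python's `difflib.SequenceMatcher`, which is a different matching algorithm from Myers', so this only illustrates the structure and is not difftastic's fallback code. The `~`, `[-word-]` and `{+word+}` markers are arbitrary choices for the example.

```python
import difflib


def highlight_words(old_line, new_line):
    """Mark removed words as [-w-] and added words as {+w+} within one line."""
    a, b = old_line.split(), new_line.split()  # note: original spacing is not preserved
    parts = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts += a[i1:i2]
        else:
            parts += [f"[-{w}-]" for w in a[i1:i2]]
            parts += [f"{{+{w}+}}" for w in b[j1:j2]]
    return " ".join(parts)


def line_then_word_diff(old, new):
    """Line-level diff; same-sized replaced runs get word-level highlighting."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    out = []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out += ["  " + line for line in old_lines[i1:i2]]
        elif tag == "replace" and i2 - i1 == j2 - j1:
            # Pair changed lines one-to-one and show what changed inside each.
            for o, n in zip(old_lines[i1:i2], new_lines[j1:j2]):
                out.append("~ " + highlight_words(o, n))
        else:
            out += ["- " + line for line in old_lines[i1:i2]]
            out += ["+ " + line for line in new_lines[j1:j2]]
    return out


print("\n".join(line_then_word_diff("x = 1\ny = 2\nz = 3", "x = 1\ny = 20\nz = 3")))
# prints:
#   x = 1
# ~ y = [-2-] {+20+}
#   z = 3
```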

yewenjie · on March 29, 2022

Does a `magit` plugin exist for Emacs users? The author of this package is also the author of a couple of popular Emacs packages but I did not see any mention of Emacs.

Myrmornis · on March 29, 2022

It won't be able to form the basis of a magit plugin because it does not target traditional diff format.

db48x · on March 30, 2022

To be more precise, Magit could easily display the output of difft, but Magit wants to be able to do more than that. You can navigate diffs by hunk and by file, you can collapse hunks and files, you can even select individual lines of the diff to stage or unstage them, apply or unapply them, etc. Strictly speaking that’s probably not impossible with difft, but because difft has the explicit goal of displaying diffs to humans rather than producing machine–parsable output, it won’t be easy. I still want it to happen though.

einpoklum · on March 29, 2022

The documentation says:

> Difftastic output is intended for human consumption

Why not separate the human-consumption part and the underlying parsing part? Or at least provide both in the same utility?

Wilfred · on March 29, 2022

The underlying parser is just tree-sitter, which is a reusable (and excellent) parsing library.

Difftastic then converts the tree-sitter parse tree to a simpler s-expression style format (see https://difftastic.wilfred.me.uk/parsing.html#simplified-syn...), and computes differences on that.

I'm just trying to clarify that I'm not generating conventional 'unified diff' patches, so I can provide a nicer interface (e.g. line numbers).
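
As a rough illustration of why comparing a simplified tree instead of raw text helps, here is a toy s-expression reader. It is an assumption for the sake of example: not tree-sitter, and not difftastic's actual simplified format. Two differently formatted inputs parse to the same nested list, so a tree comparison sees no change where a line-based diff would.

```python
import re


def read_sexp(src):
    """Parse one parenthesized expression into nested Python lists of atom strings.

    Toy reader: unbalanced input raises IndexError instead of recovering.
    """
    tokens = re.findall(r"[()]|[^\s()]+", src)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing ")"
            return items
        return tok  # an atom

    return read()


one_line = "(defun f (x) (+ x 1))"
reflowed = "(defun f (x)\n  (+ x\n     1))"

print(read_sexp(one_line))                         # prints ['defun', 'f', ['x'], ['+', 'x', '1']]
print(one_line == reflowed)                        # prints False
print(read_sexp(one_line) == read_sexp(reflowed))  # prints True
```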

hyperpallium2 · on March 29, 2022

BTW IIRC the tree version of Levenshtein distance has (proven) terrible complexity. But so does LCS, and diff itself performs great in practice, so maybe...

kid64 · on March 29, 2022

This is great. I previously used Code Compare by Devart for this purpose, but it has been abandoned without support for modern IDEs.

rednosehacker · on March 29, 2022

Any plan for the Scheme programming language ?

Wilfred · on March 29, 2022

I'd like to add it, but I haven't found any good tree-sitter parsers for Scheme.

pabs3 · on March 30, 2022

A diff tool for binary files:

https://diffoscope.org/

gwbas1c · on March 29, 2022

Now let's get a WASM build into GitHub. :)

steerablesafe · on March 30, 2022

What would be also cool is a syntax aware custom merge driver for git, but that's probably even harder.

fortran77 · on March 29, 2022

It supports Elixir and C#! Too bad it doesn't do Erlang and F#

It looks very handy though. I still do a lot of C and C++

sytelus · on March 29, 2022

Is there a VSCode extension for this?

soheil · on March 30, 2022

Yes! Why did it take so long for this to be invented? So obviously and amazingly useful.

challenger-derp · on March 29, 2022

First thing that came to mind is diffing python notebooks.

cycomanic · on March 29, 2022

For Jupyter Notebooks I highly recommend trying out jupytext, which converts Notebooks on the fly to a number of formats. It really has been a game changer for working with git and Notebooks for me. I essentially never want to preserve state of the notebooks anyway so converting just makes sense. The best thing is it is completely transparent, i.e. it generates a notebook file when you open the other file and saves to the file every time the notebook is saved. If you want to keep the state of the notebook you can always keep that file around as well.

gh02t · on March 29, 2022

Don't think this tool supports that, but there is https://nbdime.readthedocs.io/en/latest/

NaturalPhallacy · on March 29, 2022

You hit us up when we can build & install it.

dmarinus · on March 29, 2022

Looks nice! Now I only need patchtastic :-)

dotancohen · on March 29, 2022

Actually, the README addresses that!

  > Non-goals
  > Patching. Difftastic output is intended for human consumption, and it does
  > not generate patches that you can apply later. Use diff if you need a patch.

neves · on March 29, 2022

Now I want a 3 way merge version :-)

tzahifadida · on March 29, 2022

how do i install on macbook to try? Can you give some instructions in the getting started?

ryanianian · on March 29, 2022

    brew install rust
    cargo install difftastic

Worked for me without any problems.

sebdufbeau · on March 29, 2022

From https://difftastic.wilfred.me.uk/getting_started.html, it's installed via Cargo, so if you already have Cargo installed it's straightforward, otherwise you can install it via https://doc.rust-lang.org/cargo/getting-started/installation...

ducktective · on March 29, 2022

So how can one use this in vim?

thefaux · on March 29, 2022

The nine year old inside me can’t unsee the unfortunate choice of names used in the basic example :)