Show HN: Hiding code from Git diff (twistlock.com)
167 points by dimastopel on Dec 13, 2017 | 72 comments



The way I've noticed people (accidentally) "hiding code" is by making changes during the conflict resolution stage of a git merge.

This often happens after code review, and there's no oversight of the conflict resolutions.

If the merge commit does get looked at, the unrelated new code is hidden among potentially hundreds of lines of automated diff spread across many files.


Isn't this sort of solved by rebasing first instead of just merging? (Well, and some rules in place that tell people 'we only accept properly rebased branches for review'.) Then the conflicts are resolved in the branch and there's a clear view of what actually changed in the diff, before anything is effectively merged into the main branch. Or maybe I'm missing something here.
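
For concreteness, a rough sketch of that workflow, with hypothetical branch names:

  git checkout feature
  git fetch origin
  git rebase origin/master                     # conflicts get resolved here, inside the branch
  # the review diff against origin/master now shows only the feature's own changes
  git push --force-with-lease origin feature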


It's absolutely resolved that way.

Rebase should be the weapon of choice in all cases, except those where it would be prohibitively expensive.

People are too enamored with the SVN ways though, or have even made up rationalizations about fictional concepts such as "historical purity". So you have a ton of people merging instead, producing code repositories that look worse than the worst Perl code I've touched.


> People have even made up rationalizations about fictional concepts such as "historical purity".

The idea behind not rewriting history is not to gain some kind of academic pleasure from purity. It's to make sure that the commits in the history are the same ones you actually wrote and (hopefully) found to be working, so that when a bug makes you refer to a commit later, you (a) don't start scratching your head as to why it didn't do what you thought you saw it do yesterday, and (b) can actually refer to it to figure out what the correct code should have looked like.

If you don't see what I mean, here's an example. Say you have

  foo():
      foo1()
      foo2()
      foo4()
      foo5()
and you turned it into this:

  foo():
      foo1()
      foo2()
      foo4()
      foo5()
      foo6()
Now if someone has in the meantime done this:

  foo():
      foo1()
      foo2()
      if (!foo3()): return 0
      foo4()
      foo5()
then after a rebase you will end up with

  foo():
      foo1()
      foo2()
      if (!foo3()): return 0
      foo4()
      foo5()
      foo6()
which is not code that you ever had, and which is likely to fail because your assumption that foo6() will always be hit will no longer be valid.

With a rebase, not only will this be confusing when you look back (and make you blame the wrong person or lead you in the wrong direction), but it will also remove the option of rolling back to the latest commit of either of the two branches you combined. You'll only be able to revert to the "known-good" branch that was rebased onto, not the one that was rebased.

Oh, and did I mention you very likely just lost the working (pre-rebase) version of your code entirely? Have fun digging it out of your reflog if you notice quickly enough, and good luck finding a copy if you notice too late. You'll probably have deleted the branch by then, and chances are nobody else will have a copy lying around for no reason either. After some period of time you'll have no idea what the working code looked like, because the only version you have is the one that already incorporated the branch you rebased onto.
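
For the record, recovery is possible as long as the reflog entries still exist. A minimal sketch, with a hypothetical branch name:

  git reflog show feature                     # find the entry from just before the rebase rewrote the branch
  git branch feature-pre-rebase feature@{1}   # resurrect the old tip under a new name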


Life sucks sometimes, but it doesn't matter whether you merge or rebase in the scenario you describe: you need to test the combined version of the code. You can't just assume it still works; whoever comes second needs to shoulder that responsibility.


> Life sucks sometimes, but it doesn't matter whether you merge or rebase in the scenario you describe: you need to test the combined version of the code. You can't just assume it still works; whoever comes second needs to shoulder that responsibility.

Sure you do. But you're missing the fact that testing is in no way even remotely guaranteed to catch every error. If it did then projects that have tests would never have bugs, and that doesn't seem to even approximate the world we live in. (Why did I need to point this out though? Is this not obvious?)


Sorry for being a bit vague but I didn't mean to imply there were tests. I meant test in the sense of "do whatever you did to check correctness" in the first place. You change the code, you should do that process again, or at least some approximation of it. Automated tests make it pretty easy, manual testing makes it doable, and stepping through with a debugger or dumping print statements makes it painful and error-prone.

Many codebases don't have tests at all, most have terrible tests, and under no circumstances do I think they magically prevent all bugs. But I do think obnoxious coders who assume they can just merge and move on without checking the correctness have a special place in hell.


> I meant test in the sense of "do whatever you did to check correctness" in the first place.

Sure. I already responded to this:

>> But you're missing the fact that testing is in no way even remotely guaranteed to catch every error.


Obviously you have to test the combination but the merging allows an easier rollback.


Easier? How so? If a rebase of a branch isn't successful, you just reset the branch back to its pre-rebase tip (it's still in the reflog) and continue work as before.
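
A sketch of what that looks like in practice, assuming you're still on the rebased branch:

  git rebase --abort           # if the rebase is still in progress
  git reset --hard ORIG_HEAD   # if it already finished; rebase leaves ORIG_HEAD at the pre-rebase tip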


Your example is in fact a good example of why you should always rebase before merging. Only then does the reviewer have a chance to see the (potential) problem before it is merged.

I'm not sure I understand why rebasing would confuse blame here. Have you tried it?

If you rebase your "foo6" change on top of the "!foo3" change, the state of your branch will be exactly the state of your code base going forward. And that is the useful history to keep, not the order in which the actual keys were pressed.


But if the idea is that I want `foo6()` etc to always be executed, wouldn't I just handle the rebase from master by resolving the conflict as follows, instead?

  foo():
      foo1()
      foo2()
      foo4()
      foo5()
      foo6()
      if (!foo3()): return 0


> wouldn't I just handle the rebase from master by resolving the conflict

(a) No, because there is no conflict. It gets merged "cleanly" without you realizing it. (Let me emphasize this: a merge is NOT guaranteed to give a correct result just because it happens cleanly. This is just one example.)

(b) No, obviously in general doing foo3() at the end will not give the same result as doing it in the middle.


It would be nice if git kept both versions of the history around: one with work-in-progress commits, corrections from review, and merges; the other a clean, rebased history for everyday history browsing.


You can do something like that easily by keeping multiple branches around. I do that sometimes when I'm doing some major rebase/cleanup of commits spanning more than a couple of days, just so I can afterwards compare what I end up with to what I had originally.
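
A minimal sketch of that, with hypothetical branch names:

  git branch feature-original feature   # snapshot the branch before rewriting it
  git checkout feature
  git rebase -i master                  # clean up, squash, and reorder freely
  git diff feature-original feature     # afterwards, compare the result with what you had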


Conversely, people get too enamored of rebasing as a way to solve the problem of merging in lots of junky commits.

There are lots of ways to rewrite history and clean it up. Rebasing is just one of them, and it's one that can force you to spend more time resolving conflicts than you otherwise might have to (since you have to replay all of your commits and resolve any conflicts caused by each one individually).

There's no reason why rebasing is a better way to clean up your history than squash merging (or even -- carefully -- using git reset).


I already said that it can be too expensive, but thanks for playing. <3

Also, a squash merge is not a merge. It's just another form of rebase.

And if you mention reset as an alternative to cherry pick and squash you should also mention --soft.


My point was that many git users don't understand git all that well, and I think that encouraging everyone to rebase is a bad idea.

If you're going to promote a particular workflow as the one that "should be the weapon of choice," I think the workflow of merge-master-into-feature-branch, squash-merge-feature-branch-into-master is a more user friendly alternative, since then the cases where rebasing is expensive never even come up.
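
In other words, roughly this (hypothetical branch names):

  git checkout feature
  git merge master             # bring the branch up to date; resolve conflicts once, here

  git checkout master
  git merge --squash feature   # stage the whole feature as a single change
  git commit                   # one commit on master, with no history rewriting needed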

And yes, you're right that I was remiss not to mention the --soft part. The git reset workflow is one that I would definitely not encourage people to do if they don't understand git well.
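
For reference, a sketch of that git reset workflow (hypothetical branch names; the --soft flag keeps all the changes staged):

  git checkout feature
  git reset --soft $(git merge-base master feature)   # move the branch pointer back to the fork point;
                                                      # all of the work stays staged
  git commit -m "Add feature as a single commit"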


> Rebase should be the weapon of choice [....] People are too enamored with the SVN ways though

I'd expect SVN users to be quite comfortable with rebasing. `git pull --rebase` is effectively the same thing as `svn update`. You just couldn't commit unfinished work when using SVN, so the changes were loose in the working tree, rather than stored in local commits.
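
Roughly, assuming the usual origin/master names:

  git pull --rebase origin master   # fetch, then replay your local commits on top: the svn update analogue
  git pull origin master            # fetch, then create a merge commit: the git default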


I don't think you need to use rebase to solve this issue at all. All you need to do is make sure that every commit on master is on your feature branch, which you can still do with merging. Something like:

    git checkout my-branch
    git merge master
Whether you use rebase or merge is a completely orthogonal issue.


Maybe I'm misunderstanding here. Who is going to run `git merge master`? If it's not you, the maintainer, then you need to trust that the submitter didn't pull any funny business with that command. With rebasing, you can just confirm that there are no merge commits.


After too many experiences like that, I make merge commits a required part of code review. If a branch needs conflict resolution in order to merge, it needs that merge in that branch, and as part of the code review of that branch. (Some of the best conflict resolutions are new code written in light of the changes in the other branch, which of course should be reviewed.)

If a merge commit diff is showing too much information, that's more often than not a tooling problem. Most tools provide ways to filter out the "automated stuff" and focus only on the conflict resolutions. If they don't, they are deficient (and if there are too many conflict resolutions to easily review, that may be a sign of a larger process problem). (GitHub has gotten much better at merge commit views in the last couple of years, as one anecdotal data point.)


> After too many experiences like that, I make merge commits a required part of code review.

How can you do this practically though? In the turnaround time to do the code review and make the changes, a new conflict might have been introduced into the branch being targeted by the merge.

I think merging e.g. the master branch into your branch before code review is a good idea though, as when the review is complete there will be fewer or no conflicts when merging into master.


That's also a tooling issue I believe. GitHub, again, as easy example, has gotten pretty good at "these are the commits since your last review" incremental reviews, so if you complete a code review and need conflicts to be fixed before it can merge, you expect a merge commit to be made in that branch and should just need to review that merge commit.

I've not recently seen an issue with a PR cycling through a lot of merge commits to keep up with the branch it is targeting, but in almost every case where I've seen that there was some other process decision in the way (two people working on the exact same feature simultaneously) and at that point it was an architectural, "political", or "social" problem to fix first, outside of the code review process. Your mileage may vary, of course.


What I personally do is commit the merge as is, with all the <<< === >>> conflict markers.

Then, in the next couple of commits, fix each conflict via a normal commit that mixes and matches "mine" and "theirs" as needed and deletes the markers.

Each resolution commit is independent of the giant merge commit, and can be separately diff'ed and code reviewed via the github PR or any other tool.

It's also easier IMO to make the complicated resolution you want, or to reset back to a previous resolution attempt, without messing up the entire merge.

The drawback is that some commits won't compile, but compilable mid-branch commits are not something that I view as a goal.
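
For concreteness, a rough sketch of that workflow (branch and file names are hypothetical):

  git checkout feature
  git merge dev                 # conflicts appear
  git add -A                    # stage the files with the <<< === >>> markers still in them
  git commit -m "Merge dev; conflict markers left in place"

  # then, per conflict: edit the file, delete the markers, mix "mine" and "theirs" as needed
  git commit -am "Resolve conflict in parser.c in favour of dev's error handling"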


Non-compiling commits make bisecting harder, when the goal is to find the commit that broke the build or a specific test. IMHO having to skip over non-compiling merge commits only makes bisecting take more time, and it may also "pollute" the test pipelines if all commits are going to be built. Additionally, merging and fixing the merge in separate commits makes it much harder to revert a merge later, in case it turns out that the merge introduced something unwanted.

So I will always recommend doing the merge and fixing the merge in the same commit. Ideally this produces a squash commit that can easily be reverted later when needed. This makes the merge process more time-consuming, but it can help keep the builds more stable and makes it easier to track which commit (merge) introduced a particular problem.


> makes it much harder to revert a merge later

I don't think this would apply, as I'd be merging from `dev` > `feature`, resolving conflicts, and then doing a single clean no-conflict merge back from `feature` > `dev`. That's the only commit you'd need to revert to undo the merge.

(or on a new, third, `merge` branch, if you don't want to pollute the `feature` branch with commits from `dev`)

I admit the non-compiling commits might be a problem in some workflows, but `git bisect skip` is the solution to the only problem I've ever had with it.
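
e.g., a sketch, where the build/test commands and the known-good tag are placeholders:

  git bisect start HEAD v1.0                             # bad revision first, then a known-good one
  git bisect run sh -c 'make || exit 125; ./run-tests'   # exit code 125 tells bisect to skip
                                                         # revisions that don't build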


It's kind of a problem if the compile/build step takes rather long. Then you want to avoid having any additional non-compiling revision to check when bisecting.

But I agree in general, if the compile/build turn-around times allow for it, then a single no-conflict squash merge back into devel is definitely a very good approach to have clean and compiling commits, and for identifying the commits/merges that break the builds.


I upvoted you for the sheer uniqueness of that workflow (almost a decade using git, never seen anything like that), but I think if I ever had to work on a git tree that did that I would throw a fit ...

There is so much wrong with that approach. Kills bisects, kills diffs, kills signed commits, kills merge automation.. Yikes.


I forgot to mention that the conflict resolution commits would be occurring on the feature branch (or a 3rd branch), not on the main dev branch, if that makes it easier to grok.

Anything merged into `dev` would be a single clean no-conflict merge commit that compiles. That eliminates any merge automation issues I've encountered, although other systems might require every commit to compile.

`git bisect skip` fixes bisect, just like any other non-compilable commit on a branch.

You say it "kills diffs", and that may be true for some aspects of a command-line diff. But I find the main place my team is "diffing" is via the web view of a Pull Request. This workflow makes it possible to see the diff via the PR. Each commit shows you exactly how the conflict was resolved. Without this workflow the conflict resolutions are essentially hidden from the web view of the PR and leads to the "hidden code" issue I mentioned in the OP. (Although some newer PR views may now make this possible.)


That sounds horrible. Squash -> rebase -> fast-forward merge is much easier to reason about in terms of the commits/changes.

Git bisect is really important in some cases, and that needs all/most commits to be compilable, especially big merge commits.

Unless you have a special reason to preserve all commits as they were made (eg. signed commits), in that case go with the linux model of small single responsibility merges that preferably have no conflicts...


GitLab makes it really easy to see what changed between commits while reviewing a merge request. Parts of the diff that are identical to the upstream branch are automatically ignored, and you can view a diff between the last two pushes.

https://docs.gitlab.com/ce/user/project/merge_requests/versi...

Edit: GitHub has similar functionality, but it doesn't know how to diff force pushes, so you would end up having to review the full diff again to ensure integrity.


> Edit: GitHub has similar functionality, but it doesn't know how to diff force pushes, so you would end up having to review the full diff again to ensure integrity.

I realize this is a religious war I'm stepping into, but I fall on the side of "if you are doing a force push of a branch already in code review you are doing something wrong". Once commits are out in the open and in review between multiple developers, that is shared code history that encodes important information about the code review process itself. (I make obvious exceptions for things like accidental PII disclosure or similar bone-headed mistakes, but the longer a branch has been in code review, the less willing I am to allow it to be force pushed.)

Of course, to each their own and your mileage may vary, but that comment about diffing force pushes in a merge/pull request sent a shiver down my spine, regardless of whether or not the tool supports it.


We often have to rebase our feature branches before they can be merged, so changing the history is common and force pushes are inevitable. As a rule I don't base any work on top of unmerged feature branches, because there is no expectation of stability.

I like the commit history to form a narrative of meaningful changes. I don't think commit messages about fixing errors that were introduced in the feature branch's previous commits belong in the history.


I like the ability to drill down into the details of why a change was made, the narrative of a PR and the back and forth flow can be hugely useful. The git DAG can be a very powerful way to encode both the high level narrative and the low level details.

The `--first-parent` option to git log and git annotate lets you stick to that high level narrative of just PRs/features that changed. (More tools should offer that as an option, it's probably my biggest request for GitHub's commit view, personally.)
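
e.g. (the file name is just a placeholder):

  git log --first-parent --oneline master   # only the merge/PR commits on master, not their contents
  git blame --first-parent main.c           # attribute lines to the merge that brought them into master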

It's a DAG for a reason, and it's powerful to take advantage of that, spending a lot of work (and potentially building a lot of footguns) forcing it to be a straight line seems counter-productive to me.


Phabricator is really nice here. The history of the code review process is maintained by Phab itself, you can amend all you want against an inflight code review, and upon landing it all becomes a single commit in master with a link to the code review in the commit message.


You’d hate our team—we force push to master (not just feature branches) all the time. It seems like anarchy at first but it’s not really that hard to deal with. Our situation is a bit unique though, and I doubt it would scale well beyond a small team.


Yes, that sounds like you are trying to maximize footguns to me. I do hope for your sake that no one manages to trigger one. (…and I'm personally very glad at this point with GitHub supporting branch restrictions and the ability to disable force pushes for master.)


That's why I say "you have conflicts" and have them resolve the conflict before I do the review. GitHub, at least, correctly diffs against master after the conflicts are resolved so you don't see any of the merged code (if it's the same as on the master branch), only the actual changes.

Doesn't that solve the problem?


That is scary. I think it would be a nifty feature for a code review tool to show only the conflict resolution for merge commits. (Although I suppose it would need knowledge of exactly which automated merge method/options were used to create the pre-CR merge, for any given commit with multiple parents.)


The issue is with terminal handling of escape sequences, rather than with git.

If you are concerned, you can use a gui diff tool, or you can pipe it through a filter that removes escape codes:

  seal:~/work/gitPocDiff$ cat > uncolor
  #!/usr/bin/env perl
  ## uncolor — remove terminal escape sequences such as color changes
  while (<>) {
      s/ \e[ #%()*+\-.\/]. |
         (?:\e\[|\x9b) [ -?]* [@-~] | # CSI ... Cmd
         (?:\e\]|\x9d) .*? (?:\e\\|[\a\x9c]) | # OSC ... (ST|BEL)
         (?:\e[P^_]|[\x90\x9e\x9f]) .*? (?:\e\\|\x9c) | # (DCS|PM|APC) ... ST
         \e.|[\x80-\x9f] //xg;
      print;
  }
  seal:~/work/gitPocDiff$ chmod a+x uncolor
  seal:~/work/gitPocDiff$ git diff eeeb ebc3 | ./uncolor
  diff --git a/main.c b/main.c
  index 6daa9b2..0321ff2 100644
  --- a/main.c
  +++ b/main.c
  @@ -3,4 +3,9 @@
   int main()
   {
       printf("I'm just a stub!\n");
  +    /*
  +     * Must always return a value Hidden > */printf("Insert bad backdoor here...!\n");/*
  +     * TODO: Return result
  +     */
  +    return 0;
   }
Credit to: gilles https://unix.stackexchange.com/a/14707


You can also just pipe it to cat -e; it works the same. `git diff | cat -e` makes all escape sequences visible.


There's always a utility... Thanks!


If the output of a git command was raw text sent to stdout, like cat, then I would argue that escape sequences should be ignored and passed through unmolested.

But the output of git tools is colored (if enabled) using terminal escape sequences. Given that, the output should fully comply with the appropriate terminal escape sequences and should filter the ones the author has discovered.


Unix pipes lack any ability to say "this is the kind of data that I'm pushing you." They push raw, unidentified bytes. If the downstream program had some idea what format the data was in, it would know how to output it.

Imagine an alternate universe where a pipe was a stream of bytes and a MIME type. If a terminal gets data written to it from a program in application/terminal.vt100, then it knows it should process those escape sequences. If it get text/plain; charset=utf-8, it knows it should not, and can take a different action.

I think such a universe would make for more pleasant piping, too. Imagine that every interactive command on a terminal ended with an imaginary |show command; its job is to read the mimetype and display, on the terminal, that data. A command that emitted CSV data could then automatically render in the terminal as an actual table. A program that emitted a PNG could render an ASCII art version of that image (or, since we're in an imaginary universe, imagine our terminal has escape sequences for images – which actually exist in some terminals today – it could emit the actual image!).

Essentially, the program emits the data in whatever form it natively speaks, and that implicit |show parses it into something appropriate for the terminal. (That way, the program can still be used in pipelines, such as make_a_png | resize_it.) If you want your raw bytes, then you can imagine

  make_a_png | force_type 'application/octet-stream'
and then the final, implicit |show would perhaps output a nice hexdump for you. (Since emitting raw bytes to a terminal never makes sense.)

Now, I'm a bit fuzzy on exactly how the pipeline being executed, the shell, and the terminal all interact, and I'm nearly certain such a change would require low-level, POSIX-breaking changes. But it was a dream I had, and I think it might be a better model than what we work with presently.

(In the article's case, though, git log/diff both emit terminal sequences, so even in my imaginary world, they'd need to know that they should escape them. It mostly works for simpler types, but git rm could conceptually emit just text, since I don't think it ever colors.)


> Imagine an alternate universe where a pipe was a stream of bytes and a MIME type.

Isn't this PowerShell?

> I'm nearly certain such a change would require low-level, POSIX-breaking changes.

That seems to be the trade-off: breaking POSIX compatibility for nicer abstractions.


> and I'm nearly certain such a change would require low-level, POSIX-breaking changes.

Well, as long as you were happy with it being opt-in (ie, every currently existing program and terminal defaulting to just ignoring the content type as it does now), I imagine you could get 99% of the way there by adding two new command types to the fcntl(2) syscall:

  F_SETCTYPE
    Set the content type for the data written to this
    descriptor to the value pointed to by the third
    argument, which must be a pointer to struct fctype.

  F_GETCTYPE
    Get the content type for the data read from this
    descriptor and store it into structure pointed to
    by the third argument, which must be a pointer to
    struct fctype.
struct fctype would just contain a character array.

You'd implement it so that the value set by F_SETCTYPE on the writing end of a pipe or named pipe is returned by F_GETCTYPE on the reading end, and for pseudo-terminals the value set by F_SETCTYPE on the slave side is returned by F_GETCTYPE on the master side.

You probably wouldn't bother supporting it for sockets or disk files (though files could store it in a POSIX extended attribute if you really wanted to).

The writing side would have to set the content type before calling write(), and the reading side would have to get the content type after read() returns.


This is one of the problems that I've been attempting to solve with Strukt[1]. Its operations work by processing a stream of objects (which have named keys, and values with real types), rather than bytes.

[1]: https://freerobotcollective.com

You're right that shoving raw bytes to a terminal doesn't make sense. I display binary data as a picture of the bits themselves. After a little while, it becomes surprisingly informative. You can spot patterns like "ASCII(ish) text" or "all 0's".

You're right, too, that once you start tugging on one corner, the whole thing starts to come apart. That's why I didn't even bother trying to make it a "Unix shell". There are just too many issues with trying to remain backwards compatible that far back. When the operations aren't programs, for example, I can optimize between them.

Unfortunately, perhaps, "people who spend money on software" and "people who are looking for a Unix shell" are pretty much disjoint sets, so I'm initially going after markets like EDA and ETL.


Your website, BTW, serves itself as ISO-8859-1¹, but the actual data is UTF-8. The result is that any non-ASCII is mojibake. (Such as the degree/minute/second symbols in the location example, or the second text example).

¹you serve a Content-Type of text/html, with no charset, and you have no <meta> charset in the HTML itself, so this is the default.

> I display binary data as a picture of the bits themselves.

And after looking at that on the website, that is a very interesting approach.


> Your website, BTW, serves itself as ISO-8859-1

Thanks! I didn't realize that, since it worked fine in all of my browsers, and nobody has said anything about it.

It's a simple HTML page (from some Jekyll templates), served up through an AWS S3 bucket, and I just learned that while it auto-detects mime-types, it doesn't do this for encodings. Fortunately, there is a way to specify it by hand [1].

[1]: https://github.com/aws/aws-cli/issues/1346#issuecomment-3332...

Should be fixed now.

> And after looking at that on the website, that is a very interesting approach.

I've been surprised by the response to this. It's the 5th or 6th design I tried, and the first one I didn't totally hate, but I've had a user tell me it's super cool and I should feature it more prominently.

One of my philosophies is "When in doubt, show the user their data".


PowerShell does something similar to what you're describing, but arguably better.

With PowerShell, instead of passing text strings between programs, you actually pass CLR objects between programs.

PS does a lot of things very nicely; it's a great tool for sysadmins, with more safety than Bash. The verbose syntax (e.g. Get-ChildItem vs ls) has been a big stumbling block the few times I've used PS, but it makes sense from the point of view of writing a script to be executed more than once: it's a much more readable language than Bash.


I'm a Linux user (my Windows days predated PowerShell), so I'm unfamiliar here. I think the biggest concerns such a setup gives me would be:

1. is whatever encoding the CLR uses for IPC efficient enough for all purposes?

2. can you stream objects / push multiple objects? (some programs, such as tar, generate potentially massive amounts of output that you cannot buffer)

Otherwise, that actually sounds a good deal stronger than my initial suggestion, and potentially a lot more powerful. (I think I just wonder about its generality & performance.)


1. It's mostly in-process. 2. Yes, there's a "pipeline" concept.

It's quite a... quirky language, but you can[2] do pretty much anything, down to calling C functions, etc.

PS. PowerShell is on Linux now too[1]

[1] https://azure.microsoft.com/en-us/blog/powershell-is-open-so...

[2] You probably shouldn't.


This idea was explored a long time ago! https://acko.net/blog/on-termkit/

I personally think something like Jupyter where the "rich display" is actually a separate channel makes more sense than pipes for rich content.


It's quite a lot older than that! http://xml.coverpages.org/linuxml-19990226.html is the first one I've heard of, but I wouldn't be surprised if there are predecessors decades older still.

The problem is still largely the effort of rewriting all (or enough) userspace tools to actually use the type system to make it worthwhile, or the even less appealing approach of maintaining parsers/shims/wrappers so that existing tools work with it.


I actually reported this same issue but under the context of a bug rather than as a security issue, exactly 2 months ago: https://public-inbox.org/git/CACcTrKctqAWeWWrc9Q+Y7ewXGc_o+u...

Same response, “it’s the pager’s job,” in that case.


I mentioned the problem on oss-security in 2016:

http://openwall.com/lists/oss-security/2016/04/21/9

(I think my claim that git escapes some control characters is incorrect; it's `less` that does it.)

> Same response, “it’s the pager’s job,” in that case.

It can't be the pager's job, because the pager has no way to distinguish the escapes generated by git (to make things colorful) from malicious escapes.
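
For a concrete sense of what a "malicious escape" can look like, a minimal sketch (SGR 8 is the "conceal" attribute, honoured by only some terminals; this is not necessarily the same sequence the article's PoC uses):

  printf 'looks harmless\033[8m; malicious_payload()\033[28m\n'           # hidden on terminals that honour conceal
  printf 'looks harmless\033[8m; malicious_payload()\033[28m\n' | cat -v  # cat -v makes the escape sequences visible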


This is tenuous at best. iTerm2, at least, doesn't support the "conceal" color (^[[8m), and any other color would rely on the "victim's" color scheme to work properly. Git is safely removing sequences that could actually cause problems, like OSC. This technique also wouldn't stand up to review on GitHub or any other online code review platform.


(Tiny grey sans-serif body text means you hate your readers' eyes.)


One mitigation for this is to use `col -bp` as the git pager, but it seems clumsy. However, for completeness I post it here, with an example of how this makes the backdoor visible.

Reproduction instructions:

  # clone the repo and cd into it
  git clone https://github.com/twistlock/gitPocDiff
  cd gitPocDiff

  # show the diff (this includes the backdoor, but it won't be visible)
  git show ebc3f506a7ec8278e1a3ad4108612b66d10b41ca

  # expose the backdoor, using col -bp as the pager
  git show ebc3f506a7ec8278e1a3ad4108612b66d10b41ca | col -bp


No, "col -bp" is not bulletproof. The attacker could still use backspace characters to hide malicious content.


Unfortunately I had to stop reading due to their text-hiding scheme: minuscule light-gray text on a white background.


Firefox's Reader View reduces the problem, and with some custom CSS, solves it completely (see https://news.ycombinator.com/item?id=15080363 ). I imagine other browsers include a similar solution.


This will be fixed. Thanks for the feedback.


What about code being 'deleted' that git diff does not catch?

Is there a recommended external diff tool that ameliorates these issues?


Maybe cat -v?

    $ git diff | cat -v
However it's considered harmful: http://harmful.cat-v.org/cat-v/ ;)


or

  git diff > tmp.diff
and open with a text editor?


  git diff | less -U
But then you lose all the colors.


Interesting to see how the standardization of escape sequences by ANSI enables this kind of code-obfuscation vulnerability.


We need a 'secure escape' sequence that disables all subsequent escapes until rescinded!

And it can change the background colour of the affected lines to a shade unrepresentable by unprivileged sequences.

Mostly tongue-in-cheek, but... one of these years I'm going to track down what the heck the 'PRIVACY MESSAGE' sequence is actually for. ECMA-48 is infuriatingly vague, and nothing seems to actually use or support it.


magit ftw: https://i.imgur.com/a8Mvbkf.png

Though I'm sure Emacs has loads of other vulnerabilities :)


Some pitiful Vimming right there, from holding down `h` for 30 seconds to get to the beginning of the line, to using :wq instead of :x.



