GNU grep 2.12 changes behavior of recursion options, breaks existing scripts (debian.org)
119 points by cschramm on July 26, 2012 | 77 comments



Here's the change and it's justification:

http://git.savannah.gnu.org/cgit/grep.git/commit/?id=c6e3ea6...

Change -r to follow only command-line symlinks, and by default to read only devices named on the command line. This is a simple way to get a more-useful behavior when searching random directories; the idea is to use 'find' if you want something fancy. -R acts as before and gets a new alias --dereference-recursive.

Personally I think breaking compatibility for this change was a poor decision.


That link doesn't work. I thought at first that it was incorrectly copied, but you can see it here:

http://git.savannah.gnu.org/cgit/grep.git/log/?qt=grep&q...

And clicking on it says 'No repositories found'. Anybody else have this problem?


Alternate link:

http://repo.or.cz/w/grep.git/commit/c6e3ea61d9f08aa0128a0eb1...

(Yea for the "D" in DVCS.)

Also, the corresponding bug which has additional discussion:

http://savannah.gnu.org/bugs/?17623


Just FYI, "Yea" and "Yay" are not synonymous.


Doh. I see I mistyped "its" up above as well. (Or maybe iOS did that for me.)


Yes, seems like Savannah can't completely handle HN ;)


The error message doesn't suggest this is a load issue. Also, wouldn't the entire app go down rather than a single link?


If it's neither a load issue nor some intended limitation to prevent load issues, somebody pulled it on purpose... which I actually counted as not being able to handle HN. ;)


The commit details are back now. :)


> the idea is to use 'find' if you want something fancy

I don't get it, can someone explain? It also appears in the ML thread linked elsewhere, and I don't understand it there either. (I understand the reasons for the change in behaviour, just not this specific line.)


find is essentially the command line tool for traversing the file system hierarchy recursively. It's got a million options, e.g. whether to follow symlinks, depth, file/directory/device/.., user, group, change time and many more. When you use grep -r, it only does the most common thing which is to just go over all the files. If you want something more "fancy", you can use find, e.g.:

  $ find . -name '*.c' -type f -exec grep printf '{}' +
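
A hedged sketch of what "something fancy" might look like when the symlink policy matters: find's -H and -L flags (both in POSIX) pick the policy explicitly, so the result does not depend on which grep version is installed. The helper names below are made up for illustration, and the mapping to grep's -r and -R is only approximate.

```shell
# Sketch only: make symlink handling explicit with find instead of
# relying on what grep -r / -R happen to do in a given version.
# Both function names are hypothetical.

# Follow symlinks named on the command line only (find -H).
# Roughly comparable to the new -r described in the commit message.
search_cmdline_links() {
    pattern=$1
    dir=$2
    find -H "$dir" -type f -exec grep -l -- "$pattern" {} +
}

# Follow every symlink met during traversal (find -L).
# Roughly comparable to -R / --dereference-recursive.
search_all_links() {
    pattern=$1
    dir=$2
    find -L "$dir" -type f -exec grep -l -- "$pattern" {} +
}
```

Usage would be along the lines of `search_cmdline_links printf src/`, which lists the regular files under src/ that contain "printf" without wandering through symlinked directories.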



The biggest problem I see is: who the heck would expect a .12 release to break compatibility in an ancient and solid piece of software?


Why is grep being modified at all? It is clearly not broken.


To quote a recent Coding Horror post, its bugs are now Common Law Features, and must be supported.


Who thought it might be a good idea to break -r (most likely the most often used option) and use the old behaviour for -R?

Wouldn't it be better to at least let -r behave as it always has and change -R?

Anyone ever used -R here? :)


-R is the POSIX-standard flag for recursive grep, so that would be worse to change imo. It's also the flag used for recursive grep by the BSDs (some of which do support '-r', but only as a deprecated historical option... OpenBSD calls it "strongly discouraged").


Thanks for these important missing bits of information. In light of these, I think the change sounds much more reasonable.

/me updates rgrep alias to use -R


Currently -r and -R are essentially the same flags, correct? And this new change would make -R and -r two different operations?


Yeah, this sounds really misguided. If they change/add functionality, why not move that to a new option? Those who want the new way can use the new feature, and scripts won't act differently depending on which version is installed.

New features should get new options; they should not move old features to new flags and put the new features on the old flags. Nuts.


Increasing the number of options and switches has downsides: it makes the software more complex to use.


Hehe, you know what makes the software more complex to use? Having to grep in the output of "grep --version" before deciding which switch to pass to grep.. :)
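
For what it's worth, a script could probe the behaviour directly rather than parse version strings. This is a minimal sketch, assuming `mktemp -d` is available (it is not in POSIX) and that the installed grep accepts -r at all; the function name is hypothetical.

```shell
# Sketch only: detect whether the installed grep's -r follows symlinks
# found during recursion, instead of parsing `grep --version`.
# grep_r_follows_symlinks is a made-up helper name.
grep_r_follows_symlinks() {
    probe=$(mktemp -d) || return 2
    mkdir "$probe/tree" "$probe/outside"
    echo needle > "$probe/outside/file"
    # The only route from tree/ to the matching file is this symlink.
    ln -s "$probe/outside" "$probe/tree/link"
    if grep -rq needle "$probe/tree" 2>/dev/null; then
        rc=0    # -r followed the symlink (the old behaviour)
    else
        rc=1    # -r skipped it, or this grep has no -r at all
    fi
    rm -rf "$probe"
    return "$rc"
}
```

A caller might then write `if grep_r_follows_symlinks; then ...; fi`, though always passing -R, or using find, avoids the probe entirely.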


You don't need to do that, just always use -R :)


We'll get right on inventing that time machine so everyone can go back in time to the introduction of the -r flag and warn themselves ;).


And note that it will make your code more compatible with other, POSIX-compliant, versions of grep.


When I read the first post in the chain, I thought "well, it has to be a bug". Then I read the second post... wow, it was intentional. I've used -r and -R both in the same script to do the same thing, just depending on how I was feeling that day. Now I'm afraid to update.

I don't think distros should be afraid to break compatibility with main if main is making a change that makes no sense. Breaking essential and classic *nix functions defeats the purpose of CLI utilities.


> I don't think distros should be afraid to break compatibility with main if main is making a change that makes no sense.

Hi, I don't agree. Different versions of a tool showing different behaviour for the same option is enough for me; the same version of the tool showing different behaviour for the same option _depending on the distribution_ ... I feel that's too much!


Well, as the link states, this already breaks compatibility with BSD grep. Combine that with breaking compatibility with previous versions, breaking previous scripts, and breaking based on distro depending on the speed at which the package maintainers upgrade (if ever)... it's best IMO to just leave it the way it always was.

It's going to be hard enough switching between a machine that only gets critical updates and a machine with the same distro but getting all updates. Grep has been around since 1973; is there any serious Unix scripter who feels it still needs more features?


Yeah, for habitual reasons I usually use -R. No idea why.


In POSIX -R is the recursion flag for ls, cp, rm, etc. So if you want to recurse, -R is probably a safer bet than -r. (ls -r just lists in reverse.)


Yeah, that may be it.. I'm wondering why they didn't just add a new option (e.g. --no-symlinks) or something


Annoying exception: scp only accepts -r as recursion flag, not -R.


Same here. I've been conditioned to use -R


Good point. Also, since you reminded me, chmod, chown, chgrp.


that's probably it


ever work on a non-gnu box? (old solaris, *bsd, stock os x, etc.) there's a decent chance they don't have -r at all.


*BSDs have it (=> OS X probably too).

Solaris, HP-UX and OpenServer do not.

AIX is weird:

> -r Searches directories recursively. By default, links to directories are followed.

> -R Searches directories recursively. By default, links to directories are not followed.


Isn't the AIX behavior the one POSIX specifies?


Looking at MattJ100's link (http://pubs.opengroup.org/onlinepubs/009695399/utilities/gre...), it seems like POSIX grep actually does not have any recursion.

Hence, only lubutu's comment ("-R is the recursion flag for ls, cp, rm, etc") is true, while _delirium ("-R is the POSIX-standard flag for recursive grep") is wrong, which means AIX is free to do whatever it wants with both the -r and the -R flags.


That would be kind of perverse given how cp behaves for -r and -R


I've always used -r before, since it saves a key press. ;)

But I started using -R today, in the middle of a WTF event while inspecting some Python files, digging up a Debian bug report, inspecting the upstream change and posting this to HN.

I still think it's not the best idea for Debian (or any other system) to break compatibility with upstream, since this will lead to different behaviors on different systems, not only depending on version and vendor (GNU or *BSD). But of course, it could also lead to the change being reverted, which would be welcome (guess nobody relies on the new behavior yet).

The POSIX argument is definitely valid, but nevertheless GNU's grep has always supported -r and BSD versions do as well (although OpenBSD's man page reads "This implementation supports those options; however, their use is strongly discouraged."), so it just unnecessarily breaks existing stuff.


I've always used -R, presumably because it's consistent with other commands like ls, rm, and cp (and consequently get annoyed by tools like scp that only support -r, and zip, which uses -R for something else entirely); I wasn't even aware that -r was supported until I saw this.


I've only used -R, because of its posixness. Finger memory from adminning Solaris.

Not to say that this is a good or bad change.

-R matches the -R in chown/chgrp

They should really introduce -r for those two to NOT follow symlinks. Following symlinks for those is bad.


Another thought: Since rgrep is an alias for grep -r and grep -r has changed, the complete rgrep tool has changed and this can _not_ be fixed by using the "right" switch. Hence rgrep is useless for scripting now.


This is a serious question: Why would you want to use plain old "grep" instead of "ack"[1]? Of course, other than the fact that it's on all machines. Why would you use it instead of "ack" on your own machines? That's an honest question, I'm not starting a flamewar... The highlighting and filename/line# by default is the killer feature for me.

Edit: Come on. Downvotes for this? Honestly... It's HN, not StackOverflow. You don't mark questions as "off-topic" unless they're trolling...

[1]: http://betterthangrep.com


The main reason to use grep is portability/availability. While ack is a modern improvement on grep (yes, I'm an ack user and fan too), you can't simply depend on it in portable scripts and such.

grep is defined for POSIX after all: http://pubs.opengroup.org/onlinepubs/009695399/utilities/gre...

I also suspect (without looking) that there are obscure (but occasionally useful) grep features that ack doesn't support.


That's a fine recommendation for machines that you solely work on, or have complete control over. However, it's good to be able to work with the standard toolset if you frequently work on a variety of remote machines (where it's quite common that you cannot install such things, due to permissions or policies, and want to get started working before attempting to download/install a bunch of custom binaries).

Edit: MattJ100 made a more clear point while I was responding.


Thanks for the response. However, a nice thing about ack is that it's not a binary (necessarily) - it's a perl script and can be used without sysadmin permissions: http://betterthangrep.com/install/

The other points are quite understandable and correct though. It's always best to at least be familiar with standard tools, even if you want to use a slightly different version for yourself.


I think we're probably mostly in agreement, even if I wasn't clear about it. In my line of work, there have been times where I was not permitted to use external scripts or binaries (highly sensitive environments). However, that one small example doesn't mean ack should be disregarded!


ack-grep is a good tool to grep through files. grep is more than that. When I need to grep in a pipeline, I probably don't want ack-grep's bells and whistles.

One example of a useful feature of grep: "grep --line-buffered" to grep with no buffering in a pipeline.
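
A minimal sketch of that use, assuming a grep that accepts --line-buffered (GNU and BSD grep do; it is not a POSIX option). The function name and the "ERROR" pattern are placeholders.

```shell
# Sketch only: a filter stage that flushes each matching line immediately.
# Without --line-buffered, grep writing into a pipe may hold matches back
# in a block buffer, so later stages see them late.
# filter_errors is a hypothetical helper name.
filter_errors() {
    grep --line-buffered -- "ERROR"
}

# Usage (the log path is a placeholder):
#   tail -f /var/log/app.log | filter_errors | while read -r line; do
#       printf 'alert: %s\n' "$line"
#   done
```

The flag changes only when output is flushed, not what is matched, so it mostly matters for long-running inputs such as `tail -f`.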


It's called ack. Ubuntu gave its package and executable a different name because it conflicted with something far less notable (poor choice IMO). I suggest adding an alias or a symbolic link so you can call it by its proper name when typing in commands.


Ubuntu didn't; Debian did. Ubuntu just inherits that. Perhaps eventually they will fix it, which they did for git (which was once git-core because of a conflict with a less notable package).


I love ack, I've never looked back.


I've been a systems administrator of one form or another for 19 years, spend 2-3 hours a day on the CLI, and I've never heard of ack. Depending on the time of day (and platform) I might use grep or egrep. I hopped onto one of the random Ubuntu boxes I own, did a quick "apt-get install ack; man ack" - here is what it said:

"ACK is a highly versatile Kanji code converter. ACK can do reciprocal conversion among Japanese EUC"

This is probably why I don't use ack - never heard of it, not available on any system I use, and the dpkg repository has something that has nothing to do with grep.

Enough of an answer?


Debian installs it as "ack-grep".


ack has ignore built in for certain filetypes. Then combined with my shell expanding * to just the files and directories at the current layer, ack will not find certain things that grep -R string * will.


>Why would you want to use plain old "grep" instead of "ack"[1]? //

First time I've heard of it. Thanks. Only been grepping my way around for the last dozen years or so ...


Because it motivates me to avoid revision control systems that insist on crudding up every directory in my source tree?


> Why would you want to use plain old "grep" instead of "ack"?

1. Because all the cool kids are doing it.

2. Because I hate Perl.


Why is your Perl hate relevant? It can be written in COBOL for all I care; I'm just using a tool to find files. I don't spend my days browsing through the source code of random tools.


It's funny how one of the relatively few design errors in Unix shells now indirectly comes back to haunt us. Recursing is built into pretty much every command that can handle multiple files - which very strongly suggests it should have been made a feature of glob (or the shell).

I think it's a bit sad that those "big-picture" features in Unix are treated as if they were written in stone.


This is one of the great reasons to use zshell. :-)

  grep -in "sometext" **/*txt


Remember: GNU is not UNIX.


Backwards compatibility is an evil illness that sometimes must be broken. It's for the good of evolution. I praise engineers that make such decisions, even if they are unpopular.


When there's a clear benefit, that's great. Feel free to scrap backwards compatibility when there's significant progress to be made in doing so.

This feels a lot more like a "color of the bike shed" choice. There are use cases where the new behavior makes sense, sure, but there are also plenty of cases where the old way is better. This isn't an upgrade, it's a lateral move.


So we have to continue with an ugly bikeshed for the rest of eternity? I think it's very important that whenever it is decided that one way of doing it is objectively better than another, that way eventually becomes the way things are done.

You basically imply that this change is not big enough to warrant a break with backwards compatibility, but small discontinuities and hacks add up. If things like this aren't fixed every once in a while the system will be riddled with inconsistencies.

Besides, even though it hurts when you've inherited some crazy unreadable code that utilizes some obsoleted functionality I think it is always positive for the code quality when the chaos monkey comes around and breaks something.

edit: the downvote button is to indicate I am detrimental to the discussion, the reply button is for when you disagree with me :)


>> edit: the downvote button is to indicate I am detrimental to the discussion, the reply button is for when you disagree with me :)

seconded

What I do not understand is why, or so it feels, a lot of disagreement lately is expressed via the downvote, even when someone just rationally states his/her opinion.

So back to topic...


I disagree. This is not a "fix"; it's an incompatible change to a widely used option for no better reason than a programmer thought the different behavior would be a useful addition to grep.

And yes, we have to continue with an ugly bikeshed for the rest of eternity, because a) beauty is highly subjective, and b) the color of the bikeshed is less important than breaking existing software on billions of computers worldwide.


If this actually fixes a legitimate problem (which is what?), I don't think anyone thinks that it shouldn't be fixed, just not at an x.12 release.

This change breaks rgrep, thus it should be held until a full version (3.0).

edit: my FreeBSD's `grep` manpage:

    -R, -r, --recursive
           Read all files under each directory, recursively; this is
           equivalent to the -d recurse option.


I think many people think it should not be fixed, even though it has a use case which is mentioned elsewhere in the comments.

I absolutely agree that any changes to the interface like this should be in clearly defined milestone releases, possibly allowing patches to be backported to legacy versions.


FreeBSD grep is just GNU grep, it appears, up to but not including 9.0. Note the long --recursive option in the manual page snippet that you posted; no properly God-fearing BSD program would support long options.


The new non-GNU grep in FreeBSD 9 was designed to be compatible with GNU grep (pedantically, with GNU grep configured with --disable-perl-regexp as it has been in previous FreeBSD releases) because lots of ports depend on GNU grep behavior. The fear of God is not a sufficient reason to break ports.


You'll feel differently when you inherit responsibility for some god-awful mess of shell and perl that nobody properly understands that used to Just Work but doesn't after an innocent upgrade.


Grep is used in hundreds of millions of scripts ranging from mundane rarely used things to install scripts to critical system functionality run every few minutes.

It's not acceptable to break the default way it works under any circumstances.


I agree in principle, though to be fair most UNIX utilities have so many incompatible variants that striving for maximum backwards compatibility often does more harm than good (like producing options that do entirely different things in the presence of other options, in the absence of a leading hyphen, etc.). Practically speaking, sysadmins generally use more consistent tools (e.g., Perl) for nontrivial cross-OS things once this becomes an issue, and, as a developer, it's not clear to me that build-time dependence on, say, Perl or Python is any worse than depending on GNU versions of basic UNIX tools, the existence of which tends to only be a safe assumption on Linux. On non-UNIX platforms, this is an even bigger issue: I'd certainly rather recommend ActivePerl or the latest binary Python 2.7.x release from python.org to the average Windows developer than any of SFU, Cygwin, or MSYS (and I say this as someone with a strong UNIX background who works with both Windows and Windows developers on a daily basis).


grep doesn't need any more evolution. grep is the final result of the evolution.


I also decided that sed no longer supports regexes because I said so. And ls -l will now show a list of print jobs instead of files because maybe that's what you meant. ... oh and ping no longer supports IPv4 because I want everyone to adopt IPv6 immediately because I said so.



