Resolving a mysterious problem with find (johndcook.com)
33 points by chmaynard 56 days ago | 45 comments



I'm... not sure that's the kind of Friday-at-2AM embarrassing mistake you want to put on your blog if you sell yourself as an elite consultant?

Not using `-iname`, using `-print0` and being surprised to see NULs appearing, the weird pipe + xargs instead of just `-exec`, using some hyper-convoluted way of replacing the NULs instead of just reading `man find`... that's probably not the best advertisement for “decades of consulting experience”.


Being an experienced consultant doesn't mean you know everything about everything.

Hell, I put myself in that camp and I am perfectly capable of hacks, kludges, and mistakes.

Be compassionate.


Using print0 as an argument and then being surprised at what it does is not great. It speaks to the author never opening the man page for find and just having copied this command a long time ago.


But you should know what you don't know. You don't blog about a mystery in the behavior of some code when you have fundamental gaps in how you understand that code. I'd say asking on SO might be better in this case.


The problem is not that he does not know everything; it is very normal to learn all the time (xkcd 1053, etc.). The issue is being so deep into your own sense of self-importance as to imagine that your discovery of the basic functions of a decades-old program is worth sharing with the whole world on your professional website.

I'm pretty sure a lot of people would laugh at a JS “thought leader” who would write a whole post on how he just discovered this weird number i whose square is negative and how smart he was to have found the Wikipedia page about it.


To be fair, he is a mathematical consultant who uses computer tools, not a specialist in computer tools.


> the weird pipe + xargs instead of just `-exec`

I think that part makes sense though, as it's simply the older idiom before -exec existed. (That's the reason why both find and xargs have specific flags related to \0-delimited filenames that are basically counterparts to each other.)

Also, shouldn't the two be roughly equal in efficiency? To my knowledge, xargs (without -i) does the same command aggregation that -exec ... + does.

So the number of "grep" processes spawned by the two commands should be roughly the same, I think.
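
As a rough check of that claim, here is a minimal sketch that counts how many processes each form spawns. The scratch directory and the three file names are made up for illustration, and `xargs -0` is assumed to be available (GNU and BSD xargs both have it).

```shell
# Hypothetical scratch directory with three throwaway files.
tmp=$(mktemp -d)
touch "$tmp/a.py" "$tmp/b.py" "$tmp/c.py"

# Each invocation of the inner sh prints exactly one line, so the line
# count equals the number of processes that were spawned.
exec_plus=$(find "$tmp" -name '*.py' -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')
via_xargs=$(find "$tmp" -name '*.py' -print0 | xargs -0 sh -c 'echo run' sh | wc -l | tr -d ' ')
exec_each=$(find "$tmp" -name '*.py' -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

echo "-exec ... +   : $exec_plus process(es)"
echo "xargs -0      : $via_xargs process(es)"
echo "-exec ... \\; : $exec_each process(es)"

rm -rf "$tmp"
```

With only three short file names, both batching forms should need a single process, while `-exec ... \;` starts one per file. On very long file lists the batch boundaries may differ somewhat between the two tools.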


I thought exactly the same. Those who can do, do. Those who cannot do, teach. Those who cannot teach, consult.

But instead of dwelling on prejudices I decided to try my own solution. See https://news.ycombinator.com/item?id=42163286


Why insult teachers?


Well,

This quote is from George Bernard Shaw, but it's from a character from a play he wrote in 1903, "Man and Superman". The character is a descendant of Don Juan, a firebrand, but instead of being crazy about seducing women, he's crazy about revolution and anarchy. GBS explains exactly what he was thinking when he wrote this, in the preface, at extreme length:

https://www.gutenberg.org/cache/epub/3328/pg3328-images.html

The preface is actually a letter to the friend who inspired the play. Some quotes:

> There is a political aspect of this sex question which is too big for my comedy ...

> When we two were born, this country was still dominated by a selected class bred by political marriages.

> I do not know whether you have any illusions left on the subject of education, progress, and so forth. I have none. Any pamphleteer can show the way to better things; but when there is no will there is no way.

> I have only made my Don Juan a political pamphleteer, and given you his pamphlet in full by way of appendix.

So the quote is from that appendix, Maxims for Revolutionists.

https://www.gutenberg.org/cache/epub/26107/pg26107-images.ht...

> He who can, does. He who cannot, teaches.

There's at least one other well known quote there:

> The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.

Did GBS really mean this stuff? Kinda, but he's obviously being playful, because he got an overblown and ridiculous comedy revolutionary character to say it for him, in a work of fiction.


I had no idea that it's that old. I had only heard it in the context of consultant bashing.


Teachers at a minimum only need n+1 more knowledge than their students.


I'm very glad you're not reviewing my PRs.


Why?


You're using -print0 and surprised that its output has NUL characters between the filenames?


Was puzzled about that too, especially since his solution "find ... -print0 | strings" undoes the advantage that -print0 gives you, i.e. safe handling of filenames with newlines in them (and his "sed" solution straight-up undoes the -print0 completely).

So with all due respect to the author, I wonder if he was just using -print0 after rote-learning it as part of the find command (or having had some tutor implore "ALWAYS use -print0"), without knowing what it does.


> There may be better solutions [1], but my solution was to insert a call to strings in the pipeline

The "right" answer is to switch to using -print rather than -print0

-print delimits the values with a newline character (\n); -print0 delimits them with a null character (\0).
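
A small sketch of that difference, assuming a scratch directory with two made-up file names. It counts the delimiters each option emits rather than printing the raw bytes.

```shell
# Hypothetical scratch directory with two throwaway files.
tmp=$(mktemp -d)
touch "$tmp/a.py" "$tmp/b.py"

# -print terminates each path with a newline, so wc -l counts the paths.
newlines=$(cd "$tmp" && find . -name '*.py' -print | wc -l | tr -d ' ')

# -print0 terminates each path with a NUL byte; keep only the NULs and count them.
nuls=$(cd "$tmp" && find . -name '*.py' -print0 | tr -cd '\0' | wc -c | tr -d ' ')

echo "newline-terminated paths: $newlines"
echo "NUL-terminated paths:     $nuls"

rm -rf "$tmp"
```

Piping the `-print0` output into a line-oriented tool such as grep hands it one long "line" with embedded NULs, which is consistent with the behavior the article describes.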


Not always perfectly right, because a filename containing a space character will be interpreted as two arguments.


No it won’t, because none of the output is interpreted as an argument. It’s passed as lines to grep. The second invocation correctly uses print0 and pairs with xargs to understand this.

Now, it does fail with filenames that have newlines in them, but who would do such a thing!


I wrote "Not always perfectly right" thinking about all cases, not this particular one: in nearly all cases (bar being absolutely sure there is no blank character anywhere) -print0 (and therefore xargs -0) seems better to me, and it sure saved me on many occasions. Better let "find" do all the work it can, including filtering filenames.


If you run this interactively on your own files, saying "who would do such a thing" is fine.

If your server code runs this on untrusted input (files uploaded by users or whatever), the answer will be: Someone trying to crack your system.
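
To make the failure mode concrete, here is a hedged sketch with a single made-up file whose name contains a newline (legal on most Unix filesystems). A line-based consumer sees two entries, a NUL-based one sees one.

```shell
# Hypothetical scratch directory holding ONE file with a newline in its name.
tmp=$(mktemp -d)
touch "$tmp/evil
name.py"

# Line-oriented view: the embedded newline splits the single path in two.
line_count=$(cd "$tmp" && find . -type f -print | wc -l | tr -d ' ')

# NUL-oriented view: exactly one terminator, so exactly one path.
nul_count=$(cd "$tmp" && find . -type f -print0 | tr -cd '\0' | wc -c | tr -d ' ')

echo "entries seen by a line-based tool: $line_count"
echo "entries seen by a NUL-based tool:  $nul_count"

rm -rf "$tmp"
```

An attacker who controls file names could use the same trick to make the second "line" look like a path of their choosing.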


Certainly, which is why I put quotes around right, but for this usage, it's not an issue. Find prints the whole path on a single line (including the spaces) and grep (by default) prints the full matched line, so you'll still get the full file path regardless of how many spaces are in it.


Pretty convoluted, no?

I would likely use -exec:

   $ mkdir dir.py
   $ echo blah >> blah.py
   $ find . -type f -name "*.py" -exec grep -i BLAH {} \;
   blah
   $

Edit: Ah, right, he's filtering on filenames. That's what -iname is for. The man page is quite good.


Am I the only one who has gone all in on using "-exec +"?

    find . -name '*.py' -type f -exec grep -il pattern {} +
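
A minimal sketch of the `-exec ... +` form with an explicit search pattern, since grep needs one before the file list. The directory layout, file contents, and the pattern `blah` are made up for illustration.

```shell
# Hypothetical scratch directory: two files and a directory that also ends in .py.
tmp=$(mktemp -d)
mkdir "$tmp/dir.py"
printf 'BLAH\n' > "$tmp/a.py"
printf 'other\n' > "$tmp/b.py"

# -type f skips dir.py; grep -i ignores case, -l prints only matching file names.
# The pattern comes first, then {} + expands to the batch of files.
listed=$(cd "$tmp" && find . -name '*.py' -type f -exec grep -il blah {} +)

echo "$listed"

rm -rf "$tmp"
```

Only `a.py` contains the pattern, so it should be the only name listed.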


I've switched away from find entirely, and now use "fd" whose exec functionality is quite straightforward to use.


fd looks like a great tool, if you don't have `find` locked in already you probably want to start with `fd` instead. I've already got the basics of `find` in my head so that's the tool I usually reach for rather than switching to `fd`.


That solves only the second part of their task. The part which they actually had no problem with. But I agree the exec + solution feels better than the xargs -0 solution.


Agreed. The first part of that task just seemed to be a misunderstanding of what -print0 is, and using `strings` as the fix is weird. I'm surprised they didn't suggest `tr '\0' '\n'`... :-)
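
For comparison, a sketch of the `tr '\0' '\n'` variant, assuming two made-up file names. It turns the NUL terminators back into newlines so grep can work line by line, which also gives up the newline safety that `-print0` was providing.

```shell
# Hypothetical scratch directory with two throwaway files.
tmp=$(mktemp -d)
touch "$tmp/Frodo.py" "$tmp/sam.py"

# Convert each NUL terminator to a newline, then filter the names case-insensitively.
matches=$(cd "$tmp" && find . -name '*.py' -print0 | tr '\0' '\n' | grep -i frodo)

echo "$matches"

rm -rf "$tmp"
```

Dropping `-print0` and the `tr` step together gives the same result with less machinery.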


Find is one of those tools I use seldom enough that I completely forget how to use it, but is also complex enough that when I do need it I have to spend way too long studying the man page to figure out the right incantations.

In almost all cases I just want something simple, like finding a file somewhere on disk based on a partial filename.

Now there are probably some nice, more modern tools made for this, but usually when I need it it's on some system where they're not present and I can't just install random stuff from the interwebs. So find it is...


I feel like locate is commonly installed and is designed for what you describe.


Often, yeah, but updatedb requires root IIRC and I don't always have that.

I just feel it would be nice if finding a file didn't feel like playing Tomb Raider.


I recommend giving ripgrep a try. (It's been around a while now.) https://github.com/BurntSushi/ripgrep


It's not compatible with grep though. How do you search for a square bracket?

    $ grep '[][]' </dev/null
    $ rg '[][]' </dev/null
    rg: regex parse error:
        (?:[][])
             ^^
    error: unclosed character class
    $

And why does it search the current directory when its input is redirected from /dev/null? What other surprises are there?



To me, ripgrep is an improvement and the differences are a good thing.


It's compatible, or close enough, with more modern regex syntaxes, which are probably familiar to a lot more people than grep's. If you want to search for square brackets, escape them (or do a string literal search with -F).


So much faster than grep for these things! Love ripgrep! I also use it to rip apart directories of log files. Super convenient


Instead of `find -name '*.py' | grep -i "$PATTERN"` you can use `find -iname "*${PATTERN}*.py"` for case-insensitive glob-matched filenames, or mess around with regexes on the whole path with `find -iregex "$REGEX"`.
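
A quick sketch of the `-iname` form, with made-up file names and a made-up value for `PATTERN`. `-iname` is assumed to be available, as it is in GNU and BSD find.

```shell
# Hypothetical scratch directory with three throwaway files.
tmp=$(mktemp -d)
touch "$tmp/Frodo_utils.py" "$tmp/frodo.txt" "$tmp/sam.py"

PATTERN=frodo

# One glob does both jobs: case-insensitive match on the name AND the .py suffix.
found=$(cd "$tmp" && find . -iname "*${PATTERN}*.py")

echo "$found"

rm -rf "$tmp"
```

Since find does the filtering itself, nothing is piped anywhere and the delimiter question never comes up.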

And yeah, why would you ASCII NUL terminate each filename output by `find` by using `-print0`? I mean, who adds quotes, backslashes or whitespace to their Python source file names?


Why not just globstar in the first place? grep foo **/*ham*py


if you want file names matching a pattern, no need for grep:

    find . -name '*.py' -iname '*pattern*'

for filenames that have content matching pattern, no need for find:

    grep -r --include '*.py' -l -i pattern .

no need for pipes, xargs, etc.
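
A small sketch of the grep-only variant for content matches. The directory layout and file contents are invented, and `--include` is assumed to be supported (it is a GNU/BSD grep option, not part of POSIX).

```shell
# Hypothetical scratch tree: two Python files in a subdirectory and one text file.
tmp=$(mktemp -d)
mkdir "$tmp/pkg"
printf 'print("Blah")\n' > "$tmp/pkg/a.py"
printf 'nothing here\n' > "$tmp/pkg/b.py"
printf 'blah\n' > "$tmp/notes.txt"

# -r recurses, --include limits the search to *.py, -l lists file names, -i ignores case.
hits=$(cd "$tmp" && grep -r --include '*.py' -l -i blah .)

echo "$hits"

rm -rf "$tmp"
```

`notes.txt` also contains the word but is filtered out by `--include`, so only the matching Python file should be listed.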


I was going to say, do a file -i to find the encoding of frodo.py. I note the file man page has a -0 or --print0 option that adds a ‘\0’ that can be 'cut', but strings works too.


The first line they start with is utter nonsense. find -print0 will not produce lines, but records (or strings) separated by NUL. But grep is a tool working with lines (separated by LF). No mystery that it cannot work.

Using -print0 is necessary if you have filenames containing LF chars. Otherwise just use -print and grep and everything should be fine.

Now how do we handle NUL separated records? That required a bit of thinking; the Unix world is based so much on lines. Without extensive testing, the following awk program seems to work:

    BEGIN       { RS = "\0" }
    $0 ~ regexp

Call with

    awk -v 'regexp=what I search for'

In their script that would be

    awk -v "regexp=$1"

Edit: Credits for s/whitespace/LF chars/ go to user hnfong
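
Not the awk program above, but a related sketch that swaps in a different tool: GNU grep's `-z` option, which treats input and output as NUL-terminated records. This assumes GNU grep, and the file names are made up, including one with an embedded newline.

```shell
# Hypothetical scratch directory; the third file name contains a newline.
tmp=$(mktemp -d)
touch "$tmp/Frodo.py" "$tmp/sam.py" "$tmp/odd
frodo.py"

# grep -z reads NUL-terminated records and writes matches NUL-terminated,
# so the name with the newline stays one record. Count the surviving records.
kept=$(cd "$tmp" && find . -name '*.py' -print0 | grep -z -i 'frodo' | tr -cd '\0' | wc -c | tr -d ' ')

echo "matching names: $kept"

rm -rf "$tmp"
```

The output remains NUL-terminated, so it can be passed on to `xargs -0` without losing the newline safety.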


When grepping for filenames, print0 is needed only when the filenames have newlines in them (which is quite degenerate). grep works fine with spaces and tabs in its stdin.


Thanks! Updated.


You keep using that -print0, I do not think it means what you think it means



