GNU Parallel – The command line power tool (slideshare.net)
106 points by vsbuffalo on Aug 14, 2013 | 28 comments



I use parallel all the time for embarrassingly parallel scientific computations on a cluster. It is very easy to use and elegant, and it's one of the programs I'm most grateful for.

Recently the developers fixed a major bug for me: child jobs on other nodes were not being killed when parallel itself was killed. That was the only thing stopping me from recommending it to my labmates; now there's no reason not to use it!
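
For anyone who hasn't tried the remote execution side, it looks roughly like this (a sketch; node1, node2 and run_simulation are placeholders, and it assumes passwordless ssh and parallel installed on the nodes):

    # one job per core on each remote node; each input line becomes an argument
    cat tasks.txt | parallel --sshlogin node1,node2 'run_simulation {}'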


Remember to show them 'parallel --bibtex'


I use GNU Parallel. I like it because its interface is simple: you pipe filenames to it, just like xargs, and the output is nicely collated to the screen.

I used to use ppss, which does the core task just as well, but the interface is more complex.

I mostly use these tools to optimize large numbers of PNGs before deployment, using optipng, pngout, and/or my own lossypng. These programs take a while to run so using all my cores gets the job done a lot quicker.
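
The invocation itself is nothing fancy, something like this (a sketch; -o5 is just one of optipng's optimization levels):

    # one optipng process per core, over every PNG under the current directory
    find . -name '*.png' | parallel optipng -o5 {}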


The documentation of GNU parallel (https://www.gnu.org/software/parallel/man.html) also contains a lot of nice examples on how to use parallel.
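
For a taste, they include tiny one-liners along these lines (paraphrased from memory, not verbatim):

    # compress every html file, one gzip per core
    find . -name '*.html' | parallel gzip --best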


I use GNU parallel exclusively in place of xargs, simply because it has --dry-run.


wouldn't an "echo" just do the same?

    $ mkdir a
    $ cd a
    $ touch b c d e
    $ find -type f | xargs echo rm
    rm ./b ./d ./e ./c
    $ ls
    b  c  d  e
    $ find -type f | xargs rm
    $ ls
    $


One advantage of "parallel --dry-run" over "xargs echo" is that the former quotes its output:

    $ touch 'Ham
    Jam
    Spam'
    $ touch 'J.R. "Bob" Dobbs'
    $ find . -type f -print0 | parallel -n1 -0 --dry-run echo
    echo ./Ham'
    'Jam'
    'Spam
    echo ./J.R.\ \"Bob\"\ Dobbs
    $ find . -type f -print0 | xargs -n1 -0 echo echo
    echo ./Ham
    Jam
    Spam
    echo ./J.R. "Bob" Dobbs
For my own "dry runs", though, I've always preferred passing the command line to

    #include <stdio.h>
    
    /* Print each argument on its own line, so there is no doubt about
       how the words were actually split by the shell, xargs, or parallel. */
    int main(int argc, char* argv[]) {
        char** argp;
        int i;
        printf("argc = %d\n", argc);
        for (i = 0, argp = argv; *argp != NULL; ++argp, ++i) {
            printf("argv[%d] = %s\n", i, *argp);
        }
        return 0;
    }
to remove all reasonable doubt.
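
(If you want to try it: save it as, say, showargs.c, where the name is arbitrary, then something like)

    cc -o showargs showargs.c
    find . -type f -print0 | xargs -0 ./showargs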


Why does the parallel example look weird? It put the single quotes in all the wrong places, and completely around Jam. It should have printed

  echo './Ham
  Jam
  Spam'
but your output looks different. As an example of how it should look, try this:

  $ find . -type f -print0 | xargs -n1 -0 perl -e'print "\"$_\" " for (@ARGV)'
  "./Ham
  Jam
  Spam"


Having never used parallel, I still believe parallel was correct.

./Ham' 'Jam' 'Spam

would be identical to ./Ham\nJam\nSpam (if \n were the correct translation of the newline in this case) or to './Ham Jam Spam'

This would be identical to what you wrote, except that parallel only punts to quotes when it doesn't have a canonical way of representing the character otherwise. The fact that you don't need to explicitly concatenate adjacent strings in the shell may be what's throwing you off?
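
(To make the concatenation point concrete: the shell glues adjacent strings into a single word, quoted or not:)

    $ echo ./Ham''Jam
    ./HamJam
    $ echo ./'Ham'"Jam"
    ./HamJam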

Interestingly enough, 'Ham\n\nJam\nSpam' becomes

./Ham' '' 'Jam' 'Spam

So parallel is just quoting each newline individually. I believe this would be identical, if you analyze it and see that two of the quoted newlines sit next to each other:

./Ham'
''
'Jam'
'Spam


It looks like it's only single-quoting the characters that need it, in his example the newlines.


Quoting is nice. That's better than bash -x debug mode too.


What if your command has a pipe in it? Then putting echo in front won't work, because the command after the pipe is still executed. The dry run option always works, and doesn't require editing the command itself.
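
A quick illustration (a sketch; /etc is just a stand-in directory). With a plain echo the pipe still runs, whereas --dry-run prints the whole composite command:

    $ echo ls /etc | wc -l
    1
    $ parallel --dry-run 'ls {} | wc -l' ::: /etc
    ls /etc | wc -l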


Nice. I just had to check to see if xargs has a similar feature. It doesn't, though the --no-run-if-empty and --verbose options are both handy. I believe I've used xargs with "echo <commandlist>" to proof output before committing it.

You can simply re-run that piped to bash (or your shell of choice) to execute commands if you wish (say, if parallel isn't available).

E.g.,

     echo foo bar baz | xargs -n1 -t echo ls | bash
... will execute 'ls foo; ls bar; ls baz', while showing the expanded command.


Yes, you can use echo, but that won't work if the command is something like "ls DIR | wc -l".


Depending on how you want to do your counts or piping, you could run that after the xargs / parallel execution, which would be much more efficient (fewer processes and execs) anyhow.
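
Something along these lines (a sketch; dir1 and dir2 are placeholders):

    # one aggregate count afterwards, instead of one wc per directory
    parallel ls ::: dir1 dir2 | wc -l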


Anybody have a link to a version viewable without proprietary plugins? "Flash Player 9 (or above) is needed to view presentations"


This looks very similar but is actually a year newer than the slides on slideshare:

http://www.luga.de/Angebote/Vortraege/GNU_Parallel_LIT_2011/...


I use GNU Parallel with s3cmd to move big data sets in and out of S3. I can easily saturate any network connection. I was able to GET ~2TB from S3 onto a Gluster cluster in a little more than an hour by using GNU Parallel to spread the GETs across 8 instances. Incredibly powerful, easy to use tool.
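
The shape of it is roughly this (an untested sketch; the bucket, the key list and the target path are all placeholders):

    # fan the GETs out over many concurrent s3cmd processes
    cat keys.txt | parallel -j 32 's3cmd get s3://mybucket/{} /gluster/volume/{}'
Add --sshloginfile with a list of instances to spread the same thing across machines.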


Wow, I must have reinvented this particular wheel at least five times.


This is exactly how I feel every time I see an article or presentation about a (maybe even really small) tool that just gets the job done, but that I didn't know about and didn't even think to look for, even though I can think of so many cases where it would have been incredibly useful.


A lot of the users of GNU Parallel have felt exactly the same way.


Is parallel buggy or is it just me? For example, if I have a list of IP addresses:

  $ cat ips.txt | sort | uniq -c | sort -rn
        3 127.0.0.1
        2 192.168.1.1
        1 192.168.1.2
Now I want to reformat the output of uniq -c, so that the count comes last:

  $ cat ips.txt | sort | uniq -c | sort -rn | parallel --colsep ' ' echo {2} {1}
But that gives empty output... what gives? It only works if I double-pipe it through parallel like this:

  $ cat ips.txt | sort | uniq -c | sort -rn | \
      parallel --trim lr echo | parallel --colsep ' ' echo {2} {1}

  127.0.0.1 3
  192.168.1.1 2
  192.168.1.2 1


You have more than one space coming from uniq. Two options:

  parallel --colsep ' +' echo {2} {3}
or:

  parallel --colsep ' ' echo {7} {8}


Thanks! But how come the whitespace is not trimmed by --trim lr? The man page says it trims whitespace on the left and right if --colsep is used.


I wrote a similar program for Windows.

https://github.com/jftuga/Windows/tree/master/mp

The only file you need to download is mp.exe. Source code is mp.au3.


nifty tool!


Load test with parallel:

  cat urls | parallel --jobs 4 --load 6 'curl -s -w "%{time_total}\n" -o /dev/null {}'


I love pssh for simplicity but I guess I better look at fancier stuff too.



