GNU Parallel - build and execute command lines from standard input in parallel (savannah.gnu.org)
150 points by jcsalterego on Oct 17, 2010 | 36 comments



I'll express here what I feel while reading any man page:

Examples should be at the top!

(10 years of frustration in that one-line message :p)


I've just resigned myself to searching for /EXAMPLES as soon as I open one.


LESS='+/EXAMPLE' man parallel
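
(For reference: the leading + in LESS runs the given command when less starts, so man opens already positioned at the first match for EXAMPLE.)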


This is very cool, but a little opaque at first read... For a quicker, more digestible intro, watch the video introduction (linked on the page): http://www.youtube.com/watch?v=OpaiGYxkSuQ


Well, the example (at least the first one) in that video is a bit skewed. First he runs gzip, and then immediately runs 'parallel gzip' without dropping the disk caches. So in the latter case the bottleneck would be the CPU rather than disk I/O (everything is read from the disk cache in RAM). For work that is I/O bound, I wouldn't expect to see any significant improvement from parallel or anything similar.
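
If anyone wants to redo that benchmark fairly: on Linux you can drop the page cache between runs (root required):

    # Flush dirty pages to disk, then ask the kernel to drop clean caches.
    sync && echo 3 | sudo tee /proc/sys/vm/drop_caches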


Ideas for next video are most welcome. The ideal task:

1. Is single threaded

2. Takes a lot of CPU

3. Is a task that everyone can understand and relate to, and which is close to a real-world scenario

I have loads of examples meeting requirements 1 and 2. It is 3 that is the hard part.

Post them to parallel@gnu.org


How about doing something with imagemagick or mencoder? I think video encoding/decoding gives a nice balance between disk and CPU usage.


Here's an imagemagick example; over six minutes with xargs, under 20 seconds with parallel

  $ ls *.png |wc -l
  3580

  $ time ls|sed 's/\(.*\)\..*/\1/'|parallel convert {}.png {}.ppm
  ls --color  0.00s user 0.01s system 63% cpu 0.016 total
  sed 's/\(.*\)\..*/\1/'  0.01s user 0.00s system 39% cpu 0.025 total
  parallel convert {}.png {}.ppm  97.39s user 61.87s system 890% cpu 17.883 total

  $ time ls|sed 's/\(.*\)\..*/\1/'|xargs -I {} convert {}.png {}.ppm
  ls --color  0.01s user 0.00s system 63% cpu 0.016 total
  sed 's/\(.*\)\..*/\1/'  0.01s user 0.00s system 39% cpu 0.025 total
  xargs -I {} convert {}.png {}.ppm  93.08s user 47.88s system 38% cpu 6:10.88 total
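
Worth noting: the xargs run above is serial, since no -P was given. A fairer comparison (a sketch; -P8 is just a guess at the core count) would let GNU xargs parallelize too:

    # Same pipeline, but with xargs running 8 convert processes at once.
    ls | sed 's/\(.*\)\..*/\1/' | xargs -P8 -I{} convert {}.png {}.ppm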


Ah, one of these again.

I wrote a simpler one myself a couple of years ago (http://code.google.com/p/spawntool/). All it does is read commands from stdin, one per line, and keep a desired number of processes running until all command lines are exhausted. Simple.

I wrote my own because I got tired of all kinds of substitution and quoting issues with xargs. With spawn I only need to generate the shell commands, and instead of piping them to bash I pipe them to spawn. This also means I can easily review my command-line generation with less (so that quotes etc. are good) before I eventually pipe to sh or spawn.
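
For illustration, the workflow might look roughly like this (the -j flag name is my assumption; check spawntool's docs for the real option):

    # Generate one shell command per line, eyeball the quoting, then execute.
    for f in *.log; do printf 'gzip -9 %q\n' "$f"; done > cmds
    less cmds          # review the generated commands first
    spawn -j 4 < cmds  # keep 4 processes running until the list is exhausted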


If it is simple I would love to see the examples from http://www.gnu.org/software/parallel/man.html#example__worki... converted to spawn.


See also the 'push' shell: http://code.google.com/p/push/


It seems like 90% of the uses for this can be taken care of with xargs:

    echo "file1 file2" | xargs -P 2 gzip


As I understand it, xargs only runs on the local machine; GNU parallel can run on remote machines as well. So parallel is the cluster-friendly version of xargs's -P.
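
A hedged sketch of what that looks like (assumes passwordless SSH to the servers, which are placeholders here; ':' adds the local machine to the pool, and --trc transfers each input file, returns the result, and cleans up):

    parallel -S server1,server2,: --trc {}.gz 'gzip -c {} > {}.gz' ::: *.log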


Yep. There's also dxargs, which looks very useful: http://www.semicomplete.com/blog/geekery/distributed-xargs.h...




Don't you need

    echo "file1 file2" | xargs -P2 -n1 gzip

?


Probably, although it doesn't seem to be the focus of xargs.

And the version of xargs that is included with Solaris 10 doesn't have the -P option. In which case, installing gnu parallel is a slightly easier option than installing a different version of xargs.
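
For comparison, the GNU parallel equivalent of the gzip example needs no -n1, since it runs one job per input by default (a sketch):

    # Either feed names one per line...
    printf '%s\n' file1 file2 | parallel gzip
    # ...or pass them directly:
    parallel gzip ::: file1 file2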


Nice example from the docs:

Convert .mp3 to .ogg running one process per CPU core on local computer and server2:

    parallel --trc {.}.ogg -j+0 -S server2,: \
      'mpg321 -w - {} | oggenc -q0 - -o {.}.ogg' ::: *.mp3
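
For anyone decoding the flags: {.} is the input with its extension stripped, -j+0 runs one job per CPU core, and --trc transfers the input file, returns the result, and cleans up on the remote side. A local-only variant (assuming only that mpg321 and oggenc are installed) just drops the -S and --trc parts:

    parallel -j+0 'mpg321 -w - {} | oggenc -q0 - -o {.}.ogg' ::: *.mp3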


On first impressions I like this much more than ppss. The distributed setup is much easier and the documentation is more thorough.


Has anyone managed to get this working on OS X?

A standard "./configure && make && make install" outputs errors.


The MacPorts version worked for me.

Perhaps the Portfile will have the patches you need to manually compile on OS X:

http://trac.macports.org/browser/trunk/dports/sysutils/paral...


Just tried it and got errors galore. Oh well, I'll keep trying.


Make sure you are using GNU Parallel and not another version of parallel. Try:

    parallel --version


In what ways is this better than make -j?


Because it can be run on the command line, ad hoc. make -j is great for pre-existing command lists and dependencies, but as the man page describes, parallel is like xargs, which I use all the time on the command line for ad hoc actions (it frees me from having to write a bash loop).


make requires a Makefile, whereas one can pass parameters directly to parallel.

There also seem to be a few more options revolving around job success/failure and how to react: a) ignore failed jobs and report how many at the end, b) exit cleanly as soon as a job fails, and c) stop all jobs as soon as one fails.
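
A sketch of those three behaviors, using the --halt-on-error spelling from the man page of that era (newer releases spell it --halt); 'exit 1' stands in for a failing job:

    parallel --halt-on-error 0 ::: 'sleep 1' 'exit 1' 'sleep 2'  # (a) ignore, report at end
    parallel --halt-on-error 1 ::: 'sleep 1' 'exit 1' 'sleep 2'  # (b) start no new jobs
    parallel --halt-on-error 2 ::: 'sleep 1' 'exit 1' 'sleep 2'  # (c) kill running jobs too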


Those (a) (b) and (c) points sound like strengths of make, to me.


Sorry, those were features of parallel, not make (unless I'm mistaken).



parallel is intended to work on arbitrary commands.


So is make.


It can, but that was not its intended purpose. That is, you can figure out a way to map your task onto a dependency hierarchy and save it to a Makefile, but why do that when you could use something designed for ad hoc parallel execution?
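
For contrast, here is roughly what the make -j route looks like for the earlier PNG conversion (a sketch; note the recipe line must start with a literal tab), next to the ad hoc one-liner:

    cat > Makefile <<'EOF'
    all: $(patsubst %.png,%.ppm,$(wildcard *.png))
    %.ppm: %.png
    	convert $< $@
    EOF
    make -j8

    # ...versus:
    parallel convert {} {.}.ppm ::: *.png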


Would it work on a PS3 running Ubuntu?


Should work if you install the moreutils package in Ubuntu.


The "parallel" in moreutils is an unfortunate naming collision, it is a trivial (< 200 LOC) program that is in no way comparable to GNU parallel.



