As the URL (but not, sadly, the website itself) suggests, you're exhausting yourself with all that negativity; you can remove two '-'s (as well as two ' 's), and get a much tastier command line:
The 'e' character is seen as an argument to the '-i' switch, meaning this command line will yield `Can't open perl script "s/foo/bar/gi": No such file or directory`
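For instance (file name made up):

perl -pie 's/foo/bar/gi' file.txt   # fails: 'e' is taken as -i's backup suffix, so the s/// becomes the script name
perl -pi -e 's/foo/bar/gi' file.txt # works: -i gets no suffix, -e stays last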
They're functionally interchangeable for simple usage (minus some perl-specific stuff like PCRE).
As I note below, escape behavior is a little different: sed wants you to escape +'s to get the normal regex semantics ("one or more matches"), which is just sed's default "basic" regex syntax (sed -E switches to the extended syntax where + works unescaped). I actually think Perl is correct here and you should only need to escape those characters if you want literal matches, but I have a weird environment (Cygwin), so it's possible the sed build there is a little messed up.
The major difference for me is that Perl can match across multiple lines using the -0777 flag. I've been doing a lot of regex-based mass manipulation of source code lately and most people write functions across multiple lines. You can't do that with sed without multi-line appending and it gets really ugly really fast. Sed is pretty much just single line matches only.
For example, I had 100-odd classes with getters for certain values but not setters. So I did:
grep -rle "getAddTime" | while read line; do if ! grep -q "setAddTime" "$line"; then echo "$line"; perl -i -0777 -pe 's/public\s+Date\s+getAddTime\s*\(\s*\)\s*\{[\s\w=;]+\}/$&\n\n public void setAddTime(Date addTime) { this.addTime = addTime; }/' "$line"; fi; done
translation: look for files that contain getAddTime; if they do not contain setAddTime, then find the string "public Date getAddTime() {...}" and append the setter after it. There are a few edge cases you could hit there, but it was close enough to work on my codebase.
I wish Perl would do an inplace edit of a file without creating a backup, though. I am under source control so there's no harm in just operating right on the files. It's not the end of the world to follow up with a rm -r *.bak I guess, but it's annoying. At least they're in my git-ignore which helps a little.
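When the backups do pile up, something like this clears them out of the whole tree (a sketch; assumes a find with -delete):

find . -name '*.bak' -delete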
Fun factoid most have forgotten: regex is perl. The beginnings are elsewhere but regex as we know it was designed as part of the language and the engine was pulled out and reused when people found how useful it was.
Perl's regex parser is still far above the features in more modern languages, supporting, among other things, code execution within capture groups. If I remember right, the Perl regex parser is actually Turing complete.
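For the curious, a toy sketch of that embedded-code feature, the (?{ ... }) block, which runs Perl code in the middle of a match ($^N holds the most recently closed capture group):

perl -E '"foo=42" =~ /(\w+)=(\d+)(?{ say "just captured: $^N" })/ and say "matched: $1=$2"'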
I know you're not trying to say that regular expressions were created as part of Perl, but I think you're giving a bit too much credit to it[1] regarding regexes.
The PCRE library is indeed used all over. And Perl was, I think, the first first-class scripting language that integrated regexes so closely to control structures and other language features in a way that feels truly natural.
There are still a lot of tools out there that use other regex libraries. Don't have it in front of me, but there's a lovely chart in the book _Mastering Regular Expressions_[2] that breaks out regular expression library use by tool. But, generally, I think the diversity of regex libraries actually causes problems for adoption these days, because people who are tempted to use them (and thus learn more) tend to run into other tools where the things they've learned mysteriously don't work anymore, which scares them off.
Anyway, regular expressions in the wild go back to Unix v.4, which included Ken Thompson's grep.
[1] Perl deserves a ton of credit it doesn't get in general, including credit for giving the world PCREs.
[2] In general, if you work with regexes a lot and don't own this book, you're doing yourself a disservice. It is one of my top-10 technical books, not just for density of actionable information, but also for the pure general excellence.
Have a look at grammars in Perl6 and the new regexen. Light years ahead of anything else. Perl6 also does numeric division properly and, if I'm not mistaken, eliminates NPEs so what's not to like?
What you describe happened far earlier. As far as I understand it, regexps were originally a part of ed (having been derived from QED), the original Unix text editor. Its “g” command with a “p” flag, or “g/re/p”, for globally searching for a regexp and printing the matching lines, was later found so useful that it was implemented into a separate utility, “grep”. Many Unix utilities started using regular expressions from then on, including Perl.
Maybe I was a bit too excited about the Perl part. Perl perfected regex, and the Perl regex engine was integrated into other languages until it became a normal language feature.
Regex as we know it was largely a result of the adoption of Perl and the flexibility of its regex engine.
PCRE is a nice library. I read (on its site, IIRC) that it is used for the regex support in Python and some other languages.
I once worked - as part of new product work in an enterprise company - on building the PCRE library as an object file on multiple Unixes from different vendors (like IBM AIX, HP-UX, Solaris, DEC Ultrix, etc.) and also on Windows (including on both 32-bit and 64-bit variants of some of those OSes), using the C compiler toolchain on each of those platforms. I was a bit surprised to see the amount of variation in the toolchain commands and flags (command-line options) across the tools on all those Unixes. But on further thought, knowing about the Unix wars [1] and configure [2], maybe I should not have been surprised.
> If no extension is supplied, and your system supports it, the original file is kept open without a name while the output is redirected to a new file with the original filename. When perl exits, cleanly or not, the original file is unlinked.
Which system are you using? With macOS and Linux, I get no automatic .bak extension when not providing a backup suffix, i.e. it behaves like you want under these systems.
Update: Apparently, anonymous files are not supported by Windows: http://stackoverflow.com/a/30539016 which would explain the behaviour you describe.
The same as for anything that could be done in the shell, but is done with a full-fledged programming language: when you discover you need to tweak it to accommodate an additional requirement, it's easy in Perl but hard in sed.
If you feel unproductive or uncomfortable with Perl, you should avoid it and use whatever language strikes your fancy.
Many people however are extremely comfortable and productive with Perl, for a wide variety of reasons. It's really an awesome language that has proven its vast usefulness a long time ago.
Perl has the best regex support, or one of the best. Also, it's the only language other than Python that's installed by default in common distros, AFAIK.
None really. I would think that if you use Perl, you'd probably just use the Perl one-liner, is all. Also, sed doesn't have the in-place (-i) option on Solaris (and AIX) by default, I believe. So with Perl, you can do it everywhere.
A lot of these Perl one-liners are from the old days, from Unix admins who had versions of sed (or find, etc.) that didn't have fancy features like -i (in-place edit).
Also in the old days tools like sed and awk had more arbitrary limitations. E.g., line length was limited to something (reasonable) like 8192 bytes but input data I was processing was not reasonable.
For me, one part laziness and one part flexibility, and the flexibility part I might be rationalizing. :P. Perl's a superset of what can be done with sed, so I tend to use perl for this case even when sed would do, just because I can remember the perl command line args and regex syntax off the top of my head. I often have to take a trip to the sed man page if I use sed. It's easier for me to add special corner cases to a perl one-liner, etc.
That said, sed is simpler, sometimes less to type, sometimes there on systems where perl isn't (though that seems uncommon these days). But I'm using sed more often, and the more I use it the more familiar it is. Nicer to have two good tools around than just one, right?
One big advantage is portability / compatibility. For example, Linux uses GNU sed but Mac OS X uses BSD sed. I've run into issues where a sed script works on one but not the other. Using perl in sed mode avoids this.
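The classic case is -i itself: GNU sed takes an optional suffix, BSD sed requires a (possibly empty) suffix argument, so the same invocation can't satisfy both, while the Perl form is identical everywhere. A sketch (file name made up):

sed -i 's/foo/bar/' file        # GNU sed: fine; BSD/macOS sed: error
sed -i '' 's/foo/bar/' file     # BSD/macOS sed: fine; GNU sed: error
perl -pi -e 's/foo/bar/' file   # same behavior on both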
Perl's substitution and translation facilities work pretty much the same way as sed's; if you understand sed you can just drop those operations into Perl. For example, sed -i 's/old/new/' will do an in-place edit, replacing the first 'old' on each line with 'new' (add the g flag to replace every occurrence).
I find that you do escaping a little differently with sed though. For example you need to escape +'s for them to have their normal "one or more matches" semantics. Might just be an abnormality of my environment (Cygwin).
The -i switch takes an optional string that creates a backup of the file with that extension. To use your example (with zsh globbing):
perl -p -i.bak -e 's/foo/bar/gi' **/*.txt
Recursively replaces foo with bar in every .txt file in the hierarchy, editing each file in place and saving the original with a .bak extension.
Awk is easier for me to remember than sed or perl, just because "sub" (for substitution) is in there:
awk '{gsub(/this/,"that")}; 1'
Edit: I just learned that "s" in the sed command is for substitute... (http://www.grymoire.com/Unix/Sed.html)
Well, maybe I'll be able to remember that.
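For the record, the equivalent sed is just:

sed 's/this/that/g'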
Essentially, although it looks complicated, it's the simplest possible idea: it's just trying factorisations, with the `(11+?)` part being the factor, and the `\1+` part insisting that we repeat that factor. It is of course wildly inefficient, and limitations in the backtracking engine mean that it can give false positives for large numbers (http://zmievski.org/2010/08/the-prime-that-wasnt, also linked from that post).
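For reference, the usual form of that one-liner (prints "prime" when no factorisation is found; the argument is the number to test):

perl -lwe 'print "prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/' 13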
The Perl One-Liners book is great. I learned so much from it. You don't even need to have done any real Perl programming to get a lot out of that book, but if you want to learn Perl it will help you as well.
While it's not in Perl (it uses grep, sed and awk), I like this Unix one-liner (mine) to kill a hanging Firefox process, not so much for the code but for the interesting comment thread that it resulted in on my blog - about Unix processes, zombies, etc.
You can get a lot of useful information out of ps, like blocked, caught, ignored & pending signals, control group, scheduling class & policy, CPU in tenths of a percent, instruction & stack pointers, elapsed time since it started, security label, thread id, number of threads, kernel function that the process is currently sleeping in, process state, current CPU, number of kernel threads, controlling tty, etc.
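For instance, with procps ps on Linux you can ask for a lot of those columns explicitly (the output keywords vary a bit between implementations):

ps -o pid,stat,nlwp,psr,cls,wchan,etime,blocked,caught,ignored,comm -p $$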
Also, never use killall. On Solaris it reboots the machine. Fun to find out when logged in as root.
Thanks for the tip. Didn't know you could get so much info from ps. Will re-check its man page.
Do you have any recommendation on books or online sites for learning about the more advanced topics you mention above (the many Unix topics that you say ps gives info on, not just ps itself)? I know about the Stevens [1] series of Unix and networking books, and have read parts of some, but that was a while ago. Things must have changed a good amount since then. Should still read Stevens again, I guess.
Some of those topics are basic UNIX programming things (signals, threads, process state, controlling tty). Others are features of the operating system, so you would want something Linux-specific that tells you about kernel programming or kernel features you can use in your applications (security labels, scheduling classes, control groups, kernel threads, etc). Still others are features of the OS & cpu's execution of programs (instruction/stack pointers), you'll want to learn Assembly for that.
Sorry I don't have reading recommendations. I don't really read as much as just diving into man pages and looking up guides online. Hacker habits...
No problem. I do know what some of the points mean (signals, threads, control groups, instruction/stack pointers, etc.), just did not know some of the others (like security labels, threads vs. kernel threads, etc.). I guess some of those topics might not be found in one or a few books, more like scattered across books, articles, man pages etc. ...
I see you got this comment already, but still: make a habit of using pkill instead of stuffing ps output into kill when you can. It's not everywhere, but it's on Linux/BSD/Solaris. (Avoid killall; that's unportable in a special way.) It's readable and avoids things like filtering out your greps.
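For example (process name made up):

pgrep -l firefox    # preview the matches first
pkill -x firefox    # then kill by exact process name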
Thanks for the introduction to 'sed 1d', which is something I've somehow never come across and resorted to horrible head|tail hacks to strip fixed header lines off things.
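e.g.:

df -h | sed 1d        # drop the fixed header line
df -h | tail -n +2    # equivalent, without the head|tail gymnastics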
IIRC, I was using Linux on my PC and from a text console, not a console in the GUI env., when I wrote that one-liner, and once written, it can be put in a script with a short name. But didn't know of xkill, thanks.
You can use find -print0 | perl -0ne 'print($_);' ... using a null instead of newlines.
You're correct that -delete is easier in this case, but the perl one liner is handy (as a starting point) if you're doing something conditional, or changing file extensions in bulk, etc.
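A sketch of the change-extensions case (names made up; -0 splits the input on NULs and -l chomps the separator off each record):

find . -name '*.jpeg' -print0 | perl -0lne '($new = $_) =~ s/\.jpeg\z/.jpg/; rename $_, $new or warn "rename $_: $!"'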
> > find . -name whatever\* -type f -print -delete
Or you can also do it all in Perl:
perl -E 'while (<whatever*>) { say; unlink }'
EDIT: As tyingq points out (https://news.ycombinator.com/item?id=13065132), I forgot that `find` recurses into subdirectories. I don't know a way to make Perl do the same that doesn't start getting more verbose (but there must be a Perl golfer somewhere around here who does …).
Isn't that just "rm -rv *"? The original spec was to only delete files, not directories - and only those matching a given pattern, at that.
Also, I think yours would go into an infinite spin if it met a symlink loop (say, "ln -s . foo") - File::Find is hardened against that kinda shenanigans.
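For completeness, the recursive delete isn't too bad in pure Perl either; a sketch with File::Find (which, as noted, doesn't follow symlinks unless you ask it to):

perl -MFile::Find -E 'find(sub { if (-f && /^whatever/) { say $File::Find::name; unlink or warn "unlink $_: $!" } }, ".")'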
`-exec` doesn't necessarily spawn a bunch of processes. You can make it supply a list of files up to ARG_MAX with the `-exec echo {} +` syntax.
This is also a good use case for xargs, which makes the process even more flexible. You'd be able to do things like run programs concurrently with 256 files each, for example.
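A sketch of that (flags as in GNU xargs; -0, -n and -P also exist in BSD xargs, and the pattern is made up):

find . -name '*.txt' -print0 | xargs -0 -n 256 -P 4 grep -l "needle"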
I liked this part of perl so much that I recently rewrote perl style substitution in Node.js. Used as an executable, it functions largely like perl -p -e: https://github.com/rektide/perls
Here are a few quick and dirty ones I use for handling MySQL output. (For serious stuff I include real CSV libraries in the one-liner to quote correctly etc.)
alias sqlcmdtocsv="perl -nE 'chomp; s/\\t/;/g; say \$_;'"
alias sqlcmdtoperl="perl -MData::Dump=dump -nE 'chomp; @r=split(/\\t/); if (@titles == 0) { @titles=@r; next; } \$row={}; for(\$i=0; \$i < @r; \$i++) { \$row->{\$titles[\$i]} = \$r[\$i]; } push @rows, \$row; END { say dump \\@rows };'"
# Quick and dirty for moving from the real DB to the test DB.
# Use: mysql_generates_a_row | sqlcmdtoperl | dumptosqlinsert
alias dumptosqlinsert="perl -E 'my \$txt; { local \$/; \$txt=<>; } my \$rows= eval \$txt; \$q=chr(39); for my \$r (@\$rows) { @cols=map { \$v=\$r->{\$_}; if (\$v eq \"NULL\") {\"\$_=NULL\"} else {\"\$_=\${q}\$v\$q\"} } keys %\$r; say join(\", \", @cols);}'"
I have a bunch of aliases for conversions, to generate common SQL that depends on parameters, and so on.
Where <code> will be wrapped in a while(<>){ ... }. You can access the current line normally with $_ and either print something out every time, or keep state and then have an END block which spits out the final output. E.g., I was using something like the following today for getting some info on row sizes in a MySQL table (untested):
$ echo 'DESCRIBE table'|mysql |cut -f1 |tail -n +2 |perl -nafe 'chomp; push @fields, $_; END { print qq/SELECT MAX(row_size) from (SELECT/ . join(" + ", map { qq/CHAR_LENGTH($_)/ } @fields) . qq/ AS row_size FROM table)/; }'
Incidentally, you can pass a query as a parameter to `mysql` with -e, you don't need to pipe it in; and -N suppresses column headings, so you could use that instead of `tail`.
... also, you can pass -l (lowercase L) to `perl` to enable automatic line-end processing, which autochomps each line of the input (and sets the output record separator so print() includes a trailing newline, not that it matters here). And FWIW you're not actually using -a (autosplit each line into @F) here, but you could use it instead of `cut`, and it does imply -n; you might not need -f either, unless you actually have a sitecustomize.pl for it to disable.
So:
$ mysql -Ne 'DESCRIBE table' | perl -ale 'push @fields, $F[0]; END { print qq/SELECT MAX(/ . join(" + ", map { qq/CHAR_LENGTH($_)/ } @fields) . qq/) FROM table/ }'
(I could golf it further, but I think that covers the stuff you might actually find useful. I blame my perlmonks days...)
Ha, nice. I must admit I just do `perl -nafe` almost out of muscle memory, without paying too much attention to what the options mean individually. Thanks for this :)
It relies on some fairly subtle interactions in perl's command-line processing, and I do enjoy how cunning that makes it feel. And it's actually useful, by way of being a lot easier to type (less moving your hands around the keyboard for odd bits of punctuation) than the "sensible" way to do it:
-l [octnum] is "automatic line-ending processing": it means the "input record separator" ($/, which defaults to "\n") is automatically stripped from the end of lines as they're read in; and the "output record separator" ($\, defaults to nothing), which is automatically appended to anything you print(), is set - to the character with the given octal code if specified, otherwise to the current value of $/.
-p makes perl act like sed: the input (which can either be stdin or the files listed after the flags on the command-line) is read one-line-at-a-time into $_, and for each line, your program is run, then $_ is printed.
-0 [octnum] sets the input record separator - and with no following digits, it gets set to the null character, "\0".
-e [script] specifies the content of the script to run - in this case, "1", a very terse way of doing nothing.
So: read each null-terminated "line", strip off the trailing null character, do nothing, then print it with a trailing newline.
The sneaky part is the ordering of the flags:
-l does { $\ = $/ }, -0 does { $/ = "\0" }, and they're processed in order - so if -0 appeared before -l, it wouldn't work, you'd wind up with both record separators set to "\0";
"-l0" would mean "-l with 0 as its parameter", ie { $\ = "\0" }, so I need to stick something else (-p) in between them if I want to squash everything into one argument without changing the meaning;
and -e1 has to appear at the end, because -e eats the rest of the argument.
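So, assembled the way it typically gets used (a sketch; the flag ordering is the whole trick):

find . -print0 | perl -lp0e1    # NUL-separated records in, newline-separated lines out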
Not really, or at least I haven't had that problem. It would be a problem if they used both \r and \n at the same time; in that case you can always convert \r\n to \n or something like that.
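A quick sketch of that conversion (file names made up):

perl -pe 's/\r$//' crlf_file > unix_file    # or in place: perl -pi -e 's/\r$//' crlf_file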
Most of the basic text processing ones are shorter in Awk, and when they're not shorter, they are still clearer: no cryptic command-line options, no gratuitous line noise:
# Double space a file
perl -pe '$\="\n"'
perl -pe 'BEGIN { $\="\n" }'
perl -pe '$_ .= "\n"'
perl -pe 's/$/\n/'
perl -nE 'say'
awk 'BEGIN { ORS="\n\n" } 1' # output record separator
awk '{ print; print "" }'
awk '$0=$0"\n"'
txr -e '(awk ((set rec `@rec\n`)))' # Awk macro in Lisp!
# Double space a file, except the blank lines
perl -pe '$_ .= "\n" unless /^$/'
perl -pe '$_ .= "\n" if /\S/'
awk 'BEGIN { ORS="\n\n" } /./'
awk '/./ { print; print "" }'
awk '/./&&$0=$0"\n"'
txr -e '(awk (#/./ (set rec `@rec\n`) (prn)))'
# Remove all consecutive blank lines, leaving just one
perl -00 -pe ''
perl -00pe0
awk -v RS= -v ORS="\n\n" 1 # RS=<blank> -> Awk paragraph mode
txr -e '(awk (:set rs nil ors "\n\n") (t))'
# Number all lines in a file
perl -pe '$_ = "$. $_"'
awk '{ print FNR, $0 }'
txr -e '(awk (t (prn fnr rec)))'
# Print the total number of lines in a file (emulate wc -l)
perl -lne 'END { print $. }'
awk 'END { print FNR }'
# elides loop if cond/action clauses absent so (nil) needed
txr -e '(awk (nil) (:end (prn fnr)))'
# Find the total number of fields (words) on each line
perl -alne 'print scalar @F'
awk '{print NF}'
# Print the last 10 lines of a file (emulate tail -10)
perl -ne 'push @a, $_; shift @a if @a > 10; END { print @a }'
awk '{ ln[FNR%10]=$0 }
END { for(i=FNR-9;i<=FNR;i++)
if (i > 0) print ln[i%10] }'
txr -e '(awk (:let l)
(t (push rec l)
(del [l 10]))
(:end (tprint (nreverse l))))'
txr -e '(awk (:let l)
(t (set [l 10..10] (list rec))
(del [l 0..-10]))
(:end (tprint l)))'
# (real way)
txr -t '[(get-lines) -10..:]' < in
txr -t '(last (get-lines) 10)' < in
You read them to learn. I learned a couple of perl flags in this discussion I had forgotten. Also, you get ideas about what you can do. And third, after reading through a collection like that, you can find stuff back by looking in an index.
http://sed.sourceforge.net/sed1line.txt
http://www.pement.org/awk/awk1line.txt