skill issue

creesch · 2024-09-02T08:07:40 1725264460

Sure. I do not live in the terminal, but I work with Linux enough to comfortably navigate around and read various shell scripts with relative ease, with the exception of awk. To me that signals that, at least in my case, awk has a higher barrier to entry than most other things in the same environment.

So, with alternatives around that I can parse more easily, I happily concede that I have a skill issue with awk.

mardifoufs · 2024-09-02T19:10:52 1725304252

Well, using awk because you are familiar with it could be due to a skill issue with other languages too. Can't use python for parsing? Skill issue I guess, going by your logic.

keybored · 2024-09-02T10:06:12 1725271572

Even the eminent Mr. A., W., and K. had sKiLl isSueS when designing this language, apparently. You can only ask so much from regular programmers.

watt · 2024-09-02T08:35:40 1725266140

Once there are more productive alternatives that require less specialized "skill", your condescending "skill issue" becomes a devex issue, and basically a productivity gap which will doom your language or tool.

DonHopkins · 2024-09-02T08:46:47 1725266807

You just need to have the skill to overcome whatever non-technical, legacy, lack of education, or poor judgement issues that are steamrolling you into choosing to use awk instead of a sane rational decent modern efficient maintainable language.

orwin · 2024-09-02T10:29:43 1725272983

To be fair, sometimes awk is just faster to call. In all other cases, as my sibling says, use perl :D

dotancohen · 2024-09-02T09:37:06 1725269826

Perl, then?

mst · 2024-09-02T11:24:46 1725276286

The rule of thumb back at Netcraft was to prototype in awk/sed for brevity/expressiveness and then port to perl for production use for performance reasons.

Been a couple decades since I was wrangling the survey systems there though, no idea what it looks like now.

kragen · 2024-09-02T12:18:03 1725279483

i very much appreciate the server surveys; for a time i read the report every month!

forinti · 2024-09-02T12:00:29 1725278429

As a dare from a friend I compared my Perl solution to an AWK solution:

  $ time perl -MData::Dumper -ne '$n{length($_)}++; END {print Dumper(%n)}' bigfile.txt

  $VAR1 = '1088';
  $VAR2 = 349647;

  real    0m1.326s
  user    0m0.814s
  sys     0m0.371s

  $ time awk 'length($0) > max { max=length($0) } END { print max }' bigfile.txt

  1087

  real    0m21.400s
  user    0m18.596s
  sys     0m0.455s

I prefer Perl, but I have no issue with AWK and I actually use it frequently.

scbrg · 2024-09-02T14:05:12 1725285912

Well. I don't know. Those two programs don't really do the same thing. There's an awful lot of comparisons in the second one. After making the awk program more similar to the Perl program, and using mawk instead of gawk (which is quite a bit slower) the numbers look a bit different:

  $ seq 100000000 > /tmp/numbers 
  $ time perl -MData::Dumper -ne '$n{length($_)}++; END {print Dumper(%n)}'  /tmp/numbers 
  $VAR1 = '7';
  $VAR2 = 900000;
  $VAR3 = '8';
  $VAR4 = 9000000;
  $VAR5 = '5';
  $VAR6 = 9000;
  $VAR7 = '4';
  $VAR8 = 900;
  $VAR9 = '6';
  $VAR10 = 90000;
  $VAR11 = '10';
  $VAR12 = 1;
  $VAR13 = '2';
  $VAR14 = 9;
  $VAR15 = '3';
  $VAR16 = 90;
  $VAR17 = '9';
  $VAR18 = 90000000;
  
  real 0m16.483s
  user 0m16.071s
  sys 0m0.352s
  $ time mawk '{ lengths[length($0)]++ } END { max = 0; for(l in lengths) if (int(l) > max) max = int(l); print max; }' /tmp/numbers 
  9

  real 0m5.980s
  user 0m5.493s
  sys 0m0.457s

[edit]: Actually had a bug in the initial implementation. Of course.
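
For readers following along, here is a minimal sketch of the two awk strategies being compared: a running maximum (one comparison per line) versus a histogram of line lengths that is scanned once at the end. The sample file and the shell variable names are made up for illustration, and plain `awk` stands in for whichever implementation is installed.

```shell
# Hypothetical sample data: line lengths 1, 4, 2, 4.
sample=$(mktemp)
printf 'a\nbbbb\ncc\nbbbb\n' > "$sample"

# Running maximum: compares every line's length against the current max.
running=$(awk 'length($0) > max { max = length($0) } END { print max }' "$sample")

# Histogram: count lines per length, then scan the distinct lengths once.
# Array indices are strings in awk, so int() forces a numeric comparison;
# without it, "10" would sort below "9".
histogram=$(awk '{ lengths[length($0)]++ }
  END { max = 0; for (l in lengths) if (int(l) > max) max = int(l); print max }' "$sample")

echo "running=$running histogram=$histogram"
rm -f "$sample"
```

Both approaches should report the same maximum; which one is faster on a large file depends on the awk implementation and the data.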

forinti · 2024-09-02T14:13:59 1725286439

I used them both to find the longest line in a file. The Perl option just spits out the number of times each line length occurs. It will get messy if you have many different line lengths (which was not my case).

You also have to take into account that awk does not count the line terminator.

Let's try the opposite: make the Perl script more like the AWK one.

  $ time perl -ne 'if(length($_)>$n) {$n=length($_)}; END {print $n}'  rockyou.txt 
  286

  real 0m2,569s
  user 0m2,506s
  sys 0m0,056s

  $ time awk 'length($0) > max { max=length($0) } END { print max }' rockyou.txt 
  285

  real 0m3,768s
  user 0m3,714s
  sys 0m0,048s

shawn_w · 2024-09-02T15:36:45 1725291405

`perl -lne ...` to have perl strip the trailing newlines like awk does. Should give the same result with it.

forinti · 2024-09-02T15:50:03 1725292203

You're right. It even makes the times converge.