Let's see the code? Regex has always been a Perl selling point, and you're just b...

cutler · on June 24, 2020

Perl 5:

    #!/usr/bin/env perl
    use 5.026;

    open my $fh, '<', 'logs1.txt';
    while (<$fh>) {
      chomp;
      say if /\b\w{15}\b/;
    }
    close $fh;

Python 3.8:

    #!/usr/bin/env python

    import re

    with open('logs1.txt', 'r') as fh:
        for line in fh:
            if re.search(r'\b\w{15}\b', line):
                print(line, end='')

Why isn't it a fair comparison? The startup time is generic and string parsing is a major feature of, say, web development. I didn't say Perl5 numerics match Python's but even there Python relies on external libs.

tomku · on June 24, 2020

It doesn't account for anywhere near the whole difference, but in a tight loop like that Python's going to be spending a good chunk of its time re-compiling the regex from the raw string literal every iteration. Hoist the regex definition out of the loop like so and it'll probably run about 30% faster:

    #!/usr/bin/env python3

    import re

    with open('logs1.txt', 'r') as fh:
        regex = re.compile(r'\b\w{15}\b')
        for line in fh:
            if regex.search(line):
                print(line, end='')

Perl almost certainly does this by default for regex literals, and that's a fair advantage for the "kitchen sink" style of language design versus orthogonal features (regex library, raw strings) that Python uses.
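One way to check how much the hoisting actually buys is to time both forms side by side. The following is a minimal sketch, not the benchmark from this thread: `LINES` is made-up sample data standing in for `logs1.txt`, and the function names are hypothetical. Note that Python's `re` module keeps an internal cache of recently compiled patterns, so the inline form pays for a cache lookup on each call rather than a full recompile.

```python
import re
import timeit

PATTERN = r'\b\w{15}\b'

# Hypothetical sample data standing in for logs1.txt (an assumption, not the real file).
LINES = [
    "GET /index.html 200 shortword\n",
    "worker abcdefghijklmno started\n",  # contains one 15-character word
] * 5000


def count_inline(lines):
    """Pass the pattern string to re.search on every iteration."""
    return sum(1 for line in lines if re.search(PATTERN, line))


def count_hoisted(lines):
    """Compile once, then call the compiled pattern's search method."""
    regex = re.compile(PATTERN)
    return sum(1 for line in lines if regex.search(line))


if __name__ == "__main__":
    for fn in (count_inline, count_hoisted):
        seconds = timeit.timeit(lambda: fn(LINES), number=5)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

The absolute numbers will depend on the machine and the input, so treat any ratio it prints as specific to that run.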

cutler · on June 24, 2020

With pre-compiled regex the Python version comes down to 1.483s on my machine which is still considerably slower than Perl. I wrote versions of these using substring `index` instead of a regex and Perl was still the clear winner:

    time ./index.pl  0.258s
    time ./index.py  0.609s

If you factor in that Python's startup time is 0.279s slower than Perl's, the processing differential comes down to 0.072s.
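The arithmetic behind that figure, plus a rough way to measure interpreter startup yourself, might look like the sketch below. The helper names are hypothetical, and measuring Perl the same way assumes a `perl` binary on your `PATH` (for example `["perl", "-e", "1"]`).

```python
import subprocess
import sys
import time


def startup_seconds(cmd, runs=10):
    """Average wall-clock time to launch `cmd` and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(cmd, check=True)
    return (time.perf_counter() - start) / runs


def processing_gap(slow_total, fast_total, startup_gap):
    """Run-time difference left over once the startup gap is subtracted."""
    return round(slow_total - fast_total - startup_gap, 3)


if __name__ == "__main__":
    # Figures quoted in the comment above: index.py, index.pl, startup gap.
    print(processing_gap(0.609, 0.258, 0.279))  # 0.072

    # Startup cost of the current Python interpreter running an empty program.
    py_startup = startup_seconds([sys.executable, "-c", "pass"])
    print(f"python startup: {py_startup:.3f}s")
```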

dgfitz · on June 24, 2020

I'd be curious to hear your take on my code below. I have Python and Perl dead even if you don't print and you ignore Python's startup time.

cutler · on June 25, 2020

Well, the two ifs you hedge to qualify Python's parity are pretty damning in themselves, no? Let's stop beating about the bush - if you want to perform a common text-processing job like parsing a log file Perl is hands down faster than Python.

orf · on June 24, 2020

Python caches patterns, so you almost never need to re.compile unless you’re a library or have a specific use case involving lots of unique patterns.

The issue here is that Python's regex engine has per-call overhead, and with lots of sequential calls on small strings like that, the overhead adds up.

If you batch lines together in chunks you’ll see a huge improvement in speed, but the point is that it’s not “Python vs Perl”, it’s “Python's regex engine vs Perl’s regex engine”. Which is about as contrived a Perl-biased benchmark as there ever was.
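A minimal sketch of the batching idea, assuming the goal is still "print every line containing a 15-character word": join a batch of lines into one string and let a single multiline pattern pull out the matching lines, so the engine is entered once per batch instead of once per line. The function names and `batch_size` are hypothetical, and no speedup is claimed here; measure on your own data.

```python
import re
from itertools import islice

WORD15 = re.compile(r'\b\w{15}\b')
# Same test, but anchored to whole lines so it can run over many lines at once.
LINE_WITH_WORD15 = re.compile(r'^.*\b\w{15}\b.*$', re.MULTILINE)


def match_per_line(lines):
    """Baseline: one regex call per line."""
    return [line.rstrip('\n') for line in lines if WORD15.search(line)]


def match_batched(lines, batch_size=1000):
    """One regex call per batch of newline-terminated lines."""
    it = iter(lines)
    matched = []
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        # '.' does not match '\n', so each match is exactly one whole line.
        matched.extend(LINE_WITH_WORD15.findall(''.join(batch)))
    return matched


if __name__ == "__main__":
    sample = [
        "short line\n",
        "has abcdefghijklmno here\n",   # 15-character word
        "sixteen abcdefghijklmnop\n",   # 16 characters, should not match
        "end 123456789012345\n",        # 15 digits count as \w
    ]
    for line in match_batched(sample, batch_size=2):
        print(line)
```

Because a file object yields newline-terminated lines, `match_batched(fh)` can be fed an open file directly.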

dgfitz · on June 24, 2020

Running a different test, the slowest part of the Python script is the startup time; otherwise they're identical:

    [admin@localhost ~]$ time ./test.py
    real 0m0.295s
    user 0m0.158s
    sys 0m0.138s

    [admin@localhost ~]$ time ./test.pl
    real 0m0.164s
    user 0m0.158s
    sys 0m0.006s

    #!/usr/bin/env perl
    use 5.16.3;

    open my $fh, '<', 'logs1.txt';
    while (<$fh>) {
      chomp;
      if (/\b\w{15}\b/) {}
    }
    close $fh;

    #!/usr/bin/env python

    import re
    regex = re.compile(r'\b\w{15}\b')

    with open('logs1.txt', 'r') as fh:
        for line in fh:
            if regex.search(line):
                continue

dgfitz · on June 24, 2020

The Python code should run much faster if you don't print.

Edit: Perl is still faster, to be clear.

cutler · on June 25, 2020

That's like saying they're both as fast if you amputate one of Perl's legs.

mappu · on June 24, 2020

I took some similar benchmarks recently: https://code.ivysaur.me/interpreter-binary-perf

Perl5 won the dec2bin benchmark.

The other thing I learned was that PHP's binary/decimal functions are two orders of magnitude slower, despite its core interpreter performance being best-in-class.

cutler · on June 25, 2020

On the other hand, PHP has a PCRE JIT option which manages to beat even Perl's regex implementation.

rurban · on June 25, 2020

Yes, but you can also use that JIT with Perl 5: https://metacpan.org/pod/re::engine::PCRE2

It even has fewer backcompat problems than the native regex engine.