
Let's see the code? Regex has always been a Perl selling point, and you're just benchmarking the Python regex library (written in C) against Perl's regex library. That's not a fantastic basis for a comparison.



Perl 5:

    #!/usr/bin/env perl
    use 5.026;

    open my $fh, '<', 'logs1.txt';
    while (<$fh>) {
      chomp;
      say if /\b\w{15}\b/;
    }
    close $fh;

Python 3.8:

    #!/usr/bin/env python

    import re

    with open('logs1.txt', 'r') as fh:
        for line in fh:
            if re.search(r'\b\w{15}\b', line): print(line, end='')

Why isn't it a fair comparison? The startup time is generic and string parsing is a major feature of, say, web development. I didn't say Perl 5 numerics match Python's, but even there Python relies on external libs.


It doesn't account for anywhere near the whole difference, but in a tight loop like that Python's going to be spending a good chunk of its time re-compiling the regex from the raw string literal every iteration. Hoist the regex definition out of the loop like so and it'll probably run about 30% faster:

  #!/usr/bin/env python3

  import re

  with open('logs1.txt', 'r') as fh:
      regex = re.compile(r'\b\w{15}\b')
      for line in fh:
          if regex.search(line): print(line, end='')

Perl almost certainly does this by default for regex literals, and that's a fair advantage for the "kitchen sink" style of language design versus the orthogonal features (regex library, raw strings) that Python uses.


With a pre-compiled regex, the Python version comes down to 1.483s on my machine, which is still considerably slower than Perl. I wrote versions of these using substring `index` instead of a regex, and Perl was still the clear winner:

    time ./index.pl  0.258s
    time ./index.py  0.609s

If you factor in that Python's startup time is 0.279s slower than Perl's, the processing differential comes down to 0.072s.
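
Spelling that arithmetic out as a quick check, using only the three timings quoted above (the variable names are purely illustrative):

```python
# Timings quoted in the comment above, in seconds.
perl_total = 0.258      # time ./index.pl
python_total = 0.609    # time ./index.py
startup_gap = 0.279     # how much slower Python starts up than Perl

# Total wall-clock gap between the two scripts.
total_gap = python_total - perl_total

# What remains once the startup difference is subtracted.
processing_gap = total_gap - startup_gap

print(f"total gap:      {total_gap:.3f}s")
print(f"processing gap: {processing_gap:.3f}s")
```

So roughly 0.351s separates the two runs overall, and about 0.072s of that is attributable to the processing loop rather than interpreter startup.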


I'd be curious about your take on my code below. I have Python and Perl dead even if you don't print and you ignore Python's startup time.


Well, the two ifs you hedge to qualify Python's parity are pretty damning in themselves, no? Let's stop beating about the bush - if you want to perform a common text-processing job like parsing a log file Perl is hands down faster than Python.


Python caches compiled patterns, so you almost never need to call re.compile yourself unless you're a library or have a specific use case involving lots of unique patterns.
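
A minimal sketch of how to observe that cache, assuming CPython's `re` module. The helper name is hypothetical, and relying on object identity is an implementation detail rather than a documented guarantee:

```python
import re

# Same pattern as the benchmark in this thread.
PATTERN = r'\b\w{15}\b'

def cache_returns_same_object(pattern=PATTERN):
    """Return True if compiling the same pattern twice yields one cached object."""
    re.purge()                    # clear re's internal cache so we start clean
    first = re.compile(pattern)   # compiled for real and stored in the cache
    second = re.compile(pattern)  # expected to be served from the cache
    return first is second

print(cache_returns_same_object())
```

Even with a cache hit, a module-level call such as `re.search(pattern, line)` still pays for an extra function call and a cache lookup on every line, which would be consistent with hoisting the compile giving a modest rather than dramatic speedup.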

The issue here is that Python's regex engine has per-call overhead, and with lots of sequential calls on small strings like that, the overhead adds up.

If you batch lines together in chunks you’ll see a huge improvement in speed, but the point is that it’s not “Python vs Perl”, it’s “Python's regex engine vs Perl’s regex engine”, which is about as contrived and Perl-biased a benchmark as there ever was.
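
One way the batching idea might look, as a rough sketch rather than a measured result. The function name and `batch_size` default are assumptions, and the pattern is the thread's `\b\w{15}\b` wrapped so that a single multiline match covers a whole line:

```python
import re
from itertools import islice

# With re.MULTILINE, ^ and $ anchor at line boundaries and '.' never crosses
# a newline, so each match is exactly one full line containing a 15-char word.
LINE_RE = re.compile(r'^.*\b\w{15}\b.*$', re.MULTILINE)

def matching_lines(lines, batch_size=10_000):
    """Yield matching lines (without the trailing newline), one regex scan per batch."""
    it = iter(lines)
    while True:
        # Join up to batch_size lines into one string so the regex engine
        # is entered once per batch instead of once per line.
        chunk = ''.join(islice(it, batch_size))
        if not chunk:
            break
        for match in LINE_RE.finditer(chunk):
            yield match.group(0)
```

Passing the open file handle for `logs1.txt` as `lines` and printing each yielded line should reproduce the output of the original script. Whether it is actually faster depends on the data and the batch size, so it is worth timing before drawing conclusions.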


Running a different test, the slowest part of the Python script is the startup time; otherwise they're identical:

    [admin@localhost ~]$ time ./test.py
    real 0m0.295s
    user 0m0.158s
    sys 0m0.138s

    [admin@localhost ~]$ time ./test.pl
    real 0m0.164s
    user 0m0.158s
    sys 0m0.006s

    #!/usr/bin/env perl
    use 5.16.3;

    open my $fh, '<', 'logs1.txt';
    while (<$fh>) {
      chomp;
      if (/\b\w{15}\b/) {}
    }
    close $fh;

    #!/usr/bin/env python

    import re
    regex = re.compile(r'\b\w{15}\b')

    with open('logs1.txt', 'r') as fh:
        for line in fh:
            if regex.search(line):
                continue


The Python code should run much faster if you don't print.

Edit: Perl is still faster, to be clear.


That's like saying they're both as fast if you amputate one of Perl's legs.


I took some similar benchmarks recently: https://code.ivysaur.me/interpreter-binary-perf

Perl5 won the dec2bin benchmark.

The other thing I learned was that PHP's binary/decimal functions are two orders of magnitude slower, despite its core interpreter performance being best-in-class.


On the other hand, PHP has a PCRE JIT option which manages to beat even Perl's regex implementation.


Yes, but you can also use that JIT with Perl 5: https://metacpan.org/pod/re::engine::PCRE2

It has even fewer backcompat problems than the native regex engine.



