Let's see the code. Regex has always been a Perl selling point, and you're just benchmarking Python's regex library (written in C) against Perl's regex engine. That's not a fantastic basis for a comparison.
#!/usr/bin/env perl
use 5.026;
open my $fh, '<', 'logs1.txt' or die "Cannot open logs1.txt: $!";
while (<$fh>) {
    chomp;
    say if /\b\w{15}\b/;
}
close $fh;
Python 3.8
#!/usr/bin/env python3
import re
with open('logs1.txt', 'r') as fh:
    for line in fh:
        if re.search(r'\b\w{15}\b', line):
            print(line, end='')
Why isn't it a fair comparison? Startup time is generic, and string parsing is a major part of, say, web development. I didn't say Perl 5's numerics match Python's, but even there Python relies on external libraries.
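One way to see how much of the gap is startup rather than scanning is to time only the loop from inside the script. This is a minimal sketch: `time_scan` is a hypothetical helper name, the pattern defaults to the one from the benchmark, and it counts matching lines instead of printing them, so output I/O is left out of the measurement as well.

```python
import re
import time

def time_scan(path, pattern=r'\b\w{15}\b'):
    """Hypothetical helper: return (matching_line_count, seconds) for the scan loop only.

    Interpreter startup and the `import re` cost happen before this runs,
    so they are not included in the returned time.
    """
    regex = re.compile(pattern)
    start = time.perf_counter()
    count = 0
    with open(path, 'r') as fh:
        for line in fh:
            if regex.search(line):
                count += 1
    return count, time.perf_counter() - start
```

Calling `time_scan('logs1.txt')` and comparing the elapsed figure with the `time` output for the whole script gives a rough split between startup and processing.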
It doesn't account for anywhere near the whole difference, but in a tight loop like that Python's going to be spending a good chunk of its time re-compiling the regex from the raw string literal every iteration. Hoist the regex definition out of the loop like so and it'll probably run about 30% faster:
#!/usr/bin/env python3
import re
with open('logs1.txt', 'r') as fh:
    regex = re.compile(r'\b\w{15}\b')
    for line in fh:
        if regex.search(line):
            print(line, end='')
Perl almost certainly does this by default for regex literals, and that's a fair advantage for the "kitchen sink" style of language design versus orthogonal features (regex library, raw strings) that Python uses.
With the pre-compiled regex, the Python version comes down to 1.483s on my machine, which is still considerably slower than Perl. I wrote versions of these using substring `index` instead of a regex, and Perl was still the clear winner:
time ./index.pl 0.258s
time ./index.py 0.609s
If you factor in that Python's startup time is 0.279s longer than Perl's, the processing differential comes down to 0.072s.
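Spelled out with the `index` timings above:

```latex
\begin{aligned}
t_{\text{py}} - t_{\text{pl}} &= 0.609\,\text{s} - 0.258\,\text{s} = 0.351\,\text{s} \\
0.351\,\text{s} - \underbrace{0.279\,\text{s}}_{\text{startup gap}} &= 0.072\,\text{s}
\end{aligned}
```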
Well, the two ifs you hedge to qualify Python's parity are pretty damning in themselves, no? Let's stop beating about the bush: if you want to perform a common text-processing job like parsing a log file, Perl is hands down faster than Python.
Python caches compiled patterns; you almost never need `re.compile` unless you're writing a library or have a specific use case involving lots of unique patterns.
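To see what that cache means in practice, here is a small `timeit` sketch comparing the module-level `re.search` call with a precompiled pattern. The log line is made up and the helper names are hypothetical. In CPython the module-level call looks the pattern up in `re`'s internal cache rather than recompiling it, so whatever gap you measure is mostly that lookup plus an extra function call, not compilation.

```python
import re
import timeit

PATTERN = r'\b\w{15}\b'
LINE = 'GET /index.html 200 abcdefghijklmno\n'  # hypothetical log line
COMPILED = re.compile(PATTERN)

def via_module():
    # re.search() fetches the compiled pattern from re's cache on every call
    return re.search(PATTERN, LINE)

def via_compiled():
    # calling the method on the compiled object skips the cache lookup
    return COMPILED.search(LINE)

def compare(number=200_000):
    """Return total seconds for `number` calls of each variant."""
    return {
        'module-level re.search': timeit.timeit(via_module, number=number),
        'precompiled .search': timeit.timeit(via_compiled, number=number),
    }
```

Running `compare()` on your own machine shows how much per-call overhead the hoisted version actually saves; the numbers will vary by Python version.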
The issue here is that Python's regex engine has per-call overhead, and with lots of sequential calls on small strings like that, the overhead adds up.
If you batch lines together in chunks you'll see a huge improvement in speed, but the point is that it's not “Python vs Perl”, it's “Python's regex engine vs Perl's regex engine”, which is about as contrived a Perl-biased benchmark as there ever was.
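A minimal sketch of the batching idea, assuming the same "line contains a 15-character word" test: read a line-aligned chunk with `readlines(hint)`, then make one `findall` call per chunk using a `re.MULTILINE` pattern that matches whole lines, instead of one `search` per line. The helper name, pattern constant, and 1 MiB chunk size are arbitrary choices for illustration, and this sketch has not been benchmarked.

```python
import re

# Hypothetical pattern: a whole line containing a word of exactly 15 word characters.
# With re.MULTILINE, ^ and $ match at every line boundary, and '.' never crosses
# a newline, so each matching line comes back exactly once per findall() call.
LINE_WITH_15_CHAR_WORD = re.compile(r'^.*\b\w{15}\b.*$', re.MULTILINE)

def matching_lines(fh, chunk_hint=1 << 20):
    """Yield matching lines (without their trailing newline) from an open text file."""
    while True:
        # readlines(hint) stops once about chunk_hint characters have been read,
        # so each chunk always ends on a line boundary.
        lines = fh.readlines(chunk_hint)
        if not lines:
            break
        yield from LINE_WITH_15_CHAR_WORD.findall(''.join(lines))
```

Usage would be `for line in matching_lines(fh): print(line)` inside the usual `with open('logs1.txt', 'r') as fh:` block. Larger chunks mean fewer regex calls but more memory per chunk.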
Running a different test, the slowest part of the Python script is the startup time; otherwise they're identical:
[admin@localhost ~]$ time ./test.py
real 0m0.295s
user 0m0.158s
sys 0m0.138s
[admin@localhost ~]$ time ./test.pl
real 0m0.164s
user 0m0.158s
sys 0m0.006s
#!/usr/bin/env perl
use 5.16.3;
open my $fh, '<', 'logs1.txt' or die "Cannot open logs1.txt: $!";
while (<$fh>) {
    chomp;
    if (/\b\w{15}\b/) {}
}
close $fh;
#!/usr/bin/env python
import re
regex = re.compile(r'\b\w{15}\b')
with open('logs1.txt', 'r') as fh:
    for line in fh:
        if regex.search(line):
            continue
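To isolate the startup cost on its own, you can time an interpreter that does nothing. This is a sketch under the assumption that best-of-N wall-clock time is a reasonable proxy; `startup_seconds` and `python_startup_seconds` are hypothetical helper names.

```python
import subprocess
import sys
import time

def startup_seconds(argv, runs=10):
    """Return the best-of-`runs` wall-clock time to launch argv and wait for it to exit."""
    best = float('inf')
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best

def python_startup_seconds(runs=10):
    # Launch the current interpreter with an empty program.
    return startup_seconds([sys.executable, '-c', 'pass'], runs=runs)
```

For the Perl side, `startup_seconds(['perl', '-e', '1'])` does the same thing, assuming `perl` is on your `PATH`. Subtracting the two gives a figure comparable to the startup gap used above.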
The other thing I learned was that PHP's binary/decimal functions are two orders of magnitude slower, despite its core interpreter performance being best-in-class.