Python caches compiled patterns, so you almost never need to call re.compile yourself unless you’re writing a library or have a specific use case involving lots of unique patterns.
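A minimal sketch of what that means in practice. The pattern and function names here are made up for illustration, and the cache itself is internal to the re module, so treat the comments as a description of typical behaviour rather than a guarantee:

```python
import re

PATTERN = r"\d+"  # hypothetical pattern, just for illustration


def count_with_module_call(lines):
    # re.search() takes the pattern as a string; the re module keeps
    # recently used compiled patterns in an internal cache, so the
    # pattern is not recompiled from scratch on every call.
    return sum(1 for line in lines if re.search(PATTERN, line))


def count_with_compiled(lines):
    # Explicit compile: same result, it just skips the cache lookup.
    compiled = re.compile(PATTERN)
    return sum(1 for line in lines if compiled.search(line))
```

Both give the same answer. Any timing difference between them comes from the cache lookup, so measure with timeit before assuming it matters for your case.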
The issue here is that Python’s regex engine has per-call overhead, and with lots of sequential calls on small strings like that, the overhead adds up.
If you batch lines together in chunks you’ll see a huge improvement in speed, but the point is that it’s not “Python vs Perl”, it’s “Python’s regex engine vs Perl’s regex engine”. Which is about as contrived and Perl-biased a benchmark as there ever was.
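Roughly what I mean by batching, as a sketch. The pattern, the function names, and the chunk size are all assumptions for illustration, and it assumes the lines have no embedded newlines:

```python
import re

# Hypothetical pattern. re.MULTILINE makes ^ and $ match at every line
# boundary, and "." does not cross newlines, so each matching line
# yields exactly one match.
LINE_WITH_ERROR = re.compile(r"^.*ERROR.*$", re.MULTILINE)


def count_per_line(lines):
    # One regex call per line: the per-call overhead is paid every time.
    return sum(1 for line in lines if LINE_WITH_ERROR.search(line))


def count_batched(lines, chunk_size=1000):
    # Join lines into larger chunks so the engine is entered once per
    # chunk instead of once per line.
    total = 0
    for start in range(0, len(lines), chunk_size):
        chunk = "\n".join(lines[start:start + chunk_size])
        total += len(LINE_WITH_ERROR.findall(chunk))
    return total
```

The two functions return the same count. How much faster the batched one is depends on the pattern and the input, so time it on your own data.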