Hacker News

Perl would do it quite fast, and it has the benefit of accessing POSIX primitives directly.



A naive Perl solution is really, really slow compared to even the reference Java implementation. (I know, I've tried.)


That's strange; you should be able to stream the file into a tiny Perl program at the speed of the bottlenecking hardware. The kernel will take care of all the logistics. You're probably trying to do too much explicitly. Just use a pipe. Perl should be done before the JIT completes.


Using cat to redirect the file to /dev/null takes 18s on my machine (a low-end NUC). Just running a no-op over the file in Perl (i.e. feeding it into a `while (<>)` loop but not acting on the contents) takes ~2 minutes.

1B lines is a lot, and Java ain't a slouch.
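For reference, the no-op loop described above might look like the following minimal sketch (`measurements.txt` is a stand-in filename; the point is that the per-line readline work alone dominates):

```shell
# No-op read loop: pull every line through Perl's readline machinery
# and count them, without touching the contents.
perl -e '
    my $n = 0;
    while (<>) { $n++ }
    print "$n\n";
' measurements.txt
```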


Why are you using cat at all? Use a pipe. This isn't hard stuff. Don't use `<>`; read the file into a scalar or array. It should only take a few seconds to process a billion lines.

https://www.perl.com/pub/2003/11/21/slurp.html/#:~:text=Anot....
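The block-read style that article describes can be sketched roughly as below. This is an illustrative sketch, not a full solution: it only counts newlines to show the I/O pattern, and the 1 MiB buffer size is an arbitrary choice.

```shell
# Read the file in large fixed-size blocks instead of line by line;
# count newlines per block to demonstrate the access pattern.
perl -e '
    open my $fh, "<", $ARGV[0] or die "open: $!";
    my ($buf, $lines) = ("", 0);
    while (read($fh, $buf, 1 << 20)) {   # 1 MiB per read
        $lines += ($buf =~ tr/\n//);
    }
    print "$lines\n";
' measurements.txt
```

Note that a real solution still has to find line boundaries and parse each record, so block reads alone don't remove the per-line work measured above.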


If it isn't hard, then perhaps you could demonstrate with a complete Perl program that you think should beat the Java implementation.


I profiled my attempt; reading each line is actually the bottleneck.


Perl is always going to be much faster than Java at tasks like this. Use stdin and chomp() instead of reading each line explicitly.

This is a small, trivial task for a Perl script. Even at a billion lines, it's nothing for a modern CPU and Perl.



Reddit?



