
For this problem the multiple-process version would be quite simple in Python or any other language. It's a classic single program, multiple data (SPMD) task. You split the file into N chunks, then run N copies of the original program, one per chunk (a Map). You then need to collate the results, which requires a second program, but that step is similar to the sorting step in the original and so would be negligible wrt wall time (a quick Reduce).

For large files you should get an almost embarrassingly parallel speedup; a sketch of the idea is below.
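
A minimal sketch of that split/Map/Reduce shape in Python, assuming (purely for illustration) that the per-line work is a word-count-style aggregation; the file name and the body of process_chunk are placeholders, not the original program:

    import os
    from collections import Counter
    from multiprocessing import Pool

    FILENAME = "big.txt"            # hypothetical input file
    N = os.cpu_count() or 4         # number of chunks = number of processes

    def chunk_offsets(path, n):
        # Split the file into n byte ranges, snapping each boundary
        # forward to the next newline so no line straddles two chunks.
        size = os.path.getsize(path)
        bounds = [size * i // n for i in range(n + 1)]
        with open(path, "rb") as f:
            for i in range(1, n):
                f.seek(bounds[i])
                f.readline()        # advance to the end of the current line
                bounds[i] = f.tell()
        return list(zip(bounds[:-1], bounds[1:]))

    def process_chunk(args):
        # The Map step: the original single-process logic, run on one chunk.
        path, start, end = args
        counts = Counter()
        with open(path, "rb") as f:
            f.seek(start)
            for line in f.read(max(0, end - start)).splitlines():
                counts[line] += 1   # stand-in for the real per-line work
        return counts

    if __name__ == "__main__":
        tasks = [(FILENAME, s, e) for s, e in chunk_offsets(FILENAME, N)]
        with Pool(N) as pool:
            partials = pool.map(process_chunk, tasks)
        total = sum(partials, Counter())   # the quick Reduce: collate results

Each worker reads only its own byte range, so the processes share nothing until the final merge, which is why the scaling stays close to linear.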




Oh, I think a few SIMD instructions could reduce the processing time to near zero without going crazy with multi-threaded architectures.

Remember that fizzbuzz on HN that hit GB/s of throughput? Mostly SIMD. Zero multi-threading, IIRC.
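
For a flavor of why vectorization alone gets you so far, here's a toy Python/numpy comparison (not the fizzbuzz code; the newline-counting task is just an assumed stand-in): numpy's elementwise ops run as vectorized C kernels, so one pass over the bytes replaces a per-byte interpreter loop, with zero threads involved:

    import numpy as np

    data = b"foo\nbar\nbaz\n" * 1_000_000

    # Scalar, interpreter-bound version: one Python iteration per byte.
    slow = sum(1 for b in data if b == ord("\n"))

    # Vectorized version: one comparison over the whole buffer, then a
    # reduction -- numpy dispatches both to SIMD-friendly C loops.
    buf = np.frombuffer(data, dtype=np.uint8)
    fast = int((buf == ord("\n")).sum())

    assert slow == fast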





