You keep ignoring the author literally opening it up as an avenue in:
>I’m sure there’s further you could go with the C version: investigate memory-mapped I/O, avoid processing byte-at-a-time, use a fancier data structure for counting, etc. But this is quite enough for now!
This subthread started with bluetomcat talking about the optimized C and what experienced C devs might do. That strikes me as fair game to open up discussion of alternative IO anyway even if the author hadn't already (which he clearly did, as quoted). So, that's two reasons it's worth bringing up, and I was never criticizing your program!
If what you are on about is "Who wins what scorecard against what arbitrary constraints" or "My hands were tied, really!" or whatever, then, sorry, but this benchmark is too uncontrolled for great answers even on its own terms. Various stdio-using things would be "out of constraints" if a system were configured with default buffers bigger than 64K, which is certainly possible, if unlikely, today. None of the sample programs or the test harness "check" for that. Some of the languages may not be able to ensure it. Using that to leapfrog to "only streams" is a stretch. Is CPU freq scaling controlled? Min of N trials to filter background noise/get repeatability? Even simple mean ± sdev? No, no, and no. And probably four more things.
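To make the "min of N trials" and "mean ± sdev" point concrete, here is a minimal sketch in C of what a slightly more controlled harness could compute. The names `time_once`, `summarize_trials`, and `TrialStats` are made up for illustration and are not from the article or its repo. It reports the sample variance; the sdev is its square root (`sqrt` from `<math.h>`, linked with `-lm`).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical summary of N timing trials, in seconds. */
typedef struct {
    double min;   /* best case; least polluted by background noise */
    double mean;  /* arithmetic mean of all trials */
    double var;   /* sample variance (n - 1 denominator); sdev = sqrt(var) */
} TrialStats;

/* Time one call of work() on the monotonic clock; returns seconds. */
double time_once(void (*work)(void)) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    work();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Reduce n >= 1 timings to min, mean, and sample variance. */
TrialStats summarize_trials(const double *secs, size_t n) {
    assert(secs != NULL && n > 0);
    TrialStats s = { secs[0], 0.0, 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (secs[i] < s.min) s.min = secs[i];
        sum += secs[i];
    }
    s.mean = sum / (double)n;
    if (n > 1) {
        double ss = 0.0;
        for (size_t i = 0; i < n; i++) {
            double d = secs[i] - s.mean;
            ss += d * d;
        }
        s.var = ss / (double)(n - 1);
    }
    return s;
}
```

You would fill an array by calling `time_once` N times on the program under test and then pass it to `summarize_trials`. This does nothing about CPU frequency scaling; that still has to be pinned outside the program.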
Beyond all of that, I also don't think that's what this subthread was ever about. bluetomcat's opening was literally the opposite - equivalent to "real devs would do xyz implicitly independent of the arbitrary constraints posed if performance is paramount" - seemingly a follow-up on my quote from the author. He can of course chime in if I misread that. Presumably, the author would have had to relax that already problematic 64K constraint when moving on to mmap (a move he might well have made if the article covered fewer languages or if he had just done it that way first in his open-coded C).
The positions you seem dug into here seem to me to be "don't mention mmap to people questioning the posing of the contest even though the author did" or "you cannot default to mmap and fail over like fstat||mmap||do_streams||aiie_noStdInEven". Yes, which is faster varies by OS/situation. So? Maybe "experienced" C/whatever devs know their situation. (You use mmap in ripgrep...). Maybe these are all honest communication errors, but I don't think your positions are very tenable.
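For anyone wondering what that fstat||mmap||do_streams chain might look like, here is a rough POSIX sketch in C. `load_input`, `slurp_stream`, and `release_input` are hypothetical names, not anything from the article or from ripgrep. For simplicity the streaming fallback buffers the whole input, so it ignores the article's 64K constraint; a real fallback would process chunk by chunk, and would also retry `read` on `EINTR`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback: read everything left on fd into a malloc'd buffer. */
static char *slurp_stream(int fd, size_t *len_out) {
    size_t cap = (size_t)1 << 16, len = 0;
    char *buf = malloc(cap);
    if (!buf) return NULL;
    for (;;) {
        if (len == cap) {                 /* buffer full: double it */
            char *bigger = realloc(buf, cap * 2);
            if (!bigger) { free(buf); return NULL; }
            buf = bigger;
            cap *= 2;
        }
        ssize_t got = read(fd, buf + len, cap - len);
        if (got < 0) { free(buf); return NULL; }
        if (got == 0) break;              /* EOF */
        len += (size_t)got;
    }
    *len_out = len;
    return buf;
}

/* Try mmap for non-empty regular files; otherwise (pipe, tty, mmap failure)
 * fall back to streaming reads. *mapped_out says which path was taken. */
char *load_input(int fd, size_t *len_out, int *mapped_out) {
    assert(len_out != NULL && mapped_out != NULL);
    struct stat st;
    if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            *len_out = (size_t)st.st_size;
            *mapped_out = 1;
            return p;
        }
    }
    *mapped_out = 0;
    return slurp_stream(fd, len_out);
}

/* Release whatever load_input returned. */
void release_input(char *data, size_t len, int mapped) {
    if (mapped) munmap(data, len);
    else free(data);
}
```

Called as `load_input(STDIN_FILENO, &len, &mapped)`, this takes the mmap path for `./prog < file` and the streaming path for `cat file | ./prog`, so the program works either way and the only thing that varies is speed.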
As I mentioned in a few places, performance conclusions here are harder than they might look. As for "distractions", one might say that about literally all the optimized variants, including their numbers in the table since the numbers are likely to change more. The article might be stronger to focus on only the simple variants of all the rest, leaving the optimized ones for the github repo and weird "contest rule" debates on the github issues.
Anyway, to add a little more actual information for passersby less dug in to defending some weird position like "", Nim's stdlib has a trie in the critbits module. So, that test is "in bounds" and an easy experiment someone might enjoy.
It's not about a scorecard. I didn't just dig my heels in and say, "language lawyering the OP says mmap isn't allowed so shut up." It's about what's a useful benchmark. I explained why mmap's are a distraction beyond just pointing to the OP's constraints. I don't think there's really much else to say and I disagree with your characterization of this thread.
Have a nice day.