This is awesome work. I've been following these commits on GitHub and it's great to see AR being optimized.
Unfortunately there is not enough info in these benchmarks to tell whether for a large production app 3.0 might be 10 times slower than 2.3 for that particular find you are benchmarking. Or 10 times faster. Or the same.
Here's why -- for MRI, the time spent on each GC run is a function of total process size / number of loaded objects, which is mainly a function of your code size. In a trivial test case a GC run takes a few milliseconds; in a production app it's 100ms+. Simply by loading more code into the benchmark we could change the results completely.
When benchmarking Ruby code it's better to keep runtime, number of memory allocations, and size of memory allocated separate, and report all three. That means running the test with GC off and patching the interpreter to get the memory allocation info.
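Concretely, a rough sketch of that separation on a present-day MRI (ObjectSpace.count_objects makes the allocation count available without a patched interpreter; the names here are illustrative, not a rigorous harness):

    require 'benchmark'

    # Report wall time and object allocations separately, with GC disabled
    # so collection pauses don't pollute the timing.
    # (Bytes allocated would still need objspace/memsize_of or malloc stats.)
    def measure(label, iterations = 10_000)
      GC.start      # start from a clean heap
      GC.disable    # keep GC pauses out of the runtime number
      live = -> { c = ObjectSpace.count_objects; c[:TOTAL] - c[:FREE] }
      before = live.call
      time = Benchmark.realtime { iterations.times { yield } }
      after = live.call
      GC.enable
      puts "#{label}: #{(time * 1000).round(1)} ms, ~#{after - before} objects allocated"
    end

    measure("example") { "foo" + "bar" }   # whatever snippet is under test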
Different users will have different tradeoffs between runtime costs and GC costs depending on the app size, Ruby interpreter used, GC tuning parameters, available RAM, etc. It's certainly valuable to also come up with a general "5x" number, but that should assume (and state) some reasonable values for the above.
Technical erratum: if each node in the list adds all the previous nodes as new objects, then with 4 nodes the total number of objects is 10, not 24, and the overall order of the operation is O(n^2). It's still cool that his version is faster, and cooler that it uses an AST, which seems to me like the right solution.
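For reference, the count is just the triangular-number sum (a trivial sanity check, nothing more):

    # Node i re-adds the (i-1) previous nodes plus itself, so n nodes yield
    # 1 + 2 + ... + n = n*(n+1)/2 objects in total.
    n = 4
    (1..n).reduce(:+)   # => 10, i.e. quadratic growth in n, not 24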
Courtesy of Evan Phoenix... apparently in MRI, attr_reader doesn't create a method; instead it creates a NODE_IVAR entry in the method table of the class. This circumvents method creation/invocation overhead and is therefore faster.
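If you want to see the effect yourself, a quick (and admittedly unscientific) sketch comparing the two reader styles looks something like this; the actual gap depends entirely on the interpreter:

    require 'benchmark'

    class WithAttr
      attr_reader :value              # NODE_IVAR fast path on MRI
      def initialize; @value = 42; end
    end

    class WithDef
      def initialize; @value = 42; end
      def value; @value; end          # ordinary method dispatch
    end

    a, d = WithAttr.new, WithDef.new
    n = 1_000_000
    Benchmark.bm(12) do |bm|
      bm.report("attr_reader") { n.times { a.value } }
      bm.report("def reader")  { n.times { d.value } }
    end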