Such a state does not exist inside V8. Things happen as they are parsed. Since V8 has no idea what a piece of JS will do until it gets all the context, it has to use a silly thing like character count to perform inlining. It could create an intermediate state, but that would require a code overhaul that would likely slow down the entire thing anyway.
If I had to suggest a change, I would use the 600-char count as a starting point for one-time functions, and then track how many times each function is called and how many instructions it produces (basically tossing an incrementing variable into the compilation state, and storing it per Function object).
But then again, I've never played with the inside of V8, only the outside. I consider myself lucky to be in that position, because V8 is pretty dang sweet.
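A rough sketch of what that suggestion could look like, in plain JavaScript rather than V8's actual C++. Every name and threshold here (FunctionStats, shouldInline, HOT_CALL_THRESHOLD, MAX_INSTRUCTIONS) is made up for illustration; only the 600-character figure comes from the discussion, and none of this reflects how V8 really decides:

```javascript
// Hypothetical sketch only: none of these names or thresholds exist in V8.
const CHAR_LIMIT = 600;          // the source-length cutoff discussed above
const HOT_CALL_THRESHOLD = 100;  // assumed value, picked for illustration
const MAX_INSTRUCTIONS = 200;    // assumed budget for emitted instructions

// Per-function bookkeeping, standing in for data stored on a Function object.
class FunctionStats {
  constructor(sourceLength) {
    this.sourceLength = sourceLength; // characters of source, comments included
    this.callCount = 0;
    this.instructionCount = 0;
  }

  recordCall() {
    this.callCount += 1;
  }

  recordCompilation(instructionCount) {
    this.instructionCount = instructionCount;
  }
}

function shouldInline(stats) {
  // Rarely called functions: fall back to the cheap character-count check.
  if (stats.callCount < HOT_CALL_THRESHOLD) {
    return stats.sourceLength <= CHAR_LIMIT;
  }
  // Frequently called functions: judge by the code they actually produce.
  return stats.instructionCount <= MAX_INSTRUCTIONS;
}

// A short function padded past 600 characters by a long comment.
const commented = new FunctionStats(900);
commented.recordCompilation(40);
const beforeWarmup = shouldInline(commented); // false: too many characters

for (let i = 0; i < HOT_CALL_THRESHOLD; i++) commented.recordCall();
const afterWarmup = shouldInline(commented);  // true: only 40 instructions
```

The point of the second branch is that a comment-heavy but tiny function stops being penalized once it has been called often enough to matter.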
No, V8 parses your JavaScript source into an AST. From the AST, there are two paths it can take. The first is the "full-codegen" compiler, which walks the AST directly and emits unoptimized native code.
If a function is hot, then it gets promoted to V8's optimizing compiler, which is currently Crankshaft but will soon be replaced by Turbofan. Crankshaft first compiles the AST into a high-level IR, Hydrogen, which is an architecture-independent SSA-like form vaguely reminiscent of LLVM. Then that's lowered into an architecture-dependent IR, Lithium, where register allocation and instruction scheduling are performed.
If you read the bug for this that someone posted up-thread, the AST information is fully available when inlining in Hydrogen, but a patch to remove the character limit tanked one of their benchmarks. Also, they didn't rule out fixing it in Turbofan, only in Crankshaft, and that's largely because Crankshaft is on its way out.
(I've played a little with the internals of V8, but am not an official developer. Also, one of my friends is the Bay Area TL for V8.)
I still don't understand that. Function.prototype.toString() wouldn't break if the parser stripped the comments; it would just output the source without comments. Does any code anywhere depend on the comments being preserved?
People literally call toString() on functions and parse the result to implement features. Those features depend on there being comment text in the function that they parse. It's pretty much a terrible hack, but it is true that you can't remove the comments without breaking current code.
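For anyone who hasn't run into it, here is a minimal sketch of one form of the trick: using a block comment inside a function as a multi-line string. The helper name heredoc is made up for this example, and it assumes the engine returns the original source text, comments included, from toString():

```javascript
// Hypothetical helper: pull the first block comment out of a function's source.
function heredoc(fn) {
  const match = fn.toString().match(/\/\*([\s\S]*?)\*\//);
  // If the engine dropped comments from toString(), there is nothing to find.
  return match ? match[1].trim() : null;
}

// The "string" lives entirely inside the comment in the function body.
const template = heredoc(function () {/*
<ul>
  <li>item</li>
</ul>
*/});

console.log(template);
```

If toString() returned the function without its comment, heredoc would return null here, which is exactly the kind of breakage being described.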
Wow. That's just something that should be an obvious fix in a future ES. Not only does it hurt the optimizer, it also seems to imply that the content of comments sometimes has semantic meaning in the code, such as people having invented directives etc.
Surely this must be an undocumented feature, not something the language spec guarantees?