It's funny, because I wouldn't consider the comment they highlight in their post a nitpick.
Something that has an impact on the long-term maintainability of code is definitely not nitpicky, and in the majority of cases "define a type" fits this category, since it makes refactors and extensions MUCH easier.
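To illustrate with a rough TypeScript sketch (the names are made up, it's just the shape of the argument):

  // Without a named type, every call site repeats the shape, and adding
  // a field means hunting down every signature by hand:
  function notifyUser(id: string, email: string, locale: string) {
    console.log(`notify ${id} at ${email} (${locale})`);
  }

  // With a defined type, extending it is a one-line change and the
  // compiler points at every place that needs updating:
  interface User {
    id: string;
    email: string;
    locale: string;
    // timezone: string;  // adding this later is a single edit here
  }

  function notifyUserTyped(user: User) {
    console.log(`notify ${user.id} at ${user.email} (${user.locale})`);
  }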
On top of that, I think the approach they went with is a huge mistake. The same comment can be a nitpick on one CR but crucial on another, so clustering them is destined to produce false positives and false negatives.
I'm not sure I'd want to use a product to review my code when 1) I cannot customize the rules, and 2) the rules chosen by the creators seem poor.
To be honest, I wouldn't want to use any AI-based code reviewer at all. We have one at work (FAANG, so something with a large dedicated team behind it), and it has not once produced a useful comment but has been factually wrong many times.
1) This is an ego problem. Whoever is doing the development cannot handle being called out on certain software architecture / coding mistakes, so it becomes "nitpicking".
2) The software shop has a "ship faster, cut corners" culture, in which case they might as well turn off the AI review bot.
I've found it's nice to talk to an LLM about personal issues because I know it's not a real person judging me. Maybe if the comments were kept private between the tool and the dev, it'd be more of a coaching tool that didn't feel like criticism?
This goes both ways. I worked for a company where the majority of PR comments were of the "Well, _I_ wouldn't do it this way" form. In some cases, the "way" they were complaining about was taken directly from examples in the language's library docs.
In one specific case, a PR was held up from merging because I used "regexen" as the plural of "regex" instead of "regexes". IN A COMMENT. <eye roll>
This does not address the issue raised in iLoveOncall's third paragraph: "the same comment can be a nitpick on one CR but crucial on another..." In "attempt 2", you say that "the LLM's judgment of its own output was nearly random", which raises questions that go well beyond nitpicking, up to whether the current state of the art in LLM code review is fit for much more than ticking the box that says "yes, we are doing code review."