“Mediocre success”, as described by the author, is how great science starts out. You try an experiment, (eventually) get a slight indication that your hypothesis might be valid, and then keep iterating until it seems clear that the idea is real (or not).
Startup companies generally shouldn’t do science, but that same iterative process should guide you. Because when you start, you never know enough.
It is also about getting the most out of that hypothesis testing by defining success and failure as well as you can.
I have encountered this "mediocre success" many times in AI solutions, usually due to a lack of problem definition. For instance, with LLMs it is now very easy to write a prompt that gives you the output you want for the 5 or 6 examples you have in mind. The hard part is building up your testing scenario from there, gathering as much data as possible until it becomes representative of your real use cases.
That is the only way to actually test your prompts, RAG strategies, and so on, instead of just buying into the latest CoT-style prompting trend.
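To make that concrete, here is a minimal sketch of the kind of evaluation harness I mean: a small set of test cases that you keep growing, run through the same prompt, and score. The `call_model` wrapper, the `EvalCase` fields, and the keyword check are all assumptions for illustration, not any particular library's API; swap them for your own model call and scoring logic.

```python
# Minimal prompt-evaluation sketch (assumed names, not a real library).
from dataclasses import dataclass


@dataclass
class EvalCase:
    input_text: str   # the user input fed into the prompt template
    expected: str     # a keyword the output should contain (simplest check)


def call_model(prompt: str) -> str:
    """Placeholder for your actual LLM call (API, local model, etc.)."""
    raise NotImplementedError("Wire this up to the model you are testing.")


def passes(output: str, case: EvalCase) -> bool:
    # Simplest possible check; replace with exact match, regex, or a judge model.
    return case.expected.lower() in output.lower()


def run_eval(prompt_template: str, cases: list[EvalCase]) -> float:
    """Run every case through the prompt and return the pass rate."""
    results = [
        passes(call_model(prompt_template.format(input=c.input_text)), c)
        for c in cases
    ]
    return sum(results) / len(results)


# Start with the handful of examples you had in mind, then keep adding
# real cases until the set is representative of your use cases.
cases = [
    EvalCase("Summarize: the meeting moved to Friday.", expected="friday"),
    EvalCase("Summarize: the invoice was paid twice.", expected="twice"),
]
# pass_rate = run_eval("You are a terse assistant.\n\n{input}", cases)
```

The point is not the scoring function, which will stay crude at first, but that every prompt or retrieval change gets judged against the same growing set of cases rather than the 5 or 6 examples in your head.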