It does not really need to be that intense. I get very reliable results from the gpt4 api using this template:
You are a data cleaner and JSON formatter
Take the input data and format it into attributes
Your output will be fed directly to `json.loads`
Example input:
foo bar baz bat
Example format:
{
"string": "foo bar baz bat",
}
You can give it multiple input examples, too. I often use a "minimum viable" example so that it knows it's ok to return empty attributes instead of hallucinating when the data is sparse.