
Yep, I haven't tried this particular project, but that's my overall experience with similar projects as well. Smaller models that can be run locally in compute-poor environments really need structured outputs; just prompting them with "you can ONLY return a python list of str, WITHOUT any other additional content" (a piece of prompt from this project) is nowhere near sufficient for any semblance of reliability.
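
For illustration, this is roughly what enforcing that constraint looks like with Ollama's structured outputs, where the `format` parameter takes a JSON schema and constrains decoding to it. A minimal sketch, assuming the Ollama Python client; the model name and the object-wrapping-a-list schema shape are my choices, not anything from this project:

    import json
    from ollama import chat

    # JSON schema forcing the model to emit a list of strings,
    # wrapped in an object (the common pattern for top-level arrays)
    schema = {
        "type": "object",
        "properties": {
            "queries": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["queries"],
    }

    response = chat(
        model="llama3.1",  # placeholder: any locally pulled model
        messages=[{"role": "user",
                   "content": "Break this question into sub-queries: ..."}],
        format=schema,  # decoding is constrained to match the schema
    )

    # The output is guaranteed-parseable JSON, no prompt begging needed
    queries = json.loads(response.message.content)["queries"]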

If you're feeling adventurous, you can probably refactor the prompt functions in https://github.com/zilliztech/deep-searcher/blob/master/deep... to return the required output structure as metadata alongside the prompt itself, update all `llm.chat()` calls throughout the codebase to account for this (probably changing the `chat` method API by adding an extra `format` argument rather than just `messages`), and implement a custom Ollama-specific handler class that passes this through to the LLM runner, roughly like the sketch below. Or maybe task one of those new agentic coding tools with doing it, since it looks like a mostly mechanical refactoring that doesn't require much thinking beyond figuring out the new API contract.
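
Something like this, as a sketch only. I haven't read deep-searcher's actual base class, so the class name, base interface, and return type here are all placeholders; the only real API is `ollama.chat()` and its `format` parameter:

    from typing import Optional
    import ollama

    class OllamaChat:  # would subclass deep-searcher's LLM base class
        def __init__(self, model: str = "llama3.1"):
            self.model = model

        def chat(self, messages: list[dict],
                 format: Optional[dict] = None) -> str:
            # `format` is the proposed extra argument: a JSON schema
            # returned by the refactored prompt functions. When set,
            # Ollama constrains decoding to that schema instead of
            # relying on "ONLY return a python list" instructions.
            response = ollama.chat(
                model=self.model,
                messages=messages,
                format=format,  # None falls back to free-form text
            )
            return response.message.content

The nice part of routing it through a single `format` argument is that non-Ollama backends can just ignore it (or raise), so the rest of the codebase doesn't need backend-specific branches.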
