Midjourney uses multiple agents to determine whether a prompt is appropriate.
I kinda did this, too.
I made a 3-agent system — one is a router that parses the request and decides where to send it (to one of the other two), one is a chat agent, and the third is an image generator.
If the router determines an image is requested, the chat agent is tasked with making a caption to go along with the image.
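A minimal sketch of that flow, assuming a keyword-based router and a single `llm()` stub in place of real model calls — the agent names and routing rule here are illustrative, not the actual implementation:

```python
def llm(system_prompt: str, user_message: str) -> str:
    # Stand-in for a real completion API; each "agent" is just a
    # different system prompt wrapped around the same call.
    return f"[{system_prompt}] {user_message}"

def route(request: str) -> str:
    # Router agent: decide whether the request needs an image.
    # (A real router would likely be an LLM call itself.)
    wants_image = "draw" in request.lower() or "image" in request.lower()
    return "image" if wants_image else "chat"

def handle(request: str) -> dict:
    if route(request) == "image":
        # Image path: the generator makes the picture, and the chat
        # agent is tasked with the caption to go along with it.
        image = llm("image-generator", request)
        caption = llm("chat: write a caption for this image", request)
        return {"image": image, "caption": caption}
    # Plain chat path.
    return {"reply": llm("chat", request)}
```

The point of the sketch is just the dispatch shape: one cheap routing decision up front, then the request fans out to whichever agent(s) apply.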
As far as I can tell, an "agent" is often just a prompt (though it can be more than that).
Only the chat prompt/agent/whatever is connected to RAG; the image generator is DALLE and the router is a one-off call each time.
E.g., it could be the same model with a different prompt, or a different model plus a different prompt altogether. AFAIU it's just serving a different purpose than the other calls.
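To illustrate the point above — only the chat agent gets retrieval context, while the router stays a bare one-off call — here's a toy sketch. `retrieve()` and `llm()` are hypothetical stand-ins, not a real RAG library:

```python
def llm(prompt: str) -> str:
    # Stand-in for any model call.
    return f"answer({prompt})"

def retrieve(query: str, store: dict[str, str]) -> list[str]:
    # Toy retriever: return stored passages that share a word with
    # the query. A real system would use embeddings.
    words = set(query.lower().split())
    return [text for text in store.values()
            if words & set(text.lower().split())]

def router(request: str) -> str:
    # One-off call each time, no retrieval attached.
    return llm(f"classify this request: {request}")

def chat(request: str, store: dict[str, str]) -> str:
    # Chat agent: prepend retrieved context before the model call.
    context = "\n".join(retrieve(request, store))
    return llm(f"context:\n{context}\n\nuser: {request}")
```

Same completion API underneath in both cases; only the chat path pays the retrieval cost.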
Currently all the tools on the market just use a different prompt and call it an agent.
I've built a tool using a different model for each agent, e.g. Whisper for audio transcription, LLaVA to detect slides on the screen, OpenCV to crop the image, OCR to read the slide content, and an LLM to summarize everything happening during the earnings call.
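A hedged sketch of how those stages might chain together. Whisper, LLaVA, OpenCV, and OCR are real tools, but every function body below is a placeholder — this shows only the pipeline shape, not actual model integrations:

```python
def transcribe_audio(path: str) -> str:
    # Placeholder for Whisper speech-to-text.
    return f"transcript of {path}"

def detect_slide(frame: str) -> bool:
    # Placeholder for LLaVA deciding if a frame shows a slide.
    return "slide" in frame

def crop_slide(frame: str) -> str:
    # Placeholder for OpenCV cropping the slide region.
    return frame.strip()

def read_slide(image: str) -> str:
    # Placeholder for OCR on the cropped slide.
    return f"text from {image}"

def summarize(transcript: str, slides: list[str]) -> str:
    # Placeholder for the final LLM summarization step.
    return f"summary: {transcript} + {len(slides)} slides"

def process_call(audio_path: str, frames: list[str]) -> str:
    # Orchestrate the whole pipeline: transcribe the audio, pull
    # slide text out of the video frames, then summarize both.
    transcript = transcribe_audio(audio_path)
    slides = [read_slide(crop_slide(f)) for f in frames
              if detect_slide(f)]
    return summarize(transcript, slides)
```

Each stage is swappable independently, which is the practical payoff of using a different model per agent rather than one prompt doing everything.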
It works well enough.