I would check out this company, Swarms (https://github.com/kyegomez/swarms), which is working with enterprises to integrate multi-agent systems. But that's definitely a great point to focus on: the research paper mentions that the performance gains from scaling shrink as task complexity increases, which is definitely true for SWE.
I totally believe that people are selling solutions around this idea; what I'd like to hear are genuine success stories from people who have used them in production (and aren't currently employed by a vendor).
Midjourney uses multiple agents for determining if a prompt is appropriate or not.
I kinda did this, too.
I made a 3-agent system: one is a router that parses the request and determines where to send it (to one of the other two), one is a chat agent, and the third is an image generator.
If the router determines an image is requested, the chat agent is tasked with making a caption to go along with the image.
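A minimal sketch of that router/chat/image flow, assuming each agent is just a function call. `route`, `chat_agent`, `image_agent`, and `handle` are hypothetical names, and all three agents are offline stubs: the real router would be a one-off LLM call (here it's a keyword check), and the image generator would be a real image API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    text: str
    image_url: Optional[str] = None


def route(request: str) -> str:
    # Hypothetical stand-in for the router agent. In the described system this
    # is an LLM call; a keyword check is used here so the sketch runs offline.
    keywords = ("draw", "image", "picture")
    return "image" if any(k in request.lower() for k in keywords) else "chat"


def chat_agent(prompt: str) -> str:
    # Hypothetical stand-in for the chat model (the one connected to RAG).
    return f"[chat] {prompt}"


def image_agent(prompt: str) -> str:
    # Hypothetical stand-in for an image-generation call; returns a fake URL.
    return f"https://example.invalid/images/{abs(hash(prompt)) % 10_000}.png"


def handle(request: str) -> Reply:
    if route(request) == "image":
        url = image_agent(request)
        # When an image is requested, the chat agent is reused to write a caption.
        caption = chat_agent(f"Write a short caption for an image of: {request}")
        return Reply(text=caption, image_url=url)
    return Reply(text=chat_agent(request))
```

Swapping the stubs for real model calls wouldn't change the dispatch logic in `handle`, which is the part that makes it "multi-agent".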
I am fairly certain that agents are the same thing as prompts (though they could also be different).
Only the chat prompt/agent/whatever is connected to RAG; the image generator is DALL-E, and the router is a one-off call each time.
E.g., it could be the same model with a different prompt, or a different model with a different prompt altogether. AFAIU it's just serving a different purpose than the other calls.
Currently all the tools on the market just use a different prompt and call it an agent.
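To make the distinction concrete, here's a minimal sketch in which an "agent" is nothing more than a model paired with a system prompt. Two agents can share a model and differ only in prompt, or differ in both. `Agent`, `echo_model`, and `shouting_model` are hypothetical names, and the model functions are stubs standing in for real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable

# A "model" here is any function mapping (system_prompt, user_message) -> reply.
Model = Callable[[str, str], str]


@dataclass(frozen=True)
class Agent:
    name: str
    model: Model
    system_prompt: str

    def run(self, message: str) -> str:
        return self.model(self.system_prompt, message)


def echo_model(system_prompt: str, message: str) -> str:
    # Hypothetical stub for an LLM call; it just shows which prompt was used.
    return f"<{system_prompt}> {message}"


def shouting_model(system_prompt: str, message: str) -> str:
    # A second stub "model", to show that the model can vary as well.
    return f"<{system_prompt}> {message.upper()}"


# Same model, different prompts: the "just a different prompt" kind of agent.
router = Agent("router", echo_model, "Classify the request as chat or image.")
chat = Agent("chat", echo_model, "You are a helpful assistant.")
# Different model and different prompt.
critic = Agent("critic", shouting_model, "Point out flaws.")
```

Under this framing, the only thing that separates `router` from `chat` is the prompt string, while `critic` differs in both model and prompt.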
I've built a tool that uses a different model for each agent, e.g. Whisper for audio transcription, LLaVA to detect slides on the screen, OpenCV to crop the image, OCR to read the slide content, and an LLM to summarize everything that's happening during the earnings call.
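The shape of that pipeline can be sketched as a chain of stages, each of which would be backed by a different model. Every stage here is a hypothetical stub: frames are plain strings containing a `[slide: ...]` marker instead of real images, and the offsets returned by `detect_slide` stand in for a pixel bounding box. None of this is the actual Whisper, LLaVA, OpenCV, or OCR API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[int, int]  # (start, end) offsets; stands in for a pixel bounding box


@dataclass
class Frame:
    timestamp: float
    content: str  # placeholder for real image data


def transcribe(audio: bytes) -> str:
    # Stub for a speech-to-text model (Whisper in the post).
    return audio.decode("utf-8")


def detect_slide(frame: Frame) -> Optional[Box]:
    # Stub for a vision model (LLaVA in the post) that locates a slide.
    start = frame.content.find("[slide:")
    if start == -1:
        return None
    end = frame.content.find("]", start)
    return (start, end + 1) if end != -1 else None


def crop(frame: Frame, box: Box) -> str:
    # Stub for an OpenCV crop of the detected region.
    return frame.content[box[0]:box[1]]


def ocr(region: str) -> str:
    # Stub for an OCR engine reading the cropped slide (Python 3.9+).
    return region.removeprefix("[slide:").removesuffix("]").strip()


def summarize(transcript: str, slides: List[str]) -> str:
    # Stub for the LLM that writes the final summary.
    return f"{len(slides)} slide(s): {'; '.join(slides)} | said: {transcript}"


def process_call(audio: bytes, frames: List[Frame]) -> str:
    transcript = transcribe(audio)
    slides: List[str] = []
    for frame in frames:
        box = detect_slide(frame)
        if box is None:
            continue  # no slide on screen in this frame
        text = ocr(crop(frame, box))
        if text and text not in slides:  # skip repeated frames of the same slide
            slides.append(text)
    return summarize(transcript, slides)
```

Because each stage is a separate function with a narrow input and output, any one of them can be swapped for a different model without touching the others.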
Hmm, I get what you mean... I think it's hard to sell a solution around this idea, but I think it will become more of a common practice or performance-improvement method. James Huckle on LinkedIn (https://www.linkedin.com/feed/update/urn:li:activity:7214295...) mentioned that agent communication would be something more like a hyperparameter to tune, which I agree with.
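One way to read "agent setup as a hyperparameter" is to pick the number of agents the same way you'd pick any other hyperparameter: try several values and keep the one that scores best on a validation set. This sketch assumes the agents combine their answers by simple majority vote, which is only one possible communication scheme. `make_noisy_agent` and `tune_num_agents` are hypothetical names, and the agents are simulated with seeded randomness rather than real model calls.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# An "agent" maps a question to an answer; real agents would be LLM calls.
AgentFn = Callable[[str], str]


def make_noisy_agent(seed: int, accuracy: float, answers: Dict[str, str]) -> AgentFn:
    # Hypothetical simulated agent: right with probability `accuracy`,
    # otherwise it returns one of several scattered wrong answers.
    rng = random.Random(seed)

    def agent(question: str) -> str:
        if rng.random() < accuracy:
            return answers[question]
        return f"wrong-{rng.randint(0, 9)}"

    return agent


def majority_vote(agents: Sequence[AgentFn], question: str) -> str:
    votes = Counter(agent(question) for agent in agents)
    return votes.most_common(1)[0][0]


def tune_num_agents(
    pool: List[AgentFn], candidates: List[int], validation: Dict[str, str]
) -> Tuple[int, Dict[int, float]]:
    # Score each candidate agent count on the validation set, keep the best.
    scores: Dict[int, float] = {}
    for n in candidates:
        correct = sum(majority_vote(pool[:n], q) == a for q, a in validation.items())
        scores[n] = correct / len(validation)
    best = max(scores, key=lambda n: scores[n])
    return best, scores
```

The same loop could just as well search over other settings (number of debate rounds, which agents talk to which), which is closer to what "agent communication" as a hyperparameter would mean in practice.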