I would check out this company, Swarms (https://github.com/kyegomez/swarms) who'...

simonw · 2024-08-06T20:44:35 1722977075

I totally believe that people are selling solutions around this idea, what I'd like to hear is genuine success stories from people who have used them in production (and aren't currently employed by a vendor).

kgdiem · 2024-08-06T20:55:48 1722977748

Midjourney uses multiple agents for determining if a prompt is appropriate or not.

I kinda did this, too.

I made a 3 agent system — one is a router that parses the request and determines where to send it (to the other 2) one is a chat agent and the third is an image generator.

If the router determines an image is requested, the chat agent is tasked with making a caption to go along with the image.

It works well enough.

42lux · 2024-08-06T21:00:20 1722978020

Midjourney does not use an agent system, they use a single call.

kgdiem · 2024-08-06T21:03:51 1722978231

I remembered reading that they used > 1, found this screengrab of discord on Reddit:

https://www.reddit.com/r/midjourney/comments/137bj1o/new_upd...

42lux · 2024-08-06T22:00:48 1722981648

It's still one call they just use a different llm now.

simonw · 2024-08-06T21:06:10 1722978370

Is that multiple agents, or just multiple prompts? Are agents and prompts the same thing?

kgdiem · 2024-08-06T21:23:05 1722979385

I am fairly certain that agents are the same thing as prompts (but also could be different).
Only the chat prompt/agent/whatever is connected to RAG; the image generator is DALLE and the router is a one-off call each time.

eg, it could be the same model with a different prompt or a different model + different prompt all together. AFAIU it’s just serving a different purpose than the other calls

llm_trw · 2024-08-07T00:28:54 1722990534

Depends on who is trying to sell you what.

Currently all the tools on the market just use a different prompt and call it an agent.

I've build a tool using different models for each agent, e.g. whisper for audio decoding, llava to detect slides on the screen, open cv to crop the image, ocr to read the slide content, llm to summarize everything that's happening during the earnings call.

tmaly · 2024-08-07T04:47:47 1723006067

How do you jailbreak midjourney?

alsima · 2024-08-06T20:53:44 1722977624

Hmmm I get what you mean...I think it's hard to sell a solution around this idea, but I think it will become something more like a common practice/performance improvement method. James Huckle on Linkedin (https://www.linkedin.com/feed/update/urn:li:activity:7214295...) mentioned that agent communication would be something more like a hyperparameter to tune which I agree with.

potatoman22 · 2024-08-07T00:29:22 1722990562

Try it for classification tasks.