
Midjourney uses multiple agents to determine whether a prompt is appropriate.

I kinda did this, too.

I made a three-agent system: one is a router that parses the request and determines where to send it (to one of the other two), one is a chat agent, and the third is an image generator.

If the router determines an image is requested, the chat agent is tasked with making a caption to go along with the image.

It works well enough.
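
In rough Python, the shape is something like the sketch below (the OpenAI client is used purely for illustration; the model names and prompts are placeholders, not what I actually run):

    # Router decides whether the request needs an image; if so, the chat
    # agent writes a caption and the image model generates the picture.
    # Model names and prompts here are placeholders.
    from openai import OpenAI

    client = OpenAI()

    def route(user_message: str) -> str:
        # One-off routing call: replies with a single word, 'image' or 'chat'.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Reply with exactly one word: 'image' if the user "
                            "is asking for a picture, otherwise 'chat'."},
                {"role": "user", "content": user_message},
            ],
        )
        return resp.choices[0].message.content.strip().lower()

    def handle(user_message: str) -> dict:
        if route(user_message).startswith("image"):
            # Image generator path.
            image = client.images.generate(model="dall-e-3", prompt=user_message)
            # The chat agent is reused to write a caption for the image.
            caption = client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system",
                     "content": "Write a short caption for an image generated "
                                "from the user's request."},
                    {"role": "user", "content": user_message},
                ],
            ).choices[0].message.content
            return {"image_url": image.data[0].url, "caption": caption}
        # Plain chat path.
        reply = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": user_message}],
        ).choices[0].message.content
        return {"reply": reply}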




Midjourney does not use an agent system; they use a single call.


I remember reading that they used more than one; I found this screengrab from their Discord on Reddit:

https://www.reddit.com/r/midjourney/comments/137bj1o/new_upd...


It's still one call; they just use a different LLM now.


Is that multiple agents, or just multiple prompts? Are agents and prompts the same thing?


I am fairly certain that, in this context, agents are basically just prompts (though they can also be backed by different models).

Only the chat prompt/agent/whatever is connected to RAG; the image generator is DALL-E, and the router is a one-off call each time.

E.g., it could be the same model with a different prompt, or a different model plus a different prompt altogether. AFAIU each call is just serving a different purpose than the others.
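
To make the RAG part concrete, here is roughly what "only the chat agent is connected to RAG" means; the retriever object and model name below are hypothetical stand-ins, not my actual setup:

    # Only the chat call sees retrieved context; the router and the image
    # call never do. `retriever` is a hypothetical stand-in for whatever
    # vector store / search backend is in use.
    from openai import OpenAI

    client = OpenAI()

    def chat_with_rag(user_message: str, retriever) -> str:
        docs = retriever.search(user_message, k=3)   # hypothetical retriever API
        context = "\n\n".join(d.text for d in docs)
        return client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system",
                 "content": "Answer using the context below when it is relevant.\n\n"
                            + context},
                {"role": "user", "content": user_message},
            ],
        ).choices[0].message.content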


Depends on who is trying to sell you what.

Currently all the tools on the market just use a different prompt and call it an agent.

I've built a tool using different models for each agent, e.g. Whisper for audio transcription, LLaVA to detect slides on the screen, OpenCV to crop the image, OCR to read the slide content, and an LLM to summarize everything that's happening during the earnings call.
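
Roughly, the pipeline glues those stages together like the sketch below; the library choices are illustrative, and the slide-detection and summarization helpers are stand-ins for the real LLaVA and LLM steps:

    # Rough shape of the pipeline: openai-whisper for transcription,
    # OpenCV for cropping, pytesseract for OCR. detect_slide_region()
    # stands in for the LLaVA step; summarize() for the final LLM call.
    import cv2
    import pytesseract
    import whisper
    from openai import OpenAI

    client = OpenAI()

    def detect_slide_region(frame):
        # Placeholder for LLaVA-based slide detection; returns the whole
        # frame as the bounding box (x, y, w, h) so the sketch still runs.
        return 0, 0, frame.shape[1], frame.shape[0]

    def summarize(transcript: str, slide_text: str) -> str:
        return client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user",
                       "content": "Summarize this earnings call.\n\nTranscript:\n"
                                  + transcript + "\n\nSlides:\n" + slide_text}],
        ).choices[0].message.content

    def process_call(audio_path: str, frame_path: str) -> str:
        # 1. Transcribe the call audio.
        transcript = whisper.load_model("base").transcribe(audio_path)["text"]
        # 2. Find and crop the slide in a video frame.
        frame = cv2.imread(frame_path)
        x, y, w, h = detect_slide_region(frame)
        slide = frame[y:y + h, x:x + w]
        # 3. OCR the slide content.
        slide_text = pytesseract.image_to_string(slide)
        # 4. Summarize everything with an LLM.
        return summarize(transcript, slide_text)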


How do you jailbreak Midjourney?



