How does that work in practice? Do you connect the LLM up directly to the app somehow, or do you ask it what to do and then do what it says and see what happens?
How much RAM do you need to make those work acceptably well? I assume Llama 3.3 means the 70B model, so you need > 70GB. (So, I guess, a MacBook with 128GB?) In which case I guess you're also using 8 bits for the Qwen model?
We made an adapter (a dedicated CLI interface) that the LLM uses to drive the app. It's kind of like an integration test, just a bit more sophisticated.
The LLM gets a prompt listing the CLI commands it may use, plus its "personality", and then it does what it does.
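A minimal sketch of what such an adapter loop could look like; the `app-cli` command names and the prompt text are hypothetical, and I'm assuming a local OpenAI-compatible chat endpoint (e.g. a llama.cpp server on port 8080):

```python
import json
import subprocess
import urllib.request

LLM_URL = "http://localhost:8080/v1/chat/completions"  # assumed local llama.cpp-style server

# Hypothetical allow-list of CLI commands the model may run against the app.
ALLOWED = {"status", "list-users", "create-user", "delete-user"}

SYSTEM_PROMPT = """You are a cautious tester of our app.
You may run exactly one of these CLI commands per turn:
  status | list-users | create-user <name> | delete-user <name>
Reply with the command only, nothing else."""

def ask_llm(messages):
    """Send the conversation to the local model and return its reply text."""
    body = json.dumps({"messages": messages, "temperature": 0.2}).encode()
    req = urllib.request.Request(LLM_URL, body, {"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"].strip()

def run_command(line):
    """Execute the command through the app's CLI adapter and capture its output."""
    parts = line.split()
    if not parts or parts[0] not in ALLOWED:
        return f"refused: '{line}' is not an allowed command"
    result = subprocess.run(["app-cli", *parts], capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr

messages = [{"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Check the app status, then create a test user."}]

# A few turns of: the model proposes a command, we run it, feed the output back.
for _ in range(5):
    command = ask_llm(messages)
    output = run_command(command)
    print(f"> {command}\n{output}")
    messages.append({"role": "assistant", "content": command})
    messages.append({"role": "user", "content": output})
```

The important bit is the allow-list: the model never touches the app directly, it only ever proposes one of a handful of known commands and sees the output.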
On the hardware side, I personally run 2x 3090 cards on an AMD TR 79x platform with 128GB RAM, which yields around 12 tokens/sec for Llama 3.3 or Qwen 2.5 72B (Q5_K_M). That's okay for this use case (prompt ingestion is roughly double that).
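For a rough sense of why it lands in that range, here's a back-of-envelope estimate; the ~5.5 bits per weight figure for Q5_K_M is an approximation on my part, not an exact spec:

```python
# Rough, assumption-laden estimate of the quantized weight footprint.
params = 72e9                  # Qwen 2.5 72B
bits_per_weight = 5.5          # approximate average for Q5_K_M
weights_gb = params * bits_per_weight / 8 / 1e9
vram_gb = 2 * 24               # two RTX 3090s

print(f"weights ~{weights_gb:.0f} GB vs {vram_gb} GB VRAM")
# -> weights ~50 GB vs 48 GB VRAM, before KV cache and overhead,
#    so part of the model likely runs from system RAM rather than VRAM.
```

In other words, the weights alone roughly match the combined VRAM, which is why the 128GB of system RAM matters and why generation sits around the 12 tokens/sec mark rather than higher.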
If you want to know more details, feel free to drop me a message (username at liku dot social)