jimmySixDOF on May 15, 2023 | on: Run Llama 13B with a 6GB graphics card

Some people are reporting faster token rates and clawing back some VRAM by using a 0- group size flag, but YMMV; I have not tested this yet. (They were discussing GPTQ, btw.)