If anyone wants to eval this locally versus codellama, it's pretty easy with Ollama[0] and Promptfoo[1]:
prompts:
  - "Solve in Python: {{ask}}"
providers:
  - ollama:chat:codellama:7b
  - ollama:chat:codegemma:instruct
tests:
  - vars:
      ask: function to return the nth number in fibonacci sequence
  - vars:
      ask: convert roman numeral to number
  # ...
YMMV based on your coding tasks, but I notice Gemma is much less verbose by default.
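If you just want to poke at a single prompt without promptfoo, a minimal sketch with the ollama Python client works too (assuming pip install ollama, a local Ollama server, and both models already pulled):

# Quick side-by-side check of the two models on one of the test prompts above.
# Assumes the Ollama server is running and both models have been pulled.
import ollama

ASK = "function to return the nth number in fibonacci sequence"

for model in ("codellama:7b", "codegemma:instruct"):
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": f"Solve in Python: {ASK}"}],
    )
    print(f"--- {model} ---")
    print(response["message"]["content"])

The promptfoo route gives you the nicer side-by-side matrix across all tests; this is just for quick manual poking.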
I am really liking the Gemma line of models. Thoroughly impressed with the 2B and 7B non-code-optimized variants. The 2B especially packs a lot of punch. I reckon its quality must be on par with some older 7B models, and it runs blazing fast on Apple Silicon - even at 8-bit quantization.
Gemma 2b instruct worked well for me for categorization. I would also say it felt 7b-ish. Very impressed. The initial release left me a bit underwhelmed but 1.1 is better and punches above its weight.
Also looking fwd to using 2B models on iOS and Android (even if they will be heavy on the battery).
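Roughly this kind of setup, if anyone wants to try categorization themselves - the model tag, labels, and prompt below are just illustrative, not what I actually ran:

# Rough sketch of zero-shot categorization with a small local model via Ollama.
# Model tag, label set, and prompt wording are illustrative placeholders.
import ollama

LABELS = ["bug report", "feature request", "question", "other"]

def categorize(text: str) -> str:
    prompt = (
        "Classify the following text into exactly one of these categories: "
        f"{', '.join(LABELS)}.\n\nText: {text}\n\nAnswer with the category only."
    )
    response = ollama.chat(
        model="gemma:2b-instruct",
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response["message"]["content"].strip().lower()
    # Small models sometimes add extra words, so fall back to a substring match.
    return next((label for label in LABELS if label in answer), "other")

print(categorize("The app crashes every time I open the settings page."))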
It's really sad that Cursor does not support local models yet (afaiu they fetch the URL you provide from their server). Is there a VS Code plugin or other editor that does?
With models like CodeGemma and Command-R+, it makes more and more sense to run them locally.
According to https://forum.cursor.sh/t/support-local-llms/1099/7 the Cursor servers do a lot of work in between your local computer and the model. So porting all that to work on users' laptops is going to take a while.
My issue so far with the various code assistants isn't the quality necessarily, but their ability to draw in context from the rest of the code base without breaking the bank or providing so much info that the middle gets ignored. Are there any systems doing that well these days?
If I'm not mistaken, this is not on the models themselves, but rather on the implementation of the add-on.
I haven't found an open source VSCode or WebStorm add-on yet that lets me use a local model and implements code completion and commands as well as GitHub Copilot does.
They're either missing a chat feature, inline actions / code completion, or fill-in-the-middle models. And even when they have them, they don't provide the context as intelligently (an assumption on my part!) as GH's Copilot does.
One alternative I liked was Supermaven: it's really, really fast and has a huge context window, so it knows almost your whole project. That was nice! But the reason I ultimately didn't keep using it: it doesn't support chat or inline commands (CTRL+I in VSCode's GH Copilot).
I feel like a really good Copilot alternative is definitely still missing.
But regarding your question, I think GitHub Copilot's VSCode extension is the best - as of now. The WebStorm extension is sadly not as good; it's missing the "inline command" function, which IMHO is a must.
Continue.dev allows for this. You can even mix hosted Chat options like GPT-4 (via API) with local completion. I typically use a smaller model for faster text completion and a larger model (with a bigger context) for chat.
Seconding Supermaven here, from the guy that made Tabnine.
Supermaven has a 300k token context. It doesn't seem like it has a ton of intelligence -- maybe comparable to copilot, maybe a bit less -- but it's much better at picking up data structures and code patterns from your code, and usually what I want is help autocompleting that sort of thing rather than writing an algorithm for me (which LLMs often get wrong anyway).
You can also pair it with a GPT-4 / Opus chat in Cursor, so you get your slower but more intelligent chat along with the simpler but very fast, high-context autocomplete.
Yeah, this is how I imagined these things should work, but it's tricky. The system needs to pattern-match on the types you've been using, if possible, so you need a vector search over your code to do that. Then you need another vector search over the actual dependency source. It's not that simple, but it would be the ultimate solution.
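Something like this toy sketch, I guess - embed the code chunks, cosine-match them against what you're working on, and stuff the winners into the prompt. The embedding model tag, the naive chunking, and using Ollama embeddings at all are just placeholder choices:

# Toy sketch of the retrieval step described above: embed code chunks once,
# then pull the most similar ones into the context for a completion prompt.
# The embedding model tag and the chunking strategy are placeholder choices.
import ollama
import numpy as np

EMBED_MODEL = "nomic-embed-text"

def embed(text: str) -> np.ndarray:
    vec = ollama.embeddings(model=EMBED_MODEL, prompt=text)["embedding"]
    return np.array(vec)

def top_k_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    q = embed(query)
    scored = []
    for chunk in chunks:
        c = embed(chunk)
        score = float(np.dot(q, c) / (np.linalg.norm(q) * np.linalg.norm(c)))
        scored.append((score, chunk))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

# "chunks" would come from splitting your codebase (and dependency sources)
# into function-sized pieces; the top matches get prepended to the prompt.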
[0] https://github.com/ollama/ollama
[1] https://github.com/promptfoo/promptfoo