If anyone wants to eval this locally versus codellama, it's pretty easy with Ollama[0] and Promptfoo[1]:
prompts:
  - "Solve in Python: {{ask}}"
providers:
  - ollama:chat:codellama:7b
  - ollama:chat:codegemma:instruct
tests:
  - vars:
      ask: function to return the nth number in fibonacci sequence
  - vars:
      ask: convert roman numeral to number
  # ...
YMMV based on your coding tasks, but I notice Gemma is much less verbose by default.
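If you just want to poke at a single prompt without promptfoo, a minimal sketch with the ollama Python client works too (assuming pip install ollama, a local Ollama server, and both models already pulled):

# Quick side-by-side check of the two models on one of the test prompts above.
# Assumes the Ollama server is running and both models have been pulled.
import ollama

ASK = "function to return the nth number in fibonacci sequence"

for model in ("codellama:7b", "codegemma:instruct"):
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": f"Solve in Python: {ASK}"}],
    )
    print(f"--- {model} ---")
    print(response["message"]["content"])

The promptfoo route gives you the nicer side-by-side matrix across all tests; this is just for quick manual poking.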
I am really liking the Gemma line of models. Thoroughly impressed with the 2B and 7B non-code-optimized variants. The 2B especially packs a lot of punch. I reckon its quality must be on par with some older 7B models, and it runs blazing fast on Apple Silicon - even at 8-bit quantization.
Gemma 2b instruct worked well for me for categorization. I would also say it felt 7b-ish. Very impressed. The initial release left me a bit underwhelmed but 1.1 is better and punches above its weight.
Also looking fwd to using 2B models on iOS and Android (even if they will be heavy on the battery).
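Roughly this kind of setup, if anyone wants to try categorization themselves - the model tag, labels, and prompt below are just illustrative, not what I actually ran:

# Rough sketch of zero-shot categorization with a small local model via Ollama.
# Model tag, label set, and prompt wording are illustrative placeholders.
import ollama

LABELS = ["bug report", "feature request", "question", "other"]

def categorize(text: str) -> str:
    prompt = (
        "Classify the following text into exactly one of these categories: "
        f"{', '.join(LABELS)}.\n\nText: {text}\n\nAnswer with the category only."
    )
    response = ollama.chat(
        model="gemma:2b-instruct",
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response["message"]["content"].strip().lower()
    # Small models sometimes add extra words, so fall back to a substring match.
    return next((label for label in LABELS if label in answer), "other")

print(categorize("The app crashes every time I open the settings page."))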
It's really sad that Cursor does not support local models yet (afaiu they fetch the URL you provide from their server). Is there a VS Code plugin or other editor that does?
With models like CodeGemma and Command-R+, it makes more and more sense to run them locally.
According to https://forum.cursor.sh/t/support-local-llms/1099/7 the Cursor servers do a lot of work in between your local computer and the model. So porting all that to work on users' laptops is going to take a while.
My issue so far with the various code assistants isn't the quality necessarily, but their ability to draw in context from the rest of the code base without breaking the bank or providing so much info that the middle gets ignored. Are there any systems doing that well these days?
If I'm not mistaken, this is not on the models themselves, but rather on the implementation of the add-on.
I haven't found an open source VSCode or WebStorm add-on yet that lets me use a local model and implements code completion and commands as well as GitHub Copilot does.
They're either missing a chat feature, inline actions / code completion, or fill-in-the-middle models. And even when they have them, they don't provide the context as intelligently (an assumption on my part!) as GH's Copilot does.
One alternative I liked was Supermaven: it's really, really fast and has a huge context window, so it knows almost your whole project. That was nice! But the reason I ultimately didn't keep using it: it doesn't support chat or inline commands (CTRL+I in VSCode's GH Copilot).
I feel like a really good Copilot alternative is definitely still missing.
But regarding your question, I think GitHub Copilot's VSCode extension is the best - as of now. The WebStorm extension is sadly not as good; it's missing the "inline command" function, which IMHO is a must.
Continue.dev allows for this. You can even mix hosted Chat options like GPT-4 (via API) with local completion. I typically use a smaller model for faster text completion and a larger model (with a bigger context) for chat.
Seconding Supermaven here, from the guy that made Tabnine.
Supermaven has a 300k token context. It doesn't seem like it has a ton of intelligence -- maybe comparable to copilot, maybe a bit less -- but it's much better at picking up data structures and code patterns from your code, and usually what I want is help autocompleting that sort of thing rather than writing an algorithm for me (which LLMs often get wrong anyway).
You can also pair it with a GPT-4 / Opus chat in Cursor, so you get your slower but more intelligent chat along with the simpler but very fast, high-context autocomplete.
Yeah, this is how I imagined these things should work, but it's tricky. The system needs to pattern-match on the types you've been using, if possible, so you need a vector search over your code to do that. Then you need another vector search over the actual dependency source. It's not that simple, but it would be the ultimate solution.
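Something like this toy sketch, I guess - embed the code chunks, cosine-match them against what you're working on, and stuff the winners into the prompt. The embedding model tag, the naive chunking, and using Ollama embeddings at all are just placeholder choices:

# Toy sketch of the retrieval step described above: embed code chunks once,
# then pull the most similar ones into the context for a completion prompt.
# The embedding model tag and the chunking strategy are placeholder choices.
import ollama
import numpy as np

EMBED_MODEL = "nomic-embed-text"

def embed(text: str) -> np.ndarray:
    vec = ollama.embeddings(model=EMBED_MODEL, prompt=text)["embedding"]
    return np.array(vec)

def top_k_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    q = embed(query)
    scored = []
    for chunk in chunks:
        c = embed(chunk)
        score = float(np.dot(q, c) / (np.linalg.norm(q) * np.linalg.norm(c)))
        scored.append((score, chunk))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

# "chunks" would come from splitting your codebase (and dependency sources)
# into function-sized pieces; the top matches get prepended to the prompt.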
[0] https://github.com/ollama/ollama
[1] https://github.com/promptfoo/promptfoo