
Trying to understand your use case better: if you already have a corpus of example code in both JS and Python, why do you need the model to do the transformation?



That's just an example, but the end goal is feeding arbitrary code in JS and getting code back in Python. Think of the existing corpus as training data.

Or do I have to wait for GPT-4's expanded context windows to fine-tune with prompts like:

    Python: Python code (4-8k tokens)
    ----
    JS: JS code (4-8k tokens)
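
For what it's worth, here's a rough sketch of how I'd lay that paired corpus out as prompt/completion JSONL for fine-tuning. The directory layout and separator are just placeholders, and note I put JS in the prompt and Python in the completion since that's the direction you want:

    import json
    from pathlib import Path

    # Hypothetical layout: pairs/foo.js and pairs/foo.py hold the same program
    # in both languages. Emit one JSONL record per pair in the
    # prompt/completion format most fine-tuning pipelines expect.
    pairs_dir = Path("pairs")
    with open("finetune.jsonl", "w") as out:
        for js_file in sorted(pairs_dir.glob("*.js")):
            py_file = js_file.with_suffix(".py")
            if not py_file.exists():
                continue
            record = {
                "prompt": "JS:\n" + js_file.read_text() + "\n----\nPython:\n",
                "completion": " " + py_file.read_text(),
            }
            out.write(json.dumps(record) + "\n")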


The embeddings on their own are insufficient for this. You need some kind of sequential model to generate the code.
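
To illustrate: embeddings only buy you retrieval, not generation. A minimal sketch (assuming the older openai client's embeddings endpoint; the corpus variable is hypothetical) of using them to pull the nearest existing JS/Python pair to stuff into a prompt:

    import numpy as np
    import openai  # pre-1.0 openai client assumed here

    def embed(text):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    # corpus: hypothetical list of (js_snippet, py_snippet) pairs you already have
    def nearest_pair(new_js, corpus):
        query = embed(new_js)
        def score(pair):
            v = embed(pair[0])
            return np.dot(query, v) / (np.linalg.norm(query) * np.linalg.norm(v))
        return max(corpus, key=score)

    # The retrieved pair is only useful as a few-shot example in a prompt;
    # an actual sequence model still has to generate the Python.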

It should be possible to build your own model to do this instead of GPT-4 if you are so inclined. I don't know how the quality would compare, but there are various specialized code-specific models already around (and more coming) that work quite well.


What are the best ones for code? I'm looking to generate fuzzers and unit tests and measure their coverage.


I think the Salesforce ones[1] and the CarperAI[2] one.

BigCode from HuggingFace will be coming soon too.

[1] https://huggingface.co/Salesforce (expand "Models"); codegen-mono is Python-only and codegen-multi is multi-language, I think.

[2] https://carper.ai/diff-models-a-new-way-to-edit-code/
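
If it helps, the codegen checkpoints load with vanilla transformers. Rough sketch; the checkpoint size and prompt are just examples:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # codegen-*-mono checkpoints are Python-tuned; -multi covers several languages.
    name = "Salesforce/codegen-2B-mono"  # smaller and larger variants exist
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    prompt = "# write a unit test for add\ndef add(a, b):\n    return a + b\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.2)
    print(tokenizer.decode(out[0], skip_special_tokens=True))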


Have you tried asking GPT-4 to convert the code now, without examples? Or with just a few examples?
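
Something like this, with one or two of your existing pairs inlined as few-shot examples (a sketch against the chat completions API in the pre-1.0 openai client; example_js/example_py are placeholders):

    import openai

    example_js = "function add(a, b) { return a + b; }"   # placeholder pair
    example_py = "def add(a, b):\n    return a + b"
    new_js = "function greet(name) { return `hi ${name}`; }"

    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Translate JavaScript to idiomatic Python."},
            {"role": "user", "content": example_js},
            {"role": "assistant", "content": example_py},
            {"role": "user", "content": new_js},
        ],
    )
    print(resp["choices"][0]["message"]["content"])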


I feel like this should work? If the original code is too long, you may just need a clever chunking strategy.
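
e.g. a crude sketch of chunking JS at top-level function boundaries before translating each piece separately (the regex is naive and just for illustration; a real parser would be more robust):

    import re

    def chunk_js(source, max_chars=8000):
        # Naive split on top-level function declarations; an AST-based
        # splitter would handle arrow functions, classes, etc.
        pieces = re.split(r"(?m)^(?=function\s+\w+)", source)
        chunks, current = [], ""
        for piece in pieces:
            if current and len(current) + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current += piece
        if current:
            chunks.append(current)
        return chunks

    # Each chunk is translated independently and the results concatenated.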



