
Trying to understand your use case better: if you already have a corpus of example code in both JS and Python, why do you need the model to do the transformation?



That's just an example, but the end goal is feeding arbitrary code in JS and getting code back in Python. Think of the existing corpus as training data.

Or do I have to wait for GPT-4's expanded context windows to fine-tune with prompts like:

    Python: Python code (4-8k tokens)
    ----
    JS: JS code (4-8k tokens)
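
For what it's worth, here's a rough sketch of how I'd lay that paired corpus out as prompt/completion JSONL for fine-tuning. The directory layout and separator are just placeholders, and note I put JS in the prompt and Python in the completion since that's the direction you want:

    import json
    from pathlib import Path

    # Hypothetical layout: pairs/foo.js and pairs/foo.py hold the same program
    # in both languages. Emit one JSONL record per pair in the
    # prompt/completion format most fine-tuning pipelines expect.
    pairs_dir = Path("pairs")
    with open("finetune.jsonl", "w") as out:
        for js_file in sorted(pairs_dir.glob("*.js")):
            py_file = js_file.with_suffix(".py")
            if not py_file.exists():
                continue
            record = {
                "prompt": "JS:\n" + js_file.read_text() + "\n----\nPython:\n",
                "completion": " " + py_file.read_text(),
            }
            out.write(json.dumps(record) + "\n")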


The embeddings on their own are insufficient for this. You need some kind of sequential model to generate the code.
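
To illustrate: embeddings only buy you retrieval, not generation. A minimal sketch (assuming the older openai client's embeddings endpoint; the corpus variable is hypothetical) of using them to pull the nearest existing JS/Python pair to stuff into a prompt:

    import numpy as np
    import openai  # pre-1.0 openai client assumed here

    def embed(text):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    # corpus: hypothetical list of (js_snippet, py_snippet) pairs you already have
    def nearest_pair(new_js, corpus):
        query = embed(new_js)
        def score(pair):
            v = embed(pair[0])
            return np.dot(query, v) / (np.linalg.norm(query) * np.linalg.norm(v))
        return max(corpus, key=score)

    # The retrieved pair is only useful as a few-shot example in a prompt;
    # an actual sequence model still has to generate the Python.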

It should be possible to build your own model to do this instead of GPT-4 if you are so inclined. I don't know how the quality would compare, but there are various specialized code-specific models already around (and more coming) that work quite well.


What are the best ones for code? I'm looking to generate fuzzers and unit tests and measure their coverage.


I think the Salesforce ones[1] and the CarperAI[2] one.

BigCode from HuggingFace will be coming soon too.

[1] https://huggingface.co/Salesforce (expand "Models"); codegen-mono is Python-only and codegen-multi is multi-language, I think.

[2] https://carper.ai/diff-models-a-new-way-to-edit-code/
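
If it helps, the codegen checkpoints load with vanilla transformers. Rough sketch; the checkpoint size and prompt are just examples:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # codegen-*-mono checkpoints are Python-tuned; -multi covers several languages.
    name = "Salesforce/codegen-2B-mono"  # smaller and larger variants exist
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    prompt = "# write a unit test for add\ndef add(a, b):\n    return a + b\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.2)
    print(tokenizer.decode(out[0], skip_special_tokens=True))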


Have you tried asking GPT-4 to convert the code now, without examples? Or with just a few examples?
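
Something like this, with one or two of your existing pairs inlined as few-shot examples (a sketch against the chat completions API in the pre-1.0 openai client; example_js/example_py are placeholders):

    import openai

    example_js = "function add(a, b) { return a + b; }"   # placeholder pair
    example_py = "def add(a, b):\n    return a + b"
    new_js = "function greet(name) { return `hi ${name}`; }"

    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Translate JavaScript to idiomatic Python."},
            {"role": "user", "content": example_js},
            {"role": "assistant", "content": example_py},
            {"role": "user", "content": new_js},
        ],
    )
    print(resp["choices"][0]["message"]["content"])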


I feel like this should work? If the original code is too long, you may just need a clever chunking strategy.
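
e.g. a crude sketch of chunking JS at top-level function boundaries before translating each piece separately (the regex is naive and just for illustration; a real parser would be more robust):

    import re

    def chunk_js(source, max_chars=8000):
        # Naive split on top-level function declarations; an AST-based
        # splitter would handle arrow functions, classes, etc.
        pieces = re.split(r"(?m)^(?=function\s+\w+)", source)
        chunks, current = [], ""
        for piece in pieces:
            if current and len(current) + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current += piece
        if current:
            chunks.append(current)
        return chunks

    # Each chunk is translated independently and the results concatenated.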



