Hey, happy to answer! The manual evals we ran showed that both morph-v3-fast and morph-v3-large exhibit significantly more smoothing and hallucination behavior than our models.
It's hard to know for sure because their methods aren't public, but my guess is the dataset they constructed pushes the Fast Apply model to more aggressively fix mistakes introduced by the frontier model in the edit snippet.
This aligns with the fact that their flagship model (morph-v3-large) is 4x slower than ours -- since the smoothed/hallucinated tokens aren't in the initial code or the edit snippet, they break speculative continuations more frequently. Their 2x faster model (morph-v3-fast) is likely quantized more aggressively (maybe fp4? and run on B200s?), because it exhibits very strange behaviors like hallucinating invalid characters at random points that make the code non-compilable.
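For intuition on why those extra edits hurt latency, here's a toy sketch of speculative continuation (a simplification, not our actual decoding loop, and the helper is invented for illustration): the original code acts as the draft, matching tokens get accepted cheaply, and any token the model emits that isn't in the draft breaks the run and forces slow, one-token-at-a-time decoding.

```python
# Toy illustration of speculative continuation for an apply model.
# Simplifying assumption: the original file alone is the draft sequence;
# in practice the edit snippet contributes draft tokens too.

def count_speculative_acceptances(draft_tokens, output_tokens):
    """Count output tokens accepted straight from the draft vs. speculation breaks.

    Every token the model emits that isn't already in the draft (a "smoothed"
    fix, a hallucinated character) breaks the run of cheap acceptances.
    """
    accepted = 0
    breaks = 0
    i = 0  # position in the draft
    for tok in output_tokens:
        if i < len(draft_tokens) and draft_tokens[i] == tok:
            accepted += 1  # matches the draft: accepted for free
            i += 1
        else:
            breaks += 1    # divergence: speculation restarts here
            # naive resync: skip ahead in the draft to the next match
            while i < len(draft_tokens) and draft_tokens[i] != tok:
                i += 1
    return accepted, breaks


# A faithful apply keeps long accepted runs; an unsolicited "smoothing" in the
# middle of untouched code adds breaks and slows generation down.
original = "def add(a, b):\n    return a + b\n".split()
faithful = list(original)
smoothed = "def add(a, b):\n    return int(a) + int(b)\n".split()
print(count_speculative_acceptances(original, faithful))   # (7, 0)
print(count_speculative_acceptances(original, smoothed))   # fewer accepts, more breaks
```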
From an accuracy POV, auto-smoothing is helpful for fixing obvious mistakes in the edit snippet, like missed imports from well-known packages. However, it also increases the frequency of code-breaking hallucinations, like invalid local imports, among other functional changes you might not want a small apply model to make.
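To make that concrete, here's a hypothetical example (file contents and module names are invented for illustration) of the helpful vs. harmful sides of smoothing:

```python
# Hypothetical example. The frontier model's edit snippet used pandas but
# forgot the import:
#
#     # ... existing code ...
#     def load_users(path):
#         return pd.read_csv(path)
#
# Helpful smoothing: while merging the edit, the apply model adds the obvious
# missing import for a well-known package.
import pandas as pd

def load_users(path):
    return pd.read_csv(path)

# Harmful smoothing: the apply model also "fixes" things it wasn't asked to,
# e.g. inventing a local import that doesn't exist in the repo, which breaks
# the build.
# from internal.utils import normalize_users   # <- hallucinated module
```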
The diffusion approach is really interesting -- it's something we haven't checked out for applying edits just yet. It could work quite well though!
You can definitely use it for markdown, but we haven't seen anyone test it for plaintext yet. I'm sure it would work though, let us know if you end up trying it!
Adding extra structural information about the codebase is an avenue we're actively exploring. Agentic exploration is already a structure-aware system: you're using a frontier model (Claude 4 Sonnet or equivalent) that gives you an implicit binary relevance score based on whatever you put into context -- filenames, graph structures, etc.
If a file is "relevant," the agent looks at it and decides whether to keep it in context. That process repeats until there's enough context to make changes to the codebase.
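A rough sketch of that loop, assuming hypothetical helpers (`ask_model`, `list_candidates`, `read_file` are not a real API, just placeholders for the frontier model call and repo access):

```python
# Minimal sketch of an agentic exploration loop.

def explore(task, repo_root, ask_model, list_candidates, read_file, max_steps=20):
    context = []  # files the agent has decided to keep
    for _ in range(max_steps):
        # Cheap structural signals go in first: filenames, import/call graph, etc.
        seen = [path for path, _ in context]
        candidates = list_candidates(repo_root, task, already_seen=seen)

        # The frontier model's yes/no on each candidate is the implicit
        # binary relevance score mentioned above.
        relevant = [f for f in candidates
                    if ask_model(f"Is {f} relevant to: {task}? Answer yes or no.") == "yes"]

        for path in relevant:
            body = read_file(path)
            keep = ask_model(f"Keep this file in context for: {task}? Answer yes or no.\n{body}")
            if keep == "yes":
                context.append((path, body))

        # Stop once the model judges the gathered context sufficient.
        summary = "\n".join(path for path, _ in context)
        if ask_model(f"Is this context enough to make the change?\n{summary}") == "yes":
            break
    return context
```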
The question is whether we actually need a 200b+ parameter model to do this, or whether we can distill the functionality onto a much smaller, more economical model. A lot of people are already choosing to do this with Gemini (due to the 1M context window) and then writing the code with Claude 4 Sonnet.
Ideally, we want to be able to run this process cheaply in parallel to get really fast generations. That's the ultimate goal we're aiming towards.
Cline orchestrates all the models under the hood, so you could use our apply model with Cline. Not sure which model they're using for that feature right now.
We trained it on over a dozen languages, with a bias towards TypeScript and Python. We've seen it work pretty well on Markdown, but you could try it on plaintext too -- curious to hear how that goes.
Open-source git repos are a really good place to get data -- it takes a lot of munging to get it into a useful format, but that's the name of the game with model training.
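For anyone curious what the first step of that munging can look like, here's a rough sketch using plain git commands to pull (before, diff, after) triples out of a repo's history. The filtering, dedup, and quality checks you'd layer on top are where most of the real work is, and they're omitted here.

```python
# Rough sketch: extract (before, diff, after) triples from a repo's git history.
import subprocess

def git(*args, cwd):
    # Thin wrapper around the git CLI; returns stdout as text.
    return subprocess.run(["git", *args], cwd=cwd, capture_output=True, text=True).stdout

def commit_triples(repo, path_suffix=".py", max_commits=100):
    shas = git("log", f"--max-count={max_commits}", "--pretty=format:%H", cwd=repo).split()
    for sha in shas:
        # Files touched by this commit.
        files = git("show", "--name-only", "--pretty=format:", sha, cwd=repo).split()
        for path in files:
            if not path.endswith(path_suffix):
                continue
            before = git("show", f"{sha}^:{path}", cwd=repo)  # file before the commit
            after = git("show", f"{sha}:{path}", cwd=repo)    # file after the commit
            diff = git("diff", f"{sha}^", sha, "--", path, cwd=repo)
            if before and after and diff:
                yield {"before": before, "diff": diff, "after": after}
```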
It's on the roadmap to publish public evals people can use to compare their options. A lot of the current benchmarks aren't really specialized for these prompt-to-app use cases.