bugglebeetle's comments | Hacker News

I haven’t tried the most recent Claude models, but as of the last iteration, Gemini was far better at Rust and is still what I use to write anything in it. As an experiment, I even fed it a whole ebook on Rust design patterns along with a small script (500 lines), and it was able to refactor the script to use the correct patterns, with some minor back and forth to fix build errors!
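For anyone curious, the whole thing fits in one call because of the context window. A minimal sketch in Python, assuming the google-genai SDK; the model name and file paths are illustrative, not the exact ones I used:

    from google import genai

    client = genai.Client()  # reads GEMINI_API_KEY from the environment

    # Load the reference material and the code to refactor as plain text.
    book = open("rust_design_patterns.md").read()
    script = open("script.rs").read()

    response = client.models.generate_content(
        model="gemini-2.5-pro",
        contents=[
            "Here is a book on Rust design patterns:\n\n" + book,
            "Here is a small Rust script:\n\n" + script,
            "Refactor the script to use the appropriate patterns from "
            "the book. Return the complete revised file.",
        ],
    )
    print(response.text)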


In 50 years, Japan will have a Chinese majority, so…


Claude Code is closed source, and Anthropic issues takedowns of decompilations.


Gemini 2.5 Pro is better at coding than Claude, it’s just not as good at acting agentically, nor does Google have good tooling to support this use case. Given how quickly they’ve come from far behind and their advantage on context size (Claude’s biggest weakness), this could change just as fast, although I’m skeptical they can deliver a good end user dev tool.


> Gemini 2.5 Pro is better at coding than Claude

I’d be careful with stating things like this as fact. I asked Gemini for half an hour to write code that draws a graph the way I want, and it never got it right. Then I asked Claude 3.7, and it got it almost right on the first try, to the point that I thought it was completely right, and it fixed the bug I discovered right after I pointed it out.


Yup, I have had a similar experience. Not only for coding: just yesterday, I asked Gemini to compose an email with a list of attachments, which I had specified as a list of file paths in the prompt, and it wasn't able to count them correctly and report the number in the email text (the text went something like, "there are <number_of_attachments> charts attached"). Claude 3.7 was able to do that correctly in one go.


How much do you pay for Gemini 2.5 Pro?


Something like $20/month, first 2 months $10. Depends on the country.


This is generally controllable with prompting. I usually include something like, “be excessively cautious and conservative in refactoring, only implementing the desired changes,” to avoid it.
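Concretely, it's just a standing instruction in the system prompt. A minimal sketch, assuming the Anthropic Python SDK; the model alias and the user message are placeholders:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    message = client.messages.create(
        model="claude-3-7-sonnet-latest",
        max_tokens=4096,
        # The standing instruction that keeps refactors conservative.
        system=(
            "Be excessively cautious and conservative in refactoring. "
            "Implement only the changes explicitly requested; do not "
            "rename, reorder, or 'improve' anything else."
        ),
        messages=[
            {"role": "user",
             "content": "Rename `fetch_data` to `load_data` in this file: ..."},
        ],
    )
    print(message.content[0].text)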


It’s annoying, but I’ve done extensive work with this model, and leaving the comments in for the first few iterations produced better outcomes. I expect this is baked into the RL they’re doing, but because of the context size, it’s not really an issue. You can just ask it to strip them out in the final pass.


If I have to spend this much time thinking about any of this, congratulations, you’ve designed a product with a terrible UI.


Some tools take more effort to hold properly than others. I'm not saying there's not a lot of room for improvement, or that the UX couldn't hold the user's hand more to force things like this in some "assisted mode," but at the end of the day, it's a thin, useful wrapper around an LLM, and LLMs require effort to use effectively.

I definitely get value out of it, more than any other tool like it that I've tried.


Think about what you would do in an unfamiliar project with no context and just the ticket:

"please fix the authorization bug in /api/users/:id".

You'd start by grepping the code base and trying to understand it.

Compare that to: "fix the permissions check in src/controllers/users.ts in the function `getById`. We need to check that the user in the JWT is the same user that is being requested."
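To make the contrast concrete, here's a sketch of both phrasings side by side; this assumes the Claude Code CLI's `-p` one-shot flag, and the paths are from the hypothetical ticket above:

    import subprocess

    # The vague version: the agent has to grep and read before it can act.
    vague = "please fix the authorization bug in /api/users/:id"

    # The specific version: the agent can go straight to the right file.
    specific = (
        "Fix the permissions check in src/controllers/users.ts in the "
        "function `getById`. We need to check that the user in the JWT "
        "is the same user that is being requested."
    )

    subprocess.run(["claude", "-p", specific])

The specific prompt skips the whole orientation phase, which is where most of the wasted tokens (and cost) go.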


So, AIs are overeager junior developers at best, and not the magical programmer replacements they are advertised as.


Let's split the difference and call them "magical overeager junior developer replacements".


On a shorter timeline than you'd think, working with these tools won't look like this at all.

You'll be prompting, evaluating, and iterating on entirely finished pieces of software, and you'll be able to see multiple attempts at each solve at once; none of this deep-in-the-weeds bug-fixing stuff.

We're rapidly approaching a world where a lot of software will be made without an engineering hire at all. Maybe not the hardest, most complex, or most novel software, but a lot of software that previously required a team of 3-15 won't have a single dev.

My current estimate is mid-2026.


My current estimate is 2030, because we can barely get a JS/TS application to compile after a year of dependency updates.

Our current popular stack is quicksand.

Unless we're talking about .NET Core, Java, Django, and other stable platforms.


> So, AIs are overeager junior developers at best, and not the magical programmer replacements they are advertised as.

This may be a quick quip or a rant. But the things we say have a way of reinforcing how we think. So I suggest refining until what we say cuts to the core of the matter. The claim above is a false dichotomy. Let's put aside advertisements and hype. Trying to map between AI capabilities and human ones is complicated. There is high quality writing on this to be found. I recommend reading literature reviews on evals.


[flagged]


Don’t be a dismissive dick; that’s not appropriate for this forum. The above post is clearly trying to engage thoughtfully and offers genuinely good advice.


The above post produces some vague philosophical statements and equally vague "just google it" claims.


I’m thinking you might be the kind of person who requires very direct feedback. Your flagged comment was unkind and unhelpful. Your follow-up response seems to suggest that you feel you were justified in being rude?

You also mischaracterize my comment two levels up. It didn’t wave you away by saying “just google it”. It said — perhaps not directly enough — that your comment was off track and gave you some ideas to consider and directions to explore.


> There is high quality writing on this to be found. I recommend reading literature reviews on evals.

This is, quite literally, "just google it".

And yes, I prefer direct feedback, not vague philosophical and pseudo-philosophical statements and vague references. I'm sure there's high quality writing to be found on this, too.


We have very different ideas of what "literal" means. You _interpreted_ what I wrote as "just Google it". I didn't say those words verbatim _nor_ do I mean that. Use a search engine if you want to find some high-quality papers. Or use Google Scholar. Or go straight to Arxiv. Or ask people on a forum.

> not vague philosophical and pseudo-philosophical statements and vague references

If you stop being so uncharitable, more people might be inclined to engage you. Try to interpret what I wrote as constructive criticism.

Shall we get back to the object level? You wrote:

> AIs are overeager junior developers at best

Again, I'm saying this isn't a good framing. I'm asking you to consider you might be wrong. You don't need to hunker down. You don't need to counter-attack. Instead, you could do more reading and research.


> We have very different ideas of what "literal" means. You _interpreted_ what I wrote as "just Google it". I didn't say those words verbatim _nor_ do I mean that. Use a search engine if you want to find some high-quality papers. Or use Google Scholar. Or go straight to Arxiv. Or ask people on a forum.

Aka "I will make some vague references to some literature, go Google it"

> Instead, you could do more reading and research.

Instead of a vague "just google it" and vague ad hominems, you could actually provide constructive feedback.


Want to try a conversational reset? I'll start.

My disagreement with the claim "AIs are overeager junior developers at best" largely has to do with both understanding what is happening under the hood as well as personal experience. Like many people, I have interacted for thousands of hours with ChatGPT, Claude, Gemini, and others, though my interaction patterns may be unusual -- not sure -- which I would characterize as (a) set expectations with a detailed prelude; (b) frame problems carefully; (c) trust nothing; (d) push back relentlessly; (e) require 'thinking out loud'; (f) resist bundled solutions; (g) actively guide design and problem-solving dialogues; (h) actively mitigate sycophancy, overconfidence, and hallucination. (A rough sketch of (a) is below.)
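To make (a) less abstract, here is roughly what a standing prelude looks like for me; the wording is illustrative, not magic:

    # A reusable prelude that sets expectations before any problem statement.
    PRELUDE = (
        "Before answering: think out loud step by step, state your "
        "assumptions explicitly, and flag anything you are unsure of. "
        "Do not bundle multiple changes into one answer. If you do not "
        "know, say so rather than guessing."
    )

    def with_prelude(question: str) -> str:
        """Prepend the standing expectations to a problem statement."""
        return PRELUDE + "\n\n" + question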

I've guided some junior / less experienced developers using many of the same patterns above. More or less, they can be summarized as "be more methodical". While I've found considerable variation in the quality of responses from LLMs, I would not characterize this variation as being anywhere close to that of a junior developer. I grant that I adjust my interaction patterns considerably to improve the quality of the experience.

LLMs vary across dimensions of intelligence and capability. Here's my current assessment -- somewhat off the cuff, but I have put thought into it -- (1) LLM recall is superhuman. (2) Contextual awareness is mixed, sometimes unpredictably bad. Getting sufficient context is hard, but IMO this is less of a failure of the LLM or RAG and more about its lack of embodiment in a particular work setting. (3) Speed is generally superhuman. (4) Synthesis is often superhuman. (5) Ready-to-go high-quality all-in-one software solutions are not there yet. (6) Failure modes are painful; e.g. going in circles or waffling.

I should also ask what you mean by "overeager"? I would guess you are referring to the tendency of many LLMs to offer solutions to problems despite lacking a way to validate their answers, perhaps even hallucinating API calls that don't exist?


The grandparent is talking about how to control cost by focusing the tool. My response was to a comment about how that takes too much thinking.

If you give a junior an overly broad prompt, they are going to have to do a ton of searching and reading to find out what they need to do. If you give them specific instructions, including files, they are more likely to get it right.

I never said they were replacements. At best, they're tools that are incredibly effective when used on the correct type of problem with the right type of prompt.


> If you give a junior an overly broad prompt, they are going to have to do a ton of

> they're tools that are incredibly effective when used on the correct type of problem with the right type of prompt.

So, a junior developer who has to be told exactly what to do.

As for the "correct type of problem with the right type of prompt", what exactly are those?


As of April 2025. The pace is so fast that it will overtake seniors within years, maybe months.


That's been said since at least 2021 (the release date for GitHub Copilot). I think you're overestimating the pace.


Overtake CEOs by 2026.


I have been quite skeptical of using AI tools, and my experiences using them to develop software have been frustrating. But power tools usually come with a learning curve, while a "good product" with a clean, simplified interface often results in reduced capability.

Vim, Emacs, and Excel are obvious power tools that may require you to think, but they often produce unrivalled productivity for power users.

So I don't think the verdict that the product has a bad UI is fair. Natural language interfaces are such a step up from old-school APIs with countless flags and parameters.


Mh. Like, I'm deeply impressed by what these AI assistants can do by now. But the list in the parent comment is very similar to my mental checklist for pair-programming / pair-admin'ing with less experienced people.

I guess "context length" in AIs is what I intuitively tracked with people already. It can be a struggle to connect the Zabbix alert, the ticket and the situation on the system already, even if you don't track down all the zabbix code and scripts. And then we throw in Ansible configuring the thing, and then the business requriements by more, or less controlled dev-teams. And then you realize dev is controlled by impossible sales-terms.

These are scope -- or I guess context -- expansions that cause people to struggle.


It's fundamentally hard. If you have an easy solution, you can go make an easy few billion dollars.


Claude Code works fairly well, but Anthropic has lost the plot on the state of market competition. OpenAI tried to buy Cursor and now Windsurf because they know they need to win market share. Gemini 2.5 Pro is better at coding than Anthropic's Sonnet models, has huge context, and runs on Google's own TPU stack, but somehow Anthropic is expecting people to pay $200 in API costs per functional PR to vibe code. Ok.


> but somehow Anthropic is expecting people to pay $200 in API costs per functional PR to vibe code. Ok.

Reading the thread, somehow people are paying. It is mind-blowing how, instead of getting cheaper, development just got more expensive for businesses.


$200 per PR is significantly cheaper than what businesses are currently paying for development.


In terms of short-term outlay, perhaps. But don't forget to factor in the long-term benefits of having a human team involved.


3.5 was amazing for code, and topped benchmarks for months. It'll take a while for other models to take over that mental space.


What’s the best library for fine-tuning VLMs at the moment, and does it support this architecture, or that of the IBM Granite vision models? Document understanding tasks seem in special need of fine-tuning.


It looks like the model itself is here: https://huggingface.co/ds4sd/SmolDocling-256M-preview

It was fine-tuned from this: https://huggingface.co/HuggingFaceTB/SmolVLM-256M-Instruct

There's an example of fine-tuning the base model that would likely be applicable to this one as well.
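Loading it for a look before any fine-tuning is a few lines with transformers; a minimal sketch, assuming SmolDocling exposes the same Vision2Seq interface as its SmolVLM base:

    from transformers import AutoProcessor, AutoModelForVision2Seq

    model_id = "ds4sd/SmolDocling-256M-preview"
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForVision2Seq.from_pretrained(model_id)

    # From here, a standard fine-tune over document-image/text pairs
    # (e.g. with the Trainer API or PEFT) should carry over from the
    # SmolVLM fine-tuning example linked above.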


What I always find hilarious about these naive libertarian types is that they never even bother to check their hypotheticals against reality. For example, Future Motion had to have a regulatory body intervene because they were killing and injuring people with their skateboard designs:

https://www.theguardian.com/sport/2023/oct/03/future-motion-...

So the answer to your question is, “yes, that needs to and did happen.”

