Over the last week or so I have put probably close to 70 hours into playing around with Cursor, Claude Code, and a few other tools (it's become my new obsession), and I've been blown away by how good and reliable it is now. That said, the reality is that in my experience the only models that actually work in any sort of reliable way are Claude models. I don't care what any benchmark says, because the only thing that actually matters is actual use. I'm really hoping this new GPT model actually works for this use case, because competition is great and the price is also great.
I think some of this might come down to stack as well. I watched a t3.gg video[1] recently about Convex[2] and how the nature of it leads to the AI getting it right the first time more often. I've been playing around with it over the last few days and I think I agree with him.
I think the dev workflow is going to fundamentally change, because to maximise productivity out of this you need multiple AIs working in parallel. Rather than jumping straight into coding, we're going to end up writing a bunch of tickets out in a PM tool (Linear[3] looks like it's winning the race atm), working out (or using the AI to work out) which ones can be run in parallel without causing merge conflicts, then pulling multiple tickets into your IDE/terminal and cycling through the tabs, jumping in as needed.
Atm I'm still not really doing this, but I know I need to make the switch, and I'm thinking Warp[4] might be best suited for this kind of workflow, with the occasional switch over to an IDE when you need to jump in and make some edits.
Oh also, to achieve this you need to use git worktrees[5,6,7].
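For anyone who hasn't used them: the idea is one checkout per ticket, each on its own branch, so agents working in separate terminals never step on each other's files. A minimal sketch (repo path and branch names are just placeholders):

    # one worktree per ticket, each on its own new branch
    git worktree add ../myapp-ticket-123 -b ticket-123
    git worktree add ../myapp-ticket-124 -b ticket-124

    # see what's checked out where
    git worktree list

    # clean up once a ticket's branch has been merged
    git worktree remove ../myapp-ticket-123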
On a desktop browser, click YouTube's "show transcript" and "hide timecodes", then copy-paste the whole transcript into Claude or ChatGPT and tell it to summarize at whatever resolution you want: a couple of sentences, 400 lines, whatever. You can also tell it to focus on certain subject material.
This is a complete game changer for staying on top of what's being covered in local government meetings. Our local bureaucrats are astoundingly competent at talking about absolutely nothing 95% of the time, but hidden in there are three minutes of "oh btw we're planning on paving over the local open space preserve to provide parking for the local business".
1.5x and 2x speed help a lot; slow down or repeat segments as needed, and don't be afraid to fast-forward past irrelevant-looking bits (just be eager to backtrack).
If it can produce something you can read in 20 minutes, it means there was a lot of... 'fluff' isn't quite the right word, but material that could be removed without losing meaning.
Adding yet another comment: you can also call agents from Linear directly, which will create pull requests in GitHub, but they seem pretty expensive for what they are. They don't seem to offer any real benefit over setting up the MCP server, opening a terminal window, and typing "create a pr for $TICKET NUMBER in Linear", other than shaving off a few seconds.
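For what it's worth, the MCP setup itself is tiny. In Claude Code it's roughly a one-liner like the below (the exact transport flag and Linear's MCP endpoint are assumptions from memory, so check the docs):

    # register Linear's hosted MCP server with Claude Code (flag/URL may differ)
    claude mcp add --transport sse linear https://mcp.linear.app/sse

    # then in a session, just type:
    #   create a pr for $TICKET NUMBER in Linear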
> That said the reality is in my experience the only models that actually work in any sort of reliable way are claude models.
Anecdotally, the tool updates in the latest Cursor (1.4) seem to have made tool usage in models like Gemini much more reliable. Previously it would struggle to make simple file edits, but now the edits work pretty much every time.
How much of the product were you able to build to say it was good/reliable? IME, 70 hours can get you to a PoC that "works", but does it do well once you start layering on features beyond the initial set, like say a first draft of all the APIs?
It depends on how you use it. The "vibe-coding" approach, where you give the agent naive prompts like "make new endpoint", often doesn't work.
When you break the problem of "create new endpoint" down into its sub-components (which you can do with the agent) and then work on one part at a time, with a new session for each part, you generally do have more success.
The more boilerplate-y the part is, the better it is. I have not really found one model that can yet reliably one-shot things in real-life projects, but they do get quite close.
For many tasks, the models are slower than I am, but IMO at this point they are helpful and should definitely be part of the toolset.
> The more boilerplate-y the part is, the better it is. I have not really found one model that can yet reliably one-shot things in real-life projects, but they do get quite close.
This definitely feels right from my experience. Small tasks that are present in the training data = good output with little effort.
Infra tasks (something that isn't in the training data as often) = sad times and lots of spelunking. To be fair, Gemini has done a good job for me eventually, even though it told me to nuke my database (which, sadly, was a good solution).
I've been trying it out with OpenAI Codex over the last day and a half and I have been incredibly impressed; it has been working quite well. I also had it look over some code that Claude produced for me, and it said it would be better to approach it another way and completely rewrote it in a way that actually was significantly better. The UX for Codex is quite a bit worse than Claude Code's, but the model has been good enough to justify the switch for now. I'm hopeful that the Cursor CLI will eventually have a good enough UX that I can switch to it and have access to all of the models, rather than needing to use disparate tools for everything. I would strongly suggest you check out GPT-5 for agentic stuff if you are interested.
I find that OpenAI's reasoning models write better code and are better at raw problem solving, but Claude Code is a much more useful product, even if the model itself is weaker.