The beta is inconsistently showing (required a few refreshes to get something to...

gukov · 2025-01-14T23:21:24 1736896884

You'd think Open AI's dev velocity and quality would be off the charts since they live and breathe "AI." If a company building ChatGPT itself often delivers buggy features then it doesn't bode well for this whole 'AI will eat the world' notion.

practice9 · 2025-01-15T02:15:23 1736907323

Well none of the labs have good frontend or mobile engineers or even infra engineers

Anthropic is ahead in this because they keep their UIs simplistic so the failure modes are also simple (bad connection)

OpenAI is just pushing half baked stuff to prod and moving on (GPTs, Canvas).

Find it hilarious and sad that o1-pro just times out thinking on very long or image-intense chats. Need to reload page multiple times after it fails to reply and maybe answer will appear (or not? Or in 5 minutes?). Kinda shows they’re not testing enough and “not eating their own food” and feels like chatgpt 3.5 ui before the redesign

lolinder · 2025-01-15T02:29:48 1736908188

> Anthropic is ahead in this because they keep their UIs simplistic ... OpenAI is just pushing half baked stuff to prod and moving on (GPTs, Canvas).

What's funny is that OpenAI's Canvas was their attempt to copy Anthropic's Artifacts! So it's not like Anthropic is stagnant and OpenAI is at least shipping, Anthropic is shipping and OpenAI can't even copy them right.

jeffgreco · 2025-01-15T02:55:49 1736909749

It's a good point, Anthropic is being VERY choosy and winds up knocking it out of the park with stuff like Artifacts. Meanwhile their MacOS app is junk, but obviously not a priority.

cma · 2025-01-15T03:34:45 1736912085

> because they keep their UIs simplistic

How do I edit a sent message in the Claude Android app? It's so simplistic I can't find it.

siquick · 2025-01-15T08:04:42 1736928282

You can’t edit on iOS either

golergka · 2025-01-15T01:30:55 1736904655

So far, I've found AI to be a great force multiplier in green field, small projects. In a huge corporate codebase, it has the power of advanced refactoring (which doesn't touch more than a handful files at a time) and a CSS wizard.

cruffle_duffle · 2025-01-15T00:09:04 1736899744

According to all the magazines I've been reading, all that is required is to just prompt it with "please fix all of these issues" and give it a bulleted list with a single sentence describing each issue. I mean, it's AI powered and therefore much better than overpaid prima-donna engineers, so obviously it should "just work" and all the problems will get fixed. I'm sure most of the bugs were the result of humans meddling in the AI's brilliant output.

Right now, in fact, my understanding is OpenAI is using their current LLM's to write the next generation ones which will far surpass anything a developer can currently do. Obviously we'll need to keep management around to tell these things what to do, but the days of being a paid software engineer are numbered.

xarope · 2025-01-15T01:52:16 1736905936

I think you forgot the /s (sarcasm) in your post!

ineedasername · 2025-01-15T01:55:42 1736906142

When I have it do a search I have to tell it to just get all the info it can in the search but wait for the next request. The I explicitly tell it we’re done searching and to treat the next prompt as a new request but using the new info it found.

That’s the only way I get it to have a halfway decent brain after a web search. Something about that mode makes it more like a PR drone version of whatever I asked it to search, repeating things verbatim even when I ask for more specifics in follow-up.

emkee · 2025-01-17T00:48:13 1737074893

Can you give an example prompt for this approach?

imsotiredspacex · 2025-01-14T23:27:47 1736897267

i posted the system prompt part describing the function call; if you read it and adjust your prompt for creating the task it works way better.

potatoman22 · 2025-01-14T23:01:03 1736895663

I'd rather have buggy things now than perfect things in a year.

dmadisetti · 2025-01-14T23:05:30 1736895930

Doesn't need to be perfect- but using this would actively reduce productivity

sprobertson · 2025-01-14T23:03:31 1736895811

First impressions matter, if the experience is this bad you're probably waiting a year to come back anyway.

jahewson · 2025-01-14T23:15:38 1736896538

Worked out great for Sonos when their timers and alarms didn’t work.

broknbottle · 2025-01-15T04:00:19 1736913619

Found the PM

arthurcolle · 2025-01-14T22:42:53 1736894573

DateTime stuff is generally super annoying to debug. Can't fault them too badly. Adding a scheduler is a key enabling idea for a ton of use cases

sensanaty · 2025-01-14T22:49:28 1736894968

> Can't fault them too badly

The same company that touts their super hyper advanced AI tool that can do everyone's (except the C-level's, apparently) jobs to the world can't figure out how to make a functional cron job happen? And we're giving them a pass, despite the bajillions of dollars that M$ and VC is funneling their way?

Quite interesting they wouldn't just throw the "proven to be AGI cause it passes some IQ tests sometimes" tooling at it and be done with it.

arthurcolle · 2025-01-14T23:58:04 1736899084

it would explain the bugs if they used the AI to make the datetime implementation though

cbeach · 2025-01-14T22:56:31 1736895391

Agreed on date/time being a frustrating area of software development.

But wouldn't a company like OpenAI use a tick-based system in this architecture? i.e. there's an event emitter that ticks every second (or maybe minute), and consumers that operate based on these events in realtime? Obviously things get complicated due to the time consumed by inference models, but if OpenAI knows the task upfront it could make an allowance for the inference time?

If the logic is event driven and deterministic, it's easy to test and debug, right?

singron · 2025-01-15T02:40:06 1736908806

The original cron was programmed this way, but it has to examine every job every tick to check if it should run, which doesn't scale well. Instead, you predict when the next run for a job will be and insert that into an indexed schedule. Then each tick it checks the front of the schedule in ascending order of timestamps until the remaining jobs are in the future.

This is also a bad case in terms of queueing theory. Looking at Kingmans equation, the arrival variance is very high (a ton of jobs will run at 00:00 and much fewer at 00:01), and the service time also has pretty high variance. That combo will either require high queue delay variance, low utilization (i.e. over-provosioning), or a sophisticated auto-scaler that aggressively starts and stops instances to anticipate the schedule. Most of the time it's ok to let jobs queue since most use cases don't care if a daily or weekly job is 5 minutes late.

dmadisetti · 2025-01-14T22:54:41 1736895281

Yeah, they're not exactly a scrappy startup- I'd be surprised if they had 0 QA.

Makes me wonder if they internally have "press releases / Q" as an internal metric to keep up the hype.

airstrike · 2025-01-15T01:30:21 1736904621

Maybe that's the Q* we've been hearing rumors about