Nihilartikel's comments | Hacker News

100% agree. I had Gemini Flash 2 chew through thousands of points of nasty unstructured client data, and it did a 'better than human intern' level conversion into clean structured output for about $30 of API usage. I am sold. 2.5 Pro Experimental is in a different league for coding, though. I'm leveraging it for massive refactoring now and it is almost magical.


> thousands of points of nasty unstructured client data

What I always wonder in these kinds of cases is: What makes you confident the AI actually did a good job since presumably you haven't looked at the thousands of client data yourself?

For all you know it made up 50% of the result.


This was solved a hundred years ago.

It's the same problem factories have: they produce a lot of parts, and it's very expensive to put a full operator or more on a machine to do 100% part inspection. And the machines aren't perfect, so we can't just trust that they work.

So starting in the 1920s, Walter Shewhart and W. Edwards Deming came up with Statistical Process Control. We accept the quality of the product based on the variance we see in samples, and how those samples measure against upper and lower control limits.

Based on that, we can estimate a "good parts rate" (which later got used in ideas like Six Sigma to describe the probability of bad parts being passed).

The software industry was built on determinism, but now software engineers will need to learn the statistical methods created by engineers who have forever lived in the stochastic world of making physical products.
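
To make that concrete, here is a minimal sketch of what such an acceptance check could look like for batches of LLM-extracted records (the sample size, baseline error rate, and limits are illustrative assumptions, not figures from this thread):

    # Minimal illustrative sketch: a p-chart style acceptance check for batches
    # of LLM-extracted records. All numbers here are assumptions for the example.
    import math

    def p_chart_limits(p_bar, sample_size):
        # 3-sigma control limits for a defect proportion
        sigma = math.sqrt(p_bar * (1 - p_bar) / sample_size)
        return max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)

    # Suppose hand-checking 50 records per batch has historically found ~2% errors.
    LCL, UCL = p_chart_limits(p_bar=0.02, sample_size=50)

    def accept_batch(defects_found, sample_size=50):
        # Accept the batch if its observed defect rate stays under the upper limit;
        # otherwise send the whole batch for manual review.
        return (defects_found / sample_size) <= UCL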


I hope you're being sarcastic. SPC is necessary because mechanical parts have physical tolerances and manufacturing processes are affected by unavoidable statistical variation; it is beyond idiotic to be handed a machine that can execute deterministic, repeatable processes and then throw that all into the gutter for mere convenience, justified simply because "the time is ripe for SWEs to learn statistics."


We don't know how to implement a "deterministic, repeatable process" that can look at a bug in a repo and implement a fix end-to-end.


that is not what OP was talking about though.


LLMs are literally stochastic, so the point is the same no matter what the example application is.


Humans are literally stochastic, so the point is the same no matter what the example application is.


The deterministic, repeatable process of human (and now machine) judgement and semantic processing?


In my case I had hundreds of invoices in a not-very-consistent PDF format which I had contemporaneously tracked in spreadsheets. After data extraction (pdftotext + OpenAI API), I cross-checked against the spreadsheets, and for any discrepancies I reviewed the original PDFs and old bank statements.

The main issue I had was that it was surprisingly hard to get the model to consistently strip commas from dollar values, which broke the CSV output I asked for. I gave up on prompt-engineering it to perfection and just looped around it with a regex check.
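
Roughly this kind of loop, for what it's worth (an illustrative sketch, not the exact code; it assumes the amounts come back as strings like '1,234.56'):

    # Illustrative sketch of the regex cleanup pass (the value format is assumed).
    import re

    MONEY_WITH_COMMAS = re.compile(r'\d{1,3}(?:,\d{3})+(?:\.\d+)?')

    def strip_thousands_separators(value: str) -> str:
        # '1,234.56' -> '1234.56'; anything else passes through untouched
        if MONEY_WITH_COMMAS.fullmatch(value.strip()):
            return value.strip().replace(',', '')
        return value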

Otherwise, accuracy was extremely good and it surfaced a few errors in my spreadsheets over the years.


I hope there's a future where CSV commas don't screw up data. I know it will never happen, but it's a nightmare.

Everyone has a story of a csv formatting nightmare


For what it's worth, I did check over many hundreds of them. Formatted things for side by side comparison and ordered by some heuristics of data nastiness.

It wasn't a one-shot deal at all. I found the ambiguous modalities in the data and hand-corrected examples to include in the prompt. After about 10 corrections and some exposition about the cases it seemed to misunderstand, it got really good. Edit: not too different from a feedback loop with an intern ;)


Though the same logic can be applied everywhere, right? Even if it's done by human interns, you need to audit everything to be 100% confident, or just place some trust in them.


Not the same logic because interns can make meaning out of the data - that’s built-in error correction.

They also remember what they did - if you spot one misunderstanding, there’s a chance they’ll be able to check all similar scenarios.

Comparing the mechanics of an LLM to human intelligence shows deep misunderstanding of one, the other, or both - if done in good faith of course.


Not sure why you're trying to conflate intellectual capability with this and complicate the argument. The problem layout is the same: you delegate the work to someone, so you cannot check all the details yourself. This creates a fundamental tension between trust and confidence. The parameters might differ with intellectual capability, but whomever you delegate to, you cannot evade this trade-off.

BTW, not sure if you have experience delegating work to human interns or new grads and being rewarded with disastrous results? I've done that multiple times and don't trust anyone too much. This is why we typically develop review processes, guardrails, etc.


> not sure if you have experience delegating work to human interns or new grads and being rewarded with disastrous results?

Oh yes I have ;)

Which is why I always explain the why behind the task.


You can use AI to verify its own work. Last time I split a C++ header file into header + implementation file. I noticed some code got rewritten in a wrong manner, so I asked it to compare the new implementation file against the original header file, but to do so one method at a time. For each method, say whether the code is exactly the same and has the same behavior, ignoring superficial syntax changes and renames. Took me a few times to get the prompt right, though.
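
Something along these lines (a sketch; the prompt wording is paraphrased, and call_model stands in for whichever client you use):

    # Sketch of the verification pass; call_model is a placeholder for any LLM client.
    VERIFY_PROMPT = (
        "Compare the new implementation file against the original header file, "
        "one method at a time. For each method, say whether the code is exactly "
        "the same and has the same behavior, ignoring superficial syntax changes "
        "and renames."
    )

    def verify_split(original_header: str, new_impl: str, call_model) -> str:
        return call_model(
            f"{VERIFY_PROMPT}\n\nORIGINAL HEADER:\n{original_header}"
            f"\n\nNEW IMPLEMENTATION:\n{new_impl}"
        )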


Many types of data have very easily checkable aggregates. Think accounting books.
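
For invoice-style data, a cheap reconciliation check might look like this (field name and tolerance are made up for the example):

    # Sketch: reconcile extracted line items against a known ledger total.
    def totals_match(extracted_rows, ledger_total, tolerance=0.01):
        extracted_total = sum(row["amount"] for row in extracted_rows)
        return abs(extracted_total - ledger_total) <= tolerance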


It also depends on what you're using the data for; if it's for decisions that aren't based on precise data, then it's fine. Especially if you're looking for "vibe"-based decisions before dedicating the time to "actually" process the data for confirmation.

$30 to get a view into data that would otherwise take x hours of someone's time is super cheap, especially if the decision from that result is whether or not to invest the x hours to confirm it.


You take a sample and check.
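
Something like this, roughly (a sketch of the idea):

    # Sketch: hand-check a random sample, then bound the error rate.
    import random

    def sample_for_review(records, n=100, seed=0):
        random.seed(seed)
        return random.sample(records, min(n, len(records)))

    # "Rule of three": if 0 errors turn up in n checked records, the true error
    # rate is below roughly 3/n with ~95% confidence (e.g. 100 clean -> < 3%).
    def rule_of_three_bound(n_checked):
        return 3 / n_checked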


In my professional opinion they can extract data at 85-95% accuracy.


> I'm leveraging it for massive refactoring now and it is almost magical.

Can you share more about your strategy for "massive refactoring" with Gemini?

Like the steps in general for processing your codebase, and even your main goals for the refactoring.


Isn't it better to get Gemini to create a tool to format the data? Or was it in such a state that that would have been impossible?


What tool are you using 2.5-pro-exp through? Cline? Or the browser directly?


For 2.5 Pro Exp I've been attaching files into AI Studio in the browser in some cases. In others, I have been using VS Code's Gemini Code Assist, which I believe recently started using 2.5 Pro. Though at one point I noticed it was acting noticeably dumber, and over in the corner, sure enough, it warned that it had reverted to 2.0 due to heavy traffic.

For the bulk data processing I just used the python API and Jupyter notebooks to build things out, since it was a one-time effort.
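
The loop was roughly this shape (a minimal sketch, assuming the google-generativeai package; the model name, prompt, and record format are placeholders):

    # Minimal sketch (assumes the google-generativeai package; model name,
    # prompt, and record format are placeholders, not the real ones).
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-2.0-flash")

    INSTRUCTIONS = "Convert the raw client record below into the agreed JSON schema."

    def convert_record(raw_record: str) -> str:
        response = model.generate_content([INSTRUCTIONS, raw_record])
        return response.text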


Copilot experimental (needs VS Code Insiders) has it. I've thought about trying aider --watch-files though; it also works with multiple files.


I'd consider drinking raw milk only if I was on a first name basis with the cow that produced it.

Otherwise I would at least demand it be fermented into kefir so the food microbes can muscle out the bad.


That won't make a difference. Bacteria are something you cannot see, so you have no idea what is on/in the cow.


It sure can make a difference.

Sickness caused by bacteria doesn't happen as soon as one bad bacterium enters your body; a certain critical mass is usually required. This is very similar to the concept of "viral load", where a certain amount of viral genetic material needs to be exchanged before the viral infection can take hold.

The "beneficial bacteria" on your skin and in your gut make it harder for bad bacteria to take root in many different ways, one of them simply being they provide competition, "crowding out the bad guys".

Another way is that many, many, many types of antibiotics were originally discovered as metabolites produced by bacteria and fungi (examples include penicillin, streptomycin, chloramphenicol, and tetracycline).

And for completeness' sake, milk kefir contains many Lactobacillus species that are also a natural part of the mammalian microbiome (which makes sense when you think about it; Lactobacillus are named for consuming lactose, an ingredient of mammalian milk).


I'll buy an expensive hat and eat it, though, if the chans aren't already crawling with sinister propagand-anon-automatons playing tug of war with the Overton window of edgelord discourse.


So now it just sounds like my favorite imported soy sauce will be more expensive.

Can't wait for the Pittsburgh soy sauce brewery industry to be onshored again!


It's funny. I view it as a common modality of fraud among the 'cufflinked bozo with a sharp haircut' founder crowd. That is, they probably could have actually pulled off their business plan if they had any ability beyond being able to 'talk a big game.'

LLMs are mostly 'there' if one knows how to use them. Maybe they weren't when they started their business, but what kind of leader getting millions in funding doesn't understand the 2nd and 3rd order derivatives of acceleration in their space? Bozos.


The world is run by people with no ability beyond being able to 'talk a big game.' Business promotes people with no ability beyond being able to 'talk a big game.' Investors fund people with no ability beyond being able to 'talk a big game.' It's all talk and bullshit, all the way up the totem pole.


Feature, not a bug. It's capitalism, not meritocracy.


Borgmon readability was my... least favorite readability.

Is 'monarch' still a thing? It was newish around the time that I left.


Monarch is very much still the recommended monitoring backend.


But Monarch is the backend, so people rarely interact with it. As of the time I left Google, the UI is automon, the language to write alert rules in is gmon, the language to do interactive exploration in is mash, and the CLI to interact with retentions and metrics etc is monarch_tool. It didn't matter if the backend was Monarch or not.


Sure. Most engineers will probably poke around on the UI to look at graphs, and write any special monitoring configs in GMon (a DSL in Python, which is then transpiled to mash). Directly interacting with monarch_tool should be rare, as monitoring integration with rollout automation is also provided by default.

Further, the entire point of automon is to automatically generate common monitoring dashboards, which you should expect to be sufficient if you're creating a bog-standard setup.

> It didn't matter if the backend was Monarch or not.

It totally does. Borgmon has a totally different data model, a custom query language, its own UI, and various quirks along the way. To add insult to injury, it was a very real thing where if you wanted to set up new monitoring, you needed to get someone with Borgmon readability to approve your change (that requirement has since been lifted). Meanwhile, today, you don't need anyone with Python readability to ever look at your GMon code.

You can have Automon graphs that fetch from Borgmon under the hood, but everything else that you've described is 100% Monarch-specific.


DaVinci Resolve is a lifetime license for an extremely powerful video production platform. It's a great value even at a serious-hobbyist level.


Buying Studio grants a lifetime license to the current major version only. It's just that DaVinci has been upgrading everyone for free with every major update. They can stop doing that at any moment.

Not that they'll ever do that. Resolve / Studio is their loss-leader product to pull people into their very premium camera ecosystem.


I don't know... I've maintained skepticism, but recently AI has enabled solutions for client problems that would have been intractable with conventional coding. A team was migrating a years-old Excel-based workflow where no fewer than 3 spreadsheets contained thousands of call notes, often with multiple notes stuffed into the same column, separated inconsistently by a shorthand date and the initials of whoever was on the call. Sometimes with text arrows or other meta-descriptions like "(all calls after 3/5 were handled by Tim)". They wanted to move all of this into structured Jira tickets and child tickets.

Joining the mess of freeform, redundant, and sometimes self-contradicting data into JSON lines and feeding it into the AI with a big explicit prompt containing example conversions and corrections for possible pitfalls resulted in almost magically good output. I added a 'notes' field to the output and instructed the model to call out anything unusual, and it caught lots of date typos by context, ambiguously attributed notes, and more.
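
The prompt assembly was conceptually something like this (an illustrative sketch; the schema, example content, and helper names are invented, not the real client data):

    # Illustrative sketch of the prompt construction; schema, example content,
    # and names are made up for the example.
    import json

    HAND_CORRECTED_EXAMPLES = [
        {
            "input": "3/5 JS/TB scope creep, see note above",
            "output": {"date": "2024-03-05", "participants": ["JS", "TB"],
                       "summary": "Scope creep discussed",
                       "notes": "date year inferred from surrounding rows"},
        },
        # ...roughly ten more corrections covering the ambiguous cases
    ]

    def build_prompt(record: dict) -> str:
        parts = [
            "Convert each call note into the ticket schema shown in the examples.",
            "Use the 'notes' field to call out anything unusual: suspected date "
            "typos, ambiguous attribution, contradictions, and so on.",
            "Worked examples:",
        ]
        parts += [json.dumps(ex) for ex in HAND_CORRECTED_EXAMPLES]
        parts += ["Record to convert:", json.dumps(record)]
        return "\n".join(parts)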

It would have been a man-month or so of soul-drowningly tedious and error-prone intern-level work, but instead it was 40 minutes and $15 of Gemini usage.

So, even if it's not a galaxy brained super intelligence yet, it is a massive change to be able to automate what was once exclusively 'people' work.


Or, more likely: "the wicked and deceptive establishment have yet again managed to sabotage the brilliant and noble efforts of America's best ever leader. The only way America can be saved now is to suppress these treacherous anti American rabble rousers so that they can't keep the real American heros down."


Implement all this jazz with s-expressions and I am on board!


