
This mirrors what I've seen. I've found that LLMs are most helpful in places where I have the most experience.

Maybe this is because of explicitness in the prompt and preempting edge cases. Maybe it's because I know exactly what should be done. In these cases, I will still sometimes be surprised by a more complete answer than I was envisioning, or by a few edge cases that weren't front of mind.

But if I have _no_ idea, things go wildly off course. I was doing some tricky frontend work with dynamically placed reactflow nodes and Bézier curve edges. It took me easily 6 hours of bashing my head against the problem, and it was hard to stop using the assistant because of sunk cost. But I probably would have gotten more out of it, and been faster, if I'd just sat down, really broken the problem down for a few hours, and then moved to implement.

The most tempting use of LLMs is letting them figure out the design when you're in a time crunch. And the way they solve things when you already understand the domain and have a bottom-up view of the work gives a deceptive impression of their capability.

And in this case, it's hoping that people on Upwork understand their problems deeply. If they did, they probably wouldn't be posting on Upwork; that understanding is exactly what they're trying to pay for.


I just had this conversation with a customer. It’s hard to avoid anthropomorphizing AI. Once you equate the AI system with a human, you apply human heuristics: a human who produces perfectly PEP 8-formatted Python is probably a decent Python programmer, whereas someone who bangs out barely readable code with mixed spacing and variable-naming styles is most likely a novice.

We use these signals to gauge how much we should trust the code, and the same goes for written text. Poorly constructed sentences? Gaps or pauses? Maybe that person isn’t as knowledgeable.

These shortcuts fail miserably on a system that generates perfect grammar by default. Bring stereotypes gleaned from dealing with humans into the AI world, and you’re in for an unpleasant surprise when you unpack the information and find it’s only about 75% correct, despite the impeccable grammar.


> But if I have _no_ idea things go wildly off course.

This is the key to getting some amount of productivity from LLMs, in my experience: the ability to spot very quickly when they veer off course into fantasyland, and to nip it in the bud.

Then you point out the issue, they agree that they made a dumb mistake and fix it, you ask them to build on what you just agreed to, and they reintroduce the very issue they just conceded was an obvious problem... because ultimately they are fancy autocomplete machines more than they are actual thinking machines.

I have found them to be a time saver on the whole, even when working with new languages, but I think this is helped in large part by the fact that I have literally decades of coding experience that sets off my spidey sense as soon as they start to go off the rails.

I can't begin to imagine how comical it must be when someone who doesn't have a strong programming foundation blindly trusts these things to produce useful code until the runtime or compile-time bugs become unavoidably obvious.


Huge swathes of the country do not want to be involved in Ukraine. Positioning this as “sucking up to Putin” seems intentionally inflammatory.


Huge swaths of the country didn't want to be involved in WW1 and WW2 either. Look how well that worked out.


I’m not sure that I follow. I could say the same about Vietnam and Afghanistan. The situation in both world wars was materially different.


You could ask "Is Putin more like Hitler or is Putin more like Ho Chi Minh?"

Putin does not try to hide the fact that he wants to restore the Russian empire and reconquer the former Soviet bloc - a group of peoples who want nothing to do with him.

Ho Chi Minh wanted an independent Vietnam, got it, and never really expanded from there.

We either help the Ukrainians stop Putin now or we fight a much bigger fight later. Hitler could easily have been stopped at the Rhineland, or at Czechoslovakia. But instead we got "Peace for our time".


But it ruins the sear? Seems like an odd nit.


You can still sear it at the end with a tiny bit of butter.


No, you still get a dark brown sear.


The spatial awareness is what grounding models try to achieve, e.g. UGround [1].

[1] https://huggingface.co/osunlp/UGround-V1-7B
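Roughly, querying a grounding model looks like the sketch below: a minimal, untested example assuming UGround-V1-7B follows the standard Qwen2-VL chat interface on Hugging Face (the screenshot path and the instruction text are made up):

  # Hedged sketch: assumes UGround-V1-7B loads via the Qwen2-VL classes.
  from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
  from PIL import Image

  model = Qwen2VLForConditionalGeneration.from_pretrained(
      "osunlp/UGround-V1-7B", torch_dtype="auto", device_map="auto")
  processor = AutoProcessor.from_pretrained("osunlp/UGround-V1-7B")

  image = Image.open("screenshot.png")  # placeholder screenshot
  messages = [{"role": "user", "content": [
      {"type": "image"},
      {"type": "text", "text": "Locate the 'Submit' button."},  # made-up instruction
  ]}]
  text = processor.apply_chat_template(messages, add_generation_prompt=True)
  inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
  out = model.generate(**inputs, max_new_tokens=64)
  # The model answers with screen coordinates for the requested element.
  print(processor.batch_decode(out, skip_special_tokens=True)[0])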


PaliGemma on computer use data is absolutely not good. The difference between a fine-tuned YOLO model and a fine-tuned PaliGemma model is huge if generic bboxes are what you need. Microsoft's OmniParser also winds up using a YOLO backbone [1]. All of the browser-automation tools (like our friends at browser-use [2]) wind up trying to get a generic set of bboxes using the DOM and then applying generative models.

PaliGemma seems to fit into a completely different niche right now (VQA and Segmentation) that I don't really see having practical applications for computer use.

[1] https://huggingface.co/microsoft/OmniParser [2] https://github.com/browser-use/browser-use
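For the generic-bbox route, the shape of it is something like this sketch (assuming the ultralytics package; "ui_elements.pt" is a hypothetical fine-tuned checkpoint, not a published model):

  # Hedged sketch: a fine-tuned YOLO model emitting generic UI bboxes.
  from ultralytics import YOLO

  model = YOLO("ui_elements.pt")   # hypothetical weights fine-tuned on UI screenshots
  results = model("screenshot.png")  # placeholder input image

  for box in results[0].boxes:
      x1, y1, x2, y2 = box.xyxy[0].tolist()
      print(f"class={int(box.cls)} conf={float(box.conf):.2f} "
            f"bbox=({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")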


Nit: the drawback of “not working well in disk-based systems” isn’t a drawback unless you’re already using disk-based systems.

The difference in recall is also significant: what you really get with HNSW is a system built to give a good cost-to-quality ratio for approximate search. The IVF-PQ-based systems are ones I’ve seen people rip out and replace when the use case is high value.
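To make the tradeoff concrete, here's a rough FAISS sketch of the comparison (random data and arbitrary parameters, purely illustrative, not a benchmark):

  # Hedged sketch: HNSW vs. IVF-PQ recall on the same synthetic data.
  import numpy as np
  import faiss

  d, n, nq, k = 128, 100_000, 1_000, 10
  xb = np.random.rand(n, d).astype("float32")
  xq = np.random.rand(nq, d).astype("float32")

  # Ground truth via exact search.
  flat = faiss.IndexFlatL2(d)
  flat.add(xb)
  _, gt = flat.search(xq, k)

  # HNSW: graph-based, memory-resident, high recall per unit latency.
  hnsw = faiss.IndexHNSWFlat(d, 32)
  hnsw.add(xb)
  _, I_hnsw = hnsw.search(xq, k)

  # IVF-PQ: quantized and disk-friendlier, but recall suffers at a similar budget.
  quantizer = faiss.IndexFlatL2(d)
  ivfpq = faiss.IndexIVFPQ(quantizer, d, 1024, 16, 8)
  ivfpq.train(xb)
  ivfpq.add(xb)
  ivfpq.nprobe = 16
  _, I_ivfpq = ivfpq.search(xq, k)

  def recall_at_k(I):
      return np.mean([len(set(I[i]) & set(gt[i])) / k for i in range(nq)])

  print("HNSW recall@10:  ", recall_at_k(I_hnsw))
  print("IVF-PQ recall@10:", recall_at_k(I_ivfpq))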

I really don’t understand the push to make pg do everything. It wasn’t designed for search, and trying to shove these features into the platform feels like some misguided cost optimization that puts all of your data infrastructure on the same critical path.


Hi, I'm the author of the article. In our actual product, VectorChord, we adopted a new quantization algorithm called RaBitQ. The accuracy has not been compromised by the quantization process. We’ve provided recall-QPS comparison curves against HNSW, which you can find in our blog: https://blog.pgvecto.rs/vectorchord-store-400k-vectors-for-1....

Many users choose PostgreSQL because they want to query their data across multiple dimensions, leveraging time indexes, inverted indexes, geographic indexes, and more, while also being able to reuse their existing operational experience. From my perspective, vector search in PostgreSQL does not have any disadvantages compared to specialized vector databases so far.
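The "multiple dimensions" point looks roughly like this in practice (a hedged sketch using plain pgvector operator syntax; the DSN, table, columns, and placeholder embedding are made up, and VectorChord's index DDL differs):

  # Hedged sketch: one query combining a time filter with vector ordering.
  import psycopg

  query_vec = "[" + ",".join(["0.1"] * 768) + "]"  # placeholder embedding
  with psycopg.connect("dbname=app") as conn:  # made-up DSN
      rows = conn.execute(
          """
          SELECT id, title
          FROM documents
          WHERE created_at > now() - interval '30 days'  -- reuses a btree/time index
          ORDER BY embedding <=> %s::vector              -- reuses the vector index
          LIMIT 10
          """,
          (query_vec,),
      ).fetchall()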


But why are you benchmarking against pgvector HNSW, which is known to struggle with recall and performance at large numbers of vectors?

Why is the graph measuring precision and not recall?

The feature dump is entirely a subset of Vespa features.

This is just an odd benchmark. I can tell you in the wild, for revenue attached use cases, I saw _zero_ companies choose pg for embedding retrieval.


What are you using for HNSW? Is the implementation handwritten? I’ve seen it used well past XXm full-precision vectors with real-time updates.


pgvector - I wasn't able to get HNSW to build/rebuild quickly [enough] with a few million vectors. Very possibly I was doing something wrong, but fun research time ran out and I needed to get back to building features.
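(For reference, the knobs that usually matter for pgvector's HNSW build time are memory and parallelism; something like this sketch, with illustrative values, and parallel builds need pgvector >= 0.6:)

  # Hedged sketch: settings that typically speed up a pgvector HNSW build.
  import psycopg

  with psycopg.connect("dbname=app", autocommit=True) as conn:  # made-up DSN
      conn.execute("SET maintenance_work_mem = '8GB'")
      conn.execute("SET max_parallel_maintenance_workers = 7")
      conn.execute("""
          CREATE INDEX CONCURRENTLY items_embedding_idx
          ON items USING hnsw (embedding vector_l2_ops)
          WITH (m = 16, ef_construction = 64)
      """)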


Yes: real-time updates, filtering, and multi-vector support make most of these on-device, in-memory approaches untenable. If you really are just doing a similarity search against a fixed set of things, you often know the queries ahead of time and can just make a lookup table.
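The lookup-table idea is as simple as it sounds; a toy sketch with made-up embeddings:

  # Hedged sketch: precompute neighbors for known queries, skip ANN at serving time.
  import numpy as np

  corpus = np.random.rand(10_000, 128).astype("float32")  # fixed item embeddings
  known_queries = {"pricing page": np.random.rand(128).astype("float32")}  # illustrative

  lookup = {
      name: np.argsort(corpus @ q)[::-1][:10]  # top-10 by dot product
      for name, q in known_queries.items()
  }
  print(lookup["pricing page"])  # serving is now a dict lookup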


Seconding Gemini Flash for structured outputs. I’ve had some quite large jobs I’ve been happy with.
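The setup is roughly this (a hedged sketch with the google-genai Python SDK; the model name, schema, and prompt are illustrative, and it assumes an API key in the environment):

  # Hedged sketch: structured output from Gemini Flash via a response schema.
  from pydantic import BaseModel
  from google import genai

  class Invoice(BaseModel):  # made-up schema
      vendor: str
      total: float

  client = genai.Client()  # reads GEMINI_API_KEY from the environment
  response = client.models.generate_content(
      model="gemini-2.0-flash",  # illustrative model name
      contents="Extract the vendor and total: 'ACME Corp invoice, $42.50 due'",
      config={
          "response_mime_type": "application/json",
          "response_schema": Invoice,
      },
  )
  print(response.parsed)  # parsed into an Invoice instance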


No, the IFR of Covid was hugely overstated, which is why the projected population-level impacts were completely wrong, even in places with limited interventions. Attributing cause of death is not as easy as it might seem.

