This mirrors what I've seen. I've found that LLMs are most helpful in places where I have the most experience.
Maybe this is because of explicitness in the prompt and preempting edge cases. Maybe it's because I know exactly what should be done. In these cases, I will still sometimes be surprised by a more complete answer than I was envisioning, or by a few edge cases that weren't front of mind.
But if I have _no_ idea things go wildly off course. I was doing some tricky frontend work with dynamically placed reactflow nodes and bezier curve edges. It took me easily 6 hours of bashing my head against the problem, and it was hard to stop using the assistant because of sunk cost. But I probably would have gotten more out of it and been faster if I'd just sat down and really broken down the problem for a few hours and then moved to implement.
The most tempting part of LLMs is letting them figure out design when you're in a time crunch. And the way they solve things when you already understand the domain and have a bottom-up view of the work gives a deceptive impression of their capability.
And in this case, it's hoping that people on Upwork understand their problems deeply. If they did, they probably wouldn't be posting on Upwork; that's what they're trying to pay for.
I just had this conversation with a customer. And it’s hard to avoid anthropomorphizing AI. Once you equate the AI system with a human, the usual heuristics kick in: a human who writes perfectly PEP 8-formatted Python is probably a decent Python programmer, whereas someone who bangs out barely readable code with mixed spacing and variable naming styles is most likely a novice.
We use these signals to indicate how much we should trust the code - same with written text. Poorly constructed sentences? Gaps or pauses? Maybe that person isn’t as knowledgeable.
These shortcuts fail miserably on a system that generates perfect grammar, so when you bring stereotypes gleaned from dealing with humans into the AI world, you’re in for an unpleasant surprise when you unpack the info and find it’s only about 75% correct, despite the impeccable grammar.
> But if I have _no_ idea things go wildly off course.
This is the key to getting some amount of productivity out of LLMs, in my experience: the ability to spot very quickly when they veer off course into fantasyland and nip it in the bud.
Then you point out the issue, they agree they made a dumb mistake and fix it; then you ask them to build on what you just agreed to, and they go and reintroduce the same issue they just acknowledged was an obvious problem... because ultimately they are more fancy autocomplete machines than actual thinking machines.
I have found them to be a time saver on the whole even when working with new languages but I think this may in large part be helped by the fact that I have literally decades of coding experience that sets off my spidey senses as soon as they start going rampant.
I can't begin to imagine how comical it must be when someone who doesn't have a strong programming foundation just blindly trusts these things to produce useful code until the runtime or compile time bugs become unavoidably obvious.
You could ask "Is Putin more like Hitler or is Putin more like Ho Chi Minh?"
Putin does not try to hide the fact that he wants to restore the Russian empire and reconquer the former Soviet bloc - a group of peoples who want nothing to do with him.
Ho Chi Minh wanted an independent Vietnam, got it, and never really expanded from there.
We either help the Ukrainians stop Putin now or we fight a much bigger fight later. Hitler could have easily been stopped at the Rhineland, or at Czechoslovakia. But instead we got "Peace for our time".
PaliGemma on computer use data is absolutely not good. The difference between a fine-tuned YOLO model and a fine-tuned PaliGemma model is huge if generic bboxes are what you need. Microsoft's OmniParser also winds up using a YOLO backbone [1]. All of the browser use tools (like our friends at browser-use [2]) wind up getting a generic set of bboxes from the DOM and then applying generative models.
PaliGemma seems to fit into a completely different niche right now (VQA and Segmentation) that I don't really see having practical applications for computer use.
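To make the "bboxes from the DOM" point concrete, here is a rough sketch (mine, not browser-use's actual code) of pulling generic bounding boxes for interactive elements straight from the page with Playwright. The selector list and filtering are purely illustrative:

```python
# Sketch of the DOM-first approach: collect generic bboxes without any vision model.
# Assumes Playwright is installed; selectors and filtering are illustrative only.
from playwright.sync_api import sync_playwright

JS_COLLECT_BBOXES = """
() => [...document.querySelectorAll('a, button, input, select, textarea, [role=button]')]
  .map(el => {
    const r = el.getBoundingClientRect();
    return { tag: el.tagName, x: r.x, y: r.y, w: r.width, h: r.height };
  })
  .filter(b => b.w > 0 && b.h > 0)  // drop zero-size (hidden) elements
"""

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    bboxes = page.evaluate(JS_COLLECT_BBOXES)
    browser.close()

print(bboxes[:5])
```

From there you can crop or annotate those regions and hand them to whatever generative model you like; the point is that the box proposals themselves come from the DOM, not from PaliGemma.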
Nit: the drawback of “not working well in disk-based systems” isn’t a drawback unless you’re already using disk-based systems.
The difference in recall is also significant: what you really get with HNSW is a system built to give a good tradeoff between cost and approximation quality. These IVFPQ-based systems are the ones I’ve seen people rip out and replace when the use case is high value.
I really don’t understand the push to make pg do everything. It wasn’t designed for search, and trying to shove these features into the platform feels like some misguided cost optimization that puts all of your data infrastructure on the same critical path.
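To put numbers on "cost vs. approximation quality", here is a hedged sketch (hnswlib, random data, and made-up parameters, so purely illustrative) of measuring recall@10 against exact brute-force search as the ef knob changes:

```python
# Measure HNSW recall against exact search at a few ef settings (illustrative only).
import numpy as np
import hnswlib

dim, n, n_queries, k = 128, 50_000, 200, 10
rng = np.random.default_rng(0)
data = rng.standard_normal((n, dim)).astype(np.float32)
queries = rng.standard_normal((n_queries, dim)).astype(np.float32)

# Exact ground truth by brute force (squared L2, computed as one matrix expression).
dists = (queries ** 2).sum(1, keepdims=True) - 2 * queries @ data.T + (data ** 2).sum(1)
truth = np.argsort(dists, axis=1)[:, :k]

index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data, np.arange(n))

for ef in (10, 50, 200):
    index.set_ef(ef)  # higher ef = more work per query, better recall
    labels, _ = index.knn_query(queries, k=k)
    recall = np.mean([len(set(l.tolist()) & set(t.tolist())) / k
                      for l, t in zip(labels, truth)])
    print(f"ef={ef:>3}  recall@{k} = {recall:.3f}")
```

Higher ef buys recall at the cost of more distance computations per query; that curve of recall against query cost is the tradeoff being argued about here.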
Hi, I'm the author of the article. In our actual product, VectorChord, we adopted a new quantization algorithm called RaBitQ. The accuracy has not been compromised by the quantization process. We’ve provided recall-QPS comparison curves against HNSW, which you can find in our blog: https://blog.pgvecto.rs/vectorchord-store-400k-vectors-for-1....
Many users choose PostgreSQL because they want to query their data across multiple dimensions, leveraging time indexes, inverted indexes, geographic indexes, and more, while also reusing their existing operational experience. From my perspective, vector search in PostgreSQL does not have any disadvantages compared to specialized vector databases so far.
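To illustrate the "multiple dimensions" point, here is a minimal sketch of the kind of mixed query people stay on PostgreSQL for: a pgvector similarity search combined with an ordinary time-range filter in a single statement. The table, columns, and connection string are hypothetical:

```python
# Combine a pgvector nearest-neighbor search with a plain time-range filter.
# Hypothetical schema: documents(id, title, created_at timestamptz, embedding vector(768)).
import psycopg

query_embedding = [0.1] * 768  # stand-in for a real embedding
vec_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"

sql = """
SELECT id, title, created_at
FROM documents
WHERE created_at >= now() - interval '30 days'
ORDER BY embedding <-> %s::vector   -- pgvector L2 distance operator
LIMIT 10;
"""

with psycopg.connect("dbname=app") as conn:
    for row in conn.execute(sql, (vec_literal,)).fetchall():
        print(row)
```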
What are you using for HNSW? Is the implementation handwritten? I’ve seen it used well past XXm full-precision vectors with real-time updates.
pgvector - I wasn't able to get HNSW to build/rebuild quickly [enough] with a few million vectors. Very possibly I was doing something wrong, but fun research time ran out and I needed to get back to building features.
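For reference, the knobs that usually matter for pgvector HNSW build time are the index build parameters plus PostgreSQL's maintenance memory and parallelism settings. A hedged sketch with a hypothetical table and illustrative values (parallel HNSW builds need a reasonably recent pgvector):

```python
# Tune the session, then build the HNSW index; values are illustrative, not recommendations.
import psycopg

with psycopg.connect("dbname=app") as conn:
    conn.execute("SET maintenance_work_mem = '2GB';")          # let the graph fit in memory
    conn.execute("SET max_parallel_maintenance_workers = 4;")  # parallelize the build
    conn.execute("""
        CREATE INDEX IF NOT EXISTS documents_embedding_hnsw
        ON documents USING hnsw (embedding vector_l2_ops)
        WITH (m = 16, ef_construction = 64);  -- lower values build faster, cost some recall
    """)
    conn.commit()
```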
Yes, real-time updates, filtering, and multi-vector support make most of these on-device, in-memory approaches untenable. If you really are just doing a similarity search against a fixed set of things, you often know the queries ahead of time and can just make a lookup table.
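The lookup-table idea is about as simple as it sounds; a toy sketch with made-up data and hypothetical query names:

```python
# If the corpus is fixed and the queries are known up front, precompute top-k once.
import numpy as np

rng = np.random.default_rng(1)
corpus = rng.standard_normal((10_000, 384)).astype(np.float32)
known_queries = {  # hypothetical, fixed ahead of time
    "pricing page": rng.standard_normal(384).astype(np.float32),
    "login form": rng.standard_normal(384).astype(np.float32),
}

def top_k(q, k=5):
    d = ((corpus - q) ** 2).sum(axis=1)  # squared L2 distances, brute force
    return np.argsort(d)[:k].tolist()

lookup = {name: top_k(vec) for name, vec in known_queries.items()}  # built offline

print(lookup["pricing page"])  # served with a dict lookup at runtime, no ANN index needed
```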
No, the IFR of Covid was hugely overstated, which is why the projected population level impacts were completely wrong, even in places with limited interventions. Attributing cause of death is not as easy as it might seem.