I'm sure that was more convenient, but did it provide anything that the API doesn't? Asking because I'm planning to use the API to get a dump of all my bookmarks soon—I've always had a very pleasant experience using the Pinboard API, compared to many others.
Edit: I just checked and it's a single HTTP call to get all bookmarks, very simple. Granted, I 'only' have ~800 bookmarks, so it may get trickier with larger archives.
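For anyone planning the same dump, here's a minimal stdlib-only sketch of that single call to the `posts/all` endpoint (the auth token value is a placeholder; note Pinboard rate-limits this endpoint, so don't poll it in a loop):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.pinboard.in/v1/posts/all"

def build_dump_url(auth_token: str) -> str:
    # One GET with format=json returns every bookmark in a single response.
    return f"{API_BASE}?{urlencode({'auth_token': auth_token, 'format': 'json'})}"

def dump_bookmarks(auth_token: str) -> list:
    # Pinboard rate-limits posts/all, so call this sparingly.
    with urlopen(build_dump_url(auth_token)) as resp:
        return json.loads(resp.read())
```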
Because it objectively is. Do you use AI tools? The output is clearly and immediately useful in a variety of ways. I almost don't even know how to respond to this comment. It's like saying “computers can be wrong. why would you ever trust the output of a computer?”
But you needed to tell it that it was wrong 3-4 times. Why do you trust it to be "correct" now? Should it need more flogging? Or was it you who were wrong in the first place?
It's a different thing though. In your case they used a timestamp to manually look at footage and confirm an identity. In OP's case, automated recognition is used to identify and track people in aggregate, at mass scale.
That's the amazing thing about the apparently still enthusiastic acceptance of LLMs for tasks that are better done by experts. We've moved on before, but now it sometimes seems perfectly okay to peddle uncertainty, hallucinations and random results. And no, it's not about democratizing computer use; it's about choosing the right resource for the right job whenever the outcome actually matters. Everything else is a degeneration into vagueness or half-knowledge, not to mention the problems of missing or abandoned data privacy and (software) freedom.
I’d definitely prefer to have a professional certified DBA help me query my hobby project databases. But do you know one that’ll do it for (almost) free and is available 24/7?
Even at work, I can’t hog a single data engineer’s attention for hours on end without at least trying myself first.
If you have created a hobby project database, you have the skills to learn how to query it - it is probably part of your hobby and fun. If not, and if you do not want to open source it to get community support, you are better off cutting out the database part and choosing other technologies.
At work, the situation should not be so different. If your manager cannot provide you with the means to maintain the database according to the business needs, then you / your boss / your team / your company have a problem and should choose technologies that are manageable and/or for which you have enough resources.
You seem to have some pretty detailed understanding of my hobby projects, and the way I enjoy doing them, considering you know neither them nor me!
> If you have created a hobby project database, you have the skills to learn how to query it - it is probably part of your hobby and fun.
Not at all in many cases. Many existing open source projects these days involve a database for better or worse, and I wouldn't enjoy porting their storage layer to something other than whatever database they already use.
I really don't need more engineering rabbit holes to go down in my life, neither at work nor in my free time. I avoid them by defining loose boundaries labeled "here be time sinks" and only crossing them if I'm really curious or it seems enjoyable, but not when I'm trying to get something else done.
LLMs have moved these boundaries somewhat, and I believe for the better.
> If the SQL query fails to execute (due to a syntax error of some kind) it
> passes that error back to the model for corrections and retries up to three
> times before giving up.
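For illustration, that error-feedback loop can be sketched roughly like this; `ask_llm` and `execute_sql` are hypothetical stand-ins, not the tool's actual interface:

```python
class SqlSyntaxError(Exception):
    """Stand-in for whatever syntax-error exception the DB driver raises."""

def query_with_retries(question, ask_llm, execute_sql, max_attempts=3):
    """Ask the model for SQL; on a syntax error, feed the error text back
    to the model and retry, giving up after max_attempts tries."""
    prompt = question
    last_error = None
    for _ in range(max_attempts):
        sql = ask_llm(prompt)
        try:
            return execute_sql(sql)
        except SqlSyntaxError as e:
            last_error = e
            prompt = (f"{question}\n"
                      f"Your previous query failed with: {e}. Please fix it.")
    raise RuntimeError(f"giving up after {max_attempts} attempts: {last_error}")
```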
Funnily enough, syntax errors are the one thing you can completely eliminate in LLMs, simply by masking the output symbol probability vector to just those symbols that are valid continuations.
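In toy form, with a greedy pick standing in for sampling (a real implementation masks the logits before softmax inside the decoding loop):

```python
import math

def constrained_pick(logits, valid_token_ids):
    """Set every grammatically invalid token's logit to -inf, then pick
    the best remaining token; invalid symbols can never be emitted."""
    masked = [logit if i in valid_token_ids else -math.inf
              for i, logit in enumerate(logits)]
    return max(range(len(masked)), key=masked.__getitem__)
```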
Yeah, that's a pretty solid approach if the LLM you're using (and the third party that hosts it, if you're not self-hosting) supports that.
One minor footgun I've seen with that approach: while the model is guaranteed to produce syntactically valid output, a badly designed schema can still "trap" it into emitting something both wrong and low-probability. Contrived example: if you're doing sentiment analysis of reviews and have the model pick from the enumeration ["very negative", "negative", "slightly negative", "neutral", "positive"], the model might encounter a glowing review and write "very", intending to follow it up with " positive"; but since " positive" isn't a valid continuation, it ends up writing "very negative" instead.
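The trap is easy to reproduce with a word-level toy decoder over that exact enumeration (the per-word scores are made up to mimic a model that "wants" to say "very positive"):

```python
LABELS = ["very negative", "negative", "slightly negative",
          "neutral", "positive"]

def valid_next_words(prefix_words):
    """Words that can legally follow the prefix under some label."""
    opts = set()
    for label in LABELS:
        words = label.split()
        if words[:len(prefix_words)] == prefix_words and len(words) > len(prefix_words):
            opts.add(words[len(prefix_words)])
    return opts

# Made-up scores for a glowing review: "very" (en route to the
# nonexistent "very positive") outranks a bare "positive".
PREFERENCE = {"very": 0.9, "positive": 0.8, "neutral": 0.1,
              "slightly": 0.05, "negative": 0.01}

def constrained_greedy():
    prefix = []
    while True:
        opts = valid_next_words(prefix)
        if not opts:
            return " ".join(prefix)
        prefix.append(max(opts, key=lambda w: PREFERENCE.get(w, 0.0)))
```

The decoder happily emits "very", and from there the grammar leaves "negative" as the only continuation.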
What a tired argument. Is this what the field is reduced to? Computers are supposed to be fast, deterministic, and correct. It is profoundly disappointing that we’re regressing to a system that is as mediocre as the average bullshitter and laud it as an achievement.
The thing with humans is that you can build trust. I know exactly who to ask if I have a question about music, or medicine, or a myriad of other topics. I know those people will know the answers and be able to assess their level of confidence in them. If they don’t know, they can figure it out. If they are mistaken, they’ll come back and correct themselves without me having to do anything.
Comparing LLMs to random humans is the wrong methodology. Of course, Upton Sinclair had a point so I don’t expect to convince someone who is monetarily invested in having this broken assumption succeed.
Isn't the answer here the same? Build trust in the AI systems. Benchmarks help and are a start; consistent results over time are another measure. We'll learn over the next few years which models, or which companies developing models, can be trusted for certain sets of tasks. If these exceed the alternatives and are consistent within a margin I can reason about, I'll make use of them, same as any other tool.
I can ask the exact same question of an LLM multiple times and get different answers with the same degree of confidence. Hard to trust that, and also hard to fix.
Which wouldn’t be so problematic if people didn’t just turn off their brains when interacting with them.
Either way, everything you’re suggesting are possibilities for the future, which may or may not pan out. The bad comparisons to humans are happening today.
There are plenty of use cases where this is a great starting place to be able to jump in and start to ask questions.
I have worked on a variety of acquisitions where I didn’t need to have 100% certainty, I just needed a starting place to make sense of some bizarre home grown financials and something like this would’ve been great to be able to quickly probe and come back with thoughtful questions.
Instead I spent my time either tediously figuring out a schema or waiting for someone in finance to come back with ad-hoc analysis that also had a bunch of errors.
> Why are you pretending this is a binary outcome?
You seem confused. My argument is not at all concerned with outcomes and is not binary in the slightest. My point is clearly spelled out:
> Comparing LLMs to random humans is the wrong methodology.
I’m not saying LLMs are never useful or anything of the sort. What I am saying is that defending LLMs on the basis of “some random undefined human would do the same” is a poor argument which shows a deep misunderstanding of human collaboration.
In other words, just like when I run an SQL query myself or ask a team member to do it.
The correct usage here is of course not to just run a natural language query and blindly trust the results. Sanity checking both the query and results is essential.
I still find LLMs incredibly useful for exposing new SQL functionality to me, or for refactoring larger existing queries into a very different approach (as SQL unfortunately does not allow defining query components in a modular way that would let me avoid that).
> refactoring larger existing queries to a very different approach
Huh, that's really interesting. I've found LLMs (mostly Claude) to be pretty bad at writing SQL (they love cross joins for some reason), so it's interesting that others are getting good results. What models are you using, do you do any particular prompt engineering or anything different?
I usually start out by pasting a simple (correct, executing) stub query in, prefacing it with "this query does <general thing I'm trying to do>", and then go step by step: "Filter for x", "now add a join to this other table <definition>" etc., and build the larger query iteratively.
Another pattern I've found pretty useful: "This isn't working. Build a small sample dataset to exercise your query, and I'll paste the results back to you so we can both see what might be going wrong."
Basically, I treat it as an intern, not as an oracle of truth.
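For the curious, the sample-dataset step is trivial to run locally as well; a sketch with an in-memory sqlite3 database (table and column names are made up for illustration):

```python
import sqlite3

# A tiny fixture to exercise a query under debugging.
SAMPLE = """
CREATE TABLE orders (id INTEGER, customer TEXT, total REAL);
INSERT INTO orders VALUES
  (1, 'alice', 10.0),
  (2, 'alice', 15.0),
  (3, 'bob',    7.5);
"""

QUERY = """
SELECT customer, SUM(total) AS spent
FROM orders
GROUP BY customer
ORDER BY spent DESC;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SAMPLE)
rows = conn.execute(QUERY).fetchall()
# rows -> [('alice', 25.0), ('bob', 7.5)]
```

Pasting the fixture plus the actual results back into the conversation gives the model something concrete to reason about.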
Huh that's helpful, thanks! To be fair I mostly only need it for new dialects but my successful attempts with other tools seem to follow the same pattern.
I think a big difference is that people do things in predictably stupid ways, but an LLM? Who knows. That could be because I've worked with stupid people my entire life and know the boundaries of their chaos.
If you read at least 3 paragraphs in, you'll see that this tool attempts to generate a query using an LLM. Dismissing a tool before attempting to understand it is astounding.
If a structured language for querying databases already exists, why wouldn't one use that to query the database?
A tool that returns SQL from human language sounds great, but not one that runs the query unchecked. I speak four (human) languages, and sometimes I'll mistranslate something. Not to mention that this tool won't even bubble up the first two syntax errors it generates; not very confidence-inspiring.
I'm very good at SQL. I still use LLMs to write queries for me on a daily basis because they are faster at it than I am, and I can very quickly review their work to check that they didn't make any mistakes.
Same applies for JavaScript and Python and Bash and AppleScript and Go and jq and dozens of other programming languages.
If you're going to use LLMs as a productivity boost you need to get good at quickly verifying that they've done the right thing. Effectively that means investing more in QA skills and habits, which is generally valuable anyway.
Because it’s honestly quite bad and clunky by modern standards (I specifically miss composability of sub-query components, which makes every query feel like a throwaway effort), many people don’t enjoy learning it, and at least personally my likelihood of getting a given query right without lots of double-checking is pretty low for advanced queries.
I do find it much easier to read/validate than to write though, which makes it an excellent application for LLM usage, in my experience.