Hacker News | nostrebored's comments

Dude, someone did this for a checkbox. It's honestly more shocking that you read it.

What do you think AWS decided to sell? Both companies had a significant interest in making infrastructure easy to create and scale.

AWS had a cleaner host-guest abstraction (the VM) that makes it easier to reason about security, and likely had a much bigger gap between their own usage peaks and troughs.

Yep. Google offered App Engine, which was good for fairly stateless, simple apps in an old, limited version of Python, like a photo gallery or email client. For anything else it was dismal. Amazon offered VMs: useful for a lot more platforms.

I'm legitimately curious -- why do people want to put EVERYTHING into postgres? I don't understand this trend (vector search, full text search, workload orchestration, queues, etc.)

I've built a number of systems that run a database and a separate search index (Elasticsearch, Solr, Xapian). The hardest part by far is keeping the search index in sync with the database. I gave a talk about this a while ago: https://simonwillison.net/2017/Aug/16/denormalized-query-eng...

Using the search engine built into PostgreSQL, MySQL or SQLite makes this problem SO MUCH less difficult.
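
The appeal is that the index lives in the same transaction as the row, so it can never drift out of sync. A minimal PostgreSQL sketch (table and column names are hypothetical) using a generated tsvector column:

```sql
-- The tsvector is recomputed automatically on every INSERT/UPDATE,
-- so there is no separate index to keep in sync.
CREATE TABLE documents (
    id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title  text NOT NULL,
    body   text NOT NULL,
    search tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', title), 'A') ||
        setweight(to_tsvector('english', body),  'B')
    ) STORED
);

CREATE INDEX documents_search_idx ON documents USING GIN (search);

-- Ranked query:
SELECT id, title, ts_rank(search, query) AS rank
FROM documents, websearch_to_tsquery('english', 'sync "search index"') AS query
WHERE search @@ query
ORDER BY rank DESC
LIMIT 10;
```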


Yes. Scale matters here. If it’s just me trying to get shit done, then one system (database, monorepo, whatever) is 100% the best. I’ve done the multiple system thing at scale, and it’s fine if I have the liquidity of engineers to do it and the cost is warranted.

Bundling everything into one system will eventually fall apart. But it’s sooo good while you can do it.

And I am decades past the point where I introduce new shit just to learn it under the guise of “needing” it. Instead I’ll introduce new things I want to learn under the guise of new things I want to learn and it will find the appropriate place (often nowhere, but you never know).


Doing everything in PostgreSQL need not mean doing it all on one database server. We could have different servers for search, queues, analytics (columnar), etc., all replicated (via PG native logical replication) from a vanilla transactional PostgreSQL server.

Now applications need only one technology, not necessarily one server.
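
A sketch of that setup with built-in logical replication (hostnames and table names are made up): the transactional primary publishes, and a search-tuned replica subscribes.

```sql
-- On the vanilla transactional server (publisher):
CREATE PUBLICATION app_data FOR TABLE documents, orders;

-- On the search server (subscriber), which can carry its own
-- FTS indexes and tuning without burdening the primary:
CREATE SUBSCRIPTION search_sub
    CONNECTION 'host=primary dbname=app user=replicator'
    PUBLICATION app_data;
```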


You are right that this is the hardest, and most important, thing in search to get right. It's usually referred to as ETL: extract, transform, load. Extract: for each thing, pull it out of the source for processing. Transform: for each thing, process it by applying some algorithm or algorithms in one or more steps. Load: for each thing, shove it into your store. It's the transform part that is important. Extract and load are usually trivial to implement; I've seen decent implementations of only a few lines of code. Transform is application-specific business logic. E and L are just simple plumbing.

What you query on is not the same as what you store in your DB. And it can be expensive to calculate and re-calculate. Especially at scale. And iterating over all your stuff can be challenging too. It requires IO, memory, CPU, etc. Your application server is the wrong place. And so is your main application database.

The challenge with search is that querying just gets a lot easier if you calculate all the expensive stuff at index time rather than at query time. Different tokenization strategies for different languages, calculating things like page rank, normalization, tokenization, semantic vectors, enriching data with other data (including denormalizing things from other data sources), etc. There are a lot of tricks you can use to make stuff easier to find.

Forgoing all of that indeed makes things simpler and faster. But your search quality will probably suffer. And if you aren't measuring that to begin with, it is probably not great. Doing all these things on write in your main database schema is going to cause other issues (slow writes, lots of schema migrations, complicated logic around CRUD, etc.). The rookie mistake with ETL is just joining the three steps into one thing that then becomes hard to run, evolve, and scale. I see that with a lot of my clients. This is textbook "doing it wrong". It's usually neither fast nor very good at search.

Even if you are going to use postgresql as your main search index, you are probably doing it wrong if your search table/schema isn't decoupled from your main application database via some ETL pipeline. That logic has to live somewhere. Even if it is a bit of a simplistic/limited "do everything on INSERT" kind of thing. That's going to hold back your search quality until you address it. There is no magic feature in postgresql that can address that. Nor in Elasticsearch (though it comes with way more features for this).
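
Even inside a single Postgres instance, "decoupled via ETL" can start as simply as a separate denormalized table that a scheduled job rebuilds. A simplistic sketch (schema is hypothetical), batching the transform instead of doing it on every INSERT:

```sql
-- Toy transform + load pass: denormalize operational tables
-- into a search-only table, run on a schedule rather than
-- inline with application writes.
BEGIN;
TRUNCATE search_documents;
INSERT INTO search_documents (doc_id, search)
SELECT d.id,
       to_tsvector('english',
           d.title || ' ' || d.body || ' ' ||
           coalesce(string_agg(t.name, ' '), ''))
FROM documents d
LEFT JOIN doc_tags t ON t.doc_id = d.id
GROUP BY d.id;
COMMIT;
```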

I've worked with postgresql's FTS a few times. It's pretty limited compared to Elasticsearch. Anybody running performance benchmarks should be running quality benchmarks instead. Being fast is easy if you skip all the difficult stuff. Being high quality and fast is a big challenge. And it's a lot easier with proper tools and a proper ETL pipeline.

And indeed, engineering things such that the two stay in sync requires knowing how to do it properly. I usually start there when I consult for clients looking to level up their homegrown search solutions to something a bit better.

Of course if you do ETL properly, having your DB and search index in the same place stops making sense. And if you are going to separate them, you might as well pick something more optimal for the job. There are a lot of decent solutions out there for this.


If you can avoid adding an extra service without paying too much penalty, it means not having to acquire an extra skill or hire another devops person or keep yet another service in sync / maintained / etc.

The cost of adding services to an app is so much higher than people give it credit for, at organizations of every size, that it's shocking to me more care isn't taken to avoid it. I certainly understand at the enterprise level that the value of a comprehensive system can be worth the cost of a few extra employees or vendors. But if you could flatten all the weird services required by all the weird systems that use them in a 30,000+ employee enterprise and replace them with one database and one web/application server, you'd probably save enough money to justify having done it.


Where I work did an inventory a few years back of their systems and found that we had about the same number of databases (not tables!) as employed engineers, counting all deployed (QA and prod) instances.

The team on that inventory project obviously created a new database to put their data in, plus QA and test replicas. They (probably) have since moved to another DB system but left the old ones running for legacy applications!


Hah, that's table stakes - I have definitely worked at companies with 100 or 1000x that database-to-engineer ratio.

Depending on your database system, databases may even be 1:1 with schemas (in MySQL, a database and a schema are the same thing).


We've been using Elasticsearch + PG and it's pretty nice and fast, but it adds a whole layer of extra stuff to deal with when your data is in PG but then also needs to be indexed into an external server outside of those PG transactions. In our case I'm pretty convinced it hasn't been worth the effort. I think we could've optimized PG to be as fast as we needed with a lot less overhead than dealing with an external search index.

We moved our queues to PG and it cuts out the same kind of overhead to be able to wrap an update and start a job in a transaction. PG has been plenty fast to keep up with our queue demand.
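
The usual pattern for a Postgres-backed queue (a sketch, assuming a hypothetical jobs table) is FOR UPDATE SKIP LOCKED, so multiple workers can dequeue concurrently without blocking each other:

```sql
-- Worker loop body: atomically claim one pending job.
WITH next_job AS (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
UPDATE jobs
SET status = 'running', started_at = now()
FROM next_job
WHERE jobs.id = next_job.id
RETURNING jobs.id, jobs.payload;
```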

Ultimately I think being able to do things transactionally just avoids a whole class of syncing issues, which are basically caching issues, and cache invalidation is one of the 2 hard things.


.. along with naming things and off by one errors.

You can abstract this to any RDBMS, and the justification is that it makes everything a lot faster & easier.

I just got off a call with a client where their developers were using ORM-style abstractions to manipulate data for downstream processing in code, turning what should have been a few seconds of one custom SQL command into several hours of passing objects around multiple computer systems.

If we can put the FTS engine inside the SQL engine, we can avoid the entire space of APIs, frameworks, 3rd parties, latency, etc that goes along with otherwise integrating this feature.

Modern SQL dialects represent universal computation over arguably the best way we know how to structure complex data. Adding custom functions & modules into the mix is mostly just syntactic sugar over what is already available.

There is ZERO honor in offloading what could have been a SQL view into a shitty pile of nested iterators somewhere. I don't understand where any of this energy comes from. The less code the better. It's pure downside.
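
For instance, the kind of aggregation that often ends up as nested loops in application code is a single view definition (hypothetical schema):

```sql
-- Per-customer order totals, computed where the data lives
-- instead of being paged through an ORM and summed in app code.
CREATE VIEW customer_order_totals AS
SELECT c.id AS customer_id,
       c.name,
       count(o.id)                AS order_count,
       coalesce(sum(o.amount), 0) AS lifetime_value
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;
```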


> There is ZERO honor in offloading what could have been a SQL view into a shitty pile of nested iterators somewhere. I don't understand where any of this energy comes from. The less code the better. It's pure downside.

I wholeheartedly agree with you. As to why we use ORMs, the impression I get from the engineers I work with is that many of them a) don’t know SQL and b) feel like it’s “data analyst” stuff and so beneath them to learn it. Real engineering requires objects and inheritance or structs, pointers and arrays (depending on the engineer).

I think it’s the declarative nature of SQL that turns them off.


This. ORMs by and large suck.

I experienced it firsthand again a while ago, but yeah.

I was replacing an application management interface of sorts: largish sets of configuration parameters, ideal for a relational database. But I wanted to treat the combined configuration as a document, since that's what the front-end would send over. I ended up using GORM, which was fine for a little while... but it quickly falls apart, especially when your data model is nested more than one level deep. And then you end up having to figure out "how do I solve X in GORM" and find yourself with limited documentation and a relatively small community whose members quickly burn out of trying to help people.

I'll just write the code next time.


Maintaining a new service sucks. Not being able to do atomic commits to both postgres and the other db sucks.

Avoiding distributed systems problems. Distributed systems are so incredibly hard to get right that I will vertically scale postgres until I hit an insurmountable wall before giving in.

You can also build distributed DBs with PG. For example for a DB with multiple write nodes all you need to do is implement an event shipping model with logical replication where your servers publish their events and subscribe to others, and you need to implement conflict resolution rules, naturally. I think PG's type system can probably be leveraged to make a CRDT system on top (and I bet someone's already done it).
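
A very simplified sketch of the event-shipping part (all names are made up, and conflict resolution still has to live in whatever applies the incoming events):

```sql
-- Each write node appends to its own append-only event table...
CREATE TABLE events_node_a (
    seq     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    at      timestamptz NOT NULL DEFAULT now(),
    payload jsonb NOT NULL
);
CREATE PUBLICATION node_a_events FOR TABLE events_node_a;

-- ...and every other node subscribes to that stream, applying
-- events with its own conflict rules (e.g. last-writer-wins on "at").
CREATE SUBSCRIPTION from_node_a
    CONNECTION 'host=node-a dbname=app user=replicator'
    PUBLICATION node_a_events;
```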

IBM mainframes were created for you. Imagine you had a single computer that had multiple nines reliability. Hot swappable disk, RAM, CPU. Redundant power supply. Redundant network stack. OS designed to never need restarting. That's basically what a mainframe is, and IBM sells billions of dollars worth of them to this day.

People ask me - "but can we just distribute it because everything in one basket makes me uneasy"

Yeah distributing state among 10 nodes, totally easy, fine, good.


It's because postgres is in fact good at a lot of vaguely database-looking things. Even if it weren't the best for anything, if it does 80% of things at 80% best possible — it is reasonable to have postgres as "first thing to reach for" by default.

That said, it's easy to forget to check if you're in either of those 20% (or both.) There's probably a whole bunch of postgres usage where really something else should be used, and people just never checked.


Because PG is a fantastic platform for doing everything you need in SQL and it performs real well. Add PostgREST and you've got a REST API almost for free (you have to design a schema of what to expose, so not entirely free, but still, pretty close). Also, PG moves _fast_ and has a very active developer and user community, which means you'll be getting more awesome functionality in the coming future. (E.g., I've been following the AIO thread and the performance improvements coming from that patch set will be hefty.)
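
The PostgREST part really is close to free: you expose a schema, and each table or view in it becomes an endpoint. A tiny sketch (schema and view names are hypothetical):

```sql
-- Expose a dedicated schema; PostgREST turns each view into an endpoint.
CREATE SCHEMA api;
CREATE VIEW api.todos AS
SELECT id, title, done FROM todos;

-- With PostgREST pointed at schema "api":
--   GET /todos?done=is.false   returns the matching rows as JSON
```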

When you're starting something new and the amount of data is still small, it's often better early on to focus on the product than to optimize for theoretical performance bottlenecks that may never pan out (either because the project will fail, or because the bottlenecks ultimately aren't what you thought or expected).

At my current gig, we used to shove everything (including binary data) into postgres because it was easy and all our code plugged into it anyways. When it started to become uneconomical (mostly due to RDS storage costs), we then started shunting data to S3, DynamoDB, etc.

Also, not everybody can be on a cloud with easy access to all the fancy products for queuing, caching, etc. Sometimes it's better overall to have to deal with one complex beast (that you'd have to maintain anyways) than spending time deploying Kafka, MongoDB, etc (even though it can sometimes be easier than ever with pre-built manifests for K8s) as well as securing and keeping them all up to date.

I do strongly encourage people to treat code that deals with these things with as much abstraction as possible to make migrations easier later on, though.


Tbf everything you mentioned, MySQL also has (in the latest version, for vectors).

But either way, the answer is simplicity and cost. I assume you’ve heard of Choose Boring Technology [0]? Postgres is boring. It’s immensely complex when you dive into it, but in return, you get immense performance, reliability, and flexibility. You only have to read one manual. You only have to know one language (beyond your app, anyway – though if you do write an app in pure SQL, my hat is off to you). You only have to learn the ins and outs of one thing (again, other than your app). ES is hard to administer; I’ve done it. Postgres is also hard to administer, but if I have to pick between mastering one hard thing and being paged for it, or two hard things and getting paged for them, I’ll take one every day.

[0]: https://boringtechnology.club/


It is very good at doing the job that people over-eagerly offload to specialized services. Queues, notifications, scheduled jobs. And can be specialized with extensions.

I am going to hazard a guess that it's because the closer the services are to your data, the easier they are to implement, and you often get great speed too. FTS in Postgres has been fantastic for me, and combining it with vector search and RAG gives you a pretty sweet deal for low effort.

Disclaimer: I have no experience with this kind of thing. However, theoretically, fewer tools is better for an organization - see [0] - and if your job adverts say just "postgres" instead of "postgres, elasticsearch, tool x, tool y, tool z" etc., you don't need to find (or train) a unicorn that is up to speed on all of them.

That said, "postgres" is a very broad subject if you take all of those into consideration, if you need to specialize your search for someone who knows how to do X in PG specifically you're almost back at the same spot. (I say almost because I'm sure it's easier to learn a specialization in Postgres if you're already familiar with Postgres than it is to learn a completely new tool)

And caveat, there's a high golden hammer risk there. I'd start questioning things when needing to query JSON blobs inside a database.

[0] https://mcfunley.com/choose-boring-technology


I agree with one of the points from The Art of PostgreSQL by Dimitri Fontaine, which is appropriate for answering this question:

"So when designing your software architecture, think about PostgreSQL NOT as storage layer, but rather as a concurrent data access service. This service is capable of handling data processing."


All these replies have me so confused. The reason to shove everything into your database when you can is that you can transact across it all. That's the thing you can't get once you have a second system.

Well you could, with some distributed locking mechanism, but doing that right has its challenges.
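
Concretely (table names hypothetical): the state change and the job enqueue commit or roll back together, which no database-plus-external-broker pair gives you for free.

```sql
BEGIN;
UPDATE orders SET status = 'paid' WHERE id = 42;
INSERT INTO jobs (kind, payload)
VALUES ('send_receipt', jsonb_build_object('order_id', 42));
COMMIT;  -- either both happen or neither does
```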

It’s free, it’s stable, it’s excellent, and it’s everywhere. The question should be “why don’t people put everything in Pg?”

The only things you shouldn’t put in Pg are things where there’s an obviously better alternative that’s so much better as to make running an extra service worth it. There definitely are cases where that’s true (Redis when you need exceptionally quick responses, for example) but it’s a high bar to clear.


PostgreSQL hasn't let me down in 20+ years. It's not perfect but it's really damn good for all your data cases (may require tuning)

Because the vast majority of people on HN or in the real world don’t need to scale beyond 10 concurrent users. It’s insane how much infrastructure script kiddies add to their projects when it could be done in the database and scale well.

Why not? Pareto principle. It’ll get you 80% of the way there for most things… then when you need a highly optimized solution you pivot to that.

There are good reasons mentioned already, but additionally, there’s a real strong cargo cult developing around Postgres these days.

Sorry you’re being downvoted; you are correct. I love Postgres, but devs absolutely flock to it because influencers said to. At a job a while ago, my team put out a poll asking for devs opinions and reasons for their preferred RDBMS. Every single one said Postgres, but no one could elaborate as to why. One said “it’s more flexible,” which is true, but no one there was using ANY of its flexibility.

That’s the part that baffles me. You’ve selected a DB with native support for esoteric but useful data types like INET (stop storing IP addresses as strings in dotted quad!), and a whole host of index types beyond B+tree, but they’re never using them.

Read your RDBMS docs, people. They’re full of interesting tidbits.
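
For example, with INET you get input validation and subnet operators for free (a sketch; the table is hypothetical):

```sql
CREATE TABLE logins (
    user_id bigint NOT NULL,
    addr    inet   NOT NULL,
    at      timestamptz NOT NULL DEFAULT now()
);

-- Rejects garbage like '999.1.2.3' at insert time, and
-- subnet containment is a single operator:
SELECT user_id, addr
FROM logins
WHERE addr <<= '10.0.0.0/8';
```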


I think you're looking at it from a weird angle...

"Every single one" in your team agreeing on a single specific technological choice is one of the rarest things I can imagine! Developers argue about libraries, frameworks, programming languages, services, etc., and I think it speaks for itself if Postgres is the thing that comes closest to bridging the gap on at least one layer of the tech stack. Postgres is a "conservative" choice with a very active community and an extensible ecosystem.

Also, nobody is ever making use of their technological choice to its full extent, you'd rarely know what you'll need beforehand, and it's just nice not having to add other storage engines when that one feature request steps into your life.


That’s a much harder claim to prove. The value of an attention span is non zero, but if the speed of access to information is close to zero, how do these relate?

If I can solve two problems in a near constant time that is a few hours, what is the value of solving the problem which takes days to reason through?

I suspect that as the problem spaces diverge enough you’ll have two skill sets. Who can solve n problems the fastest and who can determine which k problems require deep thought and narrow direction. Right now we have the same group of people solving both.


> The value of an attention span is non zero, but if the speed of access to information is close to zero, how do these relate?

Gell-Mann amnesia. Attention span limits the amount of information we can process, and with attention spans decreasing, increases to information flow stop having a positive effect. People simply forget what they started with, even if it contradicts previous information.

> If I can solve two problems in a near constant time that is a few hours, what is the value of solving the problem which takes days to reason through?

You don't end up solving the problem in near constant time, you end up applying the last suggested solution. There's a difference.


Your dentist is not an expert in this — that’s like saying the guy implementing your frontend is an expert in design. Yes, they’re working in the space, but their job isn’t understanding the whole system.

If you’re this deep on the appeal to authority train, the NIH released a report in the last year linking fluoride exposure to moderate drops in IQ with moderate confidence.

It’s probably not the worst thing in the world, but is definitely not inert.


I am competent on this particular subject matter, I have worked in fluorine chemistry and am familiar with the biology and medical literature of fluorine toxicity. The report made much weaker claims than people seem to think.

There is a very serious mechanism-of-action problem. Fluorine poisoning is a thing that happens. The observed effects and empirical evidence, and the mechanisms that cause them, are incompatible with any mechanism of action that would support the hypothesis that it causes brain damage. Basically, it would invalidate the entire history of actual fluoride exposure.

The other serious problem is that people are exposed to far more fluorine through what they eat than through water. What is special about trace levels in municipal water? And many parts of the world have far higher natural fluoride levels in their water than any municipal water supply with no evidence of adverse consequences. This has been studied many times in many countries! In fact, the only consistent correlation with naturally high fluoride levels is better cardiovascular health (for which there is a known mechanism of action).

This notion that trace levels of fluoride in some municipal water is adversely impacting IQ based on thin evidence from the developing world is just the public health version of “faster than light neutrinos”. Someone thinks they measured it but it contradicts everything we know about the subject. The rational approach isn’t to discard everything we know without a hell of a lot more evidence.

I don’t think adding fluoride to municipal water does much these days but it also isn’t harming anyone.


It also seems to mirror the rhyme with the vaccine "debate."

That debate is framed around being vaccinated vs the scare of "vaccine caused autism" (or myocarditis), but that frame is missing the risk of things like measles.

Likewise tooth decay is not only expensive, but it can have dreadful health consequences if left unaddressed. Missing teeth is also socially costly. Being poor or "ugly" or poor looking is a serious adverse health consequence. Imagine parents barely making ends meet or working multiple jobs. It's easy to imagine disadvantaged kids missing out on dental care.

I also explicitly remember reading multiple reports of poor tooth health correlating with dementia development. I've also read that serious infections of any sort can harm IQ.


Sure, but we need to look at this from the other side, too. Does fluoridating water provide benefits? I think it's safe to say it did way back when we started doing it. But we didn't have fluoride toothpaste back then. Putting fluoride in the water is presumably more costly than not doing it. If it's actually providing benefits, and the risk of harm is below some very low threshold, then sure, let's keep doing it. But is it actually providing benefits?


Dentists have to spend 8 years at school right? …and do various annual training to stay licensed?

I’d say that’s a reasonable sign of someone qualified to have an opinion.

I think you’re getting confused with a dental technician.


I would be really surprised if dentists had much expertise on the impact of fluorine on physiology or the mechanisms of action for its toxicity. They know what it does to your teeth, and maybe that it is known to have positive effects for cardiovascular health, but that is about the extent of it. The systematic effects on the rest of your body are outside their domain.

Chemists who work in fluorine chemistry on the other hand have to become experts on the biological effects of fluorine exposure. Small and seemingly innocuous exposures can do a lot of damage and kill you, though not in a way that lends any support to the idea that municipal fluoridation will harm you. If you do understand how it kills you (basically by being exceptionally narrowly focused on making free calcium ions and to a lesser extent magnesium ions biologically unavailable), it is hard to describe a chemically plausible scenario that somehow avoids this basic fact of chemistry. Fluoride behaves the same way outside the body.

Municipal water exposure is far below the noise floor for fluoride. Food has far higher levels of fluoride than municipal water and the body has ample excess calcium and magnesium to absorb the loss of bioavailability of a microscopic amount of those minerals. Humans consume calcium measured in grams per day, multiple orders of magnitude more than can be lost via municipal fluoridation. Natural dietary variation will have a far larger effect.


You don’t seem to understand the difference between public and private health.

Your dentist is well qualified to have an opinion on the effects of fluoride on your teeth.

They are poorly qualified to have an opinion on whether it should be added to the water supply at source.


[flagged]


Getting increasingly snarky when you're apparently unaware that Public Health is a thing is not a good look: https://en.wikipedia.org/wiki/Public_health

Generally it's multi-discipline, but a good start here would be an epidemiologist with a focus on dental issues.


This was my perspective for awhile — recommend you look into more recent studies if you haven’t in the past 2y or so. I don’t think it’s the worst thing in our water but do think it’s objectively a bad idea.


Yes, 90% of the world’s population does not absorb dietary vitamin D well, which should make people skeptical of this result.


Nonsense. It's well established that dietary D3 (or D2, but D3 is better) clearly increases 25(OH)D serum levels. Deficiency in that serum biomarker (levels below 20 ng/ml or 30 ng/ml, depending on the cutoff) has the largest association with higher risk of dozens of diseases as well as all-cause mortality. You can defend your phrasing by saying that "not well" just means one has to take a lot of it. Yes, there is a wide variety of dose response, which is why it's best to test blood levels to titrate supplementation amount.


Is the blood level diagnostically significant with vit D? Or is this something like magnesium where serum levels can be normal while having a magnesium shortage? IIRC vit d is fat soluble, does the body store any excess quickly, or does it linger around?


Yes, from https://pmc.ncbi.nlm.nih.gov/articles/PMC5577589/

  The goal of this study was to investigate whether the relationship between body composition, serum 25-hydroxyvitamin D (25OHD), vitamin D in subcutaneous (SQ) and omental (OM) adipose, and total adipose stores of vitamin D differ among OB and C. ... In summary, although OB had significantly greater total vitamin D stores than C, the relationship between serum 25OHD and fat vitamin D and the overall pattern of distribution of vitamin D between the OM and SQ fat compartments was similar.


Wouldn’t you just do that with an SDK? Why the extra layer of complexity with MCP?


Not all http based APIs have an SDK. It’s wildly inconsistent. And when you ask the llm to do something new, does it download the SDK on the fly?


Personally, I’ve found the SDKs worse in almost all cases.


Can you explain why low percentage improvement over placebo is not important?


Imagine you have a depression inventory (test) with 21 questions, rated from 0-3. The highest score is 3 * 21 = 63 indicating the most severe depression. The lowest score is 0, indicating no depressive symptoms at all.

In practice, the average person will fall more in the range of maybe 5-15 due to vague symptoms like "I don't sleep as well as I used to" triggering some of the points. The average depressed person who seeks treatment might fall in the range of 25-35.

Now imagine the placebo group goes in with an average score of 35 and improves to a score of 25 by the end of the trial, while the SSRI group improves to an average of 20. Is this significant? Well, it depends on how many patients you have in the sample.

That's the problem. There's only so much room in these scales for improvement, so when both groups improve a lot you need to have a larger sample size to get statistical significance. Getting a lot of patients in a study (hundreds) is very expensive, so it's only a small number of studies that can pull this off.
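
A back-of-the-envelope version of that sample-size problem (the effect sizes and standard deviation below are made-up illustrative numbers, not from any study): for a two-arm trial, the required per-group n scales with the inverse square of the standardized effect, so halving the group difference roughly quadruples the patients you need.

```python
import math

def n_per_group(delta, sd, alpha_z=1.96, power_z=0.84):
    """Approximate per-group sample size for a two-sample comparison
    (normal approximation, 5% two-sided alpha, 80% power)."""
    d = delta / sd  # standardized effect size (Cohen's d)
    return math.ceil(2 * (alpha_z + power_z) ** 2 / d ** 2)

# A 5-point drug-vs-placebo gap on a noisy scale (SD ~ 10, d = 0.5):
print(n_per_group(delta=5, sd=10))   # 63 patients per arm

# Shrink the gap to 2 points (d = 0.2) and the cost explodes:
print(n_per_group(delta=2, sd=10))   # 392 patients per arm
```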


Right, but study power is really the responsibility of a pharma company. It's not like this is some new and novel medication, it's been used for decades and has been questionable for _literally the entirety of the time_.


The problem with being barely measurably better than a placebo is that each study is a coin flip whether it supports your drug or not. And you can just file drawer any study that didn't go your way (as happens with the majority of null results).

So the published results are over-sampling studies where the statistics happened to work and under-sampling studies where they didn't.


strong agree with this -- I don't understand outside of integration with Claude Desktop why to use MCP rather than a dedicated API endpoint.


I believe MCP helps here because it standardizes integrations, enhances user control via opt-in plugins, and improves security by avoiding direct endpoint calls.

@gregpr07 wouldn't adoption of MCP open up Browser-use to more use cases?


It probably would yeah. I was very against it but this HN post sorta points me to “people want mcp”


Hmm, I don't mean this negatively, but MCP is mentioned only in this thread; two comments up you say "I thought about this a lot," and now you're convinced based on two people saying they would like it? Isn't standardising always something people want? MCP seems to have easily won that "battle" user-wise, so what have you been thinking about, exactly?


What's your take? How can we expose Browser Use to as many use cases as possible? Is there an easier way than an OpenAPI config?


I want to use browser-use in Cursor but I am using another option because it doesn't support MCP integration which is the common language they support for external tools


As building blocks we of course prefer APIs. However, interfacing directly with the browser (or desktop) can let end users do far more without developers having to build integrations, in theory at least. In reality, LLMs may not have reached that point yet, and there are security concerns.

