
I've been down that road. Eventually it turns into a nightmare because evolving your DB means making simultaneous changes to multiple applications.

"We think we can drop this column, someone figure out which of our eight apps using this DB might be using it still"

I much prefer putting a single service layer in front of the DB that speaks thrift or protobuf and letting all clients interface with that instead. Evolving thrift services is fairly straightforward and allows you to make changes without needing every app to keep up.
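
To make that concrete, here's a rough sketch of the kind of evolution protobuf allows (hypothetical service and field names, not anyone's actual schema); a column can disappear behind the service while clients built against the older schema keep working, they just see the old fields as unset:

    // Hypothetical IDL for the service layer in front of the DB.
    syntax = "proto3";

    message Contact {
      int64 id = 1;
      string name = 2;
      string city = 5;
      // These fields were removed when the backing columns were dropped.
      // Reserving the numbers and names keeps them from being reused later.
      reserved 3, 4;
      reserved "addr_line_1", "addr_line_2";
    }

    message GetContactRequest {
      int64 id = 1;
    }

    service ContactService {
      rpc GetContact (GetContactRequest) returns (Contact);
    }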




> I much prefer putting a single service layer in front of the DB

Yeah, agreed. This is exactly what big companies end up doing, and it's a very interesting approach!

It's inevitable that someone has to invent another abstraction, but it does pay off. One thing you can do with this single layer is control the floodgates.

I remember hearing this from a talk given by someone working at Instagram: as the team grew larger, cooperating on DB changes became problematic. So Instagram picked up Facebook's Tao.


How does an additional layer make changes to the DB easier? If a column is dropped, what will your layer do when a request for data from that column comes in? What if the request joins it with other columns from other tables?


It's better because if you have one application that has authority over that area, you only have to answer this question once.

The answer will depend on the data in question, of course; maybe it is fine to serve stale data for a while, maybe you need to write to one DB, read from another, and combine in-process, etc., until the change fully propagates. But the impact needs to be localized to whatever extent is possible.
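
For example, a tiny sketch (entirely hypothetical names, not real code) of that write-new/read-both approach inside the one service that owns the data:

    # Hypothetical migration shim inside the single owning service.
    # Writes go to the new store; reads merge new over old until the
    # backfill finishes, so callers of the API never notice the move.
    def save_contact(contact, new_db):
        new_db.upsert(contact)

    def get_contact(contact_id, new_db, old_db):
        fresh = new_db.get(contact_id) or {}
        stale = old_db.get(contact_id) or {}
        return {**stale, **fresh}  # values from the new store win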

If you have several applications accessing the database directly, it makes the database everyone's problem instead of just that one service's problem. Then everyone has to know about the downtime and come up with their own strategy to mitigate it. They can't say "Well, we'll just trust what we get from Service A", because they don't actually get info from Service A; they get info from Service A's underlying datastore.

Worse, in most cases like this, there will just be one global database for everything, so schema changes, database restarts, etc., necessary for one thing can have negative effects, both direct and indirect, across the entire ecosystem. If Bob's Service decides it needs to do a massive reindexing and Alice's Service is on the same DB, even if they're using completely independent tables, etc., the performance hit is going to affect both. If Bob changes his schema and Alice reads or writes directly to those tables (e.g., Alice's service updates a column in records originally inserted by Bob's service), now Alice has to know about the change, plan for it, and coordinate her deployment in sync with Bob, etc.

That kind of thing is what people mean when they say "distributed monolith". There is no real "private" and "public" space where one service provider could reasonably offer a stable API but change things as necessary on the back-end. Nothing is really independent. All you've done is make a monolith that is much harder to coordinate, manage, debug, and understand.


This was the problem at my last job. We had a few applications and each had their own database, but they were all on the same server.

Over time they all learned to reach into each other's databases. The truth is we had ONE database arbitrarily divided into three schemas, each with different conventions.

As load increased it became a nightmare and a literal single point of failure. If one app misbehaved or took a load spike, all the rest would slow down or fall over. Even though huge chunks of the applications had nothing to do with each other, they couldn't be scaled independently.

We were working very hard, slowly, to detangle it without blowing everything up.

No application should ever have direct access to another application’s database. It’s going to go wrong. The temptation is too great. And by the time you realize it the technical debt it has caused may be MASSIVE.


Not familiar with thrift or protobuf, but when I think of a single API layer in front of a database, the API specification becomes independent of the database. You also want to version your API.

Let's take an address for an example. Your contact database might have "id", "name", "addr_line_1", "addr_line_2", "addr_line_3", "city", "state", "zip". Your API v1 lets you query any combination of these fields and return a set of fields.

/api/v1/contacts?zip=10018&fields=name

[{"name": "Jessica Jones"}, {"name": "Matt Murdock"} ]

/api/v1/contacts?name=Matt%20Murdock&fields=zip returns something like:

[{"zip": 10018}]

Later, you change your database to split name into fname and lname. You update your code and API to reflect this. In calls to /v1/ of the API, your code builds the "name" field by concatenating the database columns fname and lname. In /v1.1/ of your code, you may or may not keep the "name" field as a convenience, but you do provide the ability to search on the fname and/or lname fields.

Basically, for any change to the underlying database, your API maintains the same interface to the data; however, you might also publish a newer version of the API. In fact, the code that handles each API version might be entirely different. For instance, /api/v1 might be handled by a Python Flask app, while /api/v2 is handled by Node. You can run multiple versions of the API at one time, supporting multiple iterations of clients.

Obviously, to reduce maintenance effort, you will deprecate some versions and provide deadlines for them to be disabled. You can run all sorts of reports on the usage of each API endpoint and see which calls and response formats are still in use. That opens up the ability to reach out to the teams developing those clients and make them aware of necessary changes.
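
A minimal sketch of the v1 vs. v1.1 handling (using Flask and an in-memory stand-in for the real database; the fields= selection is left out for brevity):

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Stand-in for the contacts table after the fname/lname split.
    CONTACTS = [
        {"fname": "Jessica", "lname": "Jones", "zip": "10018"},
        {"fname": "Matt", "lname": "Murdock", "zip": "10018"},
    ]

    def query_contacts(args):
        # Filter on any known column that appears in the query string.
        return [c for c in CONTACTS
                if all(str(c.get(k)) == v for k, v in args.items() if k in c)]

    @app.route("/api/v1/contacts")
    def contacts_v1():
        rows = query_contacts(request.args)
        # v1 promised a "name" field, so it is rebuilt from fname + lname
        # and existing clients keep working unchanged.
        return jsonify([{"name": f"{r['fname']} {r['lname']}"} for r in rows])

    @app.route("/api/v1.1/contacts")
    def contacts_v1_1():
        rows = query_contacts(request.args)
        # v1.1 exposes the new columns directly (it could keep "name" too).
        return jsonify([{"fname": r["fname"], "lname": r["lname"]} for r in rows])

    if __name__ == "__main__":
        app.run()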


In a proper microservices architecture, only a single application accesses the DB. Other apps get the data via an API call. Thus you can drop the column (maybe it's derived data) and still keep the API interface.

In the end, it's mostly about enforcing contracts and making devops simpler.


I mean, that's not true at all. No two apps should be working off the same database. Different services can have their own datastores.


Reading back, my post was confusing. What you wrote was what I actually meant. Each app may have its own non-shared DB.


The old-timey way to do this is with SQL views and triggers. Separation between logical schemata and physical schemata.
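
A quick sketch of that, using SQLite and the contact example from upthread (the physical schema splits name into fname/lname, but a view keeps presenting the old logical schema; a trigger could do the same for writes):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE contacts (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT, zip TEXT);
        INSERT INTO contacts (fname, lname, zip) VALUES ('Matt', 'Murdock', '10018');

        -- Legacy readers were written against a "name" column; give them a view.
        CREATE VIEW contacts_v1 AS
            SELECT id, fname || ' ' || lname AS name, zip FROM contacts;
    """)

    # A legacy client's query keeps working, untouched.
    print(db.execute("SELECT name FROM contacts_v1 WHERE zip = '10018'").fetchall())
    # -> [('Matt Murdock',)]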


As well as stored procs.

I heard a few years ago that some government jobs apparently still require stored procs for security reasons; the ACL is enforced there.




