I serve many, many millions of rows, collect real-time statistics, push real-time updates, and maintain data integrity and consistency... all from a single database server. If our workloads require it we're prepared to scale out horizontally. I'm really looking forward to PostgreSQL 10's new parallel query features for some of our analytics work.
Stack Overflow runs everything across what, 4 MS SQL Servers in total?
The argument is not that DBs failed the web for scalability, but that they failed the web for features. I don't fully agree but it is an interesting argument.
DBs have auth systems already, but we mostly bypass those and reimplement auth in the stateless layer. DBs have programming capabilities but we mostly bypass that and build business logic in the stateless layer. DBs are great at storing and maintaining relations and relationships but much of the other functionality is little-used.
Many DBs weren't built to be exposed on the open web. The web didn't even exist when some popular databases were invented, and even now, serving the web directly still isn't their purpose.
I would even argue that DBs have overstepped their bounds already with what programming abilities some have built-in. They should be focused on reading and writing structured data efficiently and correctly, not much more than that. Which database to use should depend on access patterns and structural requirements. Let the application logic be handled at a higher layer instead of mixing responsibilities everywhere.
And they have a stringly typed interface! It sucks! When both the languages inside and outside the database are typed, why should the interface between them not be?
Uh, what? How is this stringly-typed: https://www.postgresql.org/docs/9.4/static/datatype.html ? Postgres's type system is actually more sophisticated than that of any programming language I know. The problem is that ORM developers care about cross-database portability far more than is reasonable, so you cannot rely on much more than VARCHAR, INTEGER and maybe BOOLEAN.
Query input to SQL is literally a string. Output is also a string. Thus "stringly typed" interface. Yes, DB storage is strongly typed, but the interface is not.
I'm not sure how much an issue this really is, but it's what the GP is talking about.
So? Data in computers is all 0s and 1s, but you don't see anyone complaining about bitly-typed languages. Strong typing refers to the presence of type context wherever a value is used, not to the wire protocol.
The data you get out of the database will be untyped.
When the language in the database is typed (SQL) and the language outside is typed too (c#, typescript, etc), it would be nice to keep that type information.
An ORM doesn't count since it is just a bad attempt to work around the inherent problems with SQL as an interface.
Maybe if there was better support at the connection layer, this might be better.
If ActiveRecord (Rails) allowed you to pass a free-form Ruby function and built a procedure and executed it in SQL, I'd be all for it, but I'm not writing a one-off procedure and maintaining it.
No one cares until they do, because the dataset's grown nontrivial and the crap 10s query Active Record generates is now a crap 120s query and one of the server, the client, or the user times out before it finishes.
> DBs have auth systems already, but we mostly bypass those and reimplement auth in the stateless layer.
This is sane and good design. The db auth layer is for db management, not frontend auth (are you seriously advocating this?).
> DBs have programming capabilities but we mostly bypass that and build business logic in the stateless layer.
This is just (initial) design / programmer laziness. I bet that if you start to scale up, you will think a lot about moving your business logic closer to the real data, for safety and for performance reasons. This is the point when you realise that db systems are very different and switching from one to another is not that easy...
> This is sane and good design. The db auth layer is for db management, not frontend auth (are you seriously advocating this?).
The auth layer of PostgreSQL is very advanced, up to row level security if you need that. [1] It was never intended to be "just for administration", quite the contrary! Moreover, some applications rely exclusively on a central PostgreSQL service rather than a central web service, such as "Artikel23" which is (was?) a CRM whose user management, encryption and security stuff is 100% handled by PostgreSQL.
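For reference, row-level security in Postgres is only a few lines of DDL. A minimal sketch, with a made-up table and role:

    CREATE TABLE documents (id serial PRIMARY KEY, owner name NOT NULL, body text);
    ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
    -- each database user sees only their own rows
    CREATE POLICY owner_only ON documents USING (owner = current_user);
    GRANT SELECT, INSERT, UPDATE, DELETE ON documents TO app_users;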
> The db auth layer is for db management, not frontend auth
Yes, but why? The database already has a table of database users, and the different permissions they're allowed. (Well, hopefully your webapp doesn't just do everything as the root DB user with * privs.) Is there that much to be gained from the database having a second table of users that you've created, and then you get to reimplement a bunch of logic in your app server on top of the database so that you have... users and permissions?
You gain making users of your system a first class and distinct part of your application from those that maintain it.
Or, would you rather swap a regular user entry into and away from a maintenance role when you hire/fire someone? (Certainly doable, but seems extreme.)
You also gain the ability to severely lock down destructive permissions to the database behind fairly rigid authentication rules.
> Or, would you rather swap a regular user entry into and away from a maintenance role when you hire/fire someone? (Certainly doable, but seems extreme.)
How is this more extreme or more cumbersome than deleting an admin account?
For that matter, you could also just give admins a separate admin account even if they are also users. When I worked in "IT support", I had a separate admin account even though I had a user account in the same system.
You probably already have a separate system that is regionally and logically separated from your application to control access to company resources.
The argument seems to be that the logical separation is not desired. But that implies that if I were an employee of Google, I should have a specially flagged Google account. And if I were to leave, I would lose that flag. Which means it should probably not be a simple flag anymore, but a date-gated property which may go off and on several times.
And already, that sounds more complicated than just keeping your application accounts distinct and completely separate from your organizations accounts.
> You probably already have a separate system that is regionally and logically separated from your application to control access to company resources.
The separate system your company's employees are tracked in is also likely not integrated with your app database's auth system. I kind of doubt that the various Postgres/MySQL/whatever databases at Google have accounts in them for all of Google's employees who might need access. Instead they build more tooling on top of the same accounts the app uses to access the DB. Or maybe they create JIT accounts for admin functionality. As it stands today, though, most places don't use the DB auth at all, really.
> But that implies that if I were an employee of Google, I should have a specially flagged Google account.
No. You have two accounts if you work at Google. Your personal account is not your corp account. That doesn't mean your corp account doesn't exist in the same systems as your personal account, though.
> And already, that sounds more complicated than just keeping your application accounts distinct and completely separate from your organizations accounts.
Maintaining two entirely separate auth systems does not sound less complicated to me.
I wasn't planning on really diving on this. However, two scenarios easily come to mind. Either selling a product to another company, or getting acquired. In either case, it would be nice to not have to merge two identity management systems just because corporate structure changed.
You're arguing for a separation of customer and admin accounts. These scenarios are both easier if admin and user are rolled together. If administration is tied into your corporate user system, then transferring a product means it's a pain to enable administration in the destination environment.
What you're actually arguing for is keeping app auth separate from corp auth. And that's pretty reasonable, but app admin is app auth, and must move with the app.
More generally, though, anyone buying your app/company will want it integrated with whatever auth they already have anyway, so you can look forward to a big user migration regardless.
Apologies, I was always arguing for a separation of corporate and customer accounts. My assertion is essentially that all administration of the application is done by corporate users, not application users.
Now, it is not uncommon to actually do this in three parts. A set of users that covers administration. Often made by tools that guarantee corporate level access to these accounts. And then a separate system built on top of them that does the application.
If you have truly "admin" users in the system; then yes, they should be separate from corp. And should be a flaggable part of a normal user's account. Mixing these with who can create the tables that are required for the system to run seems awkward, at best.
Consider, why not support all users being able to just login to the systems that run the application? It is ultimately the same argument, no?
> Consider, why not support all users being able to just login to the systems that run the application? It is ultimately the same argument, no?
I don't know what this means.
I don't think there's a good argument for having multiple auth systems in general, much less three. It's a lot of redundancy for little reason. It ends up complicating every app as they end up needing to support multiple systems to support both end users and administrators.
I wouldn't be surprised to find that Google has a single auth system for all users, including employees, and that they've wired up their databases to honor the single auth source.
My point is this doesn't end at the database. So, you can successfully let the database manage users and access permissions to the data in the database. But then, why not let the operating system that the database is running on control that?
And how is that a different argument than what you are proposing here?
All I was saying is that we already have auth systems and don't use them. The DB has auth. As you noted, your OS has auth. And yet we insist on reimplementing auth for our users again and again (often with terrible flaws). It does feel like something is wrong with this.
> Or, would you rather swap a regular user entry into and away from a maintenance role when you hire/fire someone? (Certainly doable, but seems extreme.)
How is it extreme? Granting or dropping permissions on a particular role for one user account is certainly not more extreme than dropping or adding a user account in that situation, which is the alternative.
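In Postgres terms it's a single statement either way (role and user names invented for illustration):

    GRANT maintenance TO alice;    -- alice joins the ops team
    REVOKE maintenance FROM alice; -- alice leaves; her ordinary account survives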
> You also gain the ability to severely lock down destructive permissions to the database behind fairly rigid authentication rules.
Actually, no, you don't, because the built-in security and auth systems of multiuser RDBMSs (well, most of the commercial ones, plus Postgres, and I'm pretty sure Firebird; MySQL/MariaDB may be weak here, as they often seem to be in things I'd like to say are generally true of RDBMSs) already give you this.
It seems extreme to base my user account for a product on my employment status. Too much could go wrong, when you could just make your application completely independent of your organization.
My point is that there is not a single user of my applications that has the ability to drop tables from a database. In your world, there would be some. But again, why? Why should any given user of GMail, for example, be able to delete any accounts on GMail? I can see why an employee of Google would be able to. But why a user of GMail?
Because traditionally the database was managed by IT, not by engineering. IT came up with the idea that it should take forever to approve changes to the database, so engineering invented ways around their can-don't attitude. Can you imagine any CTO approving a web-based signup page being allowed to add users to the database?!
Note that IT has good reason for the complex change rules: if they make ANY change to the database, everyone needs to change. The change itself is simple, but in many cases thousands of programs access the database. If taxes don't go out on time because the program that runs once a year at tax time didn't get updated, the best case is the CTO goes to prison, and things get worse quickly. When prison is a very real risk of approving a database change, would you approve it?
> Note that IT has good reason for the complex change rules: if they make ANY change to the database, everyone needs to change.
Only if the DB is used by multiple apps and not designed correctly for that use (e.g., all apps use direct access to base tables rather than app-specific views.)
Unfortunately, this is more common than it should be, but the solution is to stop doing that.
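One hedged sketch of what that looks like in Postgres, with invented schema and role names: each app gets a view, never the base table, so the base table can be refactored as long as the view is kept compatible.

    CREATE SCHEMA billing_app;
    CREATE VIEW billing_app.customers AS
      SELECT id, name, billing_address FROM customers;  -- base table stays private
    GRANT USAGE ON SCHEMA billing_app TO billing_app_role;
    GRANT SELECT ON billing_app.customers TO billing_app_role;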
Nobody ever went to jail because a software defect made their tax filing late. Don't be ridiculous. It's this sort of over-aggrandizement of IT's supposed responsibility that is the cause of the problem. IT acts like they are a thin, red line single-handedly holding compliance and hackers at bay. The god complex needs to end.
IT is still a risk. High level executives can go to prison for some failures, and tax issues are on the list. It isn't clear if courts will accept software issues as an excuse.
> we mostly bypass that and build business logic in the stateless layer.
> This is just (initial) design / programmer laziness.
I disagree, it's just good practice. I've never seen a system evolve towards SQL based business logic, only away from it. SQL is meant for storage, that's it, obviously there needs to be a coding layer to move data in and out of it efficiently but that's where it should end.
That's what this article is arguing against. The "good practice" is more precedent than anything. Sure, we can argue that it "adds security" to the mix by adding a separate layer between the user and the actual place where billing info and passwords are stored, but if that front layer gets breached then you're still screwed. From what I understand, the article suggests the reverse of what you're saying. Not necessarily moving the business logic _into SQL_, but what if the DB actually had a very extensible framework and libraries that let you serve HTTP _directly_ from the DB? How would that change reactive websites whose sole "backend" is a SQL<->JSON+Permissions system? What about if the DB itself supported a pseudo-PHP runtime, or better yet closer to native code?
Granted, I'm not saying it's a good idea. I do see a vast array of benefits, primarily in the sense of scalability, from separating the business logic from the DB, but it's an interesting topic to ponder nonetheless.
> What about if the DB itself supported a pseudo-PHP runtime, or better yet closer to native code?
Note that in PostgreSQL, user defined functions can be written in many languages, including SQL but also Python (very well supported) and PHP (not so well supported, but doable).
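A tiny PL/Python example, assuming the extension is available (the function itself is just an illustration):

    CREATE EXTENSION IF NOT EXISTS plpythonu;

    CREATE FUNCTION slugify(title text) RETURNS text AS $$
        import re
        return re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')
    $$ LANGUAGE plpythonu;

    SELECT slugify('Databases have failed the web');
    -- => databases-have-failed-the-web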
Post author here. I actually think it would be much cleaner to support an event log API and run this stuff out-of-process. My database shouldn't have to support the language-du-jour, and I don't want my React rendering process slowing down my database servers. You can totally decouple this stuff with a clean Kafka-like event log API, but Postgres doesn't provide one.
The article's point is that there's a bit more to Stack Overflow than the four database servers it sits on, and because MS SQL is treated as a monolith, a stupid amount of development time, the world over, goes into doing things the database, arguably, could be configured to do itself.
I click the accept-answer button in my web browser, the Stack Overflow stack reads the cookie my browser presented, does a bunch of logic in the app server, then does a database lookup. It then takes the web request, does a bunch of other logic with it in the app server again, then... runs some more SQL against the database.
Stack Overflow, and I, and presumably you as well, have written tons of custom code that sits between the web browser and database, but the article's thesis is this is stupid because if the database "just" spoke http, then it could do the majority of that natively, no Ruby/Python/JavaScript/.Net or anything needed.
> the article's thesis is this is stupid because if the database "just" spoke http, then it could do the majority of that natively, no Ruby/Python/JavaScript/.Net or anything needed.
The issue then becomes scaling. Instead of having half a dozen database nodes you need to keep in sync, you could now have dozens or hundreds. Possibly even thousands on some applications.
The point of databases is that they're meant to be a monolithic repository of persistent data. The point of web servers is that they're supposed to be "dumb" and interchangeable. You're meant to be able to spin up new instances to scale to demand. This gets greatly more complex if your application is also your database.
It's also not true that databases haven't evolved for the web. We now have NoSQL databases, JSON support in many RDBMSs, and in-memory databases for caching. Running your web application inside your database is already possible. There have been posts on HN discussing people doing this. But it's really not practical for much beyond a small-scale blog or private (IP-whitelisted) API server, i.e. the kind of thing where the application can be hosted on a single-node instance, potentially with the DB sharing the same VM too. The moment you need to scale sideways, even just to a few nodes, it becomes more advantageous to have your database separated from your application servers.
Sorry, I should have been clearer on my NoSQL point. Yes, NoSQL databases may have been around a lot longer than SQL databases, but what I was talking about specifically was those NoSQL databases that use "webified" formats like JSON as their query language.
It's also worth mentioning that NoSQL encompasses a much broader spectrum of database engines than just key-value stores. Technically any relational database can also be NoSQL so long as its query language isn't SQL (or even "not just SQL", if you go by the "not only SQL" definition of NoSQL like some do. Though I find that interpretation too broad personally).
Right, he's basically saying that he wants the browser to behave identically to the desktop, where all logic is executed locally and direct connections are made out to external databases for storage. I assume this is because he's a language hipster and resents ever having to look at something that's not JavaScript.
There are multiple reasons that people stopped shipping big blobs with application logic and now rely on backend servers, but database servers having "failed the web" are not among them.
Exactly! How about "developers have failed the databases"
The majority of the developers don't use (or don't know how to) views/stored procedures/roles/triggers. Whenever you mention something like this you hear "omg, you put business logic in the db? everybody knows that does not scale" or "we deconstruct our request into a few simple queries instead of a join so we run super optimal"
I suspect it's because many developers are introduced to programming through web development. The pedagogy fails them and they go on to successful careers believing falsehoods.
It's hard enough to learn a programming language the first time that I can see how ORMs and frameworks relieve the burden of having to learn SQL, RDBMSs, etc.
However it's a convenient lie that seems to rarely ever be corrected.
The reality is that ORMs are exactly the wrong abstraction. They introduce premature pessimization into your application. They are not structured around how your data should be designed but how data-structures in your language of choice are designed. And left uncorrected they lead many developers to the mistaken belief that RDBMSs are slow, inflexible, etc, etc.
It turns out that relational algebra gives us great tools for querying complex data structures and relationships and building consistent, correct abstractions.
RDBMS servers have great facilities for managing and interacting with our data. Procedural languages, authentication, authorization, inter-process communication, etc, etc.
I also have a theory about this when it comes to slightly older devs (like 30+; yours I think relates to younger ones). We entered our careers in the days of PHP 4 and MySQL 4, and in those days, if I remember correctly, MySQL did not have all these nice features Postgres has today, and the query planner was not that powerful; joins indeed were painful. We did not have access to databases like Oracle, so this is where the culture of the fat middle layer comes from, because it was the only way. But people, 15 years have passed; databases are monsters now, Postgres can do 1.5M tps on a single box. How many of you are working on projects with that kind of scale...
I agree that more business logic needs to be "closer" to the DB - if not in the DB itself (using those facilities you mentioned), then at a level very nearby. There's a very good reason for this:
Showing a consistent interface for data update and query.
When there are multiple teams developing against the same DB server, they might be able to see most or all of the tables, but they each might implement updates/selects (SQL queries) that exhibit extremely different results - which either present or update the data in potentially incompatible or inconsistent ways.
This ultimately manifests itself as errors/bugs which can be very difficult to figure out what went wrong where and why (and correcting the data becomes a nightmare).
Things get even worse when you have people querying the data via ad-hoc methods (like ODBC to an Excel file or something weird like that). Essentially, it is possible (likely) that every team sees a slightly (or greatly) different view from each other.
Coupling the BL closer to the DB level would prevent this; whether via the tools mentioned (views, triggers, etc) or something else. Ultimately, app developers shouldn't be hitting the DB tables directly, but going through these interfaces.
Of course - that brings up a host of problems on its own (table updates become an issue of coordination and conflict between teams, etc) - and there's also the issue of implementing versioning of the business logic (and how that relates to current/past DB structure)...
...but implementing it would solve a lot of headaches (and create more I guess).
I really disagree with this - separation of concerns is incredibly important. Why does the database server need to do everything my application server does now?
I don't want to have to know about how my database works internally just to implement a new feature in my application. I don't want to worry about a junior dev corrupting data while building a login page.
My "simple old-school" database is reliable and consistent BECAUSE it's been tested for years. My application server is not because it's solving a new problem (And that's OK, I won't lose customer data if someone can't login).
Yes, and the fact that a relational model is such a convenient way to represent lots of data problems means that even if SQL disappeared, the concept and usefulness of a relational database would long outlive it.
Well, no. There were a zillion databases competing and SQL is what survived. The NoSQL movement has created a bunch of useful DBs for specific scaling workloads, and of course there are graph DBs and other special purpose DBs, but SQL still dominates, and with good reason: it's the right balance of performance, flexibility, structure and integrity that makes it an excellent hedge for the early phases of application development when you're not yet sure what exactly the workload is going to look like.
Of course they are. Most things on the web are documents. The fact that the web is abused for SPAs means that yes, they get contorted and abused, but that is inevitable because of Gall's Law.
It's easy to sit in an armchair and point out all the warts on specific tech, but if you were to build a better solution to broad technology segments like this, and magically get adoption, you'd find that you only understood 5% of the application space, and your better version is subject to whole swaths of criticism that you didn't anticipate. Better to find one small/simple thing you can do well and build from there.
Every time I have to glue together pieces of SQL at runtime I have to agree. It seems crazy to me that there is no "native" way to query relational DB's (yes, there are some SQL factories for certain languages)
Came here to basically say this much. Part of me feels that databases are already too hugely complex. The "do one thing, well" part of me wants one application for data storage, and then middleware for relationship enforcement, stored procedures, etc. That's likely too pedantic for most, but I really see wanting your data layer to be your application layer as a silly throwback.
Also isn't this a similar idea to what FileMaker Bento did?
I'd argue that there is some merit in combining those things, or at least changing where the separation happens.
There might be performance improvements and better abstractions from moving what is traditionally known as "application code" into a place where it has better, more direct, access to the data.
I'm not saying that we should be throwing out SQL and starting over, but if there are massive gains to be had by letting the database do MORE (or redefining what the database actually is, maybe "merging" it with the application code so that this new "thing" has knowledge of both how the data is stored as well as how it will be processed), why not explore those?
Honestly I'm being vague because I have no real idea what I'm talking about here, but if you look at the past, you see this in many other areas. First the FPU joined the CPU, then the GPU joined the CPU, then phones happened and you get the SOC where everything is in the "CPU".
Stored Procedures have been around for a long time, so it's very easy to put some business logic closer to your database without resorting to the level of integration suggested by the post.
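For instance, the "accept answer" flow described elsewhere in this thread could be one transactional function instead of several app-side round trips. A hedged sketch, with all table and column names invented:

    CREATE FUNCTION accept_answer(p_answer_id int, p_user_id int) RETURNS void AS $$
    BEGIN
      -- only the question's owner may accept an answer
      IF NOT EXISTS (SELECT 1
                     FROM answers a JOIN questions q ON q.id = a.question_id
                     WHERE a.id = p_answer_id AND q.owner_id = p_user_id) THEN
        RAISE EXCEPTION 'not allowed';
      END IF;
      -- at most one accepted answer per question
      UPDATE answers SET accepted = (id = p_answer_id)
      WHERE question_id = (SELECT question_id FROM answers WHERE id = p_answer_id);
    END;
    $$ LANGUAGE plpgsql;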
Yes, but in my experience, even though they exist, many applications are written in an outside environment in order to take advantage of software engineering facilities that are not available "in the DB". This competes with the idea of merging the app into the DB and is why the split is commonplace.
I work in BI. There is a joke that the most utilized BI tool is MS Excel. Organizations pay millions for tools like Tableau and Cognos. The stumbling block to successful BI is the amount of time it takes to create tables, write ETL and make tables available to end users. It takes so long users just export data from operational systems and do their own reports.
I see where OP is coming from. The data resides in the database but there is no "easy" way to make it available to end users. So everyone does their own thing anyway. Yes, I appreciate a well-curated database, but something is amiss.
The conceit of this article seems to be that all web applications are merely CRUD apps that do nothing but talk to the database, thus databases are "failing the web" for... not being more than just a database.
I see a number of problems with this:
1. Not every web app is merely CRUD. Some of us work on apps that do quite a bit more than CRUD, and the CRUD part is relatively small and boring.
2. There's very little respect here for the idea that good utilities should do one thing and do that thing very well. Lots of people need relational databases. Not everybody needs a relational database that can automagically filter out XSS attacks and serve HTTP responses.
I think CRUD apps are the most boring example. But think about something like Slack. There's some data and queries, sure - but the app also needs realtime feeds. It needs to have triggers running notification tasks off changes made. There's lots of stuff that need to happen as a result of data changes.
Doing all that stuff purely at the application layer is an architectural mess. What you really want is an event log of messages, with workers and search indexes listening on that. Maybe later you'll want to add full text search via elasticsearch - and that needs to be kept in sync via the event log as well.
Rolling your own version of all this stuff correctly is really hard. Either you do it inside Postgres using triggers and stored procs, or you use Kafka and invent your own complicated system to handle conflicting writes. Or you add hooks in all the places in your app where you do writes, but that introduces its own set of consistency problems. Samza, Datomic, Bottled Water, CouchDB and Firebase all solve at least some of this stuff better than classic SQL stores. It's time for relational dbs to get on board.
Being no more than a database isn't good enough anymore.
Your complaint is that SQL databases aren't message brokers? SQL servers have been abused as message brokers for decades. It's not the use case they're designed for, and while a plugin that integrates this type of functionality into PgSQL would be interesting, there's no reason to expect them to assume that problem space.
If you want the database to handle the incoming message streams, then set your consumers to call a stored procedure. That should alleviate your concern with conflicting data models.
Aside from real-time feeds, RDBMSs usually provide the functionality you are discussing, and some can provide real-time change notifications to clients, as well.
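In Postgres, for example, change notifications are a trigger plus LISTEN/NOTIFY. A sketch with a made-up messages table:

    CREATE FUNCTION notify_change() RETURNS trigger AS $$
    BEGIN
      PERFORM pg_notify('messages', NEW.id::text);  -- payload is the new row's id
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER messages_notify AFTER INSERT ON messages
      FOR EACH ROW EXECUTE PROCEDURE notify_change();

Any connected client that has run LISTEN messages; then receives the id in real time.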
> Access control on modern databases is too coarse. You want to whitelist which queries a user is allowed to make and you want fine-grained permissions around updates
Only allow direct access to stored procs, not queries. Or restrict access to specific views and use rules (https://www.postgresql.org/docs/current/static/sql-createrul...) but intuitively that seems more dangerous (with CTE, I believe SQL is turing-complete) and completely unnecessary.
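Roughly like this in Postgres (role, function, and table names invented for the sketch):

    REVOKE ALL ON ALL TABLES IN SCHEMA public FROM web_role;

    CREATE FUNCTION get_profile(p_user_id int) RETURNS TABLE (name text, bio text)
      LANGUAGE sql SECURITY DEFINER  -- runs with the owner's rights, not the caller's
      AS $$ SELECT p.name, p.bio FROM profiles p WHERE p.id = p_user_id $$;

    GRANT EXECUTE ON FUNCTION get_profile(int) TO web_role;

web_role can now invoke exactly this query shape and nothing else.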
> Databases only talk custom binary TCP protocols, not HTTP. Not REST. Not websockets. So you need something to translate between how the server works and how the browser works.
I think databases have been so wildly successful because they were actually based on some reasonable research about data access models and set theory. I encourage anyone who wants to criticize the relational model to start with E.F. Codd's "A relational model of data for large shared data banks".
The PDP-11 was considered a minicomputer, not a mainframe like the IBM System/360. The notion of midrange computing still exists with IBM's iSeries (AS/400), as do mainframes like IBM's zSeries. One could argue that some of these definitions get mushy, but I haven't ever heard disagreements about where the PDPs, AS/400s, and System/360s of the world live in this hierarchy.
This seeming error was an early red flag in this article: it has some good facts but an incomplete picture. While some of the historical observations about what computing was are warranted, this same sort of not having a complete enough picture to draw correct conclusions shows up again in some of the main theses presented.
Consider the statement about the coarseness of permissions. In many modern database systems this simply isn't true. What is true is that the full set of security features offered by database systems is 1) not widely understood, and 2) not bothered with by developers who choose to implement security elsewhere.
Clearly the author spent some time on the article and it is well structured; it's simply a matter of the author not having spent as much time getting to the heart of the matter as he/she should have.
> What is true is that the full set of security features offered by database systems is 1) not widely understood, and 2) not bothered with by developers who choose to implement security elsewhere.
That statement applies to the full feature set of modern RDBMSes, not just the security features in particular.
I'm always surprised that I seem to be about the only one (or at least one of very few) on my devops team of ~30 who's comfortable with issuing SELECT statements on the production DB to investigate issues (and sometimes UPDATEs to clean up a mess that users made). And that's not even touching all the crazy stuff (stored procedures, triggers, access permissions), just CRUD (or in this case, INSERT-SELECT-UPDATE-DELETE).
In the mid-1990s, some of the most complex web applications were implemented inside the database! I was the technical lead for a production web app written in... 250,000 lines of PL/SQL. This model had some significant advantages and worked surprisingly well considering that Oracle was not designed with that (ab)use case in mind.
So why didn't this model become common given that it was relatively elegant and capable? A few reasons:
- It required developers to be sophisticated at both using the database and creating web front-ends, since they were inextricably mixed. Even today, most developers are strictly one or the other, not both.
- The tooling inside the database was not designed for this use case, so while the architectural model was elegant, the development environment was piggybacking on functionality designed for reporting systems to drive interactive websites. This got better with time but by then no one cared. For a minimal website, hacking together a couple Perl scripts had a lower learning curve but was less capable.
- At the time, only a couple databases had the level of sophistication and features to make this feasible. Like Oracle. The upfront licensing costs were outrageously high, so there was no cheap way to bootstrap or incrementally grow into your application.
A 2017 version of this would work very well; database engines have much more sophisticated capabilities than back then, which would make the development and operations experience pretty efficient and nice. As a practical matter, no one designs web apps this way any more, so there is no market for it.
> As a practical matter no one designs web apps this way any more so there is no market for it.
Well, the problem is that I'd say PL/SQL isn't a good choice of language to implement a web templating engine. And the web framework selection is ... lacking. And it's hard to hire engineers who know PL/SQL.
Even if the database had a native JS engine or something, there's also a separation-of-concerns problem here. But if the database simply exposed an event stream and versioning information from queries, we could write our applications in whatever language we want. They can listen on the database's event stream and invalidate caches or eagerly re-render the app as data changes. And if you have a Kafka-style event log, they can live on other machines and run in separate processes.
From any angle I think the samza/datomic model is simply a much nicer architecture to do this sort of thing. More ergonomic than the pl/sql approach, and certainly much nicer than what we have now using databases as dumb stores.
Databases are a miracle product. If you think of an application as a car, the database is the engine.
The idea that you have a platform that can do everything without the abstraction of a separate data storage/query platform, that exists too. I'd argue that FileMaker, Lisp, MUMPS, and a few others basically do this in different ways. I used to be a DBA at a company where the entire company ran on Informix 4GL code (which was sort of like the Informix version of PL/SQL) within the database. Also a similar approach.
But... they also have significant drawbacks. You're permanently married to that app/database stack. If any component of the system doesn't scale... you're fucked.
By chunking out the solutions to include databases, app tiers, etc, you gain complexity but lose a lot of risk. If you cannot afford Oracle anymore, you can invest in labor to move to Postgres. If you're hitting a limitation with MySQL, you can move to Oracle. If you wrote your app in PHP, it goes viral, and you cannot scale it, you can migrate to a Java Application Server layer.
> Databases are a miracle product. If you think of an application as a car, the database is the engine.
Yeah - you're totally right. That's kind of why I'm complaining about them - because we use them so much, I think the potential benefits from improving them further are massive.
The modern web stack is a mess, and changes all the time. But there's a handful of features we need in just about every application - login, update feeds, data-based rendering (computed views), etc. But stored procs aren't good enough - they're inaccessible, hard to debug and usually require their own (non-standard, barely known) language to develop in. So lots of developers write functions outside of the database through multiple queries, and in doing so will often break the database's transactional consistency model.
Databases have managed to stick around with us throughout the evolution of the web. I'm being critical, but I'm doing it out of love. I'm saying that databases have the potential to provide so much more to the modern web ecosystem than just data storage. They could do access control, and run stored proc-equivalent functions or computed views from whatever languages my app is written in. If I need some complicated data flow where data needs to be updated in realtime in a secondary db (like elasticsearch), that should be dead simple to do too. I didn't get into it in that article but I believe we should be able to extend our databases to share data flow logic between applications. Or just build a lot of this stuff into databases so its reusable between projects.
> All because we're programming against a frozen database spec.
Relational databases, unlike XML, JSON, Key/Value stores and ORMs, do not pre-suppose document structure. On top of that, it is very easy to create new relations (entities) using Views. On top of that, you get a real algebra to play with: relational algebra.
SQL makes it crazy easy, in real time, to see your data in any hierarchical manner you like (via denormalized entities).
The one thing SQL 'lacks', and JSON shines at, is a way to return data in a hierarchical format (aka: to return JSON directly). I have 'lacks' in quotes because there are SQL solutions that can consume/spit out JSON.
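Postgres (9.4+) is one of those: it can emit hierarchical JSON directly. A hedged sketch with invented tables:

    SELECT json_build_object(
             'id', q.id,
             'title', q.title,
             'answers', (SELECT json_agg(a.body)
                         FROM answers a
                         WHERE a.question_id = q.id))
    FROM questions q;
    -- one nested JSON document per question, no ORM involved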
An interesting idea then is to provide a way to easily convert between SQL and JSON. To that end, there is an open source project https://github.com/erichosick/sql-json that attempts this. The results are promising but there is a lot of room to grow.
My point about the frozen spec doesn't have anything to do with the data format. It's that modern databases don't expose the API that modern web applications actually want, like:
- Integrated access control
- Realtime updates (sorta, kinda)
- Resumable event logs
- The ability to do computed views in my language (eg, make a computed view using a nodejs react static rendering process)
Because databases lack these features, modern web developers often end up rewriting them at the application level, for every app.
I don't really understand his point. He says that SQL has been around for 40+ years, as if that's a bad thing, and doesn't really say anything else.
I guess he is complaining about the idea that the user's input doesn't get inserted directly into the database, i.e., the connection would be browser->db, whereas instead we need an application component that reads from the browser and shapes the data for the DB.
At this point, we must ask if the author has ever worked with one of the thousands of applications that implements their "API" through stored functions. I won't condemn this method wholesale, but people have mostly moved on to more flexible representations for good reason.
The whole article seems predicated on a belief that most work should be on the consumer workstation, which is why he starts by talking about how much better it is that people now have desktop computers instead of clients that connect to a mainframe in the basement.
The author apparently doesn't grasp that the web acts just like those old terminal clients he refers to in beginning of the article. There's a big server running the application in someone's datacenter, and your browser is a thin client over the top of it, an interface into its inner workings.
He blames this on databases. (???) The browser doesn't provide the mechanisms to directly connect to arbitrary protocols.
That said, I think having the database server worry about being a database is a good thing. In my career I've had a few projects that I've been too ambitious with; by trying to do too many things it failed to do any of them well.
>Databases only talk custom binary TCP protocols, not HTTP. Not REST.
Let's pretend that the database can now talk via HTTP/Rest. Is the database now responsible for handling business rules? Is it responsible for per row authorization/authentication? How does this impact performance? What if we want to export the data in another format, say into an excel spreadsheet, should it be responsible for that as well and the formatting? Where is the line drawn?
>protect against SQL injection attacks.
I mean, how would the database know the difference between a legitimate request that should be allowed and one that shouldn't? This is the point of parameterization.
>Check for XSS
I think it's plenty legitimate for a database to return some HTML data, how would the database know when it's malicious or not?
-------------------
On the whole, I think the grievances the author raises are with middleware, not a problem of the database.
I haven't read all the comments here yet, but I'm going to throw this out anyway...
My confidence in the author took a hit at the point he called a PDP-11 a "mainframe". One would think a computer scientist would know what computers fall into what "generation". I guess "history of computation" is just not taught as part of such a degree anymore...?
But... I decided to read on, thinking maybe things would get better, and to give him the benefit of the doubt. I think somewhere in there were maybe a few points to think about, but ultimately it almost looks like he has some weird problem with "separation of concerns", and maybe doesn't understand why abandoning it would be a bad thing for scaling...
...which again, I find odd for a computer scientist.
Furthermore, he seems to ignore the great amount of improvement and change which has occurred in the database software/engine and server world; today's DBs and DB systems are -nothing- like they were back in the early 90s when I started my career (as a fresh high-school graduate). Yeah, we still used VT-100 terminals (then quickly transitioned to PCs - running VT-100 terminal emulators, of course), and things were starting to transition to PC apps communicating with the servers - and I am sure there were SQL injection issues (and no, we didn't think about that) - but things have, over the decades (yeesh - getting old here!), changed for the better!
Could they be better? Certainly! Are there things the DB server could be doing to make life easier for the app? Yes (and some of this has been implemented - ie, when developed properly, your queries can be "sanitized" at the DB server level - but you know, you should still do this at the app and browser level too - just in case). Realtime updates and notifications? That's pretty much there as well - but ultimately, a lot still has to be done at other levels.
It isn't that "everybody emulates the VT-100 for some reason" but that the VT-100 was the first terminal to support the ANSI standard for control codes and that that standard has been evolving ever since.
Yep, I also wanted to add that if you use a Mac or Unix, you most likely don't use a VT-100.
Yes, they all support VT-100, but that's because it became a standard, and essentially any device understands it. For example if you connect to a network switch you know that it will at least support VT-100 so you have a common protocol to communicate with it, and be able to switch to a modern protocol if the device supports it.
I feel like the author is a front end developer and just hates the fact that JavaScript in the browser can't communicate with a database server directly.
I don't understand, though: if there is a need for something like that, and it is also possible that a database could provide a generic enough interface that would work, then there must be an application that you install together with the database which would do exactly what he wants.
There's no need to add this functionality to a database, because it only adds complexity that the majority of people who use a database won't need.
> I feel like the author is a front end developer and just hates the fact that JavaScript in the browser can't communicate with a database server directly.
Author here. I hate the fact that here in 2017 we still write login code, again, for every application. I hate the fact that if I want to trigger re-renders, or run workers off database changes all my options suck.
> it is also possible that a database could provide a generic enough interface that would work
Exactly. It doesn't have to suck - our databases could provide a nice interface to do all this stuff cleanly instead. We just need a resumable event stream, with versions and some other API tweaks and we can do computed views out-of-process of the database, written in whatever language I want. I'm firmly of the opinion that all databases should provide a Samza / Storm / Kafka / etc style event log.
If the problem was as simple as you make it, you could write your backend code only once, then re-use it in every project.
Why are you expecting a relational database, which has many more use cases than web programming (and the web is not even a primary one), to solve this difficult task for small use cases?
Anyway, there have been attempts at solving this, and those were called frameworks, which shows that there's hardly a one-size-fits-all. If you use a framework and have a complex project, you reach a point where the framework just gets in your way.
> Exactly. It doesn't have to suck - our databases could provide a nice interface to do all this stuff cleanly instead. We just need a resumable event stream, with versions and some other API tweaks and we can do computed views out-of-process of the database, written in whatever language I want.
Not really; a database, as the name suggests, is meant for storing data, which is already a difficult task by itself (if you want durability and still make it fast). What you need is backend code that provides this functionality.
> I'm firmly of the opinion that all databases should provide a Samza / Storm / Kafka / etc style event log.
All the technologies you described are stream processors; while they work on data, they operate completely differently from a database and they have different use cases. Postgres does have LISTEN/NOTIFY, but that's meant more as a side channel for communication.
To be honest, this article seems pretty insubstantial. To the best of my ability to discern, the only concrete complaint is that access control is too coarse on modern databases, although it's not really specified in what way this is true.
And he gets a lot of things right – having a way to do queries against a database directly, being able to get changes directly is useful.
And GraphQL, for example, offers exactly that – there’s even a plugin to directly build a GraphQL API from your postgresql database. Although this still can’t send changes over the net, so you can’t have an always up-to-date view of the database.
But while this and Firebase¹ solve the problem of offering an API directly for a database, he fails to address the other task frontend servers do: they can render stuff on the server, in case you actually do have just a dumb terminal.
And that’s something that’s very useful for websites, as usually servers are more powerful than smartphones, and you need to do your computations somewhere. A web service applying deep dream to an image can’t run it in the phone, nor in the database – it has to run it on a specialized server
But it’s indeed a good question why there’s little academic research into changing the way web applications work. We’re already making databases directly open via GraphQL APIs, we’re separating statically hosted content into CDNs, so how can we combine this, and work truly "serverless"²?
[1]: Firebase is a great tool for prototyping your app, but if you want to run it, it's usually too expensive, and relying on proprietary Backend-as-a-Service technology has already proven to be a bad idea when Parse shut down.
[2]: Serverless here meaning that you have no specialized application server – you have a general database able to handle all your applications, a general CDN, and all special code for the app is handled within the database, or triggers of it.
The only part of this I agree with is his comment on database permissions.
Every modern SQL database has a concept of users and permissions, but it's divorced from your application's own users, so you're left with three options, all of which are flawed.
1. Handle security inside your application. This is the worst choice if users need to get a LIST of records they have access to and it's determined by something more than a simple WHERE owner_id = :user_id. Think multi-tenant applications where records can belong to a tenant, and users have access based on a org hierarchy or other criteria. Suddenly you're having to filter a whole list of records out in your application code, and this makes implementing pagination awful (do you requery until you fill up a page, or present a partial page?). You are also taking full responsibility for security, if you modify your queries to filter records out you open up to human error where someone forgets to filter this one query.
2. You implement a method to synchronize your application users with the database, and use the database engine's RLS support to handle access control. This is probably the best approach for web applications, but the caveats that come with it still suck. You have to make sure the connection is set as the user performing the action; this is doable with PostgreSQL, MSSQL and Oracle at the least (SET ROLE / EXECUTE AS) without destroying your ability to use a single connection pool - but for all the security you get out of this, your web application user still has all the keys, and if that account is compromised or there is a flaw that causes your application to not switch security contexts, you just failed at protecting your data.
3. Just use the database directly: it handles authentication, and you never change security contexts for a session - you can safely utilize the RLS functions of your database without any real risk since each database session is fixed to the user it is handling. Downside: you just lost connection pooling, and while pooling middleware like pgpool can help, you still have many more connections since you need to maintain a separate one for each user, or each tenant at best.
Approach 2 is by far the best we have, and you can make it safer by doing things like utilizing pg_hba.conf to limit access to the application user to the servers hosting your applications - but maybe you're using docker and IP addresses aren't fixed anymore (well, shit!). Also, how are you going to ensure the database connection is in the correct state when you make a query? Where are you going to plug that into your request pipeline?
I'd really like to see modern tooling around this problem, I don't know what exactly it would look like but it would be nice to have SOME improvement in the area.
A lot of it is technology stack dependent, unfortunately.
Option 1 can be implemented when using an ORM with decent support for something like Active Record scopes; with LINQ you could have a method on your model that returns a pre-filtered IQueryable, and with JPA you'd use the criteria API. If you're doing hand-written SQL, then have fun adding it to all of your queries or using some query builder to add the security filters. Either way, this approach is prone to human error.
Option 2 is a little simpler, assuming you understand the RLS functionality of your database. At the start of your HTTP request before you touch the database you SET ROLE to your tenant or user (you'll have to write some code to create these database users), and in some hook that is called when a connection is returned to your pool you'll RESET ROLE.
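The whole dance is only a couple of statements (role name invented):

    SET ROLE tenant_42;       -- request start: pooled connection assumes the tenant
    SELECT * FROM invoices;   -- RLS policies now filter rows as tenant_42
    RESET ROLE;               -- pool's release hook: back to the application user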
Personally, I've had a hard time finding anything decent on handling security in multi-tenant applications or where security goes beyond simple ownership checks. Maybe I'll get around to writing a series of articles on my blog someday.
Follow the money. The notion that databases have failed the web was put to Mike Stonebraker of Ingres fame at a seminar I attended back in the late 1990s. The actual question was this:
"Why aren't database vendors building object oriented databases for the web?"
Object-oriented databases store binary objects, as opposed to the object-relational model Mike pioneered with Postgres. The plan was for CORBA, the Common Object Request Broker Architecture, to shuffle versioned objects around the web, and object-oriented databases would store the binary code objects.
Mike's answer? "Because the money is in transaction processing. Banks pay millions of dollars for transaction systems. Transactions are the meat-and-potatoes of database sales."
In some sense database companies are wise for ignoring the web. Web companies are now hell-bent on "move fast and break things." A transaction database is the antithesis of move fast and break things. Transactions move deliberately, with consistency. Durability says I can unplug my database at any time and bring it back up. This is the opposite mindset of move fast and break things.
Here in the valley whole QA departments are being disbanded in favor of LEAN and move fast and break things. All of this is antithetical to the database world of transaction processing.
As they say: you get what you pay for. What the enterprise companies are paying for is exactly what is being delivered. Open source fits the ever-changing, move-fast-and-break-things web world.
To wit, you might see database vendors move into the web space some day if in fact companies are willing to pay millions of dollars for them.
Your argument may hold for Oracle and IBM, but the web crowd produced MySQL and Postgres, investing effectively millions of dollars of developer salaries and sweat equity, and those systems have evolved more towards features already in the commercial RDBMSes than towards a new, web-oriented DB flavor. Apparently web backend developers said "we want more of Oracle's features" and not "we want the DB to handle these web-specific issues". At least, not until NoSQL systems like Mongo came around.
Look, kids. You are going to have to have an execution layer you can trust somewhere, you can't just expose your data store directly to the outside world or you run into an infinite number of security issues[1]. (Not that this is stopping people from doing exactly this these days.)
That somewhere is going to be a server side execution environment, and it will be separate from your data store so you can scale these concerns independently. Additionally, you are probably going to want a DSL for your data access as well as a highly tuned indexing system for your data store.
We have systems that do this, they are called databases. Are they perfect? No. Nothing is. But the idea that they have "failed the web" is so over-the-top childish that, like the author says at the end of his post, I'm not inclined to be charitable.
> Look, kids. You are going to have to have an execution layer you can trust somewhere, you can't just expose your data store directly to the outside world or you run into an infinite number of security issues
The security issues your source points to in new, SQL-like (but not SQL) general query APIs (GraphQL, etc.) are actually already fairly completely addressed in modern relational databases, which are designed for multiuser access with users having different permissions. So exposing the datastore directly solves, rather than causes, the issues it raises with common generalized near-SQL query APIs.
You can't expose generalized query functionality without either encoding the security constraints in the datastore query layer (which is really, really hard, we are talking column-level security constraints and worse) or limiting the expressiveness of your query layer.
That's the trade off and, in most real world systems, the data stores are wide open to the developer because expressivity wins. Yes, the technology is there to secure databases at the row level, for example, but people don't use it. They just use a user/pass with total access.
A lot of folks are tossing HTTP endpoints on data stores without thinking through the security/expressiveness tradeoff.
> You can't expose generalized query functionality without either encoding the security constraints in the datastore query layer (which is really, really hard, we are talking column-level security constraints and worse) or limiting the expressiveness of your query layer.
Column and even row-level security constraints are not "really, really hard" to use in modern DBs, and you have to do the analysis of what people should be able to access and effect on that level to build the app, no matter whether you are implementing the constraints through a separate application layer or through the DB itself.
A user can only see the salaries of all his direct reports.
Again, there's a reason why almost every web app built in the last twenty years just used user/pass credentials and didn't implement db security beyond that: expressiveness.
But I'm not gonna change your mind on it today. Just think about it for a while.
> A user can only see the salaries of all his direct reports.
With an employee-salary table, that's a fairly simple row-level security rule.
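A hedged sketch in Postgres, with invented tables:

    ALTER TABLE salaries ENABLE ROW LEVEL SECURITY;
    CREATE POLICY direct_reports ON salaries FOR SELECT
      USING (employee_id IN (SELECT id FROM employees
                             WHERE manager = current_user));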
> Again, there's a reason why almost every web app built in the last twenty years just used user/pass credentials and didn't implement db security beyond that: expressiveness.
I agree that there is a reason, but that's not it; it's a combination of the lack of features in popular (particularly free) databases in the early part of the period (heck, MySQL/MariaDB still doesn't support row-level security), inertia, and lack of database knowledge among web app developers.
> But I'm not gonna change your mind on it today. Just think about it for a while.
This is not a new issue for me; I have been thinking about it for quite a long while.
The traditional RDBMSs have failed because they are all-or-nothing.
A lot of people here (and elsewhere) think it is nonsense to build a full app, with the full business logic, inside the "db".
WHY?????
That is a VERY NARROW viewpoint.
But when we say "let's build a full app inside a virtual machine", yeah, that actually is ok!
And why is it ok to build a full app in Lisp?
Or in an OO language (a graph of objects)?
Or an array language (an array is a relation with 1 column!)?
If you think that:
print([1, 2, 3])
is OK, then YOU MUST ACCEPT THAT:
print([Code = 1, 2, 3; Name= Miami, New York, Bogota])
Is ALSO OK.
The relational model is just a move from 1-column arrays to N-column relations (rows of columns as internal storage) plus some universal operations.
WHERE THE RDBMS FAILED EVERYONE IS:
Because they (the guys on the DB side) insist on adding: transactions, triggers, surrogate keys, inter-relation dependencies, storage, sub-query languages, catalogs, views, etc.
So in the end, you get half an OS: a big virtual machine tailored to a specific niche.
------
It was only when the artificial divide between the RDBMS and the front-end language appeared (and MS killed FoxPro/VB to focus only on .NET) that building database apps started to suck big time.
I have talked about this on HN before: I actually consider that it makes even MORE sense to build the logic inside the DB; however, it is necessary to re-think how it looks to make it more useful. It is not a novel concept. The dBase family was almost that, and people like me who used it were very happy and productive.
Why does it make more sense? Because Program = Data + Algo.
Data is not to be treated as a third-class citizen. It must be first class. The relational model makes it first class (as do the Lisp and array models).
And what about separation of concerns and all that? That is pure architecture, and it is tangential to being or not being inside a DB, the same way it is tangential inside a VM.
I would argue that databases tried to do too much for too long. If you want a database that has a built-in web server, REST APIs, security, etc. ... well, Oracle has had that for years. Problem was, that wasn't what people wanted (or wanted to pay for).
The more recent trends have been to decompose the database into something that is great at storing and retrieving data...leaving all of the other stuff to products that do that well.
I find the DB part to be the only sane part of my stack these days. It's funny though, I got the hang of functional programming by cross-training in SQL first. Writing attribute expressions is a lot like writing pure functions of immutable data.
No plumbing is required in modern architectures such as those of SAP HANA, Starcounter, or Tarantool. When the application server and database are combined, access control can be arbitrary, databases talk whatever protocol you want, and there is no need to tear the code apart into stored procedures, backend code, and frontend code. Plus, lots of unnecessary moving parts are removed in such architectures, so that the whole thing runs on 2 servers instead of 20, while the code is simpler than ever.
The referenced article needs clarification. What it really addresses are the flaws of conventional software architectures. Fortunately, this critique does not generalize.
So you want to implement a new feature, test it locally, then run tests to make sure you didn't break anything, and when you are done, upload the changes to the production server. This is sane devops, nothing fancy, yet impossible with today's databases. You basically have to manually make the same changes you made in development to the production database, then test whether they work on live data in production. Then your finger might slip when writing an SQL query and you have to restore all production data from backup.
Anywhere between maybe 1-2 man-hours and multiple man-months, depending on how nice the data is right now, and how accurately it needs to be split. The optimal case would be the application already enforcing values to look like e.g. "last, first", so that the migration can happen via a simple regex.
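In Postgres, that best case really is a one-statement migration (table and column names assumed):

    ALTER TABLE people ADD COLUMN last_name text, ADD COLUMN first_name text;
    UPDATE people SET
      last_name  = split_part(full_name, ', ', 1),
      first_name = split_part(full_name, ', ', 2);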
> Stack Overflow runs everything across what, 4 MS SQL Servers in total?
How has that "failed the web?"