Message DB: Event Store and Message Store for PostgreSQL

kiwicopple · on Dec 17, 2019

For anyone just looking for the pub/sub functionality, I have been developing something that provides the functionality for PostgreSQL: https://github.com/supabase/realtime

It's an Elixir server (Phoenix) that allows you to listen to changes in your database via websockets.

Basically the Phoenix server listens to PostgreSQL's replication functionality, converts the byte stream into JSON, and then broadcasts over websockets. The beauty of listening to the replication functionality is that you can make changes to your database from anywhere - your api, directly in the DB, via a console etc - and you will still receive the changes via websockets.

I'd have to dig into their implementation to verify, but the main benefit over OP is scalability - I believe their implementation is tightly coupled to Postgres, but with this you could scale the Elixir servers without any additional load on your DB

It's still in very early stages, although I am using it in production at my company and will work on it full time starting Jan

adamfeldman · on Dec 17, 2019

Reading the WAL is interesting. I've seen other tools [1] accomplish similar ends using Postgres triggers

[1]: https://github.com/hasura/graphql-engine/blob/master/event-t...

kiwicopple · on Dec 18, 2019

Hasura is an amazing tool. I tried triggers + NOTIFY at the start, which is how Hasura does it (at least last I looked). I ran into the limitations of NOTIFY, which can only carry an 8000-byte payload.

The recommended resolution online was just to send a NOTIFY with a Primary key or identifier, then to query the database for the resource that changed. I didn't like this approach because it required more load on the database.

One other advantage of using the WAL is that you don't have to set up triggers every time you set up a table. You just set wal_level to "logical" on the database and then all your tables will be included in the replication stream
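
The NOTIFY size limit and the identifier-only workaround described above can be sketched as follows. This is a minimal illustration, not code from Hasura or supabase/realtime: the function name, the JSON shape, and the `MAX_NOTIFY_BYTES` constant are assumptions made for the example (PostgreSQL's default configuration requires the payload to be shorter than 8000 bytes).

```python
import json

# Assumption: default PostgreSQL builds require a NOTIFY payload shorter
# than 8000 bytes, so 7999 is the largest size we allow here.
MAX_NOTIFY_BYTES = 7999


def build_notify_payload(table: str, pk: int, row: dict) -> str:
    """Return the full row as JSON if it fits, otherwise only a pointer to it."""
    full = json.dumps({"table": table, "pk": pk, "row": row})
    if len(full.encode("utf-8")) <= MAX_NOTIFY_BYTES:
        return full
    # Too large for NOTIFY: send only the identifier. The listener then has
    # to query the database for the row, which is the extra load mentioned above.
    return json.dumps({"table": table, "pk": pk, "row": None})
```

A listener that receives `"row": null` would run a follow-up `SELECT` by primary key, which is exactly the cost that reading the WAL avoids.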

ralusek · on Dec 17, 2019

Is this better than postgres's native NOTIFY?

kiwicopple · on Dec 17, 2019

I actually started with NOTIFY but the payload has a limit of 8000 bytes. Postgres throws an error for anything longer.

manigandham · on Dec 17, 2019

NOTIFY just sends a pub/sub message. You still need a trigger or something else to actually call it on updates, and it has payload size limits.

jtdev · on Dec 17, 2019

Like...a trigger?

Tostino · on Dec 17, 2019

Yeah...triggers can have significant overhead though, while sending a WAL stream to another server that processes it is super low overhead.

tyre · on Dec 17, 2019

Do you know of anything like this for SQLite? I have a need for exactly this. Are there similar triggers?

kiwicopple · on Dec 18, 2019

It's a good question - I don't know the specifics of their implementations, but there are SQLite adapters for RxDB and PouchDB which both have a focus on realtime functionality. YMMV but I found them both pretty good for small implementations

frafra · on Dec 17, 2019

Similar, using triggers, Python, aiohttp + asyncpg: https://github.com/frafra/postgresql2websocket/

fjp · on Dec 18, 2019

Any reason for your choice of asyncpg vs aiopg? Curious as I have worked with both

hashhar · on Dec 17, 2019

Look at embedded Debezium. https://debezium.io/docs/embedded/

Very similar to what you are doing but instead of Websockets you can decide where to send your data. The core-debezium sends it to Kafka.

kiwicopple · on Dec 17, 2019

I took a lot of inspiration from Debezium - at the time they required the wal2json plugin (not sure if that's still the case?). I didn't have an option to install the plugin and I mostly wanted a websocket-friendly implementation to replace Firebase.

And if I'm 100% honest I just wanted to make something I thought was cool

gunnarmorling · on Dec 21, 2019

By now the Debezium Postgres connector also works with pgoutput (coming with Postgres by default as of PG 10 and later).

Disclaimer: working on Debezium

bobbydreamer · on Dec 17, 2019

Why replace Firebase ?

kiwicopple · on Dec 18, 2019

Consolidating our codebase - most of our infra was already built on top of Postgres and we were just using Firebase for realtime chat.

pythonaut_16 · on Dec 17, 2019

This is pretty cool.

Mnesia (one of the databases built into Erlang) has a similar ability to subscribe to its tables that I've used for realtime updates in some demo projects.

adamfeldman · on Dec 17, 2019

Very interesting. This uses "Postgres server functions that can be used with any programming language or through the psql command line tool. Interaction with the underlying store through the Postgres server functions ensures correct writing and reading messages, streams, and categories." (https://github.com/message-db/message-db#api-overview)

In Elixir-land, there is https://github.com/commanded/eventstore (which AFAICT is closely tied to the Commanded CQRS/ES framework https://github.com/commanded/commanded)

PankajGhosh · on Dec 17, 2019

On behalf of a technology team looking to incorporate a messaging platform into our landscape: I am curious to understand the motivation behind implementing a messaging system on top of a database technology. There are robust offerings both for small-scale and large-throughput systems. What are the benefits of this project over other implementations (RabbitMQ, Kafka, Celery, ActiveMQ, ZeroMQ, SNS/SQS)?

sgk284 · on Dec 17, 2019

At my previous startup we put everything we could into the database (work queues and pub/sub included). It was magical.

It minimized complexity (moving parts, boundaries, etc...), everything was transactional, and a single dump of the database gave you a copy of the entire persistent state of your system at that moment (across all services).

We had several services running, but any notion of persistent state was stored in a single database. We didn't have particularly high demands, but processed ~140M jobs per month in addition to our normal query load.

Postgres handled this like a champ, had far lower p50 and p95 latency than SNS/SQS, and rarely went above single-digit percentage CPU usage on a fairly cheap DigitalOcean box.

I've worked at a lot of big tech companies (Google, Microsoft, Twitter, Salesforce) in many varieties of giant distributed systems and the most valuable lesson I've learned over and over again is: Distributed systems are hard, avoid them until you can't.

That's the best advice I can give to any startup. Until you're regularly sustaining 1,000's of TPS or have grown to dozens of TB of data, it is generally a distraction to even think about anything other than a single relational database.

manigandham · on Dec 17, 2019

That's great to read, and a perfect example of solving real problems instead of wasting time on exotic infrastructure.

The vast majority of companies and their scale will never go above a single midsize database server. All of the transactional, backup, and querying functionalities you list make it much more productive than using fancy AWS/cloud services.

djrobstep · on Dec 17, 2019

We did a very similar thing where I used to work. Didn't want to bother with a separate task queue and wanted a consistent SQL interface for everything.

We wrote a very simple task queue (<500 lines of python) using a postgres table and inbuilt pub/sub. Ran millions of jobs, scheduled and unscheduled, without a hitch.
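
A table-backed task queue of the kind described above might look roughly like this. It is only a sketch under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in so the example is self-contained, and the `jobs` table and function names are hypothetical. On Postgres, the claim step is commonly written with `SELECT ... FOR UPDATE SKIP LOCKED` so that concurrent workers do not block each other.

```python
import sqlite3


def open_queue() -> sqlite3.Connection:
    # isolation_level=None disables implicit transactions so that
    # BEGIN/COMMIT can be issued explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def enqueue(conn: sqlite3.Connection, payload: str) -> int:
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def claim_next(conn: sqlite3.Connection):
    """Mark the oldest pending job as running and return (id, payload), or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],)
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


def complete(conn: sqlite3.Connection, job_id: int) -> None:
    conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
```

Because the claim happens inside a transaction, a worker that crashes before `COMMIT` leaves the job pending for someone else to pick up.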

simplecto · on Dec 17, 2019

> Distributed systems are hard, avoid them until you can't.

thank you. May I borrow this?

AtlasBarfed · on Dec 19, 2019

Yes, but why architect yourself into a more painful corner when you reach "can't"?

sgk284 · on Dec 17, 2019

Please do!

vvern · on Dec 18, 2019

This! So many wasted years of engineering making correct applications on top of Cassandra/Kafka/Elasticsearch when a single Postgres server could have handled everything no problem.

Scalability for a tech product should be a concern at some point, but if you don’t have product market fit it doesn’t matter. The complexity of building a distributed system right out of the gate will cripple rapid iteration or lead to a broken product.

traviscj · on Dec 17, 2019

> put everything we could into the database (work queues and pub/sub included). It was magical.

I've been building systems on some pretty similar ideas for the last 5+ years, and have a similarly optimistic outlook on it. Just to add another datapoint here, we've been able to scale them up to 20-40x higher than your figure (a couple thousand jobs/sec) on a single (albeit beefy) MySQL instance. It definitely takes a fair bit of care & we certainly have some time invested in it, but it can be done.

I guess my services haven't had (gotten to? :-) ) to share databases with other services for quite some time -- they did when I started my job! -- but even with that concession, the benefits of DB guarantees have been amazing. There's nothing like a transaction for preserving your sanity.

The best tactic I've learned from these systems is to think really hard about the batching & partitioning behavior to keep msgs/sec high enough & transactions/sec low enough to stay afloat.
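
The batching trade-off mentioned above comes down to simple arithmetic: the more messages each transaction carries, the fewer transactions the database has to commit. A small sketch, with hypothetical helper names and illustrative numbers rather than figures from the system described:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def transactions_per_second(msgs_per_second: float, batch_size: int) -> float:
    """Transactions needed per second if each one commits a full batch."""
    return msgs_per_second / batch_size
```

For example, 2,000 messages per second committed in batches of 100 needs only 20 transactions per second, at the price of somewhat higher per-message latency.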

BenoitEssiambre · on Dec 17, 2019

To add to this, the numerous, impossible to solve issues that happen when your data starts hopping all over the network are proven mathematically:

https://groups.csail.mit.edu/tds/papers/Lynch/podc89.pdf

MaxBarraclough · on Dec 17, 2019

For some perhaps-off-topic contrarianism, what do you make of Uncle Bob railing against database-oriented architectures? [0] (He takes a good few minutes to get his point across.)

[0] https://youtu.be/o_TH-Y78tt4?t=2566

ianamartin · on Dec 17, 2019

I can't watch that at work right now. But is it pretty much this? https://blog.cleancoder.com/uncle-bob/2012/05/15/NODB.html

I've heard of him before but never really read anything much by him. Is this guy for real? He worked at one startup, got angry about a database, became a consultant, and screams about everyone being wrong about everything?

MaxBarraclough · on Dec 17, 2019

Yes he makes similar points there. A few scattered critical thoughts:

* I don't see any problem with stored procedures. They can make good sense for, say, auditing, as well as for performance.

* People describe their software systems as "Using Oracle" because it matters, not because their design is stupid. It tells me that Oracle skills are relevant, for instance.

* NoSQL is generally, in my experience, awful and chaotic, rather than liberating. Lots of good work has gone into making serious grown-up relational databases. Schemas, normal forms, constraints, rigorous work on the ACID properties. Something like MongoDB is just sloppy amateur-hour by comparison.

* In the YouTube video, he says that solid-state drives render relational databases obsolete. This strikes me as absurd. Not even ultra-fast SSD storage technologies [0] will do that. The relational model is effective, and the associated DBMSs are still well justified. Joins belong in a query language, not in imperative code. Why would you want to try managing a huge complex dataset manually?

[0] https://en.wikipedia.org/wiki/3D_XPoint

(Edit: formatting)

AtlasBarfed · on Dec 19, 2019

Yeah, I have changed my opinion of stored procs to :

If you have a proprietary database, stored procedures are awful. AHEM ORACLE.

NotSammyHagar · on Dec 18, 2019

Yeah, seems like he vastly overstates things. ssds are great for relational databases. They all use them.

jbergens · on Dec 18, 2019

I think the point is that you might be able to build something else than a standard relational db and make it faster.

Something like VoltDB, but there are other solutions. There used to be some blogs about their architecture and how they could avoid a couple of time-consuming steps that a normal RDBMS needs.

https://www.voltdb.com/why-voltdb/

MaxBarraclough · on Dec 18, 2019

> I think the point is that you might be able to build something else than a standard relational db and make it faster.

I sincerely doubt that.

For it to be useful in real world situations, you'd have to build your own highly flexible, scalable, rock-solid, high-performance data-management solution, ideally one which enables the user to use a declarative means of expressing queries.

This is, of course, a DBMS.

If you build one which imposes structure on the data, you've got either a relational DBMS, or an object DBMS, or some other kind of well-studied database solution... except you've rolled your own from scratch, without input from database experts, so it's going to be a disaster.

Unless you're someone like Microsoft, Google, or Amazon, you pretty much can't build one of those. It costs tens of millions. It takes a huge amount of testing, for obvious reasons.

I really don't see any argument for not using a mature database system for managing a typical company's data.

Of course, if we aren't talking about a full-scale database that has to cope with a huge amount of messy mission-critical data in a changing business environment, then the game changes, and sure, you might have a chance just writing something yourself.

Netflix's video-streaming CDN ('OpenConnect'), for instance, obviously isn't powered by an RDBMS. They put in a huge amount of highly technical work building their own finely tuned technologies to pull data off the disk and get it to the NIC with minimal glue in between. But that's Netflix, and virtually no real companies face that kind of challenge.

Also, VoltDB looks like a streaming DB technology. Isn't that both a DBMS (rather than a means of rolling-your-own), and very niche? I don't see how it's relevant here. If it's really able to serve business needs better, then great, but it's still a complex-DBMS-as-a-product.

tincholio · on Dec 17, 2019

> became a consultant

To me, he sounds more like a snake oil salesman.

code_biologist · on Dec 18, 2019

I think his take on databases is moronic with a dash of common sense:

He rails against a strawman where people put SQL everywhere, into views, into application logic ("mail merge" is one he mentions), and DBAs gatekeep everything.

The common sense part: no, your views shouldn't be composing SQL with string interpolation.

The moronic parts:

- He venerates application logic and regards the data model as a detail, but data models aren't details — they tend to outlive application logic.

- He holds up in-memory data structures as a platonic ideal, but doesn't address the things databases provide for you: schemas, constraints, transactional semantics and error recovery, a clear concurrency story.

gigatexal · on Dec 17, 2019

Fascinating, was there ever more written about your tech stack?

kiwicopple · on Dec 17, 2019

Not OP, but we do something similar at our startup. PostgREST (http://postgrest.org) definitely helps if you want to go down this path as it exposes everything in the database (including functions etc) via a RESTful API.

A bit outdated now, but some details here: https://paul.copplest.one/blog/nimbus-tech-2019-04.html#tech...

barrkel · on Dec 17, 2019

Of course, at the point where you do need to break out beyond a single DB, life is more painful because you've got dependencies on the database all over the place...

y4mi · on Dec 17, 2019

That's incorrect. It's trivial if you have a clearly defined API with which to extract or insert data.

It only starts to leak if everyone accesses the DB through raw SQL, mishmashing modules and creating insane dependencies.

Give each module ownership of a table and demand that the table only be accessed through that module, and the backend storage system becomes irrelevant.

barrkel · on Dec 17, 2019

If you put everything in the database, and use it for all it's able to do, as the comment I replied to suggested, it is practically inevitable under startup feature pressure conditions that total modularity won't hold. And in fact I think it would be irresponsible to pursue total modularity; it would court failure.

I believe you're being deeply naive about modules owning single tables. You forego relational integrity and almost the whole power of relational algebra if you take that approach. That's heaps of functionality to leave behind when you and one or two other guys need to crank out a feature a day.

y4mi · on Dec 17, 2019

You obviously need to write queries which span tables (I.e. joins).

I can see how you misunderstood me there however, it was admittedly poorly worded.

What I was talking about was basic data hygiene, as it's often called. If you need specific data, you define a clear way to get this data and only access it through that API. This can be a class, method or anything your language of choice prefers.

If you skip that step and directly access the DB everywhere in your code, you'll create another unmaintainable dumpster fire as soon as your team goes beyond the initial programmers.

barrkel · on Dec 17, 2019

Having scaled up the tech in a company from $0 to $10+MM ARR, from zero customers to over 300 million records a day, I think you are simply incorrect and uninformed about the trade-offs that make sense in a startup.

NotSammyHagar · on Dec 18, 2019

so you are the only one with this knowledge, sensei?

nthj · on Dec 17, 2019

You hit on the real reason architects push microservices in smaller companies: it limits options to engineers and to the startup by putting up walls in the form of network rules that are resistant to change.

A startup needs all the options on the table because the alternative is they go out of business and there is no startup. This idea that we can enforce “good architecture” (subject to interpretation) with technology by isolating junior engineers with networking rules needs to at least be more transparent about its motivations.

barrkel · on Dec 17, 2019

Services permit scaling development teams. In a startup where you're all in the same room, lots of separate services don't really make sense, and you probably don't know where to make the right cuts even if you tried. When you want to grow beyond the one big room, with a lot of teams, potentially in other time zones and countries, then you want to be able to carve off services so teams can own them.

sbellware · on Dec 18, 2019

The problem here is that an unmitigated monolith with no domain partitioning in its data model can't be transformed into a service architecture by carving off pieces.

For pieces to be able to be carved off, they already have to be autonomous.

What usually happens is that devs without experience in service architectures presume that service architectures are probably just the things they are used to seeing, ie: monolithic entities.

It's usually then that we hear things like "extract the product service". The problem is that product is an entity, and entities are the last thing to build services around. That's how we end up with distributed monoliths, and subsequently failed service architecture projects.

In order for an app to be able to transition to services, the app has to be designed this way from the beginning.

And yes, it's definitely possible to know what the model partitions should be up front. They're very natural divisions. But they can't be arrived at by looking through an entity-centric lens. And unfortunately, forms-over-data apps very rarely provide us with an opportunity to learn about the "other" way to do it.

ethangarofolo · on Dec 17, 2019

Message DB happens to be implemented using a RDBMS, but the streams in it end up being very clear partition points. Some thought would be required to move data to a different database, but an event-sourced model isn't the same as coupling through a traditional RDBMS schema.

Edit: Fixed a typo

Thaxll · on Dec 17, 2019

Running a PG instance on DO for a startup is a really bad idea. I don't understand how one can choose that over a managed solution.

If you're a startup just use SNS / Kinesis / Google Pub/Sub etc.

sbellware · on Dec 17, 2019

Or managed Postgres on AWS Aurora, AWS RDS, Google Cloud SQL, Heroku, etc :)

SNS, Kinesis (Kafka), Google Pub/Sub are awesome. Not the same problem/solution fit as an event store, but awesome for the scenarios and architectures they're targeted at.

ziftface · on Dec 17, 2019

Is there a difference from a programming perspective between, say Kafka, and the event store linked in this post?

I understand that Kafka can scale horizontally and can handle crazy throughput, but I mean from a programming point of view, the idea of a unified log as a data model applies to both, correct?

ethangarofolo · on Dec 17, 2019

There are some similarities, and there are definitely worse technologies you could choose for a message store than Kafka. It's worth calling out the difference between event-sourced and event-based. The former is necessarily the latter, but that doesn't go in the opposite direction.

Event-based just means that communication happens over events. Event-sourced means that the authoritative state of the system is sourced from events. If the events are literally the state, then how those are retrieved begins to matter.

Kafka breaks down as a message store in 2 key ways that I mentioned elsewhere in all these threads.

> The first is that one generally has a separate stream for each entity in an event-sourced system. Streams are sort of like topics in Kafka, but it would be quite challenging to, say, make a topic per user in Kafka. The second is Kafka's lack of optimistic concurrency support (see https://issues.apache.org/jira/browse/KAFKA-2260). The decision to not support expected offsets makes perfect sense for what Kafka is, but it does make it unsuitable for event sourcing.

If my only tool were Kafka, then I wouldn't be able to use the messages in the same way that I can with something like Message DB. And that's okay, different tools for different jobs.
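
To make "the authoritative state of the system is sourced from events" concrete, here is a minimal sketch. The account example, the `Event` class, and the event type names are hypothetical and chosen only for illustration; they are not taken from Message DB or Eventide.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    type: str
    amount: int


def project_balance(events: Iterable[Event]) -> int:
    """Derive the current balance by folding over one entity's event stream."""
    balance = 0
    for event in events:
        if event.type == "Deposited":
            balance += event.amount
        elif event.type == "Withdrawn":
            balance -= event.amount
        # Unknown event types are ignored in this sketch.
    return balance
```

There is no separate "balance" column to update: the stream for one entity is read in order and the state is recomputed from it, which is why per-entity streams and efficient retrieval matter.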

tyri_kai_psomi · on Dec 17, 2019

Can you elaborate on why it is a really bad idea?

Thaxll · on Dec 17, 2019

Because running your own DB on a VPS means:

- Dealing with security upgrades for PG and Linux

- Dealing with backups

- Dealing with HA, so setting up slaves / replicas, then what do you do when something goes wrong? Do you manually SSH in and do some magic?

- Network security / usernames / passwords

- etc.

Clearly things a startup doesn't want to do and usually lacks the expertise for.

ethangarofolo · on Dec 17, 2019

The technologies you listed are message brokers, while Message DB is a message store. The former transport messages, and the latter is a database specialized for storing message data and sourcing system state from those messages. Kafka can, for example, move a lot of events around, but it isn't suitable for event sourcing for 2 key reasons.

The first is that one generally has a separate stream for each entity in an event-sourced system. Streams are sort of like topics in Kafka, but it would be quite challenging to, say, make a topic per user in Kafka. The second is Kafka's lack of optimistic concurrency support (see https://issues.apache.org/jira/browse/KAFKA-2260). The decision to not support expected offsets makes perfect sense for what Kafka is, but it does make it unsuitable for event sourcing.

Being built on top of Postgres, Message DB gives you access to event sourcing semantics using a familiar database technology.

We use Message DB in our production systems, and I'd be happy to talk more about it if you have other questions. We've found it very reliable.

As a disclaimer, I'm listed as a contributor at the Eventide Project, the project Message DB was extracted from, though I did not write any of the code behind Message DB.
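
The optimistic concurrency point above (writing with an expected stream version) can be illustrated with a small in-memory sketch. The class and method names are hypothetical and this is not Message DB's actual API; it only shows the idea of rejecting a write when the stream has moved on since the writer last read it. The convention that an empty stream has version -1 is an assumption of this example.

```python
from typing import Dict, List, Optional


class WrongExpectedVersion(Exception):
    """Raised when a stream is not at the version the writer expected."""


class InMemoryStore:
    def __init__(self) -> None:
        self._streams: Dict[str, List[dict]] = {}

    def stream_version(self, stream_name: str) -> int:
        # Position of the last message in the stream; -1 if the stream is empty.
        return len(self._streams.get(stream_name, [])) - 1

    def write(
        self,
        stream_name: str,
        message: dict,
        expected_version: Optional[int] = None,
    ) -> int:
        """Append a message and return its position in the stream."""
        current = self.stream_version(stream_name)
        if expected_version is not None and expected_version != current:
            raise WrongExpectedVersion(
                f"{stream_name}: expected {expected_version}, found {current}"
            )
        self._streams.setdefault(stream_name, []).append(message)
        return current + 1
```

Two writers that both read the stream at version 0 and both try to write with `expected_version=0` cannot both succeed: the second one gets an error and has to re-read before retrying.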

iamspoilt · on Dec 17, 2019

I think the biggest difference is that it’s a Message Store as highlighted by the preceding comment.

Some clarity by example: In financial systems, it’s extremely important to keep track of all the transactions between your microservices (if your architecture is based on that). You could potentially lose a message delivered to you via a broker if your service fails to write it to a persistent storage. If the producer of that message never stored that message or it was produced on wire and transmitted, there is no way to recover it anymore. A system designed around a Message Store can mitigate such problems. You can build that similar architecture with brokers as well but for each of your application, you will have to implement something analogous to a Message Store to handle idempotency and things like that.
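
The idempotency concern mentioned above can be sketched as a handler that remembers which message IDs it has already applied. The class and field names are hypothetical; in a real system the processed IDs (or a stream position) would need to be persisted together with the state change rather than kept in memory.

```python
class IdempotentHandler:
    """Applies each message at most once, keyed by its ID."""

    def __init__(self) -> None:
        self.processed_ids = set()
        self.balance = 0

    def handle(self, message: dict) -> bool:
        """Apply the message; return False if it was a duplicate delivery."""
        if message["id"] in self.processed_ids:
            return False
        self.balance += message["amount"]
        self.processed_ids.add(message["id"])
        return True
```

With this in place, a message that is redelivered after a crash or a retry does not change the balance a second time.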

h91wka · on Dec 17, 2019

Incorrect. Kafka is a message _storage_. The server can be seen as a distributed persistent append-only log. All broker logic is encoded in the client. Source: maintainer of one of the client implementations.

SahAssar · on Dec 17, 2019

Is this using NOTIFY/LISTEN to stream messages or some other way to get new messages as they arrive?

ethangarofolo · on Dec 17, 2019

Good question. Consumers poll for updates. One of the stored functions in Message DB is designed for this very query.

Polling sounds very crude, but for the systems Message DB is designed for, it's a virtue. No back pressure problems, for example.

paulryanrogers · on Dec 17, 2019

Biggest wins I can imagine:

Atomic changes (rollback can undo any unprocessed messages)

Fewer moving parts, if you already need a DB

Easily query state of queues and messages with familiar SQL

davidgl · on Dec 17, 2019

Depends on your requirements, and the message throughput, but we have had great success using a database as a message broker. We get atomic commits on jobs, audit history, database backup is all we need for recovery, and a less complex stack. We did use RabbitMQ, but replaced it with using SQL Server, and the sql notifications to make it efficient. Couldn't be happier, but we only do thousands of messages a day, not sure I'd use the same approach with millions.

Dowwie · on Dec 17, 2019

Antirez finally made Disque a Redis Module, too..

SideburnsOfDoom · on Dec 17, 2019

I have heard it said many times that it is a bad idea to implement message queues over a relational SQL database.

e.g.

"Databases suck for Messaging" https://www.rabbitmq.com/resources/RabbitMQ_Oxford_Geek_Nigh...

http://mikehadlow.blogspot.com/2012/04/database-as-queue-ant...

https://www.cloudamqp.com/blog/2015-11-23-why-is-a-database-...

https://softwareengineering.stackexchange.com/questions/3514...

I wonder if the designers of this system disagree, or if they did anything in particular to mitigate these issues?

mumblemumble · on Dec 17, 2019

There is definitely a performance ceiling if you use the RDBMS as your event store. I used to worry about that. Now that I've got a few years working at a very successful high frequency trading firm that did message queues in an RDBMS behind me, I don't worry about it so much anymore.

They didn't use it for everything; there was also a blindingly efficient homegrown messaging system for the stuff that needed to be really fast. But that was for fairly well-defined use cases. (Latency and concurrency were the key limiting factors.) By default, everything went into the RDBMS-based system. It was inherently more robust, it replaced a whole quagmire of design questions you have to make when using other messaging systems with ACID guarantees, and it made it a lot quicker & easier to do post-mortem analysis of any surprising things that happened in production.

sbellware · on Dec 17, 2019

Yes, we disagree :)

The response given by @ethangarofolo does a good job of addressing the main points.

I would say that the slide deck linked is a bit sneaky. Sooner or later, in anything built with a computer, there's going to be polling. Higher-level libraries abstract the polling so that it appears as push (rather than pull) to the application developer. But under the hood, there's polling.

It should also be pointed out that any message queue with persistent messaging is built on a database. Even RabbitMQ's persistent messaging is built on a database (Mnesia).

In the end, a message store or event store can be used to model message queues, but the opposite is rarely true. One of the creators of the Event Store database once said to me something like, "What's a queue but a degenerate form of a stream".

A message store provides for patterns that are above and beyond message queues, like event sourcing, for example. Like Ethan mentioned, it's an application of "dumb pipes". It's in the same vein as Event Store or Kafka, rather than RabbitMQ, ActiveMQ, etc.

The critical difference is durability of messages. If you have an application that doesn't require durability of messages, then a plain old message queue or message broker technology may be a better choice for the situation.

In the end, polling is totally fine and totally manageable. What matters is that it's not done naively; that batching is intelligent and polling is only optionally triggered in the right circumstances and tuned based on batch processing cycle time and message arrival time.

Polling doesn't mean that the database will be "hammered" unless it's implemented that way.
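
A polling consumer along the lines described above (batched reads, and a pause only when nothing new has arrived) might be sketched like this. The function names, the `read_batch(position, batch_size)` callback, and the default parameters are assumptions for illustration, not the Eventide consumer implementation.

```python
import time
from typing import Callable, Sequence


def poll_once(
    read_batch: Callable[[int, int], Sequence],
    handle: Callable[[object], None],
    position: int,
    batch_size: int,
) -> int:
    """Read one batch starting at `position`, handle it, return the new position."""
    batch = read_batch(position, batch_size)
    for message in batch:
        handle(message)
    return position + len(batch)


def run_consumer(
    read_batch: Callable[[int, int], Sequence],
    handle: Callable[[object], None],
    position: int = 0,
    batch_size: int = 100,
    idle_sleep: float = 0.1,
    max_idle_polls: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until `max_idle_polls` consecutive empty reads, then return the position."""
    idle_polls = 0
    while idle_polls < max_idle_polls:
        new_position = poll_once(read_batch, handle, position, batch_size)
        if new_position == position:
            # Nothing new: wait before asking the database again.
            idle_polls += 1
            sleep(idle_sleep)
        else:
            # There was work: poll again immediately, since more may be waiting.
            idle_polls = 0
        position = new_position
    return position
```

The consumer asks for more as soon as it finishes a batch and only sleeps when the store is empty, so the database sees one query per batch rather than one per message. A long-running consumer would loop forever instead of stopping after a few idle polls; the limit here just keeps the sketch easy to run.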

SideburnsOfDoom · on Dec 17, 2019

> Sooner or later, in anything built with a computer, there's going to be polling. ... But under the hood, there's polling.

Is that true? I am reminded of this essay that goes into a deep dive of what happens under the hood with http and the underlying network I/O

https://blog.stephencleary.com/2013/11/there-is-no-thread.ht...

And at the lowest level, the network card interrupts the CPU because it has finished reading or writing data.

> Some time after the write request started, the device finishes writing. It notifies the CPU via an interrupt.

Is that polling? It seems more like a push.

sbellware · on Dec 17, 2019

Anything that processes a signal checks if a signal has been received. It's not so black-and-white at the level of electricity, but higher-level things at the level of durable message queues check for new signals, even if those signals arrive via "push".

mike_hock · on Dec 17, 2019

It is still good design to do the polling only at the lowest level where you must. Higher layers should be reactive.

SideburnsOfDoom · on Dec 17, 2019

Does a network card "poll"? It's hardware activated by current flowing into it. Does the CPU poll the card? No, the CPU is interrupted by the network card, again by receiving an electrical signal.

If there's polling, it happens in a matter of a few CPU cycles.

mike_hock · on Dec 17, 2019

At some point, this is splitting hairs. Single instructions are atomic wrt. interrupts, so surely there must be some sort of check every cycle whether an interrupt has arrived during that cycle.

The magnitude of the time slice or polling interval is immaterial as to whether it is to be considered "polling."

SideburnsOfDoom · on Dec 17, 2019

> persistent messaging is built on a database. Even RabbitMQ's persistent messaging is built on a database (Mnesia).

Good point that any system that is "persistent" has a data store by definition. The real question is whether a SQL relational database is a good fit for it.

sbellware · on Dec 17, 2019

The right store is the one that has the features needed to implement the targeted patterns, has client libraries for most programming languages, can be tuned and scaled, has a large ecosystem, has numerous managed hosting solutions offered by popular cloud providers, and is approachable by the largest segment of the potential audience.

Postgres checks those boxes, as do others.

Technically, though, Postgres has been an "object-relational" database for some time.

In the end, Message DB uses a table as an append-only log, and leverages Postgres indexes, advisory locks, and JSON documents (and indexes) to implement some of the critical messaging features and patterns. It's not using the "relational" aspects of Postgres. There are no relational tables.

Here's the table schema: https://github.com/message-db/message-db/blob/master/databas...

Given the paucity of Postgres features that the message store leverages, it could have been implemented against the raw Postgres storage engine. Had we done that, though, few people could have understood it and been able to specialize it to their purposes using plain old SQL. And any performance improvements induced by skipping Postgres's tabular data abstractions would have been so negligible for the schema in question that it wouldn't have offered much return on the effort.
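To make the "table as an append-only log" idea concrete, here is a minimal sketch using Python's built-in sqlite3 rather than Postgres. The table layout, the column names, and the helpers `open_store`, `append_message`, and `read_stream` are all assumptions made for illustration; this is not Message DB's actual schema or API (see the schema link above for that), and it leaves out advisory locks entirely.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory database holding one append-only messages table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE messages (
            global_position INTEGER PRIMARY KEY AUTOINCREMENT,
            stream_name     TEXT    NOT NULL,
            position        INTEGER NOT NULL,
            type            TEXT    NOT NULL,
            data            TEXT    NOT NULL  -- JSON document stored as text
        )
        """
    )
    # The unique index keeps per-stream positions gap-free and collision-free.
    conn.execute(
        "CREATE UNIQUE INDEX messages_stream ON messages (stream_name, position)"
    )
    return conn


def append_message(conn, stream_name, msg_type, data):
    """Append one message to a stream and return its position in that stream."""
    with conn:  # commit on success, roll back on error
        (position,) = conn.execute(
            "SELECT COALESCE(MAX(position), -1) + 1 FROM messages "
            "WHERE stream_name = ?",
            (stream_name,),
        ).fetchone()
        conn.execute(
            "INSERT INTO messages (stream_name, position, type, data) "
            "VALUES (?, ?, ?, ?)",
            (stream_name, position, msg_type, json.dumps(data)),
        )
    return position


def read_stream(conn, stream_name, from_position=0):
    """Return (position, type, data) tuples from a given position onward."""
    rows = conn.execute(
        "SELECT position, type, data FROM messages "
        "WHERE stream_name = ? AND position >= ? ORDER BY position",
        (stream_name, from_position),
    ).fetchall()
    return [(pos, typ, json.loads(doc)) for pos, typ, doc in rows]
```

Nothing here is relational in the join-and-foreign-key sense: rows are only ever inserted and read back in order, which is the point being made above.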

bradstewart · on Dec 17, 2019

It depends heavily on what exactly you need out of a message queue. E.g. if extremely (extremely) high throughput is required and you don't mind dropped or lost messages, SQL may be a bad choice. But I've had a lot of success using Postgres-backed messaging.

Also your links are (mostly) from message queue companies who have a vested interest in your not using Postgres, or are from a time before LISTEN/NOTIFY and JSON(B) columns.

ethangarofolo · on Dec 17, 2019

In addition to what the sibling comments to this one said, Message DB solves a different problem than what queues solve. Message DB is a good fit in a microservice-based architecture, and the "micro" in "microservice" comes from the concept of Dumb Pipes / Smart Endpoints (https://martinfowler.com/articles/microservices.html#SmartEn...). Message DB is a "dumb pipe," whereas queues and brokers fit into the "smart pipe" side of the divide.

Message DB is a message store, a database optimized for storing message data. The database doesn't track the status of anything---that responsibility falls to consumers of the data.

Message queues are fine pieces of technology when what you need is a message queue. Message DB, on the other hand, is used for event-sourced systems. "Event-sourced" in turn differs from merely "event-based." Since I've started building event-sourced systems, I haven't run across the need for message queues, but ymmv, and of course, like all humans, I too have my own hammer/nail biases.

SideburnsOfDoom · on Dec 17, 2019

> Message DB is a good fit in a microservice-based architecture

I don't know about that, I have seen many microservices that process data that comes in from messages on message queues; frequently that is a better design than receiving the data over synchronous http PUT or POST.

By "message queues" I mean AWS SNS/SQS and Azure event hubs. Would a SQL Db be "a good fit in a microservice-based architecture" replacement there?

Or are you suggesting that such a microservice's first action on receiving a message from a queue would be to store it in a local, private Message DB? That might make sense, I've seen enough tables that store a JSON blob already.

ethangarofolo · on Dec 17, 2019

100% with you that messages are a better way for microservices to receive their input. I don't see how a service could receive its input over HTTP and still retain its autonomy.

A message store like Message DB can at the same time serve as record of system state and communication channel. Writing a message to the store is the same as publishing it for consumers to pick up. Messages are written to streams, and interested parties subscribe to those streams by polling them for new messages.

The store itself isn't aware of which components are subscribing or managing their read positions.

So to your question, the first action of a component receiving a message is to do whatever it does to handle the message and then write new messages (specifically events) to record and signal what happened in response to the original message.
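The division of responsibility described above (the store only appends and serves reads, while each consumer polls and keeps its own read position) can be sketched in a few lines of Python. The `MessageStore` and `Consumer` classes are hypothetical, in-memory stand-ins invented for this illustration, not Message DB or any of its client libraries.

```python
class MessageStore:
    """Append-only streams. The store knows nothing about its readers."""

    def __init__(self):
        self._streams = {}

    def write(self, stream_name, message):
        stream = self._streams.setdefault(stream_name, [])
        stream.append(message)
        return len(stream) - 1  # position of the message just written

    def read(self, stream_name, position, batch_size=100):
        stream = self._streams.get(stream_name, [])
        return stream[position:position + batch_size]


class Consumer:
    """Polls one stream and owns its own read position."""

    def __init__(self, store, stream_name, handler, position=0):
        self.store = store
        self.stream_name = stream_name
        self.handler = handler
        self.position = position  # next position to read

    def poll(self):
        batch = self.store.read(self.stream_name, self.position)
        for message in batch:
            self.handler(message)
            self.position += 1
        return len(batch)


store = MessageStore()
store.write("orders", {"type": "OrderPlaced", "order": 1})

seen_a = []
a = Consumer(store, "orders", seen_a.append)
a.poll()

store.write("orders", {"type": "OrderShipped", "order": 1})

# A consumer added later starts from position 0 and replays the history.
seen_b = []
b = Consumer(store, "orders", seen_b.append)
b.poll()
a.poll()
```

Because the read position lives in the consumer, adding `b` changes nothing about what `a` sees, and the store itself needed no modification.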

mumblemumble · on Dec 17, 2019

Take a look at that 2nd link in the grandparent post. I remember it as one that was often cited when this debate first went around the Internet half a decade or so ago.

The almost-but-not-quite unstated major premise of the whole argument is that you're putting a key piece of smarts into the queue: Tracking whether a message has been processed.

One could argue that that is the real antipattern. It'll hurt you big time if you're storing messages in a database table. But it'll hurt you even if you don't. For example, by tracking that kind of information inside of the queue, you're losing the ability to add a second listener without either affecting what messages are being seen by (and therefore the behavior of) the existing listener(s), or modifying the queue itself. Which you don't want to have to do any more than you have to on a critical piece of shared infrastructure like that.

It might be fine if it's an oldschool monolith that's just processing some sort of work queue on multiple threads. But, if you're doing microservices, you're probably trying to keep things more flexible than that, and want to allow them to evolve more independently.

sbellware · on Dec 17, 2019

> The almost-but-not-quite unstated major premise of the whole argument is that you're putting a key piece of smarts into the queue: Tracking whether a message has been processed. One could argue that that is the real antipattern.

Indeed.

Lots of good writing on the tradeoffs, for example: https://sookocheff.com/post/messaging/dissecting-sqs-fifo-qu...

At the risk of beating the point to death, the transition from SOA to Microservices is marked by the transition of smart pipes to dumb pipes, and dumb pipes have no knowledge of whether a message has been processed. In the microservices era, the basic presumption is that the transport has no knowledge of the state of the message processors' progress through a queue.

SideburnsOfDoom · on Dec 17, 2019

> the transition from SOA to Microservices is marked by the transition of smart pipes to dumb pipes, and dumb pipes have no knowledge of whether a message has been processed.

That doesn't seem to be characteristic of Microservices at all. It's more about small bounded context and independent deployment of the service. All of which is completely possible over the "smart pipes" that the message queues I mentioned give you.

Insisting that this wheel must be re-invented or worked around to make it a microservice seems very odd.

hmeh · on Dec 17, 2019

> That doesn't seem to be characteristic of Microservices at all.

Many have their own "unique" definition, which is why "microservices" is bordering on no longer meaning anything. Some people say it and mean SOA—the characteristics you mention are pretty much characteristics of SOA when done right, though independent deployment isn't a necessity.

Martin Fowler attempted to codify a definition, and this is inline with the definition that sbellware is referring to. He (and others) refer specifically to smart endpoints and dumb pipes:

https://martinfowler.com/articles/microservices.html#SmartEn...

sbellware · on Dec 18, 2019

Indeed. And having been around in both the SOA period and the Microservices period, and having witnessed the transition, I'm comfortable with maintaining the assertion that "Microservices" as a term was specifically introduced to demarcate the period characterized by big-vendor smart message transports from the period characterized by the dumb transports we settled on as a reflection of all that we'd learned by trying to rely on messaging "magic".

It's a very similar transition to the shift from EJB to Hibernate (or EJB to Rails) or from SOAP to REST.

In the meantime, a lot of folks who don't have that background and perspective picked up on Microservices and made a lot of presumptions based on a lot of experience with web apps and web APIs. It's this that made "Microservices" a largely meaningless, muddied term. This is ironic because "Microservices" was introduced as a means to disambiguate the many competing meanings of "Service-Oriented Architecture" that had come into existence as many vendors wanted to be seen as being involved with it without actually having been involved with it.

So, there's two meanings of "Microservices": One that comes from a background of service architectures and one that comes from a background of web development. My background is in both of them, but I hew to the meaning of "Microservices" which is closer to "SOA without big vendor smart pipes of the mid-to-late 2000s" than "The same old HTTP APIs we see the world through as web developers".

That's largely a lost battle now, just as SOA was lost to the competing interests of message tech vendors.

The term "Autonomous Services" is much more precise and unambiguous in its intent, and conveys more specifically what's intended by "SOA done right", a.k.a. "Microservices". And even more specifically, "Evented Autonomous Services" does an even better job of conveying the intent and the implications of the architecture than what "Microservices" can do now.

Or as Adrian Cockcroft originally put it: "Loosely-coupled service oriented architecture with bounded contexts".

Knowing what the implications of all those words are, it's quite impossible to look at what is commonly asserted about the meaning of "Microservices" in 2019 and see much left of the great value of what was originally intended. Much of it still remains unlearned.

SideburnsOfDoom · on Dec 17, 2019

> you're putting a key piece of smarts into the queue: Tracking whether a message has been processed.

yes, AWS SQS does that. e.g. https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQS...

Are you saying that this is bad? It doesn't seem so.

> you're losing the ability to add a second listener without either affecting what messages are being seen by (and therefore the behavior of) the existing listener(s). But, if you're doing microservices, you're probably trying to keep things more flexible than that, and want to allow them to evolve more independently.

I'm not following. Both AWS SNS/SQS and Azure Event hubs were able to handle that scenario just fine, by design.

In AWS you attach multiple queues to the same SNS endpoint and have a pool of subscribers on each queue, in Azure you declare a "consumer group" and a pool of subscribers in it. Or declare a new consumer group if need be.

We regularly scaled up AWS from 3 listeners to 30 depending on load, and back down again, or replaced instances one by one, and still it guaranteed at-least once delivery (in practice, exactly once in the vast majority of cases) across the subscribers. We _never_ "lost the ability to add another listener" either scaling up in the same pool, or creating a new pool.
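The topology described above (one topic fanning out to several queues, with a pool of competing subscribers on each queue) can be modeled with a toy in-memory sketch. `Topic`, `Queue`, and `drain` are hypothetical names made up for this illustration; they are not the AWS or Azure SDKs, and the sketch ignores acknowledgements, visibility timeouts, and redelivery.

```python
from collections import deque


class Queue:
    """A simple FIFO queue standing in for one subscriber pool's queue."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        return self._messages.popleft() if self._messages else None


class Topic:
    """Fans every published message out to all attached queues."""

    def __init__(self):
        self._queues = []

    def attach(self, queue):
        self._queues.append(queue)

    def publish(self, message):
        for queue in self._queues:
            queue.send(message)


def drain(queue, workers):
    """Hand each queued message to exactly one worker, round-robin."""
    handled = {name: [] for name in workers}
    index = 0
    while True:
        message = queue.receive()
        if message is None:
            break
        handled[workers[index % len(workers)]].append(message)
        index += 1
    return handled


topic = Topic()
billing, shipping = Queue(), Queue()
topic.attach(billing)
topic.attach(shipping)

for n in range(4):
    topic.publish(n)

billing_work = drain(billing, ["b1", "b2"])  # pool of two subscribers
shipping_work = drain(shipping, ["s1"])      # independent pool of one
```

Each pool sees every message once, and scaling a pool up or down only changes how that pool's share is split among its workers.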

sbellware · on Dec 18, 2019

> Are you saying that this is bad? It doesn't seem so.

It's not "bad" per se, but it's not what it seems on the surface.

ACKing a message doesn't mean that the message will not be received more than one time. As long as the smarts for recognizing recycled messages are embedded in the application logic, all will be fine.

Here's a good explanation written about SQS, but applies to all message technologies that work based on ACKs: https://sookocheff.com/post/messaging/dissecting-sqs-fifo-qu...

So, while it's possible to add more consumers, depending on the implementation of the technology's internals for tracking consumer state, you may lose messages or, more typically, receive messages more than once or receive them out of order.

This often happens without the developers' and operators' knowledge. Without foreknowledge of the causes and effects of these things, developers and operators may know that there's some kind of strange intermittent glitch, but don't presume that it's because message processors are processing messages that had already previously been processed (or are not processing messages that had been skipped).

What you can't do is add a new consumer that is interested in processing historical messages beyond the retention window of typical cloud-hosted message transports.

But that's ok because message queues are typically just transports, not message stores, and they serve the needs of moving messages from one place to another. A message store can do that as well, but has different semantics and serves other needs.

So, as long as ACK-based message processors work within the retention window, everything's good. It's when that's not the case that the problems arise. Lost messages and recycled messages are pretty much the only guarantee over the lifetime of a messaging app. As long as that's within tolerances, then it's fine.

Specifically, like all "dumb pipe" technology that came about in the post-SOA age, a message store doesn't use a protocol based on ACKs. Instead, a consumer tracks its own state and does not defer this responsibility to the transport.

And inevitably, if any consumer of any technology wants to guarantee that it never processes a recycled message, this tracking logic has to also be implemented in the application logic of even SQS, RabbitMQ, etc consumers.

There's no such thing as a messaging technology that can guarantee only-once delivery, as you alluded to. There's a good examination of this aspect of messaging and distributed systems here: https://bravenewgeek.com/you-cannot-have-exactly-once-delive...

The best guarantee from message transports that we can hope for amounts to a "maybe". And that's ok. The conditions and countermeasures are well-known. As long as we're not blind-sided and haven't taken "only once" literally, things will be fine.

If we don't realize that application logic is always responsible for making the ultimate decision as to whether to reject a recycled message, we're going to have problems that can be difficult-to-impossible to detect and correct.
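As a rough illustration of application logic rejecting a recycled message, here is a minimal Python sketch of an idempotent handler. The `IdempotentHandler` class and the `id` field on each message are assumptions made for this example; a real consumer would need to keep the set of processed IDs in durable storage, updated together with the handler's side effects, which this sketch does not attempt.

```python
class IdempotentHandler:
    """Wraps a handler so a redelivered message is processed at most once."""

    def __init__(self, handle):
        self._handle = handle
        self._processed_ids = set()  # in-memory only; durable in a real system

    def __call__(self, message):
        message_id = message["id"]
        if message_id in self._processed_ids:
            return False  # recycled message: skip it
        self._handle(message)
        self._processed_ids.add(message_id)
        return True


balance = {"total": 0}


def deposit(message):
    balance["total"] += message["amount"]


handler = IdempotentHandler(deposit)

# The transport delivers "m1" twice, as any at-least-once transport may.
deliveries = [
    {"id": "m1", "amount": 10},
    {"id": "m2", "amount": 5},
    {"id": "m1", "amount": 10},
]
results = [handler(message) for message in deliveries]
```

The transport never had to promise "only once"; the consumer made the final decision about whether each delivery counted.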

Message stores and event stores aren't alternatives to message queues. They're technologies that support architectures that are themselves alternatives to each other.

SideburnsOfDoom · on Dec 18, 2019

> Here's a good explanation written about SQS, but applies to all message technologies that work based on ACKs

Deals with FIFO and ordering. You also mention "historical messages".

Sure, if those are things that you need, then you almost certainly want a different tech to SQS, likely Kafka. I won't disagree with that, only with making it the focus of the criticism of SQS and similar message queues. It isn't that thing.

kylecordes · on Dec 17, 2019

It is so interesting to see these problems solved again and again.

Very important bit to understand, that many smart people (including perhaps some in this HN discussion) don't know yet: the semantics of an event store are meaningfully different from the semantics of a pub/sub system. If you pick up a pub/sub system and start trying to build an event store based system architecture with a set of CQRS-ES-style followers, you will find you are reinventing a bunch of things quite painfully.

Here's what we did, a few years ago. In some ways it is similar to Eventide.

https://blog.oasisdigital.com/category/cqrs/

https://github.com/OasisDigital/nges

We picked up JGroups, a truly spectacular piece of open source technology, and used it to automatically wake up event stream listeners in a distributed way. (At the time there was some limitation of the NOTIFY mechanism in PG that sent us this direction.) With very little code around automated wake-up, we achieved low latency (and therefore fast automated testing) of an arbitrary group of event stream followers.

AtlasBarfed · on Dec 19, 2019

JGroups seems superficially related to lots of the cassandra functions, especially gossip.

Is it masterless, or does it handwave a lot of distribution problems like most java distributed projects by just using Zookeeper?

adverbly · on Dec 17, 2019

Looks cool. I can't find any performance or benchmarks though. Anyone care to guess/ballpark what kind of write speeds single nodes might be able to handle?

sbellware · on Dec 17, 2019

There are too many "it depends" issues, especially the hardware, and not least the application architecture sitting on top of it.

There's a script in the Message DB codebase that will do some fairly rudimentary read and write performance measures for your Postgres deployment:

https://github.com/message-db/message-db/blob/master/databas...

It's run right on the database, so no network I/O or client implementation in the loop - which is inevitably non-representative of "real world" scenarios.

GordonS · on Dec 17, 2019

I got stuck looking for some perf figures too. I realise they are going to vary widely by hardware, batch size, message size etc, but it would be really useful to have a comparison against, for example, a single RabbitMQ instance.

sbellware · on Dec 18, 2019

A message store is really not a comparable pattern to a message queue. It's unlikely that the usages would be similar enough to engender a meaningful comparison. They're not as dissimilar as an apples-to-oranges comparison, but maybe an apples-to-applesauce comparison.

stevefan1999 · on Dec 17, 2019

It looks very awesome! However, how does it compare with a tiered message queue like Redis as frontend and Postgres/MySQL as backend?

pavelevst · on Dec 17, 2019

If we add retention with automatic backup to BigQuery or cloud storage, we'll get infinite messaging with very reasonable resource usage.

fiatjaf · on Dec 17, 2019

"for"? I feel like this should be "in PostgreSQL". I thought this was something that was used internally by Postgres.

SahAssar · on Dec 17, 2019

"for" is pretty commonly used in this way. See for example https://koajs.com/ and https://rocket.rs/ which both say "web framework for X"

meantheory · on Dec 17, 2019

this looks very interesting

techdragon · on Dec 17, 2019

Looks interesting but needs a few more language libraries to demonstrate broader utility outside of just Ruby projects.

Building on top of generic Postgres using a simple service is pretty neat but without the extra client libraries it’s hard to see a bright future.

The npm module doesn’t look like it actually does anything besides unzip and run the DB setup code at the moment so it’s definitely early days for them. But with a few basic client libraries that can actually do the work of publishing and subscribing to messages, this will be a pretty neat tool.

I’m going to have to watch how this develops.

dmach · on Dec 17, 2019

Publishing and subscribing are done through SQL SELECT statements, which can be called straightforwardly enough from most or all languages. Administering it does indeed require shell or Ruby. Better still, in my opinion, if possible: implement your business logic in PL/pgSQL procs with these SELECTs within them and call them from whatever client/frontend you are using. I know it's not cool right now, but there are big advantages. The disadvantage of putting business logic in the DB is said to be vendor lock-in, but when it comes to Postgres, lock me up and throw away the key.