I once used a MySQL database as a replacement for a message queue. This was the easiest solution to implement since all the servers were already connected to the database anyway. A server would write a new row to the table, and all the servers would remember the last row they had already seen. Occasionally the table gets cleared. I'm sure there are some race conditions in the system, but its only purpose is to send Discord notifications when someone breaks a high score in a video game, so it's not really critical. It's still working that way today.
The code is in there for Postgres, MS SQL and MySQL (which all support SKIP LOCKED) though at some point I abandoned all but Postgres.
If I were to write another message queue, I wouldn't use a database; I'd build it on the file system around Linux file renames, which are atomic. What I really want is a message queue that is fast and requires zero config, and a file-based message queue is both — better than a database on each count.
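A minimal sketch of the idea (Python rather than Rust, with invented directory names; this assumes POSIX rename atomicity and that all directories live on the same filesystem):

```python
import os
import tempfile
import time
import uuid
from pathlib import Path
from typing import Optional

QUEUE = Path(tempfile.mkdtemp())  # queue root (a temp dir for this sketch)
TMP, NEW, CLAIMED = QUEUE / "tmp", QUEUE / "new", QUEUE / "claimed"
for d in (TMP, NEW, CLAIMED):
    d.mkdir()

def enqueue(payload: bytes) -> None:
    # Write into tmp/ first, then rename into new/. The rename is atomic
    # on Linux, so a consumer can never observe a half-written message.
    name = f"{time.time_ns():020d}-{uuid.uuid4().hex}"
    (TMP / name).write_bytes(payload)
    os.rename(TMP / name, NEW / name)

def dequeue() -> Optional[bytes]:
    # Claim a message by renaming it out of new/. If two consumers race,
    # only one rename succeeds; the loser moves on to the next file.
    for entry in sorted(NEW.iterdir()):
        claimed = CLAIMED / entry.name
        try:
            os.rename(entry, claimed)
        except FileNotFoundError:
            continue  # another consumer got there first
        data = claimed.read_bytes()
        claimed.unlink()  # ack: remove only after a successful read
        return data
    return None
```

The zero-padded timestamp prefix keeps lexicographic directory order equal to enqueue order, so `sorted()` gives FIFO delivery.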
I really feel like file systems aren't used for enough things. File systems come with so much useful metadata.
I've experimented with using the file system for storing configuration data where each config value is a single file. Nested structures just use directories. The name of the file is the field name. The file extension is a hint about the data type contained within (.txt is obvious, but I also liked .bool). Parsing data is trivial. I don't need any special viewing or editing tools, just a file manager and text editor. You can see when a specific config value was changed by checking the file update time. You don't have to load the whole configuration just to access one field. And you could conceivably TAR up the whole thing if you wanted to transmit it somewhere.
I use it to configure little sub-projects in my personal website. I really like it, but I shudder to think of the complaining I'd hear from other developers if I ever used it in a work project, just because it's not what they're used to seeing and would require a moment or two of thinking on their behalf to get over ingrained habits.
A company I used to work for extensively used this method. It's incredibly useful to be able to read a config or state value from any language and even bash scripts quickly.
However, and this is a big drawback, once you have too many config files and you start reading and writing from different processes, you get into bottleneck situations quickly.
I haven't used this system extensively yet. But I don't really see how that situation gets improved by having a single-file configuration system.
First of all, if you have multiple processes trying to read/write the same config, that's kind of suspect, and if file I/O is a bottleneck for your config system, that's a different suspicious situation. Why are your processes writing to the config so... often?
But regardless, I can't see how those problems get immediately better by storing that config in a single file. If anything, having it split across multiple files would improve the situation, as different processes that might only be concerned with different sections of the config won't need to wait on file locks from processes unrelated to their concerns.
I realize it may have sounded like I was suggesting the approach. Quite the contrary: I would never do that again or suggest it. I was merely pointing out that it was quite useful at times :-) We had lots of problems similar to what you were describing. Luckily that's all in the past now.
The paradigm is also used by /proc and /sys, so I guess other developers won't get confused. However I never tried to tar -x into /proc to start the same set of processes on another node, or as an alternative to /etc/sysctl.conf :)
This was tried and called Elektra, I think around Y2K. I don't believe the idea was even new then, but there was also research into tiny-file performance at the time, resulting in things like ReiserFS. I think it packed tiny files into the directory itself, resulting in blistering speed.
Anyway it’s an elegant idea. Silly to have dozens of config file formats when the fs already has everything it needs. We have xattr too.
The flaw on the OS level is that it is hard to get everyone to change. For new apps not a problem, and any performance concerns are no longer an issue for config.
Oh man, reiserfs. Seeing that name reminds me that the original developer, Hans Reiser, is currently spending time in prison for murdering his wife. She was the interpreter during his first meeting with a Russian "mail-order bride".
It's a single-byte file. You read the entire file contents and if it's not zero, it's true. The existence of the file tells you whether or not to use a default value. An INI file would have to be fully parsed before you even know whether it contains that value.
You read the entire file contents, trim leading and trailing whitespace and toLower for good measure if you want, then validate against your list of installed themes. Done. No goofy JSON or YAML parser in sight.
And what reinvention is there? If you're just using a system that already exists, you're not reinventing anything.
I actually did this once (over SMB on Windows, though) - and unintentionally crippled our corporate SAN with all of its polling and locking activity. I had a cluster of 20 workers which would poll every five seconds for messages, and I believe we had an EMC VNX storage appliance. I never did figure out why that was enough to bring the whole thing to its knees, but IT was very quick to track the problem back to me.
Interesting. What makes you want to switch to the file system? I wrote one for a project[0] a while back (for MongoDB) and it didn't seem like the database introduced too much complexity. I didn't write the implementation from scratch, but the couple hundred lines of code were easy to reason about.
I found almost all message queues to be horribly complex to configure, debug and run. Even database queues require a lot of config, relative to using the file system.
I did actually write a file system based message queue in Rust and it instantly maxed out the disk at about 30,000 messages a second. It did about 7 million messages a second when run as a purely RAM message queue but that didn’t use file system at all.
It depends what you’re doing of course… running a bank on a file system queue wouldn’t make sense.
A fast message queue should be a tiny executable that you run and you’re in business in seconds, no faffing around with even a minute of config.
> I did actually write a file system based message queue in Rust and it instantly maxed out the disk at about 30,000 messages a second. It did about 7 million messages a second when run as a purely RAM message queue but that didn’t use file system at all.
Did you try an in-memory filesystem through tmpfs?
Database config should be two connection strings: one for the admin user that creates the tables and another for the queue user. Everything else should be stored in the database itself. Each queue should be in its own set of tables. Large blobs may or may not be stored as references to external files.
Shouldn't a message send be, worst case, a CAS? It really seems like all the work that's gone into garbage collection would have some use for in-memory high-speed queues.
Are you familiar with the LMAX Disruptor? It is a Java-based cross-thread messaging library used for day-trading applications.
Since you seem to be from citusdata: I used cstore_fdw 2 - 3 years back and at least when paired with TPC-H it was horrendously broken for both small (10 gig) and large (100 gig) datasets. It has been integrated into some other product by the time being, I hope you managed to improve it.
This is actually pretty common, and usually a "good enough" solution. You can also add things like scheduling (add a run_at column), at-least-once execution (mark a row when it is being processed; delete it only when successful), topics, etc., with minor modifications to your table.
If you want something that works "well enough" I'd say it's a reasonable choice.
Yeah, I'm using it as a transactional outbox to ensure at least once delivery to SNS.
Can't really think of a better way to ensure that a message is always sent if the DB transaction succeeds and is never sent if it fails.
You can get very far by ensuring at-least-once delivery and making everything idempotent (that gets you as close to "exactly once" as you can). With a database, the most common pattern is: insert a row for the job; when a worker starts working on it, mark it as in progress so it doesn't get started again; if the task fails, or after some reasonable timeout period, another worker can pick the task up again; and the row is only ever deleted when a worker successfully completes it.
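That pattern is small enough to sketch. SQLite is used here purely so the example is self-contained (it needs SQLite 3.35+ for `RETURNING`; in production this would be Postgres or similar), and the schema is invented:

```python
import sqlite3
import time
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,
    claimed_at REAL            -- NULL means not yet started
)""")

TIMEOUT = 300.0  # seconds before a stalled job may be retried

def enqueue(payload: str) -> None:
    conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))

def claim() -> Optional[tuple]:
    # Atomically mark one unclaimed (or timed-out) job as in progress.
    now = time.time()
    return conn.execute(
        """UPDATE jobs SET claimed_at = ?
           WHERE id = (SELECT id FROM jobs
                       WHERE claimed_at IS NULL OR claimed_at < ?
                       ORDER BY id LIMIT 1)
           RETURNING id, payload""",
        (now, now - TIMEOUT),
    ).fetchone()

def complete(job_id: int) -> None:
    # Delete the row only once the work actually succeeded.
    conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
```

A crashed worker simply never calls `complete()`, so after `TIMEOUT` the job becomes claimable again — at-least-once delivery, hence the need for idempotent handlers.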
> We decided to store Centrifuge data inside Amazon’s RDS instances running on MySQL. RDS gives us managed datastores, and MySQL provides us with the ability to re-order our jobs.
If you don't want to publish events from uncommitted transactions you'll have to first store them in a local table and then move them to the queue after the commit. But if all consumers have direct access to the database anyway...
I am doing the same with SQL Server. The messages table is more of a bus than a queue in our case (columns like ReplyToId, etc). Using it for RPC communication between cloud bits. Much cheaper than Azure Service Bus and friends.
I don’t know the OP’s answer, but I’d hazard a guess: because Service Broker is a completely neglected feature with very little in the way of community or UI. In theory it would be great, but MS basically never invested in it after its release, and now it’s just a random “who knows when we’ll drop support for this” SQL Server feature.
I briefly worked for a major corporation 15 years ago that did this with SQL Server to create distributed worker processes to handle all the AI-generated used car listings and photo recolorings [0] for almost all of the used car lots in the country.
[0] Why take hundreds of photos of Honda Civics in red, green, blue, and black when you already have a dozen in white?
Why even take the dozen in white when they have a model you can render in any manner? Most car commercials do not have real cars in them. Maybe the shots of a car actually in motion, but most of the static shots are 3D models placed onto backgrounds. I don't know why, but I was surprised by this when I worked in a post house that did a lot of car commercials. One of the roles for a coworker was to get flown around to locations to take the images for the background plates using photogrammetry. "Can't fly an Alexa through the back glass to zoom in on the dash now can we" was one comment.
I've built a hybrid task queue/process supervisor on top of SQL. Classical task queues like Celery didn't exactly fit our use case: a single process could run for hours or days, but in case of a node failing, it must be resurrected elsewhere as soon as possible (within seconds). I didn't have the time to re-architect everything for Kubernetes, or rewrite half the product in Erlang; so I built that weird thing. It's been super stable, running mission critical code, and making us money - for several years now.
I implemented a message queue in MySQL too, and it worked pretty well. Incoming messages would be written to the table, and the workers would poll the database each cron period and process whatever rows were in the queue. To avoid race conditions, the workers would lock the records they were working on and then delete them as soon as the work was complete. It was simple, but it worked just fine for my purposes.
This has been a thing since before databases were relational. 4G languages (Progress, etc.) were especially nice for their ability to wrap a queue table around a series of reversible transactions, if you coded things right .. meaning a lot of modules written for app infrastructure were based on an 'inbox table' methodology ..
I’ve run into all sorts of database locking issues and concurrency issues when using a database as a queue. I saw that mistake made a long time ago and I would never do it myself.
Database engines are getting features like SELECT FOR UPDATE SKIP LOCKED, so what were once serious blockers on this idea may no longer be as much of a problem.
It’s not necessary, but it is a lot less fiddly: you automatically look at only the tasks that someone else isn’t currently working on, and because the lock is held by the database connection you get automatic retries if your worker crashes and drops the connection. You could figure out all of the interactions needed to make this work yourself, but if the database already has support built in you may as well use it (and there’s a straightforward path to migrate if you need more sophistication later).
No? Unless there's some edge case with that statement I don't know about. That statement is basically tailor made for queues so you can select jobs that aren't currently being worked on by other workers.
Inasmuch as you trust your db's locking correctness it eliminates the concurrency issues. You can very naively have n workers pulling jobs from a queue not stepping on each-other.
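The claim step looks roughly like this (a sketch, not run against a live server; table and column names are invented, and `conn` stands for any DB-API connection to Postgres 9.5+):

```python
# FOR UPDATE SKIP LOCKED skips rows already locked by other workers, so
# n naive workers can run this concurrently without stepping on each other.
CLAIM_SQL = """
SELECT id, payload
FROM jobs
WHERE status = 'queued'
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED
"""

def work_one(conn, process) -> bool:
    # The row lock taken by FOR UPDATE lives as long as the transaction,
    # so a crashed worker's job is released automatically when its
    # connection drops -- the retry behavior mentioned above.
    with conn:  # one transaction per job
        cur = conn.cursor()
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        if row is None:
            return False  # queue empty (or everything is locked)
        job_id, payload = row
        process(payload)  # your handler; an exception rolls back the claim
        cur.execute("DELETE FROM jobs WHERE id = %s", (job_id,))
    return True
```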
It’s not so out of the ordinary. A few libraries in Rails create message queues in Postgres using advisory locks and listen/notify.
Hell, if it’s not an RDBMS then it’ll be Redis (at a much greater expense for a managed instance). I’ve seen that setup in the Ruby world far more often than using a dedicated message queue.