
It does not require that you do that. It only requires you to open source any modifications you've made to ffmpeg.

It is slightly more nuanced: if you do static linking, it is considered a derivative work and does require you to open it. Since the app is based on ffmpeg, which is GPL (the core is LGPL), I am curious to know how it is getting used.

Static linking of LGPL content (thus making it a derivative work) only requires that you allow "modification of the work for the customer's own use and reverse engineering for debugging such modifications".

Making your own code public is not the only way to achieve this.

You can also make available to the customer the object files and build instructions needed to recreate your software with the (modified) statically linked LGPL content. (If it's LGPL > 2.1 you have extra requirements: you need to provide all toolchains/dependencies, and it must be actually possible to install a modified version on the hardware.)

Granted, this is not commonly done, but I've used it on some projects where dynamic linking was not available or not desired by the client.


I didn't bundle ffmpeg in the app. It uses Homebrew to install ffmpeg on the user's device. This is an acceptable way to do it, since FFmpeg confirmed as much on Twitter/X.

https://x.com/FFmpeg/status/1766649563891339510


This tweet also explicitly says "But still would be nice to have some credit.".

Why isn't FFmpeg mentioned on your app's website?

The same goes for ImageMagick by the way.


there is a dedicated page for credits: https://pimosa.app/credits

Hesitating to post this because it's not related to the topic at hand, but I find monospace fonts for blogs hard on my eyes. I think I'm getting old, but Arial (and other sans-serifs) are easier to read.


You don't have to finish them, but you should share them anyway. Software is never finished, so hitting the state where you're happy with what you've built isn't likely to happen.

The same philosophy applies to other ventures and ideas. Just ship it.


> Software is never finished

This is a common mantra, but I completely disagree with it. I have tons of finished software projects. And yes, I do mean things that need zero maintenance and continue to work after more than a decade without me having to touch the code at all. I don’t necessarily use all of them anymore, but they still work.

And they’re not stuff that’s only useful to me, either. I particularly remember a script I built to unofficially get data from a project. After the project made their own version, I deprecated mine. After years of not using the script, I deleted it. Not even a day later, I had multiple people asking me to bring it back because my version did some small thing in a way they preferred.

Software can be finished, it’s not that hard.


I completely disagree. There's always something that can be improved in software. But usually, it gets to a state of "good enough" so people move on to something more important and just leave it. Sometimes, they have to come back to it, perhaps after a long time even, for a bug fix, or to add a new feature, but it may go for very long periods with no updates at all.

Of course, this is all a matter of perspective and semantics.

I suppose if something becomes obsolete enough that no one's interested in developing it further, it could be considered "finished". But even here, it seems like people still have some interest in very old software sometimes, resurrecting it to run on emulators perhaps, and sometimes even making changes (if they can get the source code) to improve its operation on newer systems.


> resurrecting it to run on emulators perhaps

That has nothing to do with the software’s finished state. An NES game which runs bug-free and without performance issues on the NES doesn’t stop being complete because someone else decides to run it in a different environment decades later.

> There's always something that can be improved in software.

No, there’s not always something which can be improved. There’s always something that can be fiddled with, which is not at all the same thing.

By your logic, no painting is ever finished, nor is any novel, nor any meal, nor any internet comment, nor literally anything. Things are finished when the author decides they completely fulfil their need; it’s not your opinion as an outsider that matters.


No Man's Sky is kind of what you're looking for, except you may notice its quests (and worlds) become redundant quickly... I say quickly, but that became the case for me after like 30 hours of gameplay.


That's the kicker: LLM-driven stories are likely to fall into the same trap that "infinite" procedurally generated games usually do - technically having infinite content to explore doesn't necessarily mean that content is infinitely engaging. You will get bored once you start to notice the same patterns coming up over and over again.

Procgen games mainly work when the procedural parts are just a foundation for hand-crafted content to sit on, whether that's crafted by the players (as in Minecraft) or the developers (as in No Man's Sky after they updated it a hundred times, or roguelikes in general).


Yeah, generative AI can create cool looking pictures and video but so far it hasn't managed to create infinitely engaging stories. The models aren't there yet.


I'd argue that the same principle applies to pictures: there are many genres of AI image that are cool the first time you see them, but after you've seen exactly the same idea rehashed dozens of times with no substantial variety it starts wearing really thin. AI imagery is often recognizable as AI not just because of characteristic flaws like garbled text but because it's so hyper-clichéd.


I wonder if there's some threshold to be crossed where it can be surprising for longer. I made a video game name generator long ago that just picks a word (or short phrase) from each of three columns. (The majority of the words / phrases are from me, though many other people have contributed.)

I haven't added any words or phrases to it in years, but I still use it regularly and somehow it still surprises me. Maybe the Spelunky-type approach can be surprising for longer; that is, make a bunch of hand-curated bits and pick from them randomly: https://tinysubversions.com/spelunkyGen/
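For what it's worth, the mechanics really are as simple as they sound; a rough Rust sketch of the three-column idea (made-up word lists, not my actual ones):

    use rand::seq::SliceRandom;

    // Pick one entry from each column; the combinatorics do the rest.
    fn main() {
        let first = ["Super", "Grim", "Pocket"];
        let second = ["Dungeon", "Farming", "Kart"];
        let third = ["Simulator", "Tycoon", "of Doom"];
        let mut rng = rand::thread_rng();
        println!(
            "{} {} {}",
            first.choose(&mut rng).unwrap(),
            second.choose(&mut rng).unwrap(),
            third.choose(&mut rng).unwrap()
        );
    }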


Reddit works just fine. Where there is a will, there is a way. I do believe (without proof) that incentives for Google are misaligned here.


So while Reddit is huge, I don't think it's even close to the same scale as Google. And problems at Google's scale are hard to compare to smaller scenarios.

I do agree with your 2nd point about misaligned incentives, though. I don't think "how do we ensure that every user can get fair support" was ever on any product roadmap for these free global-scale products.

Or more accurately, the "users" in this case are the advertisers, not the uploaders.


This isn't correct. I've just been discussing the shadowbanning/appeal system with Huffman because it doesn't work right now.


Reddit is definitely a different beast; it's easier to admin and has unpaid community moderators who pick up pretty much all the grunt work.


My main concern with application-defined schemas is that the schema is validated by the wrong system. The database is the authority on what the schema is; all other layers in your application make assumptions based on what is effectively hearsay.

The closest we've come so far to bridging this gap in a strictly typed language like Rust is SQLx, which creates a struct based on the database types returned by a query. This is validated at compile time against a database, which is good, but of course there is no guarantee that the production database will have the same types. The easiest mistake to make is to design a query against your local Postgres v15 and hit a runtime error in production running Postgres v12, e.g. because a function like gen_random_uuid() doesn't exist. Another is to assume a migration in production was actually executed.
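For example, a sqlx query like this (hypothetical users table) is checked against whatever database DATABASE_URL points at during compilation, and only then:

    use sqlx::postgres::PgPool;

    // query! verifies column names and types against the database that is
    // reachable at *compile* time; production can still disagree at runtime.
    async fn fetch_email(pool: &PgPool, user_id: i64) -> sqlx::Result<String> {
        let row = sqlx::query!("SELECT email FROM users WHERE id = $1", user_id)
            .fetch_one(pool)
            .await?;
        Ok(row.email)
    }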

In duck-typed languages like Ruby, the application objects are directly created from the database at runtime. They are as accurate as possible, since the schema is directly read at application startup. Then of course you see developers do something like:

    if respond_to?(:column_x)
      # do something with column_x
    end
To summarize, I think application-defined schemas provide a false sense of security and add another layer of work for the engineer.


This doesn't seem fundamentally different from any schema/API mismatch issue. For example using the wrong header for a C library, or the wrong Protobuf schema.

I guess it would be good if it were verified at runtime somehow, though. E.g. when you first connect to the database, it checks that PostgreSQL is at least the minimum required version and that the tables match what was used at compile time.
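A minimal sketch of the version half of that, assuming sqlx and Postgres (server_version_num reports e.g. 150004 for 15.4):

    use sqlx::postgres::PgPool;

    // Startup guard: fail fast if the server is older than what the code
    // was developed against, instead of erroring mid-request later.
    async fn check_server_version(pool: &PgPool) -> anyhow::Result<()> {
        let v: String = sqlx::query_scalar("SHOW server_version_num")
            .fetch_one(pool)
            .await?;
        if v.trim().parse::<i32>()? < 150000 {
            anyhow::bail!("Postgres >= 15 required, server reports {v}");
        }
        Ok(())
    }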


It could be verified at runtime, but I haven't seen anyone trying to version/hash schemas and include that in the request.

The workaround in practice seems to be to keep the DB behind a server that always™ uses a compatible schema and exposes an API that's either properly versioned or at least safe for slightly older clients. To be fair, it's hard to get rid of the middleman and serve straight from the DB; it's always deemed too scary for many reasons, so it's not that bad.


It shouldn’t be hard for a database engine to keep a hash around for each database, update it whenever a DDL (https://en.wikipedia.org/wiki/Data_definition_language) command is run, and, optionally, verify that a query is run against the exact database structure.

Could be as simple as securely hashing the old hash with the text of the DDL command appended.

That would mean two databases can be identical, structure-wise, but have different hashes (for example if tables are created in a different order), but would that matter in practice?

Alternatively, they can keep a hash for every table, index, constraint, etc. and XOR them to get the database hash.
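A sketch of the chained variant (assuming something like the sha2 crate):

    use sha2::{Digest, Sha256};

    // Fold each executed DDL statement into the running schema hash. Two
    // databases that applied the same DDL in the same order end up with
    // the same hash; different orders give different hashes.
    fn next_schema_hash(prev: &[u8; 32], ddl: &str) -> [u8; 32] {
        let mut hasher = Sha256::new();
        hasher.update(prev);
        hasher.update(ddl.as_bytes());
        hasher.finalize().into()
    }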


Sounds very hard to me. How do you handle online schema changes with this? Schema changes are trivial to do if you can be down for a long time.


You can only do those if schema changes are transactional. The transaction could update the checksum and return it for the caller to use in subsequent calls.

And yes, that would shut off other clients that want to be sure to talk to the correct schema, but if you opt in to such a scheme, that’s what you want.


rust-query manages migrations and reads the schema from the database to check that it matches what was defined in the application. If at any point the database schema doesn't match the expected schema, then rust-query will panic with an error message explaining the difference (currently this error is not very pretty).

Furthermore, at the start of every transaction, rust-query will check that the `schema_version` (sqlite pragma) did not change. (source: I am the author)


Unless you have some tricks up your sleeve that I'm not thinking of, an immediate consequence of this is that zero downtime deployments and blue/green deployments become impossible. Those both rely on your app being able to run in a state where the schema is not an exact match for what the app expects - but it's compatible so the app can still function.


Semantic versioning?


If I understand the GP correctly, there's no notion of semver involved. Any difference in the schema results in a runtime error.


Yes, that’s what I gathered from the code. But I was proposing it as an “easy” solution that doesn’t involve throwing away OP’s main idea.


And that's okay. Most applications don't need zero-downtime deployments, and there are already plenty of APIs that support that use case. I'd rather have more like this one.


In addition to the deployment-time issues and other stuff that I and others have commented on downthread, I thought of another problem with this.

I can't see how this would even work for trivial, quick, on-line schema changes. Let's say I have 10 servers running the same service that talks to the database (that is, the service fronting the database is scaled out horizontally). How would I do a migration? Obviously I can't deploy new code to all 10 servers simultaneously that will do the schema migration; only one server can run the migration. So one server runs the migration, and... what, the other 9 servers immediately panic because their idea of the schema is out of date?

Or I deploy code to all 10 servers but somehow designate that only one of them will actually do the schema migration. Well, now the other 9 servers are expecting the new schema, and will panic before that 1 server can finish doing the migration.

It seems to me that rust-query is only suitable for applications where you have to schedule downtime in order to do schema changes. That's just unacceptable for any business I've worked at.


I think first and foremost, if you're going to use a tool like this, you need to do everything through the tool.

That said, for zero downtime migrations there are a number of techniques, but it typically boils down to splitting the migration into two steps where each step is rolled out to each server before starting the next: https://teamplify.com/blog/zero-downtime-DB-migrations/ https://johnnymetz.com/posts/multistep-database-changes/ etc

I'm not sure if there's anything that automates this, but it'd probably need to involve the infrastructure layer (like terraform) too.

Edit: There's one other approach I've heard of for zero downtime deployments:

Start running the new version in new instances/services in parallel with the old version, but pause it before it does any database work. Drain client connections to the old version and queue them. Once drained, stop the old version, perform the database migrations, and start the new version, then start consuming the queue.

This is (I think) more general, but you could get client timeouts or need to kill long requests to the old version, and it requires coordination between infrastructure (load balancer?) and software versions.


This isn’t unique to rust-query; this problem also exists with ActiveRecord, for example. At Twitch we just had to really think about our migrations and write code to handle differences.

Basically no free lunch!


You can always use SQL views to expose a version of your schema to apps that's different from what's in the underlying DB tables.
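For example (hypothetical table and column names), after a rename a view can keep presenting the old shape to legacy clients:

    // Old clients keep selecting "name" even after the rename.
    const LEGACY_USERS_VIEW: &str =
        "CREATE VIEW users_v1 AS SELECT id, full_name AS name FROM users";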


> rust-query manages migrations and reads the schema from the database to check that it matches what was defined in the application. If at any point the database schema doesn't match the expected schema, then rust-query will panic with an error message explaining the difference (currently this error is not very pretty).

IMO - this sounds like "tell me you've never operated a real production system before without telling me you've never operated a real production system before."

Shit happens in real life. Even if you have a great deployment pipeline, at some point you'll need to add a missing index in production fast because a wave of users came in and revealed a shit query. Or your on-call DBA will need to modify a table over the weekend from i32 -> i64 because you ran out of primary key values, and you can't spend the time updating all your code. (In Rust this is dicier, of course, but with something like Python it shouldn't cause issues in general.) Or you'll just need to run some operation out of band -- that is, not relying on a migration -- because that's what makes sense. A great example is using something like pt-osc[0] to create a temporary table copy and add temporary triggers to an existing table in order to do a zero-downtime copy.

Or maybe you just need to drop and recreate an index because it got corrupted. Shit happens!

Anyway, I really wouldn't recommend a design that relies on your database always agreeing with your codebase 100% of the time. What you should strive for is your codebase being compatible with the database 100% of the time -- that means new columns get added with a default value (or NULL) so inserts work, you don't drop or rename columns or tables without a strict deprecation process (i.e. a rename is really add in db -> add writes to code -> backfill values in db -> remove from code -> remove from db), etc...
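Spelled out, that rename dance might look something like this (hypothetical names; each DB step ships separately, with a code deploy in between):

    // Hypothetical expand/contract rename of users.name -> users.full_name.
    const RENAME_STEPS: &[&str] = &[
        // 1. expand: add the new column, nullable so old code keeps working
        "ALTER TABLE users ADD COLUMN full_name text",
        // 2. (deploy code that writes both columns, still reads the old one)
        // 3. backfill rows written before step 2
        "UPDATE users SET full_name = name WHERE full_name IS NULL",
        // 4. (deploy code that reads and writes only full_name)
        // 5. contract: drop the old column once nothing references it
        "ALTER TABLE users DROP COLUMN name",
    ];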

But fundamentally panicking because a table has an extra column is crazy. How else would you add a column to a running production system?

[0] https://docs.percona.com/percona-toolkit/pt-online-schema-ch...


It's a bummer that you've been downvoted, because it really does seem like people here have not operated databases at scale.

I will never claim that we were great at managing databases at Twilio, but often a schema change would take hours, days, or even a week or two to complete. We're talking about tables with hundreds of millions of rows, or more.

We'd start the change on a DB replica. When it would finish, we would have to wait for the replica to catch up with the primary. Then we would bring up new replicas, replicating from the replica with the new schema. Finally that replica would get promoted to primary, with all the old replicas (and the old primary, of course) removed from service, and the new replicas brought in.

Only then could we deploy code that was aware of and used the updated schema. The previous code of course had to ignore unknown columns, and if we ever wanted to drop a column, we had to first deploy code that would stop using that column. Any column type changes would need to be backwards-compatible. If that wasn't possible, we'd have to add a new column and backfill it. Adding indexes would usually be fine without preparatory code changes, but if we wanted to drop an index we'd first have to make sure there were no queries still depending on it.

Even for a "small" schema change that "only" took minutes or a few tens of seconds to complete, we'd still have to use this process. What, do you think we'd shut part or all of a real-time communications platform down while we do a schema change? Of course not.

The idea that the application could or should be in control of this process, or could always be in sync with the database when it came to its understanding of the schema, is impossibly unrealistic.


Yep, sounds like we have similar experiences! I first had to start thinking about this stuff at Hootsuite, back in the exciting 1million+ DAU days a decade ago. Before then, to me databases were just a thing that got deployed along with the application, and deploys only happened on a Friday night so who cares about downtime? By the time anyone tries logging into the app on Monday morning, the code and database will all be up to date. Going to a place where deploys were happening constantly and nonzero downtime was unacceptable was eye-opening.

> The idea that the application could or should be in control of this process, or could always be in sync with the database when it came to its understanding of the schema, is impossibly unrealistic.

These days my attitude is to treat databases as a completely separate service from the application code, which they effectively are. They're on a different set of servers, and the interface they provide is the columns/tables/views/etc, accessed through SQL. So yeah, no breaking changes, and the only thing application code should care about is if the queries it tries to execute return the expected sets of data, not if the schema itself matches. And certainly not about things like views, triggers or indexes.

This does end up being more overhead than migrations alongside the application code, which I know a lot of developers prefer because they're easier to use, but the approach just doesn't work after a certain scale.

(to be clear, I still use Liquibase etc to manage migrations, the process for applying those changes is just completely separate from deploying application code.)


> These days my attitude is to treat databases as a completely separate service from the application code, which they effectively are. They're on a different set of servers, and the interface they provide is the columns/tables/views/etc, accessed through SQL.

I've never thought of it this way, but I think this is really smart. If I have a service that exposes a REST API, I can, say, add a new field to a JSON object that's returned from an API endpoint without telling clients about it. Those clients can update later in order to take advantage of the information returned in the new field.

Same thing with a database: I can add a new column, and clients can learn about the new column later in the future, no problem. The database schema is just a part of the database's API, and it can be evolved in a backwards-compatible manner just like any other API.

> to be clear, I still use Liquibase etc to manage migrations, the process for applying those changes is just completely separate from deploying application code.

Right, the schema needs to be managed and there needs to be a source of truth for it, with tooling to do migrations, but coupling that so closely with the application so the schema and application always must be in sync (like some others seem to think is the One True Way) is a mistake, and would be a complete non-starter for my past professional needs.


IMO it's a similar situation to the discussion here:

https://capnproto.org/faq.html#how-do-i-make-a-field-require...


I had to realize that, at least in the startup world, most (non-DB-focused) devs think that while they might not be experts in SQL, they have a very solid understanding

... and then they don't know a lot of very fundamental, important parts, and are blissfully unaware of that, too.

And to be clear, I'm not saying they don't remember the exact details of something.

What I mean is they don't even know that there are things they have to look up, nor do they have any experience with or willingness to understand what they did wrong by consulting the official documentation, instead of just randomly googling and trying out "solutions" until one seems to happen to work.

The most common example would be having so little understanding of transactions that they believe transactions just magically fix all race conditions, and then being very surprised when they don't. Or believing that transactions in SQL are fundamentally broken after realizing that their database somehow got corrupted.
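The canonical footgun, sketched with sqlx and a made-up counters table: the read-modify-write below is inside a transaction, yet under the default READ COMMITTED isolation two concurrent runs can both read the same value and silently lose an increment.

    use sqlx::postgres::PgPool;

    // Lost-update bug: wrapped in a transaction, but still racy.
    async fn bump(pool: &PgPool) -> sqlx::Result<()> {
        let mut tx = pool.begin().await?;
        let hits: i64 = sqlx::query_scalar("SELECT hits FROM counters WHERE id = 1")
            .fetch_one(&mut *tx)
            .await?;
        sqlx::query("UPDATE counters SET hits = $1 WHERE id = 1")
            .bind(hits + 1)
            .execute(&mut *tx)
            .await?;
        // Fixes: SELECT ... FOR UPDATE, SERIALIZABLE isolation, or simply
        // "UPDATE counters SET hits = hits + 1 WHERE id = 1".
        tx.commit().await?;
        Ok(())
    }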

And again, I don't mean junior devs, but people with 10+ years of backend or "fullstack" experience, i.e. people who should at least know when to consult the documentation and look up which protections transactions actually provide.

More than once I have seen the (final state of a) situation where people started out believing SQL transactions magically fix everything, then got "corrupted" data, then blamed SQL for being broken and moved to NoSQL.

The joke here is that all these concurrency problems are fundamental and independent of SQL vs. NoSQL.

And SQL often gives you more powerful, easier-to-use (at small scale) tools to enforce synchronization, but at a cost. NoSQL often gives you harder-to-use primitives where you have to do much more outside of the database to guarantee correctness, but then at least you'll more likely blame your code instead of the DB for things not working.

The most ironic thing here is that I'm not a DB expert; I just know where my knowledge stops and where I can look up the missing parts. And I can't give you many tips about huge DBs in production; luckily, surprisingly many companies have comparatively "small" DB needs.

And honestly, where I see race-condition-related issues in SQL quite often, I rarely don't see them in NoSQL code. Where these issues in SQL make me sad, as they are often very avoidable, in NoSQL I often feel like giving up in resignation.

Though that experience is with "smallish" databases, not Twilio scale. But a surprisingly large number of companies have surprisingly "smallish" databases. Like, no joke, I have seen companies be very vocal about their "huge database" and then you realize it's like 5GiB ;=)

Honestly, I (metaphorically speaking) don't even want to know how DB experts feel about all this. I'm not a DB expert and just have a solid enough foundation to know where my knowledge stops and when I have to look things up (which is all the time, because I'm not writing that much SQL).


> Even if you have a great deployment pipeline, at some point, you'll need to add a missing index in production fast because a wave of users came in and revealed a shit query.

This sounds more like a CI/CD and process issue.

There is no reason why adding a new index in code and deploying it into Production should be more complex or error prone than modifying it on the database itself.


Direct execution of `CREATE INDEX...` on a database table is always going to be faster than going through a normal deployment pipeline. Even if we assume your pipeline is really fast, which is probably not the case at most orgs, you are still comparing a single SQL statement execution to a single SQL statement execution + git push + code reviews + merge + running through Jenkins/Circle/whatever. How long does that overhead take? How much money have you lost because your website won't load while your post is on the frontpage of HN? Seconds and minutes count. I don't want my code crashing because an unexpected index exists in this scenario.


You should be able to deploy end to end to Production in less than a minute.

Companies should be focused on solving that problem first before doing insanely short-sighted workarounds like skipping pushing to Git and code reviews.


> You should be able to deploy end to end to Production in less than a minute.

When I was at AWS (RDS), our end-to-end production deployment process was 7 days. We were also pulling $25 million/day or so in profit. I'm sure that number is much higher now.

There's a large difference between what the theoretical "right" thing is from a textbook perspective, and what successful engineering teams do in reality.

edit: besides, it doesn't even make sense in this context. I have 100 servers talking to the database. I need to create an index, ok, add it to the code. Deploy to server 1. Server 1 adds the index as part of the migration process, and let's say it's instant-ish (not realistic but whatever). Do the other 99 servers now panic because there's an unexpected index on the table?


I don't think I have ever seen a non-toy project where that was the case.


That's a lovely ideal, but in the real world there are relatively few companies that meet that metric.


I've worked at FAANG and enterprise companies and we managed to do it.

There are no technical reasons why it can't be done. Only process and will.


Yes, and that's exactly the point. The reality doesn't usually match the ideals, and many orgs do not have good process, and do not have the political will to get good process implemented. Part of being a professional is recognizing where reality falls short of the ideals (an all-too-common occurrence), and doing the best you can to successfully get your work done in that environment.

And of course I don't know which FAANGs you worked at, but I know folks at FAANGs who have complained to me about CI and deployment times. Hell, these are huge companies; while they try to harmonize tooling, deployment times (especially when test suites of varying quality are involved) can vary a lot across a company. I wouldn't be surprised if there were people at the companies you worked at that were upset with deployment times, even if the teams you worked on were in good shape.

Honestly, when someone suggests something like you've suggested (that everyone should be able to get their deployment times to under a minute), I really do wonder if they're intentionally arguing in bad faith or are trolling. I know for a fact that things are not that rosy, and are rarely that rosy, even at the companies you claim to have worked at, and it's hard to believe that anyone could genuinely think that this is a broadly-attainable target. That doesn't mean that no one can do it, but that does mean that designing tooling that assumes everyone can do it is... well, just kinda naive and not very useful.


You have two choices: (1) try and solve your deployment issues or (2) make unmanaged, untested, unreviewed changes directly in Production.

Now you may say I'm just trolling but option (1) seems better to me for the long-term health of the project/company. And I don't believe it's correct to say it is an unrealistic goal.


> I've worked at FAANG and enterprise companies and we managed to do it.

You have a very different experience from the rest of us, in that case.

The big AWS services all had deployments measured in days - or even weeks (depending on how many regions they are deployed across). Facebook's monorepo took upwards of an hour just to get a PR through the merge queue. Both were notorious for "hand-jamming" critical fixes directly to production.


There are lots of reasons to do slow rollouts. You should be rolling out in stages anyway.


You do code review in less than a minute?


You do code reviews/automated testing in lower environments before you make changes directly to Production.

And in this case if it's an emergency hot fix then it's still better to do this through a managed, tracked, tested pipeline.


This is the correct answer.


IMO "parse, don't validate" can apply to data coming out of the database too.
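e.g. by wrapping raw values in domain types once at the boundary, so downstream code can't hold an invalid value. A minimal sketch with a hypothetical Email type:

    // Parse once at the DB boundary; everything downstream takes Email,
    // not String, so an invalid address can't travel far.
    pub struct Email(String);

    impl Email {
        pub fn parse(raw: String) -> Result<Self, String> {
            if raw.contains('@') {
                Ok(Email(raw))
            } else {
                Err(format!("not an email address: {raw}"))
            }
        }
    }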


In all the systems I’ve built (mostly Django) you need to tolerate vN and vN+1 simultaneously; you are not going to turn off your app to upgrade the DB.

You’ll have some Pods on the old application version while you do your gradual upgrade.

How do you envision rolling upgrades working here?


I'm so glad you made this. I've been searching for a decent Rust database library for half a year already, and this ticks all the boxes.

I haven't tried it yet, so I might have to eat my words later, but- great job! It's going to save tons of effort.


Surely it is easier to just check that all migrations have run before you start serving requests? Column existence is insufficient to verify that the database conforms to what the application expects (existence of indices, foreign key relationships with the right delete/update rules, etc).


> This is validated at compile time against a database, which is good, but of course there is no guarantee that the production database will have the same types. Easiest mistake to make is to design a query against your local Postgres v15 and hit a runtime error in production running Postgres v12, e.g. a function like gen_random_uuid() doesn't exist. Another is to assume a migration in production was actually executed.

One could have the backend fetch DB schema/version info at startup, compare it to its own view of what the schema should look like, and fail if the two disagree. That way, a new deployment would fail before being activated, instead of being deployed successfully and queries failing down the line.


If the database is external to the host language and DB library, then you have external linkage and you'll have this problem no matter what -- not great. If the database is internal to the host language and DB library then you won't have this problem, but also you'll only be able to interact with the database via your application's code -- also not great.

Either way is not great. I think the best bet is to have:

  - an external RDBMS
  - a compiler from the RDBMS schema to
    host languages/libraries
  - run-time validation that the schema
    has not changed backwards-incompatibly
When does run-time validation take place? When you compile a query: a) the RDBMS will fail if the schema has changed in certain backwards-incompatible ways (e.g., tables or columns dropped or renamed, etc.), b) the library has to check that the types of the resulting rows' columns match expectations.


And you end up with no canonical declaration of the schema in your application code, leaving developers to mentally apply potentially tens or hundreds of migrations to build up an idea of what the tables are expected to look like.


No matter how you define your schemas, you still have a series of migrations as data evolves. This is not an issue of schema definition.


The application is necessarily the authority on its expectations of the database.


You can see my sibling comment, but in the real world of operating databases at any sort of scale, you need to have databases in transitory states where the application can continue to function even though the underlying database has changed.

The quintessential example is adding a column. If you want to deploy with zero downtime, you have to square with the reality that a database schema change and a deployment of application code is not an atomic operation. One must happen before the other. Particularly when you deal with fleets of servers with blue/green deploys, where server 1 gets deployed at t=0 minutes but server N doesn't get deployed until t=60 minutes. Your application code will straight up fail if it tries to insert into a column that doesn't exist, so it's necessary to change the database first. This normally means adding a column that's either nullable or has a default value, to allow the application to function as normal, without knowing the column exists.

So in a way, yes, the application is still the authority, but it's an authority on the interface it expects from the database. It can define which columns should exist, but not which columns should not exist.


You (and many other commenters) are right that rust-query currently requires downtime to do migrations. For many applications this is fine, but it would still be nice to support zero-downtime migrations.

Your argument that the application should be the authority on the interface it expects from the database makes a lot of sense. I will consider changing the schema check to be more flexible as part of support for zero-downtime migrations.


Absolutely not. Certainly it's necessary that the database schema be compatible with whatever the application believes the schema to be, but the application need not be in control of it, and at the orgs I've worked at, there's no way we could build zero-downtime systems if the application had to be in charge of schema.

Consider even a very simple case: let's say I have a database with two (identical) application servers talking to it (that is, I've horizontally scaled my application due to load and availability requirements). If I need to do a schema change, and the application needs to be in charge of schema, how would that even work? If I deploy the change & migration to one of the two servers, once the migration is complete, the second server will freak out because the schema doesn't match its "authoritative" view anymore. If I deploy the change to both servers at the same time, and somehow designate one of them to actually run the migration, then the other one will immediately panic on startup because the first server hasn't completed the migration yet.

Not to mention this setup breaks red/black deployments: even in the case where I only have one application server in front of the database, how do I bring up a new deployment, allow both servers to run while the new one is passing health checks, and then bring down the original server? They would both have different "authoritative" views of what the schema should be.

This also completely breaks the ability to roll back, at least without also rolling the schema back too. That's risky; I don't want to have my service rollback depend on schema rollback also working properly.

This kind of "application is authoritative about schema" only works when you can schedule downtime to do schema changes. That would be a non-starter for any company I've worked at.


The application can ensure that its assumptions hold without asserting the exact state of the database.

Almost all migrations are idempotent and backwards-compatible, and almost all of the rest of them can be made to be by splitting into multiple changes and waiting for old versions of the application to be shut down.


You might have more than one application hitting the same database.


How does that work in the case where you have one database and n applications using it? Are they all somehow the authority, or do you have a way to pick one?


> schema is validated by the wrong system. The database is the authority on what the schema is

What you describe is DB-first. This Rust library is code-first. In code-first, the code is responsible for generating DDL statements via what's called a "migration": the library detects changes to the code and applies them to the schema.


Agree. Mid-tier developers who create queries for a SQL db to execute are doing the equivalent of using Java code to generate HTML. The target is not a programmatic API, it’s a language that was designed for end users, and it is both more expressive and more idiosyncratic than any facade you build in front of it.


Would it make more sense to consider the response from the DB, like a response from any other system or user input, and take the parse don’t validate approach?

After all, the DB is another system, and its state can be different to what you expected.

At compile time we have a best guess. Unless there was a way to tell the DB what version of the schema we think it has, it could always be wrong.


Agreed. Maybe having a schema check in the build step of the application would solve this: if the schema doesn't match, it doesn't compile. Most ORMs of course do the opposite; they generate a migration for the database from the code.


Come build the next full stack web framework for your favorite programming language!

https://github.com/levkk/rwf

All aspects of the project are open to contributors. Beginner friendly. Learn Rust/web tech if you're not familiar with how the sausage is made.


You say beginner friendly. How does rwf compare to Pavex[1] in this regard?

[1] https://github.com/LukeMathWalker/pavex


I think rwf is easier to use, but it's not my place to make this argument. Try them both out and let me know which one you like best!


Moving data between systems is problematic. Where this product is actually needed (multi-TB databases under load) is where logical replication won't be able to sync your tables in time. Conversely, small databases where this will work don't really need columnar storage optimizations.


For my use case of something similar on ClickHouse:

We load data from Postgres tables that are used to build ClickHouse dictionaries (a hash table for JOIN-ish operations).

The big tables do not arrive via real-time-ish sync from Postgres but are bulk-appended using a separate infrastructure.


Would you be able to share how you implemented "bulk-appended using a separate infrastructure" at a high level?


Fair point. We think that BemiDB currently can be useful when used with small and medium Postgres databases. Running complex analytics queries on Postgres can work, but it usually requires tuning it and adding indexes tailored to these queries, which may negatively impact the write performance on the OLTP side or may not be possible if these are ad-hoc queries.

> (multi-TB databases under load) is where logical replication won't be able to sync your tables in time

I think the ceiling for logical replication (and optimization techniques around it) is quite high. But I wonder what people do when it doesn't work at scale?


What would you consider to be small or medium? I have a use case for analytics on ~1 billion rows that are about 1TB in postgres. Have you tried on that volume?


We haven't tested this with 1TB Postgres databases yet, assuming that most companies operating at this scale already built analytics data pipelines :) I'm curious if you currently move the data from this Postgres to somewhere else, or not yet?


Not yet, mostly just kicked the can down the road due to costs. Like you said in another post, careful indexes on Postgres get you quite far, but they're not nearly as flexible as a columnar DB.

I think your project is great. I suspect incremental updates will be a big factor in uptake (and one we would need in order to try this out, at least).


jemalloc (as opposed to the GNU libc and LLVM allocators) sometimes performs better. [1]

[1] https://jemalloc.net/


Thanks!

Re: callbacks. They are very nice when you have CRUD endpoints that modify models directly from JavaScript [1]. It ends up being pretty DRY, especially since you'll find yourself modifying the same model from different places in the code; without callbacks, you'll end up with bad data.

Re: service layer. It's a matter of taste, and you can probably avoid it until you're well into the thousands of LOCs. Rails and Django are all about building quickly - that's an advantage you shouldn't give away to your competitors. A service layer is a drag that you may need as an "enterprise".

Re: MVC not being production-ready: we know that's not true, but I appreciate the hot take; it's always a good starting point for a great discussion.

Re: existing ORMs, they were not flexible enough. I used ActiveRecord and Django as my inspiration; those are both excellent ORMs, while existing Rust ORMs lean too heavily on type safety in Rust in my opinion. The database should be the source of truth for data types, and the framework should allow for intentional drift.

Hope you get to try Rust soon. I've been using it for years, and I don't want to go back to Python or Ruby, hence this project.

Cheers!

[1] https://levkk.github.io/rwf/controllers/REST/model-controlle...

