Hacker News | zbentley's comments

In a great number of (probably most) business-to-business software domains, most rows are only relevant to one tenant at a time. The tenant ID is a partition/shard key, basically: queries only rarely retrieve rows for more than one tenant at a time.

Hassles emerge when the tenant that owns a row changes (client splits or mergers), but even then this is a common and robust architecture pattern.
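To make the single-tenant-per-row pattern concrete, here's a minimal sketch using sqlite3; the table and column names (invoices, tenant_id) are illustrative, not from any particular system:

```python
import sqlite3

# Illustrative single-database multi-tenant schema; names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoices (tenant_id INTEGER, amount REAL)")
db.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (2, 99.0)],
)

# The common case: every query is scoped to exactly one tenant, which is
# what makes tenant_id usable as a partition/shard key.
rows = db.execute(
    "SELECT amount FROM invoices WHERE tenant_id = ? ORDER BY amount",
    (1,),
).fetchall()
print(rows)  # [(10.0,), (20.0,)]
```

Because nearly every query carries that `WHERE tenant_id = ?` clause, sharding by tenant keeps almost all queries on a single shard.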


Erratum:

> Transactions may observe none, part, or all

Should, I think, read:

> Consumers may observe none, part, or all


Both are true, but we use "transactions" for clarity, since the semantics of consumers outside transactions are even murkier. Every read in this workload takes place in the context of a transaction, and goes through the transactional offset commit path.

Ah, got it; I was assuming that “transactions” was referring to the transactions mentioned as the subject of the previous sentence, not the transactions active in consumers observing those. My mistake!

I think the pedantic answer is "if so, your branches/commits are too big."

In practice, what you describe is often the case, and having a quick shortcut to do "stick $small_fix in main via a separate branch, but yoink that commit into my branch before that first branch lands on main so I can keep working" would be very useful.
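For what it's worth, plain git can approximate that shortcut with cherry-pick. A sketch, with all repo, branch, and file names made up:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main work && cd work
git config user.email dev@example.com && git config user.name dev
echo "v1" > app.txt && git add app.txt && git commit -qm "initial"

git switch -qc feature                 # long-running feature branch
echo "wip" > feature.txt && git add feature.txt && git commit -qm "feature wip"

git switch -q main && git switch -qc small-fix
echo "fixed" > bug.txt && git add bug.txt && git commit -qm "small fix"
fix=$(git rev-parse HEAD)

git switch -q feature
git cherry-pick -x "$fix"              # yoink the fix before it lands on main

git switch -q main
git merge -q --no-edit small-fix       # the fix lands on main

# When feature eventually merges, git sees the duplicated change on both
# sides and resolving it is a rubber stamp (often no conflict at all).
git merge -q --no-edit feature
```

`cherry-pick -x` records the source commit hash in the new commit message, which helps when auditing where the duplicated fix came from later.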

Historically, though, I usually addressed this need by copy/pasting the code into separate commits in both branches, and being careful to keep the $small_fix-to-main code and its twin in my larger branch un-drifted and textually discrete to make the eventual merge conflict a rubber-stamp to resolve. Ugly, yes. Maybe I should feel guilty about it, who knows?


That sounds like a lot of work.

And it is a lot of work, even in the best case. I know it is because I used to do exactly this. But more often than not the small fix ends up getting affected by your changes in the branch, and then you're in rebase hell again.


As another commenter said, OTP messages are meant to be between processes in the same privilege zone. That said, using a custom protocol via a good library can actually bring benefits relative to core OTP stuff.

For example, several of the gRPC libs I've used for Erlang/Elixir are pretty low-cognitive-overhead to use, and they come with all the added gRPC goodies: RPC semantics are described in one place rather than ad-hoc throughout code, protobufs have at least a documented (if not actually good) process for upgrades and backwards compatibility, and multi-language support gets easier (even if your second language is just a tiny sliver of "dump protobufs into a database/Jupyter notebook/Rust program occasionally for offline reporting").

To be clear, this isn't a paean to gRPC; most of those features are table stakes for an IDL-driven protocol definition. Just saying that you do get some things in return for giving up the convenience of OTP, if you pick the right tools.


The threading approach is roughly:

1. Start a thread

2. That thread starts a child process and signals "started" by storing its PID somewhere globally-visible (and hopefully atomic/lock-protected).

3. The thread then blocks in wait(2), taking advantage of its non-main-thread-ness to avoid some signals and optionally masking/ignoring some more.

4. When the process exits, the thread can write exitstatus/"completed" to the globally-visible state next to PID. The thread then exits.

5. External observers wait for the process with a timeout by attempting to join the thread with a timeout. If the timeout occurs, they can access the globally-visible PID and send a signal to it.
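A minimal Python sketch of the steps above; all names are illustrative, and `sleep 30` stands in for an arbitrary child process:

```python
import os
import signal
import subprocess
import threading

# Globally-visible state for steps 2 and 4; lock-protected rather than atomic.
state = {"pid": None, "exitstatus": None}
lock = threading.Lock()

def waiter():
    proc = subprocess.Popen(["sleep", "30"])
    with lock:                    # step 2: publish the child's PID
        state["pid"] = proc.pid
    status = proc.wait()          # step 3: block until the child exits
    with lock:                    # step 4: publish the exit status
        state["exitstatus"] = status

t = threading.Thread(target=waiter)
t.start()                         # step 1
t.join(timeout=0.5)               # step 5: wait-with-timeout == join-with-timeout
if t.is_alive():                  # timed out; signal the child directly
    with lock:
        pid = state["pid"]
    if pid is not None:
        os.kill(pid, signal.SIGTERM)
    t.join()                      # the waiter thread reaps the child and exits
print(state["exitstatus"])        # negative value == killed by that signal
```

Note the PID-reuse caveat from below applies here too: between the timeout and the `os.kill`, the child could in principle have exited and its PID been recycled.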

This was missing from the article (EDIT: it has since been added, thanks!). That said, it's not a good solution on many platforms. It's more costly in resources (thread stack), takes more code than most of the listed options, and is vulnerable to PID-reuse problems that can cause a kill signal to go to the wrong process. It likely plays poorly with spawning methods that request a SIGCHLD be sent to the parent on exit (and with signals in general, if any customization is needed there), and is probably often slower than most of TFA's alternatives as well, due both to syscall count and to pessimal thread/scheduler switching conditions. Additionally, it multiplexes/composes poorly to large numbers of processes, and at a high resource cost.

EDIT: Golang's version of this is less bad than described above, but not perfect. Go's spawning infrastructure mitigates resource cost (goroutines/segmented stacks are not as heavy as threads), addresses the SIGCHLD risk through the runtime and signal channels, and mitigates slowness with a very good scheduler, though it remains vulnerable to PID reuse (as are most platforms' operations in this area). For multiplexing, I would assume (but I have not verified) that the Go runtime is internally using pidfds/kqueue where supported. Where not supported, I would assume Go is internally tracking spawn requests through its stdlib, handling SIGCHLD, and has a single global routine calling wait(2) without a specific PID, waking goroutines waiting on a watched PID when it comes out of the call to wait(2).


Thanks. I believe that Go indeed _could_ use those APIs to wait for the child more efficiently if they chose to, but the current implementation suggests that they're just calling wait4() in a separate thread: https://cs.opensource.google/go/go/+/refs/tags/go1.23.3:src/...

To be fair, in Go process spawning is very inefficient to begin with, since it requires lots of runtime coordination to not mess with the threads/goroutines state during fork, so running wait4() in a separate thread (although the thread can be re-used afterwards) is not the biggest concern here.


Thanks for the suggestion, I have added a short section about threads.

> assume the public defender is at least friends with the police and therefore may not have your best interests in mind

What? Police fucking hate public defenders as a general rule. Like sure, a police officer might be familiar with a public defender who frequently works cases brought by that officer’s precinct. But it’s perfectly clear to both of them that they work in opposition. Absent a few extremely rural and/or corrupt cases I do not believe this friendship is often the case.


I know that's what TV says, but my anecdata is the opposite of that.

I've known multiple attorneys in the public defenders office in three different courts. One rural, two very urban. They absolutely had relationships with both street officers and leadership in the precincts of the city and town.


Rye wraps uv and adds python version management, among other things.

I buy that there's bias here, but I'm not sure how much of it is activist bias. To take your example, if a typical user searches for "is ___ a Nazi", seeing Stormfront links above the fold in the results/summary is going to likely bother them more than seeing Mother Jones links. If bothered by perceived promotion of Stormfront, they'll judge the search product and engage less or take their clicks elsewhere, so it behooves the search company to bias towards Mother Jones (assuming a simplified either-or model). This is a similar phenomenon to advertisers blacklisting pornographic content because advertisers' clients don't want their brands tainted by appearing next to things advertisers' clients' clients ethically judge.

That's market-induced bias--which isn't ethically better/worse than activist bias, just qualitatively different.

In the AI/search space, I think activist bias is likely more than zero, but as a product gets more and more popular (and big decisions about how it behaves/where it's sold become less subject to the whims of individual leaders) activist bias shrinks in proportion to market-motivated bias.


I can accept some level of this, but if a user specifically requests it, a model should generally act as expected. I think certain things are fine to require a specific ask before surfacing or doing, but the model shouldn't tell you "I can't assist with that" because it was intentionally trained to refuse a biased subset of possible instructions.

How do you assure AI alignment without refusals? Inherently impossible isn't it?

If an employee were told to spray paint someone's house or send a violently threatening email, they're going to have reservations about it. We should expect the same for non-human intelligences too.


The AI shouldn’t really be refusing to do things. If it doesn’t have the information, it should say “I don’t know anything about that”, but it shouldn’t lie to the user and claim it cannot do something it can when requested to do so.

I think you’re applying standards of human sentience to something non-human and not sentient. A gun shouldn’t try to run CV on whatever it’s pointed at to ensure you don’t shoot someone innocent. Spray paint shouldn’t be locked up because a kid might tag a building or a bum might huff it. Your mail client shouldn’t scan all outgoing for “threatening” content and refuse to send it. We hold people accountable and liable, not machines or objects.

Unless and until these systems seem to be sentient beings, we shouldn’t even consider applying those standards to them.


Unless it has information indicating it is safe to provide the answer, it shouldn't. Precautionary Principle: better safe than sorry. This is the approach taken by all of the top labs, and it's not by accident or without good reason.

We do lock up spray cans and scan outgoing messages, I don't see your point. If a gun technology existed that could scan before doing a murder, we should obviously implement that too.

The correct way to treat AI actually is like an employee. It's intended to replace them, after all.


Strongly disagree. At the level of io_uring (syscalls/syscall orchestration), it is expected that available tools are prone to mis-use, and that libraries/higher layers will provide abstractions around them to mitigate that risk.

This isn't like the Rust-vs-C argument, where the claim is that you should prefer the option of two equivalently-capable solutions in a space that doesn't allow mis-use.

This is more like assembly language, or the fact that memory inside kernel rings is flat and vulnerable: those are low-level tools to facilitate low-level goals with a high risk of mis-use, and the appropriate mitigation for that risk is to build higher-level tools that intercept/prevent that mis-use.


Full ACK.

To bring it back to the FD leak in the original post: the kernel won't stop you from forgetting to call close() either, io_uring or no.
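To make that concrete, a trivial sketch: nothing at the syscall layer objects if you never close a descriptor; the leak only surfaces once the process approaches its RLIMIT_NOFILE cap.

```python
import os

# Open descriptors without closing them; the kernel happily hands them out
# until the per-process limit (RLIMIT_NOFILE) is hit. Nothing forces the
# close() below -- with io_uring or without.
fds = [os.open("/dev/null", os.O_RDONLY) for _ in range(8)]
print(len(fds))  # 8 descriptors currently "leaked"

for fd in fds:
    os.close(fd)  # the cleanup the kernel will never demand of you
```

The same applies at every layer below a managed abstraction: the mitigation is a higher-level wrapper (RAII, context managers, drop glue) rather than anything the kernel enforces.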


Agreed, though a better title would probably not use the term "safe" unqualified (e.g. "Async Rust with io_uring leaks resources (and risks corrupting program state)").
