io_uring (and eBPF for that matter) are still super concerning attack vectors an...

hedora · on April 13, 2023

The io_uring API is fairly narrow. It's basically the same interface as the kernel uses to talk to network cards and modern disks. Those are untrusted just like userspace is. You can look at the packet format here (section 4.1):

https://kernel.dk/io_uring.pdf

It's not hard to parse, and nowhere near the level of exposure eBPF creates.

If you don't trust the hardware MMU to allow the kernel to safely read buffers from userspace, then there's really no way to perform I/O in the first place (write() already does this, for example).

softirq · on April 13, 2023

The kernel talks to NICs in kernel mode and can actually segregate a device's view of memory via an IOMMU. While there's some overlap in potential vulnerabilities, bad hardware/firmware is a different vector than userland having a shared mapping active to use in exploits that read arbitrary kernel data.

io_uring is also very complex. It's now its own subsystem, has its own worker pool, and even the dance of the rings themselves, moving pointers around and using data structures that must be manipulated from both sides, is not simple and thus probably not that secure.
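To make the "manipulated from both sides" part concrete, here is a minimal sketch of a single-producer, single-consumer ring using C11 atomics. It is an assumption-laden illustration, not io_uring's actual layout: `struct ring`, `ring_init`, `ring_push`, `ring_pop`, and the 8-entry size are all hypothetical names and choices made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 8u
#define RING_MASK (RING_ENTRIES - 1u)

/* Index masking below only works for power-of-two sizes. */
static_assert((RING_ENTRIES & RING_MASK) == 0u,
              "RING_ENTRIES must be a power of two");

/* Hypothetical ring for illustration; NOT io_uring's real layout. */
struct ring {
    _Atomic uint32_t head;            /* advanced only by the consumer */
    _Atomic uint32_t tail;            /* advanced only by the producer */
    uint32_t slots[RING_ENTRIES];
};

void ring_init(struct ring *r)
{
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Producer side: returns false if the ring is full. */
bool ring_push(struct ring *r, uint32_t value)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    /* Indices grow without bound; unsigned subtraction handles wraparound. */
    if (tail - head == RING_ENTRIES)
        return false;

    r->slots[tail & RING_MASK] = value;
    /* Release: the slot write must be visible before the new tail is. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}

/* Consumer side: returns false if the ring is empty. */
bool ring_pop(struct ring *r, uint32_t *out)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail)
        return false;

    *out = r->slots[head & RING_MASK];
    /* Release: the slot read must finish before the slot is handed back. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}
```

Even in this toy version, each side has to treat the index owned by the other side as something that can change at any moment. A real kernel consumer additionally has to assume the other side is hostile and may write arbitrary values into the indices, which this sketch does not defend against.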

firstlink · on April 13, 2023

> In io_uring's case sharing a userspace mapping with kernel space is simply always going to be dangerous.

Out of curiosity, have you ever used a syscall which writes or reads a userspace buffer?

softirq · on April 13, 2023

It's not the same thing. io_uring remaps a contiguous chunk of pages in both the kernel and userspace vas. For read/write the data is copied from userspace into kernelspace.

firstlink · on April 13, 2023

You're swapping terms around to draw distinctions where there are no real differences. There's no such thing as a "kernel page" and a "userspace page". There's only pages of memory which are mapped into one or both. All pages accessible to userspace are also mapped in the kernel. That means that the ring buffers live in perfectly ordinary, mapped-in-both-places userspace pages. There is zero basis in fact for your claims that these are "kernel pages" which have been mapped to userspace. You have taken the exact same phenomenon, pages accessible to both userspace and the kernel, and called it by a new scary name "kernel pages accessible by userspace", instead of the other safe and ordinary name, "userspace pages". Now, the addresses used for the kernel mapping may be different than normal, but (a) that is completely inconsequential to security and (b) as you point out yourself, kernel address space is inaccessible to userspace so it could not possibly matter where the kernel maps the shared pages.

Now it is in general a real security bug when the kernel operates multiple times on data mapped into userspace instead of taking a one-time copy of that data and using this copy for multiple operations. However, the whole point of a ring buffer is that it specifically operates in such a shared environment. Moreover it will be just as necessary for the kernel to perform the snapshot copy out of an SQE (and other shared structures) before beginning a sequence of operations on them. The only difference with io_uring is when that copy is performed: at the time of a syscall or during other operations. That too is completely inconsequential to security.
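A rough userspace sketch of that bug class and its fix, for anyone following along. Everything here is hypothetical: `struct request`, `handle_double_fetch`, `handle_snapshot`, and the `race_hook` callback (which stands in for a concurrent writer on the other side of the shared mapping) are invented for illustration and are not kernel code. In the kernel the snapshot step would typically be a single READ_ONCE() or copy_from_user() into a local.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KBUF_SIZE 16u

/* Hypothetical request living in memory that the other side can rewrite. */
struct request {
    uint32_t len;
    uint8_t  payload[64];
};

/* Stand-in for a concurrent writer racing with the handler. */
typedef void (*race_hook)(volatile struct request *shared);

/* Example "attacker": bumps len after it has been validated. */
void grow_len(volatile struct request *shared)
{
    shared->len = 64u;
}

/*
 * BUGGY: fetches shared->len twice. Returns the length it would go on to
 * use for a copy into a KBUF_SIZE buffer, or -1 if validation rejects it.
 * The copy itself is left out so the example never actually overflows.
 */
int handle_double_fetch(volatile struct request *shared, race_hook hook)
{
    if (shared->len > KBUF_SIZE)      /* fetch #1: validate */
        return -1;
    if (hook)
        hook(shared);                 /* the other side rewrites len here */
    return (int)shared->len;          /* fetch #2: use, may exceed KBUF_SIZE */
}

/*
 * FIXED: takes one snapshot of len into private storage, then validates
 * and uses only the snapshot. Returns bytes copied, or -1 if rejected.
 */
int handle_snapshot(volatile struct request *shared, race_hook hook,
                    uint8_t kbuf[KBUF_SIZE])
{
    uint32_t len = shared->len;       /* the only fetch of len */

    if (len > KBUF_SIZE)
        return -1;
    if (hook)
        hook(shared);                 /* too late: len is already private */
    assert(len <= KBUF_SIZE);         /* the snapshot cannot have changed */

    for (uint32_t i = 0u; i < len; i++)
        kbuf[i] = shared->payload[i];
    return (int)len;
}
```

With `len` starting at 8 and `grow_len` as the hook, the first handler ends up using 64 while the second still uses 8. The payload bytes can still change under the snapshot version, but the bounds it copies within cannot.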

To sum up: It is correct to state that the kernel must be careful with shared mappings. It also would have been correct to state, had you reached this far, that io_uring is moving the userspace-copy boundary "deeper" into the kernel rather than isolating it at the syscall layer. It is, however, incorrect to state that io_uring contains a new, more dangerous kind of shared mapping. It is incorrect to state that the shared mapping used by io_uring is itself in any way a threat to kernel security.

softirq · on April 13, 2023

Let's go over what I said:

io_uring remaps a contiguous chunk of pages in both the kernel and userspace vas.

This is a true statement - io_uring makes a compound page and calls remap into the userspace vas in mmap, and I did not say that the pages were kernel or userspace pages. However, you've said "userspace pages" in your own argument, which by your own admonition is incorrect. You are correct in saying that pages are just pages, because a page is just a chunk of physical addresses assigned to a pfn and has no meaning in userspace or kernel space without a vma.

There is a difference between kernel and userspace mappings, and mapping userspace virtual addresses to point to direct-mapped kernel addresses that the kernel is manipulating is dangerous. There are many CVEs that have taken advantage of these types of command buffers on other kernels.

joek1301 · on April 13, 2023

Does the distinction between sharing VA mappings and copying buffers to/from kernel matter from a security perspective? (I assume it does, but I don't know why.)

softirq · on April 13, 2023

Yes, you're looking at kernel pages through userspace virtual memory mappings; this isn't the case with copy to user. There you're just copying data from a userspace page to a kernel page, but only in kernel mode. You don't get to "see" kernel pages, and in fact post-Spectre/Meltdown the kernel is unmapped in userspace.

saagarjha · on April 13, 2023

I don’t think it is fair to call eBPF an attack vector. It does happen to currently be used by exploit developers to pivot a preliminary bug into code execution, but that’s just because it’s convenient low-hanging fruit. If you take it away, authors will move on to other techniques.