There are several storage class memories that are nearing commercialization. Intel is betting big on at least one of them. Most technologies in this class are orders of magnitude faster and have orders of magnitude better endurance than flash memory, while being only slightly slower than DRAM, yet non-volatile.
It is plausible that with another layer of in-package cache they could eliminate DRAM altogether, replacing it with ultrafast NVM. Imagine the resume/suspend speed and power savings of a machine whose state is always stored in NVM.
> There are several storage class memories that are nearing commercialization.
I'm very interested in this. Could you point out which technologies are nearly ready for commercialization?
My understanding is that the current cost is orders of magnitude higher per unit of storage for these new technologies compared to NAND flash or even DDR3 RAM. But of course, a dedicated fab could change that very quickly.
Well, NVDIMMs are available right now (from companies like Netlist, Agigatech, Viking, Smart, Micron). This is DRAM with an analog switch, a controller, and flash memory. When you lose power, the DRAM is disconnected from the processor and its contents are copied to the flash. The newer technologies might be cheaper, but as far as I know their write performance is still not as good as DRAM's.
The issue is the CPU cache: the data is not non-volatile until it has been written back to the DRAM. Even then, you need some advance warning of a power outage for it all to work.
The Unibus (the bus for PDP-11 core-memory systems) had an early-warning signal, to give the memory controller a chance to write back the previous (destructive) read.
Components based on PCM, MRAM, and FRAM are available on the market now. I know that Intel has large productization (not research) teams working on a variant of SCM. "Near" means 2-3 years, though; going from research exit to market-ready is always a 3-5 year cycle when process engineering is involved.
If non-volatile memory is becoming the new disk, why is it any more or less likely to be encrypted than current disk storage (mostly not, as far as I've seen)?
This could be done mostly transparently, with the encryption in the memory controller. Addresses and data are already scrambled with a (non-cryptographic) scrambling code for EMI reasons. Of course, a sufficiently fast hardware crypto core would be required.
EDIT: Also, I forgot that the last generation of consoles (and I assume the current) have transparent encryption of main memory.
How do you square that with the performance of the AES-NI instructions? That is theoretically 16 bytes per cycle from the manual. Per core. That is way in excess of memory bandwidth, even with DDR4.
The theoretical maximum for current chips is well below 16 bytes per cycle. On Haswell you can process (in parallel) 7 blocks in roughly the time it would take to process 1. The latency of each round is 7 cycles, so a full AES-128 encryption (10 rounds) takes ~70 cycles; effectively you can process at most 1.6 bytes per cycle, or 1.14 if you use 256-bit keys (ignoring the cost of key scheduling and other overhead here).
Even if you dedicate all CPU cores to encrypting memory, you still fall short of the theoretical memory bandwidth by quite a bit.
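As a rough sanity check, here is the arithmetic behind those figures (a minimal sketch; it just assumes the 7-cycle aesenc latency and the 7-blocks-in-flight figure mentioned above):

    #include <stdio.h>

    /* Back-of-the-envelope throughput estimate. Assumes a 7-cycle aesenc
       latency (Haswell) and 7 independent 16-byte blocks kept in flight
       to hide that latency; key scheduling and loop overhead ignored. */
    int main(void) {
        const double block_bytes      = 16.0;
        const double round_latency    = 7.0;   /* cycles per aesenc */
        const double blocks_in_flight = 7.0;

        double aes128 = blocks_in_flight * block_bytes / (10 * round_latency);
        double aes256 = blocks_in_flight * block_bytes / (14 * round_latency);

        printf("AES-128: %.2f bytes/cycle per core\n", aes128);  /* ~1.60 */
        printf("AES-256: %.2f bytes/cycle per core\n", aes256);  /* ~1.14 */
        return 0;
    }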
Do you believe it's reasonable to assume that AES performance will remain constant over the same 5-7 year timeframe? That's at least a couple of hardware generations for an improvement they could make in the current generation if there was a market for it.
The VIA C7 AES implementation could keep up with memory (ca. 20Gb/s). With suitable cipher modes you can use multiple pipelined units in parallel with negligible overhead.
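For example, with a parallelizable mode like CTR the blocks are independent, so you can interleave several of them per round and keep the AES unit's pipeline full. A minimal sketch (assuming AES-128 with pre-expanded round keys; key expansion and the final XOR of the keystream with the plaintext are left out):

    #include <immintrin.h>   /* AES-NI intrinsics; compile with -maes */

    /* Encrypt 4 independent counter blocks in parallel (AES-128).
       rk[] holds the 11 expanded round keys. Because the four aesenc
       chains are independent, they overlap in the pipeline instead of
       each waiting out the full per-round latency. */
    static void aes128_ctr_x4(const __m128i rk[11], const __m128i ctr[4],
                              __m128i out[4])
    {
        __m128i b0 = _mm_xor_si128(ctr[0], rk[0]);
        __m128i b1 = _mm_xor_si128(ctr[1], rk[0]);
        __m128i b2 = _mm_xor_si128(ctr[2], rk[0]);
        __m128i b3 = _mm_xor_si128(ctr[3], rk[0]);

        for (int r = 1; r < 10; r++) {
            b0 = _mm_aesenc_si128(b0, rk[r]);
            b1 = _mm_aesenc_si128(b1, rk[r]);
            b2 = _mm_aesenc_si128(b2, rk[r]);
            b3 = _mm_aesenc_si128(b3, rk[r]);
        }
        out[0] = _mm_aesenclast_si128(b0, rk[10]);
        out[1] = _mm_aesenclast_si128(b1, rk[10]);
        out[2] = _mm_aesenclast_si128(b2, rk[10]);
        out[3] = _mm_aesenclast_si128(b3, rk[10]);
    }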
Remanence attacks are pointless against non-volatile media. You use them against volatile media, in a physical attack, to sneak under or manipulate the limits of that volatility and so violate security assumptions such as: "the keys are in RAM" (true) > "RAM is instantly volatile on shutdown" (not quite true) > "so the keys are instantly zeroised on shutdown" (not that easily, they're not).
Some RAM is much more volatile than conventional bulk SRAM or DRAM (for example, the L1/L2 caches on CPUs are frequently impractical to exploit). Properly encrypt the bulk data held in high-remanence or non-volatile RAM with a key held in such low-remanence RAM, and your security problem is solved.
That still doesn't answer the question. If you treat non-volatile memory as a disk, then the data would never touch it unencrypted, so a cold boot attack is useless against the non-volatile memory. Of course, you could still launch a cold boot attack on the volatile memory, but we can do that already.
Computing really hasn't figured out how to handle non-volatile memory as yet. It's almost always used to emulate rotating disks, with file systems, named files, and a trip through the OS to access anything. Access times for non-volatile memory are orders of magnitude faster than disk access times, so small accesses are feasible. But that's not how it's treated under existing operating systems.
There are alternatives. Non-volatile memory could be treated as a key/value store, or a tree, with a storage controller between the CPU and the memory device. With appropriate protection hardware, this could be accessed from user space through special instructions. That's what I thought this article indicated. But no. This is just better cache management for the OS.
There have been systems where everything is memory mapped and disks are just used to emulate more memory.
It's called "single-level store" in the System/38 and its descendants. File access in Multics was all memory-mapped.
There's nothing inherently rotating-disky about current filesystem APIs from the user's point of view; they just provide a database interface with a certain type of namespace for access. The block-level part is largely invisible to FS users (modulo leaky abstractions).
It is already treated as a k/v store, where the key is an LBA and the value is a 512- or 4096-byte block. The OS builds everything else (i.e. filesystems) on top of that. Applications can already access the raw k/v store directly if they wish (open /dev/sd? directly, permissions allowing).
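For example, reading one "value" by its LBA "key" is just a pread() on the device node (a rough sketch; it assumes a 4096-byte logical block size and /dev/sda as the device, and needs read permission on the node):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Read logical block 12345 of a raw disk, treating the device as a
       key/value store keyed by LBA. The real logical block size can be
       queried with ioctl(fd, BLKSSZGET, ...). */
    int main(void)
    {
        const off_t  lba        = 12345;
        const size_t block_size = 4096;
        char buf[4096];

        int fd = open("/dev/sda", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        if (pread(fd, buf, block_size, lba * (off_t)block_size) != (ssize_t)block_size) {
            perror("pread");
            return 1;
        }
        printf("first byte of LBA %lld: 0x%02x\n", (long long)lba, (unsigned char)buf[0]);
        close(fd);
        return 0;
    }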
This is not (specifically) for the OS. This is for non-volatile memory that is directly attached to the memory bus. The OS can then directly map NVRAM into the address space of a user-space process; the application could use these instructions to efficiently ensure the crash consistency of its persistent data.
Well, this is about userspace access to nvram, with the nvram mapped as memory. It just so happens that cache management is one of the hard parts of doing that, so that's what these new instructions are for.
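Roughly this pattern, as I understand it (an untested sketch; it assumes the kernel exposes the NVDIMM range through some mappable device node, with /dev/nvram0 as a placeholder name, and a CPU that supports CLWB):

    #include <fcntl.h>
    #include <immintrin.h>   /* _mm_clwb, _mm_sfence; compile with -mclwb */
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define CACHELINE 64

    /* Write back every cache line covering [addr, addr+len) and fence,
       so the stores become durable in the NVM instead of sitting in the
       CPU cache. This is the userspace pattern CLWB is aimed at. */
    static void persist(const void *addr, size_t len)
    {
        uintptr_t p   = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
        uintptr_t end = (uintptr_t)addr + len;
        for (; p < end; p += CACHELINE)
            _mm_clwb((const void *)p);
        _mm_sfence();   /* order the write-backs before any later stores */
    }

    int main(void)
    {
        int fd = open("/dev/nvram0", O_RDWR);   /* placeholder device node */
        if (fd < 0) return 1;

        char *nv = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (nv == MAP_FAILED) return 1;

        strcpy(nv, "committed record");
        persist(nv, strlen(nv) + 1);   /* crash-consistent from here on */

        munmap(nv, 4096);
        close(fd);
        return 0;
    }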
Also, current flash memories do not allow single-address writes. The write-endurance problem, at least, could be addressed by adding wear leveling to an address-translation layer. The single-address issue could be addressed by a caching/grouping layer that interacts with the leveling mechanism. Add to that an all-core state dump as a block write and you can recover to an internally consistent state after a power failure.
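A toy version of the leveling idea, just to make the translation-layer point concrete (very much a sketch; a real layer would persist the map, batch sub-block writes, and relocate cold data too):

    #include <stdint.h>
    #include <stdio.h>

    #define PHYS_BLOCKS 8   /* physical blocks, including spares */
    #define LOG_BLOCKS  6   /* logical blocks exposed upward */

    static uint32_t l2p[LOG_BLOCKS];          /* logical -> physical map */
    static uint32_t erase_count[PHYS_BLOCKS]; /* wear per physical block */
    static uint8_t  in_use[PHYS_BLOCKS];

    /* Pick the least-worn physical block that is currently free. */
    static uint32_t least_worn_free(void)
    {
        uint32_t best_pb = 0, best = UINT32_MAX;
        for (uint32_t pb = 0; pb < PHYS_BLOCKS; pb++)
            if (!in_use[pb] && erase_count[pb] < best) {
                best = erase_count[pb];
                best_pb = pb;
            }
        return best_pb;
    }

    /* Every rewrite of a logical block lands on a fresh physical block,
       so a hot logical address doesn't wear out a single physical one. */
    static void write_logical(uint32_t lb)
    {
        uint32_t old = l2p[lb];
        uint32_t pb  = least_worn_free();
        /* ...program the new data into physical block pb here... */
        l2p[lb] = pb;
        in_use[pb]  = 1;
        in_use[old] = 0;
        erase_count[old]++;   /* the stale copy will need an erase */
    }

    int main(void)
    {
        for (uint32_t lb = 0; lb < LOG_BLOCKS; lb++) {
            l2p[lb] = lb;
            in_use[lb] = 1;
        }
        for (int i = 0; i < 1000; i++)
            write_logical(0);   /* hammer one logical block */
        for (uint32_t pb = 0; pb < PHYS_BLOCKS; pb++)
            printf("physical block %u erased %u times\n", pb, erase_count[pb]);
        return 0;
    }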
Very interesting! It's always fun to see "external" development in the general field of computer architecture affect low-level stuff like a CPU's cache and memory subsystems.
It wasn't super-easy to figure out who in the grand ecosystem view of things is going to have to care about these instructions, but I guess database and OS folks.
Also, if the author reads this, the first block quote with instruction descriptions has an editing fail: it repeats the same paragraph three times (the text begins "CLWB instruction is ordered only by store-fencing operations").
Isn't there a higher risk of data loss if your "hard drive" is 100% memory-mapped? All it would take is one buggy kernel driver writing through an invalid pointer or memset'ing the whole thing to 0.
Well, the same is true now as well, right? For example, a buggy driver can overwrite a buffer-cache pointer with something else, and then you are hosed. If you are playing in kernel-land and not careful enough, you are courting disaster...
True, but even if it overruns a buffer, it still needs to maintain a valid SCSI/ATAPI/whatever command packet format and submit the packets to the controller with increasing block numbers - that's a lot of instructions - while something that clears an entire memory-mapped address space could probably be done in 1-2 assembly instructions (mov rcx, -1; rep stosq).
Support for non-volatile memory needs to be added to Linux. For example, one should be able to map the non-volatile memory into user space and access it directly. There needs to be some BIOS-OS interaction so that the OS doesn't treat the non-volatile memory as general memory (for the likely case where only some of the memory is non-volatile).
Alternatively, the non-volatile memory should be usable as a block device.
The non-volatile memory needs a layer of RAID-like volume management. For example, when you transfer the memory from one system to another, there should be a way to determine that the memory is inserted in the correct slots (remember there is RAID-like interleaving/striping across memory modules).
How about a CPU that has scores of hyperthreads? They don't block in the kernel; they stall on a semaphore register bitmask. That mask could include conditions like a timer register matching another register, an interrupt completing, or an event being signaled.
Now I can do almost all of my I/O, timer, and inter-process synchronization without ever entering the kernel or swapping out thread context. I've been waiting for this chip since the Z80.
While not exactly a chip (it never reached board stage), I designed a processor in college where the register file was keyed to a task-id register. That way, context switches could take no longer than an unconditional jump.
I dropped this feature when I switched to a single-task stack-based machine (inspired by my adventures with GraFORTH - thank you, Paul Lutus). This ended up being my graduation project.
It looks like the original text was automatically processed by replacing various words by their synonyms from a thesaurus, leading to hilariously non-idiomatic prose.