Intel’s Plans for 3DXP DIMMs Emerge (realworldtech.com)
126 points by dzaragozar on July 23, 2018 | 73 comments



It seems obvious in retrospect, but persistent memory adds a pretty exciting new advantage for persistent data structures.

Another thought: As potentially paradigm changing technology like this becomes available will it ever make sense to redesign the OS?


I used to think so - once you have persistent primary memory, you can install operating systems and applications directly into primary memory, so there's no longer any point in having a distinction between "install" and "run/boot".

However, iOS and Android have shown that it's possible to do away with this distinction even with a traditional OS running underneath. So I now tend to think that instead what will happen is more continual evolutionary changes at the OS level to work better in a "boot once" environment, rather than a revolution.


> so there's no longer any point in having a distinction between "install" and "run/boot".

When Linux boots, the in-memory state changes quite a bit. Even the actual code gets modified during boot. The whole process takes well under a second. Linux does support an “execute in place” mode, but it’s barely a win, and I don’t think it works on x86.

A more interesting idea is to put your OS installation on a DAX (direct access) filesystem.
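
To make “direct access” concrete: on a DAX-capable filesystem (ext4 or xfs on a pmem device, mounted with -o dax), an application can mmap a file and get CPU loads/stores straight to the persistent media, with no page cache copy in between. A minimal sketch, assuming Linux with MAP_SYNC support and a hypothetical file path on a DAX mount (stores still need a CPU cache flush, or eADR hardware, before they're truly durable):

    #define _GNU_SOURCE            /* for MAP_SYNC / MAP_SHARED_VALIDATE */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical path: a file on an ext4/xfs filesystem mounted with -o dax */
        int fd = open("/mnt/pmem/state.bin", O_RDWR | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        size_t len = 4096;
        if (ftruncate(fd, len) != 0) { perror("ftruncate"); return 1; }

        /* MAP_SYNC guarantees the file metadata backing this mapping is durable,
           so ordinary stores (plus a cache flush) persist data -- no msync/fsync. */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy((char *)p, "hello, persistent world");   /* a plain store into pmem */

        munmap(p, len);
        close(fd);
        return 0;
    }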


Seriously, yes. If there was ever a time to rethink OS design, surely this is it.

That being said, operating systems like Linux tend to capture most of the value from these kind of advances - often by dint of being able to simply 'get out of the way' if a sufficiently important user space process wants access to the device.

But one would suspect that things have changed sufficiently from the 1970s to warrant a ground-up rethink. Core counts, distributed systems (the Plan 9 folks already took a swing at this in the 90s), nearly ubiquitous graphics/GPGPU accelerators, persistent memory, nearly ubiquitous access to 64-bit address spaces (at least for desktop and most phones) - you'd think something would change about design. I don't work in the area so I don't know what that is...


> Seriously, yes. If there was ever a time to rethink OS design, surely this is it.

Why?

Traditional servers are persistent: they never turn off. 500+ days of uptime is typical. And today, with VMs which at worst... hibernate... it seems like "never turning off" might be the norm.


On the contrary, as a security professional I’d be thrilled if servers had a lifespan of hours instead of weeks or months. Reimaging VMs/containers/machines from scratch frequently gives so many advantages.

When OS, system, or library updates happen, you can easily launch replacement servers on the updated stack, put them in the rotation, and decommission the old ones. This is so much simpler than trying to run OS upgrades in-place across an entire fleet. The longer a machine has been running between reboots, the lower my belief in its odds of upgrading and restarting cleanly.

Further, this regularly tests your load balancing setup and pretty much fundamentally gives you capacity to scale up and down as load permits. Problems will be discovered early on, instead of during crunch time when you have to scale or when a few of your machines go offline during peak hours.

Security-wise, you don’t just get the benefit of fast, regular updates; you also get assurances that users haven’t left stale data like unencrypted database exports, PII dumps, etc. lying around. Go on a long-lived machine some day and check out users’ home directories. That shit is a gold mine if someone who wants to do harm gets on your systems.

Not to mention regular reimaging makes it harder for an attacker to establish a permanent foothold in your infra.

None of this has anything to do with fast persistent storage, but I sincerely hope the era of 500-day uptimes is waning.


You forgot, frequent rebuilds kill off any intrusion as the world reflashes -- unless they get into IoT or microcontroller packages.


I did mention that it makes it harder for an attacker to keep a foothold in your infrastructure, but I think I wasn't as clear as I wanted to be.

Yeah, it's bad that an attacker has been able to get to a critical system at all, but it's a phenomenal defense if any of their beacons or remote access tools last at most a few hours or days before being wiped. This makes an attacker's life much harder.


> None of this has anything to do with fast persistent storage, but I sincerely hope the era of 500-day uptimes is waning.

On the contrary. Persistent memory means that infinite uptime is the future. Which, as you note, is difficult. Resetting the OS every now and then to a known state is a good practice, although disruptive to a lot of workflows.

If anything, I consider your post to be an argument AGAINST persistent memory.


Persistent memory might enable those sorts of uptimes, but it doesn't inherently mandate it.


But traditional operating systems still assume RAM contents are volatile (because currently they are), most filesystems assume disks are glacially slow, etc.

A traditional spinning rust HDD has an effective latency of ~10 ms. The NVME version of the 3DXP has an effective latency of ~50 us, or two orders of magnitude better. Not sure how low the DIMM version will go, but maybe another order of magnitude?

If so, we're talking three orders of magnitude difference. That would radically affect the assumptions going into storage algorithms. Suddenly you can no longer spend millions of instructions trying to avoid I/O. Batching of I/O is also not needed to the same degree. Complex syncing of memory and disk is not needed. Etc etc.


> But traditional operating systems still assume RAM contents are volatile (because currently they are)

RAM is only volatile on startup. Certainly not when a VM hibernates and comes back.


> RAM is only volatile on startup.

No it isn't. Anything that needs to survive a power cycle needs to go to non-volatile storage. And this is assumed to be very, very slow.


I don't think "computers stay up a long time these days" is an argument against doing OS research on order-of-magnitude-faster, byte-addressable persistent storage.

We seem to be doing pretty well with a bunch of abstractions from the 1970s, as well as with the idea of just building giant trapdoors into our hardware whenever these abstractions fail (e.g. most databases, DPDK in the network space, etc). It's not a crisis. It just seems like a pretty good time to do some basic OS research (aside from all the usual headwinds for that, e.g. massive complexity of underlying hardware, difficulty finding meaningful workloads for a "toy" OS, etc).


Your ideas around uptime are still in the 70's. Systemd updates require reboots. Then there's Spectre and Meltdown BIOS updates, gotta reboot for those. Oh and the SSD and NIC firmware as well.

To think we formerly only had one devil in glibc. Now everything is constantly being updated and it's fine. We've moved on from the uptime as phallic measuring stick mantra. Patch, reboot, and stay secure.


Probably the optimal use of this technology is databases, since they rely most on random access to large persistent data.

Any operating system that's designed around this technology is probably going to look like a database.

Basically, boot to Postgres and all "files" are now SQL tables, stored in NVDIMM. Indexes are in DRAM, and critical nodes are in cache.

All data (system and user) is organized and opinionated: All photos are in a photo database, with tables for IPTC metadata. All music. All executable files. If you're browsing the web, it'll probably cache data in local SQL tables. etc..

I can envision using SQL stored procedures as actual apps, perhaps with an API to access graphics hardware, network, sound, etc..


The entire information world is either a database or a cache (or communication between them), layered on top of each other over and over. Every new storage technology typically ends up being yet another layer as either database or cache (or both).

This case is pretty unique in that it can actually serve to remove a layer: RAM (typically a cache) is not necessary if non-volatile storage is viable at the same speeds.

But in general new storage tech just adds another layer, which the software world reacts to by rushing in as if to fill a void, creating new software to take advantage of it which ends up being... another database or cache, often with similar tradeoffs to the layers of cache/database surrounding it. In the limit, I see more and more layers of cache/database until they merge into some kind of continuous data/cache field with a continuous tradeoff gradient between size and latency.


Hey, that sounds like a mainframe... I wonder what it'll take to get zOS running on commodity hardware.


Mainframes have always led the way in computer architecture.


And yet z systems have now moved to PCI express, an interconnect designed for desktop PCs


They can afford to, considering how much one costs :)


Persistent data structures? Well yes, but that’s just the tip of the iceberg. There are other scenarios, like powerful real-time analytics, that could benefit from 100TB of RAM immediately.

100TB systems at RAM speeds are theoretically possible without this new memory. For example, 64-bit systems could easily provide enough address space.

The problem is that, practically speaking, server systems quite often limit address bit capabilities. And other problems still remain, not the least of which is the crazy price of 100TB of DDR4, physical slots, etc. The price would be crazy even for most enterprise projects.

So yes, this new generation of memory will be disruptive, but also keep in mind that even though it’s faster than SSDs, that’s not nearly enough. I’m not positive, but IIRC it’s still 2 or 3 orders of magnitude slower than conventional memory.

Does that mean this new wave of persistent RAM isn’t useful and awesome? Not at all; I’ve already started using it.

But it does mean it’s still at the stage where you have to analyze your scenarios carefully, see if it’s a good fit for your architecture and environment, and benchmark your particular stack to verify assumptions and make sure it helps you the best way it can.


There will, almost inevitably, be someone who needs 101TB of memory. Then you get back to the same place where you need to scale out instead of up. If you asked cloud architects whether they'd rather have cheaper, lower-latency network or faster, more expensive storage, you'd probably get the former most of the time.

Spark already works nicely with 100+TB datasets, and those can sit in memory across a thousand spot instances. Technology like tidalscale's hyperkernel can also merge together multiple systems into a single addressable memory space at the OS level so that you can run non-distributed applications across multiple commodity machines (like a reverse VM).

If 3D XPoint can offer competitive price and speeds relative to traditional DRAM, then it will have a place in the market. Nobody has seen pricing or benchmarks for these yet. For Intel, however, this could increase their component share from CPU/chipset/network/storage to also include memory. That is pretty compelling, since it's a market they haven't monetized (not counting memory controllers) since the early days of Intel.


I would also think OSes that tend to treat their primary file systems as 'the distributed memory', like DragonFly BSD, would benefit significantly.

I am speculating, of course, but the whole Hammer 2 design of DF BSD emphasizes a cross-machine, 'database-like' file system, with built-in transparent state snapshots, state branches, etc. [1].

So with this new type of persistent storage, DF's Hammer2 could erase the difference between 'persistent state' and in-memory-only state.

That would eliminate the need for reconciliations, application-specific backups, and application-specific distributed architectures.

[1] http://apollo.backplane.com/DFlyMisc/hammer2.txt


> Another thought: As potentially paradigm changing technology like this becomes available will it ever make sense to redesign the OS?

Realistically, its implications are much bigger for applications that depend heavily on persistent storage, like databases. They make tons of assumptions about persisting to block storage, whereas 3DXP could enable them to function entirely "in memory", so all that block-storage-specific optimization they have is now working against them. I'm just generalizing here, though.


Zero serialization. Imagine installing a program and always having it "running". It may be swapped out, but literally everything in it is ready to go when you switch to its window.


We have that right now, do we not? You don't have to quit apps except on reboot.

Except that many apps are so buggy you have to restart them often in practice. NVM won't change that, sadly.


I'd still want a way to kill it.

This is a huge PITA with mobile devices - I have no clue what code is, or isn't, being executed at any given time. Even if I force-kill an app, it has still most likely left some background service running, that will still use data, trigger GPS updates, wake the phone up, etc. What I wanted since the very day I first got my smartphone is to have PC-like control over applications.

In a perfect world of total ubiquity of wireless electricity, not to mention infinite CPU speeds and free and unlimited bandwidth, having everything running all the time in some way might be ok. As it is today, we still need the ability to kill software (and have it stay down), up to and including rebooting everything, to deal with obscure bugs in applications, OS and drivers. Not to mention being able to have some semblance of understanding of the device's state.


Leaking memory would be dangerous.


While there are some analytics workloads that will benefit tremendously, the main use case will be improving server utilization.

Currently RAM is not a compressible resource like CPU. However, many applications don't have a fixed or easily predictable RAM footprint, and so you have to overprovision. Swap has been there to solve that, but with its performance impact it often can't be used for server applications.

These DIMMs will blur the boundary between memory and swap and make swap again viable.


I don't get your logic. CPUs have a finite number of instructions they can execute in a timeframe; that's not compressible. Memory, on the other hand, can be compressed, which works great for storage. Sure, it'll be slower, but compressing seldom-used memory pages, like macOS does, is indeed possible.


I think his point is that you can "run" a large number of applications at the same time on a CPU. It will execute everything, albeit slowly. This might not be acceptable due to performance concerns, but it's doable.

He's not talking about actual data compression in RAM. Because even with compression, with current OSes, if you try to fit more than 20GB of data, let's say becoming 10GB compressed, into 5GB of RAM, it's not possible. You have to swap and at that point your performance is completely gone.

The performance gap between an overloaded CPU and swapping is humongous. One is annoying or slightly troublesome, the second is a death knell.


> and make swap again viable

You shouldn't work in political marketing :-)


One interesting thing for databases is that as nonvolatile storage latency decreases, traditional btrees get more attractive relative to newer log-structured designs. Especially if the write endurance is increased as well over current SSDs.


Or to put it another way: there's not a lot of reason to have two layers of log-structured storage. Your SSD already needs its own log-structured flash translation layer, and if that's tuned properly for your database workload, then another layer of the same kind of thing may not help much.


There's so much amazing stuff I could do with this. Imagine persistent Redis? Huge huge pages? Booting from a DIMM?

The possibilities are endless.


Why can't we just put a battery onto DRAM that maintains state if the power goes out, and be done with it?


That is how storage worked on PDAs back in the day with volatile memory. Let the batteries completely die, or change them incorrectly, and you lost your data. Let's not go back to those days!


DRAM is not that dense; you can fit 128 GB of DRAM or 512 GB of XPoint on a DIMM. XPoint is also supposed to be cheaper than DRAM.


Because the power consumption to keep DRAM refreshed is fairly high so you'd need a pretty big battery, and because it would still be more expensive than 3DXP. It's just not practical for most use cases.


I figured that, but can we have some numbers here?


We do on RAID cards. It has limits.

-G


Battery-backed DRAM has been around for a decade or more.


You mean like suspend-to-RAM?


I guess in a theoretical NVM-only system you could pull the plug at any time, and instantly resume when the power is back on? If I'm reading right, though, the latency of 3DXP is somewhere in the 10-20us ballpark, still 100-1000x slower than DRAM.


Yes but it's also cheaper than DRAM.

You could resume after pulling the plug as long as things are consistent. If you commit data in the wrong order you could have trouble!
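
That ordering problem is exactly what makes programming against persistent memory tricky: stores can sit in CPU caches and become durable in a different order than the program issued them. A rough sketch of the usual pattern, assuming an x86 CPU with CLWB (compile with -mclwb) and a hypothetical record/commit-flag layout -- persist the payload first, fence, and only then persist the flag that marks it valid:

    #include <immintrin.h>   /* _mm_clwb, _mm_sfence */
    #include <stdint.h>
    #include <string.h>

    struct record {
        char     payload[56];
        uint64_t valid;              /* hypothetical commit flag */
    };

    /* Write back a byte range from CPU caches toward the persistence domain. */
    static void flush_range(const void *addr, size_t len)
    {
        const char *p = addr;
        for (size_t off = 0; off < len; off += 64)   /* 64-byte cache lines */
            _mm_clwb((void *)(p + off));
    }

    void commit(struct record *r /* lives in pmem */, const char *data)
    {
        /* 1. Write and persist the payload first. */
        strncpy(r->payload, data, sizeof r->payload);
        flush_range(r->payload, sizeof r->payload);
        _mm_sfence();                /* payload reaches pmem before step 2 */

        /* 2. Only now flip the flag that makes the record count on recovery. */
        r->valid = 1;
        flush_range(&r->valid, sizeof r->valid);
        _mm_sfence();
    }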


The CPU has internal state as well that won’t be persisted - at the very least the registers.

The trick would be to save them when you notice power loss.


No, not unless everything else in your system retains state - that includes all the registers in every single chipset/controller/processor.


Price. Remember that current DRAM is 2.5x the price of what it was two to three years ago. So the XP DIMM being 4x cheaper than DRAM now isn't that much of a difference if DRAM dropped back to its median level.


Is there any tech that's faster than DRAM and cheaper than SRAM? There's a need to fill that gap.


L2/L3 cache?


That's SRAM usually.


Website times out. Also, this story hasn't been reported by any other news source. Is there some other site I can see the story at?


It's not a news story... mostly analysis/predictions. (If you're talking about the generic announcement of XPoint DIMMs, that was in the news at the end of May: https://www.anandtech.com/show/12828/intel-launches-optane-d...)


Website should be up, I just rebooted AWS :)


Oh don't do that. We have other shit running on it.


I have coworkers who believe AWS is "Amazon WorkSpace" and complain "my AWS is slow, please reboot my AWS."


Unfortunately it’s still timing out :(


Should work now. Apache was getting freaked out.


Yup it works, thanks!


Found a copy freshly archived today @ archive.org:

http://web.archive.org/web/20180723220131/https://www.realwo...


Sadly this only archived the first of four pages.


removed


Thank you for being a gentleman (or gentlewoman)!


On a few occasions I've found myself in the presence of a senior engineer at a large defense company who would never stop talking about how persistent memory will change everything forever. Fair enough, but he'd go on about it in the weirdest ways. I think his impression is that the CPU registers would also be nonvolatile. I'm concerned that guy might be a few electrons short of a full orbital.


Saving and restoring CPU registers and flushing caches into non-volatile memory doesn't require much time or energy.


Specifically, you can normally do something like that between when you notice the power dropping and when it drains too much and you have to shut down.
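
Back-of-envelope, with assumed (not measured) figures: even a large last-level cache is only tens of MB, and writeback to an NVDIMM runs at GB/s, so a full cache flush fits inside the hold-up time a typical power supply gives you after it signals that AC is gone. A toy calculation:

    #include <stdio.h>

    int main(void)
    {
        /* Assumed figures for illustration only. */
        double dirty_bytes   = 32e6;   /* ~32 MB of dirty cache lines at worst */
        double writeback_bps = 8e9;    /* ~8 GB/s across an interleaved set of NVDIMMs */
        double holdup_ms     = 16.0;   /* rough PSU hold-up time after AC loss */

        double flush_ms = dirty_bytes / writeback_bps * 1000.0;
        printf("flush ~%.1f ms vs ~%.1f ms of hold-up\n", flush_ms, holdup_ms);
        return 0;
    }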


This is perfectly possible. I used to work on something that did just this but at the time used a small battery and an SSD to quickly dump the volatile state before power loss. This meant we had to limit the amount of volatile data that could be stored in ram (due to the transfer rate and up time on the battery). We were eagerly awaiting 3DXP DIMMs so that we could remove that limit. It really will have a big impact on critical systems where any data loss is not acceptable.


Shouldn't be hard; all they have to do is a light context switch to the idle thread on a power-loss interrupt, and make the idle thread externally re-enterable.


The initial release was really underwhelming, given the hype around this. So my personal (uninformed) expectation is just incremental improvement to the initial product.


Moving 3D XPoint memory from the peripheral IO bus to the memory bus is way more than an incremental improvement.


Says the company that claimed it would be 1000x faster than NAND flash, it isn't, and moving the location of the bus isn't going to change that.


> Says the company that claimed it would be 1000x faster than NAND flash, it isn't, and moving the location of the bus isn't going to change that.

Using the IO bus instead of the memory bus is exactly why existing Optane products haven't delivered latency that's 1000x better than NAND flash. NVMe transactions take at least 5-10µs even with DRAM as the SSD media rather than NAND flash or 3D XPoint. Moving to the memory bus is a prerequisite to 3D XPoint fulfilling those original performance claims.



