Hacker News | baruch's comments

Asynchronous IO with user space threads works wonders to get both the performance of async IO and the convenience of sequential programming.

My day job is working on a product that uses DPDK for a super high performance file system.


Your company already doesn't really care about you (for the vast majority of them at least), so there is a conflict of interest but you personally shouldn't take the side of the company.


There’s an element to that, for sure. But can it be generalised to a whole planet? I’m certain you have met people on your way who genuinely rooted for you. Companies are made out of people.


I do. To learn a code base it helps tremendously; also, to find what happened and why things were done, it is of great help. I don't need it every day but I routinely do git archeology.


NVMe drives fail at a fairly low rate so this is an optimization for a very small edge case and since they are also very fast it's not like you'll be doing a rebuild for 6+ hours like with HDDs.

It also doesn't change anything for distributed storage.


Recently had 8TB NVMe drive die after power outage. Such a mess.

Can't afford data recovery. It was a backup drive though, so I need to redo the backups.

Thinking about buying more, smaller drives, maybe on a couple of mini PCs connected to the network.


Do note that more drives mean higher chance to see a failure...
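The effect is easy to put rough numbers on. A minimal sketch, assuming drives fail independently and using a made-up 1% annual failure rate per drive (an illustrative value, not a figure from this thread):

```python
def prob_any_failure(n_drives: int, annual_failure_rate: float) -> float:
    """Probability that at least one of n independent drives fails within a year."""
    if n_drives < 0 or not 0.0 <= annual_failure_rate <= 1.0:
        raise ValueError("need n_drives >= 0 and a rate between 0 and 1")
    # Chance that all drives survive is (1 - p)^n; the complement is "any failure".
    return 1.0 - (1.0 - annual_failure_rate) ** n_drives


HYPOTHETICAL_AFR = 0.01  # assumption for illustration only

for n in (1, 2, 4, 8):
    print(f"{n} drive(s): {prob_any_failure(n, HYPOTHETICAL_AFR):.2%}")
```

The chance of seeing some failure grows with the drive count, although each individual failure then takes out a smaller share of the data.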


A model without power loss protection I presume?


Sabrent Rocket Q


So a consumer drive, on a 22x80mm card that barely has enough physical space for 8TB of NAND (+controller and DRAM) and doesn't come anywhere close to having enough space for the capacitors needed to provide enterprise-level full power loss protection.

The drive still shouldn't fail entirely from a power outage, and should at most suffer data loss, but at the end of the day it's designed to be cheap rather than reliable.


Lesson learned.

I needed it for an experiment where I had about 6TB of small files to process and wanted to have them on a single drive. It did the job and then I repurposed it as a backup / dump drive for stuff I didn't want to delete, but also didn't know where else to put.

The drive shows up in the system but with 0TB capacity, I recall once or twice it reported 8TB but I was unable to read anything.

I'll have a look one day; maybe it was something simple like a dead cap that I could replace (I have a microscope and a rework station).


Are you using the drive in an external USB enclosure? Those sometimes have power delivery that cannot keep up with the demands of high capacity or high performance drives.


No, it is mounted on the motherboard (Gigabyte with X570 chipset) and also has a thick heatsink.


Samsung and others (including WD IIRC) do internal journaling, so even though they don't have capacitors the drive shouldn't get bricked by a power outage.


SSDs read fast, write much slower for anything which is bigger than a few hundred megabytes.


Enterprise SSDs usually don't use SLC caching—especially not to the extent that consumer drives do—so their sequential write speed doesn't drop much for really large/sustained writes, and doesn't have a short unsustainable burst of accepting writes quickly into a cache.


In high-end enterprise storage, the drives do a form of caching (SLC to TLC, in the background, by the drive) and they also do compression and encryption. Look at the Flashcore FCM4 used in IBM Flashsystem. https://www.redbooks.ibm.com/redpapers/pdfs/redp5725.pdf (no affiliation, except that my workplace recently acquired an IBM SAN and I am satisfied with this storage unit; it's not like a Purestorage SAN but it's fast enough)


IBM's drives are exactly why I said "usually don't" rather than "never". SLC caching is still not normal for enterprise drives, whereas it is now universal for consumer SSDs.


I'm working mostly with enterprise drives and not consumer. These drives can write continuously at 1 to 4 GB/s depending on the specific type (mixed use vs read intensive vs very low writes).


> NVMe drives fail at a fairly low rate

But they still fail. Backups are great and all, but for hardware failure nothing beats redundancy (while RAID1, RAID5, etc. allow for faster reads - I don't know how often NVMe SSDs saturate their PCIe links though...).

Granted, you don't need hardware RAID for that (and HostRAID is a joke, lol): we still want redundancy, but today you'd do it with ZFS or similar so you aren't locked-in to some HW RAID vendor, or suffer the ironic consequences of having non-redundant HW RAID controllers.


I have an NVME device that very rarely literally (but figuratively) falls off the PCI port and disappears [0]. It is one of several Physical Volumes (PV) in a Logical Volume Management (LVM) Volume Group (VG) that backs several RAID-1 mirror Logical Volumes (LV).

When it drops off, file-system writes to the LVs are blocked and reads can also fail, but the system survives sufficiently to do a controlled power off/on that recovers it.

In some cases the LVs pair up a spinning disk with the NVME but due to how I've configured the LV the spinner is read-mostly and the NVME is write-mostly (RAID member syncing is delayed and in background). There isn't too much noticeable latency except for things like `git log -Sneedle` - and worth it for the resilience.

[0] first time it happened it was spiders that had taken up residence around the M2 header and CPU (nice and warm!) and causing dust trails allowing current leakage between contacts (yes, I did do microscopic examination because I could not identify any other cause) that a simple blast with the air-compressor resolved. Later incidents turn out to be physical stress due to extreme thermal expansion and contraction as best as I can tell - ambient air temperature can fluctuate from 14C to 40C and back over 18 hours. Re-seating the M2 adapter fixes it for a few months before it starts again! All NVME SMART self-tests pass; the failure is of the link not the storage - effectively being removed from the PCIe port. Firmware was at one stage suspected, although it had been fine for a couple of years on the same version, but updates haven't changed it in any way. ASPM is disabled.


You would now typically use a distributed system for redundancy. Then you don't necessarily need RAID.


How do I fit a distributed system into my laptop?


They do fail and you should have redundancy and backup but there isn't a real point to do the optimization that the article describes.


And here we are working on systems with dual 400Gbps and sending data out of storage clusters at 5 TiB/s.


I believe pricing (in quantity, these usually do not sell by single pieces) is around 6 cents a Gig so about $7680. I could very well be wrong though, it's been a while since I heard pricing of DC SSDs.
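For reference, $7680 at 6 cents per gigabyte works out to 128,000 GB, i.e. a 128 TB drive in decimal units. That capacity is inferred from the arithmetic rather than stated in the comment. A small sketch of the calculation, done in integer cents to avoid floating-point rounding (the function name is just for illustration):

```python
def drive_price_cents(capacity_gb: int, cents_per_gb: int) -> int:
    """Total price in cents for a drive of the given capacity."""
    return capacity_gb * cents_per_gb


CAPACITY_GB = 128_000  # assumption: 128 TB, decimal units
CENTS_PER_GB = 6       # the "6 cents a Gig" quoted above

total_cents = drive_price_cents(CAPACITY_GB, CENTS_PER_GB)
print(f"${total_cents // 100}")  # prints "$7680"
```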


To be honest, that doesn't seem unreasonable. Obviously it's steep for a homelab project but I could see small-medium sized businesses buying that for on-prem needs.


It's much better than not unreasonable. If that price is accurate it's competitive with consumer drives. The big clouds would charge you 6 cents per gigabyte per quarter. Add on RAID and you still break even after four months.
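The break-even claim can be checked with a few lines. This is only a sketch: the 4/3 raw-to-usable overhead (for example RAID 5 with three data drives plus one parity drive) is my assumption about what produces the four-month figure, and it ignores servers, power and operations entirely.

```python
from fractions import Fraction


def break_even_months(buy_cents_per_raw_gb, cloud_cents_per_gb_per_quarter,
                      raw_per_usable=Fraction(1)):
    """Months of cloud storage fees that equal the one-off purchase cost.

    raw_per_usable is the redundancy overhead: raw capacity bought per GB of
    usable capacity (1 = no redundancy, 2 = mirroring).
    """
    buy_per_usable_gb = Fraction(buy_cents_per_raw_gb) * raw_per_usable
    cloud_per_gb_per_month = Fraction(cloud_cents_per_gb_per_quarter, 3)
    return buy_per_usable_gb / cloud_per_gb_per_month


print(break_even_months(6, 6))                  # no redundancy
print(break_even_months(6, 6, Fraction(4, 3)))  # assumed RAID 5, 3 data + 1 parity
print(break_even_months(6, 6, Fraction(2)))     # mirroring
```

Under these assumptions the three cases come out to 3, 4 and 6 months respectively.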


The 128 also looks to be in the proof-of-concept stage. The rest of the articles seem to be for consumer, being 8/16TB SSD drives and 30TB spinning, with 64 being the data center size for this set of announcements.


Effect of element alignment in a memory pool by L1 cache collision.


Just the other day I saw a local company get investment to be cybersecurity for LLMs. It's here already.


Calculating the divisor values is also expensive; this works when you do that work once and then do the efficient divides multiple times.
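To illustrate the trade-off, here is a minimal sketch of one known scheme for 32-bit unsigned values: precompute a 64-bit "magic" multiplier with a real division, then replace each later division by a multiply and a shift. The class name `FastDivisor` is made up for this example, and this is not necessarily the method the article uses. In Python it shows the arithmetic only; any speed benefit applies to languages with fixed-width integers.

```python
class FastDivisor:
    """Divide 32-bit unsigned integers by a fixed divisor without dividing."""

    def __init__(self, d: int):
        if not 1 <= d < 2**32:
            raise ValueError("divisor must be in [1, 2**32)")
        self.d = d
        # The expensive, one-time step: an actual division.
        # magic == ceil(2**64 / d)
        self.magic = (2**64 - 1) // d + 1

    def divide(self, n: int) -> int:
        if not 0 <= n < 2**32:
            raise ValueError("dividend must be in [0, 2**32)")
        # The cheap, repeated step: one multiply and one shift.
        return (self.magic * n) >> 64


by7 = FastDivisor(7)  # pay the setup cost once
print([by7.divide(n) for n in (0, 6, 7, 100, 2**32 - 1)])
```

If the divisor changes on every call, the setup division dominates and nothing is gained, which is the point made above.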

