Solid state drives are fast, high capacity, and increasingly versatile. True, HDDs remain the backbone of hyperscale data centers, but SSDs are gaining ground every year. However, SSDs aren’t without their drawbacks. Endurance is a perennial issue, especially as density increases. Here, we take a close look at attempts to improve SSD endurance and make flash last longer.
Oxide Degeneration and SSD Endurance
Flash memory works by storing charge in cells, with the presence or absence of charge representing a bit. In NAND flash, an erased (uncharged) cell conventionally reads as “1” and a programmed (charged) cell as “0”.
How long that charge lasts depends on the type of memory. RAM is volatile: its contents are lost once power is off. Flash, which descends from EEPROM (a rewritable form of ROM), is non-volatile, retaining data when the device powers down.
Maintaining charge in the absence of power requires a more complex cell structure. Enter the MOSFET (metal-oxide-semiconductor field-effect transistor). In an ordinary MOSFET, a control gate switches the transistor on and off. A floating-gate MOSFET (FGMOS) adds an extra, electrically isolated gate within each cell which traps charge, enabling data retention.
There’s a catch, though. The “O” in “MOSFET” is for “Oxide”, which separates the floating gate from the rest of the cell. Oxide degrades over time, as high voltage gradually breaks down the atomic bonds. The result: the stored charge leaks out, leading to data loss. That’s why SSDs burn out.
MOSFETs are everywhere. Since their invention, it’s estimated that 13 billion trillion have been manufactured, 2.3 billion trillion of the floating-gate variety. This makes the MOSFET by far the most frequently manufactured device in human history.
The Impact of SSD Density
Here’s the thing: one of the most effective ways of increasing flash density, adding bits to individual cells, adversely affects SSD endurance.
A single-level cell (SLC) can be at 0 or 1. Adding voltage thresholds allows one to store more bits of data within each cell. MLC, TLC, and QLC allow for 2, 3, and 4 bits per cell respectively.
Such cells allow for denser storage while reducing cost per terabyte, as you still only need one FGMOS per cell. However, the more voltage thresholds there are, the more endurance and write speeds suffer.
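To see why each extra bit is harder on the cell than the last, it helps to count the voltage states involved. The sketch below assumes the usual relationship that storing n bits means telling apart 2^n charge states, which takes 2^n - 1 read thresholds. The function names are made up for illustration.

```python
def voltage_states(bits_per_cell):
    """Number of distinct charge states a cell must hold to store n bits."""
    return 2 ** bits_per_cell


def read_thresholds(bits_per_cell):
    """Number of reference voltages needed to tell those states apart."""
    return voltage_states(bits_per_cell) - 1


# Cell types named in the article, with their bits per cell.
for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)]:
    print(f"{name}: {bits} bit(s), {voltage_states(bits)} states, "
          f"{read_thresholds(bits)} thresholds")
```

Every added bit doubles the number of states squeezed into the same voltage window, so the margin between neighboring states shrinks quickly.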
Flash comes in NOR and NAND varieties, named after the logic gates their cell arrangements resemble. NOR has faster read speeds and is more precise, allowing random access to individual bytes or words, while NAND reads and writes in pages and erases in whole blocks. NOR is generally regarded as more durable than NAND, but is much more costly. Because of this, NOR is usually reserved for storing and executing code, while NAND is used for bulk storage.
Regardless of how many voltage levels you pack in, it’s still typically on a 2D chip (2D NAND). 3D NAND, first developed by Toshiba in 2007, allows cells to be stacked vertically, perpendicular to the plane of the chip.
What’s so exciting about 3D NAND is that it breaks the pattern of storage getting less durable as it gets denser. 3D NAND generally has better endurance than 2D NAND storing the same number of bits per cell. Also, its layout means less cell-to-cell interference, which allows for simpler storage algorithms.
That means greater speed, all while using 50% less power. However, it carries a lot of costs when one considers the surrounding system, such as the expensive controllers that make 3D NAND usable.
Conclusion #1: Increasing SSD endurance is relatively easy. The challenge is increasing endurance while also minimizing costs and maintaining/improving speed and density.
Measuring SSD Endurance
There are several ways to measure the endurance of a flash drive:
- Terabytes written (TBW): How many terabytes can be written to the drive over the course of its lifetime.
- Drive writes per day (DWPD): How many times the drive’s full capacity can be written per day within a set warranty period (usually 3-5 years).
- Program/Erase (P/E) cycles: How many times a drive, block, or cell can be written to and then erased.
DWPD relates to drive capacity, as demonstrated by the equation:
TBW = Capacity (in TB) × DWPD × 365 × Warranty Period (in years)
Two drives with the same DWPD and warranty period will have different TBW if the drives have different capacities.
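As a quick worked example of the equation above, here is a minimal sketch. The function name and the two sample drives are made up for illustration and don't describe any particular product.

```python
def terabytes_written(capacity_tb, dwpd, warranty_years):
    """TBW = capacity (TB) x DWPD x 365 x warranty period (years)."""
    return capacity_tb * dwpd * 365 * warranty_years


# Two hypothetical drives: same DWPD and warranty, different capacity.
small = terabytes_written(capacity_tb=1, dwpd=1, warranty_years=5)
large = terabytes_written(capacity_tb=4, dwpd=1, warranty_years=5)

print(small)  # 1825
print(large)  # 7300
```

The 4 TB drive ends up with four times the TBW rating of the 1 TB drive, even though both are rated at one drive write per day.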
The number of P/E cycles is tied to the rate of wear of the insulating layers within the flash cells.
Error Correction and Wear Leveling
Interestingly, the number of P/E cycles the flash can undergo doesn’t by itself determine TBW. Firmware embedded within the flash controller also makes a difference to drive endurance, deciding where data is placed within the drive in order to reduce wear.
The first such technique is error correction. Degraded cells can leak charge, and there’s also an error inherent to the writing process. Error correction code (ECC) tries to detect and fix such leaks and errors. Algorithms move data to new blocks before the Bit Error Rate (BER) on the original block gets too high for the ECC to correct.
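To make the idea of detecting and fixing a flipped bit concrete, here is a toy sketch using the classic Hamming(7,4) code, which protects 4 data bits with 3 parity bits and can correct any single-bit error. This is an illustration only: real SSD controllers rely on much stronger codes (BCH and LDPC are common choices), and the function names here are made up.

```python
def hamming_encode(data):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7).

    Parity bits sit at positions 1, 2 and 4; data at 3, 5, 6 and 7.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4   # 1-based position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]]


stored = hamming_encode([1, 0, 1, 1])
stored[4] ^= 1                        # simulate one cell leaking charge
print(hamming_decode(stored))         # [1, 0, 1, 1]
```

A code like this fails once two bits in the same codeword go bad, which is the toy version of why controllers move data before the bit error rate on a block climbs too high.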
Another trick is wear-leveling. Instead of just erasing/writing the same block repeatedly, data is written to different parts of the drive, spreading the wear across multiple blocks. In fact, “overwriting” a file usually involves putting data in a fresh location, and removing access to the data stored at the old location. The junk data is then erased for real during the drive’s garbage collection procedure.
Error correction, wear leveling, and efficient garbage collection involve strategically shifting data to different blocks. The more spare blocks a drive has, the easier this is. Setting aside spare blocks for this purpose is overprovisioning.
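The interplay between wear leveling, remapping on overwrite, and garbage collection can be sketched in a few lines. This is a deliberately simplified toy, assuming one logical page per physical block and a hypothetical `ToyFlash` class. Real flash translation layers are far more elaborate.

```python
class ToyFlash:
    """Toy flash translation layer: one logical page per physical block."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks  # wear per physical block
        self.data = [None] * num_blocks       # contents of each block
        self.free = set(range(num_blocks))    # erased, ready to program
        self.stale = set()                    # invalidated, awaiting erase
        self.mapping = {}                     # logical address -> block

    def write(self, logical, payload):
        if not self.free:
            self.garbage_collect()
        if not self.free:
            raise RuntimeError("drive full: no spare blocks left")
        # Wear leveling: always program the least-worn free block.
        target = min(self.free, key=lambda b: self.erase_counts[b])
        self.free.remove(target)
        self.data[target] = payload
        # "Overwriting" only remaps; the old copy becomes junk data.
        old = self.mapping.get(logical)
        if old is not None:
            self.stale.add(old)
        self.mapping[logical] = target

    def garbage_collect(self):
        # Erase junk blocks for real and return them to the free pool.
        for block in self.stale:
            self.data[block] = None
            self.erase_counts[block] += 1
            self.free.add(block)
        self.stale.clear()

    def read(self, logical):
        return self.data[self.mapping[logical]]


flash = ToyFlash(num_blocks=4)
for value in range(100):
    flash.write(0, value)       # rewrite the same logical address

print(flash.read(0))
print(flash.erase_counts)       # wear is spread across all four blocks
```

Even though the same logical address is rewritten 100 times, no single block absorbs all the erases. The three blocks not holding live data act as spare capacity here, which is the role overprovisioning plays in a real drive.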
Conclusion #2: When it comes to endurance, flash hardware isn’t everything. Innovative controllers and protocols can utilize the same flash hardware in more efficient ways.
NVMe has additional features which can increase endurance: streams, which sort data according to how often it’s rewritten, improving garbage collection efficiency; sets, which physically and logically isolate data from different workloads; and endurance groups, which exist for the purpose of wear leveling.
What other advantages might NVMe bring to storage? Hyperconvergence in a flash-first world offers all sorts of possibilities. There’s even an ongoing initiative to design NVMe-native HDDs.
Floadia’s Gambit
Improvements in flash protocol may help, but can only take you so far when you’re seeking endurance gains without sacrificing density, speed, or your hard earned money.
One development makes for an interesting case study. Japanese flash memory developer Floadia has developed a flash memory with 7 bits per cell (bpc), far more than the 4 bpc currently available.
Usually, increasing bpc only allows greater density at the cost of endurance and speed. But Floadia’s drive endures: it can store bits for 10 years at 150°C, 20 years at 125°C, and presumably much longer at room temperature.
To accomplish this, Floadia ditched the ubiquitous floating-gate MOSFET in favor of SONOS, which stands for Silicon-Oxide-Nitride-Oxide-Silicon, the layers of material within the cells. The charge-trapping layers (the ONO film) were optimized to increase retention time when storing 7 bits.
Image from Floadia
Increasing density while still boasting high endurance is intriguing. But SONOS is unlikely to feature in flash drives anytime soon. While Floadia hasn’t released its drive’s stats yet, it is unlikely to be fast, as sensitivity to noise is a problem when digitizing multiple voltage levels.
This isn’t a deal-breaker in itself, as MLC, TLC, etc. are slower than SLC for the same reason. However, those technologies, unlike SONOS, made memory cheaper. According to Jim Handy, SONOS-based technology is unlikely to be cheap, since storing voltages precisely requires a big charge trap, and hence a physically bigger bit. Bigger bits mean fewer bits per wafer, driving up costs.
Floadia pairs density with endurance, but the technology is not yet cost effective. 3D NAND is even more promising, combining density, endurance, speed, and low power consumption. But while cost per bit is low in theory, cost per wafer is still an issue, as it is for SONOS.
Conclusion #3: Cost matters. It’s often the Achilles’ heel of otherwise workable technologies.
The Outlook for SSD Endurance
Endurance is important, but so are density, speed, and cost. Clever workarounds can increase endurance, with some, such as 3D NAND, also maintaining density and speed. But cost is the real sticking point.
Of course, while costs matter, they also change over time. Technology that is unworkable at today’s prices might eventually become ubiquitous if prices fall. At the same time, the supply chain crisis has reminded everyone of the crunch which ensues when events drive up the prices of components.
Another takeaway is that use cases matter. How essential is endurance for your intended use or sustainability goals? Density? Speed? How do these factors affect cost? There are a lot of flash-based technologies out there, but companies will ultimately look to the back of their own envelope for the best solution.