What’s driving the growing interest in DNA data storage as an alternative medium for long-term use?

If there’s one thing the world could always use more of, it’s data storage. The rapidly growing Internet of Things produces a never-ending stream of data on a global scale. This data can be harnessed for valuable insights, but only if it can be stored. The challenge: what to do when data generation begins to outstrip global storage capacity?

Although traditional storage media continue their upward march in capacity and speed, new technologies will eventually be necessary to keep pace with data storage needs. Thankfully, necessity is the mother of invention, and storage researchers are forever on the lookout for innovation.

One promising line of research is a game-changing attempt to inscribe data into the fabric of life itself. Welcome to the mind-bending world of DNA data storage. The same type of material which serves as your unique biological footprint can also store those photos from your last vacation. 

DNA data storage has its limitations: don’t expect to see it replacing traditional flash and HDD anytime soon. But where density and longevity are priorities, as in archival storage, the technology looks promising.

The Impending Storage Squeeze

Why seek new storage technologies, when improvements are still being made in flash, disk, and tape storage? In short, because current technologies will eventually prove inadequate for archival storage needs.

You might have heard of “Moore’s Law”: the observation that the number of transistors one can pack into a dense integrated circuit doubles roughly every two years. While useful in setting benchmarks for growth, the law is on borrowed time: it’s always been clear from a physics standpoint that funny things happen with charge and magnetism once you get down to scales at which quantum effects kick in. There are density limits inherent to flash storage.

Same story with HDD. Kryder’s Law says that the areal density of platters will double every 13 months. In 2009, Mark Kryder himself predicted, based on then-dominant growth trends, that in 2020 we would have 40TB HDDs containing only two platters and selling for $40. But the “law” was a trend, and trends end: 2020 only saw nine-disk 20TB HDDs.

What about magnetic tape? The good news is that the storage density of tape is growing at a steady clip. But data retention is a pressing issue. Magnetic tape can stay intact for 30 years, but only under optimal storage conditions. And maintaining such conditions doesn’t come cheap.

In addition to these limits, by 2040 global memory demand will have far exceeded the supply of chip-grade silicon.

Meanwhile, the global datasphere is predicted to grow at a CAGR of 21.2% through at least 2026. More than 90% of this data is unstructured.
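To get a feel for what a 21.2% CAGR implies, here is a minimal sketch of the compound-growth arithmetic. It assumes the rate stays constant from year to year, which a forecast does not guarantee, and the function names are illustrative only.

```python
import math


def growth_factor(cagr: float, years: int) -> float:
    """Total multiplication of data volume after `years` at a constant CAGR."""
    return (1.0 + cagr) ** years


def doubling_time_years(cagr: float) -> float:
    """Years needed for the volume to double at a constant CAGR."""
    return math.log(2.0) / math.log(1.0 + cagr)


CAGR = 0.212  # the 21.2% figure quoted above

if __name__ == "__main__":
    print(f"Growth over 5 years: x{growth_factor(CAGR, 5):.2f}")
    print(f"Doubling time: {doubling_time_years(CAGR):.1f} years")
```

Under that assumption, the datasphere grows roughly 2.6-fold in five years and doubles about every 3.6 years.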

According to a recent IDC forecast, unstructured data is set to soar. Bring on the Exabytes!

DNA Data Storage: The Basics

The moral is clear. Despite capacity growth in HDDs, SSDs, and magnetic tape, there’s a need for new storage technology. This doesn’t mean traditional media are going away anytime soon. It does mean, however, that innovation now can pay huge dividends in terms of efficiency later.

Thankfully, there’s a storage medium which has been around for eons: DNA. 

The basic idea is fairly simple. When data is stored in a silicon chip or on a magnetic disk, physical properties such as charge or magnetic direction represent 0s and 1s. These numbers, in turn, represent the data being encoded. 

DNA data storage forgoes the 0s and 1s for combinations of four nucleotide bases: adenine (A), cytosine (C), guanine (G), and thymine (T). Encoding data with these four letters rather than the two numbers 0 and 1 has a crucial benefit: far denser storage. 

Think about it for a moment. Suppose you encode data in a string just four symbols long. There are 16 different combinations of 0s and 1s which can fill a four-symbol string, but 256 different combinations of the four letters “A”, “C”, “G”, and “T”. In other words, each base carries two bits of information rather than one. The 3D structure of DNA molecules also helps with density.
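The idea can be sketched in a few lines of Python. This is a toy illustration only: the mapping of bit pairs to bases (00 to A, 01 to C, 10 to G, 11 to T) is an assumption chosen for simplicity, and real encoding schemes add addressing, error correction, and constraints on which base sequences are allowed.

```python
BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def encode(data: bytes) -> str:
    """Map each byte to four bases, two bits per base, high bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(strand: str) -> bytes:
    """Inverse of encode(): every four bases become one byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
    # Four symbols: 2**4 binary strings versus 4**4 base strings.
    print(2 ** 4, 4 ** 4)  # 16 256
```

Two bytes of text become eight bases, half the sixteen symbols that the same data needs in binary.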

How Dense Exactly?

Researchers at the University of Seattle claimed in a 2021 paper that DNA data storage was potentially six orders of magnitude denser than the densest storage then available. According to Los Alamos researcher Bradley Settlemyer, you could store all of YouTube in a space the size of your refrigerator, instead of acres of data centers.

Files in vials: in 2019, DNA storage startup Catalog Technologies stored all 16GB of English Wikipedia in a single vial.

There are other benefits as well. Denser storage means far smaller data centers, and thus a vastly reduced carbon footprint. And longevity? If encapsulated in an inorganic case, DNA is stable for 20-90 years at room temperature, and a whopping 2000 years at 9.8°C/49°F, a vast improvement over tape. Strands of DNA also aren’t linked to any one specific decoding technology, which allows for flexibility.

Add to all this the fact that we’re unlikely to run out of DNA any time soon (if we do, we have bigger problems) and DNA storage begins to look very appealing indeed. 

Catalog’s Breakthrough

DNA data storage is not without its drawbacks, though. Speed is perhaps the first point of order. Read-write latencies in silicon are on the scale of microseconds. Tape is slower, with read-write throughput of a few hundred MBps at best. Writing the same amount of data to DNA can take several hours.

Accuracy is another issue. As every mutant superhero knows, DNA is prone to error, which means it needs an error correction mechanism to avoid data degradation. There are ways around this, but even once DNA is rendered more stable, it has a long way to go before it’s cost-effective.
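One simple way to picture error correction is a majority vote across several noisy reads of the same strand. The sketch below is a deliberately simplified assumption, not how any particular vendor does it: it only handles substitution errors (reads of equal length), whereas practical schemes also have to cope with inserted and deleted bases and usually layer stronger error-correcting codes on top.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Per-position majority vote across equal-length reads of one strand."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("this sketch only handles equal-length reads")
    # zip(*reads) yields one column of bases per strand position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


if __name__ == "__main__":
    noisy_reads = ["ACGTAC", "ACGTTC", "AAGTAC"]  # two reads carry one error each
    print(consensus(noisy_reads))  # ACGTAC
```

As long as most reads agree at each position, isolated errors are voted out.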

Where there’s a will, there’s a way. Improving search techniques helps the cost-effectiveness and efficiency of the read process. For instance, in 2022, Catalog Technologies had a breakthrough. After encoding more than 17,000 words of Shakespeare’s “Hamlet” on synthetic DNA, the firm used a parallel search computation to find all instances of certain keywords.

“By computing chemically we were able to reduce the amount of data to be read by a sequencer to just the targeted search term,” explains Hyunjun Park, Catalog’s founder and CEO. “This netted a two orders of magnitude speed up in ‘reading’ courtesy of avoiding having to read 99 percent of the DNA encoded data.”
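The arithmetic behind that figure is easy to check. The sketch below assumes, as a simplification, that read time is proportional to the amount of DNA that has to be sequenced; the function name is illustrative only.

```python
def read_speedup(fraction_skipped: float) -> float:
    """Speedup from sequencing only the part of the data that is not skipped."""
    if not 0.0 <= fraction_skipped < 1.0:
        raise ValueError("fraction_skipped must be in [0, 1)")
    return 1.0 / (1.0 - fraction_skipped)


if __name__ == "__main__":
    # Skipping 99 percent of the data leaves 1 percent to read.
    print(f"{read_speedup(0.99):.0f}x")
```

Skipping 99 percent of the data leaves one hundredth to read, which is where the two orders of magnitude come from.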

Scaling DNA Data Storage Solutions

Crucially, this method scales: the number of steps required in the search would be about the same if the dataset were hundreds or even thousands of times larger. Catalog plans to demonstrate this search technique on datasets of 100 million words. According to Park, this will allow the firm to begin “addressing more sophisticated applications from signal processing to machine learning over massive datasets.”

“CATALOG’s unique encoding scheme – that utilizes a hierarchical mapping of information to a huge combinatorial space of DNA blocks – is perfectly suited for computation functions on data stored in DNA.”

Catalog Founder & CEO Hyunjun Park

With advances like this, Park believes the future for DNA storage is bright. “The industry is making great advances in using DNA for storage. Using data stored in DNA, we have the ability to find insights into data not previously attainable using traditional technologies.”

DNA Data Storage: Past and Future

Despite rapid progress, DNA storage is still in its infancy. But the technology is growing quickly, and the milestones keep piling up. A brief timeline:

  • 1988. Artist Joe Davis collaborates with Harvard to encode a 35-bit work of art into E. coli DNA.
  • 2011. Harvard encodes a 659KB text file. 22 writing errors were found.
  • 2013. The European Bioinformatics Institute accurately encodes 739KB worth of files, including image, audio, and PDF files.
  • 2016. Microsoft and the University of Washington store 200MB worth of files. 
  • 2019. Catalog Technologies stores all of English Wikipedia in a vial of DNA.

Another major innovator in the budding industry is Microsoft, which has done as much as anyone to address the drawbacks of DNA storage. In a 2021 research paper, it introduced a nanoscale DNA storage writer which can write 2.5 million sequences within a square centimeter, three orders of magnitude denser than previous attempts.

Microsoft has also made headway on write speeds, attaining several megabytes per second. Georgia Tech Research Institute has pursued the same goal, introducing a microchip which allows data to be written at 20GB/day, or roughly 0.23MB/s. While it’s not yet at the speed of tape, it appears that DNA storage has a real shot at feasibility.
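To compare a per-day figure with the per-second rates quoted for tape, a quick unit conversion helps. This sketch assumes decimal units (1GB = 1000MB) and a constant write rate; the 16GB example size is the Wikipedia figure mentioned earlier in this article.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gb_per_day_to_mb_per_s(gb_per_day: float) -> float:
    """Convert a daily write volume to MB/s, using 1 GB = 1000 MB."""
    return gb_per_day * 1000.0 / SECONDS_PER_DAY


def hours_to_write(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to write `size_gb` at a constant rate in MB/s."""
    return size_gb * 1000.0 / rate_mb_per_s / 3600.0


if __name__ == "__main__":
    rate = gb_per_day_to_mb_per_s(20)
    print(f"20 GB/day = {rate:.2f} MB/s")
    print(f"16 GB takes {hours_to_write(16, rate):.1f} hours at that rate")
```

At 20GB/day, a 16GB archive would take a little over 19 hours to write, against a minute or two at tape’s few hundred MBps.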

To make a long story short, DNA data storage has a ways to go before it can compete with tape’s cost efficiency and read-write speeds, but it’s improving by leaps and bounds. And if it catches on enough to fuel demand for further innovation, we may see the same exponential growth that HDDs and SSDs underwent in the past. 

New Directions

While DNA data storage has a ways to go before it disrupts the industry, hardware innovations make such developments feel a lot closer. Twist Bioscience has been experimenting with ways to mass produce “oligos,” which are short synthetic strands of DNA or RNA.

Twist Bioscience’s silicon-based DNA Synthesis platform can produce over a million oligos in a single run.

Other researchers are working on “DNA movable types”. This is a write process which uses pre-produced fragments of DNA which are then assembled like printing blocks. The goal is to cut down on the cost of writing. Once the fragments are pre-produced, the write process is a lot easier.

It’s early days, and as such it’s still unclear how DNA will actually be stored and read. One company, Biomemory, is now selling DNA cards. About the size of a credit card, these can safely store data for up to 150 years. Of course, the cards currently only store 1KB of data each, and it’s 1000 to get on the waiting list. But the market has to start somewhere!

Another way of storing DNA was developed by researchers at Eindhoven University of Technology in the Netherlands. They’ve developed a self-sealing capsule in which DNA can be stored and read. One copy is anchored to the capsule, and duplicated by the millions. This duplication process, PCR, is the same one used in COVID test kits. In the lab, the researchers have managed to read 25 files at once, with little error.

New Alliances

Developing technologies benefit greatly from a shared ecosystem in which to compare notes. 

A few years ago, fifteen companies came together to do just that. 2020 saw the formation of the DNA Data Storage Alliance. Current board members include Catalog, synthetic DNA designer Twist Bioscience, Microsoft, and storage companies such as Western Digital and Quantum. Other members include Seagate, Dell, IBM, and Lenovo.

The DNA Data Storage Alliance brings leaders in the budding industry into a shared ecosystem

The Alliance is joined by a host of labs and biotech companies. It will focus on creating an interoperable storage ecosystem, recommending standards/specifications, and educating the public.

Demand for innovation is leading to new partnerships. Seagate, primarily known for its HDD products, has partnered with Catalog to develop a technology which can deposit liquid media containing DNA onto a substrate. Catalog tested the process on a large table-sized substrate, and Seagate is helping it miniaturize the technology.

Looking Forward

There’s no sure-fire way to read the future. Commentators use the term “disruption” with new technologies for a reason: it’s certain that new ways of storing data will affect the organization of data centers and the specifics of how hot and cold storage are structured into tiers.

The precise contours of these changes are difficult to predict, but one good guess would be that a new technology such as DNA has the potential to dominate a new, colder layer of long-term storage. 

One thing is for sure: human beings are clever, and where there’s a need, we have a knack for eventually finding the technology to meet it. And there’s a very clear need for mass archival storage, in order to enable the preservation and utilization of the massive amount of data expected in the coming decades. If innovators play their cards right, DNA data storage may be well-placed to accomplish something extraordinary in the years ahead.

For expert support with your data center storage needs, find out how Horizon Technology can help.