'Compact & durable’: Scientists encode, retrieve 10,000 gigabytes stored on DNA molecules

© National Human Genome Research Institute
Researchers at the University of Washington and Microsoft are developing one of the first complete storage systems to house digital data in DNA. The news comes as the digital universe is expected to hit 44 trillion gigabytes by 2020.

“Life has produced this fantastic molecule called DNA that efficiently stores all kinds of information about your genes and how a living system works. It’s very, very compact and very durable,” said co-author Luis Ceze, UW associate professor of computer science and engineering, in a press release.

We’re essentially repurposing it to store digital data, pictures, videos, documents, in a manageable way for hundreds or thousands of years.”

The team of bioengineers and computer science and electronic engineers were able to encode data from four image files into the nucleotide sequences of synthetic DNA snippets. This was achieved by converting the long strings of 1’s and 0’s in digital data into the four building blocks of the DNA sequence – adenine, guanine, cytosine and thymine. The synthesized DNA molecules were then dehydrated for long-term storage.

Miraculously, researchers then reversed the process by retrieving the correct sequence from a large pool of DNA and reconstructing the images without losing a single byte of information, according to the University of Washington.

To make it easier to find the images they had encoded, researchers put the equivalent of zip codes and street addresses into the DNA sequence to facilitate easier retrieval. Researchers also used Huffman coding for lossless data compression, according to Techworm.

Encoding a stream of binary data as a stream of
nucleotides. A Huffman code translates binary to ternary
digits, and a rotating encoding translates ternary digits to
nucleotides. © washington.edu

“How you go from ones and zeroes to As, Gs, Cs and Ts really matters because if you use a smart approach, you can make it very dense and you don’t get a lot of errors,” said co-author Georg Seelig, a UW associate professor of electrical engineering and computer science, said in the press release.

“If you do it wrong, you get a lot of mistakes.”

Researchers were up against a punishing deadline. The world is producing data faster than it can create new storage. The digital universe – all the data contained in computer files, historic archives, movies, photo collections, plus the exploding volume collected by businesses and devices worldwide – is expected to hit 44 trillion gigabytes by 2020, according to the UW. That’s a tenfold increase compared to 2013.

Ceze and his team think they can go much further, moving on to store video and large digital files. They claim it could be possible to “shrink the space needed to store digital data that today would fill a Walmart supercenter down to the size of a sugar cube.”

Researchers said DNA molecules can store information many millions of times more densely than existing technology for digital storage, such as flash drives and hard drives, as well as magnetic and optical media. Those systems also degrade after a few years, while DNA can preserve information for centuries.

The study, A DNA-based Archival Storage System, was presented at this year’s ACM International Conference on Architectural Support for Programming Languages and Operating Systems.