Columbia University’s Yaniv Erlich and New York Genome Center’s Dina Zielinski have found a way to increase information storage in DNA by 60 percent, adapting an algorithm commonly used to stream video to cell phones.
DNA has the potential to provide large-capacity information storage. However, current methods have only been able to use a fraction of the theoretical maximum. Erlich and Zielinski present a method, DNA Fountain, which approaches the theoretical maximum for information stored per nucleotide. They demonstrated efficient encoding of information—including a full computer operating system—into DNA that could be retrieved at scale after multiple rounds of polymerase chain reaction.
DNA is an attractive medium to store digital information. Here we report a storage strategy, called DNA Fountain, that is highly robust and approaches the information capacity per nucleotide. Using our approach, we stored a full computer operating system, movie, and other files with a total of 2.14 × 10⁶ bytes in DNA oligonucleotides and perfectly retrieved the information from a sequencing coverage equivalent to a single tile of Illumina sequencing. We also tested a process that can allow 2.18 × 10¹⁵ retrievals using the original DNA sample and were able to perfectly decode the data. Finally, we explored the limit of our architecture in terms of bytes per molecule and obtained a perfect retrieval from a density of 215 petabytes per gram of DNA, orders of magnitude higher than previous reports.
A full computer operating system, old films, and even computer viruses can now be stored inside synthesized DNA molecules. Researchers are trying to streamline and perfect the process using algorithms.
Algorithm creates error-free DNA data storage and retrieval
Using fountain codes, a type of erasure-correcting algorithm, the researchers demonstrated how to pack 1.6 bits into each nucleotide. DNA is built from four nucleotide bases, so each nucleotide could in theory hold two bits, but practical constraints lower the ceiling to about 1.8 bits. To demonstrate, the researchers encoded six files and stored them in DNA: a 1948 scientific study, an 1895 French film, an entire computer operating system, an Amazon gift card, a Pioneer plaque, and a computer virus. These files were compressed into a master file. The data inside was split into short strings of binary code, made up of ones and zeros. Using the fountain codes, the researchers packaged the strings of data into so-called droplets. The ones and zeros in each droplet were then mapped to the four nucleotide bases (A, G, C and T). The algorithm adds a barcode to each droplet so that it can be identified later, and it discards letter combinations known to cause errors.
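The encoding steps described above can be sketched roughly in Python. This is a simplified illustration, not the researchers' implementation. The two-bit mapping (00 to A, 01 to C, 10 to G, 11 to T), the screening thresholds (GC content between 45 and 55 percent, no run of more than three identical bases), the uniform choice of how many segments to combine, the seed generator, and all function names are assumptions made for the example. Real fountain codes draw the number of combined segments from a carefully tuned distribution.

```python
import random

BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def bytes_to_dna(data: bytes) -> str:
    """Map every two bits of the input to one nucleotide letter."""
    return "".join(BASES[(b >> shift) & 0b11] for b in data for shift in (6, 4, 2, 0))


def dna_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_dna: four letters become one byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for ch in seq[i:i + 4]:
            value = (value << 2) | BASES.index(ch)
        out.append(value)
    return bytes(out)


def passes_screen(seq: str, max_run: int = 3,
                  gc_low: float = 0.45, gc_high: float = 0.55) -> bool:
    """Reject sequences with skewed GC content or long single-letter runs.

    The thresholds are assumed values for illustration.
    """
    gc = sum(ch in "GC" for ch in seq) / len(seq)
    run = longest = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if prev == cur else 1
        longest = max(longest, run)
    return gc_low <= gc <= gc_high and longest <= max_run


def make_droplet(segments: list[bytes], seed: int) -> bytes:
    """XOR a seed-determined subset of equal-length segments together.

    The 4-byte seed is prepended and plays the role of the barcode: a decoder
    can rerun the same random choices from it to learn which segments were mixed.
    """
    rng = random.Random(seed)
    degree = rng.randint(1, len(segments))  # simplification: uniform degree
    chosen = rng.sample(range(len(segments)), degree)
    payload = bytearray(len(segments[0]))
    for idx in chosen:
        for i, byte in enumerate(segments[idx]):
            payload[i] ^= byte
    return seed.to_bytes(4, "big") + bytes(payload)


def encode(segments: list[bytes], n_droplets: int) -> list[str]:
    """Keep generating droplets until n_droplets of them pass the screen."""
    oligos = []
    seed = 1
    while len(oligos) < n_droplets:
        seq = bytes_to_dna(make_droplet(segments, seed))
        if passes_screen(seq):
            oligos.append(seq)
        # Step the seed with a simple generator so candidates look scrambled
        # (hypothetical choice, used here only to avoid repeated seeds).
        seed = (seed * 1664525 + 1013904223) % 2**32
    return oligos
```

Calling `encode([b"abcdefgh", b"ijklmnop", b"qrstuvwx"], 5)` returns five 48-letter strings, each of which passes the screen. Because a fountain code can produce as many droplets as needed, throwing away the ones that fail the screen costs nothing except a little extra computation.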
The encoding produced a digital list of 72,000 DNA strands, each 200 bases long. The two researchers sent this list as a text file to Twist Bioscience, a company that specializes in converting digital information to biological data. The final product was a vial that contained a speck of DNA. Those molecules held all the encoded information, which could then be retrieved using modern sequencing technology and software that translates genetic code back to binary. The storage and retrieval were error-free.
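As a rough check on these figures, the overall information density can be worked out from the numbers quoted in this article: 2.14 × 10⁶ bytes of data spread over 72,000 strands of 200 bases. The short Python calculation below is only a back-of-the-envelope sketch, and the variable names are illustrative.

```python
import math

DATA_BYTES = 2.14e6          # total data stored, as quoted in the abstract
N_STRANDS = 72_000           # number of DNA strands synthesized
BASES_PER_STRAND = 200       # length of each strand

MAX_BITS_PER_BASE = math.log2(4)  # four possible bases -> 2 bits at most

total_bits = DATA_BYTES * 8
total_bases = N_STRANDS * BASES_PER_STRAND
bits_per_base = total_bits / total_bases

print(f"{bits_per_base:.2f} bits per base across the full strand length")
print(f"{bits_per_base / MAX_BITS_PER_BASE:.0%} of the 2-bit theoretical maximum")
```

This gives roughly 1.19 bits per base, lower than the 1.6 bits per nucleotide quoted earlier. One plausible explanation, which is an assumption here and not something the article states, is that part of each 200-base strand is taken up by non-data sequence that the 1.6 figure leaves out.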
Data can be copied indefinitely in DNA and secured for hundreds of thousands of years
Taking it a step further, they demonstrated that the files can be copied a virtually unlimited number of times by multiplying the DNA sample through polymerase chain reaction, and that the data can still be decoded perfectly afterward. Combined with the fact that DNA can last hundreds of thousands of years, this makes the molecule attractive for long-term storage.
This is now the highest-density data storage method ever created. With the new technique, one gram of DNA can hold 215 petabytes of data. Previous attempts at DNA data storage at the European Bioinformatics Institute were successful, but they packed 100 times less information into the DNA, and errors appeared when the information was retrieved.
The main hurdle to this method is cost. Synthesizing DNA is not cheap, and neither is reading it. The researchers spent at least $9,000 to store and read the information. They are optimistic that lower-quality molecules can be produced to bring down the cost. They also believe that time-intensive molecular coding can be sped up by shifting more of the work to computational techniques such as fountain codes.