Genome data could outgrow YouTube, Twitter content by 2025 – report
According to the report, published in the journal PLoS Biology, as much as 2–40 exabytes of storage capacity will be needed by 2025 for human genomes alone. And although the computer scientists believe that these needs can be reduced with effective data compression, “decompression times and fidelity are a major concern in compressive genomics,” they say.
The team estimates that 300 hours of video are currently uploaded to YouTube every minute, a rate that could “grow to 1,000–1,700 hours per minute (1–2 exabytes of video data per year) by 2025 if we extrapolate from current trends.”
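The two YouTube figures in that quote imply an average storage size per hour of uploaded video, which the report does not state directly. The short Python sketch below derives it as a sanity check. It assumes decimal (SI) units, a 365-day year, and that the low and high ends of the two ranges belong together; the function name `implied_gb_per_hour` is illustrative only.

```python
# Back-of-the-envelope check of the YouTube projection quoted above.
# Inputs come from the report; the per-hour size is derived, not reported.
MINUTES_PER_YEAR = 60 * 24 * 365  # assumes a 365-day year
EXABYTE = 10**18                  # decimal (SI) exabyte, an assumption
GIGABYTE = 10**9                  # decimal (SI) gigabyte, an assumption


def implied_gb_per_hour(hours_per_minute: float, exabytes_per_year: float) -> float:
    """Average storage per hour of video implied by an upload rate and a yearly volume."""
    hours_per_year = hours_per_minute * MINUTES_PER_YEAR
    return exabytes_per_year * EXABYTE / hours_per_year / GIGABYTE


# Assumption: the low ends (1,000 h/min, 1 EB) and high ends (1,700 h/min, 2 EB) pair up.
low = implied_gb_per_hour(1_000, 1)
high = implied_gb_per_hour(1_700, 2)
print(f"{low:.2f} to {high:.2f} GB per hour of video")  # prints "1.90 to 2.24 GB per hour of video"
```

Under those assumptions the projection works out to roughly 1.9 to 2.2 GB per hour of video; this is only a consistency check on the quoted numbers, not a figure from the report.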
Twitter, meanwhile, currently generates 500 million tweets per day, each about 3 kilobytes including metadata, the report states. “While this figure is beginning to plateau, a projected logarithmic growth rate would suggest a 2.4-fold growth by 2025, to 1.2 billion tweets per day, 1.36 petabytes/year.”
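The Twitter projection can be reproduced from its stated inputs with a few lines of Python. This sketch assumes that “about 3 kilobytes” means 3,000 bytes, that petabytes are decimal (SI) units, and a 365-day year; none of these conventions is spelled out in the quote.

```python
# Reproduces the quoted Twitter projection from its stated inputs.
BYTES_PER_TWEET = 3_000  # "about 3 kilobytes", taken as 3,000 bytes (assumption)
DAYS_PER_YEAR = 365      # assumption: ignores leap years
PETABYTE = 10**15        # decimal (SI) petabyte (assumption)

tweets_now = 500_000_000  # tweets per day today, per the report
growth_factor = 2.4       # projected growth by 2025, per the report
tweets_2025 = tweets_now * growth_factor  # 1.2 billion tweets per day

# Daily tweets x bytes per tweet x days per year, converted to petabytes.
petabytes_per_year = tweets_2025 * BYTES_PER_TWEET * DAYS_PER_YEAR / PETABYTE
print(f"{tweets_2025 / 1e9:.1f} billion tweets/day -> {petabytes_per_year:.2f} PB/year")
# prints "1.2 billion tweets/day -> 1.31 PB/year"
```

With these assumptions the result is about 1.31 petabytes per year, slightly below the report's 1.36. A per-tweet size of roughly 3.1 kilobytes would reproduce 1.36, so the gap is consistent with the rounded “about 3 kilobytes” figure; the exact per-tweet value the report used is not given here.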
In other words, data acquisition in these domains is expected to grow by up to two orders of magnitude over the next decade, the report says.
“Although total genomic data could far exceed the demands for the others, with the right new innovations the net requirements could be similar to the domains of astronomy and YouTube,” according to the report.
The most practical, and perhaps only, solution for distributing genome sequences at a population scale, the researchers say, is to use “cloud-computing systems that minimize data movement and maximize code federation.”
The report adds that new developments from companies such as Google, Amazon, and Facebook, including applications designed to “fit the frameworks of distributed computing efficient data centers and distributed storage and cloud computing paradigms,” are also expected to be part of the solution.
Finally, the researchers conclude that authentication, encryption, and other security safeguards “must be developed” to ensure that genomic data remain private.