Oligo Design with Single Primer Binding Site for High Capacity DNA-Based Data Storage

Yixin Wang, Md Noor-A-Rahim, Jingyun Zhang, Erry Gunawan, Yong L. Guan, Chueh L. Poh*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)

Abstract

DNA has become an attractive medium for long-term data archiving due to its extremely high storage density and longevity. Short single-stranded DNAs, called oligonucleotides (oligos), have been designed and synthesized to store digital data. Previous works designed the oligos with a pair of primer binding sites (PBSs), each around 20 nucleotides long, attached at the two ends of each basic readable data block. The addition of PBSs decreases the data density significantly because, with current DNA synthesis, the maximum length of a good-quality synthesized oligo is around 200 nucleotides. Furthermore, the maximum homopolymer run allowed in existing experiments has been reported to be three nucleotides. In this work, to increase the data density, we have devised and tested an oligo design for DNA-based storage in which the basic readable data block carries a single PBS at one end only, while the maximum allowed homopolymer run is increased to 4. We also present an oligo assembly algorithm that can reconstruct oligos with a single PBS from the error-prone raw readouts obtained from the sequencing process. We conducted a wet lab experiment to validate the proposed design, storing 398 KB of data in 10,750 oligos. The experimental results show that it is possible to recover over 99 percent of the oligo sequences without error, which demonstrates that one PBS is sufficient for implementing a DNA-based data storage system with the maximum homopolymer run relaxed to 4. By reserving more nucleotides for information bits, the use of a single PBS leads to a significant data density gain of 14.3 to 140.2 percent over the existing short-strand DNA data storage schemes.
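
The two constraints discussed in the abstract, the homopolymer run limit and the share of each oligo left for data once PBSs are attached, can be illustrated with a minimal Python sketch. This is not the authors' design or assembly algorithm. The function names are hypothetical, and the 200-nucleotide oligo and 20-nucleotide PBS used in the usage note are illustrative assumptions, so the resulting numbers are not expected to reproduce the density gains reported in the paper.

```python
def max_homopolymer_run(seq: str) -> int:
    """Return the length of the longest run of one repeated nucleotide."""
    longest = 0
    current = 0
    prev = None
    for base in seq:
        # Extend the run if the base repeats, otherwise start a new run.
        current = current + 1 if base == prev else 1
        longest = max(longest, current)
        prev = base
    return longest


def satisfies_run_limit(seq: str, limit: int = 4) -> bool:
    """Check a sequence against a homopolymer limit (4 in the proposed design)."""
    return max_homopolymer_run(seq) <= limit


def payload_fraction(oligo_len: int, pbs_len: int, n_pbs: int) -> float:
    """Fraction of an oligo left for data after reserving n_pbs binding sites.

    oligo_len and pbs_len are illustrative inputs, not values from the paper.
    """
    if n_pbs * pbs_len > oligo_len:
        raise ValueError("binding sites do not fit in the oligo")
    return (oligo_len - n_pbs * pbs_len) / oligo_len
```

Under these assumed lengths, `payload_fraction(200, 20, 2)` and `payload_fraction(200, 20, 1)` compare how much of the oligo remains for data with two PBSs and with one. Any index or error-correction overhead is ignored here, which is one reason the sketch cannot be read as the paper's density calculation.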

Original language: English
Article number: 8827908
Pages (from-to): 2176-2182
Number of pages: 7
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 17
Issue number: 6
Publication status: Published - Nov 1 2020
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2004-2012 IEEE.

ASJC Scopus Subject Areas

  • Biotechnology
  • Genetics
  • Applied Mathematics

Keywords

  • DNA data storage
  • long-term storage
  • Next-generation sequencing
  • sequence clustering
  • single primer binding site
