Modelling, Characterization of Data-Dependent and Process-Dependent Errors in DNA Data Storage

Yixin Wang*, Md Noor-A-Rahim, Erry Gunawan, Yong L. Guan, Chueh L. Poh

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Using DNA as a medium to store information has recently been recognized as a promising solution for long-term data storage. While several system prototypes have been demonstrated, the error characteristics of DNA data storage have received only limited discussion. Because the data and the process vary from experiment to experiment, how the errors vary and how they affect data recovery remain to be uncovered. To close this gap, we systematically investigate the storage channel, i.e., the error characteristics of the storage process. We first propose a new concept, sequence corruption, that unifies the error characteristics at the sequence level and eases the channel analysis. We then derive formulations of the data imperfection at the decoder, covering both sequence loss and sequence corruption, which reveal the decoding demand and allow data recovery to be monitored. Furthermore, we explore several kinds of data-dependent unevenness observed in the base error patterns and study a few potential factors and their impact on the data imperfection at the decoder, both theoretically and experimentally. The results introduce a more comprehensive channel model and offer a new angle on the data recovery issue in DNA data storage by further elucidating the error characteristics of the storage process.

Original language: English
Pages (from-to): 2147-2158
Number of pages: 12
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 20
Issue number: 3
Publication status: Published - May 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

ASJC Scopus Subject Areas

  • Biotechnology
  • Genetics
  • Applied Mathematics

Keywords

  • Channel modelling
  • DNA data storage
  • error characterization
  • long-term storage
