Publication date: Jan 20, 2024
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the pathogen responsible for coronavirus disease 2019 (COVID-19), continues to evolve, giving rise to new variants and reinfections worldwide. Previous research has demonstrated that barcode segments can effectively and cost-efficiently identify specific species within closely related populations. In this study, we designed and tested RNA barcode segments based on genetic evolutionary relationships to facilitate the efficient and accurate identification of SARS-CoV-2 among large sets of virus samples, including human coronaviruses (HCoVs) and SARSr-CoV-2 lineages. Nucleotide sequences from NCBI and GISAID were selected and curated to construct training sets comprising 1,733 complete genome sequences of HCoVs and SARSr-CoV-2 lineages. Through genetic-level species testing, we validated the accuracy and reliability of the barcode segments for identifying SARS-CoV-2. Subsequently, 75 main and subordinate species-specific barcode segments for SARS-CoV-2, located in the ORF1ab, S, E, ORF7a, and N coding sequences, were extracted and screened based on single-nucleotide polymorphism sites and weighted scores. In testing, these segments showed high recall (nearly 100%), specificity (almost 30% at the nucleotide level), and precision (100%) for identification. They were then visualized as combined one- and two-dimensional barcodes and deposited in an online database (http://virusbarcodedatabase.top/). The successful application of barcoding technology to SARS-CoV-2 identification provides valuable insights for future studies involving complete genome sequence polymorphism analysis. This cost-effective and efficient identification approach also offers a useful reference for future research on virus surveillance.
|Complete genome sequences
|RNA barcode segments
|Severe acute respiratory syndrome coronavirus 2
|coronavirus disease 2019