ViPRA-Haplo: De Novo Reconstruction of Viral Populations Using Paired End Sequencing Data.

Publication date: Mar 07, 2024

We present ViPRA-Haplo, a de novo strain-specific assembly workflow for reconstructing viral haplotypes in a viral population from paired-end next generation sequencing (NGS) data. The proposed Viral Path Reconstruction Algorithm (ViPRA) generates a subset of paths from a De Bruijn graph of reads using the pairing information of reads. The paths generated by ViPRA are an over-estimation of the true contigs. We propose two refinement methods to obtain an optimal set of contigs representing viral haplotypes. The first method clusters paths reconstructed by ViPRA using VSEARCH [1] based on sequence similarity, while the second method, MLEHaplo, generates a maximum likelihood estimate of viral populations. We evaluated our pipeline on both simulated and real viral quasispecies data from HIV (and real data from SARS-COV-2). Experimental results show that ViPRA-Haplo, although still an overestimation in the number of true contigs, outperforms the existing tool, PEHaplo, providing up to 9% better genome coverage on HIV real data. In addition, ViPRA-Haplo also retains higher diversity of the viral population as demonstrated by the presence of a higher percentage of contigs less than 1000 base pairs (bps), which also contain k-mers with counts less than 100 (representing rarer sequences), which are absent in PEHaplo. For SARS-CoV-2 sequencing data, ViPRA-Haplo reconstructs contigs that cover more than 90% of the reference genome and were able to validate known SARS-CoV-2 strains in the sequencing data.

Concepts Keywords
Genome Contigs
Hiv Cov
Ieee Data
Vipra End
Viral Haplo
Novo
Paired
Paths
Populations
Real
Reconstruction
Sars
Sequencing
Vipra
Viral

Semantics

Type Source Name
disease VO population
disease IDO algorithm
drug DRUGBANK Tropicamide
disease VO BPS

Original Article

(Visited 1 times, 1 visits today)