Refinement of the Reference Viral Database (RVDB) for improving bioinformatics analysis of virus detection by high-throughput sequencing (HTS).

Publication date: Jun 23, 2025

All biological products are required to demonstrate the absence of adventitious viruses (AVs), which may be inadvertently introduced at different steps of the manufacturing process. The currently recommended in vitro and in vivo virus detection assays are limited in their breadth of detection and are lengthy and laborious. Additionally, the use of animals is discouraged by the global 3Rs initiative for replacement, reduction, and refinement. High-throughput or next-generation sequencing (HTS/NGS) technologies can rapidly detect known and novel viruses in biological materials. There are, however, challenges for HTS detection of AVs due to the differential abundance of viral sequences in public databases, which led to the creation of a nonredundant Reference Viral Database (RVDB) containing all viral, viral-like, and viral-related sequences, with a reduced cellular sequence content. In this paper, we describe improvements in RVDB, which include the transition of the RVDB production scripts from the original Python 2 codebase to Python 3, an update of the semantic pipeline to remove misannotated non-viral sequences and irrelevant viral sequences, the use of taxonomy for the removal of phages, and the inclusion of a quality-check step for SARS-CoV-2 genomes to exclude low-quality sequences. Additionally, RVDB website updates include search tools for exploring the database sequences and the implementation of an automatic pipeline that provides annotation information to distinguish non-viral and viral sequences in the database. These updates for refining RVDB are expected to enhance HTS bioinformatics by reducing the computational time and increasing the accuracy of virus detection.

IMPORTANCE: High-throughput sequencing (HTS) has emerged as an advanced technology for demonstrating the safety of biological products. HTS can be used as an alternative adventitious virus detection method for replacing the currently recommended in vivo and PCR assays and for supplementing or replacing the in vitro cell culture assays. However, HTS bioinformatics analysis for broad virus detection, including both known and novel viruses, depends on the use of a comprehensive and accurately annotated database. In this study, we have refined our original comprehensive Reference Viral Database (RVDB) for greater accuracy of virus detection with a reduced computational burden. Additionally, the production script for automating the generation of RVDB was updated to facilitate reliable database production and timely availability.
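To make the idea of a sequence quality-check step more concrete, the following is a minimal Python 3 sketch of one way such a filter could look. It is not the RVDB production code: the abstract does not state which criteria RVDB applies to SARS-CoV-2 genomes, so the thresholds (`MIN_LENGTH`, `MAX_AMBIGUOUS_FRACTION`) and all function names here are hypothetical and chosen only for illustration. The sketch keeps a FASTA record only if it is long enough and contains few ambiguous bases.

```python
from typing import Iterator, List, Tuple

# Hypothetical thresholds for illustration only; the abstract does not state
# the criteria that RVDB actually applies.
MIN_LENGTH = 100
MAX_AMBIGUOUS_FRACTION = 0.01


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def ambiguous_fraction(seq: str) -> float:
    """Fraction of characters that are not an unambiguous A, C, G, or T.

    An empty sequence is treated as fully ambiguous so that it never passes.
    """
    if not seq:
        return 1.0
    unambiguous = sum(seq.count(base) for base in "ACGT")
    return 1.0 - unambiguous / len(seq)


def passes_quality(
    seq: str,
    min_length: int = MIN_LENGTH,
    max_ambiguous: float = MAX_AMBIGUOUS_FRACTION,
) -> bool:
    """Return True if the sequence meets both illustrative thresholds."""
    return len(seq) >= min_length and ambiguous_fraction(seq) <= max_ambiguous


def filter_records(
    text: str,
    min_length: int = MIN_LENGTH,
    max_ambiguous: float = MAX_AMBIGUOUS_FRACTION,
) -> List[Tuple[str, str]]:
    """Keep only the FASTA records that pass the quality check."""
    return [
        (header, seq)
        for header, seq in parse_fasta(text)
        if passes_quality(seq, min_length, max_ambiguous)
    ]
```

For example, calling `filter_records(fasta_text, min_length=5, max_ambiguous=0.1)` on a small test file would drop records shorter than 5 bases or with more than 10% ambiguous characters. A real pipeline would likely also consider genome completeness and metadata, which this sketch does not attempt.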

Open Access PDF

Concepts: Global, PCR, Python, Reliable, Viruses
Keywords: high-throughput sequencing, next-generation sequencing, virus detection


Original Article
