BERT-siRNA: siRNA target prediction based on BERT pre-trained interpretable model.

Publication date: Feb 29, 2024

Silencing mRNA through siRNA is vital for RNA interference (RNAi), which makes accurate computational methods for siRNA selection necessary. Current machine-learning approaches often require large datasets and intricate data preprocessing, which reduces their accuracy. To address this challenge, we propose BERT-siRNA, a BERT-based method for predicting siRNA target gene knockdown efficiency, which consists of a pre-trained DNA-BERT module and a multilayer perceptron (MLP) module. It applies transfer learning to overcome the limitation of small sample sizes and to avoid extensive preprocessing. The model is pretrained on extensive genomic data using DNA-BERT and then fine-tuned on various siRNA datasets to enhance its predictive capability. In tests on an independent public siRNA dataset, our model clearly outperforms all existing siRNA prediction models. Furthermore, the model’s consistent predictions of high-efficiency siRNA knockdown for SARS-CoV-2, as well as its agreement with experimental results for PDCD1, CD38, and IL6, demonstrate its reliability and stability. In addition, the attention scores for all 19-nt positions in the dataset indicate that the model’s attention is predominantly focused on the 5′ end of the siRNA. A step-by-step visualization of the hidden layers shows the classification becoming progressively clearer, which explains the effective feature extraction of the MLP layer. Explaining the model through analysis of its attention scores and hidden layers is also a main purpose of this work, making the model more interpretable and reliable for biological researchers.
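
The abstract does not say how siRNA sequences are fed to the DNA-BERT module or how per-position attention is summarized. The sketch below is therefore only an illustration under stated assumptions: it assumes the overlapping k-mer tokenization used by the original DNABERT, a U-to-T conversion so that an RNA sequence fits a DNA vocabulary, and a plain column-wise mean as the attention summary. The function names `sirna_to_kmers` and `mean_attention_per_position`, the default `k=6`, and the example sequence are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def sirna_to_kmers(sirna: str, k: int = 6) -> List[str]:
    """Split an siRNA sequence into overlapping k-mer tokens.

    Assumption: the sequence is mapped to the DNA alphabet (U -> T) before
    tokenization, and k=6 is an illustrative default, not the paper's setting.
    """
    seq = sirna.strip().upper().replace("U", "T")
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in sequence: {sorted(invalid)}")
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    # A sequence of length n yields n - k + 1 overlapping tokens.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mean_attention_per_position(rows: Sequence[Sequence[float]]) -> List[float]:
    """Average attention scores position by position across sequences.

    Each row holds one score per position for one siRNA; all rows must have
    the same length (19 for the 19-nt siRNAs described in the abstract).
    """
    if not rows:
        raise ValueError("no attention rows given")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of positions")
    return [sum(row[i] for row in rows) / len(rows) for i in range(width)]


if __name__ == "__main__":
    example = "GUGAUGUAGCCUAUGAUCA"  # hypothetical 19-nt siRNA, not from the paper
    tokens = sirna_to_kmers(example, k=6)
    print(len(tokens), tokens[:3])
```

A position-wise mean of this kind is one simple way to see whether attention concentrates toward one end of the siRNA. The paper itself may aggregate attention differently.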

Concepts: CD38, mRNA, Pretraining, Reliable, Researchers
Keywords: BERT, Explainable deep learning, SARS-CoV-2, siRNA prediction

Semantics

Type Source Name
disease VO gene
disease VO efficiency
disease VO effective

