A SAR and QSAR study on 3CLpro inhibitors of SARS-CoV-2 using machine learning methods.

Publication date: Jul 30, 2024

The 3C-like proteinase (3CLpro) of novel coronaviruses is closely linked to viral replication, making it a crucial target for antiviral agents. In this study, we employed two fingerprint descriptors (ECFP_4 and MACCS) to characterize the 889 compounds in our dataset. We constructed 24 classification models using machine learning algorithms, including Support Vector Machine (SVM), Random Forest (RF), extreme Gradient Boosting (XGBoost), and Deep Neural Networks (DNN). Among these models, the DNN- and ECFP_4-based Model 1D_2 achieved the most promising results, with a Matthews correlation coefficient (MCC) of 0.796 in 5-fold cross-validation and 0.722 on the test set. The applicability domains of the models were analysed using d calculations. The 889 collected compounds were clustered into 10 subsets by the K-means algorithm, and the relationships between structural fragments and inhibitory activities of the highly active compounds were analysed for each subset. In addition, based on 464 3CLpro inhibitors, 27 QSAR models were constructed using three machine learning algorithms; the lowest root mean square error (RMSE) on the test set was 0.509. The applicability domains of these models and the structure-activity relationships reflected by the descriptors were also analysed.
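The two evaluation metrics quoted above, MCC for the classification models and RMSE for the QSAR models, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names and the toy labels and activity values are hypothetical, chosen only to show how each metric is computed.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data, not taken from the study.
labels_true = [1, 1, 0, 0]
labels_pred = [1, 0, 0, 0]
activity_true = [1.0, 2.0]
activity_pred = [1.0, 4.0]

mcc_value = matthews_corrcoef(labels_true, labels_pred)
rmse_value = rmse(activity_true, activity_pred)
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect agreement), and it takes all four confusion-matrix cells into account. That makes it informative when active and inactive compounds are unevenly represented.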

Concepts: Antiviral, Coronaviruses, ECFP_4, Promising, Proteinase
Keywords: 3CLpro inhibitors, machine learning, SARS-CoV-2, structure-activity relationship (SAR)

Semantics

Type Source Name
pathway KEGG Viral replication
drug DRUGBANK Flunarizine
drug DRUGBANK MCC
disease IDO algorithm

