Publication date: Jul 30, 2024
The 3C-like proteinase (3CLpro) of novel coronaviruses is closely involved in viral replication, making it a key target for antiviral agents. In this study, we used two fingerprint descriptors (ECFP_4 and MACCS) to characterize the 889 compounds in our dataset. We constructed 24 classification models using machine learning algorithms including Support Vector Machine (SVM), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Deep Neural Networks (DNN). Among these, Model 1D_2, based on DNN and ECFP_4, performed best, with a Matthews correlation coefficient (MCC) of 0.796 in 5-fold cross-validation and 0.722 on the test set. The applicability domains of the models were analysed using d calculations. The 889 compounds were clustered into 10 subsets with the K-means algorithm, and within each subset the relationships between structural fragments and the inhibitory activities of the highly active compounds were analysed. In addition, 27 QSAR models were constructed from 464 3CLpro inhibitors using three machine learning algorithms; the best of these achieved a root mean square error (RMSE) of 0.509 on the test set. The applicability domains of these models and the structure-activity relationships reflected by the descriptors were also analysed.
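The two headline metrics above are standard: MCC summarizes a binary confusion matrix (the classification models), and RMSE measures the error of predicted activity values (the QSAR models). A minimal sketch of their textbook definitions follows; it is illustrative only, not the authors' code, and the function names `mcc` and `rmse` are hypothetical.

```python
import math
from typing import Sequence


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    ranging from -1 (total disagreement) to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention (assumed here): return 0.0 when any marginal sum is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted activity values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))
```

For example, `mcc(tp=90, tn=80, fp=10, fn=20)` with these hypothetical counts gives about 0.70, while a perfect classifier such as `mcc(50, 50, 0, 0)` gives 1.0. Unlike plain accuracy, MCC stays informative when active and inactive compounds are imbalanced, which is one common reason it is reported for inhibitor classification.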
Keywords: 3CLpro inhibitors; machine learning; SARS-CoV-2; structure-activity relationship (SAR)