Publication date: Jun 01, 2025
This study investigates public sentiment toward COVID-19 vaccinations by analyzing Twitter data using advanced machine learning (ML) and natural language processing (NLP) techniques. Recognizing social media as a valuable source for gauging public opinion during health crises, the research aims to inform policies on content moderation and misinformation control.
• Comparative Analysis of Embedding Techniques and ML Models: The study evaluates two embedding techniques, TF-IDF and Word2Vec, across five ML models: LinearSVC, Random Forest, Gradient Boosting Machine (GBM), XGBoost, and AdaBoost.
• The models were tested using two training-testing splits (70-30 and 80-20) to assess their performance on noisy, unlabeled, and imbalanced sentiment data.
• Utilization of DistilBERT for Pseudo-Labeling: To enhance labeling accuracy, DistilBERT was employed for pseudo-labeling, capturing semantic nuances often missed by traditional ML techniques. This approach enabled more effective sentiment classification of tweets.
The findings underscore the effectiveness of automated annotation, hybrid modeling, and embedding strategies in analyzing unstructured social media data. Such approaches provide valuable insights for public health applications, particularly in understanding vaccine hesitancy and shaping communication strategies. The study highlights the potential of integrating advanced NLP techniques to better comprehend and respond to public sentiments during pandemics or similar emergencies.
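To make the TF-IDF embedding and the two training-testing splits more concrete, here is a minimal sketch in plain Python. It is not the study's code: the tweets and labels are made-up placeholders, the helper names `tfidf_vectors` and `split` are hypothetical, and the weighting uses one common smoothed IDF variant that may differ from the one the authors used.

```python
import math
import random
from collections import Counter

# Toy, made-up tweets with hypothetical labels (not data from the study).
TWEETS = [
    ("got my vaccine today feeling great", "positive"),
    ("vaccine side effects were awful", "negative"),
    ("booking a vaccine appointment tomorrow", "neutral"),
    ("so grateful for the vaccine rollout", "positive"),
    ("i do not trust this vaccine", "negative"),
    ("vaccine clinic opens at nine", "neutral"),
    ("second dose done and feeling great", "positive"),
    ("awful wait at the clinic today", "negative"),
    ("clinic hours posted for tomorrow", "neutral"),
    ("great news about the rollout", "positive"),
]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Relative term frequency times smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def split(items, test_fraction, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


texts = [text for text, _ in TWEETS]
features = tfidf_vectors(texts)
labeled = list(zip(features, (label for _, label in TWEETS)))

# The two splits named in the abstract: 70-30 and 80-20.
for test_fraction in (0.30, 0.20):
    train, test = split(labeled, test_fraction)
    print(f"test fraction {test_fraction:.2f}: {len(train)} train, {len(test)} test")
```

In this toy corpus a term that appears in many tweets, such as "vaccine", receives a lower weight than a rare term in the same tweet. For a real evaluation, the IDF statistics would normally be estimated on the training portion only, so that no information from the test tweets leaks into the features.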
Concepts | Keywords |
---|---|
Forest | COVID-19 |
Misinformation | DistilBERT Sentiment Analysis |
Machine learning | |
Vaccinations | Natural Language processing |
Word2vec | Sentiment analysis |
Vaccination | |
Semantics | |
Type | Source | Name |
---|---|---|
disease | MESH | COVID-19 |
drug | DRUGBANK | Flunarizine |
drug | DRUGBANK | Tropicamide |
disease | MESH | emergencies |
drug | DRUGBANK | Coenzyme M |
drug | DRUGBANK | Nonoxynol-9 |
drug | DRUGBANK | Trestolone |
disease | IDO | process |
drug | DRUGBANK | Alpha-1-proteinase inhibitor |
disease | IDO | quality |
drug | DRUGBANK | Isoxaflutole |
drug | DRUGBANK | Naproxen |
disease | MESH | confusion |
drug | DRUGBANK | Saquinavir |
drug | DRUGBANK | MCC |
disease | MESH | privacy |
disease | IDO | algorithm |