Evaluating sentiment analysis models: A comparative analysis of vaccination tweets during the COVID-19 phase leveraging DistilBERT for enhanced insights.

Publication date: Jun 01, 2025

This study investigates public sentiment toward COVID-19 vaccinations by analyzing Twitter data using advanced machine learning (ML) and natural language processing (NLP) techniques. Recognizing social media as a valuable source for gauging public opinion during health crises, the research aims to inform policies on content moderation and misinformation control.

• Comparative Analysis of Embedding Techniques and ML Models: The study evaluates two embedding techniques, TF-IDF and Word2Vec, across five ML models: LinearSVC, Random Forest, Gradient Boosting Machine (GBM), XGBoost, and AdaBoost.
• The models were tested using two training-testing splits (70-30 and 80-20) to assess their performance on noisy, unlabeled, and imbalanced sentiment data.
• Utilization of DistilBERT for Pseudo-Labeling: To enhance labeling accuracy, DistilBERT was employed for pseudo-labeling, capturing semantic nuances often missed by traditional ML techniques. This approach enabled more effective sentiment classification of tweets.

The findings underscore the effectiveness of automated annotation, hybrid modeling, and embedding strategies in analyzing unstructured social media data. Such approaches provide valuable insights for public health applications, particularly in understanding vaccine hesitancy and shaping communication strategies. The study highlights the potential of integrating advanced NLP techniques to better comprehend and respond to public sentiments during pandemics or similar emergencies.
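The evaluation protocol described above (TF-IDF features scored under a 70-30 and an 80-20 split) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the toy tweets and their labels are invented, the function names are hypothetical, and a simple nearest-centroid classifier stands in for the five models named in the study, which are not reproduced here. The IDF weights are fitted on the training portion only, so the held-out tweets do not influence the features.

```python
import math
import random
from collections import Counter

# Hypothetical toy tweets with hypothetical labels; not data from the study.
TWEETS = [
    ("got my vaccine today feeling grateful", "positive"),
    ("vaccine rollout is going great", "positive"),
    ("so happy my parents are vaccinated", "positive"),
    ("grateful for the nurses giving shots", "positive"),
    ("great news second dose done", "positive"),
    ("worried about vaccine side effects", "negative"),
    ("side effects were awful after my shot", "negative"),
    ("i do not trust this vaccine", "negative"),
    ("awful wait times at the clinic", "negative"),
    ("worried the rollout is too rushed", "negative"),
]


def split(data, test_fraction, seed=0):
    """Shuffle deterministically and return (train, test)."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


def fit_idf(texts):
    """Smoothed inverse document frequency, learned from training texts only."""
    n = len(texts)
    df = Counter(term for text in texts for term in set(text.split()))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(text, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are dropped."""
    tf = Counter(t for t in text.split() if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def fit_centroids(train, idf):
    """Stand-in model: the mean TF-IDF vector of each class."""
    sums, counts = {}, Counter()
    for text, label in train:
        counts[label] += 1
        acc = sums.setdefault(label, Counter())
        for t, w in tfidf(text, idf).items():
            acc[t] += w
    return {lab: {t: w / counts[lab] for t, w in acc.items()} for lab, acc in sums.items()}


def predict(text, centroids, idf):
    """Return the label whose centroid has the largest dot product with the tweet."""
    vec = tfidf(text, idf)

    def score(label):
        return sum(w * centroids[label].get(t, 0.0) for t, w in vec.items())

    return max(sorted(centroids), key=score)


def evaluate(data, test_fraction):
    """Return (train size, test size, accuracy) for one split."""
    train, test = split(data, test_fraction)
    idf = fit_idf([text for text, _ in train])
    centroids = fit_centroids(train, idf)
    hits = sum(predict(text, centroids, idf) == label for text, label in test)
    return len(train), len(test), hits / len(test)


for fraction in (0.3, 0.2):  # the 70-30 and 80-20 splits
    n_train, n_test, acc = evaluate(TWEETS, fraction)
    print(f"test fraction {fraction}: train={n_train} test={n_test} accuracy={acc:.2f}")
```

On ten toy tweets the accuracy figures carry no meaning; the sketch only shows the mechanics of comparing the same feature pipeline under two split ratios. A real comparison would swap in the actual classifiers and, given the class imbalance the study mentions, would likely report per-class metrics alongside accuracy.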

Concepts: Forest, Misinformation, Twitter, Vaccinations, Word2vec, Vaccination
Keywords: COVID-19, DistilBERT Sentiment Analysis, Machine learning, Natural Language processing, Sentiment analysis

Semantics

Type Source Name
disease MESH COVID-19
drug DRUGBANK Flunarizine
drug DRUGBANK Tropicamide
disease MESH emergencies
drug DRUGBANK Coenzyme M
drug DRUGBANK Nonoxynol-9
drug DRUGBANK Trestolone
disease IDO process
drug DRUGBANK Alpha-1-proteinase inhibitor
disease IDO quality
drug DRUGBANK Isoxaflutole
drug DRUGBANK Naproxen
disease MESH confusion
drug DRUGBANK Saquinavir
drug DRUGBANK MCC
disease MESH privacy
disease IDO algorithm
