Google Trends in infodemiology: Methodological steps to avoid irreproducible results and invalid conclusions.

Publication date: Jul 21, 2024

Google Trends is a widely used tool for infodemiological surveys. However, irregularities in its random sampling and aggregation algorithms compromise the reliability of the relative search volume (RSV) and the regional online interest (ROI). This study aims to unmask methodological criticalities that are commonly ignored when carrying out infodemiological surveys via Google Trends, and it provides a guide to avoiding these shortcomings. The Google Topic “Coronavirus disease 2019” was investigated using different time frames, categories, and IP addresses. The same samples were manually collected multiple times to evaluate RSV and ROI stability. Stability was estimated through indicators of variability (e.g., the percentage coefficient of variation, “CV%”, and its 4-surprisal interval, “4-I”). The content aggregation capacity of the algorithms for topics and categories was evaluated through quantitative analysis of RSV and ROI and qualitative examination of the related queries. The stability of Google Trends’ RSV and ROI is not linked exclusively to dataset size or IP address. Subregional datasets can be highly unstable (e.g., CV% = 10, 4-I: [8, 13]). Google Trends categories and topics can exclude relevant queries or include unnecessary ones. The statistical scenario is consistent with the following hypotheses: i) datasets containing too few queries are highly unstable; ii) the “interest over time” data format is generally reliable for evaluating trends and correlations; iii) improvements to Google Trends have altered historical RSV trends. Google Trends can be an effective and efficient infodemiological tool as long as the reliability of web search indexes is appropriately analyzed and weighed against the scientific goal. The methodological steps discussed in this study are critical to drawing valid and relevant scientific conclusions.
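The stability check described in the abstract (repeated manual collection of the same sample, summarized by CV% and a 4-surprisal interval) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure. The RSV values are hypothetical, not data from the study. The 4-surprisal interval is read here as the set of values whose surprisal S = -log2(p) stays below 4 bits, i.e. p > 2^-4, which gives 93.75% coverage. The percentile bootstrap used to build that interval is an assumed stand-in, because the abstract does not say how 4-I was computed. The helper names `cv_percent`, `surprisal_level`, and `bootstrap_cv_interval` are hypothetical.

```python
import math
import random
import statistics


def cv_percent(samples):
    """Percentage coefficient of variation: 100 * sample SD / mean."""
    if len(samples) < 2:
        raise ValueError("need at least two repeated samples")
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("CV% is undefined when the mean is zero")
    return 100.0 * statistics.stdev(samples) / mean


def surprisal_level(s_bits):
    """Coverage of an s-bit surprisal interval: keeps values with p > 2**-s."""
    return 1.0 - 2.0 ** (-s_bits)


def bootstrap_cv_interval(samples, s_bits=4.0, n_boot=10_000, seed=0):
    """Percentile-bootstrap interval for CV% at the coverage implied by s_bits.

    ASSUMPTION: the percentile bootstrap is an illustrative choice only;
    the study may have used a different interval method.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    alpha = 1.0 - surprisal_level(s_bits)  # 0.0625 for a 4-bit interval
    boots = []
    for _ in range(n_boot):
        # Resample the repeated collections with replacement
        resample = [rng.choice(samples) for _ in samples]
        if statistics.mean(resample) == 0:
            continue  # CV% is undefined for an all-zero resample
        boots.append(cv_percent(resample))
    if not boots:
        raise ValueError("no valid bootstrap resamples")
    boots.sort()
    last = len(boots) - 1
    low = boots[math.floor(alpha / 2 * last)]
    high = boots[math.ceil((1 - alpha / 2) * last)]
    return low, high


# Hypothetical RSV values for one query, region, and time frame,
# each collected on a different occasion.
rsv_repeats = [62, 55, 70, 58, 66, 61, 53, 68]
cv = cv_percent(rsv_repeats)
low, high = bootstrap_cv_interval(rsv_repeats)
print(f"CV% = {cv:.1f}, 4-I: [{low:.1f}, {high:.1f}]")
```

In a real survey, each entry of `rsv_repeats` would be the RSV for an identical request (same topic, category, region, and time frame) downloaded at a different time or from a different IP address. A high CV% with a wide interval flags a dataset that is too unstable to support the intended conclusion.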

Concepts: Coronavirus, Google, Infodemiology, Reliable, Unmask
Keywords: Aggregation, Algorithms, Categories, Conclusions, Google, Infodemiological, Methodological, Queries, Reliability, ROI, RSV, Search, Stability, Surveys, Trends


