If you’re here from my poster at SPSP, you’ve come to the right place! This supplement will show how we are building longitudinal multilevel models for Google Search data related to coronavirus. The first example looks at search interest in cough symptoms; the second example examines all search interest related to the topic of coronavirus, in general.
Many thanks to my talented collaborators: Andrés Gvirtz, Elisa Militaru, Fritz Götz, Tobi Ebert, and Jason Rentfrow :)
This project remains exploratory and ongoing. Please contact me with any questions or ideas for collaboration!
Geographical variation in macropsychological traits predicts substantial political, economic, social and health outcomes (Rentfrow et al., 2013).
This example uses Google’s public COVID-19 Symptoms Dataset.
We can view the distribution of search interest scores (continuous, scaled from 0 to 100) for cough symptoms here. While the distribution isn’t zero-inflated, it’s overdispersed. For this reason, the search data for cough do not meet the normality criterion for multilevel linear modeling.
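A quick numerical check can back up what the histogram suggests. The sketch below is only illustrative: it runs on simulated, right-skewed scores (not the real search data), and the variance-to-mean ratio is a rough heuristic for overdispersion rather than a formal test. With the real data, `scores` would presumably be replaced by `data_joined$search_interest`.

```r
set.seed(1)

# Simulated, right-skewed stand-in for the search interest scores (NOT the real data)
scores <- rgamma(2000, shape = 1.5, scale = 4)

# Variance-to-mean ratio: close to 1 for Poisson-like data, well above 1 when overdispersed
dispersion_ratio <- var(scores) / mean(scores)

# Shapiro-Wilk test of normality (accepts between 3 and 5000 observations)
normality <- shapiro.test(scores)

print(dispersion_ratio)
print(normality$p.value)
```

A ratio well above 1 together with a very small Shapiro-Wilk p-value points toward a quasi-Poisson (or similar) family instead of a Gaussian multilevel model.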
We can borrow some statistical rigor from machine learning to investigate the linear and non-linear predictive relationships between search interest and personality. Using a predictive power score (PPS) framework, we fit a cross-validated decision tree for every bivariate combination of regional personality and search interest variables and compute its normalized evaluation metric. The PPS is data-type agnostic, so the distribution of cough interest isn’t an issue. Notice in the matrix visualization of PPS below that search interest in cough happens to predict personality, not the other way around!
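To make the PPS idea concrete, here is a minimal sketch for a single pair of numeric variables. It is not the code behind the matrix in this supplement: the helper `pps_regression()` is hypothetical, the data are simulated, and the score definition (1 minus the ratio of the tree's cross-validated mean absolute error to that of a naive median prediction, floored at 0) is the common PPS formulation, assumed here rather than confirmed. It uses `rpart`, which ships with standard R installations.

```r
library(rpart)  # decision trees; a recommended package bundled with R

# Hypothetical helper: PPS of predictor x for numeric target y via k-fold CV
pps_regression <- function(x, y, k = 4, seed = 1) {
  set.seed(seed)
  d <- data.frame(x = x, y = y)
  fold <- sample(rep_len(seq_len(k), nrow(d)))  # random fold assignment
  mae_model <- numeric(k)
  mae_naive <- numeric(k)
  for (i in seq_len(k)) {
    train <- d[fold != i, ]
    test  <- d[fold == i, ]
    tree  <- rpart(y ~ x, data = train, method = "anova")
    # Out-of-fold error of the tree vs. always predicting the training median
    mae_model[i] <- mean(abs(test$y - predict(tree, newdata = test)))
    mae_naive[i] <- mean(abs(test$y - median(train$y)))
  }
  # Normalize: 0 = no better than the naive baseline, 1 = perfect prediction
  max(0, 1 - mean(mae_model) / mean(mae_naive))
}

# Simulated example of an asymmetric relationship: y is a function of x,
# but x cannot be recovered from y because the sign of x is lost
set.seed(42)
x <- runif(500, -2, 2)
y <- x^2 + rnorm(500, sd = 0.1)

pps_xy <- pps_regression(x, y)  # how well x predicts y
pps_yx <- pps_regression(y, x)  # how well y predicts x
print(c(x_predicts_y = pps_xy, y_predicts_x = pps_yx))
```

The two scores differ by direction, which is the property that lets a PPS matrix show one variable predicting another but not the reverse.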
Double-click on a state’s abbreviation (on the right) to visualize its individual growth curve for cough-related search interest.
library(MASS)  # provides glmmPQL()

# Null model: random intercept for each region, no predictors
m0 <- glmmPQL(search_interest ~ 1,
              random = ~ 1 | region,
              family = quasipoisson(link = 'log'),
              data = data_joined)
## iteration 1
## iteration 2
## iteration 3
summary(m0)
## Linear mixed-effects model fit by maximum likelihood
## Data: data_joined
## AIC BIC logLik
## NA NA NA
##
## Random effects:
## Formula: ~1 | region
##         (Intercept) Residual
## StdDev:       0.136    1.193
##
## Variance function:
## Structure: fixed weights
## Formula: ~invwt
## Fixed effects: search_interest ~ 1
##             Value Std.Error    DF t-value p-value
## (Intercept) 1.598   0.01945 18615   82.16       0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.2045 -0.6772 -0.4353 0.7075 4.6908
##
## Number of Observations: 18666
## Number of Groups: 51
# Growth model: fixed effect of date plus random intercepts and date slopes by region
m1 <- glmmPQL(search_interest ~ date,
              random = ~ date | region,
              family = quasipoisson(link = 'log'),
              data = data_joined)
## iteration 1
## iteration 2
## iteration 3
summary(m1)
## Linear mixed-effects model fit by maximum likelihood
## Data: data_joined
## AIC BIC logLik
## NA NA NA
##
## Random effects:
## Formula: ~date | region
## Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr
## (Intercept) 0.1208669 (Intr)
## date        0.0004323 0.09
## Residual    0.8230224
##
## Variance function:
## Structure: fixed weights
## Formula: ~invwt
## Fixed effects: search_interest ~ date
##               Value Std.Error    DF t-value p-value
## (Intercept)  2.1348  0.017583 18614  121.42       0
## date        -0.0033  0.000066 18614  -49.49       0
## Correlation:
##      (Intr)
## date -0.01
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.79238 -0.72305 -0.05586 0.65527 6.60049
##
## Number of Observations: 18666
## Number of Groups: 51
# Adds the five region-level personality traits as main effects
m2 <- glmmPQL(search_interest ~
                extraversion +
                agreeableness +
                conscientiousness +
                neuroticism +
                openness +
                date,
              random = ~ date | region,
              family = quasipoisson(link = 'log'),
              data = data_joined)
## iteration 1
## iteration 2
## iteration 3
summary(m2)
## Linear mixed-effects model fit by maximum likelihood
## Data: data_joined
## AIC BIC logLik
## NA NA NA
##
## Random effects:
## Formula: ~date | region
## Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr
## (Intercept) 0.1190034 (Intr)
## date        0.0004352 0.025
## Residual    0.8230438
##
## Variance function:
## Structure: fixed weights
## Formula: ~invwt
## Fixed effects: search_interest ~ extraversion + agreeableness + conscientiousness + neuroticism + openness + date
##                     Value Std.Error    DF t-value p-value
## (Intercept)        2.1348    0.0173 18614  123.15  0.0000
## extraversion       0.6302    0.6891    45    0.91  0.3653
## agreeableness      0.2520    0.9125    45    0.28  0.7837
## conscientiousness  0.0151    0.9854    45    0.02  0.9878
## neuroticism        0.2089    0.5443    45    0.38  0.7029
## openness           0.4154    0.4216    45    0.99  0.3297
## date              -0.0033    0.0001 18614  -49.20  0.0000
## Correlation:
##                   (Intr) extrvr agrbln cnscnt nrtcsm opnnss
## extraversion      -0.001
## agreeableness      0.000 -0.244
## conscientiousness -0.001  0.023 -0.747
## neuroticism        0.000  0.117 -0.026  0.308
## openness           0.000  0.059  0.555 -0.325  0.247
## date              -0.069  0.002 -0.001  0.002  0.000  0.000
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.79073 -0.72305 -0.05492 0.65442 6.60218
##
## Number of Observations: 18666
## Number of Groups: 51
# Adds trait-by-date interactions: do personality traits moderate the time trend?
m3 <- glmmPQL(search_interest ~
                (extraversion +
                   agreeableness +
                   conscientiousness +
                   neuroticism +
                   openness) * (date),
              random = ~ date | region,
              family = quasipoisson(link = 'log'),
              data = data_joined)
## iteration 1
## iteration 2
## iteration 3
summary(m3)
## Linear mixed-effects model fit by maximum likelihood
## Data: data_joined
## AIC BIC logLik
## NA NA NA
##
## Random effects:
## Formula: ~date | region
## Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr
## (Intercept) 0.1190490 (Intr)
## date        0.0002118 0.109
## Residual    0.8232172
##
## Variance function:
## Structure: fixed weights
## Formula: ~invwt
## Fixed effects: search_interest ~ (extraversion + agreeableness + conscientiousness + neuroticism + openness) * (date)
##                          Value Std.Error    DF t-value p-value
## (Intercept)             2.1347    0.0173 18609  123.10  0.0000
## extraversion            0.6100    0.6912    45    0.88  0.3822
## agreeableness           0.3463    0.9150    45    0.38  0.7069
## conscientiousness      -0.2068    0.9880    45   -0.21  0.8352
## neuroticism             0.2402    0.5458    45    0.44  0.6620
## openness                0.4978    0.4228    45    1.18  0.2452
## date                   -0.0033    0.0000 18609  -82.11  0.0000
## extraversion:date       0.0007    0.0016 18609    0.46  0.6451
## agreeableness:date     -0.0056    0.0021 18609   -2.65  0.0080
## conscientiousness:date  0.0138    0.0023 18609    6.11  0.0000
## neuroticism:date       -0.0012    0.0013 18609   -0.98  0.3265
## openness:date          -0.0047    0.0010 18609   -4.83  0.0000
## Correlation:
##                        (Intr) extrvr agrbln cnscnt nrtcsm opnnss date   extrv:
## extraversion           -0.001
## agreeableness           0.000 -0.244
## conscientiousness      -0.001  0.023 -0.747
## neuroticism             0.000  0.117 -0.025  0.308
## openness                0.000  0.059  0.555 -0.325  0.247
## date                   -0.072  0.003 -0.001  0.004  0.000  0.000
## extraversion:date       0.003 -0.081  0.023 -0.005 -0.005 -0.002 -0.009
## agreeableness:date     -0.001  0.023 -0.073  0.054 -0.001 -0.039  0.006 -0.258
## conscientiousness:date  0.004 -0.005  0.054 -0.068 -0.018  0.022 -0.018  0.037
## neuroticism:date        0.000 -0.006 -0.001 -0.018 -0.073 -0.021  0.000  0.108
## openness:date           0.000 -0.002 -0.039  0.022 -0.021 -0.073  0.007  0.052
##                        agrbl: cnscn: nrtcs:
## extraversion
## agreeableness
## conscientiousness
## neuroticism
## openness
## date
## extraversion:date
## agreeableness:date
## conscientiousness:date -0.753
## neuroticism:date       -0.019  0.298
## openness:date           0.552 -0.327  0.253
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.79025 -0.72459 -0.05383 0.65274 6.57886
##
## Number of Observations: 18666
## Number of Groups: 51
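Because these models use a log link, the coefficients above are on the log scale, and exponentiating them gives multiplicative effects (rate ratios) on expected search interest. The sketch below back-transforms a few estimates copied by hand from the `summary(m3)` output. With the fitted object available, `exp(nlme::fixef(m3))` should give the same quantities for every term. Reading one unit of `date` as one day, and the scaling of the personality scores, are assumptions that this section does not confirm.

```r
# Log-scale estimates copied from the summary(m3) output (not recomputed here)
log_coefs <- c(
  "date"                   = -0.0033,
  "agreeableness:date"     = -0.0056,
  "conscientiousness:date" =  0.0138,
  "openness:date"          = -0.0047
)

# exp() converts log-scale coefficients into rate ratios
rate_ratios <- exp(log_coefs)

# Percent change in expected search interest implied by each ratio
pct_change <- 100 * (rate_ratios - 1)

print(round(rate_ratios, 4))
print(round(pct_change, 2))
```

Under these assumptions, the `date` ratio means expected search interest shrinks by about a third of a percent per unit of `date` for a region with all trait scores at zero, and each interaction ratio multiplies that per-unit trend for every one-unit increase in the trait. All four ratios fall between 0.99 and 1.02, the same range as the y-axis limits in the plot call that follows.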
plot_TICs(data_joined, "search_interest") +
  ylim(.99, 1.02) +
  labs(title = "Estimated Personality Effects on Search Trends for Cough")