Subject-specific parameters and effects

How to identify specific subjects’ effects?

When plotting subject specific effects using mlm_pars_plot() or mlm_spaghetti_plot(), individual subjects’ effects or fitted values are represented as points or lines. Sometimes it might be of interest to identify specific subjects with lines / points in the figures.

One solution to identifying specific subjects in the estimated model is to inspect the numerical values of the estimated parameters: All subject-specific effects’ posterior samples are saved in the estimated model object as u_x, where x is the parameter (one of a, b, cp (c’), c, me, or pme).

For example, let’s focus on the subject-specific a parameters. Let’s identify the 6 subjects with the lowest values for this parameter. First, we’ll fit a model using the included BLch9 data set:

library(bmlm)
head(BLch9)
##    id time fwkstrs   fwkdis  freldis          x          m          y
## 1 101    1       3 5.590119 3.034483  0.3333333  0.9828781 -1.4420726
## 2 101    2       3 5.535224 4.620690  0.3333333  0.9279833  0.1441343
## 3 101    3       3 3.888381 2.850575  0.3333333 -0.7188603 -1.6259807
## 4 101    4       4 5.352242 6.398467  1.3333333  0.7450007  1.9219121
## 5 101    5       1 4.483074 2.544061 -1.6666667 -0.1241668 -1.9324941
## 6 101    6       2 3.339433 5.164751 -0.6666667 -1.2678081  0.6881956
fit <- mlm(BLch9, x="x", m="m", y="y", cores = 4, iter = 500)

Once the model has been estimated, you can find the u_a parameters with mlm_summary(fit, pars = "u_a"). However, this would return a row of information for each subject in the data. What we would like to do here is to arrange the estimated u_a parameters in an (ascending) order of their magnitudes (calculated from posterior means). To do that, we’ll use the arrange() function (Wickham and Francois 2016) to arrange the data frame returned by mlm_summary()

library(dplyr)
u_a <- mlm_summary(fit, pars = "u_a")
head( arrange(u_a, Mean) )
##   Parameter  Mean   SE Median  2.5% 97.5% n_eff Rhat
## 1   u_a[45] -0.39 0.18  -0.39 -0.74 -0.05  1360    1
## 2   u_a[76] -0.15 0.16  -0.15 -0.46  0.16  1140    1
## 3   u_a[21] -0.14 0.21  -0.14 -0.55  0.27  1678    1
## 4   u_a[39] -0.14 0.17  -0.14 -0.49  0.19  1377    1
## 5   u_a[64] -0.11 0.16  -0.11 -0.40  0.19  1161    1
## 6   u_a[77] -0.11 0.19  -0.11 -0.47  0.25  1267    1

Notice that we used the head() function to return the first six rows of the resulting data frame. The subject numbers are in square brackets: ID 45 has the lowest u_a parameter value. To find the individuals with the highest u_a values, wrap Mean in desc() (this arranges the data frame on descending values of Mean).

head( arrange(u_a, desc(Mean)) )
##   Parameter Mean   SE Median 2.5% 97.5% n_eff Rhat
## 1    u_a[9] 0.65 0.18   0.64 0.33  1.02  1159    1
## 2   u_a[61] 0.56 0.18   0.55 0.23  0.93  1071    1
## 3   u_a[57] 0.55 0.19   0.55 0.19  0.93  1060    1
## 4   u_a[49] 0.54 0.15   0.54 0.23  0.83  1099    1
## 5   u_a[22] 0.52 0.18   0.52 0.17  0.87  1597    1
## 6   u_a[32] 0.52 0.16   0.52 0.21  0.85  1374    1

Important: To use this method accurately, please ensure that the subjects’ ID numbers in your data are sequential integers from 1 to however many subjects there are in your sample. The underlying Stan MCMC sampler requires sequential integer subject IDs, and if your IDs are something else (letters, non-sequential integers, etc.) mlm() will coerce the subject numbers to 1:N sequential integers before fitting the model.

Thanks to Xiao Hu for asking this question.

References

Wickham, Hadley, and Romain Francois. 2016. Dplyr: A Grammar of Data Manipulation. http://CRAN.R-project.org/package=dplyr.