When plotting subject specific effects using
mlm_pars_plot()
or mlm_spaghetti_plot()
,
individual subjects’ effects or fitted values are represented as points
or lines. Sometimes it might be of interest to identify specific
subjects with lines / points in the figures.
One solution to identifying specific subjects in the estimated model is to inspect the numerical values of the estimated parameters: All subject-specific effects’ posterior samples are saved in the estimated model object as u_x, where x is the parameter (one of a, b, cp (c’), c, me, or pme).
For example, let’s focus on the subject-specific a
parameters. Let’s identify the 6 subjects with the lowest values for
this parameter. First, we’ll fit a model using the included
BLch9
data set:
## id time fwkstrs fwkdis freldis x m y
## 1 101 1 3 5.590119 3.034483 0.3333333 0.9828781 -1.4420726
## 2 101 2 3 5.535224 4.620690 0.3333333 0.9279833 0.1441343
## 3 101 3 3 3.888381 2.850575 0.3333333 -0.7188603 -1.6259807
## 4 101 4 4 5.352242 6.398467 1.3333333 0.7450007 1.9219121
## 5 101 5 1 4.483074 2.544061 -1.6666667 -0.1241668 -1.9324941
## 6 101 6 2 3.339433 5.164751 -0.6666667 -1.2678081 0.6881956
fit <- mlm(BLch9, x="x", m="m", y="y", cores = 4, iter = 500)
Once the model has been estimated, you can find the u_a
parameters with mlm_summary(fit, pars = "u_a")
. However,
this would return a row of information for each subject in the data.
What we would like to do here is to arrange the estimated u_a
parameters in an (ascending) order of their magnitudes (calculated from
posterior means). To do that, we’ll use the arrange()
function (Wickham and Francois 2016) to
arrange the data frame returned by mlm_summary()
library(dplyr)
u_a <- mlm_summary(fit, pars = "u_a")
head( arrange(u_a, Mean) )
## Parameter Mean SE Median 2.5% 97.5% n_eff Rhat
## 1 u_a[45] -0.39 0.18 -0.39 -0.74 -0.05 1360 1
## 2 u_a[76] -0.15 0.16 -0.15 -0.46 0.16 1140 1
## 3 u_a[21] -0.14 0.21 -0.14 -0.55 0.27 1678 1
## 4 u_a[39] -0.14 0.17 -0.14 -0.49 0.19 1377 1
## 5 u_a[64] -0.11 0.16 -0.11 -0.40 0.19 1161 1
## 6 u_a[77] -0.11 0.19 -0.11 -0.47 0.25 1267 1
Notice that we used the head()
function to return the
first six rows of the resulting data frame. The subject numbers are in
square brackets: ID 45 has the lowest u_a parameter value. To
find the individuals with the highest u_a values, wrap
Mean
in desc()
(this arranges the data frame
on descending values of Mean).
## Parameter Mean SE Median 2.5% 97.5% n_eff Rhat
## 1 u_a[9] 0.65 0.18 0.64 0.33 1.02 1159 1
## 2 u_a[61] 0.56 0.18 0.55 0.23 0.93 1071 1
## 3 u_a[57] 0.55 0.19 0.55 0.19 0.93 1060 1
## 4 u_a[49] 0.54 0.15 0.54 0.23 0.83 1099 1
## 5 u_a[22] 0.52 0.18 0.52 0.17 0.87 1597 1
## 6 u_a[32] 0.52 0.16 0.52 0.21 0.85 1374 1
Important: To use this method accurately, please
ensure that the subjects’ ID numbers in your data are sequential
integers from 1 to however many subjects there are in your sample. The
underlying Stan MCMC sampler requires sequential integer subject IDs,
and if your IDs are something else (letters, non-sequential integers,
etc.) mlm()
will coerce the subject numbers to 1:N
sequential integers before fitting the model.
Thanks to Xiao Hu for asking this question.