A tutorial for deception detection analysis

Computational notebook

Authors
Affiliations

Matti Vuorre

Mircea Zloteanu

Kingston University London

Published

May 16, 2023

Preface

This document is a computational notebook supplement to “A Tutorial for Deception Detection Analysis or: How I Learned to Stop Aggregating Veracity Judgements and Embraced Bayesian Mixed Effects Models” (Zloteanu & Vuorre, 2023). We discuss Signal Detection Theory (SDT) and show how to apply it in the area of deception research. Moreover, we implement SDT models as Generalized Linear Mixed Models (GLMMs) in a Bayesian inferential framework in R. The document is written as a Quarto computational notebook; you can download the source code from GitHub or Zenodo and reproduce the analyses, for example in RStudio.

R environment

First, ensure that your current working directory contains both the analysis script and the data file. It is best to download the whole project directory so that you have all the dependencies. If you use RStudio, open the relevant R Project file (deception-sdt.Rproj).

Then, make sure that you have the required R packages installed. We use renv to ensure that you can install and use the exact same versions of the packages. To install those packages, execute renv::restore() in the R console (you only need to do this once).
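
For example, assuming you have opened the project and renv.lock is in the working directory, restoring the package library is a single call (the commented install.packages line is only needed if renv itself is not yet installed):

Code
# Install the exact package versions recorded in renv.lock (run once)
# install.packages("renv")
renv::restore()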

You can now load the required packages (click “Code” to show the code here, and in code blocks in the rest of the document). (Note that the MCMC program Stan may have some problems on Windows computers; see the Stan installation documentation for more information.)

Code
library(knitr)
library(brms)
library(emmeans)
library(bayestestR)
library(patchwork)
library(varde)
library(scales)
library(posterior)
library(tidybayes)
library(ggdist)
library(distributional)
library(parameters)
library(tidyverse)

We then set some options for the document output.

Code
theme_set(
  theme_linedraw() +
    theme(panel.grid = element_blank())
)
bayesplot::color_scheme_set(scheme = "brewer-Spectral")
options(digits = 3)
knitr::opts_chunk$set(
  # knitr chunk options (e.g. figure dimensions) can be set here as needed
)

Next, we set options for the Bayesian model estimation procedures. We use as many physical CPU cores as are available on the machine.

Code
options(
  mc.cores = parallel::detectCores(logical = FALSE)
)
# Directory for cached model fits
dir.create("models", showWarnings = FALSE)

Preliminaries

Data

For this tutorial, we use a synthetic copy of previously published data. Note that all categorical predictors are R factors, and the outcome is numeric (0s and 1s).

Code
d <- read_rds("data/dataset-synthetic.rds")
head(d)
Table 1: First six rows of the example data.
Participant Training LieType Stimulus isLie sayLie
p003 Emotion Experiential 1-1_T No 1
p003 Emotion Experiential 2-2_L Yes 1
p003 Emotion Experiential 3-2_L Yes 1
p003 Emotion Experiential 4-1_T No 0
p003 Emotion Experiential 6-1_L Yes 1
p003 Emotion Experiential 7-1_L Yes 1
  • Training is a between-subjects variable indicating which training group the participant was in
  • LieType is a within-subjects variable indicating what type of lie was presented on the trial
  • Stimulus indicates which stimulus was presented
  • isLie indicates whether the stimulus was a lie (Yes) or not (No)
  • sayLie is the participant’s response (0 = “not lie”, 1 = “lie”)
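
As a quick check of the variable types just described (factors for the categorical predictors, a 0/1 numeric response), one can inspect the data structure; a minimal sketch:

Code
# Each predictor should print as a factor; sayLie as an integer or double
glimpse(d)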

Non-SDT analysis

We first analyse these data with correct / incorrect classifications. Typically, this amounts to aggregating each person’s data to proportions correct (in different conditions / groups), and then visualizing and modelling those. For this example, we ignore any groups and conditions.

First, we create a new variable encoding response accuracy.

Code
d <- d %>% 
  mutate(
    Accuracy = as.integer(as.integer(isLie) - 1 == sayLie)
  )
head(d)
Table 2: First six rows of the example data with response accuracy.
Participant Training LieType Stimulus isLie sayLie Accuracy
p003 Emotion Experiential 1-1_T No 1 0
p003 Emotion Experiential 2-2_L Yes 1 1
p003 Emotion Experiential 3-2_L Yes 1 1
p003 Emotion Experiential 4-1_T No 0 1
p003 Emotion Experiential 6-1_L Yes 1 1
p003 Emotion Experiential 7-1_L Yes 1 1

We then summarize the data to proportion correct for each person.

Code
d_accuracy_participant <- d %>% 
  summarise(
    Accuracy = mean(Accuracy), 
    .by = Participant
  )

We can then display these proportions correct as a simple histogram.

Code
d_accuracy_participant %>%
  ggplot(aes(Accuracy)) +
  scale_x_continuous(
    "Percent correct",
    limits = c(0, 1),
    labels = percent
  ) +
  scale_y_continuous(
    "Count",
    expand = expansion(c(0, .05))
  ) +
  geom_histogram(
    color = "black",
    fill = "grey80"
  )
Figure 1: Histogram of proportions correct.

A t-test shows that, on average, participants perform at chance:

Code
t.test(d_accuracy_participant$Accuracy, mu = 0.5) %>% 
  parameters()
Table 3: Parameter estimate from t-test of proportions correct against chance.
Parameter Mean mu Difference CI CI_low CI_high t df_error p Method Alternative
d_accuracy_participant$Accuracy 0.499 0.5 -0.001 0.95 0.482 0.516 -0.111 105 0.912 One Sample t-test two.sided

However, this simple measure of accuracy does not differentiate between bias (e.g. a preference for “lie” over “not lie” responses) and the ability to discriminate between lies and true stimuli. Signal Detection Theory allows studying bias and sensitivity to lies separately.

Signal Detection Theory

Introduction

At the heart of Signal Detection Theory is the idea that following an observation period during which a stimulus may or may not have been presented, an observer evaluates her internal degree of evidence for whether a stimulus was presented. Because of noise in the perceptual-cognitive apparatus, the latent evidence signals are not simply absent (for observation periods, or generally trials, with no stimulus) or present (for trials with a stimulus), but are instead assumed to fall on a continuum. The observer then has, or develops, some inner criterion that this latent evidence must exceed for her to report “Yes, a signal was present during the observation period”. On each trial, a random value from the latent evidence continuum is realized, and the participant’s decision task is simple: if this particular evidence value is greater than the criterion, the participant responds “Yes.”
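
To make this concrete, below is a small simulation sketch of the generative process just described. The parameter values (and the object name sim) are made up for illustration and are not estimated from the data.

Code
set.seed(1)
n <- 10           # trials per stimulus type
dprime <- 1       # lie trials have a higher mean evidence value
criterion <- 0.3  # respond "lie" whenever evidence exceeds this value
sim <- tibble(
  isLie = rep(c(0, 1), each = n),
  evidence = rnorm(2 * n, mean = isLie * dprime, sd = 1),
  sayLie = as.integer(evidence > criterion)
)
head(sim)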

SDT models are applied (sometimes routinely) in a wide range of research areas. They are very popular in, for example, perception and memory research. Depending on the task at hand and concept under study, the mysterious quantity “latent evidence” is often labelled differently. For instance, in memory research it is often called “memory strength” or “familiarity”.

In the current application, we are interested in participants’ judgments of whether a just-seen stimulus is a lie or not. SDT assumes that the responses originate from a latent continuous quantity that is the participant’s perceived falsehood of the stimulus. Here, we will refer to this subjective impression on each trial as the participant’s “internal evidence” for the falsehood of the stimulus, or just evidence for convenience. Note that this evidence is a latent quantity that is never observed by the experimenter, but instead is the participant’s internal response to the stimulus at hand. There are then hypothesized to be two evidence distributions: One for lie trials, and one for true trials. SDT allows us to examine the degree to which lies and true stimuli lead to different evidence distributions.

If participants are completely unable to detect any difference between lies and truths, these two distributions should be identical. Alternatively, if participants are somehow sensitive to the falsehood of the stimulus, the internal evidence distribution for lies should have a greater average value than the distribution for truths.

In addition, to make a decision the participant needs some criterion to which each evidence signal is compared. SDT allows us to examine where participants set this criterion. For example, it may be of interest whether participants show a liberal criterion and are likely to respond “lie” to just about anything.

One of the great benefits of SDT is that it allows us to separate sensitivity and bias in examining task performance.

We visualize the basic SDT framework for deception tasks in Figure 2. This figure shows two internal evidence distributions with some made-up parameter values. In red is the distribution for truths: Sometimes the realized evidence value on that trial is low, sometimes high, but importantly there is some variation. In blue is the evidence distribution for lies: Sometimes the value is high, sometimes low, but in this example it has a greater mean than the distribution for truths.

Code
sdt_draw <- function(d, crit) {
  tibble(
    d = c(0, d),
    isLie = factor(d, labels = c("No", "Yes"))
  ) %>% 
    ggplot() +
    scale_y_continuous(
      "Density",
      expand = expansion(c(0, 0.025))
    ) +
    scale_x_continuous(
      "Evidence"
    ) +
    scale_color_brewer(
      "Lie stimulus",
      palette = "Set1",
      aesthetics = c("color")
    ) +
    geom_vline(
      xintercept = crit,
      linewidth = 0.5
    ) +
    stat_slab(
      aes(
        xdist = dist_normal(d), 
        color = isLie,
        fill = after_scale(alpha(color, .25))
      ),
      p_limits = c(0.0001, 0.9999)
    ) +
    # Criterion
    annotate(
      geom = "curve",
      ncp = 9, curvature = -0.3,
      x = 3, xend = crit, y = 0.5, yend = 0.2,
      arrow = arrow(length = unit(0.5, "lines"), type = "closed", angle = 20)
    ) +
    annotate(
      geom = "text",
      x = 3, y = 0.53,
      size = 3.4,
      label = "Criterion"
    ) +
    # Sensitivity
    annotate(
      geom = "segment",
      x = d, xend = 0.0, y = 0.901, yend = 0.901,
      arrow = arrow(ends = "both", length = unit(0.5, "lines"), type = "open", angle = 90)
    ) +
    annotate(
      geom = "curve",
      ncp = 9, curvature = 0.4, angle = 135,
      x = 3.3, xend = mean(c(0, d)), y = 0.901, yend = 0.901,
      arrow = arrow(length = unit(0.5, "lines"), type = "closed", angle = 20)
    ) +
    annotate(
      geom = "text",
      x = 3.3, y = 0.87,
      size = 3.4,
      label = "Discriminability"
    ) +
    theme(
      axis.text.y = element_blank(),
      axis.ticks.y = element_blank()
    )
}
sdt_draw(1, 0.3)
Figure 2: Illustration of the basic SDT model.

Although SDT-based analyses might seem theory-laden, statistically this is simply a widely used model for binary responses with a binary predictor.
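
To illustrate, a single probit regression on the pooled trial-level data is already an (aggregate) SDT model. The sketch below is not part of the analyses that follow, and assumes isLie still uses its default treatment coding with “No” as the reference level: the intercept is then the z-scored false alarm rate and the slope is d-prime, both pooled over participants and stimuli.

Code
# Aggregate SDT model as an ordinary probit GLM (pooling all trials)
glm(sayLie ~ isLie, family = binomial(link = "probit"), data = d) %>% 
  coef()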

Practice

In practice, the parameters of this basic SDT model can be calculated from two proportions. To calculate those proportions, we first classify each trial into four different categories: Hits are trials in which a lie is presented, and the participant correctly responds “lie”. Correct rejections are trials in which a truth is presented, and the participant correctly responds “not lie”. False alarms are trials in which a truth is presented, but the participant incorrectly responds “lie”. Misses are trials in which a lie is presented, but the participant incorrectly responds “not lie”.

Code
sdt <- d %>% 
  mutate(
    Type = case_when(
      isLie == "No" & sayLie == 1 ~ "fa", # False alarm
      isLie == "No" & sayLie == 0 ~ "cr", # Correct rejection
      isLie == "Yes" & sayLie == 1 ~ "hit", # Hit
      isLie == "Yes" & sayLie == 0 ~ "miss", # Miss
    )
  )
head(sdt)
Table 4: Trials with SDT classification.
Participant Training LieType Stimulus isLie sayLie Accuracy Type
p003 Emotion Experiential 1-1_T No 1 0 fa
p003 Emotion Experiential 2-2_L Yes 1 1 hit
p003 Emotion Experiential 3-2_L Yes 1 1 hit
p003 Emotion Experiential 4-1_T No 0 1 cr
p003 Emotion Experiential 6-1_L Yes 1 1 hit
p003 Emotion Experiential 7-1_L Yes 1 1 hit

Below is a table illustrating all the classifications’ counts in the example data.

Code
count(sdt, isLie, sayLie, Type, Accuracy)
Table 5: Counts of SDT trial types.
isLie sayLie Type Accuracy n
No 0 cr 1 1254
No 1 fa 0 866
Yes 0 miss 0 1258
Yes 1 hit 1 862

We leave it as an exercise to the reader to find each classification above in Figure 2.

To calculate the SDT parameters, we must then aggregate the data to per-participant counts of these four categories. For computations, we also pivot the data to a “wide” format.

Code
# Aggregate per participant
sdt <- sdt %>% 
  count(Participant, Type) %>% 
  pivot_wider(names_from = Type, values_from = n)
head(sdt)
Table 6: SDT classification counts.
Participant cr fa hit miss
p001 8 12 11 9
p002 9 11 6 14
p003 7 13 10 10
p004 18 2 9 11
p005 14 6 6 14
p006 16 4 9 11

The hit rate is then the proportion of “hits” over the sum of “hits” and “misses”. The false alarm rate is the proportion of “false alarms” over the sum of false alarms and correct rejections. Those rates are then “z-scored” to establish the latent continuum as a standard normal distribution. Sensitivity, often called d-prime, is then the distance between the distributions’ means. The response criterion is calculated as negative half of the sum of the z-scores. In practice, we calculate all these in R as follows:

Code
# Calculate measures
sdt <- sdt %>% 
  mutate(
    hr = hit / (hit + miss),
    far = fa / (fa + cr),
    zhr = qnorm(hr),
    zfa = qnorm(far),
    dprime = zhr - zfa,
    crit = -0.5 * (zhr + zfa)
  )
head(sdt)
Participant cr fa hit miss hr far zhr zfa dprime crit
p001 8 12 11 9 0.55 0.60 0.126 0.253 -0.128 -0.190
p002 9 11 6 14 0.30 0.55 -0.524 0.126 -0.650 0.199
p003 7 13 10 10 0.50 0.65 0.000 0.385 -0.385 -0.193
p004 18 2 9 11 0.45 0.10 -0.126 -1.282 1.156 0.704
p005 14 6 6 14 0.30 0.30 -0.524 -0.524 0.000 0.524
p006 16 4 9 11 0.45 0.20 -0.126 -0.842 0.716 0.484

We can then compute basic statistical tests on these person-specific parameters. For example, we can ask whether sensitivity (dprime) was greater than zero.

Code
t.test(sdt$dprime) %>% 
  parameters()
Results of a t-test of dprime against zero.
Parameter Mean mu Difference CI CI_low CI_high t df_error p Method Alternative
sdt$dprime 0.007 0 0.007 0.95 -0.085 0.1 0.157 105 0.876 One Sample t-test two.sided

And whether participants show a liberal bias (negative criterion) or a conservative one (positive criterion).

Code
t.test(sdt$crit) %>% 
  parameters()
Results of a t-test of criterion against zero.
Parameter Mean mu Difference CI CI_low CI_high t df_error p Method Alternative
sdt$crit 0.247 0 0.247 0.95 0.206 0.288 12 105 0 One Sample t-test two.sided

In these example data, sensitivity was not statistically significantly different from zero: on average, participants are unable to tell lies from true stimuli. The criterion, on the other hand, was above zero, so participants were generally conservative in answering “lie”. We can use the aggregate parameters to redraw Figure 2 from above (Figure 3):

Code
sdt_draw(mean(sdt$dprime), mean(sdt$crit))
Figure 3: SDT model estimated from example data: The implied evidence distributions for lie and truth trials. As is seen, the distributions overlap completely and thus sensitivity is at zero. Criterion is above zero, so on average, ‘lie’ responses are less likely; this is often called a conservative criterion (to respond with ‘lie’).

We can also visualize the participants’ parameters. Note that there were very few trials per participant, so many participants have identical parameter estimates. We therefore reduce overplotting by jittering the points a little bit.

Code
sdt %>% 
  ggplot(aes(crit, dprime)) +
  geom_hline(yintercept = 0, lty = 2, linewidth = .25) +
  geom_vline(xintercept = 0, lty = 2, linewidth = .25) +
  labs(x = "Criterion", y = "Sensitivity") +
  geom_smooth(
    method = "lm",
    linewidth = .33,
    color = "black",
    alpha = .2
  ) +
  geom_point(
    shape = 1, 
    position = position_jitter(.01, .01)
  ) +
  theme(aspect.ratio = 1)
Figure 4: Scatterplot of participant-specific criterion and sensitivity parameters.

Relation with accuracy

Proportions correct are sometimes very similar to d-primes (Figure 5), but this does not hold in general (Macmillan and Creelman 2005, chap. 1).

Code
pa <- left_join(
  sdt,
  d_accuracy_participant
) %>% 
  ggplot(aes(Accuracy, dprime)) +
  scale_x_continuous(
    "Percent correct",
    breaks = extended_breaks(5),
    labels = percent
  ) +
  scale_y_continuous(
    "Sensitivity",
    breaks = extended_breaks(7)
  ) +
  geom_hline(yintercept = 0, lty = 2, linewidth = .25) +
  geom_vline(xintercept = 0.5, lty = 2, linewidth = .25) +
  geom_smooth(
    method = "lm",
    linewidth = .33,
    color = "black",
    alpha = .2
  ) +
  geom_point(shape = 1, position = position_jitter(.005, .01)) +
  theme(aspect.ratio = 1)
pb <- last_plot() + 
  aes(y = crit) + 
  scale_y_continuous(
    "Criterion",
    breaks = extended_breaks(7)
  )
pa | pb
Figure 5: Scatterplot of participant-specific sensitivity and accuracy (proportion correct) parameters.

Now that we have gleaned some insight into the inner workings of the SDT framework, we turn to estimating SDT models as generalized linear mixed models (GLMMs).

Model 0

Above, we used traditional methods and formulas for calculating SDT parameters for each participant. We then used those person-specific estimates in a further statistical test. We can get equivalent average and participant-specific parameters more efficiently with a GLMM. We label this model with a zero to indicate that it is a “baseline” model: We do not yet include any parameters to estimate the effects of various experimental manipulations.

Coding of predictor variables

We first need to ensure the categorical predictors are used appropriately in the model. We “contrast-code” isLie. This ensures the model intercept is “between” truth and lie trials, or their average. The intercept will then correspond to −1 × the criterion (the negative criterion).

Code
contrasts(d$isLie) <- c(-0.5, 0.5)
contrasts(d$isLie)
    [,1]
No  -0.5
Yes  0.5

Model specification

We then define the GLMM as a mixed effects formula in R’s extended formula syntax. Below, we regress sayLie (0 = “not lie”, 1 = “lie”) on an intercept (in R syntax, 1 means an intercept because it adds a column of 1s to the design matrix) and the slope of isLie (whether the stimulus was a truth or a lie). We also specify both as correlated random effects across Participants, and the intercepts in addition as random across Stimulus.

Code
f0 <- sayLie ~ 1 + isLie + 
  (1 + isLie | Participant) + 
  (1 | Stimulus)

With this syntax, the model is parameterized such that there are two main population-level parameters. The intercept describes the (negative) criterion: it is half the sum of the z-scores of the hit and false alarm rates, or the z-score of the “lie” response probability for the “average” stimulus. The slope of isLie indicates sensitivity: the separation of the latent evidence distributions between truth and lie trials. This parameter is typically called d-prime in many SDT applications.
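
To spell out that mapping, here is a small sketch with made-up coefficient values (b0 and b1 below are illustrative numbers, not model estimates): under the -0.5/0.5 contrast coding, the slope equals d-prime and the negated intercept equals the criterion.

Code
b0 <- -0.25            # intercept (illustrative)
b1 <- 0.10             # slope of isLie (illustrative)
zfar <- b0 - 0.5 * b1  # z-scored false alarm rate (isLie = -0.5)
zhr <- b0 + 0.5 * b1   # z-scored hit rate (isLie = +0.5)
c(dprime = zhr - zfar, criterion = -0.5 * (zhr + zfar))  # equals c(b1, -b0)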

Prior distributions

Although it is optional, for illustrative purposes we define some weakly informative prior distributions for the population-level parameters.

Code
p0 <- prior(normal(0, 1), class = b) +
  prior(student_t(3, 0, 1), class = sd) +
  prior(lkj(1), class = cor)

Sampling

With the model formula, data, and prior distributions, we can then sample from the model’s posterior distribution. In addition, it is critical here to use the Bernoulli outcome distribution with a probit link; this specifies that we assume normality of the latent variable. We also specify the “file” argument: some models take a while to sample, and with “file” the fitted model is saved to disk and loaded from there on subsequent runs.

Code
m0 <- brm(
  formula = f0, 
  family = bernoulli(link = probit),
  data = d,
  prior = p0,
  file = "models/m0"
)

It is good practice to then check model convergence. For simple models there are almost never any issues. Nevertheless, you should always at least use numerical model diagnostics to evaluate whether the sampling has performed as intended. The most commonly used metric is called “Rhat”, and this number should be almost exactly 1.0 (within about 0.01 of 1.0). We will return to this below, when looking at the model’s numerical summary.
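
As a quick numerical check, the largest Rhat across all of the model’s parameters can also be extracted directly with brms’ rhat(); a minimal sketch:

Code
# Largest Rhat across all parameters; values near 1.0 indicate convergence
max(rhat(m0), na.rm = TRUE)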

Another check is to visually inspect whether the four chains of samples are well mixed:

Code
mcmc_plot(m0, type = "trace") +
  theme(legend.position = "bottom")
Figure 6: Trace plots of the main parameters’ MCMC chains.

Similarly, we want to see evidence that the model is able to reproduce the observed data:

Code
pp_check(m0, type = "bars_grouped", group = "isLie", ndraws = 100) + 
  scale_x_continuous(breaks = c(0, 1), labels = c("No", "Yes"))
Figure 7: Posterior predictive check.

Model summary

The default model summary is printed with summary(). Here we use an additional argument to show the prior distributions used.

Code
summary(m0, prior = TRUE)
 Family: bernoulli 
  Links: mu = probit 
Formula: sayLie ~ 1 + isLie + (1 + isLie | Participant) + (1 | Stimulus) 
   Data: d (Number of observations: 4240) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Priors: 
b ~ normal(0, 1)
Intercept ~ student_t(3, 0, 2.5)
L ~ lkj_corr_cholesky(1)
<lower=0> sd ~ student_t(3, 0, 1)

Group-Level Effects: 
~Participant (Number of levels: 106) 
                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept)             0.13      0.03     0.07     0.18 1.00     2109
sd(isLie1)                0.34      0.06     0.23     0.46 1.00     2085
cor(Intercept,isLie1)    -0.86      0.12    -1.00    -0.54 1.01     1382
                      Tail_ESS
sd(Intercept)             2457
sd(isLie1)                2983
cor(Intercept,isLie1)     2297

~Stimulus (Number of levels: 40) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.58      0.07     0.46     0.74 1.00     1318     1913

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -0.26      0.10    -0.46    -0.05 1.00      892     1390
isLie1       -0.00      0.19    -0.37     0.36 1.01     1105     1896

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

We first return to the “Rhat” metric: each population-level effect above has an Rhat value of ~1.0, so we are reassured of model convergence. If this were not the case, we would first try drawing more samples from the model (see ?brm).
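
For instance, one possible remedy is to refit the model with more iterations using update(); a sketch (the object name m0_more is just illustrative):

Code
# Refit the same model with more warmup and post-warmup draws
m0_more <- update(m0, iter = 4000, warmup = 1000)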

After confirming that the model estimation algorithm has converged, and that the model doesn’t have glaring errors in reproducing the data, we can proceed to interpret the estimates. Instead of the above verbose output, we use parameters() to display a neat summary. The regression coefficients describe the negative criterion (Intercept) and d-prime (slope):

Code
parameters(m0, centrality = "mean")
Table 7: Summary of the baseline model’s main parameters.
Parameter Mean CI CI_low CI_high pd Rhat ESS
b_Intercept -0.257 0.95 -0.462 -0.052 0.993 1.00 877
b_isLie1 -0.002 0.95 -0.375 0.365 0.501 1.01 1104

The above numbers are the posterior means and 95%CIs, calculated from the samples we drew from the posterior distribution using brm() above. (Note that these summaries differ somewhat from the manually calculated basic SDT summaries above because the model is different: it includes random stimulus intercepts.) Those samples can also be visualized (note that below we flip the intercept to plot the criterion directly):

Code
gather_draws(m0, b_Intercept, b_isLie1) %>% 
  mutate(
    .variable = factor(
      .variable, 
      levels = c("b_Intercept", "b_isLie1"), 
      labels = c("Criterion", "dprime")
    )
  ) %>% 
  # Negate intercept to criterion
  mutate(.value = if_else(.variable == "Criterion", -.value, .value)) %>% 
  ggplot(aes(.value, .variable)) +
  scale_x_continuous(
    "Parameter value",
    breaks = extended_breaks(7)
  ) +
  scale_y_discrete(
    "Parameter",
    expand = expansion(0.01)
  ) +
  geom_vline(xintercept = 0, linewidth = 0.25) +
  stat_halfeye(
    adjust = 1.5,
    slab_color = "black",
    slab_fill = "lightskyblue2",
    slab_linewidth = 0.25,
    normalize = "xy",
    height = 0.75
  )
Figure 8: Estimated parameters’ posterior distributions for Model 0.

Variance decomposition

Code
# Decompose the posterior variance into participant, stimulus, and residual components
var_m0 <- varde(m0)
plot(var_m0, type = "river")

Alternative parameterisation 1

The above parameterisation is the most straightforward and produces parameter estimates that directly reflect the (negative) criterion and d-prime. However, because the model is parameterized as a linear model with an intercept and a slope, the z-scored hit and false alarm rates can end up having different priors even though that might not be intended.

It is then often useful to reparameterize the model such that the population-level parameters indicate means of the latent variable for each level of isLie. That is, instead of estimating an intercept and a slope, we directly estimate the z-scored false alarm and hit rates. This places the same prior on the two veracity conditions’ latent-variable means. Because we are treating isLie as a factor in R, removing the intercepts with a 0 does the trick:

Code
f0a <- sayLie ~ 0 + isLie + 
  (0 + isLie | Participant) + 
  (1 | Stimulus)
m0a <- brm(
  f0a,  
  family = bernoulli(link = probit),
  data = d,
  file = "models/m0a"
)
parameters(m0a, centrality = "mean")
Parameter Mean CI CI_low CI_high pd Rhat ESS
b_isLieNo -0.271 0.95 -0.546 0.004 0.974 1 1059
b_isLieYes -0.267 0.95 -0.531 0.003 0.973 1 947

These parameters now reflect the model-predicted z-scores of the probabilities of answering “lie” for true and lie stimuli, respectively. We can convert them to SDT metrics as follows (note that, unlike the previous model’s intercept, the quantity computed here is the criterion itself rather than its negative):

Code
as_draws_df(m0a, variable = "b_", regex = TRUE) %>% 
  mutate(
    dprime = b_isLieYes - b_isLieNo,
    criterion = -0.5 * (b_isLieYes + b_isLieNo)
  ) %>% 
  parameters(centrality = "mean")
Parameter Mean CI_low CI_high pd
b_isLieNo -0.271 -0.546 0.004 0.974
b_isLieYes -0.267 -0.531 0.003 0.973
dprime 0.005 -0.368 0.398 0.513
criterion 0.269 0.084 0.464 0.998

Alternative parameterisation 2

We can also directly parameterise the model as dprime and criterion. However, this is slightly more complicated because it requires brms’ nonlinear syntax.

Code
f0nl <- bf(
  sayLie ~ Phi(dprime * isLie - criterion),
  dprime ~ 1 + (1 |s| Participant), 
  criterion ~ 1 + (1 |s| Participant) + (1 | Stimulus),
  nl = TRUE
)
p0nl <- prior(normal(0, 1), nlpar = "dprime") +
  prior(normal(0, 1), nlpar = "criterion") +
  prior(normal(0, 1), nlpar = "dprime", class = "sd") +
  prior(normal(0, 1), nlpar = "criterion", class = "sd")
m0nl <- brm(
  f0nl,  
  family = bernoulli(link = "identity"),
  data = d,
  prior = p0nl,
  file = "models/m0nl"
)
parameters(m0nl, centrality = "mean")
Parameter Mean CI CI_low CI_high pd Rhat ESS
b_dprime_Intercept -0.007 0.95 -0.378 0.367 0.520 1 729
b_criterion_Intercept 0.262 0.95 0.076 0.455 0.995 1 607

Varying effects

Person-parameters’ shrinkage

Code
bind_rows(
  coef(m0)$Participant[,,1] %>% 
    as.data.frame() %>% 
    rownames_to_column("Participant"),
  coef(m0)$Participant[,,2] %>% 
    as.data.frame() %>% 
    rownames_to_column("Participant"),
  .id = "Parameter"
) %>% 
  tibble() %>% 
  mutate(Parameter = factor(Parameter, labels = c("Criterion", "dprime"))) %>% 
  pivot_wider(names_from = Parameter, values_from = Estimate:Q97.5) %>% 
  left_join(select(sdt, Participant, crit, dprime)) %>% 
  ggplot(aes(crit, dprime)) +
  geom_hline(yintercept = 0, lty = 2, linewidth = .25) +
  geom_vline(xintercept = 0, lty = 2, linewidth = .25) +
  labs(x = "Criterion", y = "dprime") +
  geom_segment(
    aes(
      x = crit, xend = -Estimate_Criterion,
      y = dprime, yend = Estimate_dprime
    ),
    linewidth = .25
  ) +
  geom_point(shape = 21, fill = "white", col = "grey30") +
  geom_point(
    aes(-Estimate_Criterion, Estimate_dprime)
  ) +
  theme(
    aspect.ratio = 1
  )
Figure 9: Scatterplot of participant-specific criterion and sensitivity parameters as estimated by Model 0 (filled points) and manual calculation (empty points). Line segments connect parameter estimates from the two estimation methods to illustrate shrinkage.

All varying effects

Code
set.seed(1)
p_m0_coef_participant <- m0 %>% 
  spread_draws(
    b_Intercept, b_isLie1, 
    r_Participant[Participant,Parameter] | Parameter
  ) %>% 
  mutate(
    Criterion = -(Intercept + b_Intercept),
    dprime = isLie1 + b_isLie1,
    .keep = "unused"
  ) %>% 
  ungroup() %>% 
  filter(Participant %in% sample(unique(.$Participant), 20)) %>% 
  mutate(Participant = fct_reorder(factor(Participant), Criterion)) %>% 
  pivot_longer(c(Criterion, dprime)) %>% 
  ggplot(aes(value, Participant)) +
  geom_vline(xintercept = 0, lty = 2, linewidth = .25) +
  stat_halfeye(
    slab_color = NA,
    slab_fill = "darkorange",
    slab_linewidth = .2,
    interval_size_range = c(.4, .8),
    height = .75
  ) +
  labs(x = "Parameter value") +
  facet_wrap("name")
Code
p_m0_coef_stimulus <- m0 %>% 
  spread_draws(
    b_Intercept, r_Stimulus[Stimulus, Parameter] | Parameter
  ) %>% 
  mutate(
    Criterion = -(Intercept + b_Intercept),
    .keep = "unused"
  ) %>% 
  ungroup() %>% 
  mutate(
    Stimulus = fct_reorder(factor(Stimulus), Criterion),
    name = "Criterion"
  ) %>% 
  ggplot(aes(Criterion, Stimulus)) +
  geom_vline(xintercept = 0, lty = 2, linewidth = .25) +
  stat_halfeye(
    slab_color = NA,
    slab_fill = "indianred1",
    slab_linewidth = .2,
    interval_size_range = c(.4, .8),
    height = .75
  ) +
  scale_x_continuous(
    "Parameter value",
    breaks = extended_breaks(7)
  ) +
  facet_wrap("name")
Code
(p_m0_coef_participant | p_m0_coef_stimulus) &
  theme(
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank()
  )
Figure 10: Caterpillar plots of participant- (left two panels) and stimulus-specific (right panel) parameters. We show participant-specific parameters for a random sample of 20 participants to reduce overplotting.

Model 1

We then proceed to a more complex model, and ask whether bias or sensitivity differ between the training groups. First, we ensure that “None” is the baseline level for Training.

Code
d <- d %>% 
  mutate(
    Training = fct_relevel(Training, "None")
  )

Then, we can simply enter Training as a main effect and interaction with isLie.

Code
f1 <- sayLie ~ 1 + isLie * Training + 
  (1 + isLie | Participant) + 
  (1 | Stimulus)
m1 <- brm(
  f1,  
  family = bernoulli(link = probit),
  data = d,
  file = "models/m1"
)

With the default parameterization, in which Training is entered with dummy (treatment) contrasts, the intercept is the negative criterion for the Training = “None” group. The two Training main effects are the differences in the negative criterion between this baseline group and the other two groups.

Code
contrasts(d$Training)
        Bogus Emotion
None        0       0
Bogus       1       0
Emotion     0       1

Model summary

The main effect of isLie indicates dprime for the baseline group, and the two interactions are differences in dprime between the baseline and the two other training groups. We display summaries of the estimates below, negating the intercept-related terms so that they correspond to criteria.

Code
gather_draws(m1, `b_.*`, regex = TRUE) %>%
  mutate(
    .value = if_else(
      str_detect(.variable, "isLie"), 
      .value, 
      -.value
    )
  ) %>% 
  mean_qi()
.variable .value .lower .upper .width .point .interval
b_Intercept 0.228 0.015 0.439 0.95 mean qi
b_TrainingBogus 0.052 -0.071 0.174 0.95 mean qi
b_TrainingEmotion 0.052 -0.071 0.176 0.95 mean qi
b_isLie1 -0.220 -0.668 0.206 0.95 mean qi
b_isLie1:TrainingBogus 0.303 0.038 0.559 0.95 mean qi
b_isLie1:TrainingEmotion 0.289 0.016 0.544 0.95 mean qi

We can then use functions in the emmeans package to calculate other quantities, such as converting differences and interactions to conditional means. Note, however, that within emmeans we cannot negate the intercepts, so the quantities below are negative criteria.

Code
# (Negative) criteria
emm_m1_c1 <- emmeans(m1, ~Training)

# Differences in (negative) criteria
emm_m1_c2 <- emmeans(m1, ~Training) %>% 
  contrast("revpairwise")

# Dprimes for three groups
emm_m1_d1 <- emmeans(m1, ~isLie + Training) %>% 
  contrast("revpairwise", by = "Training")

# Differences between groups
emm_m1_d2 <- emmeans(m1, ~isLie + Training) %>% 
  contrast(interaction = c("revpairwise", "revpairwise"))

list(emm_m1_c1, emm_m1_c2, emm_m1_d1, emm_m1_d2) |> 
  map(~parameters(., centrality = "mean")) |> 
  bind_rows() |> 
  select(c(1:2, 4:5))
Table 8: (Negative) criteria conditional means and group differences (first six rows), and groups’ dprimes and differences therein (last six rows).
Parameter Mean CI_low CI_high
None -0.228 -0.439 -0.015
Bogus -0.280 -0.481 -0.074
Emotion -0.280 -0.478 -0.074
Bogus - None -0.052 -0.174 0.071
Emotion - None -0.052 -0.176 0.071
Emotion - Bogus 0.000 -0.112 0.113
Yes - No None -0.220 -0.668 0.206
Yes - No Bogus 0.083 -0.338 0.499
Yes - No Emotion 0.069 -0.365 0.485
Yes - No Bogus - None 0.303 0.038 0.559
Yes - No Emotion - None 0.289 0.016 0.544
Yes - No Emotion - Bogus -0.013 -0.248 0.224

Typically, we are more interested in visualizations than tables. We saved the quantities above in objects, which we visualize below (note we flip the intercepts to directly plot criteria):

Code
emm_m1_c <- bind_rows(
  gather_emmeans_draws(emm_m1_c1) |> 
    mutate(panel = "Group means"),
  gather_emmeans_draws(emm_m1_c2) |> 
    rename(Training = contrast) |> 
    mutate(panel = "Differences")
) |> 
  mutate(name = "Criterion", .value = .value * -1)

emm_m1_d <- bind_rows(
  gather_emmeans_draws(emm_m1_d1) |> 
    ungroup() |> 
    mutate(panel = "Group means", name = "dprime") |> 
    select(-contrast),
  gather_emmeans_draws(emm_m1_d2) |> 
    ungroup() |> 
    rename(Training = Training_revpairwise) |> 
    mutate(panel = "Differences", name = "dprime") |> 
    select(-isLie_revpairwise)
)

bind_rows(emm_m1_c, emm_m1_d) |> 
  mutate(panel = fct_rev(panel)) |> 
  ggplot(aes(Training, .value)) +
  labs(
    x = "Training group (or difference)",
    y = "Posterior mean and 95%CI"
  ) +
  geom_hline(yintercept = 0, linewidth = 0.25) +
  scale_fill_manual(
    "In ROPE",
    values = c(
      alpha("dodgerblue1", .5), 
      alpha("dodgerblue4", .75)
    ),
    aesthetics = c("slab_fill")
  ) +
  stat_halfeye(
    normalize = "xy",
    width = 0.5,
    slab_color = "black",
    slab_linewidth = 0.25,
    aes(slab_fill = after_stat(between(y, -0.1, 0.1)))
  ) +
  facet_grid(
    name~panel, 
    scales = "free"
  ) +
  theme(legend.position = "none")
Figure 11: Posterior means and 95%CIs of the criterion and dprime parameters from Model 1.

We can also compute a one-sided “hypothesis test” on a parameter of interest:

Code
hypothesis(m1, "TrainingEmotion < 0")
Hypothesis Tests for class b:
             Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
1 (TrainingEmotion) < 0    -0.05      0.06    -0.16     0.05       3.99
  Post.Prob Star
1       0.8     
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

Variance decomposition

Code
var_m1 <- varde(m1)
plot(var_m1, type = "river")

Alternative parameterization

As before, in the linear-model parameterisation above, the prior distributions on parameters might have unintended consequences. We therefore reparameterize the model to estimate means of the latent variable instead.

Code
f1a <- sayLie ~ 0 + isLie %in% Training + 
  (0 + isLie | Participant) + 
  (1 | Stimulus)
m1a <- brm(
  f1a,  
  family = bernoulli(link = probit),
  data = d,
  file = "models/m1a"
)
Code
parameters(m1a, centrality = "mean")
Parameter Mean CI CI_low CI_high pd Rhat ESS
b_isLieNo:TrainingNone -0.128 0.95 -0.422 0.174 0.805 1.01 547
b_isLieYes:TrainingNone -0.338 0.95 -0.619 -0.045 0.991 1.00 558
b_isLieNo:TrainingBogus -0.336 0.95 -0.624 -0.036 0.985 1.01 525
b_isLieYes:TrainingBogus -0.242 0.95 -0.521 0.032 0.957 1.00 575
b_isLieNo:TrainingEmotion -0.328 0.95 -0.622 -0.033 0.985 1.01 510
b_isLieYes:TrainingEmotion -0.245 0.95 -0.521 0.034 0.958 1.01 547

Calculating the SDT parameters then takes some more work, however. We need to manually calculate them for each group.

Code
x <- gather_draws(m1a, `b_.*`, regex = TRUE) %>% 
  separate(.variable, c("isLie", "Training"), sep = ":") %>% 
  mutate(
    isLie = str_remove(isLie, "b_isLie"),
    Training = str_remove(Training, "Training")
  ) %>% 
  pivot_wider(names_from = isLie, values_from = .value) %>% 
  mutate(
    Criterion = -0.5 * (Yes + No),
    dprime = Yes - No
  ) %>% 
  select(-No, -Yes) %>% 
  pivot_longer(Criterion:dprime)
x2 <- x %>% 
  group_by(name) %>%
  compare_levels(
    value, 
    Training, 
    comparison = emmeans_comparison("revpairwise")
  )

tmp <- bind_rows(x, x2) %>% 
  mutate(
    t = if_else(
      str_detect(Training, "-"), 
      "Differences", "Group means"
    ) %>% 
      fct_inorder(),
    Training = fct_inorder(Training)
  )
Code
tmp %>% 
  ggplot(aes(Training, value)) +
  labs(
    x = "Training group (or difference)",
    y = "Posterior mean and 95%CI"
  ) +
  geom_hline(yintercept = 0, linewidth = 0.25) +
  stat_halfeye(
    normalize = "xy",
    width = 0.5,
    slab_color = "black",
    slab_linewidth = 0.25,
    slab_fill = "grey80"
  ) +
  facet_grid(
    name~t, 
    scales = "free"
  )
Figure 12: Posterior distributions with 66% (thick) and 95%CIs (thin lines) of the criterion and dprime parameters from Model 1.

Region of practical equivalence

Because our estimates are random samples from the posterior distribution, it is easy to calculate further inferential quantities. For example, we might ask whether the estimated criteria, dprimes, or their differences were practically equivalent to zero. To do so, we establish arbitrarily that the region of practical equivalence to zero is from -0.1 to 0.1. We then calculate percentages of each posterior distribution that lie within this ROPE.

Code
rope <- bind_rows(emm_m1_c, emm_m1_d) |> 
  group_by(Training, panel, name) %>% 
  group_modify(
    ~describe_posterior(
      .$.value, 
      test = "rope", 
      rope_ci = 1, 
      rope_range = c(-0.1, 0.1)
    )
  )
Code
bind_rows(emm_m1_c, emm_m1_d) |> 
  mutate(panel = fct_rev(panel)) |> 
  ggplot(aes(Training, .value)) +
  labs(
    x = "Training group (or difference)",
    y = "Parameter value"
  ) +
  geom_hline(yintercept = 0, linewidth = 0.25) +
  scale_fill_manual(
    "In ROPE",
    values = c(
      alpha("dodgerblue1", .5), 
      alpha("dodgerblue4", .75)
    ),
    aesthetics = c("slab_fill")
  ) +
  stat_halfeye(
    normalize = "xy",
    width = 0.5,
    slab_color = "black",
    slab_linewidth = 0.25,
    aes(slab_fill = after_stat(between(y, -0.1, 0.1)))
  ) +
  geom_text(
    data = rope |> mutate(panel = fct_inorder(panel)),
    aes(label = percent(ROPE_Percentage, 1)),
    y = 0.05, nudge_x = -0.2
  ) +
  facet_grid(
    name~panel, 
    scales = "free"
  ) +
  theme(legend.position = "none")
Figure 13: Posterior distributions with 66% (thick) and 95%CIs (thin lines) of the criterion and dprime parameters from Model 1. The densities are colored according to whether the value lies within (dark) or outside the ROPE from -0.1 to 0.1. Proportions of the posterior distribution within the ROPE are displayed in numbers.

Model 2

Finally, we include the within-person manipulation LieType as well.

Code
f2 <- sayLie ~ 1 + isLie * Training * LieType + 
  (1 + isLie * LieType | Participant) + 
  (1 | Stimulus)
m2 <- brm(
  f2,  
  family = bernoulli(link = probit),
  data = d,
  file = "models/m2"
)
Code
parameters(m2, centrality = "mean")
Parameter Mean CI CI_low CI_high pd Rhat ESS
b_Intercept -0.222 0.95 -0.504 0.062 0.934 1.00 988
b_isLie1 -0.544 0.95 -1.129 0.026 0.969 1.01 1021
b_TrainingBogus 0.046 0.95 -0.109 0.201 0.718 1.00 3198
b_TrainingEmotion -0.152 0.95 -0.307 0.009 0.967 1.00 3351
b_LieTypeExperiential -0.014 0.95 -0.421 0.387 0.519 1.00 966
b_isLie1:TrainingBogus 0.263 0.95 -0.059 0.571 0.952 1.00 3403
b_isLie1:TrainingEmotion 0.252 0.95 -0.048 0.550 0.949 1.00 3718
b_isLie1:LieTypeExperiential 0.679 0.95 -0.161 1.526 0.946 1.00 977
b_TrainingBogus:LieTypeExperiential -0.217 0.95 -0.432 -0.005 0.977 1.00 3314
b_TrainingEmotion:LieTypeExperiential 0.202 0.95 -0.019 0.417 0.965 1.00 3466
b_isLie1:TrainingBogus:LieTypeExperiential 0.123 0.95 -0.428 0.655 0.672 1.00 3024
b_isLie1:TrainingEmotion:LieTypeExperiential 0.073 0.95 -0.460 0.598 0.605 1.00 3299

From these estimates, we can again use emmeans to produce other inferential quantities, like differences:

Code
emm_m2_d1 <- emmeans(m2, ~isLie | Training * LieType) %>% 
  contrast("revpairwise")
emm_m2_d2 <- emmeans(m2, ~isLie + Training * LieType) %>% 
  contrast(interaction = c("revpairwise", "revpairwise"), by = "LieType")

# (Negative) criteria
emm_m2_c1 <- emmeans(m2, ~Training * LieType)
emm_m2_c2 <- emmeans(m2, ~Training | LieType) %>% 
  contrast("revpairwise")
Code
tmp <- bind_rows(
  bind_rows(
    gather_emmeans_draws(emm_m2_d1) %>% 
      group_by(Training, LieType) %>% 
      select(-contrast),
    gather_emmeans_draws(emm_m2_d2) %>% 
      rename(
        Training = Training_revpairwise
      ) %>% 
      group_by(Training, LieType) %>% 
      select(-isLie_revpairwise)
  ),
  bind_rows(
    gather_emmeans_draws(emm_m2_c1),
    gather_emmeans_draws(emm_m2_c2) %>% 
      rename(
        Training = contrast
      )
  ),
  .id = "Parameter"
) %>% 
  mutate(Parameter = factor(Parameter, labels = c("dprime", "Criterion"))) %>% 
  mutate(
    t = if_else(str_detect(Training, " - "), "Differences", "Group means") %>% 
      fct_inorder(),
    Training = fct_inorder(Training)
  ) 
tmp %>%   
  mutate(.value = if_else(Parameter == "Criterion", .value * -1, .value)) %>% 
  mutate(Parameter = fct_rev(Parameter)) %>% 
  ggplot(aes(Training, .value, slab_fill = LieType)) +
  labs(
    x = "Training group (or difference)",
    y = "Parameter value"
  ) +
  geom_hline(yintercept = 0, linewidth = .25) +
  scale_x_continuous(
    breaks = 1:6,
    labels = unique(tmp$Training)
  ) +
  scale_slab_alpha_discrete(range = c(0.5, 1.0)) +
  stat_halfeye(
    normalize = "xy",
    width = 0.33,
    slab_color = "black",
    slab_linewidth = 0.2,
    interval_size_range = c(0.5, 0.75),
    .width = 0.95,
    aes(
      side = ifelse(LieType=="Affective", "left", "right"),
      x = ifelse(LieType == "Affective", as.numeric(Training)-0.025, as.numeric(Training)+0.025),
      slab_alpha = after_stat(between(y, -0.1, 0.1))
      )
  ) +
  guides(slab_alpha = "none") +
  facet_grid(Parameter~t, scales = "free") +
  theme(legend.position = "none")
Figure 14: Posterior distributions and 95%CIs of the criterion and dprime parameters, or differences therein, from Model 2.

Variance decomposition

Code
var_m2 <- varde(m2)
plot(var_m2, type = "river")

Alternative parameterization

Code
f2a <- sayLie ~ 0 + (Training / isLie) %in% LieType + 
  (0 + LieType / isLie | Participant) + 
  (1 | Stimulus)
m2a <- brm(
  f2a,  
  family = bernoulli(link = probit),
  data = d,
  file = "models/m2a"
)
Code
parameters(m2a)
Parameter Median CI CI_low CI_high pd Rhat ESS
b_TrainingNone:LieTypeAffective -0.226 0.95 -0.524 0.065 0.943 1.00 685
b_TrainingBogus:LieTypeAffective -0.182 0.95 -0.469 0.108 0.897 1.00 652
b_TrainingEmotion:LieTypeAffective -0.381 0.95 -0.667 -0.091 0.994 1.00 643
b_TrainingNone:LieTypeExperiential -0.232 0.95 -0.526 0.071 0.935 1.01 560
b_TrainingBogus:LieTypeExperiential -0.405 0.95 -0.688 -0.119 0.997 1.01 510
b_TrainingEmotion:LieTypeExperiential -0.187 0.95 -0.465 0.105 0.894 1.01 529
b_TrainingNone:LieTypeAffective:isLie1 -0.548 0.95 -1.126 0.024 0.968 1.01 543
b_TrainingBogus:LieTypeAffective:isLie1 -0.284 0.95 -0.841 0.270 0.843 1.01 528
b_TrainingEmotion:LieTypeAffective:isLie1 -0.296 0.95 -0.870 0.263 0.838 1.01 532
b_TrainingNone:LieTypeExperiential:isLie1 0.132 0.95 -0.503 0.761 0.668 1.00 653
b_TrainingBogus:LieTypeExperiential:isLie1 0.527 0.95 -0.093 1.123 0.956 1.00 639
b_TrainingEmotion:LieTypeExperiential:isLie1 0.462 0.95 -0.149 1.069 0.934 1.00 659

Further reading

The GLMM analysis of SDT models in R is based on Vuorre (2017), which in turn drew on discussions in Rouder and Lu (2005), Rouder et al. (2007), DeCarlo (1998), DeCarlo (2010), and DeCarlo (2003). Macmillan and Creelman (2005) is a classic introductory text on SDT, and Stanislaw and Todorov (1999) discusses the calculations required for SDT metrics in some detail. Bürkner (2017) is a good introductory text on the brms package; Wickham and Grolemund (2016) is an excellent book on R; and McElreath (2016) is a recommended introductory textbook on Bayesian statistics and probabilistic modelling.

References

Bürkner, Paul-Christian. 2017. “Brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.
DeCarlo, Lawrence T. 1998. “Signal Detection Theory and Generalized Linear Models.” Psychological Methods 3 (2): 186–205. https://doi.org/10.1037/1082-989X.3.2.186.
DeCarlo, Lawrence T. 2003. “Using the PLUM Procedure of SPSS to Fit Unequal Variance and Generalized Signal Detection Models.” Behavior Research Methods, Instruments, & Computers 35 (1): 49–56. https://doi.org/10.3758/BF03195496.
DeCarlo, Lawrence T. 2010. “On the Statistical and Theoretical Basis of Signal Detection Theory and Extensions: Unequal Variance, Random Coefficient, and Mixture Models.” Journal of Mathematical Psychology 54 (3): 304–13. https://doi.org/10.1016/j.jmp.2010.01.001.
Macmillan, Neil A., and C. Douglas Creelman. 2005. Detection Theory: A User’s Guide. 2nd ed. Mahwah, N.J: Lawrence Erlbaum Associates.
McElreath, Richard. 2016. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. CRC Press.
Rouder, Jeffrey N., and Jun Lu. 2005. “An Introduction to Bayesian Hierarchical Models with an Application in the Theory of Signal Detection.” Psychonomic Bulletin & Review 12 (4): 573–604. https://doi.org/10.3758/BF03196750.
Rouder, Jeffrey N., Jun Lu, Dongchu Sun, Paul Speckman, Richard D. Morey, and Moshe Naveh-Benjamin. 2007. “Signal Detection Models with Random Participant and Item Effects.” Psychometrika 72 (4): 621–42. https://doi.org/10.1007/s11336-005-1350-6.
Stanislaw, Harold, and Natasha Todorov. 1999. “Calculation of Signal Detection Theory Measures.” Behavior Research Methods, Instruments, & Computers 31 (1): 137–49. http://link.springer.com/article/10.3758/BF03207704.
Vuorre, Matti. 2017. “Bayesian Estimation of Signal Detection Models.” October 9, 2017. https://vuorre.netlify.app/posts/2017-10-09-bayesian-estimation-of-signal-detection-theory-models.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science. http://r4ds.had.co.nz/.

Appendix

Simpler Model 0

This appendix shows that the estimated values closely match the manually calculated ones when we drop the stimulus random effects.

Code
f0a2 <- sayLie ~ 1 + isLie + (1 + isLie | Participant)
m0a2 <- brm(
  formula = f0a2,  
  family = bernoulli(link = probit),
  data = d,
  file = "models/m0a2"
)
f0a3 <- sayLie ~ 0 + isLie + (0 + isLie | Participant)
m0a3 <- brm(
  formula = f0a3,  
  family = bernoulli(link = probit),
  data = d,
  file = "models/m0a3"
)
parameters(m0a2, centrality = "mean")
Parameter Mean CI CI_low CI_high pd Rhat ESS
b_Intercept -0.237 0.95 -0.280 -0.195 1.000 1 4579
b_isLie1 0.000 0.95 -0.091 0.095 0.507 1 3887
Code
as_draws_df(m0a3, variable = "b_", regex = TRUE) %>% 
  mutate(
    dprime = b_isLieYes - b_isLieNo,
    crit = -0.5 * (b_isLieYes + b_isLieNo)
  ) %>% 
  parameters(centrality = "mean")
Parameter Mean CI_low CI_high pd
b_isLieNo -0.237 -0.306 -0.165 1.000
b_isLieYes -0.236 -0.289 -0.184 1.000
dprime 0.000 -0.086 0.089 0.507
crit 0.236 0.193 0.280 1.000
Code
sdt %>% 
  select(Participant, zfa, zhr, dprime, crit) %>% 
  pivot_longer(-Participant) %>% 
  group_by(name) %>% 
  group_modify(
    ~parameters(t.test(.$value)) %>% 
      select(Mean, CI_low, CI_high, p),
    .keep = TRUE
  )
name Mean CI_low CI_high p
crit 0.247 0.206 0.288 0.000
dprime 0.007 -0.085 0.100 0.876
zfa -0.251 -0.325 -0.176 0.000
zhr -0.243 -0.289 -0.198 0.000

Receiver Operating Characteristic

People sometimes like talking about the receiver operating characteristic (ROC) and area under the curve (AUC) measures. At least they allow for nice plots (but those require some hairy code).
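
As an aside (not part of the original analyses), under the equal-variance SDT model the area under the ROC curve can be computed directly from d-prime as pnorm(dprime / sqrt(2)); a sketch using the per-participant estimates from above:

Code
# Area under the ROC implied by each participant's d-prime
sdt %>% 
  mutate(auc = pnorm(dprime / sqrt(2))) %>% 
  summarise(mean_auc = mean(auc))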

The ROC plots hit rates against false alarm rates for varying threshold levels. However, we only get one threshold (per participant):

Code
sdt %>% 
  ggplot(aes(far, hr)) +
  geom_abline(lty = 2, linewidth = .33) +
  scale_x_continuous(
    "False alarm rate",
    limits = c(0, 1),
    expand = expansion(0.01)
  ) +
  scale_y_continuous(
    "Hit rate",
    limits = c(0, 1),
    expand = expansion(0.01)
  ) +
  geom_point(
    shape = 1,
    position = position_jitter(.01, .01)
  ) +
  theme(aspect.ratio = 1)

One solution to this is to plot the estimated hit rate for all false alarm rates. We do so here for the average participant, displaying 100 random ROCs from the posterior to convey uncertainty.

Code
grid <- tibble(
  x = seq(-2.5, 2.5, length.out = 30),
  far = pnorm(x, lower = FALSE),
  zfar = qnorm(far)
)
rocs <- as_draws_df(m0a) %>% 
  slice_sample(n = 100) %>% 
  select(starts_with("b_")) %>% 
  rownames_to_column("i") %>% 
  crossing(grid) %>% 
  mutate(hr = pnorm(x, b_isLieYes - b_isLieNo, lower = FALSE))
Code
rocs %>% 
  ggplot(aes(far, hr)) +
  geom_abline(lty = 2, linewidth = .33) +
  scale_x_continuous(
    "False alarm rate",
    limits = c(0, 1),
    expand = expansion(0.01)
  ) +
  scale_y_continuous(
    "Hit rate",
    limits = c(0, 1),
    expand = expansion(0.01)
  ) +
  geom_line(
    aes(group = i), 
    alpha = .25,
    linewidth = .33
  ) +
  theme(
    aspect.ratio = 1
  )

Reuse

Citation

BibTeX citation:
@article{vuorre2023,
  author = {Vuorre, Matti and Zloteanu, Mircea},
  title = {A Tutorial for Deception Detection Analysis},
  journal = {PsyArXiv},
  date = {2023-05-16},
  url = {https://psyarxiv.com/fdh5b/},
  doi = {10.31234/osf.io/fdh5b},
  langid = {en}
}
For attribution, please cite this work as:
Vuorre, Matti, and Mircea Zloteanu. 2023. “A Tutorial for Deception Detection Analysis.” PsyArXiv, May. https://doi.org/10.31234/osf.io/fdh5b.