Wednesday 5 April 2017

Estimating and reporting the reliability of our cognitive tasks: introducing the splithalf package


It is important that we, as researchers, report the reliability of our cognitive tasks. In fact, I believe that estimating and reporting task reliability should be standard practice; at present it tends to be the exception rather than the rule. The reliability of our tasks affects the confidence we can place in our results and, in turn, the power to replicate those results (LeBel & Paunonen, 2011). The continued development and improvement of our implicit psychological measures would benefit from regular reporting of task reliability, much as we currently do for self-report measures: it is common to check the Cronbach's alpha of the scales we have used, as well as to conduct large validation studies early in the development of a measure.

I originally developed a script to estimate the reliability of the dot-probe task I had used in one of my studies. With the assistance of Phil Enock and my supervisor Anne-Wil Kruijt, I turned this script into a workable function, DPsplithalf(), which I hope will be fairly straightforward for others to use. This function is contained in the splithalf package, which I am currently asking others to test before I upload it to CRAN. The package also offers several other functions to calculate the split-half reliability of tasks.

The functions include the option to estimate split-half reliability with a Monte Carlo procedure that repeats the calculation over a large number of randomly generated splits (5000 is recommended here), yielding a more stable estimate than other approaches such as the odd/even numbered trial split. I will write about this in more detail at a later time, but for now it is worth noting that the package offers this option, and I suggest that it will generally be the best one to use.
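To give an intuition for what one of these random splits involves, here is a simplified sketch (not the package's internal code, and ignoring accuracy filtering for simplicity): for each participant the trials in a condition are randomly divided into two halves, a bias index (here, mean incongruent minus mean congruent RT) is computed in each half, and the two sets of indices are correlated across participants. Averaging that correlation over many random splits gives the split-half estimate. The sketch assumes a data frame with columns named subject, congruency, and latency, as in the example data used below.

# Simplified sketch of a single random split for one condition
# (illustrative only; not the package internals)
one_random_split <- function(data) {
  bias <- by(data, data$subject, function(d) {
    idx  <- sample(nrow(d), floor(nrow(d) / 2))   # randomly select half of the trials
    half <- seq_len(nrow(d)) %in% idx
    c(bias1 = mean(d$latency[half  & d$congruency == "Incongruent"]) -
              mean(d$latency[half  & d$congruency == "Congruent"]),
      bias2 = mean(d$latency[!half & d$congruency == "Incongruent"]) -
              mean(d$latency[!half & d$congruency == "Congruent"]))
  })
  bias <- do.call(rbind, bias)
  cor(bias[, "bias1"], bias[, "bias2"])   # split-half correlation for this one split
}

# averaged over many random splits, e.g.:
# mean(replicate(5000, one_random_split(subset(DPdata, blockcode == "block1"))))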

I wanted to use this post to introduce the package and provide an example of how it may be used. I have therefore copied parts of the package vignette, which provides two examples that should help explain how to use the DPsplithalf() function. The first is straightforward, using a simulated data set in which there are no missing data and the variables are already named as the function requires. The second is slightly more complex and will hopefully help with compatibility issues. Last, I have included a figure to give an idea of the expected run time of the function as the number of conditions and the sample size increase.

If any readers have a dot-probe data set lying around, I would really appreciate it if you could take the time to test the package's functionality and usability. Any suggestions or input are very welcome, as I want the package to be as user friendly as possible. In the future I also aim to develop a Shiny application so that non-R users are able to run these functions online.

You can install the package from GitHub with the following:

install.packages("devtools")
library("devtools")
devtools::install_github("sdparsons/splithalf")


Example 1:
First, let's look at the data. This is simulated data contained within the package, designed to be as straightforward as possible for the purposes of this example. You will note that the data contain no missing values, the variables do not need to be renamed, and the congruency variable is appropriately named (example 2 covers the case when this is not so).
str(DPdata)
## 'data.frame':    3840 obs. of  6 variables:
##  $ subject   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ blockcode : Factor w/ 2 levels "block1","block2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ trialnum  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ congruency: Factor w/ 2 levels "Congruent","Incongruent": 1 2 1 2 1 2 1 2 1 2 ...
##  $ latency   : num  23.6 25 24.1 24.7 24.6 ...
##  $ correct   : num  1 1 1 1 1 1 1 1 1 1 ...
head(DPdata)
##   subject blockcode trialnum  congruency  latency correct
## 1       1    block1        1   Congruent 23.56481       1
## 2       1    block1        2 Incongruent 24.98990       1
## 3       1    block1        3   Congruent 24.08452       1
## 4       1    block1        4 Incongruent 24.70067       1
## 5       1    block1        5   Congruent 24.58380       1
## 6       1    block1        6 Incongruent 24.53391       1
sum(is.na(DPdata))
## [1] 0
In this example, we want to calculate the split-half reliability for block 1 and block 2. We want to use the “random” method to process 5000 iterations of randomly generated splits, and return estimates for both blocks. The function will take a little time to run (the time taken while creating this vignette was 23s).
library(splithalf)
## 
## Attaching package: 'splithalf'
## The following object is masked _by_ '.GlobalEnv':
## 
##     DPdata
DPsplithalf(DPdata, conditionlist = c("block1","block2"), halftype = "random", no.iterations = 5000)
## [1] "condition block1 complete"
## [1] "condition block2 complete"
## [1] "Calculating split half estimates"
## [1] "Split half estimates for 5000 random splits"
##   condition  N  splithalf spearmanbrown
## 1    block1 20 -0.9146333     -23.96023
## 2    block2 20 -0.9442398     -38.08540
What you see first is the console output. You will be informed as each condition is processed and when the estimates are being calculated. This is purely so that the user can keep track of the progress of the function.
Next, the function output. The data frame returns each condition separately. The N column indicates the number of participants whose data have been processed, which is important to check as it gives an early indication of missing data. The splithalf column is the raw split-half estimate, in this case the average of 5000 random splits. The spearmanbrown column returns the Spearman-Brown corrected reliability estimate. You will quickly note that this estimate looks extreme in this example; this is due to the large negative splithalf estimate, which is expected with randomly simulated data.
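For reference, the Spearman-Brown "prophecy" correction estimates the reliability of the full-length test from the half-test correlation r as 2r / (1 + r). The package applies it for you, but a minimal sketch makes its behaviour clear, including why a strongly negative split-half correlation produces such extreme corrected values:

# Spearman-Brown correction: estimated full-test reliability from a
# half-test correlation r
spearman_brown <- function(r) {
  (2 * r) / (1 + r)
}

spearman_brown(0.6)    # 0.75: a half-test correlation of .6 implies ~.75 full-test reliability
spearman_brown(-0.91)  # about -20: the denominator approaches zero, so the correction blows up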

Example 2:
The purpose of this example is to provide a more practical demonstration of how to use the DPsplithalf() function with real data. I have therefore included snippets of script that will enable users to adapt their data where necessary. We will take the example step by step to cover many of the issues that could arise. First, note that the data frame does not contain any NA values; however, there is no usable data for subject 15 in block 2.
sum(is.na(DPdata_missing))
## [1] 0
head(subset(DPdata_missing, DPdata_missing$accuracy == 0))
##      subject  block trialnumber    trialtype responsetime accuracy
## 2785      15 block2           1   Congruent1     24.60937        0
## 2786      15 block2           2 Incongruent2     25.35842        0
## 2787      15 block2           3   Congruent1     25.84493        0
## 2788      15 block2           4 Incongruent2     23.74887        0
## 2789      15 block2           5   Congruent1     24.90278        0
## 2790      15 block2           6 Incongruent2     24.66794        0
From the head() output above we can also see that the variable names do not conform to what is needed to run the function. First, we will rename the Congruent and Incongruent trials. The function requires a "congruency" variable specifying whether the trial is "Congruent" or "Incongruent" (case sensitive). One way to create it is to run the following. The code searches the trialtype variable for strings that contain "Incongruent" and returns "Incongruent" if present, and "Congruent" if not. This simple line of script can be adapted to most naming conventions. Note: ensure that the string searched for is not present in other conditions, otherwise those trials will be included too.
DPdata_missing$congruency <- as.factor(ifelse(grepl("Incongruent", DPdata_missing$trialtype), "Incongruent", "Congruent"))
str(DPdata_missing$congruency)
##  Factor w/ 2 levels "Congruent","Incongruent": 1 2 1 2 1 2 1 2 1 2 ...
Next, at the moment we have 4 different block names in this task, whereas we want only 2. This may happen, for example, when you have two stimulus lists that are counterbalanced between participants. The next line of code is a slight variation on the above, returning the block name without the additional "a" (a more compact alternative follows the output below). Also note that there are a large number of ways to do this in R; this example is meant to be informative to those with minimal R experience, as well as being applicable across a number of contexts. In short, the code tests whether the block name matches either condition and returns that condition.
DPdata_missing$block <- as.factor(ifelse(DPdata_missing$block  == "block1" | 
                                         DPdata_missing$block  == "block1a",
                                         "block1",
                                  ifelse(DPdata_missing$block  == "block2" |
                                         DPdata_missing$block  == "block2a",
                                        "block2","")))

str(DPdata_missing$block)
##  Factor w/ 2 levels "block1","block2": 1 1 1 1 1 1 1 1 1 1 ...
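For readers comfortable with regular expressions, a more compact alternative (my own suggestion, not something the package requires) is simply to strip the trailing "a" from the block names:

# Equivalent recoding: drop a trailing "a" so block1a/block2a become block1/block2
DPdata_missing$block <- as.factor(gsub("a$", "", DPdata_missing$block))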
Now that our congruency and block names are correct, all we need to do is ensure that the variable names are specified within the function call. As a reminder, the defaults are: var.RT = "latency", var.condition = "blockcode", var.participant = "subject", var.correct = "correct", and var.trialnum = "trialnum".
We will therefore specify var.condition, var.trialnum, var.RT, and var.correct.
example2 <- DPsplithalf(DPdata_missing, conditionlist = c("block1", "block2"),
                        halftype = "random", no.iterations = 5000,
                        var.condition = "block", var.trialnum = "trialnumber",
                        var.RT = "responsetime", var.correct = "accuracy")
## [1] "condition block1 complete"
## [1] "condition block2 complete"
## [1] "Calculating split half estimates"
## [1] "the following are participants/conditions with missing data"
##        condition participant
## 170001    block2          15
## [1] "note: these iterations will be removed from the split half\n          reliability calculations, in that condition"
## Warning in DPsplithalf(DPdata_missing, conditionlist = c("block1", "block2"), : Bias indices missing:
##           at least one participant has missing data from at one condition
##           These cases are removed from calculating reliability estimates
##           $omitted contains the missing cases
## [1] "Split half estimates for 5000 random splits"
example2$Estimates
##   condition  N  splithalf spearmanbrown
## 1    block1 20 -0.9256936     -27.82766
## 2    block2 19 -0.9136950     -23.87316
head(example2$omitted)
##        condition participant iteration bias1 bias2
## 170001    block2          15         1   NaN   NaN
## 170002    block2          15         2   NaN   NaN
## 170003    block2          15         3   NaN   NaN
## 170004    block2          15         4   NaN   NaN
## 170005    block2          15         5   NaN   NaN
## 170006    block2          15         6   NaN   NaN
You will notice that we have much more in our output than in example 1. You will see a warning message pointing out that there is missing data. In the console you will see a short message, "the following are participants/conditions with missing data", followed by a data frame showing which participants have missing data in which conditions. The function will still calculate the split-half reliability. The function outputs two data frames (I also recommend assigning the output to an object, as above, to ease this step). The Estimates data frame contains the split-half reliability estimates as described in example 1; the N column also highlights which conditions have missing participants. The omitted data frame is a complete record of all omitted iterations. In this example one participant is missing from one condition, and therefore the omitted data frame contains all 5000 iterations for that case.
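Because the omitted data frame can be long (one row per omitted iteration), a couple of lines make it easier to see which participant/condition combinations were actually dropped, using the condition and participant columns shown above:

# Which participant/condition combinations were omitted?
unique(example2$omitted[, c("condition", "participant")])

# How many iterations were omitted for each combination?
table(example2$omitted$condition, example2$omitted$participant)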

Function run time

An important question is: how long does this function take to run? The quicker the functions run, the more usable they are. Given that ideally we will run 5000 simulations as standard, we want the script to run quickly; nobody is keen to run a two-hour script on 20 participants' data to get only a single value as output. Much of the recent development of this package has in fact gone into reducing the run time of the functions. This has been very successful, going from some 45 minutes for roughly 40 participants in 2 conditions down to about 45 seconds. To illustrate the likely run time I have created the figure below. The first section of the script was used to simulate the data (I then replicated the output in the next section, to avoid having to re-run the simulation to knit this document). Next is the data table and the graph showing the run time in seconds across 1, 2, 3, 4, 5, and 10 conditions, with sample sizes of 10, 20, 30, 40, 50, 100, 200, and 500. This should give a rough idea of the likely run time of the DPsplithalf() function for other data sets (note that the simulated data contain no missing values).
conditionnumber <- c(1,2,3,4,5,10)
samplesize <- c(10,20,30,40,50,100,200,500)
temp <- NA
times <- NULL

for(i in conditionnumber)
{
  for(j in samplesize)
  {
    # ensure the correct number of conditions
    if(i == 1) {
      conlist <- c("block1")
    } else if(i == 2) {
      conlist <- c("block1","block2")
    } else if(i == 3) {
      conlist <- c("block1","block2","block3")
    } else if(i == 4) {
      conlist <- c("block1","block2","block3","block4")
    } else if(i == 5) {
      conlist <- c("block1","block2","block3","block4","block5")
    } else {
      conlist <- c("block1","block2","block3","block4","block5",
                   "block6","block7","block8","block9","block10")
    }

  # generate the simulated data  
   temp <-  data.frame(subject = rep(1:j, each = (96*i)),
                       blockcode = rep(conlist, each = 96, length.out = i*j*96),
                       trialnum = rep(1:96, length.out = i*j*96),
                       congruency = rep(c("Congruent","Incongruent"), length.out = i*j*96),
                       latency = rep(rnorm(100,100,25), length.out = i*j*96),
                       correct = rep(1, length.out = i*j*96) )
  
    # run DPsplithalf and record the elapsed time in seconds
    runtime <- system.time(DPsplithalf(temp, conditionlist = conlist, 
                                       halftype = "random", 
                                       no.iterations = 5000))["elapsed"]
   
    # save the run time for this combination of conditions and sample size
    times <- rbind(times, data.frame(Conditions = i, Sample.size = j, 
                                     System.time = runtime))
   
   # keep track of the runs
   print(paste("completed",i,"conditions",j,"participants", sep = " "))
  }
}
# read in the saved run times so the simulation does not need to be re-run
times <- read.csv("timestable.csv")
summary(lm(data=times, System.time ~ Conditions + Sample.size))
## 
## Call:
## lm(formula = System.time ~ Conditions + Sample.size, data = times)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -552.79  -55.82    7.85   75.60 1138.02 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -234.5706    62.8531  -3.732 0.000531 ***
## Conditions    56.7447    11.3265   5.010 8.90e-06 ***
## Sample.size    1.8793     0.2124   8.847 2.09e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 228.4 on 45 degrees of freedom
## Multiple R-squared:  0.6967, Adjusted R-squared:  0.6832 
## F-statistic: 51.68 on 2 and 45 DF,  p-value: 2.2e-12
library(ggplot2)
times$Conditions <- as.factor(times$Conditions)
ggplot2::ggplot(times, aes(x = Sample.size, y = System.time, linetype = Conditions)) +
  geom_line(size = 1)
The figure highlights the run time for the 48 simulations of the DPsplithalf() function. As you can see, run time increases roughly linearly with the number of conditions and with the sample size. It is worth noting that most research using cognitive tasks such as the dot-probe recruits smaller sample sizes, so for most data sets the estimation of task reliability should take no more than a few minutes. A linear model also yields the following equation to estimate the run time in seconds: -234 + 56.7(Conditions) + 1.9(Sample size). Please note that the operating system and the actual data will have a large effect on the time taken to run the functions.
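If it is helpful, that equation can be wrapped in a small helper function (my own illustration, not part of the package) to get a ballpark figure for your own design. The coefficients come from the model above and are specific to this machine and these simulated data, so treat the result as a rough guide only; predictions can even go negative for very small designs, which simply means the run time is negligible.

# Rough run-time estimate in seconds, using the regression coefficients above
estimate_runtime <- function(conditions, sample_size) {
  -234 + 56.7 * conditions + 1.9 * sample_size
}

estimate_runtime(conditions = 2, sample_size = 100)   # ~69 seconds
estimate_runtime(conditions = 4, sample_size = 200)   # ~373 seconds, roughly 6 minutes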

References

Enock, P. M., Hofmann, S. G., & McNally, R. J. (2014). Attention bias modification training via smartphone to reduce social anxiety: A randomized, controlled multi-session experiment. Cognitive Therapy and Research, 38(2), 200–216. http://doi.org/10.1007/s10608-014-9606-z
LeBel, E. P., & Paunonen, S. V. (2011). Sexy but often unreliable: The impact of unreliability on the replicability of experimental findings with implicit measures. Personality and Social Psychology Bulletin, 37(4), 570–583. http://doi.org/10.1177/0146167211400619