Saturday 6 May 2017

Correlation tables in presentations - BPS 2017 annual conference in Brighton post 1

Fresh from the BPS annual conference in Brighton last week, I have a few ideas for posts in the works. My aim is to turn some of my thoughts from the wide array of talks into a short series of blog posts. By way of opening comment, some of these thoughts come from a tired and grumpy me, so I will do my best to turn these opinions into functional suggestions without, as Prof Andy Field put it, "sounding like a self-righteous prick". So to begin with, here is my most positive one.

Correlation tables - presenting them as networks.

I am really fond of the growing network-analysis approach to psychological constructs. There is an active group on Facebook (https://www.facebook.com/groups/PsychologicalDynamics/), and the approach is also very present on Twitter (@SachaEpskamp and @EikoFried are immediate examples). I think that this approach, and this way of thinking about related clusters of symptoms, has the potential to inform us greatly about the role that symptoms play in the maintenance and development of psychiatric conditions.

This post is not about that approach specifically, though. Rather, I want to suggest and demonstrate how one of the most basic network visualisations can make presenting correlations more visually pleasing and, I believe, more informative as well.

In theory, a correlation table is the best way to present a series of correlations. After all, each correlation and its p value (or at least some stars) are included. Unfortunately, as the number of variables increases, it becomes harder to scan the data easily. Take the table below as an example. We have 8 variables: 4 contain numbers I typed in by hand, and the other 4 contain randomly generated numbers with different means and SDs (purely for illustrative purposes).

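# a to d are entered by hand; e to h are random draws from normal
# distributions, so their exact values differ between runs
dt <- data.frame(a = 1:8,
                 b = c(8, 6, 3, 7, 8, 4, 8, 7),
                 c = c(2, 5, 7, 8, 3, 7, 10, 14),
                 d = c(1, 4, 6, 1, 1, 3, 4, 5),
                 e = rnorm(8, 25, 25),
                 f = rnorm(8, 65, 25),
                 g = rnorm(8, 25, 50),
                 h = rnorm(8, 55, 90))

# Pearson correlation matrix, rounded to 2 decimal places for display
dt_cor <- cor(dt)
round(dt_cor, 2)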
##       a     b     c     d     e     f     g     h
## a  1.00  0.11  0.79  0.28  0.27 -0.52  0.21 -0.20
## b  0.11  1.00 -0.08 -0.58  0.66  0.26 -0.20 -0.37
## c  0.79 -0.08  1.00  0.59 -0.13 -0.43 -0.22 -0.42
## d  0.28 -0.58  0.59  1.00 -0.40 -0.29  0.14 -0.05
## e  0.27  0.66 -0.13 -0.40  1.00  0.00  0.38 -0.26
## f -0.52  0.26 -0.43 -0.29  0.00  1.00  0.08 -0.12
## g  0.21 -0.20 -0.22  0.14  0.38  0.08  1.00  0.43
## h -0.20 -0.37 -0.42 -0.05 -0.26 -0.12  0.43  1.00
True, this table only presents the correlations and not the p values, but for the purposes of this post it should suffice. In a conference presentation or poster, given the short time that we have to look at these kinds of tables, it can be easy to miss or misinterpret the associations highlighted by the table.
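If you did want the p values alongside the correlations, one possible way to get them in base R is to run cor.test() on every pair of variables. This is only a minimal sketch: cor_pvalues() is a hypothetical helper written for this illustration, and the data frame ex simply reuses the four hand-entered columns (a to d) from the example above.

```r
# The four hand-entered columns from the example data frame
ex <- data.frame(a = 1:8,
                 b = c(8, 6, 3, 7, 8, 4, 8, 7),
                 c = c(2, 5, 7, 8, 3, 7, 10, 14),
                 d = c(1, 4, 6, 1, 1, 3, 4, 5))

# cor_pvalues() is a hypothetical helper (not part of any package):
# it runs cor.test() on each pair of columns and stores the p value
# in a symmetric matrix. The diagonal is left as NA.
cor_pvalues <- function(data) {
  k <- ncol(data)
  p <- matrix(NA_real_, k, k, dimnames = list(names(data), names(data)))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      p[i, j] <- p[j, i] <- cor.test(data[[i]], data[[j]])$p.value
    }
  }
  p
}

ex_p <- cor_pvalues(ex)
round(ex_p, 3)
```

These p values are uncorrected, so with many variables some form of correction for multiple comparisons would still be needed before adding stars to a table.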

Take instead the following few lines of code. For non-R users: in the example, dt is the data frame and dt_cor is the correlation matrix presented above. Plugging the correlation matrix into the qgraph function yields the plot below. Explaining the plot to an audience is easy: "In this plot, each variable is presented in a circle and the associations are the lines between them. Green represents positive correlations and red represents negative. The thicker the line, the larger the correlation. We have removed correlations under .2 as our minimum effect size of interest" (you could also remove by a p value threshold, ideally corrected for multiple comparisons).

library(qgraph)
qgraph(dt_cor, details = TRUE, minimum = .2)
The same information is present in this figure as in the correlation table, but it is instantly 'readable'. In this example we can see strong positive associations between a-c and b-e, and a negative association between b-d. Just as importantly, we can quickly distinguish the larger correlations between our variables of interest from the smaller ones. In the context of a conference poster or presentation, the reader can efficiently grasp the relationships between the variables in the analysis. We can also quickly see where relationships are not found (r too low or p value too high, for example). This helps to ensure that the full picture is given, rather than only the significant values being selected. With just a few lines of code, large correlation tables can be made much more accessible to readers. This is only the tip of the iceberg of what network analysis and qgraph can accomplish, but for the purposes of this post it will hopefully convince you that correlation tables can be pretty.
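For anyone who would rather filter edges by a corrected p value than by a minimum r, here is one way it might be done. It is a rough sketch under a few assumptions: it uses five columns of R's built-in mtcars data as a stand-in (not the data from this post), the Holm method and the .05 cut-off are just one reasonable choice, and the names ex_cor, keep and ex_masked are made up for the illustration. The p values come from the usual t test for a Pearson correlation with n - 2 degrees of freedom.

```r
# Stand-in data: five columns from the built-in mtcars data set
ex <- mtcars[, c("mpg", "hp", "wt", "qsec", "drat")]
ex_cor <- cor(ex)
n <- nrow(ex)

# p values for the unique pairs (upper triangle), from the t distribution
upper <- upper.tri(ex_cor)
r <- ex_cor[upper]
t_stat <- r * sqrt((n - 2) / (1 - r^2))
p_raw <- 2 * pt(-abs(t_stat), df = n - 2)

# Holm correction across all pairs tested
p_holm <- p.adjust(p_raw, method = "holm")

# Logical matrix marking the pairs that survive the correction
keep <- matrix(FALSE, nrow(ex_cor), ncol(ex_cor))
keep[upper] <- p_holm < .05
keep <- keep | t(keep)

# Set the remaining correlations to zero so that no edge is drawn for them
ex_masked <- ex_cor * keep
diag(ex_masked) <- 1
round(ex_masked, 2)

# Plot only if qgraph is installed
if (requireNamespace("qgraph", quietly = TRUE)) {
  qgraph::qgraph(ex_masked, details = TRUE)
}
```

If you filter this way, it is worth telling the audience which threshold was used, since an absent line then means "did not survive correction" rather than "below our minimum effect size".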

The next post reflecting my thoughts from the BPS conference will highlight the replication crisis (or revolution) workshop.