Abstract
Psycho is an R package that provides tools for psychologists, neuropsychologists and neuroscientists to transform statistical outputs into something readable that can be copied and pasted, almost directly, into a report. It also implements various functions useful in psychological science, such as correlation matrices, assessment plot creation or normalization. The package revolves around the psychobject. Main functions from the package return this type, and the analyze() function transforms other R objects (for now, only stan_lmer type) into psychobjects. Four functions can then be applied on a psychobject: summary(), print(), plot() and values(). Contrary to many other packages whose goal is to produce statistical analyses, psycho’s goal is to fill the gap between statistical R output and statistical report writing, with a focus on APA formatting guidelines. Complex outputs, such as those of Bayesian linear models, are automatically transformed into readable text, important values are extracted and plots are drawn to illustrate the effects. Thus, the results can easily be incorporated into shareable reports and publications, saving time and preventing errors for better, reproducible science.
# Do this once (uncomment if needed)
# install.packages("devtools")
# library(devtools)
# devtools::install_github("https://github.com/neuropsychology/psycho.R")
# Load psycho (at the beginning of every script)
library(psycho)
The package mainly revolves around the psychobject. Main functions from the package return this type, and the analyze() function transforms other R objects into psychobjects. Then, four functions can be applied on a psychobject: summary(), print(), plot() and values().
It is possible to quickly run a correlation analysis on a dataframe with the flexible and powerful correlation() function.
library(psycho)
df <- iris
cor <- psycho::correlation(df,
type = "full",
method = "pearson",
adjust = "none")
print(cor)
| | Sepal.Length | Sepal.Width | Petal.Length |
|---|---|---|---|
| Sepal.Length | NA | NA | NA |
| Sepal.Width | -0.12 | NA | NA |
| Petal.Length | 0.87*** | -0.43*** | NA |
| Petal.Width | 0.82*** | -0.37*** | 0.96*** |
You can save this correlation matrix using write.csv(print(cor), "correlation_table.csv"). This makes it easy to copy and paste the table from Excel into a paper or a report :)
You can also draw a quick visualization:
cor$plot()
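For readers who want to see what such a table involves, here is a minimal base R sketch that builds a similar lower-triangle table with significance stars. It is not the psycho implementation: the `stars()` helper and the star thresholds (.05, .01, .001) are assumptions chosen for illustration.

```r
# Base R sketch (not the psycho implementation): lower-triangle Pearson
# correlation table with significance stars for the numeric columns of iris.
num <- iris[sapply(iris, is.numeric)]
vars <- names(num)
tab <- matrix(NA_character_, length(vars), length(vars),
              dimnames = list(vars, vars))

# Hypothetical helper: map a p-value to the usual star notation.
stars <- function(p) {
  if (p < .001) "***" else if (p < .01) "**" else if (p < .05) "*" else ""
}

# Fill the cells below the diagonal only.
for (i in seq_along(vars)) {
  for (j in seq_len(i - 1)) {
    test <- cor.test(num[[i]], num[[j]], method = "pearson")
    tab[i, j] <- paste0(sprintf("%.2f", test$estimate), stars(test$p.value))
  }
}

# Drop the last (always empty) column to mirror the layout shown above.
print(tab[, -length(vars)], quote = FALSE)
```

No p-value adjustment is applied here, which corresponds to adjust = "none" in the call above.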
correlation() also offers the possibility to run partial or semi-partial correlations.
library(psycho)
df <- iris
pcor <- psycho::correlation(df,
type = "partial",
method = "pearson",
adjust = "bonferroni")
print(pcor)
| | Sepal.Length | Sepal.Width | Petal.Length |
|---|---|---|---|
| Sepal.Length | NA | NA | NA |
| Sepal.Width | 0.63*** | NA | NA |
| Petal.Length | 0.72*** | -0.62*** | NA |
| Petal.Width | -0.34*** | 0.35*** | 0.87*** |
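As a rough illustration of what a partial correlation is, the sketch below derives the coefficients in base R from the inverse of the correlation matrix, so that each coefficient controls for all remaining numeric variables. This is a standard identity and not necessarily how psycho computes them internally. The Bonferroni line uses arbitrary p-values purely to show what the adjustment does.

```r
# Base R sketch: partial correlations from the inverse correlation matrix.
num <- iris[sapply(iris, is.numeric)]
prec <- solve(cor(num))  # precision (inverse correlation) matrix

# Partial correlation between i and j, controlling for all other variables:
# -p_ij / sqrt(p_ii * p_jj)
pc <- -prec / sqrt(outer(diag(prec), diag(prec)))
diag(pc) <- 1
round(pc, 2)

# Bonferroni adjustment multiplies each p-value by the number of tests
# (capped at 1). The p-values here are arbitrary illustration values.
p.adjust(c(0.001, 0.02, 0.04), method = "bonferroni")
# 0.003 0.060 0.120
```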
The normalize() function allows you to easily scale and center all numeric variables of a dataframe. It is similar to the base function scale(), but presents some advantages: it is tidyverse-friendly, data-type friendly (i.e., does not transform it into a matrix) and can handle dataframes with categorical data.
library(psycho)
library(tidyverse)
iris %>%
select(Species, Sepal.Length, Petal.Length) %>%
psycho::normalize() %>%
summary()
## Species Sepal.Length Petal.Length
## setosa :50 Min. :-1.86378 Min. :-1.5623
## versicolor:50 1st Qu.:-0.89767 1st Qu.:-1.2225
## virginica :50 Median :-0.05233 Median : 0.3354
## Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.67225 3rd Qu.: 0.7602
## Max. : 2.48370 Max. : 1.7799
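To make the behaviour concrete, here is a small base R sketch of the same idea: standardize only the numeric columns and return a dataframe with the other columns untouched. The `standardize_numeric()` helper is hypothetical and is not part of psycho.

```r
# Hypothetical helper (not part of psycho): center and scale numeric columns
# while leaving factors and other column types unchanged.
standardize_numeric <- function(data) {
  is_num <- vapply(data, is.numeric, logical(1))
  data[is_num] <- lapply(data[is_num], function(x) {
    (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  })
  data
}

z <- standardize_numeric(iris[c("Species", "Sepal.Length", "Petal.Length")])

class(z)             # still a data.frame, not a matrix
sapply(z[-1], mean)  # numerically zero for each numeric column
sapply(z[-1], sd)    # one for each numeric column
```

By contrast, calling scale() directly on a dataframe that contains a factor fails, because it expects all columns to be numeric.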
The assess() function is useful in clinical practice. It is sometimes necessary to show the patient, their family or other members of staff a visual representation of the patient's score. The assess() function also computes the percentile and the Z-score, often needed for neuropsychological reports.
library(psycho)
results <- psycho::assess(124, mean=100, sd=15)
# Print it
print(results)
## [1] "The participant (score = 124) is positioned at 1.6 standard deviations from the mean (M = 100, SD = 15). The participant's score is greater than 94.63 % of the general population."
# Plot it
plot(results)
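The two quantities reported above can be sketched by hand in base R, assuming the scores follow a normal distribution with the given mean and SD. The variable names are illustrative only. The percentile that assess() prints may differ slightly from this analytic value, depending on how the package computes it.

```r
# Z-score and percentile under an exact normal distribution (base R sketch).
score <- 124
m <- 100
s <- 15

z <- (score - m) / s          # 1.6 standard deviations above the mean
percentile <- pnorm(z) * 100  # share of the population scoring below

z
round(percentile, 2)
```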
The analyze() function is possibly the most important function of the psycho package. Its goal is to transform complex outputs of complex statistical routines into something readable, interpretable, and formatted. It is designed to work with frequentist and Bayesian mixed models, which are a central statistical routine in psychological science.
Let’s start by creating a dataframe similar to those found in psychological science.
set.seed(666)
df <- data.frame(Participant = as.factor(rep(1:25, each = 4)),
                 Item = rep_len(c("i1", "i2", "i3", "i4"), 100),
                 Condition = rep_len(c("A", "B", "A", "B", "B"), 20),
                 Error = as.factor(sample(c(0, 1), 100, replace = TRUE)),
                 RT = rnorm(100, 30, .2),
                 Stress = runif(100, 3, 5))
# Normalize the numeric variables.
df <- psycho::normalize(df)
# Take a look at the first 6 rows
head(df)
Participant | Item | Condition | Error | RT | Stress |
---|---|---|---|---|---|
1 | i1 | A | 1 | 0.2610666 | -1.5032539 |
1 | i2 | B | 0 | 1.2180393 | -1.3381086 |
1 | i3 | A | 1 | -0.6122813 | -1.6359922 |
1 | i4 | B | 0 | -0.5209097 | -0.1893384 |
2 | i1 | B | 0 | -0.4227089 | -0.2383872 |
2 | i2 | A | 1 | -0.7291277 | 1.1834562 |
This dataframe contains the data of 25 participants (labelled from 1 to 25), who saw 4 items (i1-i4) in two conditions (A and B). We measured, for each item, whether the response was correct or not (Error), its reaction time (RT) and the stress associated with the trial.
In order to investigate the effect of the condition on the reaction time RT, the traditional, ancient and obsolete routine is to compute the mean for each participant in each condition, and run an ANOVA.
# Format data
df_for_anova <- df %>%
dplyr::group_by(Participant, Condition) %>%
dplyr::summarise(RT = mean(RT))
# Run the anova
anova <- aov(RT ~ Condition + Error(Participant), df_for_anova)
summary(anova)
##
## Error: Participant
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 24 15.13 0.6304
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## Condition 1 0.725 0.7254 1.102 0.304
## Residuals 24 15.793 0.6580
As we can see, the effect of condition is not significant (unsurprisingly, as the data were generated randomly). One of the many flaws of this approach is that we lose information about intra-individual and item-related variability.
The use of the mixed-modelling framework allows us to add the items as random factors.
library(lmerTest)
fit <- lmerTest::lmer(RT ~ Condition + (1|Participant) + (1|Item), data=df)
# Traditional output
summary(fit)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
## to degrees of freedom [lmerMod]
## Formula: RT ~ Condition + (1 | Participant) + (1 | Item)
## Data: df
##
## REML criterion at convergence: 282.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.8414 -0.6905 -0.0882 0.7010 2.4840
##
## Random effects:
## Groups Name Variance Std.Dev.
## Participant (Intercept) 0.0000 0.0000
## Item (Intercept) 0.1036 0.3219
## Residual 0.9251 0.9618
## Number of obs: 100, groups: Participant, 25; Item, 4
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.09239 0.22142 5.81000 0.417 0.691
## ConditionB -0.15398 0.19633 95.00000 -0.784 0.435
##
## Correlation of Fixed Effects:
## (Intr)
## ConditionB -0.532
As the output is a bit messy, the analyze() function will munge this into something nicely formatted.
results <- psycho::analyze(fit)
# We can extract a formatted summary table
summary(results, round = 2)
| | Effect_Size | Coef | SE | t | df | Coef.std | SE.std | p |
|---|---|---|---|---|---|---|---|---|
| (Intercept) | Very Small | 0.09 | 0.22 | 0.42 | 5.81 | 0.00 | 0.0 | 0.69 |
| ConditionB | Very Small | -0.15 | 0.20 | -0.78 | 95.00 | -0.08 | 0.1 | 0.43 |
We can also print it in a text format!
print(results)
## [1] "The overall model predicting ... successfully converged and explained 10.57% of the variance of the endogen (the conditional R2). The variance explained by the fixed effects was of 0.56% (the marginal R2) and the one explained by the random effects of 10.01%."
## [2] "The effect of (Intercept) was [NOT] significant (beta = 0.092, SE = 0.22, t(5.81) = 0.42, p > .1) and can be considered as very small (std. beta = 0, std. SE = 0)."
## [3] "The effect of ConditionB was [NOT] significant (beta = -0.15, SE = 0.2, t(95) = -0.78, p > .1) and can be considered as very small (std. beta = -0.076, std. SE = 0.097)."
However, as the frequentist framework is criticized, it is advised to switch to a Bayesian framework. Unfortunately, the interpretation of these models is even more complex and unfamiliar to regular psychologists. But stay calm, because analyze() handles this difficulty for you.
library(rstanarm)
fit <- rstanarm::stan_lmer(RT ~ Condition + (1|Participant) + (1|Item), data=df)
# Formatted output
results <- psycho::analyze(fit, Effect_Size=TRUE)
summary(results, round=2)
Variable | MPE | Median | MAD | Mean | SD | CI_lower | CI_higher | Very_Large | Large | Medium | Small | Very_Small | Opposite |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
(Intercept) | 65.30 | 0.10 | 0.25 | 0.10 | 0.3 | -0.50 | 0.70 | 0 | 0.02 | 0.05 | 0.27 | 0.31 | 0.35 |
ConditionB | 79.92 | -0.16 | 0.20 | -0.16 | 0.2 | -0.54 | 0.23 | 0 | 0.00 | 0.04 | 0.38 | 0.37 | 0.20 |
print(results)
## [1] "We fitted a Markov Chain Monte Carlo [type] model to predict[Y] with [X] (formula = RT ~ Condition + (1 | Participant) + (1 | Item)).Priors were set as follow: [INSERT INFO ABOUT PRIORS]."
## [2] "Concerning the effect of (Intercept), there is a probability of 65.3% that its coefficient is between 0 and 1.58 (Median = 0.1, MAD = 0.25, Mean = 0.1, SD = 0.3, 95% CI [-0.5, 0.7])."
## [3] "Based on Cohen (1988) recommandations, there is a probability of 0.22% that this effect size is very large, 1.62% that this effect size is large, 5.35% that this effect size is medium, 27.45% that this effect size is small, 30.65% that this effect is very small and 34.7% that it has an opposite direction(between 0 and 2e-04)."
## [4] "Concerning the effect of ConditionB, there is a probability of 79.92% that its coefficient is between -0.87 and 0 (Median = -0.16, MAD = 0.2, Mean = -0.16, SD = 0.2, 95% CI [-0.54, 0.23])."
## [5] "Based on Cohen (1988) recommandations, there is a probability of 0% that this effect size is very large, 0.18% that this effect size is large, 3.92% that this effect size is medium, 38.38% that this effect size is small, 37.45% that this effect is very small and 20.08% that it has an opposite direction(between 0 and 0.55)."
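To give an intuition for the indices in these sentences, the base R sketch below computes them from a vector of posterior draws. Two things here are assumptions: the draws are simulated with rnorm() rather than extracted from the fitted model, and MPE is taken to be the percentage of draws with the same sign as the median, which is how the printed sentences read. It is not the package's own code.

```r
set.seed(1)

# Assumption: simulated draws standing in for the posterior of one coefficient.
draws <- rnorm(4000, mean = -0.16, sd = 0.2)

med <- median(draws)
mad_val <- mad(draws)

# Percentage of draws on the same side of zero as the median.
mpe <- mean(sign(draws) == sign(med)) * 100

# 95% credible interval from the 2.5% and 97.5% quantiles.
ci <- quantile(draws, c(0.025, 0.975))

round(c(Median = med, MAD = mad_val, MPE = mpe,
        CI_lower = ci[[1]], CI_higher = ci[[2]]), 2)
```

With a real model, the draws for a coefficient could be taken from the fitted object, for example with as.data.frame(fit) in rstanarm.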
We can also plot the effects:
plot(results)
Obviously, you need to learn more about Bayesian analyses before running them. You can find more information in the rstanarm vignettes.