climwin
is a newly released package in R to help users choose a time period or ‘window’ over which to conduct ecological research. Although the general focus of the package will concern the impacts of climate, other novel applications can also be considered. This vignette will give an initial introduction to the package and its basic features. Future vignettes will provide more detail on the ‘advanced’ features present within the package.
The characteristics of an organism are likely to change over time (e.g. body condition, behaviour), and these will influence the way in which an organism responds to its environment. As such, the time period or ‘window’ over which we choose to conduct ecological study can strongly influence the outcome and interpretation of our results. Yet there has been a tendency in ecological research to focus on a limited number of windows, often chosen arbitrarily. For example, there has been a strong focus on the biological impacts of average spring conditions.
Without a critical comparison of different possible windows, we limit our ability to make meaningful conclusions from our research. If a biological trait displays no response to climate, it is difficult to determine if this is evidence of insensitivity to climate or whether the choice of time period is flawed. Even when we find a relationship between climate and the biological response, we cannot be sure that we have selected the period where the trait is most sensitive. Therefore, there is a need for a more effective method through which we can select our study period.
Realistically, manually testing and comparing all possible windows can be difficult and time consuming. With climwin
we hope to overcome this problem by employing an exploratory approach to test and compare the effects of all possible windows. This will remove the need to arbitrarily select our study window, and will consequently improve our research outcomes and conclusions. Below we will outline the basic features of the climwin package, including how to carry out and interpret a climate window analysis.
climatewin
is the main function of the climwin package. It uses an exploratory approach to investigate all possible windows and compares them using values of AICc.
Let’s imagine we have a dataset containing two variables: chick mass at hatching and date of measurement. Chick mass is likely to be measured on different dates due to biological differences (e.g. hatching date), therefore our dataset will look something like this:
Date | Mass (g) |
---|---|
04/06/2015 | 120 |
05/06/2015 | 123 |
07/06/2015 | 110 |
07/06/2014 | 140 |
We want to understand the relationship between temperature and chick mass, so we also have a second dataset containing daily temperature data.
Date | Temperature |
---|---|
01/06/2015 | 15 |
02/06/2015 | 16 |
03/06/2015 | 12 |
04/06/2014 | 18 |
05/06/2014 | 20 |
06/06/2014 | 23 |
07/06/2014 | 21 |
As is often the case, we have no prior knowledge to help us select the best climate window. To overcome this issue, climatewin
will test all possible windows for us. In this case, we may start by testing the effect of average temperature 1-2 days before hatching. climatewin
will calculate the corresponding average temperature for each biological record and create a new combined dataset.
Date | Mass (g) | Temperature [1 - 2 days] |
---|---|---|
04/06/2015 | 120 | 14 |
05/06/2015 | 123 | 15 |
07/06/2015 | 110 | 21.5 |
07/06/2014 | 140 | 21.5 |
Now that we have this new dataset, we can test the relationship between mass and temperature and determine an AICc value (Burnham, Anderson and Hayvaert 2011 source). Let’s use a basic linear model:
lm(Mass ~ Temperature[1 - 2 days])
We’ve tested the relationship between temperature and chick mass in one climate window, but we don’t yet know how this window compares to others. climatewin
will go back and carry out the same process on all other windows and compare the AICc value of each of these windows to that of a null model containing no climate. Our final outcome can be seen below:
Window | Model AICc | Null model AICc | Difference |
---|---|---|---|
2 - 4 days | 1006 | 1026 | 20 |
1 - 3 days | 1015 | 1026 | 11 |
1 - 2 days | 1019 | 1026 | 7 |
4 - 5 days | 1020 | 1026 | 6 |
… | … | … | … |
With this simple comparison, we can see that the strength of the window 2 - 4 days before hatching is not only better than a model with no climate (i.e. the value of AICc is smaller), but also better than other tested climate windows. In this case, we could conclude that chick mass was most strongly influenced by average temperature 2 - 4 days before hatching.
While this example is simplistic, it provides insight into the methodology behind climatewin
. In practice, climatewin
can be applied to large datasets testing thousands of possible windows, streamlining what would otherwise be a difficult and time consuming process.
Now that we understand how climatewin
works, how do we go about using it? As with our simple example above, it is first necessary to create two separate datasets for the climate and biological data. These two datasets will be the basis for the climate window analysis.
To start using climatewin
it’s important to first understand the basic parameters. Below, we will discuss the parameters required to run a basic climate window analysis. More ‘advanced’ parameters will be discussed in additional vignettes.
For this example, we will use the Mass
and MassClimate
datasets from the climwin
package.
xvar
To begin, we need to determine the predictor variable which we want to study. The parameter xvar is a list object containing our variable of interest (e.g. Temperature, Rainfall). While we will focus here on climate, it is possible to apply our climatewin
methodology with non-climatic predictors.
xvar = list(Temp = MassClimate$Temperature)
cdate/bdate
Once we have established our predictor variable, we next need to tell climatewin
the location of our date data. These two parameters contain the date variable for both the climate dataset (cdate) and biological dataset (bdate).
NOTE: These date variables should be stored in a dd/mm/yyyy format
cdate = MassClimate$Date
bdate = Mass$Date
baseline
The parameter baseline determines the model structure for the null model. The structure of the baseline model is highly versatile, with the potential to use multiple model types (lm, lmer, glm, glmer), include covariates and weights.
baseline = lm(Mass ~ 1, data = Mass)
Climate data from each tested climate window will be added as a predictor to this baseline model, to allow for a direct comparison between the null model and each climate window.
cinterval
Ideally, our predictive data will be available at a resolution of days. However, in some circumstances this may not be possible, especially when considering non-climatic data. With this in mind, cinterval allows users to adjust the resolution of analysis to days, weeks or months. Note that the choice of cinterval will impact our interpretation of parameters furthest and closest (see below). In our current example, daily data is available and so a resolution of days will be used.
cinterval = "day"
furthest/closest
While our above example used a small dataset and tested a limited number of climate windows, the number of tested climate windows can be much larger, set by the parameters furthest and closest. These parameters set the upper and lower limits for tested climate windows. The values of furthest and closest will correspond to the resolution chosen in parameter cinterval. In this example we are interested in testing all possible climate windows anywhere up to 100 days before the biological record.
furthest = 100
closest = 0
type
There are two distinct ways to carry out a climate window analysis. In our example of chick mass we considered ‘variable’ climate windows, where the placement of the window will vary depending on the time of the biological response (e.g. 1-2 days before hatching). These variable windows assume that each individual may have the same relative response to climate, but the exact dates will vary between individuals.
Alternatively, there may be situations where we expect all individuals to respond to the same climatic period (e.g. average April temperature). In this case, we assume that all climate windows will have a ‘fixed’ date that will be taken as day 0. In this case, the additional parameters cutoff.day and cutoff.month will set the position of day 0 for the climate window analysis.
In our example using the Mass
dataset, we test fixed climate windows with a starting date of May 20th.
type = "fixed"
cutoff.month = 05,
cutoff.day = 20
stat
While we now have the ability to extract climate data for all possible climate windows, the aggregate statistic with which we analyse this data may also influence our results. Most commonly, we consider the mean value of climate within each window, yet it may be more appropriate to consider other possible aggregate statistics, such as maximum, minimum or slope.
The parameter stat allows users to select the aggregate statistic of their choice.
stat = "mean"
func
Although the relationship between climate and our biological response may often be linear, there is a potential for more complex relationships. The parameter func allows users to select from a range of possible relationships, including linear (“lin”), quadratic (“quad”), cubic (“cub”), logarithmic (“log”) and inverse (“inv”).
func = "lin"
After choosing each of our parameter values, we can finally input our choices into the climatewin
function.
The below climatewin
syntax will test the linear effect of mean temperature on mass across all windows between 0 and 100 days before May 20th.
NOTE: Analysis with climwin can require large amounts of computational power. Please be patient with the results.
library(climwin)
## Loading required package: ggplot2
## Loading required package: gridExtra
MassWin <- climatewin(xvar = list(Temp = MassClimate$Temp),
cdate = MassClimate$Date,
bdate = Mass$Date,
baseline = lm(Mass ~ 1, data = Mass),
cinterval = "day",
furthest = 100, closest = 0,
type = "fixed", cutoff.day = 20, cutoff.month = 05,
stat = "mean",
func = "lin")
The product of our climate window analysis is a list object, here we’ve called it MassWin
.
The object MassWin
has three key components.
Dataset is an ordered data frame summarising the results of all tested climate windows. The climate windows are ordered by delta AICc (the difference between the AICc of each model and the null model). Note, when calling the dataset we need to specify that we’re interested in the dataset from the first list item (i.e. [[1]]). As we will discuss below, it is possible to test multiple combinations of parameter levels, which will output multiple list items.
head(MassWin[[1]]$Dataset)
deltaAICc | WindowOpen | WindowClose | ModelBeta | … | ModWeight | … |
---|---|---|---|---|---|---|
-64.81496 | 72 | 15 | -4.481257 | … | 0.0278688 | … |
-64.55352 | 72 | 14 | -4.485161 | … | 0.0244539 | … |
-64.53157 | 73 | 15 | -4.510254 | … | 0.0241870 | … |
-64.40163 | 73 | 14 | -4.517427 | … | 0.0226655 | … |
-64.30387 | 72 | 13 | -4.500146 | … | 0.0215842 | … |
-64.20857 | 73 | 13 | -4.533436 | … | 0.0205799 | … |
As we can see above, the best climate window is 72 - 15 days before May 20th, equivalent to March and April temperature.
As the name suggests, BestModel is a model object showing the relationship between temperature and mass within the strongest climate window.
MassWin[[1]]$BestModel
Call:
lm(formula = Yvar ~ climate, data = modeldat)
Coefficients:
(Intercept) climate
163.544 -4.481
Finally, BestModelData is a data frame containing the raw climate and biological data used to fit BestModel.
head(MassWin[[1]]$BestModelData)
Yvar | climate |
---|---|
140 | 6.068966 |
138 | 6.160345 |
136 | 6.781034 |
135 | 6.877586 |
134 | 6.713793 |
134 | 6.120690 |
With the climatewin
code above, we have compared all possible climate windows for our Mass
dataset. However, one danger of the exploratory approach that we employ is that there is a risk of detecting seemingly suitable climate windows simply by statistical chance. In other words, if we fit enough climate windows one of them will eventually look good.
To overcome this concern, we have designed a number of plotting options that allow users to visualise their climate window data and determine if the ‘best’ window detected represents a real period of climate sensitivity or simply a statistical fluke.
We have designed 6 different plotting functions to help users visualise and interpret their climatewin
results. Below we will discuss each plot in detail.
As a first step, we can look at the distribution of delta AICc values across all tested climate windows. In the below plot, blue regions represent climate windows with limited strength (AICc values similar to the null model) while red regions show strong models.
In our Mass
example, we can see an obvious region of strong windows around the left of the graph. This seems to suggest that there is a clear area of sensitivity.
plotdelta(dataset = MassOutput)
Another way to visualise the delta AICc data is through the use of model weights (Burnham, Anderson & Huyvaert 2011 source). In short, the higher a model weight value the greater confidence we have that this model is the true ‘best’ model. While our top climate window is the most likely single window to best explain the data, our confidence that this top window represents the true ‘best’ model may still be low.
If we sum the weights of multiple models, we can now be more confident that we encompass the true best model, but we are less sure of its exact location. In our model weights plot we shade all models that make up the top 95% of model weights. Therefore, we can be 95% confident that the true ‘best’ model falls within the shaded region. In our example using the Mass
dataset, we can see that the top 95% of model weights falls within a small region roughly corresponding to the peak in delta AICc seen above. In fact, we can be 95% sure that the best climate window falls within only 7% of the total fitted models. This result suggests that our results are not produed by chance.
plotweights(dataset = MassOutput)
While we are often interested in estimating the location of the best climate window, we may also be interested in understanding the relationship between climate and our biological response. The plotbeta
function generates a similar colour plot to that used to model delta AICc, which shows the spread of model coefficients across all fitted climate windows. In the below plot, we can see that windows around our best model show a negative relationship between temperature and mass, while other models show little correlation.
plotbetas(dataset = MassOutput)
So far both our plots of delta AICc and model weights seem to suggest that there is a true period of climate sensitivity within our data. To better understand how likely it is that we would achieve such a result by chance we can also compare the outcomes of our climate window analysis with a similar analysis on randomised data (i.e. data with no climate signal). To generate this random data we can use the function randwin
, which uses a similar syntax to that of climatewin
. The additional parameter repeats determines the number of times that biological data should be randomised and analysed. We recommend using a minimum of 5 repeats for best results.
MassRand <- randwin(repeats = 5,
xvar = list(Temp = MassClimate$Temp),
cdate = MassClimate$Date,
bdate = Mass$Date,
baseline = lm(Mass ~ 1, data = Mass),
cinterval = "day",
furthest = 100, closest = 0,
type = "fixed", cutoff.day = 20, cutoff.month = 05,
stat = "mean",
func = "lin")
Now we can compare the distribution of delta AICc values for our original climate window analysis and the same analysis on randomised data. If our climate window results are due to statistical chance we would expect to see a similar distribution of delta AICc values between both random and real data.
As we can see below, the distribution of delta AICc values in the real data (top histogram) has a much longer tail than that of the randomised data (bottom histogram). The dashed line on the plot represents the 99th quantile of the randomised data, therefore we can be confident that the large negative values of delta AICc we find in our real data are unlikely to occur by random chance.
plothist(dataset = MassOutput, datasetrand = MassRand)
To represent the model weight plot in a different way, plotwin
shows boxplots of the start and end point of all climate windows that make up the top 95% of model weights. The values above each boxplot represent the median start and end time for these models.
In the below example, the median start and end values of the top models corresponds almost exactly with our best window determined using delta AICc. In situations where model weights are more dispersed such a match would be less likely.
plotwin(dataset = MassOutput)
Although the above plots give us strong confidence that our results represent a real period of climate sensitivity, we have not considered how well climate within this period actually explains variation in the biological data. Although temperature in March and April explains variation in mass better than at other points in time, temperature over this period may still not explain variation in the biological data that well.
Using the plotbest
function, we can plot our best model over the biological data. Examining this plot can help us determine whether it may be more appropriate to test alternative relationships of climate (e.g. quadratic or cubic) by adjusting the parameter func.
While it is easy to save the Dataset output from the climatewin
function, the BestModel object can often be lost when we start a new R session. To overcome this issue, the singlewin
function allows you to generate BestModel and BestModelData for a single climate window.
MassSingle <- singlewin(xvar = list(Temp = MassClimate$Temp),
cdate = MassClimate$Date,
bdate = Mass$Date,
baseline = lm(Mass ~ 1, data = Mass),
cinterval = "day",
furthest = 72, closest = 15,
type = "fixed", cutoff.day = 20, cutoff.month = 05,
stat = "mean",
func = "lin")
NOTE: In the singlewin function the values of furthest and closest now equate to the start and end time of the single window, rather than the range over which multiple windows will be tested.
We can use the output from the singlewin function to plot the best data.
plotbest(dataset = MassOutput,
bestmodel = MassSingle$BestModel,
bestmodeldata = MassSingle$BestModelData)
plotall
functionWhile each of these plots provides useful information for the user, they are best viewed in conjunction. The plotall
function allows users to create an array of all 6 graphics to contrast and compare the different information they provide.
plotall(dataset = MassOutput,
datasetrand = MassRand,
bestmodel = MassSingle$BestModel,
bestmodeldata = MassSingle$BestModelData)
In the above examples we have been investigating one combination of climatewin
parameters. However, there may be situations where users may wish to test different parameter combinations. For example, we may be interested in looking at both the linear and quadratic relationship of temperature or consider investigating the effects of both average and maximum temperature.
For ease of use, climatewin
allows users to test these different combinations with a single function. To take the above examples, we can test both linear and quadratic relationships
func = c("lin", "quad")
and consider both maximum and mean temperature values.
stat = c("max", "mean")
So our final function would be:
MassWin2 <- climatewin(xvar = list(Temp = MassClimate$Temp),
cdate = MassClimate$Date,
bdate = Mass$Date,
baseline = lm(Mass ~ 1, data = Mass),
cinterval = "day",
furthest = 100, closest = 0,
type = "fixed", cutoff.day = 20, cutoff.month = 05,
stat = c("max", "mean"),
func = c("lin", "quad"))
To view all the tested combinations we can call the combos
object
MassWin2$combos
Climate | Type | Stat | func | |
---|---|---|---|---|
1 | Temp | fixed | max | lin |
2 | Temp | fixed | mean | lin |
3 | Temp | fixed | max | quad |
4 | Temp | fixed | mean | quad |
Using this combos list, we can extract information on each of our tested combinations, as each row in the combos
table corresponds to a list item in our object MassWin2. For example, the BestModel object of the quadratic relationship using maximum temperature (3rd row) would be called using the following code:
MassWin2[[3]]$BestModel
Call:
lm(formula = Yvar ~ climate + I(climate^2), data = modeldat)
Coefficients:
(Intercept) climate I(climate^2)
139.39170 -1.33767 0.03332
This covers the basic functions of climwin
. For more advanced features additional vignettes will be provided with future updates. To ask additional questions or report bugs please e-mail: