The stgam package (Comber et al.
2024) provides a wrapper around mgcv functionality
(Wood 2025) to support space-time analysis
with Generalized Additive Models (GAMs). Although GAMs, as with any regression, are useful predictors of a target variable, the focus here is on understanding relationships in the data and the associated inference (i.e., understanding the process linking the target to a set of predictor
variables). This is a common objective of geographical analysis, which
is often concerned with how processes vary spatially, temporally and
spatio-temporally.
This vignette illustrates an analysis workflow using GAM smooths that quantifies and models variations over space and time. It highlights the importance of prior investigation of spatial and/or temporal variability before constructing space-time GAMs.
This vignette acts as a precursor to a complementary vignette, in which spatially, temporally and spatio-temporally varying coefficient GAMs are investigated. The GAMs specified here only use surrogates for spatial or temporal effects, through location coordinates or date stamps, respectively. GAMs by design support varying coefficients, but in this vignette that variation is essentially investigated in ‘attribute-space’ only.
You should load the stgam package and the chaco data. The data are a sample of 2000 observations of the Normalised Difference Vegetation Index (NDVI) of the Chaco dry rainforest in South America, together with some climate data. The NDVI and climate data were obtained via Google Earth Engine (Gorelick et al. 2017). The NDVI data are sourced from the PKU GIMMS NDVI v1.2 dataset, which provides NDVI observations at 1/12° spatial resolution at bi-monthly intervals from 1982 to 2022 (Li et al. 2023). For the climate data, maximum temperature (tmax) in degrees C and precipitation (pr) in millimetres were selected from the TerraClimate dataset (IDAHO_EPSCOR/TERRACLIMATE), and their mean value was calculated for each monthly image across all pixels.
library(cols4all)
library(dplyr)
library(ggplot2)
library(tidyr)
library(sf)
library(cowplot)
# load the packages and data
library(mgcv)   # needed for gam(); attached here in case stgam does not attach it
library(stgam)
data("chaco")You should examine the help for the dataset and especially the
variables that chaco contains:
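For example, the help page can be opened directly from the R console (standard help calls; only the dataset and package names are taken from above):
# open the documentation for the chaco dataset
?chaco
# or, equivalently
help("chaco", package = "stgam")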
The first stage is simply to examine the data, particularly how the variables may vary over space and time. The code below examines the data before undertaking some initial investigations.
# examine what is loaded
chaco
#> Simple feature collection with 2000 features and 11 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 3780862 ymin: 8212925 xmax: 3994096 ymax: 8432313
#> Projected CRS: SIRGAS 2000 / Brazil Mercator
#> # A tibble: 2,000 × 12
#> id ndvi tmax pr month date year lon lat X Y geometry
#> * <int> <dbl> <dbl> <int> <dbl> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <POINT [m]>
#> 1 1 0.738 31.4 208 91 2020-01-01 2020 -52.7 -15.5 3919927. 8260790. (3919927 8260790)
#> 2 2 0.545 38.8 139 64 2017-10-01 2017 -52.0 -14.3 3994096. 8403808. (3994096 8403808)
#> 3 3 0.758 30 334 18 2013-12-01 2013 -53.5 -16.0 3836488. 8212925. (3836488 8212925)
#> 4 4 0.652 28.7 0 119 2022-05-01 2022 -53.2 -15.0 3864301. 8327636. (3864301 8327636)
#> 5 5 0.774 31.8 69 94 2020-04-01 2020 -52.8 -15.1 3910656. 8308557. (3910656 8308557)
#> 6 6 0.714 30 343 18 2013-12-01 2013 -53.2 -14.6 3864301. 8365752. (3864301 8365752)
#> 7 7 0.705 31.7 96 22 2014-04-01 2014 -53.2 -14.1 3864301. 8422815. (3864301 8422815)
#> 8 8 0.733 31.9 127 79 2019-01-01 2019 -52.3 -14.9 3966282. 8337171. (3966282 8337171)
#> 9 9 0.611 30.1 0 9 2013-03-01 2013 -53.9 -14.4 3790133. 8394299. (3790133 8394299)
#> 10 10 0.427 35 5 99 2020-09-01 2020 -52.8 -15.2 3910656. 8299011. (3910656 8299011)
#> # ℹ 1,990 more rows

The analyses will model NDVI (the ndvi variable). GAMs will be constructed that regress this target variable against available predictor variables including location and time. The print-out of the first 10 records above shows that the dataset contains a time variable (date) in date format, as well as location in metres (X and Y, from the projected coordinate reference system reported above). It is often more useful to represent time as continuous scalar values and to have location coordinates in kilometres rather than metres. The data also contain a variable called month to represent time (here, months since the earliest date). The code below rescales the locational data, but the original coordinates are retained for mapping in the geometry attribute of chaco:
chaco <-
chaco |>
# scale location and retain original coordinates
mutate(Xo = X, Yo = Y) |>
mutate(X = X/1000, Y = Y/1000)

Again, the rescaled data can be examined.
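For example (a quick check only; this simply prints the rescaled and retained coordinate columns):
# compare the rescaled (km) and retained original (m) coordinates
chaco |>
  st_drop_geometry() |>
  select(X, Y, Xo, Yo) |>
  head()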
The target variable can be mapped as in the code snippet below. The map below shows the spatial distribution of ndvi.
# map the data layers
chaco |>
ggplot() + geom_sf(aes(col = ndvi)) +
scale_color_viridis_c(option = "magma") +
theme_bw() + xlab("") + ylab("")

Map of the ndvi variable.

Distributions and correlations for the continuous variables can be examined using boxplots/histograms and a correlation matrix, respectively. It is important to establish that a GAM with the proposed target, regressed against a set of predictor variables, may yield something interesting and sensible and, just as importantly, to identify any variables that might be problematic when trying to fit and interpret the resultant model.
Examples include a highly skewed variable or one that has outliers. This is the case for the proposed target variable (ndvi), which is strongly positively skewed with potential outliers.
The code below generates boxplots and then density histograms.
# boxplots
chaco |>
st_drop_geometry() |>
select(id, ndvi, tmax, pr) |>
pivot_longer(-id) |>
ggplot(aes(x = value)) +
geom_boxplot(fill="dodgerblue") +
facet_wrap(~name, scales = "free") +
theme_bw() +
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())

# histograms
chaco |>
st_drop_geometry() |>
select(id, ndvi, tmax, pr) |>
pivot_longer(-id) |>
ggplot(aes(x = value)) +
geom_histogram(aes(y=after_stat(density)),bins = 30,
fill="tomato", col="white") +
geom_density(alpha=.5, fill="#FF6666") +
facet_wrap(~name, scales = "free") +
theme_bw()

Examining correlations similarly checks for issues that may be a problem later in the analysis workflow. Here the aim is to establish that there are reasonable correlations with the target variable and, at the same time, little collinearity amongst the predictor variables. Note that this picture may change when correlations are considered over space, time or space-time (i.e., they are only examined globally, for now); see the accompanying vignette.
# correlations
chaco |>
st_drop_geometry() |>
select(ndvi, tmax, pr) |>
cor() |> round(3)
#> ndvi tmax pr
#> ndvi 1.000 -0.418 0.500
#> tmax -0.418 1.000 -0.101
#> pr     0.500 -0.101  1.000

Finally, thinking about a GAM-based space-time analysis of your dataset: as a general heuristic, any dataset used in space-time analysis should ideally have a minimum of about 100 locations and a minimum of about 50 time intervals. You should consider this when assembling your data. For space-time datasets with smaller dimensions, alternative regression methods may be more suitable, for example a spatial panel regression when the time component is short (Harris et al. 2017).
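This heuristic can be checked directly against the data (a sketch only; it simply counts distinct locations and time stamps using the column names above):
# check the space-time dimensions of the chaco data
chaco |>
  st_drop_geometry() |>
  summarise(n_locations  = n_distinct(X, Y),
            n_time_steps = n_distinct(month))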
Investigations in the previous section were all concerned with
establishing the viability of undertaking the proposed analysis, using
standard exploratory data analyses to examine distributions and
correlations. In this section, these are extended to investigate data
variability over time, over space and in space-time. This is done by
constructing a series of GAMs, of different forms and with different
settings. Tools for constructing GAMs using the mgcv package (Wood 2025) are introduced, with a particular focus on using GAM smooths or splines to explore variations.
As an initial baseline model, the code below creates a standard OLS (ordinary least squares) regression for ndvi with two predictors, using the lm function. The model fit is weak, but the model summary indicates that all of the predictor variables (tmax and pr) are significant predictors of the target variable, ndvi. An analysis of variance (ANOVA), using the anova function, shows how much variance in ndvi is explained by each predictor and whether that contribution is statistically significant (via the associated p-values, Pr(>F)). Here, high values of the F test statistic provide evidence that a given predictor is useful. In summary, there is evidence that the two predictors (tmax and pr) are statistically significant predictors.
# an OLS model
m_ols <- lm(ndvi ~ tmax + pr, data = chaco)
summary(m_ols)
#>
#> Call:
#> lm(formula = ndvi ~ tmax + pr, data = chaco)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.305438 -0.071189 0.008027 0.074015 0.282366
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.394e+00 3.956e-02 35.24 <2e-16 ***
#> tmax -2.648e-02 1.254e-03 -21.11 <2e-16 ***
#> pr 4.817e-04 1.836e-05 26.24 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.1013 on 1997 degrees of freedom
#> Multiple R-squared: 0.3866, Adjusted R-squared: 0.386
#> F-statistic: 629.3 on 2 and 1997 DF, p-value: < 2.2e-16
anova(m_ols)
#> Analysis of Variance Table
#>
#> Response: ndvi
#> Df Sum Sq Mean Sq F value Pr(>F)
#> tmax 1 5.8481 5.8481 569.95 < 2.2e-16 ***
#> pr 1 7.0659 7.0659 688.64 < 2.2e-16 ***
#> Residuals 1997 20.4906 0.0103
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As yet, space, time and space-time influences on NDVI variation have not been investigated. GAMs with smooths offer a route to investigate these.
GAMs with splines (or smooths - the terms are used interchangeably) can be used to explore and quantify spatial and temporal variations. However before considering these aspects, it is important to outline what smooths do.
Smooths help to fit a (smooth) curve through a cloud of messy data points. They generate flexible fits (curves) able to adapt to the shape of the data. Because of this, they are able to describe the relationships between variables better than a straight line. For example, consider the scatterplot of some simulated data below: it would be difficult to fit a straight line through it.
# create simulated data
set.seed(12)
x <- runif(500)
mu <- sin(2 * (4 * x - 2)) + 2 * exp(-(16 ^ 2) * ((x - .5) ^ 2))
y <- rnorm(500, mu, .3)
# plot x and y
ggplot() +
geom_point(aes(x,y)) +
theme_bw()

The x and y data.

The code below uses a GAM smooth to determine the relationship
between x and y without having to pre-specify
any particular form (e.g. quadratic, exponential, etc). The wiggliness
of the fit is determined automatically by the s function
from the mgcv package. This is done in different ways
depending on the type of smooth that is specified (thin plate regression splines are the default in s). Effectively, what the smooth
does is to split the data into chunks and fit piecewise linear
relationships, rather than fitting a single straight line as in a
standard linear regression. In doing so, GAMs implicitly allow for
non-linear relationships.
# a GAM illustration with a spline using s
gam_s_example <- gam(y~s(x))
# extract the smooth fit
y.s <- gam_s_example$fitted.values
# plot
ggplot() +
geom_point(aes(x,y), col = "grey") +
geom_line(aes(x, y = y.s), lwd = 1) +
theme_bw()

The x and y data with a GAM spline fitted.

Thus, this diversion into smooths and splines illustrates how they
model different relationships (slopes) with y at different
locations within the variable feature space (the x axis in
the above example). In this way, spline smooth curves can be used to
capture non-linear relationships in attribute-space (here the
relationship of x with y).
Importantly, in the context of space-time varying coefficient modelling with stgam (see the accompanying vignette in this package), a smooth can be used to model how the relationship between x and y varies with respect to time or location, if the smooth is also parameterised with those. That is, moving from attribute-space to temporal- or geographical-space, respectively.
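As a flavour of that idea (explored properly in the accompanying vignette), a minimal sketch of such a parameterisation might use mgcv's by argument so that the effect of a predictor varies over space. This particular formulation is illustrative only and is not one of the models fitted in this vignette:
# illustrative only: the effect of tmax allowed to vary over space
# via a smooth of location parameterised 'by' the predictor
gam_svc_sketch <- gam(ndvi ~ s(X, Y, bs = "tp", by = tmax), data = chaco)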
It would be useful to examine how the target variable, ndvi, varies over time. One way of doing this is to extend the use of GAM smooths illustrated above. The code snippet below models the variation in ndvi, but now using time (the month variable) as a predictor.
# the first GAM
gam.1 <- gam(ndvi~s(month), data = chaco)
summary(gam.1)
#>
#> Family: gaussian
#> Link function: identity
#>
#> Formula:
#> ndvi ~ s(month)
#>
#> Parametric coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.626637 0.002845 220.3 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
#> s(month) 8.69 8.972 7.579 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.0314 Deviance explained = 3.56%
#> GCV = 0.016264  Scale est. = 0.016185  n = 2000

The smooth can be investigated by plotting ndvi with
time. Note the use of the actual data variable date rather
than month in the code below to have a friendly x-axis in
the plot, and the use of the predict function to extract
the standard errors on the fitted curve (i.e., the predictions of
NDVI):
# create a data frame with x, predicted y, standard error
x <- chaco$date
y <- gam.1$fitted.values
se <- predict(gam.1, se = TRUE, chaco)$se.fit
u <- y+se
l <- y-se
df <- data.frame(x, y, u, l)
# plot!
ggplot(df, aes(x, y, ymin = l, ymax = u)) +
geom_ribbon(fill = "lightblue") +
geom_line() +
theme_bw() +
xlab("Date") + ylab("NDVI") This exercise models variation in the relationship between
ndvi (NDVI) and month (time). The code above
extracts the predicted values (\(\hat{y}\)) of ndvi from the
model and plots them against time, but using the actual date. The plot
shows how the target variable varies with time. This
provides confirmation of the potential suitability of a temporal
analysis of ndvi.
The spatially varying trends can be examined in a similar way, again using a standard s spline - but now with the location variables (i.e., the coordinates) to predict ndvi.
# the second GAM
gam.2 <- gam(ndvi~s(X,Y), data = chaco)
summary(gam.2)
#>
#> Family: gaussian
#> Link function: identity
#>
#> Formula:
#> ndvi ~ s(X, Y)
#>
#> Parametric coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.626637 0.002817 222.4 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
#> s(X,Y) 21.75 26.13 4.088 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.0502 Deviance explained = 6.05%
#> GCV = 0.016055 Scale est. = 0.015872 n = 2000
plot(gam.2, asp = 1)

mgcv map of the spatial smooth.

Notice that the model fit (\(R^2\)) has not increased substantially from that of the previous model, but the spatial smooth is significant, again suggesting some spatial effects. The smooth can again be plotted, but this time rather than a line it is a two-dimensional surface. However, it would be useful to visualise this as a surface rather than just over the chaco point locations. The code below uses the predict function to model the relationship over a hexagonal grid (a surface) covering the extent of the study area, which contains X and Y attributes giving the location of each hexagonal cell centroid.
# create a grid object from the Chaco data
chaco_grid <-
st_make_grid(chaco, square=FALSE,n=20) |>
st_sf()
# rename the geometry, sort row names
st_geometry(chaco_grid) = "geometry"
rownames(chaco_grid) <- 1:nrow(chaco_grid)
# create and add coordinates X and Y / 1000
coords <- chaco_grid |> st_centroid() |> st_coordinates()
chaco_grid <- chaco_grid |> bind_cols(coords/1000)

You may wish to inspect this object before continuing with the mapping procedure.
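For example (simply printing the first few rows of the grid; no new computation is involved):
# inspect the grid object with its centroid coordinates
head(chaco_grid)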
# predict over the grid
yhat <- predict(gam.2, newdata = chaco_grid)
chaco_grid |>
mutate(yhat = yhat) |>
# and plot
ggplot() +
geom_sf(aes(fill = yhat), col = NA) +
# adjust default shading
scale_fill_continuous_c4a_seq("brewer.yl_or_rd", name = "NDVI") +
# apply and modify plot theme
theme_bw() +
theme(legend.position = "bottom",
legend.key.width = unit(1, "cm"))

There are clear spatial trends (variations over space) in the NDVI target variable. The map indicates high values in the centre of the study area, as in the initial map of the data earlier, and again provides confirmatory evidence that some spatial modelling of ndvi is appropriate.
The smooth approach can be extended to model the target variable in space-time by including the location and time variables in the smooth. However, a slightly different set of considerations is required because space and time are mixed in the mgcv smooths. Consider the code snippet below for the space-time smooth: it specifies a smooth using the te function rather than s and contains additional parameters in the d and bs arguments. The reasons for this are described below.
# the third GAM
gam.3 <- gam(ndvi~te(X,Y,month, d = c(2,1), bs=c('tp','cr')), data = chaco)
summary(gam.3)
#>
#> Family: gaussian
#> Link function: identity
#>
#> Formula:
#> ndvi ~ te(X, Y, month, d = c(2, 1), bs = c("tp", "cr"))
#>
#> Parametric coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.626637 0.002804 223.5 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
#> te(X,Y,month) 39.7 49.84 2.684 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.0592 Deviance explained = 7.79%
#> GCV = 0.016048  Scale est. = 0.015721  n = 2000

It is important to consider the difference between
s(X,Y,month) (which would be a continuation of the format
above but has not been run) and
te(X,Y,month,d = c(2,1),bs=c('tp','cr')) before examining
the resultant GAM (gam.3). The spline functions
s() and te() can model smooth interactions
between multiple variables, but they handle dimension scaling, marginal
smoothness, and basis construction in different ways.
s() is an Isotropic smooth. It is used when
all variables are on the same scale and have similar smoothness
behaviours (like X and Y). By default it uses
a thin plate regression spline (bs = "tp") and a single
basis to model the attribute space. The formulation
s(X,Y,month) would therefore use a single basis to model a
smooth in 3D space, under the assumption that all the variables
(X, Y and month) contribute in
the same way and at the same scale. This is fine if the variables are numeric, continuous and in comparable units such as percentages, but not in space-time.
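For reference, the isotropic alternative mentioned above would look like the following. It is deliberately not evaluated here, in keeping with the text (a single 3-D thin plate basis treats X, Y and month as if they were on a common scale):
# not run in this vignette: a single isotropic 3-D smooth of space and time
# gam(ndvi ~ s(X, Y, month), data = chaco)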
te() is a Tensor Product (TP) smooth. These
are used when variables are on different scales or represent different
kinds of effects (like X, Y and
month). It builds a smooth using separate basis functions
for each variable and then combines them. The formulation
te(X,Y,month,d = c(2,1),bs=c('tp','cr')) includes the
d parameter that specifies the dimensions for each basis.
Here a 2D basis is specified for X and Y
(location) and a 1D basis for month (time). It also
includes a basis parameter bs which specifies a Thin Plate
Regression Spline (tp) for location and a Cubic Regression
Spline (cr) for time. Thin plate splines are invariant to
coordinate rotation and are ideal for spatial data (regardless of
whether the coordinates are in meters, kilometres, or degrees) and Cubic
regression splines are more efficient, especially in 1D. The tensor product smooth is constructed by forming (marginal) smooths for each component and taking the tensor product of those to form the joint smooth. It
is useful when the variables to be included in the smooth have different
scales or units (e.g., X and Y are
coordinates, month is time) and different degrees of
smoothness are expected along each variable.
Thus, in this case a TP smooth is specified because space and time
need to be combined. The code below examines the results and uses the
mgcv plot function to show the spatial distributions over 9
discretised time chunks (~15 months).
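The plotting call itself is not reproduced in this extract; a minimal sketch of such a call, assuming plot.gam's handling of higher-dimensional tensor product smooths as a grid of 2-D slices, might be:
# plot the space-time smooth as spatial slices through time
plot(gam.3, asp = 1, pages = 1)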
However, it is also possible to slice and dice the time variable in other ways. The code snippet below illustrates how this can be done for chosen time slices (here, the middle month of each year) using the calculate_vcs function in the stgam package. Note that if predictor variables are included in the model, then the function returns coefficient estimates, standard errors and t-values for each of the terms specified, as shown in the accompanying vignette.
In this case, however, we are just interested in the predicted value of the target variable, yhat. The code below uses the grid object created above (chaco_grid) as a spatial framework to hold the predictions of the ndvi variable over these discrete time periods.
# create time slices
years <- sort(unique(chaco$year))
# calculate over the grid for each time slice
res_out <- matrix(nrow = nrow(chaco_grid), ncol = 0)
for (i in 1:length(years)){
# convert years to months - here getting month 6
month.val <- ((years[i]-min(years)) * 12) + 6
res.i <- calculate_vcs(input_data = chaco_grid |> mutate(month = month.val),
mgcv_model = gam.3,
terms = NULL)
res_out <- cbind(res_out, res.i$yhat)
}
# name with years and join to the grid
colnames(res_out) <- paste0("Y_", years)
chaco_grid |>
cbind(res_out) |>
# select the variables and pivot longer
select(-X, -Y) |>
pivot_longer(-geometry) |>
# make the new object a factor (to enforce plotting order)
mutate(name = factor(name, levels = paste0("Y_", years))) |>
# and plot
ggplot() +
geom_sf(aes(fill = value), col = NA) +
# adjust default shading
scale_fill_continuous_c4a_seq("brewer.yl_or_rd", name = "Predicted\nNDVI") +
# facet
facet_wrap(~name, ncol = 4) +
# apply and modify plot theme
theme_bw() +
theme(
legend.position = "inside",
legend.position.inside = c(0.9, 0.17),
strip.background = element_rect(fill="white", colour = "white"),
strip.text = element_text(size = 8, margin = margin(b=4)),
axis.title=element_blank(),
axis.text=element_blank(),
axis.ticks=element_blank())

This time series of maps shows that, without any predictor variables (aside from location and time), the trend in NDVI varies spatially and changes in intensity over time (the increase and decrease in values over space), but with limited variation in the spatial pattern itself over time.
The formula used to construct the gam.3 model in the previous subsection specified a smooth that implicitly combined space and time, under the assumption that the variations of ndvi in space and time interact with each other. However, this may not be the case. More generally, spatial and temporal dependencies might not be expected to interact in the data for a relatively small region, whose observations might be subject to the same socio-economic or environmental pressures; in this case, any space and time effects might be expected to be independent of each other. In a larger, for example national, study any space and time effects might be expected to interact more strongly.
It is possible to use a different construction involving separate
smooths for space and time, with the s() smooth, and to
avoid the assumption of the interaction of space and time effects:
# the fourth GAM
gam.4 <- gam(ndvi~s(X,Y) + s(month), data = chaco)
summary(gam.4)
#>
#> Family: gaussian
#> Link function: identity
#>
#> Formula:
#> ndvi ~ s(X, Y) + s(month)
#>
#> Parametric coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.626637 0.002773 226 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
#> s(X,Y) 21.370 25.849 4.076 <2e-16 ***
#> s(month) 8.701 8.973 8.126 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.0799 Deviance explained = 9.37%
#> GCV = 0.015618  Scale est. = 0.015376  n = 2000

Again there is evidence of spatial and temporal trends in the data, and these can be visualised using the plot.gam function in mgcv (called with just plot).
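The plotting call is not reproduced in this extract; a minimal sketch might be (pages = 1 simply places both smooths on a single page):
# plot the spatial and temporal smooths of gam.4
plot(gam.4, pages = 1)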
mgcv plot of the spatial and temporal smooths.

Note that the space-time interactions could be generated in the same
way as for the gam.3 model above, using the location
(X and Y) values in chaco_grid
and the time slices in years.
Up until now, only the target (response) variable, ndvi, has been examined for variations in time and space, using GAM smooths
specified in different ways. This was to explore space-time variability
of this variable, but also to introduce GAM smooths. The code below
includes the tmax predictor variable in a space-time GAM as
a fixed parametric term (i.e., in a similar way as in the OLS model
created above).
# the fifth GAM
gam.5 <- gam(ndvi~s(X,Y) + s(month) + tmax, data = chaco)
summary(gam.5)
#>
#> Family: gaussian
#> Link function: identity
#>
#> Formula:
#> ndvi ~ s(X, Y) + s(month) + tmax
#>
#> Parametric coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 1.69211 0.04758 35.56 <2e-16 ***
#> tmax -0.03409 0.00152 -22.42 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
#> s(X,Y) 22.072 26.36 7.271 <2e-16 ***
#> s(month) 8.681 8.97 6.778 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.266 Deviance explained = 27.7%
#> GCV = 0.012473  Scale est. = 0.012268  n = 2000

The model summary indicates that tmax is a significant
predictor of ndvi and improves the model fit. However, this raises a key question: how should predictor variables be included in the model, in (fixed) parametric form or in a (varying) smooth form? This is addressed in the accompanying vignette.
As before, the smooths can be plotted.
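As with gam.4, a sketch of the plotting call (not shown in the original extract) might be:
# plot the spatial and temporal smooths of gam.5
plot(gam.5, pages = 1)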
This vignette has introduced GAMs with smooths (splines) as a way of investigating any space-time effects present in the data. The mechanics of smooths were described and illustrated, and a brief introduction to different smooth forms was provided. Variations in the target variable (ndvi) in time, space and space-time were explored through elementary model fitting and plotting of the smooth trends.
In this case, effects in space and time are present in the ndvi target variable, and it looks like there is a universal spatial trend at all times and a universal temporal trend everywhere. The final space-time GAM included a predictor variable (tmax), which added some explanatory power, but this was not examined with respect to variation in space, time or space-time (i.e., through the use of smooths). This is done in the accompanying vignette.
Such investigations are an important initial step. They provide evidence of space-time variations in the target variable (i.e., whether effects in space and time are present in the data) and can help determine which predictor variables may be of further interest. They guide subsequent analysis and help avoid unwarranted assumptions about the presence of space-time interactions, for example plugging every predictor variable into a space-time smooth of some kind.