A Geographer’s introduction to space-time regression with GAMs using stgam

2. Space-time GAM contruction and considertations

2.1 Overview

This section works through an example of constructing a space-time GAM to illustrate a workflow that allows for spatio-temporally varying terms, using the Chaco dataset. It fits an initial complex model and then, through investigation of the results, refines the how the model is specified. In so doing a number of important considerations are illustrated relating the bases that are specified, their dimensions, the GAM fitting method, the model family and the number of knots used in the smooths.

In the GAMs created in this vignette and this package, the Intercept is treated as an explicitly addressable term.

By way of example, consider the model m created below:

m  <- gam(ndvi~s(X,Y) + s(month) + pr, data = chaco)
summary(m)
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> ndvi ~ s(X, Y) + s(month) + pr
#> 
#> Parametric coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 5.595e-01  3.391e-03  165.01   <2e-16 ***
#> pr          5.366e-04  1.948e-05   27.54   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>             edf Ref.df      F p-value    
#> s(X,Y)   21.648 26.055  5.444  <2e-16 ***
#> s(month)  8.925  8.998 13.355  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.335   Deviance explained = 34.5%
#> GCV = 0.011302  Scale est. = 0.011117  n = 2000

This includes the terms s(X,Y) + s(month). These model the spatially varying and temporally varying Intercept plus error. Note that if this was a spatial model (without time), then just s(X,Y) would be included for the Intercept and similarly just s(month) for a purely temporal model.

The model can be reformulated as follows with an addressable Intercept term:

m <- gam(ndvi ~ 
           0 + Intercept + s(X,Y,by=Intercept) + s(month, by=Intercept) + 
           pr,
         data = chaco |> mutate(Intercept = 1))
summary(m)
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> ndvi ~ 0 + Intercept + s(X, Y, by = Intercept) + s(month, by = Intercept) + 
#>     pr
#> 
#> Parametric coefficients:
#>            Estimate Std. Error t value Pr(>|t|)    
#> Intercept 5.595e-01  3.391e-03  165.01   <2e-16 ***
#> pr        5.366e-04  1.948e-05   27.54   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>                       edf Ref.df      F p-value    
#> s(X,Y):Intercept   21.648 26.055  5.444  <2e-16 ***
#> s(month):Intercept  8.925  8.998 13.355  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.335   Deviance explained = 97.3%
#> GCV = 0.011302  Scale est. = 0.011117  n = 2000

The model summary now includes Intercept as a parametric term (rather than (Intercept)) with s(X,Y):Intercept and s(month):Intercept as smooth terms, rather than s(X,Y)and s(month), but the values in both model summaries are all the same.

The rationale for treating the Intercept as addressable term is because, in varying coefficient modelling, specifying a varying intercept over space and or time detects situations when the response variable (the \(y\)-value) may show notably low or high values in a space-time locality that were not accounted for by variation in the explanatory values (Harris 2019). This could be caused, for example, if variables not recorded in the study were causing otherwise unexplained space-time patterns in the \(y\) variable. As such, a varying intercept commonly tends to capture autocorrelated errors (Harris 2019) and background variation that has the potential to affect other varying coefficient estimates. It also simplifies the interpretation of other varying coefficient estimates, particularly if the predictor variables are not mean-centred.

The models in this vignette include the Intercept as an addressable term and the code below creates this variable in the input data:

chaco <- 
  chaco |> 
  mutate(Intercept = 1)

2.2 Space-time GAMs

An initial space-time GAM is fitted using the code below. Notice that for each predictor variable and the explicitly specified intercept, the model specifies Tensor Product (TP) smooths to capture the variations in NDVI (ndvi) in both space and time simultaneously. These are tensor product smooths, specified using the te() function, and are described below. Notice also that global parametric parameters are also specified for tmax, pr and Intercept. This is to provide a multiplicative constant as well as a function around the coefficient term, and potentially an offset term.

# initial model
m1 <- gam(
  ndvi ~ 0 +
    Intercept + te(X,Y,month, by=Intercept,bs = c("tp", "cr"), d = c(2,1)) + 
    tmax + te(X,Y,month, by = tmax, bs = c("tp", "cr"), d = c(2,1)) +
    pr + te(X,Y,month, by = pr, bs = c("tp", "cr"), d = c(2,1)),
  data = chaco,
  method = "REML",
  family = gaussian()
)

Each of the space-time smooths is specified with Tensor Product (TP) smooth (te()) to model interactions between space and time margins. These construct smooth surfaces over space and time without random effects (Wood 2025). They enable variables that are measured over different scales (i.e. that are anisotropic), such as location coordinates and time, to be combined. The basis dimensions of each component are specified through the d parameter, here specifying a 2-dimensional smooth basis for location (X and Y) and a 1-dimensional one for time (month). These are combined via the TPs. Thin plate regression splines (tp) were specified to model location and low-rank cubic regression splines (cr) were specified for time. Thin plate regression splines are ideal for 2-D smooths because they are isotropic (they are rotation invariant with have no directional bias) and they perform well even if the observations are irregularly located over space. Cubic regression splines are efficient in 1-D and known to be good for seasonal or long-term effects. The low-rank basis avoids overfitting and improves computational efficiency. Finally, the model is specified with a Gaussian family and the model is fitted using a REML (Relaxed Maximum Likelihood) method. Model estimation in GAMs can be conducted in two ways: (a) predictive (i.e., out-of-sample minimization) via Generalized Cross-Validation (GCV) and (b) Bayesian (i.e. attach priors on basis coefficients) via some maximum likelihood approach. These tend to provide more stable estimates of the smoothing parameters (i.e., reduce over-fitting) and tend to provide more robust estimates of the variance. GCV is more computationally efficient but can over-fit, especially when errors are correlated. Note that GCV is set as the default in mgcv, but REML based approaches are preferred because they provide less biased estimates of variance and smoothing parameters, and are more robust during the optimization process.

The model can be examined using the appraise function in the gratia package (Simpson 2024; Simpson 2025) which generates diagnostic plots of the model residuals (the difference between the observed value of \(y\) (in this case ndvi) and the predicted value of \(y\), also referred to as \(\hat{y}\) :

appraise(m1, method = "simulate")

Diagnostic plots of m1.

The diagnostic plots show that the top and bottom and of the QQ plot are close to the theoretical fit line and that the modelled quantiles are close to the theoretical one. This indicates that there are a low number of residuals - observed quantiles that are not on the theoretical fit line. If there were a high number - i.e. the QQ plot had an S-shape - then this would suggest residual heterogeneity, potentially due to un-modelled interactions or due to an inappropriate assumption of a Gaussian model. In such cases other response should be investigated for example through histograms and specified accordingly via the family parameter that is passed to the GAM - see below.

2.2 Space-time GAM Refinements I: Model family

The assumptions behind a Gaussian model are that the target variable is normally distributed, meaning that it is continuous, has constant variance and is unbounded (i.e. it can theoretically take any value from \(-\propto\) to \(+\propto\)).

If the target variable has a non-normal distribution (a long tail, a skewed distribution, etc) then a different model family should be specified. For example, a target variable such as house price which has values that cannot be negative and may have some very high values, would suggest specifying a a Gamma family with a log link. This is used when the response is continuous, strictly positive, right-skewed (i.e., with a long tail to the right) and when the variance increases with the mean (i.e. is heteroscedastic). What this effectively does to provide a log-scale regression for positive, skewed data.

The point is that there are a number options for specifying the model family in a mgcv GAM, depending on the distribution of the response variable, as summarised in the table below. Further information can also be found in the help for the family function (enter ?family).

Common families and their properties.
Family	Response	Link	Example
Gaussian	continuous, symmetric	identity	residuals roughly normal
Gamma	continuous, positive, skewed	log	costs, durations, prices
Inverse Gaussian	continuous, positive, heavy-tailed	log	reaction times, dispersions
Poisson	count data	log	integer counts
Binomial	0–1 or proportion data	logit	success/failure outcomes

This vignette and the current version of the stgam package considers only a Gaussian, normally distributed response. Future versions will address other responses.

2.3 Space-time GAM Refinement II: Effect size

Having determined that correct model family has been specified for the response variable, the model summary can be examined:

summary(m1)

#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> ndvi ~ 0 + Intercept + te(X, Y, month, by = Intercept, bs = c("tp", 
#>     "cr"), d = c(2, 1)) + tmax + te(X, Y, month, by = tmax, bs = c("tp", 
#>     "cr"), d = c(2, 1)) + pr + te(X, Y, month, by = pr, bs = c("tp", 
#>     "cr"), d = c(2, 1))
#> 
#> Parametric coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> Intercept  1.562e+00  4.069e-02  38.385  < 2e-16 ***
#> tmax      -1.821e-02  5.604e-03  -3.249  0.00118 ** 
#> pr         5.020e-04  4.887e-05  10.273  < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Approximate significance of smooth terms:
#>                           edf Ref.df     F  p-value    
#> te(X,Y,month):Intercept  5.01  5.014 6.435 6.35e-06 ***
#> te(X,Y,month):tmax      34.83 42.522 4.347  < 2e-16 ***
#> te(X,Y,month):pr        10.48 12.178 1.686   0.0591 .  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Rank: 375/377
#> R-sq.(adj) =  0.478   Deviance explained = 49.1%
#> -REML = -1758.3  Scale est. = 0.0087294  n = 2000

This indicates that pr specified in a space-time smooth is not significant but that all of the other model components are.

To unpick this further and to provide a further basis for model evaluation, it is also instructive to examine the size of the effect of each model component. The code below generates numeric summaries of the effects of each smooth by extracting the effect of each term from the model using the predict function. The typical size of each smooth’s contribution to predictions are summarised from their standard deviations (sd) and the total amplitude of the smooth effects (range), where higher values indicate a larger effect on the target variable. This is undertaken below and the code snippets have been wrapped into the effect_size function in stgam.

# extract the terms 
f1 <- predict(m1, type = "terms")
# generate numeric summaries  
apply(f1, 2, function(v) c(sd = sd(v), range = diff(range(v)))) |> round(3)

#>       Intercept  tmax    pr te(X,Y,month):Intercept te(X,Y,month):tmax te(X,Y,month):pr
#> sd            0 0.033 0.062                   0.239              0.234            0.011
#> range         0 0.242 0.282                   1.639              1.518            0.121

This shows the size of the effect of each model component. Considering the ranges, we can see the zero effect of the parametric Intercept and pr term and the low partial effect of the TP smooth for the pr term, suggesting a lack of interaction over space-time for this this predictor variable. A possible refinement in this case could be to remove the fixed Intercept and pr terms but we would like to retain them as constants and offsets. The range value of the combined space-time smooth for pr is relatively low (te(X,Y,month):pr) and it may be useful to examine pr over space and time separately. In the code below the individual space and time smooths are specified as thin plate regression smooths, in s() function in the mgcv package.

# second model with separate space-time effects for pr
m2 <- gam(
  ndvi ~ 0 +
    Intercept + te(X,Y,month, by=Intercept, bs = c("tp", "cr"), d = c(2,1)) + 
    tmax + te(X,Y,month, by = tmax, bs = c("tp", "cr"), d = c(2,1)) +
    pr + s(X,Y, by = pr) + s(month, by = pr),
  data = chaco,
  method = "REML",
  family = gaussian()
)

Then effect size can be examined using the effect_size function:

effect_size(m2)

#>       Intercept  tmax pr te(X,Y,month):Intercept te(X,Y,month):tmax s(X,Y):pr s(month):pr
#> sd            0 0.037  0                   0.219              0.214     0.046       0.107
#> range         0 0.271  0                   1.437              1.356     0.209       0.437

Now the size of the effect of pr is split and is larger over both space (tmax s(X,Y):pr) and time (s(month):pr), compared to the m1 GAM model. This demonstrate how evaluating and comparing models with different space-time smooths can help to determine whether to model them interactively or not, and what the changes in the magnitude of the modelled response surface might be.

2.4 Space-time GAM Refinement III: `k` the number of knots

In the mgcv GAM implementation the number of knots in each smooth are automatically determined. These control how ‘wiggly’ (non-linear) the the smooth can be. The number of knots in each smooth can be examined using the gam.check function. This provides some diagnostic information summarising the model fitting procedures can also be generated and also prints out some diagnostic plots (although not as informative as the ones created by the appraise function in the gratia R package).

gam.check(m2)

This checks whether the number of knots used in the smooths are too low. This is indicated when the effective degrees of freedom (EDF in the edf column) are close to the number of knots (k) and some of the k-index value are somewhat less than 1. In this case, the number of knots used in the temporal smooth for pr may be too low.

The number of knots in a used in a smooth are automatically determined but can be specified. In mgcv GAM smooths, k controls the basis dimension of the smooth term — i.e. the maximum possible complexity of the smooth, not the actual complexity. The effective degrees of freedom (edf) after fitting should be much smaller. If k is too small, the model fails to capture the true pattern of the relationships in the data (i.e. it fails to capture non-linearities) and can result in biased fits and residual structure. There are some approximate guides for specifying k that are summarised in the table below.

Heuristics for choosing `k` in common smooth types.
Smooth type	Heuristic	Notes
1D temporal smooth (e.g., days, years)	`k` ≈ min(n/10, 20–40)	Typically start around 10 basis functions; if long time series (> 1000 points), may go to 50–100
2D spatial smooth (e.g., te(X, Y))	`k` ≈ 50–200 total basis functions	Typically start around 50 basis functions; choose higher if data cover large/complex region
3D space–time smooth (e.g., te(X, Y, time))	`k` = product of marginal bases, e.g. `k = c(50, 20)`	mgcv handles tensor product decomposition; but do not let any single dimension be too small
Factor-by smooths (e.g., by = region)	`k` ≈ same as main smooth per level	You can use fewer if levels are many and data per level small

As a general rule of thumb, the k value should be doubled if the check reveals it to be too low. This is done for the temporal smooth specified for pr in the code below to create m3 and note that as k increases so does the time to fit the model:

# third model with increased k for the pr temporal smooth
m3 <- gam(
  ndvi ~ 0 +
    Intercept + te(X,Y,month, by=Intercept, bs = c("tp", "cr"), d = c(2,1)) + 
    tmax + te(X,Y,month, by = tmax, bs = c("tp", "cr"), d = c(2,1)) +
    pr + s(X,Y, by = pr) + s(month, k=20, by = pr),
  data = chaco,
  method = "REML",
  family = gaussian()
)

The gam.check function calls the k.check function, provides some interpretation notes and diagnostic plots. However, the k.check function can be called directly:

# check the model smooths
k.check(m3)

#>                          k'       edf   k-index p-value
#> te(X,Y,month):Intercept 124 11.297948 0.8969713       0
#> te(X,Y,month):tmax      125 34.676559 0.8971304       0
#> s(X,Y):pr                30  7.268772 0.7575468       0
#> s(month):pr              20 14.121735 0.5536068       0

This shows some improvement, with the EDF now increased but still close to k for the smooth and the k-index still much lower than 1. The model below tries to improve the fit by increasing k once more:

# fourth model with increased k for the pr temporal smooth
m4 <- gam(
  ndvi ~ 0 +
    Intercept + te(X,Y,month, by=Intercept, bs = c("tp", "cr"), d = c(2,1)) + 
    tmax + te(X,Y,month, by = tmax, bs = c("tp", "cr"), d = c(2,1)) +
    pr + s(X,Y, by = pr) + s(month, k=40, by = pr),
  data = chaco,
  method = "REML",
  family = gaussian()
)
# check the model smooths
k.check(m4)

#>                          k'       edf   k-index p-value
#> te(X,Y,month):Intercept 124  9.442720 0.9193043       0
#> te(X,Y,month):tmax      125 36.175682 0.9107176       0
#> s(X,Y):pr                30  7.941069 0.7444885       0
#> s(month):pr              40 36.853976 0.6355069       0

Again some improvement is shown but this can be nudged further:

# fifth model with increased k for the pr temporal smooth
m5 <- gam(
  ndvi ~ 0 +
    Intercept + te(X,Y,month, by=Intercept, bs = c("tp", "cr"), d = c(2,1)) + 
    tmax + te(X,Y,month, by = tmax, bs = c("tp", "cr"), d = c(2,1)) +
    pr + s(X,Y, by = pr) + s(month, k=80, by = pr),
  data = chaco,
  method = "REML",
  family = gaussian()
)
# check the model smooths
k.check(m5)

#>                          k'      edf   k-index p-value
#> te(X,Y,month):Intercept 124 15.71355 0.9177279       0
#> te(X,Y,month):tmax      125 37.46644 0.9162767       0
#> s(X,Y):pr                30  7.65725 0.7242695       0
#> s(month):pr              80 68.65270 0.8199559       0

Some further improvement is evident but this process of increasing k should continue. However, the number of knots cannot be increased infinitely due the data observations, the number of terms in the model and how they combine to determine the effective degrees of freedom (edf). The code below increases the number of knots by 40 as in this case there are only 120 unique values for months.

# sixth model with increased k for the pr temporal smooth
m6 <- gam(
  ndvi ~ 0 +
    Intercept + te(X,Y,month, by=Intercept, bs = c("tp", "cr"), d = c(2,1)) + 
    tmax + te(X,Y,month, by = tmax, bs = c("tp", "cr"), d = c(2,1)) +
    pr + s(X,Y, by = pr) + s(month, k=120, by = pr),
  data = chaco,
  method = "REML",
  family = gaussian()
)
# check the model smooths
k.check(m6)

#>                          k'       edf   k-index p-value
#> te(X,Y,month):Intercept 124 15.047285 0.9195204       0
#> te(X,Y,month):tmax      125 37.547213 0.9281455       0
#> s(X,Y):pr                30  7.533317 0.7161696       0
#> s(month):pr             120 80.443605 0.8443115       0

In this case the check shows that there is sufficiently a large difference between k (120) and edf (~80) and the k-index is closer to 1 (0.84). The final model can be examined the usual way:

appraise(m6, method = "simulate")

The effects of the terms can be examined as before:

effect_size(m6)

#>       Intercept  tmax pr te(X,Y,month):Intercept te(X,Y,month):tmax s(X,Y):pr s(month):pr
#> sd            0 0.023  0                   0.253              0.249     0.060       0.134
#> range         0 0.169  0                   1.326              1.381     0.305       0.663

The refined model above, m6, shows improvement, with the effect of pr over time relatively high (s(month):pr). Its effect over over space (s(X,Y):pr) is relatively low and potentially the spatial smooth for pr could be dropped. Some confirmation for this decision is provided when the module summary is examined:

summary(m6)

#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> ndvi ~ 0 + Intercept + te(X, Y, month, by = Intercept, bs = c("tp", 
#>     "cr"), d = c(2, 1)) + tmax + te(X, Y, month, by = tmax, bs = c("tp", 
#>     "cr"), d = c(2, 1)) + pr + s(X, Y, by = pr) + s(month, k = 120, 
#>     by = pr)
#> 
#> Parametric coefficients:
#>            Estimate Std. Error t value Pr(>|t|)    
#> Intercept  1.238686   0.059797  20.715   <2e-16 ***
#> tmax      -0.012671   0.005584  -2.269   0.0234 *  
#> pr         0.000000   0.000000     NaN      NaN    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Approximate significance of smooth terms:
#>                            edf Ref.df      F  p-value    
#> te(X,Y,month):Intercept 15.047  20.21  2.287 0.000885 ***
#> te(X,Y,month):tmax      37.547  43.99  4.380  < 2e-16 ***
#> s(X,Y):pr                7.533  10.47  1.230 0.288856    
#> s(month):pr             80.444  86.84 12.649  < 2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Rank: 399/402
#> R-sq.(adj) =   0.67   Deviance explained = 69.3%
#> -REML = -1994.6  Scale est. = 0.0055213  n = 2000

However in this case, the m6 model is retained as the final model as the effect sizes are relatively strong for both components. It specifies a space-time TP smooth for the Intercept and for tmax, and separate spatial and temporal smooth for pr. All of the parametric terms are retained:

The size of the effects can be examined and compared with the original model m1:

# original model
effect_size(m1)

#>       Intercept  tmax    pr te(X,Y,month):Intercept te(X,Y,month):tmax te(X,Y,month):pr
#> sd            0 0.033 0.062                   0.239              0.234            0.011
#> range         0 0.242 0.282                   1.639              1.518            0.121

# tuned model  
effect_size(m6)

#>       Intercept  tmax pr te(X,Y,month):Intercept te(X,Y,month):tmax s(X,Y):pr s(month):pr
#> sd            0 0.023  0                   0.253              0.249     0.060       0.134
#> range         0 0.169  0                   1.326              1.381     0.305       0.663

The ranges of the final model (m6) still indicate the strong space-time interaction of NDVI with the tmax in the TP smooth (te(X,Y,month):tmax) and the interactions of pr in separate space and time smooths (s(X,Y):pr and s(month):pr)).

The AICs of the 2 models can be compared and the final model shows a dramatic improvement (i.e. reduction in value) from the original:

# AIC comparison
m1$aic; m6$aic

#> [1] -3751.768
#> [1] -4583.288

2.5. Extracting, summarising and plotting the coefficients

The spatially and temporally varying coefficients estimates generated by the model can be extracted in a number of different ways using the calulate_vcs function in the stgam package. There are basically 2 options for generating the varying coefficient estimates: i) over the original data points or ii) over spatial surfaces for specific time slices.

For the first option, the original data, the GAM and a vector of the model terms (i.e. predictor variables names) are simply passed to the calculate_vcs function. The second option requires slices of the space-time data to be passed to the calculate_vcs function and a bit more work to process the results. In both cases the results can be summarised and mapped.

Considering first the original observations:

vcs <- calculate_vcs(input_data = chaco,
                     mgcv_model = m6, 
                     terms = c("Intercept", "tmax", "pr"))

A summary over space and time of the coefficient estimates shows that the Intercept is large and varying and that tmax and pr also have varying relationships with the target variable over space and time, with negative and positive values in some places and / or at some times. Their values reflect the association of degrees Centigrade and millimetres of rainfall. Considering the mean values, each additional 1 degree of tmax is associated with a 0.0222 decrease in NDVI (biomass), and each additional millimetre of rainfall is associated with a 0.0013 increase in NDVI.

vcs |> 
  # drop the geometry if the object is spatial
  st_drop_geometry() |>
  select(starts_with("b_")) |>
  apply(2, summary) |> 
  round(4)
#>         b_Intercept  b_tmax    b_pr
#> Min.         0.4448 -0.0436 -0.0077
#> 1st Qu.      1.0610 -0.0284  0.0004
#> Median       1.2679 -0.0229  0.0007
#> Mean         1.2387 -0.0222  0.0013
#> 3rd Qu.      1.4349 -0.0165  0.0020
#> Max.         1.7712  0.0023  0.0082

These can be explicitly examined over time and notice the increasingly positive relationship with NDVI over time with the Intercept and the increasingly negative relationship of Maximum temperature. There is much more variation in the temporal trend of the relationship between NDVI and Precipitation.

vcs |> 
  st_drop_geometry() |>
  select(date, starts_with("b_")) |> 
  rename(`Intercept` = b_Intercept,
         `Max Temperature` = b_tmax,
         `Precipitation` = b_pr) |>
  pivot_longer(-date) |>
  group_by(date, name) |>
  summarise(
    lower = quantile(value, 0.25),
    median = median(value),
    upper = quantile(value, 0.75)
  ) |>
  ggplot(aes(x = date, y = median)) +
  geom_point(col = "blue", alpha = 0.2) +
  geom_smooth() +
  facet_wrap(~name, scale = "free_y") +
  theme_bw() + xlab("") + ylab("") +
  theme(strip.background = element_rect(fill="white"))
#> `summarise()` has grouped output by 'date'. You can override using the `.groups` argument.
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

The variations over time of the final model coefficient estimates.

Standard ggplot approaches can be used to map the coefficient estimates as in the figure below, which summarises the coefficient estimates for pr (b_pr) by year. The code uses the year variable to facets the maps of the spatially varying coefficients. Notice the increasingly negative relationship of Maximum temperature with NDVI over time, with a distinct South West (low) to North East (high) gradient.

# title
tit <-expression(paste(""*beta[`tmax`]*"")) 
# plot 
ggplot()  + 
  geom_sf(data = vcs, aes(col = b_tmax)) + 
  scale_colour_continuous_c4a_div(palette="brewer.rd_yl_bu",name = tit) + 
  facet_wrap(~year) + 
  theme_bw() + 
  theme(
    legend.position = "inside",
    legend.position.inside = c(0.9, 0.17), 
      strip.background = element_rect(fill="white", colour = "white"), 
      strip.text = element_text(size = 8, margin = margin(b=4)),
        axis.title=element_blank(),
        axis.text=element_blank(),
    axis.ticks=element_blank())

The varying tmax (Maximum temperature) coefficient estimates over space and time.

A second approach is to estimate the varying coefficients over a grid. The code below creates the chaco_grid spatial object, in the same way as was done in the introductory vignette, but this time also adding dummy variables:

# create a grid object from the Chaco data
chaco_grid <- 
  st_make_grid(chaco, square=FALSE,n=20) |> 
  st_sf()
# rename the geometry, sort row names 
st_geometry(chaco_grid) = "geometry"
rownames(chaco_grid) <- 1:nrow(chaco_grid)
# create and add coordinates X and Y / 1000
coords <- chaco_grid |> st_centroid() |> st_coordinates() 
chaco_grid <- chaco_grid |> bind_cols(coords/1000)
# add dummy variables to the grid 
chaco_grid <- 
  chaco_grid |> 
  mutate(Intercept =  NA_real_, 
         tmax = NA_real_,
         pr = NA_real_)

You may wish to inspect this object and what it contains :

chaco_grid
plot(st_geometry(chaco_grid))

In order to generate annual gridded estimates of the coefficients over grid space, the model passed to the calculate_vcs function requires the space and time inputs that were used in the model: X, Y and month. The chaco_grid object has locational attributes but the temporal attributes need to be created. The for loop below does this, extracts the coefficient estimates for the specified month - here the sixth month in each year - and then combines them.

# create time slices
years <- sort(unique(vcs$year))
# calculate over the grid for each time slice
res_out <- matrix(nrow = nrow(chaco_grid), ncol = 0)
for (i in 1:length(years)){
  # convert years to months - here getting month 6
  month.val <- ((years[i]-min(years)) * 12) + 6
  res.i <- calculate_vcs(input_data = chaco_grid |> mutate(month = month.val),
                         mgcv_model = m6,
                         terms = c("Intercept", "tmax", "pr"))
  # select all the coefficient estimates 
  res.i <- 
    res.i |> 
    st_drop_geometry() |> 
    select(starts_with("b_"), 
           starts_with("se_"),
           starts_with("t_")) 
  # rename coefficients estimates
  names(res.i) <- paste0(names(res.i), "_", years[i])
  # bind to the result
  res_out <- cbind(res_out, res.i)
  cat(years[i], "\t")
}

The result can be joined to the grid and mapped. The code below does this for tmax (Maximum temperature) as this has space-time TP smooth. The maps describe the variations over time and space of the relationship between the tmax predictor variable and ndvi. The maps indicate the increasingly negative relationship over time and the South West to North East gradient as before.

# join to the grid
chaco_grid |> 
  cbind(res_out) |>
  # select the variables and pivot longer
  select(starts_with("b_tmax")) |> 
  # rename and select with transmute
  transmute(`2012` = b_tmax_2012, `2013` = b_tmax_2013,
            `2014` = b_tmax_2014, `2015` = b_tmax_2015,
            `2016` = b_tmax_2016, `2017` = b_tmax_2017,
            `2018` = b_tmax_2018, `2019` = b_tmax_2019,
            `2020` = b_tmax_2020, `2021` = b_tmax_2021,
            `2022` = b_tmax_2022
            ) |>  
  pivot_longer(-geometry) |>
  # make the new time object a factor (to enforce plotting order)
  mutate(name = factor(name, levels = years)) |>
  # 4. and plot 
  ggplot() + 
  geom_sf(aes(fill = value), col = NA) +
  # adjust default shading
  scale_fill_continuous_c4a_div("brewer.rd_yl_bu", name = tit) +
  # facet
  facet_wrap(~name, ncol = 4) +
  # apply and modify plot theme
  theme_bw() + 
  theme(
    legend.position = "inside",
    legend.position.inside = c(0.9, 0.17), 
      strip.background = element_rect(fill="white", colour = "white"), 
      strip.text = element_text(size = 8, margin = margin(b=4)),
        axis.title=element_blank(),
        axis.text=element_blank(),
    axis.ticks=element_blank())

The varying tmax (Maximum temperature) coefficient estimates over space and time and the grid surface.

2.6 Summary

The above subsections describes a sequence of model construction, investigation and refinement, which, after specifying an initial Gaussian model that incorporated space-time smooths for each predictor variable, required a number of GAM-related considerations to be expanded upon.

The initial model used Tensor Product smooths, under an assumption of space-time interactions, rather than separate space and time effects. These required the basis dimensions of each component to be specified (through the d parameter), here specifying a 2-dimensional basis for location and a 1-dimensional one for time).
Additionally thin plate regression smooths (tp) were used to model location because they are isotropic with no directional bias) and can handle irregularly located observations. Cubic regression smooths (cr) were used to model time effects because there known to be good at handling seasonal or long-term effects.
The model was fitted using REML as this tends to provide more stable estimates of the smoothing parameters and robust estimates of the variance. However, model diagnostic investigations suggested residual heterogeneity, potentially due to the incorrect assumption of a Gaussian response.
A sequence of refinements were then undertaken and a number of considerations to be discussed: different model families and when they should be used, quantifying the magnitude of the effect size of the model terms, and checking and adjusting the knots used in model fitting procedure in order to determine the final model form.

In this case, the final model specified was Gaussian with a combined space-time TP smooth for tmax and a spatial smooth for pr with the number of knots set to 120.

So although statistical practice tends towards fitting the more complex model, penalizing the estimated effects and then evaluating the magnitude of those estimated effects, as was done here, it also suggests that there may be some benefit in undertaking some kind of automatic model selection (see Section 3).

3. Working with `stgam`: model selection

The models constructed in Section 2 started with an initial model that was gradually refined through a series of investigations. Each predictor variable and the Intercept was modelled a parametric term with combined space-time smooths in the initial m1 model. After investigation and refinement the tmax predictor variable was included in parametric form and within a space-time TP smooth and pr in a temporal smooth with the number of knots specified (\(k = 120\)).

This poses the question of which model form to use? How can the model best be specified? A key focus of the stgam package is to answer this question by creating and evaluating multiple models.

3.1 Using AIC to evaluate models

One way to evaluate models is to compare them using some metric. Commonly AIC (Akaike 1998) is used to compare models. It is an estimator of the leave-one-out cross validated prediction error and provides a parsimonious measure of model fit that balances model complexity and prediction accuracy. Using AIC to evaluate different models therefore identifies the model with the lowest prediction error.

The code below extracts the AIC from each of the models specified in the last section and prints them out in order. Here it can be seen that the m6 model has the lowest prediction error (lowest AIC), with pr specified in separate space and time smooths.

data.frame(Model = paste0("m", c(1:6)),
           AIC = c(m1$aic, m2$aic, m3$aic, m4$aic, m5$aic, m6$aic)) |> 
  # rank the models 
  arrange(AIC)

#>   Model       AIC
#> 1    m6 -4583.288
#> 2    m5 -4559.372
#> 3    m4 -4087.079
#> 4    m3 -3854.758
#> 5    m2 -3830.972
#> 6    m1 -3751.768

3.2 Model selection: determining GAM form

Given the the model investigation and refinement in the previous section and the model evaluations by AIC above, one option to determining model form is to create a series of models and identify the one with the best fit through AIC, given the caveats described above.

For a space-time GAM each predictor variable could be specified in one of 6 ways:

It is omitted.
It is included as a parametric response with no smooth.
It is included in parametric form and in a spatial smooth with location.
It is included in parametric form and in a temporal smooth with time.
It is included in parametric form and in a single space-time smooth.
It is included in parametric form and in 2 separate space and time smooths.

The intercept can be treated similarly, but without it being absent (i.e. 5 options). Obviously there are 3 options for each predictor variable in a spatial model (i., ii. and iii.) and in a temporal model (i., ii. and iv.), with each having 2 options for the intercept. Thus for purely spatial or temporal regressions, with \(k\) predictor variables, there are \(2 \times 3^k\) potential models to evaluate. For a space-time there are \(5 \times 6^k\) potential models.

Recall that the chaco dataset contains two numeric variables that could be of use in explaining the space-time variations in the ndvi target variable, well as location (X and Y) and time (month)

chaco |> st_drop_geometry() |> head()
#> # A tibble: 6 × 14
#>      id  ndvi  tmax    pr month date        year   lon   lat     X     Y       Xo       Yo Intercept
#>   <int> <dbl> <dbl> <int> <dbl> <date>     <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>     <dbl>
#> 1     1 0.738  31.4   208    91 2020-01-01  2020 -52.7 -15.5 3920. 8261. 3919927. 8260790.         1
#> 2     2 0.545  38.8   139    64 2017-10-01  2017 -52.0 -14.3 3994. 8404. 3994096. 8403808.         1
#> 3     3 0.758  30     334    18 2013-12-01  2013 -53.5 -16.0 3836. 8213. 3836488. 8212925.         1
#> 4     4 0.652  28.7     0   119 2022-05-01  2022 -53.2 -15.0 3864. 8328. 3864301. 8327636.         1
#> 5     5 0.774  31.8    69    94 2020-04-01  2020 -52.8 -15.1 3911. 8309. 3910656. 8308557.         1
#> 6     6 0.714  30     343    18 2013-12-01  2013 -53.2 -14.6 3864. 8366. 3864301. 8365752.         1

Thus with \(k = 2\) variables could are 180 potential space-time models to evaluate and 18 potential models in a spatial or temporal case study. The example below uses these 2 predictor variables evaluate 180 potential space-time models. This is done via the evaluate_models() function in the stgam package. This constructs and compares the full set of potential models, each specifying each predictor variable and the intercept in a different way. The separate space and time smooths are specified as thin plate regression smooths using the s() function in mgcv and the combined space-time smooths are specified as Tensor Product (TP) smooths (te()) with a 2D thin plate smooth for location and 1D cubic regression smooth for time as in Section 2.2.

The function also includes two optional logical parameters for adjusting the number of knots specified in each smooth. The first is k_set. This allows the user to specify k in the smooths via the spatial_k and temporal_k parameters. These applied to the thin plate spline spatial and temporal smooths individually, and combined in the TP smooths. The second is k_increase This checks on the number of knots specified in each smooth generated by the gam function relative to the effective degrees of freedom, as in the outputs of the k.check function above. If the increase_k parameter is set to TRUE, then a threshold of the ratio of the number of knots in each smooth to its EDF can be specified via the k2edf_ratio parameter and if any smooth has a knots-to-EDF ratio less than this value (i.e. the number of knots is close to the EDF), then the knots are iteratively increased until the threshold check is passed, the number knots passes the maximum degrees of freedom (as in the worked example in Section 2.4 above), or the number of iterations (max_iter) is reached. The the default is that they are doubled on each iteration, but this too can be changed via the k_multiplier parameter. If the knots do not need to be increased, the ones generated in the fitting of the GAM by the mgcv package are retained. The example below applies the second of these, k_increase.

Each model is evaluated via its AIC value. AIC (Akaike 1998) provides an estimate of the leave-one-out cross-validated prediction error and the model with lowest AIC can be selected.

The code below specifies a spatially and temporally varying coefficient model (STVC) using the tmax and pr variables determine a space-time regression mode. Note that the function specifies ncores to be set to 2 by default. This is to pass CRAN diagnostic checks for package submission - you may want to specify more using detectCores()-1 from the parallel package. Note also that the input data needs to be a data.frame or tibble object and it needs to have a defined Intercept term. This can be created in the following way input_data |> mutate(Intercept = 1), as was done at the end of Section 1 for chaco.

library(doParallel)
# ncores <- detectCores()-1
t1 <- Sys.time()
stvc_mods <- evaluate_models(
  input_data = chaco |> st_drop_geometry(),
  target_var = "ndvi", 
  family = "gaussian()",
  vars = c("tmax", "pr"),
  coords_x = "X", 
  coords_y = "Y", 
  VC_type = "STVC", 
  time_var = "month",
  k_set = FALSE,
  spatial_k = NULL, 
  temporal_k = NULL,
  k_increase = TRUE,
  k2edf_ratio = 1.5, 
  k_multiplier = 2, 
  max_iter = 10,
  ncores = 15
) 
Sys.time() - t1  # about 29 minutes on M4 Mac with 64GB of RAM

The best \(n\) models can be extracted (i.e. those with the lowest AIC score) using the gam_model_rank() function. The result of doing this is shown below. Interestingly few of the smooths in the top 10 models contain combined space-time TP smooths (te(ST)), indicating a lack of interaction between spatial and temporal effects. The separate spatial (s(S)) and temporal (s(T)) smooths for the Intercept have had \(k\) increased as have the spatial smooths for for tmax and pr. Also note that the top 3 models variously specify similar spatial smooths or spatial plus temporal smooths for each of the predictor variables and the similarity of the AIC values. Overall this suggests that while there are spatial and temporal dependencies with the target variable, these are not interacting over space-time.

mod_comp <- gam_model_rank(stvc_mods, n = 10)
# have a look
mod_comp |> select(-c(f, ks)) 
#>    Rank           Intercept                 tmax                  pr       AIC
#> 1     1 s(S,k=29)+s(T,k=65)           s(S,k=240)           s(S,k=30) -5776.332
#> 2     2 s(S,k=29)+s(T,k=65)           s(S,k=240) s(S,k=30)+s(T,k=80) -5774.729
#> 3     3           s(T,k=65)           s(S,k=240)           s(S,k=30) -5767.840
#> 4     4        te(ST,k=124) s(S,k=240)+s(T,k=80)           s(S,k=30) -5766.923
#> 5     5           s(T,k=65)           s(S,k=240) s(S,k=30)+s(T,k=80) -5766.317
#> 6     6        te(ST,k=124) s(S,k=240)+s(T,k=80) s(S,k=30)+s(T,k=80) -5765.223
#> 7     7           s(S,k=29) s(S,k=240)+s(T,k=80)           s(S,k=30) -5764.361
#> 8     8  s(S,k=29)+s(T,k=9) s(S,k=240)+s(T,k=80)           s(S,k=30) -5762.848
#> 9     9           s(S,k=29) s(S,k=240)+s(T,k=80) s(S,k=30)+s(T,k=80) -5762.526
#> 10   10 s(S,k=29)+s(T,k=17) s(S,k=240)+s(T,k=80) s(S,k=30)+s(T,k=80) -5761.027

3.3 The ‘best’ model

The highest ranked model can be specified for use in analysis. This includes the Intercept in separate space-time smooths and spatial smooths for tmax and pr. The formula can be extracted and inspected:

f <- as.formula(mod_comp$f[1])
f
#> ndvi ~ Intercept - 1 + s(X, Y, by = Intercept) + s(month, by = Intercept, 
#>     k = 66) + tmax + s(X, Y, by = tmax, k = 240) + pr + s(X, 
#>     Y, by = pr)

The formula can be used to specify a mgcv GAM using a REML approach. The resulting model is checked for over-fitting using the k.check function in themgcv package. Here the k-index values are near to 1, the k' and , except for the temporal smooth for the Intercept the edf the edf values are all much less than the k' values, so this model is reasonably well tuned.

# specify the model
gam.m <- gam(f, data = chaco, method = "REML")
# check k
k.check(gam.m)
#>                     k'        edf   k-index p-value
#> s(X,Y):Intercept    29   2.001361 0.8462690    0.00
#> s(month):Intercept  65  60.299998 0.9875535    0.31
#> s(X,Y):tmax        240 151.743690 0.8462690    0.00
#> s(X,Y):pr           30  16.936022 0.8462690    0.00

A summary of the model can be examined and this shows that of the parametric terms only the Intercept is significant as all fo the spatial and temporal smooths.

summary(gam.m)
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> ndvi ~ Intercept - 1 + s(X, Y, by = Intercept) + s(month, by = Intercept, 
#>     k = 66) + tmax + s(X, Y, by = tmax, k = 240) + pr + s(X, 
#>     Y, by = pr)
#> 
#> Parametric coefficients:
#>           Estimate Std. Error t value Pr(>|t|)    
#> Intercept  1.01706    0.06575  15.469   <2e-16 ***
#> tmax      -0.01461    0.01373  -1.064    0.287    
#> pr         0.00000    0.00000     NaN      NaN    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>                        edf  Ref.df      F p-value    
#> s(X,Y):Intercept     2.001   2.002  5.103 0.00616 ** 
#> s(month):Intercept  60.300  64.022 45.832 < 2e-16 ***
#> s(X,Y):tmax        151.744 192.520  4.808 < 2e-16 ***
#> s(X,Y):pr           16.936  21.658  2.215 0.00106 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Rank: 365/367
#> R-sq.(adj) =  0.825   Deviance explained = 84.6%
#> -REML =  -2572  Scale est. = 0.00292   n = 2000

3.4 Plot and map the results

The varying coefficient estimates can be extracted, examined, plotted and mapped as before using the code below. There are no temporal smooths specified in the model for the for the predictor variables tmax and pr and perhaps unsurprisingly the general trends of coefficient estimates and the relationships with NDVI are over time that they describe, are relatively consistent.

# extract the VCs
vcs <- calculate_vcs(input_data = chaco,
                     mgcv_model = gam.m, 
                     terms = c("Intercept", "tmax", "pr"))
# examine
vcs |> 
  # drop the geometry if the object is spatial
  st_drop_geometry() |>
  select(starts_with("b_")) |>
  apply(2, summary) |> 
  round(5)
#>         b_Intercept   b_tmax     b_pr
#> Min.        0.67549 -0.01930 -0.00018
#> 1st Qu.     0.92539 -0.01420 -0.00010
#> Median      1.02336 -0.01235 -0.00004
#> Mean        1.01706 -0.01230 -0.00004
#> 3rd Qu.     1.11573 -0.00984  0.00000
#> Max.        1.29073 -0.00436  0.00012


# plot over time
vcs |> 
  st_drop_geometry() |>
  select(date, starts_with("b_")) |> 
  rename(`Intercept` = b_Intercept,
         `Max Temperature` = b_tmax,
         `Precipitation` = b_pr) |>
  pivot_longer(-date) |>
  group_by(date, name) |>
  summarise(
    lower = quantile(value, 0.25),
    median = median(value),
    upper = quantile(value, 0.75)
  ) |>
  ggplot(aes(x = date, y = median)) +
  geom_point(col = "blue", alpha = 0.2) +
  geom_smooth() +
  facet_wrap(~name, scale = "free_y") +
  theme_bw() + xlab("") + ylab("") +
  theme(strip.background = element_rect(fill="white"))

The variations over time of the model coefficient estimates. Next they can be mapped over space and time using the original observation locations and the year of observation. Notice that the gradient is now South to North, with little variation over time:

# title
tit <-expression(paste(""*beta[`tmax`]*"")) 
# plot 
ggplot()  + 
  geom_sf(data = vcs, aes(col = b_tmax)) + 
  scale_colour_continuous_c4a_div(palette="brewer.rd_yl_bu",name = tit) + 
  facet_wrap(~year) + 
  theme_bw() + 
  theme(
    legend.position = "inside",
    legend.position.inside = c(0.9, 0.17), 
      strip.background = element_rect(fill="white", colour = "white"), 
      strip.text = element_text(size = 8, margin = margin(b=4)),
        axis.title=element_blank(),
        axis.text=element_blank(),
    axis.ticks=element_blank())

The varying tmax (Maximum temperature) coefficient estimates over space and time.

This lack of temporal variation is confirmed if the coefficient estimates are mapped annually over the grid:

# create time slices
years <- sort(unique(vcs$year))
# calculate over the grid for each time slice
res_out <- matrix(nrow = nrow(chaco_grid), ncol = 0)
for (i in 1:length(years)){
  # convert years to months - here getting month 6
  month.val <- ((years[i]-min(years)) * 12) + 6
  res.i <- calculate_vcs(input_data = chaco_grid |> mutate(month = month.val),
                         mgcv_model = gam.m,
                         terms = c("Intercept", "tmax", "pr"))
  # select all the coefficient estimates 
  res.i <- 
    res.i |> 
    st_drop_geometry() |> 
    select(starts_with("b_"), 
           starts_with("se_"),
           starts_with("t_")) 
  # rename coefficients estimates
  names(res.i) <- paste0(names(res.i), "_", years[i])
  # bind to the result
  res_out <- cbind(res_out, res.i)
  # cat(years[i], "\t")
}
# join to the grid
chaco_grid |> 
  cbind(res_out) |>
  # select the variables and pivot longer
  select(starts_with("b_tmax")) |> 
  # rename and select with transmute
  transmute(`2012` = b_tmax_2012, `2013` = b_tmax_2013,
            `2014` = b_tmax_2014, `2015` = b_tmax_2015,
            `2016` = b_tmax_2016, `2017` = b_tmax_2017,
            `2018` = b_tmax_2018, `2019` = b_tmax_2019,
            `2020` = b_tmax_2020, `2021` = b_tmax_2021,
            `2022` = b_tmax_2022
            ) |>  
  pivot_longer(-geometry) |>
  # make the new time object a factor (to enforce plotting order)
  mutate(name = factor(name, levels = years)) |>
  # 4. and plot 
  ggplot() + 
  geom_sf(aes(fill = value), col = NA) +
  # adjust default shading
  scale_fill_continuous_c4a_div("brewer.rd_yl_bu", name = tit) +
  # facet
  facet_wrap(~name, ncol = 4) +
  # apply and modify plot theme
  theme_bw() + 
  theme(
    legend.position = "inside",
    legend.position.inside = c(0.9, 0.17), 
      strip.background = element_rect(fill="white", colour = "white"), 
      strip.text = element_text(size = 8, margin = margin(b=4)),
        axis.title=element_blank(),
        axis.text=element_blank(),
    axis.ticks=element_blank())

The varying tmax (Maximum temperature) coefficient estimates over space and time and the grid surface.

3.5 Summary

This section has illustrated the the use of functions in the the stgam package. It suggests the following workflow:

Prepare the data, for example by lengthening the data.frame, tibble or sf object containing the data, to have observations with single location and / or time values (in the above examples these were X,Yand month), and an Intercept as an addressable term.
Evaluate all possible models. For spatial or temporal problems each predictor is specified in 3 ways, for space-time problems each predictor is specified in 6 ways.
Rank the models by their AIC score, identify any consistent model specification trends in the top ranked models and select the best model with the lowest AIC score (i.e. the model with the lowest prediction error).
Extract the formula and create the final model.
Calculate the varying coefficient estimates: to quantify how the relationships between the target and predictor variables vary over space, time or space-time.
Create maps, time series plots etc.

This workflow evaluates and ranks multiple models using the model AIC value. This was done algorithmically using the evaluate_models() function. However, this was not undertaken in isolation. Rather it built on the investigations in Section 3 to determine whether space and time effects were present in the data. This set of model investigations were undertaken to both develop and confirm knowledge of processes related to NDVI in the Chaco dry rainforest. That is, the analysis was both considered and contextual. For example, in this section it was determined that a spatially varying intercept term was appropriate but with a different dataset others forms may be suggested. Similarly a larger model space could be examined with more variables. The stgam package allows these choices to be validated through an automated approach, providing an exploration of the full set of potential choices.

3.6 Useful resources

We think these talks by Noam Ross provide really useful tips and pointers for working with GAMs and mgcv:

3.7 Final words

The rationale for using GAMs with TP smooths for spatially varying coefficient or temporally varying coefficient models is as follows:

GAMs with smooths (splines) capture non-linear relationships between the response variable and covariates.
Smooths generate a varying coefficient model when they are parameterised with more than one variable.
This is readily extending to the temporal and / or spatial dimensions to generate temporally, spatially and space-time varying coefficient models.
Different smooths are available, but Tensor Products smooths can be used combine space and time because they undertake anisotropic smoothing across space and time separately, that have different measurement scales.
GAMs are robust, have a rich theoretical background and been subject to much development.

The workflow suggested in this vignette and in the stgam package is to determine the most appropriate model form (both steps evaluated by minimising AIC. The ‘best’ model can then be extracted an examined for the nature of the space-time interactions it suggests, before generating the varying coefficient estimates and mapping or plotting the results.

Future developments will seek to examine the scales of spatial dependencies in the data sing kriging based approaches and will extend the stgam package to work with large data (i.e. to draw from the functionality of the bam function in mgcv).

A Geographer’s introduction to space-time regression with GAMs using `stgam`

Lex Comber, Paul Harris and Chris Brunsdon

January 2026

Overview

1. Data and variables

2. Space-time GAM contruction and considertations

2.1 Overview

2.2 Space-time GAMs

2.3 Space-time GAM Refinement II: Effect size

2.4 Space-time GAM Refinement III: `k` the number of knots

2.5. Extracting, summarising and plotting the coefficients

2.6 Summary

3. Working with `stgam`: model selection

3.1 Using AIC to evaluate models

3.2 Model selection: determining GAM form

3.3 The ‘best’ model

3.4 Plot and map the results

3.5 Summary

3.6 Useful resources

3.7 Final words

References

A Geographer’s introduction to space-time regression with GAMs using stgam

Lex Comber, Paul Harris and Chris Brunsdon

January 2026

Overview

1. Data and variables

2. Space-time GAM contruction and considertations

2.1 Overview

2.2 Space-time GAMs

2.2 Space-time GAM Refinements I: Model family

2.3 Space-time GAM Refinement II: Effect size

2.4 Space-time GAM Refinement III: k the number of knots

2.5. Extracting, summarising and plotting the coefficients

2.6 Summary

3. Working with stgam: model selection

3.1 Using AIC to evaluate models

3.2 Model selection: determining GAM form

3.3 The ‘best’ model

3.4 Plot and map the results

3.5 Summary

3.6 Useful resources

3.7 Final words

References

A Geographer’s introduction to space-time regression with GAMs using `stgam`

2.4 Space-time GAM Refinement III: `k` the number of knots

3. Working with `stgam`: model selection