A Geographer’s introduction to GAMs

1. Data considerations

The first stage is simply to examine the data, particularly how variables many vary over space and time. The code below examines the data before undertaking some initial investigations.

# examine what is loaded 
chaco
#> Simple feature collection with 2000 features and 11 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 3780862 ymin: 8212925 xmax: 3994096 ymax: 8432313
#> Projected CRS: SIRGAS 2000 / Brazil Mercator
#> # A tibble: 2,000 × 12
#>       id  ndvi  tmax    pr month date        year   lon   lat        X        Y          geometry
#>  * <int> <dbl> <dbl> <int> <dbl> <date>     <dbl> <dbl> <dbl>    <dbl>    <dbl>       <POINT [m]>
#>  1     1 0.738  31.4   208    91 2020-01-01  2020 -52.7 -15.5 3919927. 8260790. (3919927 8260790)
#>  2     2 0.545  38.8   139    64 2017-10-01  2017 -52.0 -14.3 3994096. 8403808. (3994096 8403808)
#>  3     3 0.758  30     334    18 2013-12-01  2013 -53.5 -16.0 3836488. 8212925. (3836488 8212925)
#>  4     4 0.652  28.7     0   119 2022-05-01  2022 -53.2 -15.0 3864301. 8327636. (3864301 8327636)
#>  5     5 0.774  31.8    69    94 2020-04-01  2020 -52.8 -15.1 3910656. 8308557. (3910656 8308557)
#>  6     6 0.714  30     343    18 2013-12-01  2013 -53.2 -14.6 3864301. 8365752. (3864301 8365752)
#>  7     7 0.705  31.7    96    22 2014-04-01  2014 -53.2 -14.1 3864301. 8422815. (3864301 8422815)
#>  8     8 0.733  31.9   127    79 2019-01-01  2019 -52.3 -14.9 3966282. 8337171. (3966282 8337171)
#>  9     9 0.611  30.1     0     9 2013-03-01  2013 -53.9 -14.4 3790133. 8394299. (3790133 8394299)
#> 10    10 0.427  35       5    99 2020-09-01  2020 -52.8 -15.2 3910656. 8299011. (3910656 8299011)
#> # ℹ 1,990 more rows

The analyses will model NDVI (the ndvi variable). GAMs will be constructed that regress this target variable against available predictor variables including location and time. The print out of the first 10 records above shows that the dataset contains a time variable (date) in date format as well as location in metres (X and Y from the OSGB projection). It often more useful to represent time as continuous scalar values and to have location coordinates in kilometres rather than metres. The data also contains a variable called month to represent time (here months since the earliest date). The code below rescales the locational data, but the original coordinates are retained for mapping in geometry attribute of chaco:

chaco <- 
  chaco |> 
  # scale location and retain original coordinates
  mutate(Xo = X, Yo = Y) |>
  mutate(X = X/1000, Y = Y/1000)

Again this can be examined:

chaco

The target variable can be mapped as in the code snippet below. The map blow shows the spatial distribution of ndvi.

# map the data layers
chaco |> 
  ggplot() + geom_sf(aes(col = ndvi)) +
  scale_color_viridis_c(option = "magma") +
  theme_bw()  +xlab("") + ylab("")

Spatial variation of the ndvi variable .

Distributions and correlations for the continuous variables can be examined using boxplots / histograms and a correlation matrix, respectively. It is important to establish that any GAM with the proposed target, regressed against a set of predictor variables may yield something interesting and sensible, and just as importantly, to identify any variables that might be a problematic when trying to fit and interpret the resultant model.

For example, evidence of a highly skewed variable or one that has outliers. This is the case for the proposed target variable (ndvi) which is strongly positively skewed with potential outliers.

The code below generates boxplots and then density histograms.

# boxplots
chaco |> 
  st_drop_geometry() |>
  select(id, ndvi, tmax, pr) |>
  pivot_longer(-id) |>
  ggplot(aes(x = value), fil) + 
  geom_boxplot(fill="dodgerblue") +
  facet_wrap(~name, scales = "free") +
  theme_bw() +
  theme(axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())

Boxplots of the continous variables in the data.

# histograms
chaco |> 
  st_drop_geometry() |>
  select(id, ndvi, tmax, pr) |>
  pivot_longer(-id) |>
  ggplot(aes(x = value), fil) + 
  geom_histogram(aes(y=after_stat(density)),bins = 30, 
                 fill="tomato", col="white") +
  geom_density(alpha=.5, fill="#FF6666") +
  facet_wrap(~name, scales = "free") +
  theme_bw()

Histograms of the continuous variables in the data.

Examining correlations, similarly checks for issues that may be a problem later on in the analysis workflow. Here the aim is to establish that there are reasonable correlations with the target variable and at the same time, little collinearity amongst the predictor variables. Note this may change when considering change in correlations over space, time or space-time (i.e., they are only examined globally, for now). See accompanying vignette.

# correlations
chaco |>
  st_drop_geometry() |>
  select(ndvi, tmax, pr) |>
  cor() |> round(3)
#>        ndvi   tmax     pr
#> ndvi  1.000 -0.418  0.500
#> tmax -0.418  1.000 -0.101
#> pr    0.500 -0.101  1.000

Finally, thinking about a GAM-based space-time analysis of your dataset, as a general heuristic, any dataset for use in the space-time analysis should ideally have a minimum of about 100 locations and a minimum of about 50 time intervals. You should consider this when you are assembling your data. For space-time datasets with smaller dimensions, alternative regression methods suit, say a spatial panel regression when the time component is short (Harris et al. 2017).

2. Detecting variability

Investigations in the previous section were all concerned with establishing the viability of undertaking the proposed analysis, using standard exploratory data analyses to examine distributions and correlations. In this section, these are extended to investigate data variability over time, over space and in space-time. This is done by constructing a series of GAMs, of different forms and with different settings. Tools for constructing GAMs using the mgcv package (Wood 2025) are introduced but with particular focus on using GAM smooths or splines to explore variations.

2.1 Initial investigations with OLS and dummy variables

As an initial baseline model, the code below creates a standard OLS (ordinary least squares) regression for ndvi with two predictors, using the lm function. The model fit is weak, but the model summary indicates that the all of the predictor variables (tmax and pr) are significant predictors of the target variable, ndvi. An analysis of variance (ANOVA), using the anova function, shows how much variance in ndvi is explained by each predictor and whether that contribution is statistically significant (via associated p-values, Pr(>F)). Here high values in the F value test statistic provides evidence that a given predictor is useful.In summary, there is evidence the two predictors (tmax and pr) are statistically significant predictors.

# an OLS model
m_ols <- lm(ndvi ~ tmax + pr, data = chaco)
summary(m_ols)
#> 
#> Call:
#> lm(formula = ndvi ~ tmax + pr, data = chaco)
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -0.305438 -0.071189  0.008027  0.074015  0.282366 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  1.394e+00  3.956e-02   35.24   <2e-16 ***
#> tmax        -2.648e-02  1.254e-03  -21.11   <2e-16 ***
#> pr           4.817e-04  1.836e-05   26.24   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.1013 on 1997 degrees of freedom
#> Multiple R-squared:  0.3866, Adjusted R-squared:  0.386 
#> F-statistic: 629.3 on 2 and 1997 DF,  p-value: < 2.2e-16
anova(m_ols)
#> Analysis of Variance Table
#> 
#> Response: ndvi
#>             Df  Sum Sq Mean Sq F value    Pr(>F)    
#> tmax         1  5.8481  5.8481  569.95 < 2.2e-16 ***
#> pr           1  7.0659  7.0659  688.64 < 2.2e-16 ***
#> Residuals 1997 20.4906  0.0103                      
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As as of yet, space, time and space-time influences on house price variation have not been investigated. GAMs with smooths offer a route to investigate these.

2.2 Detecting variability using GAM smooths

GAMs with splines (or smooths - the terms are used interchangeably) can be used to explore and quantify spatial and temporal variations. However before considering these aspects, it is important to outline what smooths do.

Smooths help to fit a (smooth) curve through a cloud of messy data points. They generate flexible fits (curves) able to adapt to the shape of the data. Because of this, they are able to describe the relationships between variables better than a straight line. For example, consider the scatterplot of some simulated data below: it would be difficult to fit a straight line through it.

# create simulated data 
set.seed(12)
x  <- runif(500)
mu <- sin(2 * (4 * x - 2)) + 2 * exp(-(16 ^ 2) * ((x - .5) ^ 2))
y  <- rnorm(500, mu, .3)
# plot x and y
ggplot() + 
    geom_point(aes(x,y)) +
  theme_bw()

Simulated x and y data.

The code below uses a GAM smooth to determine the relationship between x and y without having to pre-specify any particular form (e.g. quadratic, exponential, etc). The wiggliness of the fit is determined automatically by the s function from the mgcv package. This is done in different ways depending on the type of smooth that is specified (thin plate regression smooths is the default in s). Effectively what the smooth does is to split the data into chunks and fit piecewise linear relationships, rather than fitting a single straight line as in a standard linear regression. In doing so, GAMs implicitly allow for non-linear relationships.

# a GAM illustration with a spline using s
gam_s_example <- gam(y~s(x))
# extract the smooth fit
y.s <- gam_s_example$fitted.values
# plot
ggplot() +
  geom_point(aes(x,y), col = "grey") +
    geom_line(aes(x, y = y.s), lwd = 1) +
  theme_bw()

Simulated x and y data with GAM a Spline fitted.

Thus, this diversion into smooths and splines illustrates how they model different relationships (slopes) with y at different locations within the variable feature space, (the x axis in the above example). In this way, spline smooth curves can be used to capture non-linear relationships in attribute-space (here the relationship of x with y).

Importantly, in the context of space-time varying coefficient modelling with stgam (see the accompanying vignette in this package), the smooth can be used to model how relationships between x and y varies with respect to time or location, if the smooth is also parameterised with those. That is moving from attribute-space to temporal- or geographical-space, respectively.

2.3 Temporal variability

It would be useful to examine the how the target variable, ndvi, varies over time. One way of doing this is to extend the use GAM smooths illustrated above. The code snippet below models the variation in ndvi but now using time (the month variable) as a predictor.

# the first GAM 
gam.1 <- gam(ndvi~s(month), data = chaco)
summary(gam.1)
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> ndvi ~ s(month)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 0.626637   0.002845   220.3   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>           edf Ref.df     F p-value    
#> s(month) 8.69  8.972 7.579  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.0314   Deviance explained = 3.56%
#> GCV = 0.016264  Scale est. = 0.016185  n = 2000

The smooth can be investigated by plotting ndvi with time. Note the use of the actual data variable date rather than month in the code below to have a friendly x-axis in the plot, and the use of the predict function to extract the standard errors on the fitted curve (i.e., the predictions of NDVI):

# create a data frame with x, predicted y, standard error
x <- chaco$date
y <- gam.1$fitted.values
se <- predict(gam.1, se = TRUE, chaco)$se.fit
u <- y+se
l <- y-se
df <- data.frame(x, y, u, l)
# plot!
ggplot(df, aes(x, y, ymin = l, ymax = u)) + 
  geom_ribbon(fill = "lightblue") + 
  geom_line() + 
  theme_bw() + 
  xlab("Date") + ylab("NDVI")

A plot of the temporal smooth.

This exercise models variation in the relationship between ndvi (NDVI) and month (time). The code above extracts the predicted values (\(\hat{y}\)) of ndvi from the model and plots them against time, but using the actual date. The plot shows how the relationship of the target variable varies with time. This provides confirmation of the potential suitability of a temporal analysis of ndvi.

2.4 Spatial variability

The spatially varying trends can be examined in similar way, again using a standard s spline - but now with location variables (i.e., the coordinates) to predict house price.

# the second GAM
gam.2 <- gam(ndvi~s(X,Y), data = chaco)
summary(gam.2)
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> ndvi ~ s(X, Y)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 0.626637   0.002817   222.4   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>          edf Ref.df     F p-value    
#> s(X,Y) 21.75  26.13 4.088  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.0502   Deviance explained = 6.05%
#> GCV = 0.016055  Scale est. = 0.015872  n = 2000
plot(gam.2, asp = 1)

A mgcv map of the spatial smooth.

Notice that the model fit (\(R^2\)) has not increased substantially from the previous model. This suggests some important spatial effects and again the smooth can be plotted, but this time rather than a line, it is a 2-Dimensional surface. However, it would be useful to visualise this as a surface rather than just over the chaco point locations. The code below uses the predict function used to model the relationship over a hexagonal grid (a surface) covering the extent of study area, that contains X and Y attributes of the location of the hexagonal cell centroid.

# create a grid object from the Chaco data
chaco_grid <- 
  st_make_grid(chaco, square=FALSE,n=20) |> 
  st_sf()
# rename the geometry, sort row names 
st_geometry(chaco_grid) = "geometry"
rownames(chaco_grid) <- 1:nrow(chaco_grid)
# create and add coordinates X and Y / 1000
coords <- chaco_grid |> st_centroid() |> st_coordinates() 
chaco_grid <- chaco_grid |> bind_cols(coords/1000)

You may wish to inspect this object:

chaco_grid
plot(st_geometry(chaco_grid))

Before continuing with the mapping procedure:

# predict over the grid
yhat <- predict(gam.2, newdata = chaco_grid)
chaco_grid |> 
  mutate(yhat = yhat) |>
  # and plot
  ggplot() + 
  geom_sf(aes(fill = yhat), col = NA) +
  # adjust default shading
  scale_fill_continuous_c4a_seq("brewer.yl_or_rd", name = "NDVI") +
  # apply and modify plot theme
  theme_bw() +
  theme(legend.position = "bottom",
        legend.key.width = unit(1, "cm"))

A map of the smoothed (predicted) response variable over hexagon grid.

These are clear spatial trends (variations over space) of the NDVI target variable. The map indicates high values in the centre of the study area, as in the initial map of the data earlier, and again provides confirmatory evidence that the some spatial modelling of ndvi is appropriate.

2.5 Space-time variability I

The smooth approach can be extended to model the target variable in space-time by including the location and time variables in the smooth. However, a slightly different set of considerations are required because of mixing space and time in the mgcv smooths. Consider the code snippet below for the space-time smooths it specifies a smooth using the te function rather than s and contains other parameters in the d and bs arguments. The reasons for this are described below.

# the third GAM
gam.3 <- gam(ndvi~te(X,Y,month, d = c(2,1), bs=c('tp','cr')), data = chaco)
summary(gam.3)
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> ndvi ~ te(X, Y, month, d = c(2, 1), bs = c("tp", "cr"))
#> 
#> Parametric coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 0.626637   0.002804   223.5   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>                edf Ref.df     F p-value    
#> te(X,Y,month) 39.7  49.84 2.684  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.0592   Deviance explained = 7.79%
#> GCV = 0.016048  Scale est. = 0.015721  n = 2000

It is important to consider the difference between s(X,Y,month) (which would be a continuation of the format above but has not been run) and te(X,Y,month,d = c(2,1),bs=c('tp','cr')) before examining the resultant GAM (gam.3). The spline functions s() and te() can model smooth interactions between multiple variables, but they handle dimension scaling, marginal smoothness, and basis construction in different ways.

s() is an Isotropic smooth. It is used when all variables are on the same scale and have similar smoothness behaviours (like X and Y). By default it uses a thin plate regression spline (bs = "tp") and a single basis to model the attribute space. The formulation s(X,Y,month) would therefore use a single basis to model a smooth in 3D space, under the assumption that all the variables (X, Y and month) contribute in the same way and at the same scale. This is fine if the variables are numeric, continuous, and in comparable units such as percentages. But not in space-time.
te() is a Tensor Product (TP) smooth. These are used when variables are on different scales or represent different kinds of effects (like X, Y and month). It builds a smooth using separate basis functions for each variable and then combines them. The formulation te(X,Y,month,d = c(2,1),bs=c('tp','cr')) includes the d parameter that specifies the dimensions for each basis. Here a 2D basis is specified for X and Y (location) and a 1D basis for month (time). It also includes a basis parameter bs which specifies a Thin Plate Regression Spline (tp) for location and a Cubic Regression Spline (cr) for time. Thin plate splines are invariant to coordinate rotation and are ideal for spatial data (regardless of whether the coordinates are in meters, kilometres, or degrees) and Cubic regression splines are more efficient especially in 1D. The tensor product smooth is constructed by constructing (marginal) smooths for each and taking the tensor product of those to form the joint smooth. It is useful when the variables to be included in the smooth have different scales or units (e.g., X and Y are coordinates, month is time) and different degrees of smoothness are expected along each variable.

Thus, in this case a TP smooth is specified because space and time need to be combined. The code below examines the results and uses the mgcv plot function to show the spatial distributions over 9 discretised time chunks (~15 months).

plot(gam.3, asp = 1)

However, it is also possible to slice and dice the time variable in other ways. The code snippet below illustrates how this can be done for monthly periods using the calculate_vcs function in the stgam package. Note that if predictor variables are included in the model then the function coefficient estimates, standard errors and t-values for each of terms specified, as shown in the accompanying vignette.

However, in this case we are just interested in the predicted value of the target variable, yhat. The code below uses the grid object created above (chaco_grid) as a spatial framework to hold the prediction of the ndvi variable over these discrete time periods.

# create time slices
years <- sort(unique(chaco$year))
# calculate over the grid for each time slice
res_out <- matrix(nrow = nrow(chaco_grid), ncol = 0)
for (i in 1:length(years)){
  # convert years to months - here getting month 6
  month.val <- ((years[i]-min(years)) * 12) + 6
  res.i <- calculate_vcs(input_data = chaco_grid |> mutate(month = month.val),
                         mgcv_model = gam.3,
                         terms = NULL)
  res_out <- cbind(res_out, res.i$yhat)
}
# name with years and join to the grid
colnames(res_out) <- paste0("Y_", years)
chaco_grid |> 
  cbind(res_out) |>
  # select the variables and pivot longer
  select(-X, -Y) |>
  pivot_longer(-geometry) |>
  # make the new object a factor (to enforce plotting order)
  mutate(name = factor(name, levels = paste0("Y_", years))) |>
  # and plot 
  ggplot() + 
  geom_sf(aes(fill = value), col = NA) +
  # adjust default shading
  scale_fill_continuous_c4a_seq("brewer.yl_or_rd", name = "Predicted\nNDVI") +
  # facet
  facet_wrap(~name, ncol = 4) +
  # apply and modify plot theme
  theme_bw() + 
  theme(
    legend.position = "inside",
    legend.position.inside = c(0.9, 0.17), 
      strip.background = element_rect(fill="white", colour = "white"), 
      strip.text = element_text(size = 8, margin = margin(b=4)),
        axis.title=element_blank(),
        axis.text=element_blank(),
    axis.ticks=element_blank())

A plot of a spatial smooth over 7 approximately annual time periods.

This time series of maps show that without any predictor variables (aside from location & time), the trends in NDVI varies spatially and changes in intensity over time (the increase and decrease in values over space), with changes in intensity but limited variation in spatial pattern over time.

2.6 Space-time variability II

The formula used to construct the gam.3 model in the previous subsection constructed a smooth that implicitly combined space and time, under an assumption that the variations in space and time of ndvi interact with each other. However this may not be the case. More generally, spatial and temporal dependencies might not be expected to interact in the data for a relatively small region, whose observations might be expected to be subject to the same socio-economic or environmental pressures. In this case, any space and time effects might be expected to be independent of each other. Whereas in a larger, for example national study, any space and time effects might be expected to interact more strongly.

It is possible to use a different construction involving separate smooths for space and time, with the s() smooth, and to avoid the assumption of the interaction of space and time effects:

# the fourth GAM
gam.4 <- gam(ndvi~s(X,Y) + s(month), data = chaco)
summary(gam.4)
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> ndvi ~ s(X, Y) + s(month)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 0.626637   0.002773     226   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>             edf Ref.df     F p-value    
#> s(X,Y)   21.370 25.849 4.076  <2e-16 ***
#> s(month)  8.701  8.973 8.126  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.0799   Deviance explained = 9.37%
#> GCV = 0.015618  Scale est. = 0.015376  n = 2000

Again there is evidence of spatial and temporal trends in the data and these can be visualised using the plot.gam function in mgcv (called with just plot).

plot(gam.4, page = 1)

A mgcv plot of the spatial and temporal smooths.

Note that, the space-time interactions could be generated in the same way as for the gam.3 model above, using the location (X and Y) values in chaco_grid and the time slices in years.

2.7 Including other predictor variables in space-time GAMs

Up until now, only the target (response) variable, ndvi has been examined for variations in time and space, using GAM smooths specified in different ways. This was to explore space-time variability of this variable, but also to introduce GAM smooths. The code below includes the tmax predictor variable in a space-time GAM as a fixed parametric term (i.e., in a similar way as in the OLS model created above).

# the fifth GAM
gam.5 <- gam(ndvi~s(X,Y) + s(month) + tmax, data = chaco)
summary(gam.5)
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> ndvi ~ s(X, Y) + s(month) + tmax
#> 
#> Parametric coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  1.69211    0.04758   35.56   <2e-16 ***
#> tmax        -0.03409    0.00152  -22.42   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>             edf Ref.df     F p-value    
#> s(X,Y)   22.072  26.36 7.271  <2e-16 ***
#> s(month)  8.681   8.97 6.778  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.266   Deviance explained = 27.7%
#> GCV = 0.012473  Scale est. = 0.012268  n = 2000

The model summary indicates that tmax is a significant predictor of ndvi and improves the model fit. However, this raises a key question of how should predictor variables be included in the model: in (fixed) parametric form or in a (varying) smooth form? This is addressed in the accompanying vignette.

As before the smooths can be plotted:

plot(gam.5, page = 1)

2.8 Summary

This vignette has introduced GAMs with smooths (splines) as a way of investigating any space-time effects present in the data. The mechanics of smooths was described and illustrated, and a brief introduction to different smooth forms was provided. Variations in the target variable (ndvi) in time, space and space-time were explored through elementary model fitting and plotting the smooth graphical trends.

In this case, effects in space and time are present in the ndvi target variable and it looks like there is a universal spatial trend at all times, and universal temporal trend everywhere. The final space-time GAM included a predictor variable (tmax) which added some explanatory power but these were not examined with respect to varying in space or in time or in space-time (i.e. through the use of smooths). This is done in the accompanying vignette.

Such investigations are an important initial step. They provide evidence of space-time variations in the target variable - i.e. whether the effects in space and time are present in the data - and can help determine which predictor variables may be of further interest. They guide subsequent analysis and avoid making assumptions about the presence of space-time interactions, for example by plugging every predictor variable into a space-time smooth of some kind.