7 Generalized Linear Models

Throughout this section, we will use the spmodel package and the ggplot2 package:

library(spmodel)
library(ggplot2)

Goals:

Explain how modeling spatial covariance fits within the structure of a generalized linear model.
Use the spglm() function in spmodel to fit generalized linear models for various model families (i.e., response distributions).

7.1 The Spatial Generalized Linear Model

As with spatial linear models, spatial generalized linear models can be fit in spmodel for point-referenced and areal data. A generalized linear model essentially uses the right-hand-side of Equation 2.2 as a model for a function of the mean of the response vector \(\mathbf{y}\). More formally, the spatial generalized linear model can be written as \[ g(\boldsymbol{\mu}) = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\tau} + \boldsymbol{\epsilon}, \] where \(g(\boldsymbol{\mu})\) is the link function that “links” a function of the mean of \(\mathbf{y}\) to \(\mathbf{X} \boldsymbol{\beta}\), \(\boldsymbol{\tau}\), and \(\boldsymbol{\epsilon}\). For example, in a spatial Poisson generalized linear model, each element of \(\mathbf{y}\), \(y_i\), is modeled as a Poisson random variable with mean \(\mu_i\). Denoting the vector of means as \(\boldsymbol{\mu}\), the log of the mean vector of \(\mathbf{y}\) is then modeled as

\[ \text{log}(\boldsymbol{\mu}) = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\tau} + \boldsymbol{\epsilon}, \]

where the \(\text{log()}\) function is applied element-wise over the mean vector \(\boldsymbol{\mu}\), which is the expected value of \(\mathbf{y}\). In this example, the link function used is the log link. In the binomial generalized linear model family, a popular link function is the logit link, so that the model for the mean of a binomial response vector is

\[ \text{log}\left(\frac{\boldsymbol{\mu}}{1 - \boldsymbol{\mu}}\right) = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\tau} + \boldsymbol{\epsilon}, \] Table 7.1 shows the resposne distributions, data types, and link functions available in spmodel.

Table 7.1: Response distributions and link functions available in `spmodel`
Distribution	Data Type	Link Function
Poisson	Count	Log
Negative Binomial	Count	Log
Binomial	Binary	Logit
Beta	Proportion	Logit
Gamma	Skewed	Log
Inverse Gaussian	Skewed	Log

7.2 Model Fitting

The spglm() function is used to fit spatial generalized linear models for point-referenced data, and the spgautor() function is used to fit spatial generalized linear models for areal data. spglm() and spgautor() share similar syntax with splm() and spautor(), respectively, though one additional argument is required:

family: The generalized linear model family (i.e., the distribution of \(\mathbf{y}\)). The family argument can be binomial, beta, Poisson, nbinomial, Gamma, or inverse.gaussian.

Note

The family argument in spglm() and spgautor() uses similar syntax as the family argument in glm(). One difference, however, is that the link function for the spmodel functions is fixed. For binomial and beta responses, that link is the logit link function, while for Poisson, negative binomial, gamma, and inverse gaussian responses, that link is the log link function.

While spatial generalized linear models can be fit to both point-referenced and areal data, we focus only on fitting spatial generalized linear models to point-referenced data with spglm() for the remainder of this section. Models are fit using a novel application of the Laplace approximation – Ver Hoef et al. (2023) provide further details.

We observed in Chapter 4 that a generalized linear model may be a better choice for the count data in the Alaska moose. We specify a Poisson spatial generalized linear model with the following:

poismod <- spglm(count ~ elev * strat, data = moose,
               family = poisson, spcov_type = "spherical")
summary(poismod)
#> 
#> Call:
#> spglm(formula = count ~ elev * strat, family = poisson, data = moose, 
#>     spcov_type = "spherical")
#> 
#> Deviance Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.4245 -0.7783 -0.3653  0.1531  0.5900 
#> 
#> Coefficients (fixed):
#>              Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -2.230575   0.958201  -2.328 0.019919 *  
#> elev         0.007623   0.003129   2.437 0.014820 *  
#> stratM       2.752234   0.782853   3.516 0.000439 ***
#> elev:stratM -0.010248   0.004472  -2.292 0.021928 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Pseudo R-squared: 0.09573
#> 
#> Coefficients (spherical spatial covariance):
#>        de        ie     range 
#>     3.892     1.163 51204.657 
#> 
#> Coefficients (Dispersion for poisson family):
#> dispersion 
#>          1

As with spatial linear models, the broom functions tidy(), glance() and augment(), as well as many other generic functions like plot(), are available for spatial generalized linear models. For example, we glance at the fitted model by running

glance(poismod)
#> # A tibble: 1 × 9
#>       n     p  npar value   AIC  AICc logLik deviance pseudo.r.squared
#>   <int> <dbl> <int> <dbl> <dbl> <dbl>  <dbl>    <dbl>            <dbl>
#> 1   218     4     3 1332. 1338. 1338.  -666.     84.3           0.0957

Exercise

Fit a spatial negative binomial model to the moose data with count as the response and elev, strat, and their interaction as predictors. The negative binomial model relaxes the assumption in the spatial Poisson generalized linear model that the mean of a response variable \(Y_i\) and the variance of a response variable \(Y_i\) must be equal. Obtain a summary of the fitted model. Then compare their fits using loocv(). Which model is preferable?

Exercise Solution

nbmod <- spglm(count ~ elev * strat, data = moose,
               family = nbinomial, spcov_type = "spherical")
loocv(poismod)
#> [1] 32.11889
loocv(nbmod)
#> [1] 27.90761

nbmod has the lower loocv() error, suggesting it is a better fit to the data.

7.3 Prediction

We can also make predictions of the mean function at unobserved locations. For example, we can use poismod to predict the mean number of moose (on the link scale) at the spatial locations in moose_preds using predict() by running:

# results omitted
predict(poismod, newdata = moose_preds)

We can also use augment():

augment(poismod, newdata = moose_preds)
#> Simple feature collection with 100 features and 3 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 269386.2 ymin: 1418453 xmax: 419976.2 ymax: 1541763
#> Projected CRS: NAD83 / Alaska Albers
#> # A tibble: 100 × 4
#>    elev strat .fitted           geometry
#> * <dbl> <chr>   <dbl>        <POINT [m]>
#> 1  143. L      0.207  (401239.6 1436192)
#> 2  324. L     -0.0563 (352640.6 1490695)
#> 3  158. L     -1.24   (360954.9 1491590)
#> 4  221. M     -1.16   (291839.8 1466091)
#> 5  209. M      1.78   (310991.9 1441630)
#> 6  218. L     -1.84   (304473.8 1512103)
#> # ℹ 94 more rows

By default, predict() and augment() return predictions on the link scale. We return predictions on the response scale by running

augmod <- augment(poismod, newdata = moose_preds, type = "response")

And we can visualize these predictions by running

ggplot(augmod, aes(color = .fitted)) +
  geom_sf() +
  scale_color_viridis_c(limits = c(0, 40)) +
  theme_gray(base_size = 14)

Exercise

Use spglm() to fit a spatial logistic regression model to the moose data using presence as the response variable and a Cauchy covariance function. Then, find the predicted probabilities that moose are present at the spatial locations in moose_preds (Hint: Use the type argument in predict() or augment()).

Exercise Solution

binmod <- spglm(presence ~ elev * strat, data = moose,
               family = binomial, spcov_type = "cauchy")
augment(binmod, newdata = moose_preds, type = "response")
#> Simple feature collection with 100 features and 3 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 269386.2 ymin: 1418453 xmax: 419976.2 ymax: 1541763
#> Projected CRS: NAD83 / Alaska Albers
#> # A tibble: 100 × 4
#>    elev strat .fitted           geometry
#> * <dbl> <chr>   <dbl>        <POINT [m]>
#> 1  143. L      0.489  (401239.6 1436192)
#> 2  324. L      0.376  (352640.6 1490695)
#> 3  158. L      0.0945 (360954.9 1491590)
#> 4  221. M      0.229  (291839.8 1466091)
#> 5  209. M      0.804  (310991.9 1441630)
#> 6  218. L      0.0411 (304473.8 1512103)
#> # ℹ 94 more rows

7.4 Additional Modeling Features

All advanced features available in spmodel for spatial linear models are also available for spatial generalized linear models. This means that spatial generalized linear models in spmodel can accommodate fixing spatial covariance parameters, fitting and predicting for multiple models, non-spatial random effects (on the link scale), partition factors, anisotropy (on the link scale), simulation, big data, and prediction.

7.5 R Code Appendix

library(spmodel)
library(ggplot2)
poismod <- spglm(count ~ elev * strat, data = moose,
               family = poisson, spcov_type = "spherical")
summary(poismod)
glance(poismod)
nbmod <- spglm(count ~ elev * strat, data = moose,
               family = nbinomial, spcov_type = "spherical")
loocv(poismod)
loocv(nbmod)
# results omitted
predict(poismod, newdata = moose_preds)
augment(poismod, newdata = moose_preds)
augmod <- augment(poismod, newdata = moose_preds, type = "response")
ggplot(augmod, aes(color = .fitted)) +
  geom_sf() +
  scale_color_viridis_c(limits = c(0, 40)) +
  theme_gray(base_size = 14)
binmod <- spglm(presence ~ elev * strat, data = moose,
               family = binomial, spcov_type = "cauchy")
augment(binmod, newdata = moose_preds, type = "response")