This function works on spatial stream network objects to fit generalized linear models with spatially autocorrelated errors using likelihood methods, allowing for non-spatial random effects, anisotropy, partition factors, big data methods, and more. The spatial formulation is described in Ver Hoef and Peterson (2010) and Peterson and Ver Hoef (2010).
ssn_glm(
formula,
ssn.object,
family,
tailup_type = "none",
taildown_type = "none",
euclid_type = "none",
nugget_type = "nugget",
tailup_initial,
taildown_initial,
euclid_initial,
nugget_initial,
dispersion_initial,
additive,
estmethod = "reml",
anisotropy = FALSE,
random,
randcov_initial,
partition_factor,
...
)
A two-sided linear formula describing the fixed effect structure
of the model, with the response to the left of the ~
operator and
the terms on the right, separated by +
operators.
A spatial stream network object with class SSN
.
The generalized linear model family for use with ssn_glm()
.
Available options include "Gaussian"
, "poisson"
,
"nbinomial"
(negative binomial), "binomial"
, "beta"
,
"Gamma"
, and "invgauss"
. When family
is "Gaussian"
, arguments are passed to and evaluated by ssn_lm()
.
Can be quoted or unquoted. Note that the family
argument
only takes a single value, rather than the list structure used by stats::glm.
See Details for more.
The tailup covariance function type. Available options
include "linear"
, "spherical"
, "exponential"
,
"mariah"
, "epa"
, and "none"
. Parameterizations are
described in Details.
The taildown covariance function type. Available options
include "linear"
, "spherical"
, "exponential"
,
"mariah"
, "epa"
, and "none"
. Parameterizations are
described in Details.
The euclidean covariance function type. Available options
include "spherical"
, "exponential"
, "gaussian"
,
"cosine"
, "cubic"
, "pentaspherical"
, "wave"
,
"jbessel"
, "gravity"
, "rquad"
, "magnetic"
, and
"none"
. Parameterizations are
described in Details.
The nugget covariance function type. Available options
include "nugget"
or "none"
. Parameterizations are
described in Details.
An object from tailup_initial()
specifying initial and/or
known values for the tailup covariance parameters.
An object from taildown_initial()
specifying initial and/or
known values for the taildown covariance parameters.
An object from euclid_initial()
specifying initial and/or
known values for the euclidean covariance parameters.
An object from nugget_initial()
specifying initial and/or
known values for the nugget covariance parameters.
An object from dispersion_initial()
specifying initial and/or
known values for the tailup covariance parameters.
The name of the variable in ssn.object
that is used
to define spatial weights. Can be quoted or unquoted. For the tailup covariance functions, these additive
weights are used for branching. Technical details that describe the role
of the additive variable in the tailup covariance function are available
in Ver Hoef and Peterson (2010).
The estimation method. Available options include
"reml"
for restricted maximum likelihood and "ml"
for maximum
likelihood. The default is "reml"
.
A logical indicating whether (geometric) anisotropy should
be modeled. Not required if spcov_initial
is provided with 1) rotate
assumed unknown or assumed known and non-zero or 2) scale
assumed unknown
or assumed known and less than one. When anisotropy
is TRUE
,
computational times can significantly increase. The default is FALSE
.
A one-sided linear formula describing the random effect structure
of the model. Terms are specified to the right of the ~ operator
.
Each term has the structure x1 + ... + xn | g1/.../gm
, where x1 + ... + xn
specifies the model for the random effects and g1/.../gm
is the grouping
structure. Separate terms are separated by +
and must generally
be wrapped in parentheses. Random intercepts are added to each model
implicitly when at least one other variable is defined.
If a random intercept is not desired, this must be explicitly
defined (e.g., x1 + ... + xn - 1 | g1/.../gm
). If only a random intercept
is desired for a grouping structure, the random intercept must be specified
as 1 | g1/.../gm
. Note that g1/.../gm
is shorthand for (1 | g1/.../gm)
.
If only random intercepts are desired and the shorthand notation is used,
parentheses can be omitted.
An optional object specifying initial and/or
known values for the random effect variances. See spmodel::randcov_initial()
.
A one-sided linear formula with a single term specifying the partition factor. The partition factor assumes observations from different levels of the partition factor are uncorrelated.
Other arguments to stats::optim()
.
A list with many elements that store information about
the fitted model object and has class ssn_glm
. Many generic functions that
summarize model fit are available for ssn_glm
objects, including
AIC
, AICc
, anova
, augment
, coef
,
cooks.distance
, covmatrix
, deviance
, fitted
, formula
,
glance
, glances
, hatvalues
, influence
,
labels
, logLik
, loocv
, model.frame
, model.matrix
,
plot
, predict
, print
, pseudoR2
, summary
,
terms
, tidy
, update
, varcomp
, and vcov
.
This fitted model list contains the following elements:
additive
: The name of the additive function value column.
anisotropy
: Whether euclidean anisotropy was modeled.
call
: The function call.
coefficients
: Model coefficients.
contrasts
: Any user-supplied contrasts.
cooks_distance
: Cook's distance values.
crs
: The geographic coordinate reference system.
deviance
: The model deviance.
diagtol
: A tolerance value that may be added to the diagonal
of ovariance matrices to encourage decomposition stability.
estmethod
: The estimation method.
euclid_max
: The maximum euclidean distance.
family
: The generalized linear model family
fitted
: Fitted values.
formula
: The model formula.
hatvalues
: The hat (leverage) values.
is_known
: An object that identifies which parameters are known.
local_index
: An index identifier used internally for sorting.
missing_index
: Which rows in the "obs" object had missing responses.
n
: The sample size.
npar
: The number of estimated covariance parameters.
observed_index
: Which rows in the "obs" object had observed responses.
optim
: The optimization output.
p
: The number of fixed effects.
partition_factor
: The partition factor formula.
pseudoR2
: The pseudo R-squared.
random
: The random effect formula.
residuals
: The residuals.
sf_column_name
: The name of the geometry columns ssn.object
size
: The size of the binomial trials if relevant.
ssn.object
: An updated ssn.object
.
tail_max
: The maximum stream distance.
terms
: The model terms.
vcov
: Variance-covariance matrices
xlevels
: The levels of factors in the model matrix.
y
: The response.
These list elements are meant to be used with various generic functions
(e.g., residuals()
that operate on the model object.
While possible to access elements of the fitted model list directly, we strongly
advise against doing so when there is a generic available to return the element
of interest. For example, we strongly recommend using residuals()
to
obtain model residuals instead of accessing the fitted model list directly via
object$residuals
.
The generalized linear model for spatial stream networks can be written as
\(g(\mu) = \eta = X \beta + zu + zd + ze + n\), where \(\mu\) is the expectation
of the response given the random errors, \(y\), \(g()\) is a function that links the mean
and \(\eta\) (and is called a link function), X
is the fixed effects design
matrix, \(\beta\) are the fixed effects, \(zu\) is tailup random error,
\(zd\) is taildown random error, and \(ze\) is Euclidean random error,
and \(n\) is nugget random error.
There are six generalized linear model
families available: poisson
assumes \(y\) is a Poisson random variable
nbinomial
assumes \(y\) is a negative binomial random
variable, binomial
assumes \(y\) is a binomial random variable,
beta
assumes \(y\) is a beta random variable,
Gamma
assumes \(y\) is a gamma random
variable, and inverse.gaussian
assumes \(y\) is an inverse Gaussian
random variable.
The supports for \(y\) for each family are given below:
family: support of \(y\)
Gaussian: \(-\infty < y < \infty\)
poisson: \(0 \le y\); \(y\) an integer
nbinomial: \(0 \le y\); \(y\) an integer
binomial: \(0 \le y\); \(y\) an integer
beta: \(0 < y < 1\)
Gamma: \(0 < y\)
inverse.gaussian: \(0 < y\)
The generalized linear model families and the parameterizations of their link functions are given below:
family: link function
Gaussian: \(g(\mu) = \eta\) (identity link)
poisson: \(g(\mu) = log(\eta)\) (log link)
nbinomial: \(g(\mu) = log(\eta)\) (log link)
binomial: \(g(\mu) = log(\eta / (1 - \eta))\) (logit link)
beta: \(g(\mu) = log(\eta / (1 - \eta))\) (logit link)
Gamma: \(g(\mu) = log(\eta)\) (log link)
inverse.gaussian: \(g(\mu) = log(\eta)\) (log link)
The variance function of an individual \(y\) (given \(\mu\)) for each generalized linear model family is given below:
family: \(Var(y)\)
Gaussian: \(\sigma^2\)
poisson: \(\mu \phi\)
nbinomial: \(\mu + \mu^2 / \phi\)
binomial: \(n \mu (1 - \mu) \phi\)
beta: \(\mu (1 - \mu) / (1 + \phi)\)
Gamma: \(\mu^2 / \phi\)
inverse.gaussian: \(\mu^2 / \phi\)
The parameter \(\phi\) is a dispersion parameter that influences \(Var(y)\).
For the poisson
and binomial
families, \(\phi\) is always
one. Note that this inverse Gaussian parameterization is different than a
standard inverse Gaussian parameterization, which has variance \(\mu^3 / \lambda\).
Setting \(\phi = \lambda / \mu\) yields our parameterization, which is
preferred for computational stability. Also note that the dispersion parameter
is often defined in the literature as \(V(\mu) \phi\), where \(V(\mu)\) is the variance
function of the mean. We do not use this parameterization, which is important
to recognize while interpreting dispersion estimates.
For more on generalized linear model constructions, see McCullagh and
Nelder (1989).
In the generalized linear model context, the tailup, taildown, Euclidean, and nugget covariance affect the modeled mean of an observation (conditional on these effects). On the link scale, the tailup random errors capture spatial covariance moving downstream (and depend on downstream distance), the taildown random errors capture spatial covariance moving upstream (and depend on upstream) distance, the Euclidean random errors capture spatial covariance that depends on Euclidean distance, and the nugget random errors captures variability independent of spatial locations. \(\eta\) is modeled using a spatial covariance function expressed as \(de(zu) * R(zu) + de(zd) * R(zd) + de(ze) * R(ze) + nugget * I\). \(de(zu)\), \(de(zu)\), and \(de(zd)\) represent the tailup, taildown, and Euclidean variances, respectively. \(R(zu)\), \(R(zd)\), and \(R(ze)\) represent the tailup, taildown, and Euclidean correlation matrices, respectively. Each correlation matrix depends on a range parameter that controls the distance-decay behavior of the correlation. \(nugget\) represents the nugget variance and \(I\) represents an identity matrix.
tailup_type
Details: Let \(D\) be a matrix of hydrologic distances,
\(W\) be a diagonal matrix of weights from additive
, \(r = D / range\),
and \(I\) be
an identity matrix. Then parametric forms for flow-connected
elements of \(R(zu)\) are given below:
linear: \((1 - r) * (r <= 1) * W\)
spherical: \((1 - 1.5r + 0.5r^3) * (r <= 1) * W\)
exponential: \(exp(-r) * W\)
mariah: \(log(90r + 1) / 90r * (D > 0) + 1 * (D = 0) * W\)
epa: \((D - range)^2 * F * (r <= 1) * W / 16range^5\)
none: \(I\) * W
Details describing the F
matrix in the epa
covariance are given in Garreta et al. (2010).
Flow-unconnected elements of \(R(zu)\) are assumed uncorrelated.
Observations on different networks are also assumed uncorrelated.
taildown_type
Details: Let \(D\) be a matrix of hydrologic distances,
\(r = D / range\),
and \(I\) be an identity matrix. Then parametric forms for flow-connected
elements of \(R(zd)\) are given below:
linear: \((1 - r) * (r <= 1)\)
spherical: \((1 - 1.5r + 0.5r^3) * (r <= 1)\)
exponential: \(exp(-r)\)
mariah: \(log(90r + 1) / 90r * (D > 0) + 1 * (D = 0)\)
epa: \((D - range)^2 * F1 * (r <= 1) / 16range^5\)
none: \(I\)
Now let \(A\) be a matrix that contains the shorter of the two distances between two sites and the common downstream junction, \(r1 = A / range\), \(B\) be a matrix that contains the longer of the two distances between two sites and the common downstream junction, \(r2 = B / range\), and \(I\) be an identity matrix. Then parametric forms for flow-unconnected elements of \(R(zd)\) are given below:
linear: \((1 - r2) * (r2 <= 1)\)
spherical: \((1 - 1.5r1 + 0.5r2) * (1 - r2)^2 * (r2 <= 1)\)
exponential: \(exp(-(r1 + r2))\)
mariah: \((log(90r1 + 1) - log(90r2 + 1)) / (90r1 - 90r2) * (A =/ B) + (1 / (90r1 + 1)) * (A = B)\)
epa: \((B - range)^2 * F2 * (r2 <= 1) / 16range^5\)
none: \(I\)
Details describing the F1
and F2
matrices in the epa
covariance are given in Garreta et al. (2010).
Observations on different networks are assumed uncorrelated.
euclid_type
Details: Let \(D\) be a matrix of Euclidean distances,
\(r = D / range\), and \(I\) be an identity matrix. Then parametric
forms for elements of \(R(ze)\) are given below:
exponential: \(exp(- r )\)
spherical: \((1 - 1.5r + 0.5r^3) * (r <= 1)\)
gaussian: \(exp(- r^2 )\)
cubic: \((1 - 7r^2 + 8.75r^3 - 3.5r^5 + 0.75r^7) * (r <= 1)\)
pentaspherical: \((1 - 1.875r + 1.25r^3 - 0.375r^5) * (r <= 1)\)
cosine: \(cos(r)\)
wave: \(sin(r) * (h > 0) / r + (h = 0)\)
jbessel: \(Bj(h * range)\), Bj is Bessel-J function
gravity: \((1 + r^2)^{-0.5}\)
rquad: \((1 + r^2)^{-1}\)
magnetic: \((1 + r^2)^{-1.5}\)
none: \(I\)
nugget_type
Details: Let \(I\) be an identity matrix and \(0\)
be the zero matrix. Then parametric
forms for elements the nugget variance are given below:
nugget: \(I\)
none: \(0\)
In short, the nugget effect is modeled when nugget_type
is "nugget"
and omitted when nugget_type
is "none"
.
estmethod
Details: The various estimation methods are
reml
: Maximize the restricted log-likelihood.
ml
: Maximize the log-likelihood.
anisotropy
Details: By default, all Euclidean covariance parameters except rotate
and scale
are assumed unknown, requiring estimation. If either rotate
or scale
are given initial values other than 0 and 1 (respectively) or are assumed unknown
in euclid_initial()
, anisotropy
is implicitly set to TRUE
.
(Geometric) Anisotropy is modeled by transforming a Euclidean covariance function that
decays differently in different directions to one that decays equally in all
directions via rotation and scaling of the original Euclidean coordinates. The rotation is
controlled by the rotate
parameter in \([0, \pi]\) radians. The scaling
is controlled by the scale
parameter in \([0, 1]\). The anisotropy
correction involves first a rotation of the coordinates clockwise by rotate
and then a
scaling of the coordinates' minor axis by the reciprocal of scale
. The Euclidean
covariance is then computed using these transformed coordinates.
random
Details: If random effects are used (the estimation method must be "reml"
or
"ml"
), the model
can be written as \(g(\mu) = \eta = X \beta + W1\gamma 1 + ... Wj\gamma j + zu + zd + ze + n\),
where each Z is a random effects design matrix and each u is a random effect.
partition_factor
Details: The partition factor can be represented in matrix form as \(P\), where
elements of \(P\) equal one for observations in the same level of the partition
factor and zero otherwise. The covariance matrix involving only the
spatial and random effects components is then multiplied element-wise
(Hadmard product) by \(P\), yielding the final covariance matrix.
Other Details: Observations with NA
response values are removed for model
fitting, but their values can be predicted afterwards by running
predict(object)
.
This function does not perform any internal scaling. If optimization is not stable due to large extremely large variances, scale relevant variables so they have variance 1 before optimization.
Garreta, V., Monestiez, P. and Ver Hoef, J.M. (2010) Spatial modelling and prediction on river networks: up model, down model, or hybrid? Environmetrics 21(5), 439--456.
McCullagh P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman and Hall.
Peterson, E.E. and Ver Hoef, J.M. (2010) A mixed-model moving-average approach to geostatistical modeling in stream networks. Ecology 91(3), 644--651.
Ver Hoef, J.M. and Peterson, E.E. (2010) A moving average approach for spatial statistical models of stream networks (with discussion). Journal of the American Statistical Association 105, 6--18. DOI: 10.1198/jasa.2009.ap08248. Rejoinder pgs. 22--24.
# Copy the mf04p .ssn data to a local directory and read it into R
# When modeling with your .ssn object, you will load it using the relevant
# path to the .ssn data on your machine
copy_lsn_to_temp()
temp_path <- paste0(tempdir(), "/MiddleFork04.ssn")
mf04p <- ssn_import(temp_path, overwrite = TRUE)
ssn_gmod <- ssn_glm(
formula = Summer_mn ~ ELEV_DEM,
ssn.object = mf04p,
family = "Gamma",
tailup_type = "exponential",
additive = "afvArea"
)
summary(ssn_gmod)
#>
#> Call:
#> ssn_glm(formula = Summer_mn ~ ELEV_DEM, ssn.object = mf04p, family = "Gamma",
#> tailup_type = "exponential", additive = "afvArea")
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -3.296e-03 -9.345e-04 4.352e-05 4.850e-04 5.457e-03
#>
#> Coefficients (fixed):
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 8.4238415 0.7860602 10.717 < 2e-16 ***
#> ELEV_DEM -0.0029545 0.0003907 -7.563 3.95e-14 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Pseudo R-squared: 0.606
#>
#> Coefficients (covariance):
#> Effect Parameter Estimate
#> tailup exponential de (parsill) 1.058e-02
#> tailup exponential range 9.184e+04
#> nugget nugget 2.898e-04
#> dispersion dispersion 2.328e+04
#>