Fit spatial linear models for point-referenced data (i.e., geostatistical models) using a variety of estimation methods, allowing for random effects, anisotropy, partition factors, and big data methods.

## Usage

```
splm(
formula,
data,
spcov_type,
xcoord,
ycoord,
spcov_initial,
estmethod = "reml",
weights = "cressie",
anisotropy = FALSE,
random,
randcov_initial,
partition_factor,
local,
...
)
```

## Arguments

- formula
A two-sided linear formula describing the fixed effect structure of the model, with the response to the left of the

`~`

operator and the terms on the right, separated by`+`

operators.- data
A data frame or

`sf`

object object that contains the variables in`fixed`

,`random`

, and`partition_factor`

as well as geographical information. If an`sf`

object is provided with`POINT`

geometries, the x-coordinates and y-coordinates are used directly. If an`sf`

object is provided with`POLYGON`

geometries, the x-coordinates and y-coordinates are taken as the centroids of each polygon.- spcov_type
The spatial covariance type. Available options include

`"exponential"`

,`"spherical"`

,`"gaussian"`

,`"triangular"`

,`"circular"`

,`"cubic"`

,`"pentaspherical"`

,`"cosine"`

,`"wave"`

,`"jbessel"`

,`"gravity"`

,`"rquad"`

,`"magnetic"`

,`"matern"`

,`"cauchy"`

,`"pexponential"`

, and`"none"`

. Parameterizations of each spatial covariance type are available in Details. Multiple spatial covariance types can be provided as a character vector, and then`splm()`

is called iteratively for each element and a list is returned for each model fit. The default for`spcov_type`

is`"exponential"`

. When`spcov_type`

is specified, all unknown spatial covariance parameters are estimated.`spcov_type`

is ignored if`spcov_initial`

is provided.- xcoord
The name of the column in

`data`

representing the x-coordinate. Can be quoted or unquoted. Not required if`data`

is an`sf`

object.- ycoord
The name of the column in

`data`

representing the y-coordinate. Can be quoted or unquoted. Not required if`data`

is an`sf`

object.- spcov_initial
An object from

`spcov_initial()`

specifying initial and/or known values for the spatial covariance parameters. Multiple`spcov_initial()`

objects can be provided in a list. Then`splm()`

is called iteratively for each element and a list is returned for each model fit.- estmethod
The estimation method. Available options include

`"reml"`

for restricted maximum likelihood,`"ml"`

for maximum likelihood,`"sv-wls"`

for semivariogram weighted least squares, and`"sv-cl"`

for semivariogram composite likelihood. The default is`"reml"`

.- weights
Weights to use when

`estmethod`

is`"sv-wls"`

. Available options include`"cressie"`

,`"cressie-dr"`

,`"cressie-nopairs"`

,`"cressie-dr-nopairs"`

,`"pairs"`

,`"pairs-invd"`

,`"pairs-invrd"`

, and`"ols"`

. Parameterizations for each weight are available in Details. The default is`"cressie"`

.- anisotropy
A logical indicating whether (geometric) anisotropy should be modeled. Not required if

`spcov_initial`

is provided with 1)`rotate`

assumed unknown or assumed known and non-zero or 2)`scale`

assumed unknown or assumed known and less than one. When`anisotropy`

is`TRUE`

, computational times can significantly increase. The default is`FALSE`

.- random
A one-sided linear formula describing the random effect structure of the model. Terms are specified to the right of the

`~ operator`

. Each term has the structure`x1 + ... + xn | g1/.../gm`

, where`x1 + ... + xn`

specifies the model for the random effects and`g1/.../gm`

is the grouping structure. Separate terms are separated by`+`

and must generally be wrapped in parentheses. Random intercepts are added to each model implicitly when at least one other variable is defined. If a random intercept is not desired, this must be explicitly defined (e.g.,`x1 + ... + xn - 1 | g1/.../gm`

). If only a random intercept is desired for a grouping structure, the random intercept must be specified as`1 | g1/.../gm`

. Note that`g1/.../gm`

is shorthand for`(1 | g1/.../gm)`

. If only random intercepts are desired and the shorthand notation is used, parentheses can be omitted.- randcov_initial
An optional object specifying initial and/or known values for the random effect variances.

- partition_factor
A one-sided linear formula with a single term specifying the partition factor. The partition factor assumes observations from different levels of the partition factor are uncorrelated.

- local
An optional logical or list controlling the big data approximation. If omitted,

`local`

is set to`TRUE`

or`FALSE`

based on the sample size (the number of non-missing observations in`data`

) -- if the sample size exceeds 5,000,`local`

is set to`TRUE`

. Otherwise it is set to`FALSE`

.`local`

is also set to`FALSE`

when`spcov_type`

is`"none"`

and there are no random effects specified via`random`

. If`FALSE`

, no big data approximation is implemented. If a list is provided, the following arguments detail the big data approximation:`index:`

The group indexes. Observations in different levels of`index`

are assumed to be uncorrelated for the purposes of estimation. If`index`

is not provided, it is determined by specifying`method`

and either`size`

or`groups`

.`method`

: The big data approximation method used to determine`index`

. Ignored if`index`

is provided. If`method = "random"`

, observations are randomly assigned to`index`

based on`size`

. If`method = "kmeans"`

, observations assigned to`index`

based on k-means clustering on the coordinates with`groups`

clusters. The default is`"kmeans"`

. Note that both methods have a random component, which means that you may get different results from separate model fitting calls. To ensure consistent results, specify`index`

or set a seed via`base::set.seed()`

.`size`

: The number of observations in each`index`

group when`method`

is`"random"`

. If the number of observations is not divisible by`size`

, some levels get`size - 1`

observations. The default is 100.`groups:`

The number of`index`

groups. If`method`

is`"random"`

,`size`

is \(ceiling(n / groups)\), where \(n\) is the sample size. Automatically determined if`size`

is specified. If`method`

is`"kmeans"`

,`groups`

is the number of clusters.`var_adjust:`

The approach for adjusting the variance-covariance matrix of the fixed effects.`"none"`

for no adjustment,`"theoretical"`

for the theoretically-correct adjustment,`"pooled"`

for the pooled adjustment, and`"empirical"`

for the empirical adjustment. The default is`"theoretical"`

.`parallel`

: If`TRUE`

, parallel processing via the parallel package is automatically used. The default is`FALSE`

.`ncores`

: If`parallel = TRUE`

, the number of cores to parallelize over. The default is the number of available cores on your machine.

When

`local`

is a list, at least one list element must be provided to initialize default arguments for the other list elements. If`local`

is`TRUE`

, defaults for`local`

are chosen such that`local`

is transformed into`list(size = 100, method = "kmeans", var_adjust = "theoretical", parallel = FALSE)`

.- ...
Other arguments to

`esv()`

or`stats::optim()`

.

## Value

A list with many elements that store information about
the fitted model object. If `spcov_type`

or `spcov_initial`

are
length one, the list has class `splm`

. Many generic functions that
summarize model fit are available for `splm`

objects, including
`AIC`

, `AICc`

, `anova`

, `augment`

, `coef`

,
`cooks.distance`

, `covmatrix`

, `deviance`

, `fitted`

, `formula`

,
`glance`

, `glances`

, `hatvalues`

, `influence`

,
`labels`

, `logLik`

, `loocv`

, `model.frame`

, `model.matrix`

,
`plot`

, `predict`

, `print`

, `pseudoR2`

, `summary`

,
`terms`

, `tidy`

, `update`

, `varcomp`

, and `vcov`

. If
`spcov_type`

or `spcov_initial`

are length greater than one, the
list has class `splm_list`

and each element in the list has class
`splm`

. `glances`

can be used to summarize `splm_list`

objects, and the aforementioned `splm`

generics can be used on each
individual list element (model fit).

## Details

The spatial linear model for point-referenced data (i.e., geostatistical model) can be written as \(y = X \beta + \tau + \epsilon\), where \(X\) is the fixed effects design matrix, \(\beta\) are the fixed effects, \(\tau\) is random error that is spatially dependent, and \(\epsilon\) is random error that is spatially independent. Together, \(\tau\) and \(\epsilon\) are modeled using a spatial covariance function, expressed as \(de * R + ie * I\), where \(de\) is the dependent error variance, \(R\) is a correlation matrix that controls the spatial dependence structure among observations, \(ie\) is the independent error variance, and \(I\) is an identity matrix.

`spcov_type`

Details: Parametric forms for \(R\) are given below, where \(\eta = h / range\)
for \(h\) distance between observations:

exponential: \(exp(- \eta )\)

spherical: \((1 - 1.5\eta + 0.5\eta^3) * I(h <= range)\)

gaussian: \(exp(- \eta^2 )\)

triangular: \((1 - \eta) * I(h <= range)\)

circular: \((1 - (2 / \pi) * (m * sqrt(1 - m^2) + sin^{-1}(m))) * I(h <= range), m = min(\eta, 1)\)

cubic: \((1 - 7\eta^2 + 8.75\eta^3 - 3.5\eta^5 + 0.75\eta^7) * I(h <= range)\)

pentaspherical: \((1 - 1.875\eta + 1.25\eta^3 - 0.375\eta^5) * I(h <= range)\)

cosine: \(cos(\eta)\)

wave: \(sin(\eta) / \eta * I(h > 0) + I(h = 0)\)

jbessel: \(Bj(h * range)\), Bj is Bessel-J function

gravity: \((1 + \eta^2)^{-0.5}\)

rquad: \((1 + \eta^2)^{-1}\)

magnetic: \((1 + \eta^2)^{-1.5}\)

matern: \(2^{1 - extra}/ \Gamma(extra) * \alpha^{extra} * Bk(\alpha, extra)\), \(\alpha = (2extra * \eta)^{0.5}\), Bk is Bessel-K function with order \(1/5 \le extra \le 5\)

cauchy: \((1 + \eta^2)^{-extra}\), \(extra > 0\)

pexponential: \(exp(h^{extra}/range)\), \(0 < extra \le 2\)

none: \(0\)

All spatial covariance functions are valid in one spatial dimension. All
spatial covariance functions except `triangular`

and `cosine`

are
valid in two dimensions.

`estmethod`

Details: The various estimation methods are

`reml`

: Maximize the restricted log-likelihood.`ml`

: Maximize the log-likelihood.`sv-wls`

: Minimize the semivariogram weighted least squares loss.`sv-cl`

: Minimize the semivariogram composite likelihood loss.

`anisotropy`

Details: By default, all spatial covariance parameters except `rotate`

and `scale`

as well as all random effect variance parameters
are assumed unknown, requiring estimation. If either `rotate`

or `scale`

are given initial values other than 0 and 1 (respectively) or are assumed unknown
in `spcov_initial()`

, `anisotropy`

is implicitly set to `TRUE`

.
(Geometric) Anisotropy is modeled by transforming a covariance function that
decays differently in different directions to one that decays equally in all
directions via rotation and scaling of the original coordinates. The rotation is
controlled by the `rotate`

parameter in \([0, \pi]\) radians. The scaling
is controlled by the `scale`

parameter in \([0, 1]\). The anisotropy
correction involves first a rotation of the coordinates clockwise by `rotate`

and then a
scaling of the coordinates' minor axis by the reciprocal of `scale`

. The spatial
covariance is then computed using these transformed coordinates.

`random`

Details: If random effects are used (the estimation method must be `"reml"`

or
`"ml"`

), the model
can be written as \(y = X \beta + Z1u1 + ... Zjuj + \tau + \epsilon\),
where each Z is a random effects design matrix and each u is a random effect.

`partition_factor`

Details: The partition factor can be represented in matrix form as \(P\), where
elements of \(P\) equal one for observations in the same level of the partition
factor and zero otherwise. The covariance matrix involving only the
spatial and random effects components is then multiplied element-wise
(Hadmard product) by \(P\), yielding the final covariance matrix.

`local`

Details: The big data approximation works by sorting observations into different levels
of an index variable. Observations in different levels of the index variable
are assumed to be uncorrelated for the purposes of model fitting. Sparse matrix methods are then implemented
for significant computational gains. Parallelization generally further speeds up
computations when data sizes are larger than a few thousand. Both the `"random"`

and `"kmeans"`

values of `method`

in `local`

have random components. That means you may get slightly different
results when using the big data approximation and rerunning `splm()`

with the same code. For consistent results,
either set a seed via `base::set.seed()`

or specify `index`

to `local`

.

Observations with `NA`

response values are removed for model
fitting, but their values can be predicted afterwards by running
`predict(object)`

.

## Note

This function does not perform any internal scaling. If optimization is not stable due to large extremely large variances, scale relevant variables so they have variance 1 before optimization.

## Examples

```
spmod <- splm(z ~ water + tarp,
data = caribou,
spcov_type = "exponential", xcoord = x, ycoord = y
)
summary(spmod)
#>
#> Call:
#> splm(formula = z ~ water + tarp, data = caribou, spcov_type = "exponential",
#> xcoord = x, ycoord = y)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.41281 -0.20763 -0.11205 0.02956 0.45429
#>
#> Coefficients (fixed):
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 2.04981 0.31093 6.592 4.33e-11 ***
#> waterY -0.08310 0.06449 -1.289 0.197563
#> tarpnone 0.08005 0.07759 1.032 0.302166
#> tarpshade 0.28654 0.07667 3.737 0.000186 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Pseudo R-squared: 0.3963
#>
#> Coefficients (exponential spatial covariance):
#> de ie range
#> 0.1109 0.0226 19.1168
```