
Fit random forest residual spatial linear models for areal data (i.e., spatial autoregressive models), using a random forest to fit the mean and a spatial linear model to fit the residuals. The spatial linear model fit to the residuals can use a variety of estimation methods and allows for random effects, partition factors, and row standardization.

Usage

spautorRF(formula, data, ...)

Arguments

formula

A two-sided linear formula describing the fixed effect structure of the model, with the response to the left of the ~ operator and the terms on the right, separated by + operators.

data

A data frame or sf object that contains the variables in formula, random, and partition_factor, as well as geographical information. If an sf object is provided with POINT geometries, the x-coordinates and y-coordinates are used directly. If an sf object is provided with POLYGON geometries, the x-coordinates and y-coordinates are taken as the centroids of each polygon.

...

Additional named arguments to ranger::ranger() or spautor().
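As a hedged illustration of how extra arguments might be forwarded, the sketch below passes num.trees (a ranger::ranger() argument) and estmethod (a spautor() argument) through the dots. It assumes that spautorRF() routes each named argument to the function that accepts it, as the description of ... suggests; the particular values are arbitrary.

```r
library(spmodel)

set.seed(1)
seal$var <- rnorm(NROW(seal)) # noise predictor, as in the Examples

# num.trees is forwarded to ranger::ranger(); estmethod is forwarded to spautor()
sprfmod_args <- spautorRF(
  log_trend ~ var,
  data = seal,
  spcov_type = "car",
  num.trees = 200,
  estmethod = "reml"
)

# number of trees stored on the fitted ranger object
sprfmod_args$ranger$num.trees
```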

Value

A list with several elements to be used with predict(). These elements include the function call (named call), the random forest object fit to the mean (named ranger), the spatial linear model object fit to the residuals (named spautor or spautor_list), and an object that can contain data for locations at which to predict (named newdata). The newdata object contains the set of observations in data whose response variable is NA. If spcov_type or spcov_initial (which are passed to spautor()) are length one, the list has class spautorRF and the spatial linear model object fit to the residuals is named spautor, which has class spautor. If spcov_type or spcov_initial are length greater than one, the list has class spautorRF_list and the spatial linear model object fit to the residuals is named spautor_list, which has class spautor_list and contains several objects, each with class spautor.
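The returned structure can be inspected directly. The following is a small sketch based only on the element and class names listed above; the object names single_fit and multi_fit are chosen here for illustration.

```r
library(spmodel)

set.seed(1)
seal$var <- rnorm(NROW(seal)) # noise predictor, as in the Examples

# one covariance type: a single residual model stored in $spautor
single_fit <- spautorRF(log_trend ~ var, data = seal, spcov_type = "car")
inherits(single_fit, "spautorRF")
inherits(single_fit$spautor, "spautor")
NROW(single_fit$newdata) # rows of seal whose log_trend is NA

# two covariance types: several residual models stored in $spautor_list
multi_fit <- spautorRF(log_trend ~ var, data = seal, spcov_type = c("car", "sar"))
inherits(multi_fit, "spautorRF_list")
length(multi_fit$spautor_list)
```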

Details

The random forest residual spatial linear model is described by Fox et al. (2020). A random forest model is fit to the mean portion of the model specified by formula using ranger::ranger(). Residuals are computed and used as the response variable in an intercept-only spatial linear model fit using spautor(). This model object is intended for use with predict() to perform prediction, also called random forest regression Kriging.
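The two steps described above can be sketched by hand. This is an illustration of the idea under stated assumptions, not the package's internal code: it assumes residuals are formed from ranger's out-of-bag predictions (rf$predictions), and the names rf, rf_resid, resid_mod, and manual_pred are made up for this example. Because random forests are stochastic, the result need not match predict() on a spautorRF fit exactly.

```r
library(spmodel)

set.seed(1)
seal$var <- rnorm(NROW(seal)) # noise predictor, as in the Examples

obs <- !is.na(seal$log_trend)          # rows with an observed response
seal_df <- sf::st_drop_geometry(seal)  # plain data frame for ranger

# Step 1: random forest for the mean, fit to the observed rows only
rf <- ranger::ranger(log_trend ~ var, data = seal_df[obs, ])

# Step 2: residuals (assumption: observed response minus out-of-bag predictions)
seal$rf_resid <- NA_real_
seal$rf_resid[obs] <- seal_df$log_trend[obs] - rf$predictions

# Step 3: intercept-only spatial autoregressive model for the residuals;
# rows with NA residuals are kept as prediction locations
resid_mod <- spautor(rf_resid ~ 1, data = seal, spcov_type = "car")

# Step 4: prediction = random forest prediction + predicted residual
rf_pred <- predict(rf, data = seal_df[!obs, ])$predictions
resid_pred <- predict(resid_mod)
manual_pred <- rf_pred + resid_pred
```

Keeping the unobserved polygons in the data passed to spautor() matters here, since the neighborhood structure of an autoregressive model is defined over all areal units, not only the observed ones.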

References

Fox, E. W., Ver Hoef, J. M., & Olsen, A. R. (2020). Comparing spatial regression to random forests for large environmental data sets. PLoS ONE, 15(3), e0229509.

Examples

# \donttest{
seal$var <- rnorm(NROW(seal)) # add noise variable
sprfmod <- spautorRF(log_trend ~ var, data = seal, spcov_type = "car")
predict(sprfmod)
#>            1            9           13           15           18           19 
#> -0.047560844  0.063245303 -0.035318981  0.007805814 -0.035086249 -0.022768385 
#>           27           32           36           40           42           43 
#> -0.015337128 -0.197452009 -0.012918855 -0.032428345 -0.087973916  0.008595976 
#>           44           46           47           48           49           50 
#> -0.200895509 -0.074365042 -0.012440013 -0.025842967 -0.162211600 -0.101897529 
#>           51           52           53           54           55           56 
#> -0.022099198  0.014858151 -0.036807223  0.023241791 -0.030652813  0.013857662 
#>           57           58           61           62 
#> -0.030180964 -0.022308774 -0.087930238 -0.012911933 
# }