
Fit random forest spatial residual models for point-referenced data (i.e., geostatistical models), using a random forest to fit the mean and a spatial linear model to fit the residuals. The spatial linear model fit to the residuals can incorporate a variety of estimation methods, allowing for random effects, anisotropy, partition factors, and big data methods.

Usage

splmRF(formula, data, ...)

Arguments

formula

A two-sided linear formula describing the fixed effect structure of the model, with the response to the left of the ~ operator and the terms on the right, separated by + operators.

data

A data frame or sf object that contains the variables in fixed, random, and partition_factor as well as geographical information. If an sf object is provided with POINT geometries, the x-coordinates and y-coordinates are used directly. If an sf object is provided with POLYGON geometries, the x-coordinates and y-coordinates are taken as the centroids of each polygon.

...

Additional named arguments to ranger::ranger() or splm().
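As a minimal sketch of how extra arguments might be supplied, the call below passes num.trees (an argument of ranger::ranger()) and estmethod (an argument of splm()) through the dots. The particular values are illustrative assumptions, not recommendations.

```r
library(spmodel)

# noise covariate, as in the Examples section
sulfate$var <- rnorm(NROW(sulfate))

# num.trees is forwarded to ranger::ranger(); estmethod is forwarded to splm()
# (both values are chosen only for illustration)
mod_args <- splmRF(
  sulfate ~ var,
  data = sulfate,
  spcov_type = "exponential",
  num.trees = 200,
  estmethod = "ml"
)

# the fitted random forest records the number of trees it grew
mod_args$ranger$num.trees
```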

Value

A list with several elements to be used with predict(). These elements include the function call (named call), the random forest object fit to the mean (named ranger), the spatial linear model object fit to the residuals (named splm or splm_list), and an object that can contain data for locations at which to predict (named newdata). The newdata object contains the set of observations in data whose response variable is NA. If spcov_type or spcov_initial (which are passed to splm()) are length one, the list has class splmRF and the spatial linear model object fit to the residuals is called splm, which has class splm. If spcov_type or spcov_initial are length greater than one, the list has class splmRF_list and the spatial linear model object fit to the residuals is called splm_list, which has class splm_list and contains several objects, each with class splm.

An splmRF object to be used with predict(). There are three elements: ranger, the output from fitting the mean model with ranger::ranger(); splm, the output from fitting the spatial linear model to the ranger residuals; and newdata, the newdata object, if relevant.
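The structure described above can be inspected directly. This is a small sketch using the sulfate data from the Examples section; the choice of "spherical" as a second covariance type is an assumption made only to show the list-valued case.

```r
library(spmodel)

# noise covariate, as in the Examples section
sulfate$var <- rnorm(NROW(sulfate))

# one covariance type: a single residual model stored in $splm
mod <- splmRF(sulfate ~ var, data = sulfate, spcov_type = "exponential")
class(mod)
names(mod)
class(mod$ranger)
class(mod$splm)

# several covariance types: residual models stored in $splm_list
mods <- splmRF(
  sulfate ~ var,
  data = sulfate,
  spcov_type = c("exponential", "spherical")
)
class(mods)
length(mods$splm_list)
```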

Details

The random forest residual spatial linear model is described by Fox et al. (2020). A random forest model is fit to the mean portion of the model specified by formula using ranger::ranger(). Residuals are computed and used as the response variable in an intercept-only spatial linear model fit using splm(). This model object is intended for use with predict() to perform prediction, also called random forest regression Kriging.
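The two-stage idea described above can be sketched by hand. This is an illustration under stated assumptions, not the package's internal code: the residuals here use in-sample random forest predictions, whereas splmRF() may compute them differently (for example, from out-of-bag predictions). The names rf_dat, rf, rf_resid, resid_mod, preds, and rf_krige are hypothetical.

```r
library(spmodel)
library(ranger)

# copies of the packaged data with a noise covariate, as in the Examples
dat <- sulfate
dat$var <- rnorm(NROW(dat))
preds <- sulfate_preds
preds$var <- rnorm(NROW(preds))

# Step 1: random forest for the mean (geometry dropped before calling ranger)
rf_dat <- sf::st_drop_geometry(dat)
rf <- ranger(sulfate ~ var, data = rf_dat)

# Step 2: residuals = observed response minus random forest prediction
# (in-sample predictions are an assumption of this sketch)
dat$rf_resid <- rf_dat$sulfate - predict(rf, data = rf_dat)$predictions

# Step 3: intercept-only spatial linear model for the residuals
resid_mod <- splm(rf_resid ~ 1, data = dat, spcov_type = "exponential")

# Step 4: prediction = random forest mean + Kriged residual
rf_mean <- predict(rf, data = sf::st_drop_geometry(preds))$predictions
rf_krige <- rf_mean + predict(resid_mod, newdata = preds)
```

In practice, splmRF() followed by predict() wraps these steps, so the manual version is mainly useful for seeing what each stage contributes.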

Note

This function does not perform any internal scaling. If optimization is not stable due to extremely large variances, scale relevant variables so they have variance 1 before optimization.
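One way to apply this advice is to divide the response by its standard deviation before fitting and multiply predictions by the same constant afterwards. The sketch below assumes the response is the variable that needs rescaling; resp_sd, sulfate_sc, mod_sc, and preds_orig are hypothetical names.

```r
library(spmodel)

# rescale the response so it has variance 1
resp_sd <- sd(sulfate$sulfate)
sulfate$sulfate_sc <- sulfate$sulfate / resp_sd

# noise covariates, as in the Examples section
sulfate$var <- rnorm(NROW(sulfate))
sulfate_preds$var <- rnorm(NROW(sulfate_preds))

# fit on the scaled response
mod_sc <- splmRF(sulfate_sc ~ var, data = sulfate, spcov_type = "exponential")

# predictions are on the scaled response; multiply by resp_sd to undo the scaling
preds_orig <- predict(mod_sc, sulfate_preds) * resp_sd
```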

References

Fox, E. W., Ver Hoef, J. M., & Olsen, A. R. (2020). Comparing spatial regression to random forests for large environmental data sets. PLoS ONE, 15(3), e0229509.

Examples

# \donttest{
sulfate$var <- rnorm(NROW(sulfate)) # add noise variable
sulfate_preds$var <- rnorm(NROW(sulfate_preds)) # add noise variable
sprfmod <- splmRF(sulfate ~ var, data = sulfate, spcov_type = "exponential")
predict(sprfmod, sulfate_preds)
#>            1            2            3            4            5            6 
#>   7.20709420  29.97610382  12.63403994  22.29915168  14.85211628  30.11851902 
#>            7            8            9           10           11           12 
#>   6.75121343  16.27706150  -2.18942325  12.22709064  10.04521554  11.11677116 
#>           13           14           15           16           17           18 
#>  -0.99116549  15.25667845  13.71515905  12.98879396  20.62285809  -3.33794420 
#>           19           20           21           22           23           24 
#>   0.05116391  25.27716676  19.48845223  -4.91539820   4.53255660  -5.64102060 
#>           25           26           27           28           29           30 
#>   4.55116010 -10.38802285  12.77869222  11.47271020  18.85634794   7.92927322 
#>           31           32           33           34           35           36 
#>   5.54454469   9.26063641  16.22038068  -2.31492552   9.40237754   9.57107856 
#>           37           38           39           40           41           42 
#>  17.55486020   9.09413211  13.78084068   5.85094688   8.55376772  29.94295184 
#>           43           44           45           46           47           48 
#>  15.36372494  22.76814241  -0.10511850  11.14670761  13.19146006  13.98674196 
#>           49           50           51           52           53           54 
#>  16.88227816   1.66852596  17.62750274  12.15246529  -6.31643437  10.45968752 
#>           55           56           57           58           59           60 
#>   2.46017146  14.61181297   8.13925329  -5.13374986  16.35470813  21.30610987 
#>           61           62           63           64           65           66 
#>  10.01414219  32.67164884  -1.70920535  24.28674843  20.14332755  26.90220679 
#>           67           68           69           70           71           72 
#>   5.85234711   7.15476641  18.90941330   9.67478205   1.25508935  12.54552746 
#>           73           74           75           76           77           78 
#>   4.76182617  -0.03741894  -3.12941969  -7.49434302  24.36690723  -1.96925278 
#>           79           80           81           82           83           84 
#>  13.53995822  24.92688168  11.07479774  10.43132321   8.29187049  -4.71025329 
#>           85           86           87           88           89           90 
#>   6.77832958   6.58587792  10.32600658  -4.59673144  -2.05363435  14.58627575 
#>           91           92           93           94           95           96 
#>   1.82537275   6.27116360   4.86695311  20.33752276   7.20672468  22.62624400 
#>           97           98           99          100 
#>  21.20959799  28.53459450  22.51962258  11.96041110 
# }