vignettes/start-here.Rmd
start-here.Rmd
If this is your first time using the spsurvey package, run
install.packages("spsurvey")
to install the package. You only need to run this code once per version of R.
After the spsurvey package is installed, load it into R each new R session by running
If you used spsurvey in your work, please cite it. You can view the most recent citation by running
citation("spsurvey")
#> To cite spsurvey in publications use:
#>
#> Michael Dumelle, Tom Kincaid, Anthony R. Olsen, Marc Weber (2023).
#> spsurvey: Spatial Sampling Design and Analysis in R. Journal of
#> Statistical Software, 105(3), 1-29. doi:10.18637/jss.v105.i03
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Article{,
#> title = {{spsurvey}: Spatial Sampling Design and Analysis in {R}},
#> author = {Michael Dumelle and Tom Kincaid and Anthony R. Olsen and Marc Weber},
#> journal = {Journal of Statistical Software},
#> year = {2023},
#> volume = {105},
#> number = {3},
#> pages = {1--29},
#> doi = {10.18637/jss.v105.i03},
#> }
spsurvey implements a design-based approach to statistical inference, with a focus on spatial data. There are a few terms helpful to define before we move forward with spsurvey, as these terms will be used throughout the vignettes and documentation:
There are three additional vignettes in spsurvey:
summary()
and plot()
functions to
summarize and visualize sampling frames, design sites, and analysis
datavignette("EDA", "spsurvey")
grts()
function to implement the Generalized
Random Tessellation Stratified (GRTS) algorithm (Stevens and Olsen,
2004) to select spatially balanced samplesvignette("sampling", "spsurvey")
.vignette("analysis", "spsurvey")
.These vignettes cover some of the core functions (and arguments
within those functions) in spsurvey. To learn more about features of
spsurvey that are not covered in these vignettes, we encourage you to
read spsurvey’s documentation available for download here or viewable
interactively on our website here. Help files for a
particular function are viewable by running ?function_name
after loading spsurvey. For example, to learn more about the
grts()
function, run ?grts
.
The version 5.0.0 update to spsurvey implemented many significant changes to existing functions. As a result, some of your old code may not run properly while using version 5.0.0. Though we recommend adapting your code to work with the version 5.0.0, you may also install a previous version of spsurvey. For information regarding the installation of previous version of R packages, please see the RStudio support page here. Additionally, old versions of spsurvey are also available for download in the release tags section of our GitHub repository here.
sf
objects
The sampling functions in spsurvey (grts()
and
irs()
) require that your sampling frame is an
sf
object. An sf
object (shorthand for a
“simple features” object) is an R object with a unique structure used to
conveniently store spatial data. sf
objects are constructed
using the sf package (Pebesma, 2018). The sf package is loaded and
installed alongside the spsurvey package, so you do not need to run
install.packages("sf")
or library(sf)
to
access the sf package if spsurvey is already installed and loaded. For
more on the sf package, see here.
Next we discuss a few ways to construct sf
objects in R.
The first is to read a shapefile directly into R using
sf::read_sf()
. The second is to use the
sf::st_sf()
function or the sf::st_as_sf()
function to combine an appropriate R object (most commonly a data frame)
and an appropriate geometry object into an sf
object. To
illustrate one approach for turning a data frame into an sf
object, we start with NE_Lakes_df
, a data frame in spsurvey
that contains variables and geographic coordinates (latitude and
longitude coordinates) for lakes in the Northeastern United States. To
turn NE_Lakes_df
into NE_Lakes_geo
, an
sf
object with geographic coordinates, run
NE_Lakes_geo <- st_as_sf(NE_Lakes_df, coords = c("XCOORD", "YCOORD"), crs = 4326)
NE_Lakes_geo
#> Simple feature collection with 195 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -73.64778 ymin: 41.07065 xmax: -69.96715 ymax: 42.73616
#> Geodetic CRS: WGS 84
#> First 10 features:
#> AREA AREA_CAT ELEV ELEV_CAT geometry
#> 1 10.648825 large 264.69 high POINT (-72.08896 42.55508)
#> 2 2.504606 small 557.63 high POINT (-73.18199 42.36727)
#> 3 3.979199 small 28.79 low POINT (-71.14074 42.15596)
#> 4 1.645657 small 212.60 high POINT (-73.06726 41.783)
#> 5 7.489052 small 239.67 high POINT (-72.2602 42.36255)
#> 6 86.533725 large 195.37 high POINT (-71.74634 41.87624)
#> 7 1.926996 small 158.96 high POINT (-73.48408 41.34238)
#> 8 6.514217 small 29.26 low POINT (-73.25487 41.20551)
#> 9 3.100221 small 204.62 high POINT (-72.20897 42.12512)
#> 10 1.868094 small 78.77 low POINT (-72.70233 42.18012)
The coords
argument to sf::st_as_sf
specifies the columns in NE_Lakes_df
that are the
x-coordinates and y-coordinates. The crs
argument specifies
the coordinate reference system, which we discuss in more detail
next.
Spatial data and sf
objects rely on coordinate reference
systems. A coordinate reference system (CRS) provides a structure by
which to identify unique locations on the Earth’s surface. CRSs are
either geographic or projected. A geographic CRS uses longitude
(east-west direction) and latitude (north-south direction) coordinates
to represent location with respect to a specific ellipsoid or spheroid
surface. Geographic CRSs are measured in degrees, not units like
meters or feet – this has important consequences. For example, a
one degree difference in latitude is different at different longitudes.
Projected CRSs are measured in standard Cartesian coordinates with
respect to a flat surface. They have x and y locations, an origin, and a
unit of measurement (like meters or feet).
You can move between coordinate systems using
sf::st_transform()
. For example, we can transform
NE_Lakes_geo
(which uses a geographic CRS) to
NE_Lakes
(which uses a projected CRS) by running
NE_Lakes <- st_transform(NE_Lakes_geo, crs = 5070)
NE_Lakes
#> Simple feature collection with 195 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 1834001 ymin: 2225021 xmax: 2127632 ymax: 2449985
#> Projected CRS: NAD83 / Conus Albers
#> First 10 features:
#> AREA AREA_CAT ELEV ELEV_CAT geometry
#> 1 10.648825 large 264.69 high POINT (1930929 2417191)
#> 2 2.504606 small 557.63 high POINT (1849399 2375085)
#> 3 3.979199 small 28.79 low POINT (2017323 2393723)
#> 4 1.645657 small 212.60 high POINT (1874135 2313865)
#> 5 7.489052 small 239.67 high POINT (1922712 2392868)
#> 6 86.533725 large 195.37 high POINT (1977163 2350744)
#> 7 1.926996 small 158.96 high POINT (1852292 2257784)
#> 8 6.514217 small 29.26 low POINT (1874421 2247388)
#> 9 3.100221 small 204.62 high POINT (1933352 2368181)
#> 10 1.868094 small 78.77 low POINT (1892582 2364213)
CRSs in R have traditionally been stored using EPSG codes or
proj4string
values. This meant that in order to transform
your coordinates from one CRS to another, you needed two EPSG codes or
proj4string
values, one for each CRS. Recent updates to R’s
handling of spatial data follow GDAL and PROJ (more information
available here), and CRSs
in sf
objects are stored in R as lists with two components:
input
, which contains information regarding the EPSG code
and proj4string
; and wkt
, an open geospatial
standard format. For more information on CRSs and EPSG codes, see
Pebesma (2018) and Lovelace et al. (2019). To search for various CRSs
and EPSG codes, see here and here.
spsurvey will use the CRS from your sf
object, so it is
your responsibility to make sure the sf
object has an
appropriate CRS. If the CRS is not specified correctly, you may get
misleading results.
Lovelace, R., Nowosad, J., & Muenchow, J. (2019). Geocomputation with R. CRC Press.
Pebesma, E., (2018). Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal, 10 (1):439-446. https://doi.org/10.32614/RJ-2018-009
Stevens Jr, D. L. and Olsen, A. R. (2004). Spatially balanced sampling of natural resources. Journal of the American Statistical Association, 99(465):262-278.