3  Geoprocessing

We’ll explore a number operations in this section that make up what most of us probably think of as typical GIS operations and how to perform them in R - operations such as clipping, subsetting, joining, dissolving, performing map algebra.

Keep in mind as we mentioned earlier that sf contains functions that bind to these three primary underlying libraries:

3.1 Goals and Outcomes

  • Learn about fundamental spatial operations in R using spatial predicates in sf for:
    • subsetting
    • spatial join
    • buffer
    • logical set operations - union / intersection
    • clipping
    • generating centroids and ‘casting’ to other geometry types
    • raster operations
      • map algebra
      • cropping and masking
      • zonal operations
  • Explore performing ‘map algebra’ type operations with raster data in R
  • Learn how to do extract and zonal operations in R

Throughout this section we’ll use the following packages:

3.2 Operations on Geometries

This breakdown of simple features follows for the most part this section in Spatial Data Science

Simple and valid geometries

  • Certain conditions have to be met with simple features:
    • For linestrings to be considered simple they must not self-intersect:
(ls <- st_linestring(rbind(c(0,0), c(1,1), c(2,2), c(0,2), c(1,1), c(2,0))))
LINESTRING (0 0, 1 1, 2 2, 0 2, 1 1, 2 0)

is_simple 
    FALSE 
  • For polygons several other conditions have to be met to be simple:

    • polygon rings are closed (the lastpoint equals the first)
    • polygon holes (inner rings) are inside their exterior ring
    • polygon inner rings maximally touch the exterior ring in single points, not over a line
    • a polygon ring does not repeat its own path
    • in a multi-polygon, an external ring maximally touches another exterior ring in single points, not over a line

We can break down operations on geometries for vector features in the following way:

  • predicates: a logical asserting a certain property is TRUE
  • measures: a quantity (a numeric value, possibly with measurement unit)
  • transformations: newly generated geometries

We can look at these operations by what they operate on, whether the are single geometries, pairs, or sets of geometries:

  • unary when it’s a single geometry
  • binary when it’s pairs of geometries
  • n-ary when it’s sets of geometries

Unary predicates work to describe a property of a geometry.

A list of unary predicates:

predicate meaning
is Tests if geometry belongs to a particular class
is_simple Tests whether geometry is simple
is_valid Test whether geometry is valid
is_empty Tests if geometry is empty

A list of binary predicates is:

predicate meaning inverse of
contains None of the points of A are outside B within
contains_properly A contains B and B has no points in common with the boundary of A
covers No points of B lie in the exterior of A covered_by
covered_by Inverse of covers
crosses A and B have some but not all interior points in common
disjoint A and B have no points in common intersects
equals A and B are topologically equal: node order or number of nodes may differ; identical to A contains B and A within B
equals_exact A and B are geometrically equal, and have identical node order
intersects A and B are not disjoint disjoint
is_within_distance A is closer to B than a given distance
within None of the points of B are outside A contains
touches A and B have at least one boundary point in common, but no interior points
overlaps A and B have some points in common; the dimension of these is identical to that of A and B
relate Given a mask pattern, return whether A and B adhere to this pattern

See the Geometries chapter of Spatial Data Science for a full treatment that also covers unary and binary measures as well as unary, binary and n-ary transformers

For those who are interested in a deeper understanding of spatial predicates and operating on geometric relationships in R or elsewhere it’s worth taking some time exploring the Dimensionally Extended Nine-Intersection Model (DE-9IM) - a topological model (and a standard) that describes the relationship between any two geometries in two-dimensional space.

Spatial features have one of the four dimension values:

  • 0 for points
  • 1 for lines or linear features
  • 2 for polygons or polygon features
  • F or false for empty geometries

The DE-9IM matrix provides a way to classify geometry relations using the set {0,1,2,F} or {T,F}.

The DE-9IM matrix is based on a 3x3 intersection matrix testing the following relations:

  • II, IE, IB
  • BI, BE, BB
  • EI, EE, EB

With a {T,F} matrix using the I,B, E space, there are 512 possible relations that can be grouped into binary classification schemes.

About 10 of these spatial predicates have common names such as intersects, touches, within, contains. These are the binary predicates we’ve listed in previous table.

See the Binary predicates and DE-9IM section in Spatial Data Science as well as succinct overview in slides for a previous workshop by Mike Johnson

3.3 Measures and Units

Measures (with sf) make use of the underlying GEOS library, as well as the R units library that provides measure units for R vectors. Once a coordinate reference system has been defined for features, we often want to ask questions of our data such as:

  • How long is a line or a polygon perimeter (unit)
  • What is the area of a polygon (unit^2)
  • How far apart / close together are objects from each other (unit)

Some examples using county and gage datasets we’ve seen in previous section (with refresher again on pulling in data from a .csv file with x and y information and making it spatial):

gages <- read_csv(system.file("extdata", "Gages_flowdata.csv", package = "awra2020spatial")) |> 
  dplyr::select(SOURCE_FEA, STATE, LAT_SITE, LON_SITE) |> 
  st_as_sf(coords = c("LON_SITE", "LAT_SITE"), crs = 4269)
st_distance(gages[1,], gages[2,])
Units: [m]
         [,1]
[1,] 58822.95
counties <- counties("Oregon", cb = TRUE)
options(scipen=3)
print(paste0('The total area of all counties in Oregon is: ',sum(st_area(counties))))
Exercise

How did st_distance know to return the distance between our 2 gages in meters? What if we want that distance in feet? Or kilometers? Or our total area in Oregon in km2? There are a couple approaches:

  • you could set a projection that uses the units you want reported
  • you could simply look up the conversion factor and apply it manually
  • you can make use of the units library to explicitly set your units for the data
  • Setting the projection for the desired units
gages_ft <- st_transform(gages, 2994)
st_distance(gages_ft[1,], gages_ft[2,])
Units: [foot]
         [,1]
[1,] 193550.5

3.4 Spatial Subsetting

Spatial subsetting is analogous to attribute subsetting - with sf objects, we can use square bracket ([]) notation to take a spatial object and return a new object that contains only the features that relate in space to another spatial object (i.e. are within, intersect, are within distance of, are spatially disjoint, etc.).

We use the simple syntax of x[y,] to perform a spatial subset with the default operation of ‘intersects’: x[y,] is identical to x[y, , op=st_intersects]. We could also provide a different spatial predicate such as x[y, , op = st_disjoint].

Let’s run through a couple examples.

First we can demonstrate some very simple spatial subsetting examples using a particular county (polygon) in Oregon and Oregon cities (points).

or_cities <- read_sf(system.file("extdata/cities.shp", package = "Rspatialworkshop"))
mult_cnty<- aoi_get(state = "OR", county= "Multnomah") |> 
  st_transform(st_crs(or_cities)) # project counties to cities
mapview(mult_cnty, alpha.regions=.07, color='black', lwd=2) + mapview(or_cities)

Spatially subset cities within Multnomah County:

mult_cities <- or_cities[mult_cnty,]
# or
mult_cities <- or_cities[mult_cnty, , op=st_intersects]
mapview(mult_cnty, alpha.regions=.07, color='black', lwd=2) + mapview(mult_cities)
Exercise

Given our explanation above of using different spatial predicates in subsetting operations (e.g. st_disjoint, st_contains, st_is_within_distance) try:

  • selecting and mapping cities outside of Multnomah county
  • selecting and mapping cities within 50 kilometers of Multnomah county (or whatever distance you like)
  • using an attribute selection to select a city (or several cities) by name somewhere else in the state. Using these cities, how would we select and map just the county the contains this city / cities? Hint: we’ll need to pull in all counties of Oregon by modifying AOI_get()

Outside Multnomah county:

out_mult_cities <- or_cities[mult_cnty, , op=st_disjoint]
mapview(mult_cnty, alpha.regions=.07, color='black', lwd=2) + mapview(out_mult_cities)

Cities within 50km of Multnomah county:

close_mult_cities <- or_cities[mult_cnty, , op=st_is_within_distance,dist=50000]
mapview(mult_cnty, alpha.regions=.07, color='black', lwd=2) + mapview(close_mult_cities)

County that contains selected cities:

counties <- aoi_get(state='Oregon', county='all') |> 
  st_transform(st_crs(or_cities))
my_cities <- or_cities |> 
  dplyr::filter(CITY %in% c('ASHLAND','BURNS','HOOD RIVER'))
sel_counties <- counties[my_cities, , op=st_contains]
mapview(sel_counties, alpha.regions=.07, color='black', lwd=2) + mapview(my_cities)

3.5 Spatial Join

Often we want to join information from one spatial dataset to another based on a spatial relationship rather than attribute relationships - joining attribute data should be a familiar concept to everyone and many of the principles are the same. st_join will add new columns from from a source spatial dataset to the target spatial dataset.

By default st_join performs a left join (all rows in the target including rows with no match in the source data) - but you can also do an inner join by setting left=FALSE.

We can demonstrate the most basic spatial join using our Oregon counties and cities data - here we simply get the county name for every city based on what county each city lands in.

cty_cnty <- st_join(or_cities, counties['name'])
dplyr::glimpse(cty_cnty)
Rows: 898
Columns: 8
$ AREA      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ PERIMETER <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ CITIES_   <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ CITIES_ID <dbl> 1658, 1368, 1366, 1382, 1384, 1380, 1376, 1370, 1378, 1386, …
$ CITY      <chr> "MULINO", "HAMMOND", "FORT STEVENS", "GLIFTON", "BRADWOOD", …
$ FLAG      <int> 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, …
$ geometry  <POINT [foot]> POINT (776899.8 1272019), POINT (439320.8 1638725),…
$ name      <chr> "Clackamas", "Clatsop", "Clatsop", "Clatsop", "Clatsop", "Cl…

This is equivalent to:

cty_cnty <- st_join(or_cities, counties['name'], .predicate=st_intersects)
dplyr::glimpse(cty_cnty)
Rows: 898
Columns: 8
$ AREA      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ PERIMETER <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ CITIES_   <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ CITIES_ID <dbl> 1658, 1368, 1366, 1382, 1384, 1380, 1376, 1370, 1378, 1386, …
$ CITY      <chr> "MULINO", "HAMMOND", "FORT STEVENS", "GLIFTON", "BRADWOOD", …
$ FLAG      <int> 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, …
$ geometry  <POINT [foot]> POINT (776899.8 1272019), POINT (439320.8 1638725),…
$ name      <chr> "Clackamas", "Clatsop", "Clatsop", "Clatsop", "Clatsop", "Cl…
Exercise

Often we want to find what features are within a certain specified distance of other features, or what features are closest to a set of features. st_join can answer these questions by supplying the st_is_within_distance

  • Setting the projection for the desired units
gages_ft <- st_transform(gages, 2994)
st_distance(gages_ft[1,], gages_ft[2,])
Units: [foot]
         [,1]
[1,] 193550.5

3.6 Buffer and Dissolve

Typical GIS operations creating buffers around features (points, lines or polygons) and dissolving polygon boundaries based on a common attribute - here we show how to do these common GIS tasks in an R workflow.

We’ll demonstrate dissolving using both tidyverse functions in conjunction with simple features as well as using the st_union() function in sf.

Let’s use our Oregon cities and counties data again for this.

First we’ll demonstrate buffering which is a simpler operation using st_buffer with a supplied distance.

We pick our largest cities and buffer them by some arbitrary distance - we’ll say 20 miles (note below we convert feet to miles).

# Filter for metro areas
metros <- or_cities |> 
  dplyr::filter(CITY %in% c('PORTLAND','SALEM','EUGENE','CORVALLIS','BEND','MEDFORD'))
# Buffer larger cities
metro_areas <- metros |> 
  st_buffer(105600)
mapview(metro_areas)

Next we show both methods to dissolve, categorizing Oregon counties as urban or rural and dissolving on these categories.

# urban area
counties <- st_transform(counties, st_crs(metro_areas)) # crs same

urban_counties <- counties[metro_areas, ,op=st_intersects] # subsetting

urban <- urban_counties |>  # Dissolve
  st_union() |> # unite geometries
  st_sf() |> # promote geometry back to data frame
  dplyr::mutate(urban=TRUE) # assign urban

# rural area
# here we use tidyverse method to dissolve
rural <- counties |> 
  dplyr::filter(!name %in% urban_counties$name) |> 
  dplyr::group_by(state_abbr) |> # just group by state - all the same
  dplyr::mutate(AREA= geometry |> st_area()) |> 
  dplyr::summarise(AREA = sum(AREA)) # this is the cool part!

mapview(urban, col.region='red') + mapview(rural)
Exercise

We made use of the summarise method above with our sf features to perform a dissolve - by - summarise has a special argument - do_union which is set to TRUE by default and unions grouped geometries.

Review and experiment with different tidyverse methods that honor sf objects using the counties and cities data. These methods include:

  • filter
  • select
  • group_by
  • ungroup
  • mutate
  • transmute
  • rowwise
  • rename
  • slice
  • summarise
  • distinct
  • gather
  • pivot_longer
  • spread
  • nest
  • unnest
  • unite
  • separate
  • separate_rows
  • sample_n
  • sample_frac

An exhaustive list - when used with sf objects they simply preserve geometry - remember geometry is sticky and sf is tidyverse-compliant. You can use st_drop_geometry if you want to return tidyverse operations on sf features as tibbles or data.frames (or corece features to data.frames or tibbles).

This section in Spatial Data Science provides a nice background, as well as this vignette from the Geocomputation with R book on pitfalls to be aware of with spatial data in conjunction with tidyverse.

3.7 Clipping

Clipping is another extremely common GIS operation, and it’s simply a form of spatial subsetting that makes changes to the geometry list columns of affected features - it only applies to more complex geometries like lines, polygons and multi-lines and multi-polygons.

Let’s use our city buffer for Corvallis and counties to demonstrate clipping. We’ll ask for just the portions of counties that intersect our ‘metro’ buffer around Corvallis.

Let’s see what it looks like first:

plot(metro_areas[2,c('geometry')],col = "lightgrey", axes=TRUE) 
plot(counties[metro_areas[2,],c('geometry')],border = "grey", add = TRUE)

Clip and show results:

metro_county_area <- st_intersection(metro_areas[2,], counties)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
plot(st_geometry(metro_county_area), border = "grey", axes = TRUE) # intersecting area

Note

You may encounter errors like this when running geoprocessing operations like st_join in R:

Error in wk_handle.wk_wkb(wkb, s2_geography_writer(oriented
= oriented,  :  Loop 0 is not valid: Edge 772 crosses edge 774

Running st_make_valid might not fix.

You may need to turn off spherical geometry - sf_use_s2(TRUE), run st_make_valid, and then turn spherical geometry back on - sf_use_s2(FALSE) See background on S2 here and discussion of S2 related issues here

3.8 Centroids and Type Transformations

We can extract the centroid of features a couple different ways - typically we are looking for the geographic centroid which is the center of mass of a feature and can fall outside the boundaries of a feature for complex features. If we want our centroids to always be inside our features we need to ask for a point on surface. Lastly, casting in sf allows us to perform type transformations from one geometry type to another.

We’ll use our county data and bring in some river flow line data using the dataRetrieval package as well to demonstrate with linear features.

SouthSantiam <- dataRetrieval::findNLDI(nwis = "14187200", nav = "UT", find = "flowlines", distance_km=10)
  
SouthSantiam <- SouthSantiam$UT_flowlines
SouthSantiam <- SouthSantiam |> 
  st_transform(st_crs(counties))

riv_cent1 <- st_centroid(SouthSantiam)
Warning: st_centroid assumes attributes are constant over geometries
riv_cent2 <- st_point_on_surface(SouthSantiam)
Warning: st_point_on_surface assumes attributes are constant over geometries
sel_counties <- counties |> 
  dplyr::filter(name %in% c('Marion','Yamhill','Polk','Benton','Multnomah','Clackamas','Linn','Washington'))

cnt_cent1 <- st_centroid(sel_counties)
Warning: st_centroid assumes attributes are constant over geometries
cnt_cent2 <- st_point_on_surface(sel_counties)
Warning: st_point_on_surface assumes attributes are constant over geometries
p1 = tm_shape(sel_counties) + tm_borders() +
  tm_shape(cnt_cent1) + tm_symbols(shape = 1, col = "black", size = 0.5) +
  tm_shape(cnt_cent2) + tm_symbols(shape = 1, col = "red", size = 0.5)  
p2 = tm_shape(SouthSantiam) + tm_lines() +
  tm_shape(riv_cent1) + tm_symbols(shape = 1, col = "black", size = 0.5) +
  tm_shape(riv_cent2) + tm_symbols(shape = 1, col = "red", size = 0.5)  
tmap_arrange(p1, p2, ncol = 2)

3.9 Simplification

With simplification we can generalize vector line and polygon objects - this is often useful to improve cartographic display as well as to reduce memory use and disk space used by complex geometric features. st_simplify in sf provides basic simplification implementing the Douglas-Peucker algorithm via GEOS to prune vertices. See rmapshaper and mapshaper for more extensive simplification algorithms.

Sant_simp = st_simplify(SouthSantiam, dTolerance = 500)  # 500 m
p1 = tm_shape(SouthSantiam) + tm_lines() 
p2 = tm_shape(Sant_simp) + tm_lines() 
tmap_arrange(p1, p2, ncol = 2)

3.10 Raster operations

3.10.1 Cropping

We’ll use some built in data from a previous workshop to demonstrate a number of simple raster geoprocessing operations.

We’ll start with an elevation raster and Crater Lake boundary and project both to an local conformal projection (UTM Z10 North)

data(CraterLake)
raster_filepath <- system.file("extdata", "elevation.tif", package = "Rspatialworkshop")
elevation <- rast(raster_filepath)
elevation <- project(elevation, "EPSG:26910", method = "bilinear")
CraterLake <- st_transform(CraterLake,crs(elevation))
plot(elevation)
plot(CraterLake, add=TRUE, col=NA, border='blue')

Here we’ll use crop to crop the elevation raster to the bounding box of our Crater Lake polygon feature

elev_crop = crop(elevation, vect(CraterLake))
plot(elev_crop)
plot(CraterLake, add=TRUE, col=NA, border='blue')

And finally we can use mask to mask the raster to just inside the polygon outline of Crater Lake National Park.

Note - if you have a large raster, it makes a HUGE difference to use crop first, then mask - mask is a much more computationally intensive operation so it will pay off to crop first then mask. An interesting twitter thread regarding this just the other day:

elev_mask = mask(elevation, vect(CraterLake))
plot(elev_mask)
plot(CraterLake, add=TRUE, col=NA, border='blue')

3.10.2 Map Algebra

We can divide map algebra into a couple of categories:

  1. Local - per-cell operations
    1. raster calculator
    2. replacing values
    3. reclassifying
    4. calculating indices
  2. Focal (neighborhood operations)
    1. summarizing output cell value as the result of a window (such as a3 x 3 input cell block)
    2. Zonal operations - summarizing raster values for some zones (either another raster or a vector feature)
    3. Global - summarizing values over entire raster(s)

3.10.2.1 Local Operations

Say we want to convert our elevation raster from meters to feet:

elev_feet = elevation * 3.28084
elev_feet
class       : SpatRaster 
dimensions  : 981, 883, 1  (nrow, ncol, nlyr)
resolution  : 78.76638, 78.76638  (x, y)
extent      : 536109.7, 605660.4, 4714704, 4791974  (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / UTM zone 10N (EPSG:26910) 
source(s)   : memory
name        : srtm_12_04 
min value   :   1846.880 
max value   :   8910.399 

How would we verify the units of our projection? Take a minute to explore crs for terra

Our max value is 8890, which makes sense - the high point in Crater Lake National Park is Mount Scott at 8,929’.

What if we want to make elevation bins, or classify some elevations as NA with our elevation raster?

reclass <- matrix(c(0, 500, 1, 500, 1000, 2, 1000, 1500, 3, 1500 , 2000, 4, 2000, 2700, 5), ncol = 3, byrow = TRUE)
reclass
     [,1] [,2] [,3]
[1,]    0  500    1
[2,]  500 1000    2
[3,] 1000 1500    3
[4,] 1500 2000    4
[5,] 2000 2700    5
elev_recl = classify(elevation, rcl = reclass)
plot(elevation)

plot(elev_recl)

elev_new = elevation
elev_new[elev_new > 2000] = NA
plot(elev_new)

3.10.2.2 Focal Operations

A simple focal window operation

elev_focal_mean = focal(elevation, 
                   w = matrix(1, nrow = 25, ncol = 25), 
                   fun = mean)
plot(elev_focal_mean)

3.10.2.3 Global Operations

terra::global(elev_mask, fun="mean", na.rm=TRUE)
               mean
srtm_12_04 1849.557
terra::global(elev_mask, fun="sum", na.rm=TRUE)
                 sum
srtm_12_04 221839618

3.10.3 Zonal Operations

Here we demonstrate using the zonal function in terra to summarize a value raster of elevation, using an srtm.tif from spDataLarge, by the zones of NLCD classes using nlcd.tif raster also in the spDataLarge package. We’ll expand more on zonal statistics including for categorical data in the final section of the workshop.

srtm_path = system.file("raster/srtm.tif", package = "spDataLarge")
srtm_path
[1] "C:/Users/mwebe/AppData/Local/R/win-library/4.3/spDataLarge/raster/srtm.tif"
srtm = rast(srtm_path)
srtm
class       : SpatRaster 
dimensions  : 457, 465, 1  (nrow, ncol, nlyr)
resolution  : 0.0008333333, 0.0008333333  (x, y)
extent      : -113.2396, -112.8521, 37.13208, 37.51292  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : srtm.tif 
name        : srtm 
min value   : 1024 
max value   : 2892 
nlcd = rast(system.file("raster/nlcd2011.tif", package = "spDataLarge"))
srtm_utm = project(srtm, nlcd, method = "bilinear")
srtm_zonal = zonal(srtm_utm, nlcd, na.rm = TRUE, fun = "mean")
srtm_zonal
   nlcd2011     srtm
1        11 2227.060
2        21 1713.980
3        22 1642.077
4        23 1569.632
5        31 1854.069
6        41 2361.121
7        42 1867.068
8        43 2500.253
9        52 1650.966
10       71 1644.359
11       81 1284.106
12       82 1417.671
13       90 1254.168
14       95 1909.590