TADA Module 2: Geospatial Functions
TADA Team
2024-06-20
Source:vignettes/TADAModule2.Rmd
Welcome!
Thank you for your interest in Tools for Automated Data Analysis (TADA). TADA is an open-source tool set built in the R programming language. This RMarkdown document walks users through how to download the TADA R package from GitHub, access and parameterize several important functions, and create basic visualizations with a sample data set.
Note: TADA is still under development. New functionality is added weekly, and sometimes we need to make bug fixes in response to tester and user feedback. We appreciate your feedback, patience, and interest in these helpful tools.
If you are interested in contributing to TADA development, more information is available at:
We welcome collaboration with external partners.
Install and load packages
First, install and load the remotes package, specifying the repo. This is needed before installing TADA because TADA is only available on GitHub.
install.packages("remotes",
  repos = "http://cran.us.r-project.org"
)
library(remotes)
Next, install and load TADA using the remotes package. TADA R Package dependencies will also be downloaded automatically from CRAN with the TADA install. You may be prompted in the console to update dependency packages that have more recent versions available. If you see this prompt, it is recommended to update all of them (enter 1 into the console).
remotes::install_github("USEPA/EPATADA",
  ref = "471-tada_convertresultunits-is-overwriting-tadamethodspeciationname-with-na",
  dependencies = TRUE
)
Finally, use the library() function to load the TADA R Package into your R session.
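The load call itself is not shown above. A minimal sketch, assuming the installed package is named EPATADA to match the USEPA/EPATADA repository (the package name is an assumption here, not something this vignette states):

```r
# Assumption: the package installed from USEPA/EPATADA is named "EPATADA".
tada_available <- requireNamespace("EPATADA", quietly = TRUE)

if (tada_available) {
  # Attach the package so TADA_* functions can be called directly
  library("EPATADA", character.only = TRUE)
} else {
  message("EPATADA is not installed; run the install step above first.")
}
```

The guard avoids an error in a fresh R session where the install step has not been run yet.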
Help pages
All TADA R package functions have their own individual help pages,
listed on the Function
reference page on the GitHub site. Users can also access the help
page for a given function in R or RStudio using the following format
(example below): ?[name of TADA function]
# Access help page for TADA_DataRetrieval
?TADA_DataRetrieval
Geospatial Functions in TADA
This vignette presents functions that give users the option to convert TADA Water Quality Portal data into a geospatial sf object and to associate water quality observations with their nearest state-defined water quality assessment units in ATTAINS.
A Note About ATTAINS:
The Assessment, Total Maximum Daily Load (TMDL) Tracking and Implementation System (ATTAINS) is an online platform that organizes and combines each state and participating tribe’s Clean Water Act reporting data into a single data repository. The geospatial component of ATTAINS includes spatial representations of each entity’s assessment units as well as their assigned designated uses, their most recent EPA reporting category (i.e., their impairment status), their impaired designated uses, and the parameter(s) causing the impairment.
Within an assessment unit, the criteria or thresholds used to assess water quality typically remain the same and all water features are assessed as one entity (although there are some exceptions, for example if a single assessment unit crosses multiple ecoregions). Depending on the state or tribe, these assessment units can be a specific point or series of points along a waterbody such as a river or lake, a river reach (line), an entire waterbody such as a river or lake (polygon), or even an entire watershed. In other words, assessment units can take the form of point, line, and area (polygon) features, or some combination of all of them. Moreover, it is possible that some assessment units are not geospatially referenced at all, meaning they are not captured in the ATTAINS geospatial database.
TADA_MakeSpatial()
This function converts any Water Quality Portal (WQP)-style dataframe with latitude/longitude data into a geospatial sf object. To run the function, the user supplies a WQP dataframe and the coordinate reference system that they want the spatial object to be in (the default is WGS 84). For the function to work properly, the input dataframe must have, at a minimum, WQP observation coordinates in “LongitudeMeasure” and “LatitudeMeasure” columns and a “HorizontalCoordinateReferenceSystemDatumName” column.
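To illustrate the kind of conversion described here, the following is a minimal sketch built directly on sf. It is not the TADA_MakeSpatial() implementation: the helper to_sf(), the datum-to-EPSG lookup, and the site values are all hypothetical and exist only for illustration.

```r
library(sf)

# Toy WQP-style data frame (site IDs and coordinates are made up)
wqp_like <- data.frame(
  MonitoringLocationIdentifier = c("SITE-1", "SITE-2"),
  LongitudeMeasure = c(-105.08, -105.52),
  LatitudeMeasure = c(40.58, 40.38),
  HorizontalCoordinateReferenceSystemDatumName = c("NAD83", "WGS84")
)

# Assumed lookup from datum name to EPSG code; extend as needed
datum_epsg <- c(NAD83 = 4269, WGS84 = 4326, NAD27 = 4267)

# Hypothetical helper: build points per datum, then transform to one CRS
to_sf <- function(df, crs = 4326) {
  parts <- split(df, df$HorizontalCoordinateReferenceSystemDatumName)
  converted <- lapply(names(parts), function(datum) {
    pts <- sf::st_as_sf(
      parts[[datum]],
      coords = c("LongitudeMeasure", "LatitudeMeasure"),
      crs = datum_epsg[[datum]],
      remove = FALSE # keep the original coordinate columns
    )
    sf::st_transform(pts, crs)
  })
  do.call(rbind, converted)
}

wqp_sf <- to_sf(wqp_like)
```

Handling each datum separately before transforming is why a datum column is needed alongside the coordinates: the same longitude/latitude pair refers to slightly different locations under different datums. A datum name missing from the lookup makes this sketch stop with an error.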
Using TADA_MakeSpatial()
First, we will need to pull in some TADA Water Quality Portal Data:
# pH data in Larimer County, Colorado for the year 2020.
TADA_dataframe <- TADA_DataRetrieval(
  startDate = "2020-01-01",
  endDate = "2020-12-31",
  characteristicName = "pH",
  countycode = "US:08:069",
  applyautoclean = TRUE
)
## [1] "Downloading WQP query results. This may take some time depending upon the query size."
## $startDate
## [1] "2020-01-01"
##
## $countycode
## [1] "US:08:069"
##
## $characteristicName
## [1] "pH"
##
## $endDate
## [1] "2020-12-31"
##
## [1] "Data successfully downloaded. Running TADA_AutoClean function."
## [1] "TADA_Autoclean: creating TADA-specific columns."
## [1] "TADA_Autoclean: harmonizing dissolved oxygen characterisic name to DISSOLVED OXYGEN SATURATION if unit is % or % SATURATN."
## [1] "TADA_Autoclean: handling special characters and coverting TADA.ResultMeasureValue and TADA.DetectionQuantitationLimitMeasure.MeasureValue value fields to numeric."
## [1] "TADA_Autoclean: converting TADA.LatitudeMeasure and TADA.LongitudeMeasure fields to numeric."
## [1] "TADA_Autoclean: harmonizing synonymous unit names (m and meters) to m."
## [1] "TADA_Autoclean: updating deprecated (i.e. retired) characteristic names."
## [1] "No deprecated characteristic names found in dataset."
## [1] "TADA_Autoclean: harmonizing result and depth units."
## [1] "TADA_Autoclean: creating TADA.ComparableDataIdentifier field for use when generating visualizations and analyses."
## [1] "NOTE: This version of the TADA package is designed to work with numeric data with media name: 'WATER'. TADA_AutoClean does not currently remove (filter) data with non-water media types. If desired, the user must make this specification on their own outside of package functions. Example: dplyr::filter(.data, TADA.ActivityMediaName == 'WATER')"
Now, we can make the water quality data spatial by running TADA_MakeSpatial():
# default CRS is WGS84 (4326)
TADA_spatial <- TADA_MakeSpatial(TADA_dataframe, crs = 4326)
This new spatial object is identical to the original TADA dataset, but now includes a “geometry” column that allows for mapping and additional geospatial capabilities. Enter ?TADA_MakeSpatial into the console to review another example of this function in use and additional information.
The monitoring locations can then be plotted on an interactive map, for example with leaflet:
leaflet::leaflet() %>%
  leaflet::addProviderTiles("Esri.WorldTopoMap",
    group = "World topo",
    options = leaflet::providerTileOptions(
      updateWhenZooming = FALSE,
      updateWhenIdle = TRUE
    )
  ) %>%
  leaflet::clearShapes() %>%
  leaflet.extras::addResetMapButton() %>%
  leaflet::addLegend(
    position = "bottomright",
    colors = "black",
    labels = "Water Quality Observation(s)",
    opacity = 1
  ) %>%
  leaflet::addCircleMarkers(
    data = TADA_spatial,
    color = "grey", fillColor = "black",
    fillOpacity = 0.8, stroke = TRUE, weight = 1.5, radius = 6,
    popup = paste0(
      "Site ID: ",
      TADA_spatial$MonitoringLocationIdentifier,
      "<br> Site Name: ",
      TADA_spatial$MonitoringLocationName
    )
  )
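Beyond an interactive map, the geometry column can be inspected with standard sf helpers. The following is a minimal sketch on a made-up two-site data frame (the site IDs and coordinates are illustrative, not WQP data). The same calls should apply to an object such as TADA_spatial.

```r
library(sf)

# Toy stand-in for a WQP-style data frame (values are made up)
toy <- data.frame(
  MonitoringLocationIdentifier = c("SITE-1", "SITE-2"),
  LongitudeMeasure = c(-105.08, -105.52),
  LatitudeMeasure = c(40.58, 40.38)
)

toy_sf <- sf::st_as_sf(
  toy,
  coords = c("LongitudeMeasure", "LatitudeMeasure"),
  crs = 4326,
  remove = FALSE
)

class(toy_sf)               # includes "sf" and "data.frame"
sf::st_crs(toy_sf)$epsg     # EPSG code of the coordinate reference system
toy_bbox <- sf::st_bbox(toy_sf) # bounding box of all points

# Drop the geometry to get back an ordinary data frame
toy_plain <- sf::st_drop_geometry(toy_sf)
```

Dropping the geometry is useful when a downstream function expects a plain data frame instead of an sf object.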
TADA_GetATTAINS()
This function pulls in ATTAINS data from the EPA’s ATTAINS Assessment Geospatial Service and links it to TADA-pulled Water Quality Portal observations. For the function to work properly, the input dataframe must have, at a minimum, WQP observation coordinates in “LongitudeMeasure” and “LatitudeMeasure” columns and a “HorizontalCoordinateReferenceSystemDatumName” column.
Users also have the option of returning the ATTAINS geospatial sf objects with their ATTAINS-linked Water Quality Portal dataframe. If return_sf = TRUE, the function returns a list containing the dataframe plus shapefile objects named ATTAINS_catchments, ATTAINS_lines, ATTAINS_points, and ATTAINS_polygons. Note: if any of these shapefile objects are empty, this indicates that there are no ATTAINS objects of that type intersecting any WQP catchment.
Regardless of the user’s decision on returning the ATTAINS sf objects, TADA_GetATTAINS() always returns a dataframe containing the original TADA WQP dataset, plus new columns representing the ATTAINS assessment unit(s) that fall within the same NHDPlus HR catchment as each observation. This means that a single TADA WQP observation can have multiple ATTAINS assessment units linked to it and therefore more than one row of data. Such WQP observations can be identified using the new index column (i.e., multiple rows with the same index value are the same observation).
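To make the catchment-based linking concrete, here is a toy sketch using sf with made-up planar geometries. It is not the TADA_GetATTAINS() implementation, and every object and column name in it (catchment, sites, assessment_units, catchment_id, site_id, au_id) is hypothetical.

```r
library(sf)

# One square "catchment" (no CRS set, so sf treats coordinates as planar)
catchment <- sf::st_sf(
  catchment_id = "C1",
  geometry = sf::st_sfc(
    sf::st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
  )
)

# Two monitoring sites: one inside the catchment, one outside
sites <- sf::st_sf(
  site_id = c("SITE-1", "SITE-2"),
  geometry = sf::st_sfc(sf::st_point(c(0.5, 0.5)), sf::st_point(c(2, 2)))
)

# Two line-type assessment units that both cross the catchment
assessment_units <- sf::st_sf(
  au_id = c("AU-A", "AU-B"),
  geometry = sf::st_sfc(
    sf::st_linestring(rbind(c(0.1, 0.1), c(0.9, 0.9))),
    sf::st_linestring(rbind(c(0.2, 0.8), c(0.8, 0.2)))
  )
)

# Step 1: tag each site with the catchment it falls within
sites_tagged <- sf::st_drop_geometry(
  sf::st_join(sites, catchment, join = sf::st_within)
)

# Step 2: tag each assessment unit with the catchment it intersects
units_tagged <- sf::st_drop_geometry(
  sf::st_join(assessment_units, catchment, join = sf::st_intersects)
)

# Step 3: link sites to units through the shared catchment id
linked <- merge(sites_tagged, units_tagged, by = "catchment_id", all.x = TRUE)
```

In this toy case SITE-1 ends up on two rows (one per assessment unit in its catchment), while SITE-2 keeps a single row with a missing au_id because it falls in no catchment.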
Using TADA_GetATTAINS()
Using either our original TADA_dataframe or the geospatial version TADA_spatial, we can pull in the ATTAINS features that are within the same NHDPlus HR catchment as our observations:
TADA_with_ATTAINS <- TADA_GetATTAINS(TADA_dataframe, return_sf = FALSE)
## [1] "Your TADA data covers a large spatial range. The ATTAINS pull may take a while."
TADA_with_ATTAINS <- TADA_GetATTAINS(TADA_spatial, return_sf = FALSE)
## [1] "Your TADA data covers a large spatial range. The ATTAINS pull may take a while."
This new TADA_with_ATTAINS object is a modification of the original TADA Water Quality Portal dataframe that now has additional columns describing the ATTAINS assessment unit(s) that lie in the same NHDPlus HR catchment as each observation (these columns are prefixed with “ATTAINS”). Moreover, because our TADA_with_ATTAINS object contains more rows than the original TADA dataframe, we can deduce that some Water Quality Portal observations fall within an NHDPlus HR catchment that contains more than one ATTAINS assessment unit.
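The duplicated observations can be isolated with the index column. A minimal base R sketch on a made-up stand-in for the TADA_GetATTAINS() output (the rows and identifiers are illustrative; only the index and ATTAINS.assessmentunitidentifier column names come from this vignette):

```r
# Toy stand-in for the ATTAINS-linked dataframe (values are made up)
linked <- data.frame(
  index = c(1, 2, 2, 3),
  MonitoringLocationIdentifier = c("SITE-1", "SITE-2", "SITE-2", "SITE-3"),
  ATTAINS.assessmentunitidentifier = c("AU-A", "AU-B", "AU-C", NA)
)

# Count rows per index; more than one row means several linked units
rows_per_obs <- table(linked$index)
multi_index <- as.numeric(names(rows_per_obs)[rows_per_obs > 1])

# Keep only the observations linked to more than one assessment unit
multi_linked <- linked[linked$index %in% multi_index, ]
```

Applied to the real object, the same pattern would use TADA_with_ATTAINS in place of the toy linked data frame.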
TADA_with_ATTAINS_list <- TADA_GetATTAINS(TADA_dataframe, return_sf = TRUE)
## [1] "Your TADA data covers a large spatial range. The ATTAINS pull may take a while."
TADA_with_ATTAINS_list <- TADA_GetATTAINS(TADA_spatial, return_sf = TRUE)
## [1] "Your TADA data covers a large spatial range. The ATTAINS pull may take a while."
If we set return_sf = TRUE, as done to create the TADA_with_ATTAINS_list object above, we also now have all the raw ATTAINS features associated with these TADA Water Quality Portal observations stored in a list with the TADA dataframe.
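One way to check which feature types came back empty is to count rows per list element. The sketch below uses a made-up list with the four sf element names mentioned earlier. It leaves out the dataframe element because its name is not stated here, and the point geometries are placeholders, not real ATTAINS features.

```r
library(sf)

# Placeholder feature (made-up identifier and coordinates)
one_feature <- sf::st_sf(
  au_id = "AU-A",
  geometry = sf::st_sfc(sf::st_point(c(-105.08, 40.58)), crs = 4326)
)
no_features <- one_feature[0, ] # zero-row sf object with the same columns

# Toy stand-in for the sf elements of the returned list
attains_list <- list(
  ATTAINS_catchments = one_feature,
  ATTAINS_lines = no_features,
  ATTAINS_points = one_feature,
  ATTAINS_polygons = no_features
)

# Number of features in each element, and the names of the empty ones
feature_counts <- vapply(attains_list, nrow, integer(1))
empty_layers <- names(feature_counts)[feature_counts == 0]
```

Here empty_layers holds the names of the elements with zero features, which in the real list would indicate that no ATTAINS objects of that type intersect any WQP catchment.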
Now, let’s select specific columns from the TADA_with_ATTAINS dataframe, and create a new dataframe with ONLY the unique combinations of WQP MonitoringLocationIdentifier values and ATTAINS assessment unit identifiers.
TADA_with_ATTAINS_subset <- TADA_with_ATTAINS %>%
  dplyr::select(
    c(
      "LongitudeMeasure", "LatitudeMeasure", "MonitoringLocationTypeName",
      "MonitoringLocationIdentifier", "ATTAINS.assessmentunitidentifier",
      "ATTAINS.overallstatus", "ATTAINS.isassessed", "ATTAINS.isimpaired",
      "ATTAINS.organizationid", "ATTAINS.assessmentunitname",
      "ATTAINS.reportingcycle", "ATTAINS.waterbodyreportlink"
    )
  ) %>%
  dplyr::distinct(.keep_all = FALSE)
Enter ?TADA_GetATTAINS into the console to review another example of this function in use and additional information.
TADA_ViewATTAINS()
This function visualizes the raw ATTAINS features that are linked to the TADA Water Quality Portal observations, as generated by TADA_GetATTAINS() when return_sf = TRUE. For the function to work properly, the input must be the list produced by TADA_GetATTAINS() with return_sf = TRUE. The map also displays the Water Quality Portal monitoring locations used to find the ATTAINS features.
Using TADA_ViewATTAINS()
Let’s view the data associated with our TADA_with_ATTAINS_list object! Enter ?TADA_ViewATTAINS into the console to review another example query and additional information.
TADA_ViewATTAINS(TADA_with_ATTAINS_list)