Large WQP data pulls using dataRetrieval
This function makes multiple synchronous data calls to the Water Quality Portal (WQP). It uses the WQP summary service to limit the download to relevant data (based on the user query), retrieves data in batches of up to 250,000 records (the default maxrecs), and then joins the batches back together to produce a single TADA-compatible dataframe as output. For large data sets, this can save a lot of time and ultimately reduce the complexity of subsequent data processing.

With this function, you can download all data available from all sites in the contiguous United States for the requested time period, characteristicName, and siteType. Computer memory may limit the size of the dataframes your R session can hold. The function requires at least one of the following inputs: characteristicName, siteType, statecode, huc, or a start/end date. Be as specific as possible with large data calls.

The function allows the user to run TADA_AutoClean on the returned dataframe. This is not the default, because checking large dataframes for exact duplicate rows can be time-consuming and is better performed separately once the query has completed.
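The batching strategy described above can be sketched in base R. This is a minimal illustration, not TADA's actual implementation. It assumes per-site result counts have already been obtained (for example, from the WQP summary service), and the helper names batch_sites_by_count() and fetch_batch() are hypothetical. Sites are packed greedily into batches whose summed counts stay at or below maxrecs; a site that exceeds maxrecs on its own gets a dedicated batch. The per-batch results are then stacked into a single dataframe with rbind().

```r
# Hypothetical helper (not part of TADA): greedily pack sites into batches
# whose summed result counts stay at or below maxrecs.
# site_counts: named numeric vector (names = site IDs, values = result counts).
batch_sites_by_count <- function(site_counts, maxrecs = 250000) {
  batches <- list()
  current <- character(0)
  current_total <- 0
  for (site in names(site_counts)) {
    n <- site_counts[[site]]
    if (n > maxrecs) {
      # A site too large for any shared batch is queried on its own
      batches[[length(batches) + 1]] <- site
    } else if (current_total + n > maxrecs) {
      # Adding this site would exceed maxrecs: close the batch, start a new one
      batches[[length(batches) + 1]] <- current
      current <- site
      current_total <- n
    } else {
      current <- c(current, site)
      current_total <- current_total + n
    }
  }
  # Keep the last partially filled batch
  if (length(current) > 0) batches[[length(batches) + 1]] <- current
  batches
}

# Hypothetical stand-in for one download call; a real version would query
# the WQP for the sites in the batch.
fetch_batch <- function(sites) {
  data.frame(MonitoringLocationIdentifier = sites, stringsAsFactors = FALSE)
}

# Illustrative per-site result counts (made-up values)
counts <- c(siteA = 100000, siteB = 120000, siteC = 50000, siteD = 300000)
batches <- batch_sites_by_count(counts, maxrecs = 250000)

# Download each batch and stack the results into one dataframe
combined <- do.call(rbind, lapply(batches, fetch_batch))
```

Here batches holds c("siteA", "siteB"), then "siteD", then "siteC", and combined has four rows. A greedy pass is simple and keeps every shared batch under the limit, though it does not guarantee the fewest possible batches.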
TADA_BigDataRetrieval(
  startDate = "null",
  endDate = "null",
  countrycode = "null",
  statecode = "null",
  countycode = "null",
  huc = "null",
  siteid = "null",
  siteType = "null",
  characteristicName = "null",
  characteristicType = "null",
  sampleMedia = "null",
  organization = "null",
  maxrecs = 250000,
  applyautoclean = FALSE
)
- startDate
Start Date string in the format YYYY-MM-DD, for example, "2020-01-01"
- endDate
End Date string in the format YYYY-MM-DD, for example, "2020-01-01"
- countrycode
Code that identifies a country or ocean (e.g. countrycode = "CA" for Canada, countrycode = "OA" for Atlantic Ocean). See for options.
- statecode
FIPS state alpha code that identifies a state (e.g. statecode = "DE" for Delaware). See for options.
- countycode
FIPS county name. Note that a state code must also be supplied (e.g. statecode = "AL", countycode = "Chilton"). See for options.
- huc
A numeric code, supplied as a character string, denoting a hydrologic unit (HUC), for example "04030202". HUCs of different sizes can be entered. See for a map with HUCs; click on a HUC to find the associated code.
- siteid
Unique monitoring location identifier.
- siteType
Type of waterbody. See for options.
- characteristicName
Name of parameter. See for options.
- characteristicType
Groups of environmental measurements/parameters. See for options.
- sampleMedia
Sampling substrate such as water, air, or sediment. See for options.
- organization
A string of letters and/or numbers (some additional characters also possible) used to signify an organization with data in the Water Quality Portal. See for options.
- maxrecs
The maximum number of results queried within one call to dataRetrieval.
- applyautoclean
Logical, defaults to FALSE. If TRUE, runs the TADA_AutoClean function on the returned data profile.
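HUC codes are hierarchical: each additional pair of digits identifies a smaller unit nested inside the larger one (2-digit region, 4-digit subregion, 6-digit basin, 8-digit subbasin, and so on). They should also be passed to the huc argument as character strings, since converting them to numbers drops leading zeros. The hypothetical helper huc_parents() below is not part of TADA; it lists the larger HUCs that contain a given code by manipulating the string only, without querying the WQP.

```r
# Hypothetical helper (not part of TADA): return the enclosing HUC codes
# for a HUC supplied as a character string with an even number of digits.
huc_parents <- function(huc) {
  stopifnot(is.character(huc), length(huc) == 1, nchar(huc) %% 2 == 0)
  lens <- seq(2, nchar(huc), by = 2)
  # substring() recycles its arguments, giving one prefix per level
  setNames(substring(huc, 1, lens), paste0("HUC", lens))
}

huc_parents("04030202")
# c(HUC2 = "04", HUC4 = "0403", HUC6 = "040302", HUC8 = "04030202")

# Storing a HUC as a number loses the leading zero
as.character(as.numeric("04030202"))
# "4030202"
```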
Some code for this function was adapted from the USGS blog post "Large Sample Pull" (author: Aliesha Krall).
See the ?TADA_AutoClean documentation for more information on this optional input.
Note: TADA_BigDataRetrieval, by leveraging USGS's dataRetrieval package, automatically converts date times to UTC. It also automatically converts columns to dates, datetimes, and numerics based on a standard algorithm. See ?dataRetrieval::readWQPdata.
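To illustrate what the UTC conversion means for returned date times, the base R snippet below (not TADA or dataRetrieval code) shows one instant in local time and in UTC, and how a UTC value can be converted back to local time for display. The sample time and the "America/Denver" time zone are hypothetical choices.

```r
# Illustrative only: a hypothetical sample collected at 08:00 local time in
# Denver during daylight saving time (MDT, UTC-6).
local_time <- as.POSIXct("2020-06-01 08:00:00", tz = "America/Denver")

# Same instant, displayed in UTC (changing the tzone attribute does not
# change the underlying time value)
utc_time <- local_time
attr(utc_time, "tzone") <- "UTC"
format(utc_time, "%Y-%m-%d %H:%M %Z")
# "2020-06-01 14:00 UTC"

# Convert a UTC value back to local time for display or reporting
format(utc_time, format = "%Y-%m-%d %H:%M", tz = "America/Denver")
# "2020-06-01 08:00"
```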
if (FALSE) { # \dontrun{
# takes approx 3 mins to run
tada1 <- TADA_BigDataRetrieval(
  startDate = "2019-01-01",
  endDate = "2021-12-31",
  characteristicName = "Temperature, water",
  statecode = c("AK", "AL")
)
# takes approx 21 mins to run
tada2 <- TADA_BigDataRetrieval(startDate = "2016-10-01", endDate = "2022-09-30", statecode = "UT")
# takes seconds to run
tada3 <- TADA_BigDataRetrieval(huc = "04030202", characteristicName = "Escherichia coli")
# takes approx 3 mins to run
tada4 <- TADA_BigDataRetrieval(startDate = "2004-01-01", countrycode = "CA")
# takes seconds to run
tada5 <- TADA_BigDataRetrieval(startDate = "2018-01-01", statecode = "AL", countycode = "Chilton")
# takes seconds to run
tada6 <- TADA_BigDataRetrieval(organization = "PUEBLOOFTESUQUE")
} # }