
This function makes multiple synchronous data calls to the Water Quality Portal (WQP, waterqualitydata.us). It uses the WQP summary service to limit the download to data relevant to the user's query, retrieves data in chunks of up to maxrecs records (250,000 by default), and then joins the chunks back together into a single TADA-compatible data frame. For large data sets, this can save considerable time and reduce the complexity of subsequent data processing. With this function, you can download all data available from all sites in the contiguous United States for the requested time period, characteristicName, and siteType.

Computer memory may limit the size of the data frames your R session can hold. The function requires at least one of the following inputs: characteristicName, siteType, statecode, huc, or a start/end date. Be as specific as you can with large data calls.

The function can run TADA_AutoClean on the result, but this is not the default: checking large data frames for exact duplicate rows can be time consuming and is better done separately once the query is complete.
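The chunk-and-join strategy described above can be sketched in base R. This is a hypothetical illustration, not TADA's internal code: it assumes per-site result counts like those a summary query might report (the `site_counts` data frame and its `siteid`/`resultCount` columns are made up), groups sites into batches whose totals stay at or below `maxrecs` using a simple next-fit rule on sites sorted largest first, and uses a fake fetcher in place of a real WQP download.

```r
# Hypothetical sketch of chunked retrieval (not TADA's actual implementation).
# Assumes per-site result counts are already known, e.g. from a summary query.

# Group sites into batches whose summed counts stay <= maxrecs
# (next-fit on sites sorted largest first; a single site larger than
# maxrecs still gets a batch of its own).
batch_sites <- function(site_counts, maxrecs = 250000) {
  site_counts <- site_counts[order(-site_counts$resultCount), ]
  batch <- integer(nrow(site_counts))
  current <- 1L
  running <- 0
  for (i in seq_len(nrow(site_counts))) {
    n <- site_counts$resultCount[i]
    if (running > 0 && running + n > maxrecs) {
      current <- current + 1L  # start a new batch
      running <- 0
    }
    batch[i] <- current
    running <- running + n
  }
  split(site_counts$siteid, batch)
}

# Stand-in for a real WQP download of one batch of sites (hypothetical)
fake_fetch <- function(sites) {
  data.frame(siteid = sites, value = seq_along(sites), stringsAsFactors = FALSE)
}

site_counts <- data.frame(
  siteid = c("SITE-A", "SITE-B", "SITE-C", "SITE-D"),
  resultCount = c(200000, 120000, 90000, 30000),
  stringsAsFactors = FALSE
)

batches <- batch_sites(site_counts, maxrecs = 250000)
# Download each batch, then stack the pieces into one data frame
combined <- do.call(rbind, lapply(batches, fake_fetch))
rownames(combined) <- NULL
length(batches)  # 2: SITE-A alone, then SITE-B, SITE-C, SITE-D together
nrow(combined)   # 4
```

Sorting largest first is only one reasonable heuristic; the real function may group its requests differently.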

Usage

TADA_BigDataRetrieval(
  startDate = "null",
  endDate = "null",
  countrycode = "null",
  statecode = "null",
  countycode = "null",
  huc = "null",
  siteid = "null",
  siteType = "null",
  characteristicName = "null",
  characteristicType = "null",
  sampleMedia = "null",
  organization = "null",
  maxrecs = 250000,
  applyautoclean = FALSE
)

Arguments

startDate

Start date as a string in the format YYYY-MM-DD, for example, "2020-01-01".

endDate

End date as a string in the format YYYY-MM-DD, for example, "2020-01-01".
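Because both date arguments are plain strings, a malformed date is easy to pass by mistake. The helper below is a hypothetical pre-check, not part of TADA: it accepts the "null" placeholder used in the function signature or a real calendar date written as YYYY-MM-DD.

```r
# Hypothetical helper, not part of TADA: returns TRUE if a date argument is
# the "null" placeholder or a real calendar date written as YYYY-MM-DD.
is_valid_wqp_date <- function(x) {
  if (!is.character(x) || length(x) != 1 || is.na(x)) return(FALSE)
  if (identical(x, "null")) return(TRUE)
  # Require the exact YYYY-MM-DD shape, then confirm it parses as a date
  grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x) &&
    !is.na(as.Date(x, format = "%Y-%m-%d"))
}

is_valid_wqp_date("2020-01-01")  # TRUE
is_valid_wqp_date("01/01/2020")  # FALSE: wrong format
is_valid_wqp_date("2020-13-01")  # FALSE: there is no month 13
```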

countrycode

Code that identifies a country or ocean (e.g. countrycode = "CA" for Canada, countrycode = "OA" for Atlantic Ocean). See https://www.waterqualitydata.us/Codes/countrycode for options.

statecode

FIPS state alpha code that identifies a state (e.g. statecode = "DE" for Delaware). See https://www.waterqualitydata.us/Codes/statecode for options.

countycode

FIPS county name. Note that a state code must also be supplied (e.g. statecode = "AL", countycode = "Chilton"). See https://www.waterqualitydata.us/Codes/countycode for options.
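The requirement that countycode be paired with statecode can be expressed as a simple argument check. This is a hypothetical sketch of such a check, written against the "null" defaults shown in the usage block; it is not TADA's own validation code.

```r
# Hypothetical argument check, not TADA's code: countycode is only
# meaningful together with statecode.
check_county_args <- function(statecode = "null", countycode = "null") {
  if (!identical(countycode, "null") && identical(statecode, "null")) {
    stop('countycode requires statecode, e.g. statecode = "AL", countycode = "Chilton"')
  }
  invisible(TRUE)
}

check_county_args(statecode = "AL", countycode = "Chilton")  # passes silently
try(check_county_args(countycode = "Chilton"))               # reports the error
```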

huc

A numeric code, supplied as a string, denoting a hydrologic unit (HUC), for example, "04030202". HUCs of different sizes can be entered. See https://epa.maps.arcgis.com/home/item.html?id=796992f4588c401fabec7446ecc7a5a3 for a map of HUCs; click on a HUC to find its code.
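Supplying the HUC as a string matters in R: a numeric value silently drops leading zeros. The helper below is a hypothetical check, not part of TADA, and assumes the standard HUC levels of 2, 4, 6, 8, 10, or 12 digits.

```r
# Hypothetical helper, not part of TADA. Assumes the standard 2- to 12-digit
# HUC levels (HUC2 through HUC12); each level adds two digits.
huc_digits <- function(huc) {
  if (!is.character(huc) || length(huc) != 1 ||
      !grepl("^[0-9]+$", huc) || nchar(huc) %% 2 != 0 || nchar(huc) > 12) {
    stop("huc should be a string of 2, 4, 6, 8, 10, or 12 digits")
  }
  nchar(huc)
}

huc_digits("04030202")         # 8, i.e. a HUC8
nchar(as.character(04030202))  # 7: the numeric form loses its leading zero
```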

siteid

Unique monitoring location identifier.

siteType

Type of waterbody. See https://www.waterqualitydata.us/Codes/sitetype for options.

characteristicName

Name of the parameter (characteristic). See https://www.waterqualitydata.us/Codes/characteristicName for options.

characteristicType

Groups of environmental measurements/parameters. See https://www.waterqualitydata.us/Codes/characteristicType for options.

sampleMedia

Sampling substrate such as water, air, or sediment. See https://www.waterqualitydata.us/Codes/sampleMedia for options.

organization

A string of letters and/or numbers (some additional characters also possible) used to signify an organization with data in the Water Quality Portal. See https://www.waterqualitydata.us/Codes/organization for options.

maxrecs

Maximum number of results requested in a single call to dataRetrieval. Defaults to 250000.
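As a rough planning aid, dividing the expected result count by maxrecs gives a lower bound on the number of calls. This is an illustration only, not TADA's logic; if whole sites are kept together in one request, as assumed here, the actual number of calls can be higher.

```r
# Back-of-the-envelope estimate (an illustration, not TADA's logic): the
# fewest calls needed if results could be split perfectly into maxrecs chunks.
min_calls <- function(total_results, maxrecs = 250000) {
  ceiling(total_results / maxrecs)
}

min_calls(1100000)                    # 5
min_calls(1100000, maxrecs = 100000)  # 11
```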

applyautoclean

Logical, defaults to FALSE. If TRUE, runs the TADA_AutoClean function on the returned data profile.
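Leaving applyautoclean = FALSE lets you deal with exact duplicate rows as a separate step after the download. The base R sketch below shows only that exact-duplicate check; it is not TADA_AutoClean, which does more, and the `results` data frame is made up for illustration.

```r
# Base R sketch: drop exact duplicate rows after the download finishes.
# This is NOT TADA_AutoClean; it only illustrates the exact-duplicate check.
drop_exact_duplicates <- function(df) {
  df[!duplicated(df), , drop = FALSE]
}

results <- data.frame(
  siteid = c("SITE-A", "SITE-A", "SITE-B"),
  value = c(1.5, 1.5, 2.0),
  stringsAsFactors = FALSE
)
deduped <- drop_exact_duplicates(results)
nrow(deduped)  # 2: the repeated SITE-A row is removed
```

On very large data frames, duplicated() has to compare every column of every row, which is why this step can be slow.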

Value

A TADA-compatible data frame.

Details

Some code for this function was adapted from the USGS blog post "Large Sample Pull" (author: Aliesha Krall).

See ?TADA_AutoClean for more information on the optional applyautoclean input.

Note: because TADA_BigDataRetrieval relies on USGS's dataRetrieval package, date-times are automatically converted to UTC, and columns are automatically converted to dates, date-times, and numerics based on a standard algorithm. See ?dataRetrieval::readWQPdata.
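To see what a UTC conversion does to a local timestamp, here is a base R illustration. It is not dataRetrieval's code, and it assumes the system time zone database includes "America/Denver".

```r
# Base R illustration of a local-to-UTC conversion (not dataRetrieval's code).
# Assumes the system time zone database includes "America/Denver".
local_time <- as.POSIXct("2020-06-01 10:00:00", tz = "America/Denver")
utc_string <- format(local_time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
utc_string  # "2020-06-01 16:00:00" (Mountain Daylight Time is UTC-6)
```

Keep this shift in mind when comparing returned date-times with local field records.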

Examples

if (FALSE) {
# takes approx 3 mins to run
tada1 <- TADA_BigDataRetrieval(startDate = "2019-01-01", endDate = "2021-12-31", characteristicName = "Temperature, water", statecode = c("AK", "AL"))

# takes approx 21 mins
tada2 <- TADA_BigDataRetrieval(startDate = "2016-10-01", endDate = "2022-09-30", statecode = "UT")

# takes seconds to run
tada3 <- TADA_BigDataRetrieval(huc = "04030202", characteristicName = "Escherichia coli")

# takes approx 3 mins to run
tada4 <- TADA_BigDataRetrieval(startDate = "2004-01-01", countrycode = "CA")

# takes seconds to run
tada5 <- TADA_BigDataRetrieval(startDate = "2018-01-01", statecode = "AL", countycode = "Chilton")

# takes seconds to run
tada6 <- TADA_BigDataRetrieval(organization = "PUEBLOOFTESUQUE")
}