
Access and harmonize fish data
getFishData.Rd
This function generates an occurrence or abundance community matrix for fish sampled in rivers and streams from the US EPA National Rivers and Streams Assessment and USGS BioData.
Usage
getFishData(
dataType = "occur",
taxonLevel = "Species",
agency = c("USGS", "EPA"),
standardize = "none",
hybrids = FALSE,
sharedTaxa = FALSE,
boatableStreams = FALSE
)
Arguments
- dataType
Output data type for the community matrix, either "abun" (abundance) or "occur" (occurrence).
- taxonLevel
Level of taxonomic resolution for the community matrix. Input must be one of: "Family", "Genus", or "Species".
- agency
The agency name or names (e.g., "USGS" and "EPA") that are the source of data for the output community matrix. See Details below for more information.
- standardize
Standardization method to be used for calculating fish abundance matrices. Default is standardize = "none", which returns raw fish count values. Other options are standardize = "CPUE", which returns abundances standardized as catch per unit effort, and standardize = "MGMS", which uses multigear mean standardization (MGMS) values to account for catchability differences between fish sampling methods. See Details for more information on standardizations.
- hybrids
logical. Should hybrid individuals be included in the output dataset? TRUE or FALSE.
- sharedTaxa
logical. Should taxa be limited to those organisms that appear in both the EPA and USGS datasets? TRUE or FALSE. Must be set to FALSE when only one agency is specified.
- boatableStreams
logical. Should EPA boatable streams be included in the output dataset? TRUE or FALSE. Note: most USGS fish samples are from wadeable streams. Including boatable streams is not advisable when building a dataset that combines EPA and USGS data, because boatable EPA data and wadeable USGS data are not necessarily comparable.
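The constraints stated in the argument descriptions can be gathered into a small pre-flight check. The sketch below is only an illustration: `checkFishArgs` is a hypothetical helper written for this example, not a function provided by the package, and it encodes only the rules spelled out above.

```r
# Hypothetical helper (not part of the package): returns a character vector
# of problems found in an argument combination, or character(0) if none.
checkFishArgs <- function(dataType = "occur",
                          taxonLevel = "Species",
                          agency = c("USGS", "EPA"),
                          standardize = "none",
                          sharedTaxa = FALSE) {
  problems <- character(0)
  if (!dataType %in% c("abun", "occur")) {
    problems <- c(problems, "dataType must be 'abun' or 'occur'")
  }
  # taxonLevel is case sensitive, so "genus" is rejected.
  if (!taxonLevel %in% c("Family", "Genus", "Species")) {
    problems <- c(problems, "taxonLevel must be 'Family', 'Genus', or 'Species'")
  }
  if (!all(agency %in% c("USGS", "EPA"))) {
    problems <- c(problems, "agency must be 'USGS', 'EPA', or both")
  }
  if (!standardize %in% c("none", "CPUE", "MGMS")) {
    problems <- c(problems, "standardize must be 'none', 'CPUE', or 'MGMS'")
  }
  # sharedTaxa only makes sense when both agencies are requested.
  if (isTRUE(sharedTaxa) && length(unique(agency)) < 2) {
    problems <- c(problems, "sharedTaxa must be FALSE when only one agency is specified")
  }
  problems
}

lowercaseLevel <- checkFishArgs(taxonLevel = "genus")
oneAgencyShared <- checkFishArgs(agency = "EPA", sharedTaxa = TRUE)

# Call the real function only if it is available in the session.
if (length(checkFishArgs(dataType = "abun", standardize = "CPUE")) == 0 &&
    exists("getFishData")) {
  fish <- getFishData(dataType = "abun", standardize = "CPUE")
}
```

Here `lowercaseLevel` and `oneAgencyShared` each hold one message, reflecting the case-sensitivity rule for taxonLevel and the single-agency rule for sharedTaxa.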
Details
agency refers to the federal agency that collected the fish samples. To use data from both agencies, set agency = c("USGS", "EPA"), which is the default. Note that by default, only moving waters classified as "wadeable" are included, but setting boatableStreams = TRUE will include non-wadeable streams. Some information included in the EPA dataset is not included in the USGS dataset, specifically the observed wetted width of the stream or river.
taxonLevel refers to the taxonomic resolution (Species, Genus, Family, etc.) of the sample-by-taxa matrix. The input values for this parameter are case sensitive and must start with a capital letter. All observations taxonomically coarser than the taxonLevel provided are dropped from the output community matrix. For instance, if taxonLevel = "Genus", then observations identified at the Subfamily, Family, Order, Class, or Phylum level are dropped. "Species" is the finest level of taxonomic resolution provided for fish.
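The dropping rule can be illustrated on a toy table. This is a minimal sketch, not the package's internal code: the observations are made up, and the `rank` column and the `dropCoarser` helper are hypothetical names introduced for the example.

```r
# Taxonomic ranks ordered from coarsest to finest (the ranks named above).
ranks <- c("Phylum", "Class", "Order", "Family", "Subfamily", "Genus", "Species")

# Made-up observations; `rank` is a hypothetical column giving the level
# at which each observation was identified.
obs <- data.frame(
  taxon = c("Lepomis macrochirus", "Lepomis", "Centrarchidae",
            "Micropterus salmoides"),
  rank  = c("Species", "Genus", "Family", "Species"),
  stringsAsFactors = FALSE
)

# Keep only observations identified at taxonLevel or finer.
dropCoarser <- function(obs, taxonLevel) {
  keep <- match(obs$rank, ranks) >= match(taxonLevel, ranks)
  obs[keep, , drop = FALSE]
}

genusObs  <- dropCoarser(obs, "Genus")   # the Family-level row is dropped
familyObs <- dropCoarser(obs, "Family")  # all four rows are kept
```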
To standardize fish abundance data (standardize = "CPUE"), abundances are divided by the product of sampling effort (minutes shocked, number of seine hauls, or number of snorkeling transects) and stream length sampled.
$$CPUE = \frac{\text{taxa abundance}}{\text{sampling effort} \times \text{stream length fished (m)}}$$
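As a worked instance of the formula, the sketch below computes CPUE for a single electroshocking sample. All numbers and taxon names are made up, and `cpue` is a hypothetical helper defined here, not a package function.

```r
# CPUE = abundance / (sampling effort * stream length fished in metres)
cpue <- function(abundance, effort, streamLength) {
  abundance / (effort * streamLength)
}

# Made-up raw counts from one sample: 15 minutes shocked over 100 m of stream.
counts <- c(bluegill = 30, bass = 6)
sampleCpue <- cpue(counts, effort = 15, streamLength = 100)
# bluegill: 30 / (15 * 100) = 0.02; bass: 6 / (15 * 100) = 0.004
```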
When standardize = "CPUE" or standardize = "none", sampling events that used multiple gear types will have a row of data for each unique gear type (e.g., a sampling event in which both electroshocking and seining were used will have a separate row for each gear type). To account for differences in efficacy between shocking, seining, and snorkeling, multigear mean standardization (standardize = "MGMS") is provided as an alternative to catch per unit effort (standardize = "CPUE"). When standardize = "MGMS", individual taxa abundances are first standardized for each gear type (i.e., electroshock, seine net, snorkel) as in CPUE above. Then, for each gear type, the CPUE of all taxa i in each sample j is summed to get the total catch per unit effort for sample j (\(TCPUE_j\)):
$$TCPUE_j = \sum_{i} CPUE_{ij}$$
For each gear type, the mean TCPUE across samples, \(\overline{TCPUE}\), is calculated. Next, to standardize each gear, the CPUE of each taxon i in sample j is divided by \(\overline{TCPUE}\) for that gear.
$$MSC_{ij} = \frac{CPUE_{ij}}{\overline{TCPUE}}$$
\(MSC_{ij}\) is the mean standardized catch of species i in observation j. The units of sampling effort cancel in the calculation of \(MSC_{ij}\), but patterns of relative abundance of species within and across observations are preserved. The function then sums \(MSC_{ij}\) across gear types, resulting in a single row of data for each sampling event regardless of the number of gear types used. Consequently, setting standardize = "CPUE" will result in more rows in the output dataset than standardize = "MGMS". See Gibson-Reinemer et al. (2017) for more information on the computation of MGMS.
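The MGMS steps can be traced on a toy dataset. The sketch below follows the description above using base R; the CPUE values, the column names, and the gear labels are made up for illustration, and this is not the package's internal implementation.

```r
# Made-up CPUE values in long format: one row per sample, gear, and taxon.
catch <- data.frame(
  sample = c("s1", "s1", "s1", "s1", "s2", "s2"),
  gear   = c("shock", "shock", "seine", "seine", "shock", "shock"),
  taxon  = c("A", "B", "A", "B", "A", "B"),
  cpue   = c(0.02, 0.06, 0.5, 1.5, 0.04, 0.08),
  stringsAsFactors = FALSE
)

# Step 1: total CPUE per sample within each gear type (TCPUE_j).
tcpue <- aggregate(cpue ~ sample + gear, data = catch, FUN = sum)

# Step 2: mean TCPUE for each gear type.
meanTcpue <- tapply(tcpue$cpue, tcpue$gear, mean)

# Step 3: MSC_ij = CPUE_ij divided by the mean TCPUE of the matching gear.
catch$msc <- catch$cpue / unname(meanTcpue[as.character(catch$gear)])

# Step 4: sum MSC across gear types for each sample and taxon.
mgms <- aggregate(msc ~ sample + taxon, data = catch, FUN = sum)

# Sample-by-taxon matrix with a single row per sampling event.
mgmsWide <- xtabs(msc ~ sample + taxon, data = mgms)

# For comparison: CPUE output keeps one row per sample and gear combination.
nCpueRows <- nrow(unique(catch[, c("sample", "gear")]))
```

In this toy case the CPUE layout has three rows (s1 with two gears, s2 with one), while `mgmsWide` has two, one per sampling event.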
Some samples lack information on either stream length sampled or sampling effort, and these samples are dropped when abundances are standardized. Therefore, if only occurrence (presence/absence) data are of interest, set dataType = "occur" and standardize = "none", which provides an occurrence dataset that includes the samples that would otherwise be dropped with standardization. Be aware that setting dataType = "occur" will result in a larger dataset, with additional samples and sites, than dataType = "abun".
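The difference in sample coverage can be seen on a toy count matrix. This sketch uses made-up counts, efforts, and stream lengths and only mimics the behaviour described above; it is not the package's code.

```r
# Made-up raw counts: two samples (rows) by three taxa (columns).
counts <- matrix(c(5, 0, 2,
                   0, 3, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), c("A", "B", "C")))

# Sample s2 lacks sampling effort, so its CPUE cannot be computed.
effort       <- c(s1 = 10, s2 = NA)
streamLength <- c(s1 = 100, s2 = 150)

# Occurrence (presence/absence) needs only the counts.
occur <- (counts > 0) * 1L

# CPUE divides each row by effort * stream length; incomplete samples drop out.
cpueAll  <- counts / (effort * streamLength)
complete <- !is.na(effort) & !is.na(streamLength)
cpueKept <- cpueAll[complete, , drop = FALSE]
```

Here `occur` retains both samples, whereas `cpueKept` retains only s1.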