Access and harmonize fish data — getFishData • finsyncR

This function generates an occurrence or abundance community matrix for fish sampled in rivers and streams from the US EPA National Rivers and Streams Assessment and USGS BioData.

Usage

getFishData(
  dataType = "occur",
  taxonLevel = "Species",
  agency = c("USGS", "EPA"),
  standardize = "none",
  hybrids = FALSE,
  sharedTaxa = FALSE,
  boatableStreams = FALSE
)

Arguments

dataType: Output data type for the community matrix, either "abun" (abundance) or "occur" (occurrence).
taxonLevel: Level of taxonomic resolution for the community matrix. Input must be one of: "Family", "Genus", or "Species".
agency: The agency name or names (e.g., "USGS" and "EPA") that are the source of data for the output community matrix. See Details below for more information.
standardize: Standardization method to be used for calculating fish abundance matrices. Default is standardize = "none", which returns raw fish count values. Other options include standardize = "CPUE", which returns standardized abundances in Catch per Unit Effort. An alternative standardization method is standardize = "MGMS", which uses Multigear Mean Standardization (MGMS) values to account for catchability differences between fish sampling methods. See 'Details' for more information on standardizations.
hybrids: logical. Should hybrid individuals be included in the output dataset? TRUE or FALSE.
sharedTaxa: logical. Should taxa be limited to those organisms that appear in both the EPA and USGS datasets? TRUE or FALSE. Must be set to FALSE when only one agency is specified.
boatableStreams: logical. Should EPA boatable streams be included in the output dataset? TRUE or FALSE. Note: most USGS fish samples are from wadeable streams. It is not advisable to include boatable streams when building a dataset that includes both EPA and USGS data, because boatable EPA data and wadeable USGS data are not necessarily comparable.

Value

A taxa by sample data frame with site, stream reach, and sample information.

Details

agency refers to the federal agency that collected the fish samples. If you want to use data from both agencies, set agency = c("USGS", "EPA"), which is the default. Note that by default, only moving waters classified as "wadeable" are included, but setting boatableStreams = TRUE will include non-wadeable streams. Some information included in the EPA dataset is not included in the USGS dataset, specifically the observed wetted width of the stream or river.

taxonLevel refers to the taxonomic resolution (Species, Genus, Family, etc.) for the sample by taxa matrix. The input values for this parameter are case sensitive and must start with a capital letter. All observations taxonomically coarser than the taxonLevel provided are dropped from the output community matrix. For instance, if taxonLevel = "Genus", then observations identified at Subfamily, Family, Order, Class, or Phylum levels are dropped. "Species" is the finest level of taxonomic resolution provided for fish.

To standardize fish abundance data (standardize = "CPUE"), abundances are divided by the product of sampling effort (minutes shocked, number of seine hauls, number of snorkeling transects) and stream length sampled. $$CPUE = \frac{taxa~abundance}{(sampling~effort~*~stream~length~fished~(m))}$$
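The CPUE formula above can be illustrated with a small base R sketch. The data frame, its column names, and all values are invented for illustration; they are not the columns or internals used by getFishData.

```r
# Toy catch records for a single gear type (all values are invented).
catch <- data.frame(
  sample         = c("S1", "S1", "S2"),
  taxon          = c("Lepomis cyanellus", "Micropterus salmoides",
                     "Lepomis cyanellus"),
  abundance      = c(12, 3, 20),      # raw fish counts
  effort         = c(30, 30, 40),     # e.g. minutes shocked
  reach_length_m = c(150, 150, 200)   # stream length fished, in metres
)

# CPUE = taxon abundance / (sampling effort * stream length fished)
catch$cpue <- catch$abundance / (catch$effort * catch$reach_length_m)

catch[, c("sample", "taxon", "cpue")]
```

A record with a missing effort or reach length would yield NA here, which is consistent with such samples being unusable for standardized abundances.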

When standardize = "CPUE" or standardize = "none", sampling events that used multiple gear types will have a row of data for each unique gear type (e.g., a sampling event in which both electroshocking and seining were used will have separate rows of data for each gear type). To account for differences in efficacy between shocking, seining, and snorkeling, multigear mean standardization (standardize = "MGMS") is provided as an alternative to catch per unit effort (standardize = "CPUE"). When standardize = "MGMS", individual taxa abundances are first standardized for each gear type (i.e., electroshock, seine net, snorkel) as in CPUE above. Then, for each gear type, the CPUE of all taxa i in each sample j is summed to get the Total Catch Per Unit Effort for sample j ($TCPUE_j$): $$TCPUE_j = \sum_{i} CPUE_{ij}$$

For each gear type, the mean TCPUE across samples, $\overline{TCPUE}$, is calculated. Next, to standardize each gear, the CPUE for each taxon i is divided by $\overline{TCPUE}$: $$MSC_{ij} = \frac{CPUE_{ij}}{\overline{TCPUE}}$$ $MSC_{ij}$ is the mean standardized catch of species i in observation j. The units of sampling effort cancel in the calculation of $MSC_{ij}$, but patterns of relative abundance of species within and across observations are preserved. The function then sums the $MSC_{ij}$ among gear types, resulting in a single row of data for each sampling event, regardless of the number of gear types used. As a result, setting standardize = "CPUE" will produce more rows in the output dataset than standardize = "MGMS". See Gibson-Reinemer et al. (2017) for more information regarding the computation of MGMS.
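The MGMS steps described above can be traced on toy data with base R. This is a minimal sketch under stated assumptions: the data frame, its column names, and the CPUE values are invented, and the code is not the implementation used inside getFishData.

```r
# Toy per-gear CPUE values for two sampling events (all values are invented).
catch <- data.frame(
  event = c("E1", "E1", "E1", "E2", "E2"),
  gear  = c("shock", "shock", "seine", "shock", "seine"),
  taxon = c("A", "B", "A", "A", "B"),
  cpue  = c(0.2, 0.6, 2, 0.4, 6)
)

# Step 1: total CPUE per sample, where a sample is one event-by-gear combination.
tcpue <- aggregate(cpue ~ event + gear, data = catch, FUN = sum)

# Step 2: mean total CPUE for each gear type.
mean_tcpue <- tapply(tcpue$cpue, tcpue$gear, mean)

# Step 3: mean standardized catch, MSC = CPUE / mean TCPUE of the same gear.
catch$msc <- catch$cpue / as.vector(mean_tcpue[as.character(catch$gear)])

# Step 4: sum MSC among gear types, giving one row per sampling event.
mgms <- xtabs(msc ~ event + taxon, data = catch)
mgms
```

The seine CPUE values in this toy data are ten times larger than the electroshocking values, but after division by each gear's mean TCPUE both gears contribute on a comparable scale before they are summed.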

Some samples lack information on either stream length sampled or sampling effort. Therefore, if a user is interested in occurrence (presence/absence) data only, set dataType = "occur" and standardize = "none", which will provide an occurrence dataset that includes the samples that are otherwise dropped with standardization. Be aware that setting dataType = "occur" will result in a larger dataset, with additional samples/sites, than when dataType = "abun".
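The difference between the two data types can be sketched as a pair of calls that differ only in dataType and standardize. The argument names and values come from the Usage section; the object names are hypothetical, and the calls are guarded with if (FALSE) because they need the finsyncR package and its source data.

```r
# Argument sets built only from the documented parameters of getFishData().
occur_args <- list(dataType = "occur", standardize = "none",
                   agency = c("USGS", "EPA"))
abun_args  <- list(dataType = "abun", standardize = "CPUE",
                   agency = c("USGS", "EPA"))

# Not run: requires finsyncR and access to the underlying datasets.
if (FALSE) {
  library(finsyncR)
  fish_occur <- do.call(getFishData, occur_args)  # keeps samples lacking effort or length
  fish_abun  <- do.call(getFishData, abun_args)   # samples that cannot be standardized are dropped
  # Comparing row counts shows how many samples standardization removed.
  c(occurrence = nrow(fish_occur), abundance = nrow(fish_abun))
}
```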

References

Gibson-Reinemer DK, Ickes BS, Chick JH, 2017. Development and assessment of a new method for combining catch per unit effort data from different fish sampling gears: Multigear mean standardization (MGMS). Can. J. Fish. Aquat. Sci. 74:8-14.

Author

Michael Mahon, Ethan Brown, Samantha Rumschlag, Terry Brown

Examples

if (FALSE) {
Fish <- getFishData(taxonLevel = "Species")
}