Core function that calculates summary stats across all sites and nearby people for EJScreen batch analysis

This function takes the full tables of batch buffer results and calculates summary statistics, such as the maximum at any site, the median across all sites, and the maximum percentile among all specified indicators at a given site. It can be extended to provide other summary stats by adding their formulas to this code.
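The difference between summarizing down a column (one indicator across all sites) and across a row (all chosen indicators at one site) can be sketched in base R. This is a minimal illustration on a made-up table, not the batch.summarize() implementation. The column names pctile.ej1 and pctile.ej2 and all values are hypothetical.

```r
# Hypothetical site-level table: one row per site, one column per indicator
sitestats <- data.frame(
  siteid = c("A", "B", "C"),
  pop = c(1000, 2500, 400),
  pctile.ej1 = c(95, 60, 88),
  pctile.ej2 = c(72, 91, 99)
)
pctile_cols <- c("pctile.ej1", "pctile.ej2")  # hypothetical indicator columns

# Column summaries: one value per indicator, computed across all sites
max_at_any_site <- sapply(sitestats[pctile_cols], max, na.rm = TRUE)
median_across_sites <- sapply(sitestats[pctile_cols], median, na.rm = TRUE)

# Row summaries: one value per site, computed across the chosen indicators
max_pctile_at_site <- apply(sitestats[pctile_cols], 1, max, na.rm = TRUE)

max_at_any_site      # pctile.ej1 = 95, pctile.ej2 = 99
median_across_sites  # pctile.ej1 = 88, pctile.ej2 = 91
max_pctile_at_site   # 95, 91, 99 (one value per site)
```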

Usage

batch.summarize(
  sitestats,
  popstats,
  cols = "all",
  wtscolname = "pop",
  probs = c(0, 0.25, 0.5, 0.75, 0.8, 0.9, 0.95, 0.99, 1),
  thresholds = list(90),
  threshnames = list(names(which(sapply(sitestats, class) != "character"))),
  threshgroups = list("variables"),
  na.rm = TRUE,
  rowfun.picked = "all",
  colfun.picked = "all",
  quiet = FALSE,
  testing = FALSE
)

Arguments

sitestats: A matrix or data.frame to summarize, one row per site, one column per variable. Must have correct stats for all people near a given site. Alternatively, the full path to a .csv or .xlsx file containing that table, in a tab called "Each Site" as created by the ejam2excel() function in the EJAM package.
popstats: A matrix or data.frame to summarize, one row per site, one column per variable. Must have reduced counts, so that each unique person near one or more of the sites is counted only once. Used to sum the population and to get stats on the distribution of each indicator across all unique individuals.
cols: NOT USED YET. Specifies which columns of x should be summarized or used during summarization: either the single string 'all' (the default) to specify all columns, or a vector of colnames.
wtscolname: Name of the column that contains the relevant weights to be used (e.g., "pop")
probs: Numeric vector of probabilities (fractions from 0 to 1) at which to find quantiles. Default is c(0, 0.25, 0.50, 0.75, 0.80, 0.90, 0.95, 0.99, 1).
thresholds: List of vectors, each with one or more thresholds (cutpoints), used to count the sites where one or more of a given set of indicators are at or above the threshold, and how many of those indicators are. If an element of the list is a single number, it is used for the whole group (all the threshnames in the corresponding element of the threshnames list). Otherwise, each vector is recycled over the threshnames in the corresponding list element, so each threshname can have its own threshold (such as a field-specific benchmark), or they can all use the same threshold (such as 50).
threshnames: List of character vectors of colnames defining the fields in x that are compared to the threshold (or thresholds).
threshgroups: List of one or more character strings naming the elements of the threshnames list, such as "EJ US pctiles".
na.rm: Logical, TRUE by default, specifying whether na.rm should be used for sum(), mean(), and other functions.
rowfun.picked: Either "all" (the default) or a logical vector specifying which of the pre-defined functions (like at/above threshold) are needed and will be applied.
colfun.picked: Either "all" (the default) or a logical vector specifying which of the pre-defined functions (like colSums) are needed and will be applied.
quiet: Optional logical. Set to TRUE to stop printing results to the console in RStudio.
testing: Optional logical, FALSE by default. If TRUE, prints some debugging information.
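The threshold logic described for thresholds and threshnames (count, per site, how many indicators are at or above a cutpoint, with one threshold recycled over the group or one per indicator) can be sketched for a single group as follows. count_at_or_above() is a hypothetical helper written for illustration, not a function of the package, and the table and column names are made up.

```r
# Hypothetical site table; column names and values are illustrative only
sitestats <- data.frame(
  pctile.ej1 = c(95, 60, 88, NA),
  pctile.ej2 = c(72, 91, 99, 93),
  pctile.ej3 = c(90, 10, 45, 20)
)

# Hypothetical helper: for one group of threshnames, count per site how many
# indicators are at or above their threshold. A single threshold is recycled
# over all threshnames; a longer vector gives one threshold per column.
count_at_or_above <- function(df, threshnames, thresholds, na.rm = TRUE) {
  thresholds <- rep_len(thresholds, length(threshnames))
  # Logical matrix: rows are sites, columns are indicators
  hits <- mapply(function(col, cut) df[[col]] >= cut, threshnames, thresholds)
  hits <- matrix(hits, nrow = nrow(df))  # keep matrix shape even for 1 site
  rowSums(hits, na.rm = na.rm)           # NA comparisons are skipped if na.rm
}

# One threshold (90) shared by every indicator in the group
n_at_90 <- count_at_or_above(sitestats, names(sitestats), 90)
any_at_90 <- n_at_90 > 0  # sites where at least one indicator is at/above 90

# One threshold per indicator, e.g. field-specific benchmarks
n_two <- count_at_or_above(sitestats, c("pctile.ej1", "pctile.ej2"), c(80, 95))

n_at_90  # 2 1 1 1
n_two    # 1 0 2 0
```

In batch.summarize() the thresholds and threshnames arguments are lists with one element per group, so a sketch like this would be applied once per list element.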

Value

The output is a list whose named elements include rows and cols, each a matrix of summary stats, as well as keystats and keyindicators.

cols: Each element in a summary col summarizes 1 row (site) across all the RELEVANT cols of batch data (e.g., all US Summary Index percentiles)

rows: Each element in a summary row summarizes 1 column (field) across all the rows of batch data.

keystats: subset of results, for convenience

keyindicators: subset of results, for convenience
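Because popstats counts each unique person only once, summaries of the distribution across people can be weighted by the column named in wtscolname. The sketch below computes the total population, a population-weighted mean, and weighted quantiles at chosen probs. wtd_quantile() is a hypothetical helper, and its rule (the smallest value whose cumulative population share reaches p, with no interpolation) is an assumption; the package may use a different weighted-quantile method.

```r
# Hypothetical popstats-style table: indicator values plus a weight column
popstats <- data.frame(
  pop = c(1000, 2500, 400, 100),
  pctile.ej1 = c(95, 60, 88, 40)
)
wtscolname <- "pop"
probs <- c(0, 0.5, 0.75, 1)

# Hypothetical helper: inverse of the weighted empirical CDF (no interpolation)
wtd_quantile <- function(x, w, probs, na.rm = TRUE) {
  if (na.rm) {
    keep <- !is.na(x) & !is.na(w)
    x <- x[keep]
    w <- w[keep]
  }
  ord <- order(x)
  x <- x[ord]
  cumw <- cumsum(w[ord]) / sum(w)  # cumulative share of the population
  # Small tolerance guards against floating-point round-off in cumw
  sapply(probs, function(p) x[which(cumw >= p - 1e-12)[1]])
}

w <- popstats[[wtscolname]]
total_pop <- sum(w, na.rm = TRUE)
wtd_mean <- weighted.mean(popstats$pctile.ej1, w, na.rm = TRUE)
wtd_q <- wtd_quantile(popstats$pctile.ej1, w, probs)

total_pop  # 4000
wtd_mean   # about 71.05
wtd_q      # 40 60 88 95
```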