Core function that calculates summary stats across all sites and nearby people for EJScreen batch analysis
Source: R/batch.summarize.R
This function takes the full tables of batch buffer results and calculates summary statistics such as the maximum at any site, the median across all sites, and the maximum percentile across all specified indicators at a given site. It can be extended to provide other summary stats by adding their formulas to this code.
Usage
batch.summarize(
sitestats,
popstats,
cols = "all",
wtscolname = "pop",
probs = c(0, 0.25, 0.5, 0.75, 0.8, 0.9, 0.95, 0.99, 1),
thresholds = list(90),
threshnames = list(names(which(sapply(sitestats, class) != "character"))),
threshgroups = list("variables"),
na.rm = TRUE,
rowfun.picked = "all",
colfun.picked = "all",
quiet = FALSE,
testing = FALSE
)
Arguments
- sitestats
A matrix or data.frame to summarize, one row per site, one column per variable. Must have correct stats for all people near a given site. Can instead be the full path to a .csv or .xlsx file with that table, in a tab called "Each Site" as created by the EJAM package ejam2excel function.
- popstats
A matrix or data.frame to summarize, one row per site, one column per variable. Must have reduced counts that count each unique person only once, even if that person is near more than one of the sites. Used to sum population and get stats on the distribution of each indicator across all unique individuals.
- cols
NOT USED YET. Specifies which columns of x should be summarized or used during summarization. Either the single string "all" (the default) to specify all columns, or a vector of colnames.
- wtscolname
Name of the column that contains the relevant weights to be used (e.g., "pop")
- probs
Vector of numeric values, fractions, to use as probabilities used in finding quantiles. Default is c(0,0.25,0.50,0.75,0.80,0.90,0.95,0.99,1)
- thresholds
List of vectors, each with 1+ thresholds (cutpoints), used to find sites where 1+ of a given set of indicators are at/above the threshold, and to count how many of the indicators are. If an element of the list is a single number, that number is used for the whole group (all the threshnames in that nth list element). Otherwise, each vector is recycled over the threshnames in the corresponding list element, so each threshname can have its own threshold, like a field-specific benchmark, or they can all use the same threshold, like 50.
- threshnames
list of vectors of character colnames defining fields in x that get compared to threshold, or to thresholds
- threshgroups
list of 1+ character strings naming the elements of the threshnames list, such as "EJ US pctiles"
- na.rm
Logical TRUE by default, specifying if na.rm should be used for sum(), mean(), and other functions.
- rowfun.picked
logical vector specifying which of the pre-defined functions (like at/above threshold) are needed and will be applied
- colfun.picked
logical vector specifying which of the pre-defined functions (like colSums) are needed and will be applied
- quiet
optional logical, set to TRUE to stop printing results to console in RStudio.
- testing
optional, default is FALSE. prints some debugging info if TRUE.
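The way thresholds, threshnames, and threshgroups line up can be sketched in base R. This is only an illustration of the recycling rule described above, not the code batch.summarize actually runs; the toy data, the column names (pctile.a, pctile.b, pctile.c), the group names, and the helper count_above are all hypothetical.

```r
# Hypothetical toy data: one row per site, one column per indicator
sitestats <- data.frame(
  pctile.a = c(95, 40, 72),
  pctile.b = c(60, 88, 91),
  pctile.c = c(92, 10, 50)
)

# Two groups of fields, each paired with its own element of thresholds
threshnames  <- list(c("pctile.a", "pctile.b", "pctile.c"),
                     c("pctile.a", "pctile.b", "pctile.c"))
thresholds   <- list(90,               # single number: used for the whole group
                     c(90, 80, 50))    # one cutpoint per field in the group
threshgroups <- list("same cutoff", "field-specific cutoffs")

# Hypothetical helper: per site, count how many fields are at/above their cutoff
count_above <- function(df, fields, cutoffs) {
  cutoffs <- rep_len(cutoffs, length(fields))   # recycle cutoffs over the fields
  above <- sweep(as.matrix(df[, fields, drop = FALSE]), 2, cutoffs, FUN = ">=")
  rowSums(above, na.rm = TRUE)
}

counts <- Map(function(fields, cutoffs) count_above(sitestats, fields, cutoffs),
              threshnames, thresholds)
names(counts) <- unlist(threshgroups)

# Sites where 1+ indicators in the first group are at/above the threshold
any_above <- counts[["same cutoff"]] >= 1
```

In this sketch, counts holds one vector per group with one count per site, and any_above flags the sites where at least one indicator in the first group reaches its cutoff.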
Value
Output is a list with the named elements below. rows and cols are each a matrix of summary stats.
cols: Each element in a summary col summarizes 1 row (site) across all the RELEVANT cols of batch data (e.g., all US Summary Index percentiles)
rows: Each element in a summary row summarizes 1 column (field) across all the rows of batch data.
keystats: subset of results, for convenience
keyindicators: subset of results, for convenience
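The orientation of the rows and cols elements can be illustrated with a small base R sketch. It assumes, based on the descriptions above, that each summary row holds one value per field and each summary col holds one value per site. The toy data and the summary names (Maximum, Median, Max.pctile, Count.above.90) are hypothetical and are not necessarily the names batch.summarize returns.

```r
# Hypothetical toy batch data: one row per site, one column per indicator
batchdata <- data.frame(
  pctile.a = c(95, 40, 72),
  pctile.b = c(60, 88, 91)
)
m <- as.matrix(batchdata)

# "rows"-style summaries: each summary row has one value per column (field),
# computed across all sites
rows_sketch <- rbind(
  Maximum = apply(m, 2, max, na.rm = TRUE),
  Median  = apply(m, 2, median, na.rm = TRUE)
)

# "cols"-style summaries: each summary col has one value per row (site),
# computed across the relevant indicator columns
cols_sketch <- cbind(
  Max.pctile     = apply(m, 1, max, na.rm = TRUE),
  Count.above.90 = rowSums(m >= 90, na.rm = TRUE)
)

# Same top-level shape as the rows and cols elements described above
out_sketch <- list(rows = rows_sketch, cols = cols_sketch)
```

Here out_sketch$rows has one row per summary stat and one column per field, while out_sketch$cols has one row per site and one column per summary stat.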