Get an EJ analysis (residential population and environmental indicators) in or near a list of locations
Source: R/ejamit.R
This is the main function in EJAM that runs the analysis. It does essentially what the web app does: it analyzes and summarizes conditions near a set of points, within a set of polygons from a shapefile, or within a list of Census units such as counties.
Usage
ejamit(
  sitepoints,
  radius = 3,
  radius_donut_lower_edge = 0,
  maxradius = 31.07,
  avoidorphans = FALSE,
  quadtree = NULL,
  fips = NULL,
  shapefile = NULL,
  countcols = NULL,
  popmeancols = NULL,
  calculatedcols = NULL,
  subgroups_type = "nh",
  include_ejindexes = TRUE,
  calculate_ratios = TRUE,
  extra_demog = TRUE,
  need_proximityscore = FALSE,
  infer_sitepoints = FALSE,
  need_blockwt = TRUE,
  thresholds = list(80, 80),
  threshnames = list(c(names_ej_pctile, names_ej_state_pctile), c(names_ej_supp_pctile,
    names_ej_supp_state_pctile)),
  threshgroups = list("EJ-US-or-ST", "Supp-US-or-ST"),
  updateProgress = NULL,
  updateProgress_getblocks = NULL,
  in_shiny = FALSE,
  quiet = TRUE,
  parallel = FALSE,
  silentinteractive = FALSE,
  called_by_ejamit = TRUE,
  testing = FALSE,
  showdrinkingwater = TRUE,
  showpctowned = TRUE,
  ...
)
Arguments
- sitepoints
data.table with columns lat, lon giving the point locations of sites or facilities around which circular buffers are drawn
- radius
in miles, defining circular buffer around a site point
- radius_donut_lower_edge
radius of the lower (inner) edge of the donut ring, if analyzing a ring rather than a full circle
- maxradius
distance in miles (the maximum distance to search if not even 1 block point is within radius)
- avoidorphans
logical. If TRUE, then where not even 1 BLOCK internal point is within radius of a SITE, it keeps looking past radius, up to maxradius, to find the nearest 1 BLOCK. EJScreen appears to report NA in that case, so it is an open question whether EJAM needs to report stats on residents presumed to be within radius when no block centroid is within radius. The best estimate might be to report indicators from the nearest block centroid, which is probably almost always the block the site is sitting inside of, but ideally the total count would be adjusted to a fraction of blockwt, based on the area of the circular buffer as a fraction of the area of the block it is apparently inside of. Setting this to TRUE can produce unexpected results, which will not match EJScreen numbers. Note that if creating a proximity score, by contrast, you instead want to find the nearest 1 SITE if none is within radius of this BLOCK.
- quadtree
(a pointer to the large quadtree object) created using indexblocks() which uses the SearchTree package. Takes about 2-5 seconds to create this each time it is needed. It can be automatically created when the package is attached via the .onAttach() function
- fips
optional FIPS code vector to provide if using FIPS instead of sitepoints to specify places to analyze, such as a list of US Counties or tracts. Passed to getblocksnearby_from_fips()
- shapefile
optional. An sf shapefile object, or a path to a .zip, .gdb, or folder that contains shapefiles, to analyze polygons. If in RStudio you want it to interactively prompt you to pick a file, use shapefile=1 (otherwise it assumes you want to pick a latlon file).
- countcols
character vector of names of variables to aggregate within a buffer using a sum of counts, such as the number of people for whom a poverty ratio is known, which is the exact denominator needed to correctly calculate percent low income.
- popmeancols
character vector of names of variables to aggregate within a buffer using population weighted mean.
- calculatedcols
character vector of names of variables to aggregate within a buffer using formulas that have to be specified.
- subgroups_type
Optional (uses default). Set this to "nh" for non-Hispanic race subgroups, as in Non-Hispanic White Alone, nhwa, and others in names_d_subgroups_nh; "alone" for race subgroups like White Alone, wa, and others in names_d_subgroups_alone; "both" for both versions. Possibly another option is "original" or "default". "Alone" means single race.
- include_ejindexes
whether to try to include Summary Indexes (assuming dataset is available), passed to doaggregate()
- calculate_ratios
whether to calculate and return the ratio of each indicator to US and State overall averages, passed to doaggregate()
- extra_demog
whether to include more indicators from the v2.2 report, on language etc.
- need_proximityscore
whether to calculate proximity scores
- infer_sitepoints
set to TRUE to try to infer the lat,lon of each site around which the blocks in sites2blocks were found. lat,lon of each site will be approximated as average of nearby blocks, although a more accurate slower way would be to use reported distance of each of 3 of the furthest block points and triangulate
- need_blockwt
if fips parameter is used, passed to getblocksnearby_from_fips()
- thresholds
list of percentiles like list(80,90) passed to batch.summarize(), to be counted to report how many of each set of indicators exceed thresholds at each site. (see default)
- threshnames
list of groups of variable names (see default)
- threshgroups
list of text names of the groups (see default)
- updateProgress
progress bar function passed to doaggregate() in shiny app
- updateProgress_getblocks
progress bar function passed to getblocksnearby() in shiny app
- in_shiny
if fips parameter is used, passed to getblocksnearby_from_fips()
- quiet
Optional. Passed to getblocksnearby() and batch.summarize(). Set to TRUE to avoid the message about using getblocks_diagnostics(), which is relevant only if a user saved the output of this function.
- parallel
whether to use parallel processing in getblocksnearby(), but not implemented yet.
- silentinteractive
to prevent long output showing in console in RStudio when in interactive mode, passed to doaggregate() also. The app server sets this to TRUE when calling doaggregate(), but the ejamit() default is to set this to FALSE when calling doaggregate().
- called_by_ejamit
Set to TRUE by ejamit() to suppress some outputs even if ejamit(silentinteractive=F)
- testing
used while testing this function, passed to doaggregate()
- ...
passed to getblocksnearby() etc., such as report_progress_every_n = 0
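The countcols, popmeancols, and calculatedcols arguments correspond to three different ways of combining values within a buffer. The base R sketch below illustrates the distinction on a small invented table. The column names (bgwt, pop, povknownratio, lowinc, pm) and all values are hypothetical, and this is only an illustration of the arithmetic, not the actual implementation in doaggregate().

```r
# Hypothetical block groups partly inside one buffer (all values invented).
bg <- data.frame(
  bgwt          = c(1.0, 0.5, 0.25),   # assumed fraction of each block group's residents in the buffer
  pop           = c(1000, 2000, 400),  # residents
  povknownratio = c(900, 1800, 400),   # people for whom a poverty ratio is known
  lowinc        = c(300, 450, 100),    # low-income residents
  pm            = c(8, 10, 12)         # an environmental indicator
)

# countcols-style aggregation: weighted sum of counts
count_names <- c("pop", "povknownratio", "lowinc")
counts <- sapply(bg[count_names], function(x) sum(x * bg$bgwt))

# popmeancols-style aggregation: population-weighted mean
pm_mean <- weighted.mean(bg$pm, w = bg$pop * bg$bgwt)

# calculatedcols-style aggregation: a formula applied to the aggregated counts,
# using the count with known poverty ratio as the denominator
pctlowinc <- counts[["lowinc"]] / counts[["povknownratio"]]

counts
pm_mean
pctlowinc
```

Note that percent low income is computed from the summed counts rather than by averaging block group percentages, which is why the denominator count has to be aggregated as well.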
Value
This returns a named list of results.
# To see the structure of the outputs of ejamit()
structure.of.output.list(testoutput_ejamit_10pts_1miles)
dim(testoutput_ejamit_10pts_1miles$results_summarized$keystats)
dim(testoutput_ejamit_10pts_1miles$results_summarized$rows)
dim(testoutput_ejamit_10pts_1miles$results_summarized$cols)
dim(testoutput_ejamit_10pts_1miles$results_summarized$keyindicators)
- results_overall: a data.table with one row that provides the summary across all sites, the aggregated results for all unique residents.
- results_bysite: results for individual sites (buffers), as a data.table with one row per ejam_uniq_id (i.e., each site analyzed) and one column per indicator.
- results_bybg_people: results for each block group, to allow for showing the distribution of each indicator across everyone, including the distribution within a single residential population group, for example.
- longnames: descriptive long names for the indicators in the above outputs.
- count_of_blocks_near_multiple_sites: additional detail.
- sitetype: indicates if analysis used latlon, fips, or shp.
- results_summarized: named list with "rows", "cols", "keystats", "keyindicators", each providing additional summary stats. Each is a data.frame except x$results_summarized$keystats is a matrix/array.
  - x$results_summarized$cols provides, at each site, the count of Summary Indexes at or above a threshold like the 80th percentile.
  - x$results_summarized$keyindicators provides summary stats for a handful of indicators.
  - x$results_summarized$keystats provides, for each indicator, the average across all sites and average across all (unique) residents, one row per indicator (a "tall" format).
  - x$results_summarized$rows provides the same, but as one column per indicator, corresponding to the format used in results_bysite or results_overall.
- formatted: another tall format showing averages for all indicators.
- sitetype: the type of analysis done: "latlon", "shp", "fips", etc.
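As a rough illustration of the kind of count that results_summarized$cols reports, and of how a threshold relates to a group of indicator names, the base R sketch below counts how many percentile indicators are at or above a threshold at each site. The indicator names and values are invented, and ignoring NA values is an assumption of this sketch. The real calculation is done by batch.summarize(), whose internals are not shown here.

```r
# Hypothetical percentiles at 3 sites for 4 indicators (values invented).
pctiles <- data.frame(
  pctile.a       = c(85, 40, 80),
  pctile.b       = c(90, 79, 10),
  state.pctile.a = c(70, 95, NA),
  state.pctile.b = c(81, 20, 99)
)

threshold <- 80  # analogous to one element of the thresholds argument
group <- c("pctile.a", "pctile.b", "state.pctile.a", "state.pctile.b")  # analogous to one element of threshnames

# Count of indicators at or above the threshold at each site.
# NA values are ignored here, which is an assumption of this sketch.
count_above <- rowSums(pctiles[, group] >= threshold, na.rm = TRUE)
count_above
```

Each element of count_above corresponds to one site, in the same row order as the input table.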
Examples
# See examples in vignettes/ articles at https://usepa.github.io/EJAM/index.html
# All in one step, using functions not shiny app:
out <- ejamit(testpoints_100_dt, 2)
if (FALSE) { # \dontrun{
# Do not specify sitepoints and it will prompt you for a file,
# if in RStudio in interactive mode!
out <- ejamit(radius = 3)
# Specify facilities or sites as points for test data,
# use 1000 test facility points from the R package
testsites <- testpoints_1000
# use facility points in an excel or csv file
testsites <- latlon_from_anything(
system.file(paste0("testdata/latlon/",
"testpoints_10.xlsx"),
package = "EJAM")
)
# head(testsites)
# use facility points from a random sample of EPA-regulated facilities
testsites <- testpoints_n(1e3)
# Specify max distance from sites to look at (residents within X miles of site point)
radius <- 3.1 # miles
# Get summaries of all indicators near a set of points
out <- ejamit(testsites, radius)
# out <- ejamit("myfile.xlsx", 3.1)
# out2 <- ejscreenit(testpoints_5)
# View results overall
round(t(out$results_overall), 3)
# View plots
plot_distance_by_group(results_bybg_people = out$results_bybg_people)
distance_by_group(out$results_bybg_people)
# View maps
mapfast(out$results_bysite, radius = 3.1)
# view results at a single site (here, the first one)
mapfast(out$results_bysite[1, ], radius = 3.1)
# all the raw numbers at one site
t(out$results_bysite[1, ])
# if doing just 1st step of ejamit()
# get distance between each site and every nearby Census block
s2b <- testoutput_getblocksnearby_100pts_1miles
getblocks_diagnostics(s2b)
testsites <- testpoints_10[2,]
s2b <- getblocksnearby(testsites, radius = 3.1)
getblocks_diagnostics(s2b)
plotblocksnearby(s2b)
# if doing just 2nd step of ejamit()
# get summaries of all indicators based on table of distances
out <- doaggregate(s2b, testsites) # this works now and is simpler
} # }