Get an EJ analysis (residential population and environmental indicators) in or near a list of locations

This is the main function in EJAM that runs the analysis. It does essentially what the web app does: it analyzes and summarizes conditions near a set of points, within a set of polygons from a shapefile, or within a list of Census units such as counties.

Usage

ejamit(
  sitepoints = NULL,
  radius = 3,
  radius_donut_lower_edge = 0,
  maxradius = 31.07,
  avoidorphans = FALSE,
  quadtree = NULL,
  fips = NULL,
  shapefile = NULL,
  countcols = NULL,
  wtdmeancols = NULL,
  calculatedcols = NULL,
  calctype_maxbg = NULL,
  calctype_minbg = NULL,
  subgroups_type = "nh",
  include_ejindexes = TRUE,
  calculate_ratios = TRUE,
  extra_demog = TRUE,
  need_proximityscore = FALSE,
  infer_sitepoints = FALSE,
  need_blockwt = TRUE,
  thresholds = list(80, 80),
  threshnames = list(c(names_ej_pctile, names_ej_state_pctile), c(names_ej_supp_pctile,
    names_ej_supp_state_pctile)),
  threshgroups = list("EJ-US-or-ST", "Supp-US-or-ST"),
  updateProgress = NULL,
  updateProgress_getblocks = NULL,
  progress_all = NULL,
  in_shiny = FALSE,
  quiet = TRUE,
  silentinteractive = FALSE,
  called_by_ejamit = TRUE,
  testing = FALSE,
  showdrinkingwater = TRUE,
  showpctowned = TRUE,
  download_fips_bounds_to_calc_areas = FALSE,
  ...
)

Arguments

sitepoints: data.table with columns lat, lon giving the point locations of the sites or facilities around which circular buffers are drawn
radius: in miles, defining circular buffer around a site point
radius_donut_lower_edge: radius of the inner (lower) edge of the donut, if analyzing a ring rather than a full circle
maxradius: maximum distance, in miles, to search if not even 1 block point is within radius
avoidorphans: logical. If TRUE, then where not even 1 BLOCK internal point is within radius of a SITE, it keeps looking past radius, up to maxradius, to find the nearest 1 BLOCK. Setting this to TRUE can produce unexpected results, which will not match EJScreen numbers. EJScreen is believed to report NA in that case, so it is an open question whether EJAM needs to report stats on residents presumed to be within radius when no block centroid is within radius. The best estimate might be to report indicators from the nearest block centroid, which is probably almost always that of the block the site is sitting inside of, but ideally the total count would be adjusted to a fraction of blockwt, based on the area of the circular buffer as a fraction of the area of the block it is apparently inside of. Note that if creating a proximity score, by contrast, you instead want to find the nearest 1 SITE if none is within radius of this BLOCK.
quadtree: (a pointer to the large quadtree object) created using indexblocks() which uses the SearchTree package. Takes about 2-5 seconds to create this each time it is needed. It can be automatically created when the package is attached via the .onAttach() function
fips: optional FIPS code vector to provide if using FIPS instead of sitepoints to specify places to analyze, such as a list of US Counties or tracts. Passed to getblocksnearby_from_fips()
shapefile: optional. An sf shapefile object, or a path to a .zip, .gdb, .json, .kml, etc., or a folder that has shapefiles, to analyze polygons, e.g., out = ejamit(shapefile = testdata("portland.json", quiet = TRUE), radius = 0). If in RStudio you want it to interactively prompt you to pick a file, use shapefile=1 (otherwise it assumes you want to pick a latlon file).
countcols: character vector of names of variables to aggregate within a buffer using a sum of counts, like, for example, the number of people for whom a poverty ratio is known, the count of which is the exact denominator needed to correctly calculate percent low income.
wtdmeancols: character vector of names of variables to aggregate within a buffer using population-weighted or other-weighted mean.
calculatedcols: character vector of names of variables to aggregate within a buffer using formulas that have to be specified.
calctype_maxbg: character vector of names of variables to aggregate within a buffer using max() of all blockgroup-level values.
calctype_minbg: character vector of names of variables to aggregate within a buffer using min() of all blockgroup-level values.
subgroups_type: Optional (uses default). Set this to "nh" for non-hispanic race subgroups as in Non-Hispanic White Alone, nhwa and others in names_d_subgroups_nh; "alone" for race subgroups like White Alone, wa and others in names_d_subgroups_alone; "both" for both versions. Possibly another option is "original" or "default". "Alone" means single race.
include_ejindexes: whether to try to include Summary Indexes (assuming dataset is available) - passed to doaggregate()
calculate_ratios: whether to calculate and return ratio of each indicator to US and State overall averages - passed to doaggregate()
extra_demog: whether to include more indicators from the v2.2 report, on language etc.
need_proximityscore: whether to calculate proximity scores
infer_sitepoints: set to TRUE to try to infer the lat,lon of each site around which the blocks in sites2blocks were found. lat,lon of each site will be approximated as average of nearby blocks, although a more accurate slower way would be to use reported distance of each of 3 of the furthest block points and triangulate
need_blockwt: if fips parameter is used, passed to getblocksnearby_from_fips()
thresholds: list of percentiles like list(80,90) passed to batch.summarize(), to be counted to report how many of each set of indicators exceed thresholds at each site. (see default)
threshnames: list of groups of variable names (see default)
threshgroups: list of text names of the groups (see default)
updateProgress: progress bar function passed to doaggregate() in shiny app
updateProgress_getblocks: progress bar function passed to getblocksnearby() in shiny app
progress_all: progress bar passed from the R shiny app
in_shiny: if fips parameter is used, passed to getblocksnearby_from_fips()
quiet: Optional. passed to getblocksnearby() and batch.summarize(). set to TRUE to avoid message about using getblocks_diagnostics(), which is relevant only if a user saved the output of this function.
silentinteractive: to prevent long output showing in console in RStudio when in interactive mode, passed to doaggregate() also. app server sets this to TRUE when calling doaggregate() but ejamit() default is to set this to FALSE when calling doaggregate().
called_by_ejamit: Set to TRUE by ejamit() to suppress some outputs even if ejamit(silentinteractive=F)
testing: used while testing this function, passed to doaggregate()
showdrinkingwater: T/F whether to include drinking water indicator values or display as NA. Defaults to TRUE.
showpctowned: T/F whether to include percent owner-occupied units indicator values or display as NA. Defaults to TRUE.
download_fips_bounds_to_calc_areas: if set to TRUE, it is slower because it downloads bounds of each unit to calculate area in square miles
...: passed to getblocksnearby() etc. such as report_progress_every_n = 0
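
The thresholds, threshnames, and threshgroups arguments are parallel lists: each group of indicator names has one cutoff and one label, and the output reports, per site, how many indicators in each group are at or above the cutoff. The base R sketch below only illustrates that idea on toy data. The column names, group labels, and the helper count_at_or_above() are hypothetical, and this is not how batch.summarize() is actually implemented.

```r
# Toy per-site percentiles. Column names are hypothetical, not EJAM's.
bysite <- data.frame(
  site     = 1:3,
  pctile_a = c(85, 40, 80),
  pctile_b = c(90, 79, 10),
  supp_a   = c(95, 50, 60),
  supp_b   = c(20, 91, 99)
)

# Parallel lists, in the same shape as the ejamit() arguments
thresholds   <- list(80, 90)
threshnames  <- list(c("pctile_a", "pctile_b"), c("supp_a", "supp_b"))
threshgroups <- list("Main", "Supp")

# Hypothetical helper: per row (site), count the columns at or above the cutoff
count_at_or_above <- function(df, cols, cutoff) {
  rowSums(df[, cols, drop = FALSE] >= cutoff, na.rm = TRUE)
}

# One column of counts per group, one row per site
counts <- mapply(
  function(cols, cutoff) count_at_or_above(bysite, cols, cutoff),
  threshnames, thresholds
)
colnames(counts) <- unlist(threshgroups)
counts
```

Here site 1 has two "Main" indicators at or above 80 and one "Supp" indicator at or above 90. Passing a different cutoff per group, such as list(80, 90), is what lets each set of indicators be screened at its own percentile.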

Value

This returns a named list of results.

# To see the structure of the outputs of ejamit()

structure.of.output.list(testoutput_ejamit_10pts_1miles)

dim(testoutput_ejamit_10pts_1miles$results_summarized$keystats)

dim(testoutput_ejamit_10pts_1miles$results_summarized$rows)

dim(testoutput_ejamit_10pts_1miles$results_summarized$cols)

dim(testoutput_ejamit_10pts_1miles$results_summarized$keyindicators)

results_overall a data.table with one row that provides the summary across all sites, the aggregated results for all unique residents.
results_bysite results for individual sites (buffers) - a data.table of results, one row per ejam_uniq_id (i.e., each site analyzed), one column per indicator
results_bybg_people results for each block group, to allow for showing the distribution of each indicator across everyone, including the distribution within a single residential population group, for example.
longnames descriptive long names for the indicators in the above outputs
count_of_blocks_near_multiple_sites additional detail
sitetype indicates the type of analysis done: "latlon", "fips", "shp", etc.
results_summarized named list with "rows", "cols", "keystats", "keyindicators", each providing additional summary stats. Each is a data.frame except x$results_summarized$keystats is a matrix/array.
- x$results_summarized$cols provides, at each site, the count of Summary Indexes at or above a threshold like the 80th percentile.
- x$results_summarized$keyindicators provides summary stats for a handful of indicators.
- x$results_summarized$keystats provides, for each indicator, the average across all sites and average across all (unique) residents, one row per indicator (a "tall" format).
- x$results_summarized$rows provides the same, but as one column per indicator, corresponding to the format used in results_bysite or results_overall.
formatted another tall format showing averages for all indicators
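
The difference between the "average across all sites" and the "average across all residents" in results_summarized$keystats can be seen in a small base R sketch. The data and column names below are made up for illustration, and the sketch assumes that no resident lives near more than one site, so it ignores the de-duplication of residents that EJAM performs when buffers overlap.

```r
# Toy data: three sites, their nearby population, and one indicator value each.
# All names and numbers here are hypothetical.
sites <- data.frame(
  site      = c("A", "B", "C"),
  pop       = c(100, 300, 600),
  pctlowinc = c(50, 30, 10)
)

# Average site: every site counts equally
avg_across_sites <- mean(sites$pctlowinc)
avg_across_sites      # 30

# Average resident: sites with more residents count more
avg_across_residents <- weighted.mean(sites$pctlowinc, w = sites$pop)
avg_across_residents  # 20
```

The two averages diverge whenever indicator values are correlated with population size, which is why both are reported.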

Details

See examples in the vignettes/articles at https://usepa.github.io/EJAM-open

Examples


# See examples in vignettes/ articles

 # All in one step, using functions not shiny app:
 out <- ejamit(testpoints_100_dt, 2)

 # \donttest{
 # Do not specify sitepoints and it will prompt you for a file,
 # if in RStudio in interactive mode!
 out <- ejamit(radius = 3)

  # Specify facilities or sites as points for test data.
  # Option 1: use 1000 test facility points from the R package
  testsites <- testpoints_1000
  # Option 2: use facility points in an excel or csv file
  testsites <- latlon_from_anything(
    system.file("testdata/latlon/testpoints_10.xlsx", package = "EJAM")
  )
  # head(testsites)
  # Option 3: use facility points from a random sample of EPA-regulated facilities
  testsites <- testpoints_n(1e3)

  # Specify max distance from sites to look at (residents within X miles of site point)
  radius <- 3.1 # miles

  # Get summaries of all indicators near a set of points
  out <- ejamit(testsites, radius)
  # out <- ejamit("myfile.xlsx", 3.1)

  # Shapefile examples
  out2 <- ejamit(shapefile = testshapes_2, radius = 0)
  out3 <- ejamit(shapefile = testdata("portland.json", quiet = TRUE), radius = 0)

  # FIPS examples
  out4 <- ejamit(fips = testinput_fips_cities)
  out5 <- ejamit(fips = fips_counties_from_state_abbrev("DE"), radius = 0)

  # View results overall
  round(t(out$results_overall), 3)

  # View plots
   plot_distance_by_group(results_bybg_people = out$results_bybg_people)
   distance_by_group(out$results_bybg_people)

  # View maps
  mapfast(out$results_bysite, radius = 3.1)

  # View results at a single site (here, the first one)
  mapfast(out$results_bysite[1, ], radius = 3.1)
  # all the raw numbers at one site
  t(out$results_bysite[1, ])

  # if doing just 1st step of ejamit()
  #  get distance between each site and every nearby Census block

  s2b <- testoutput_getblocksnearby_100pts_1miles
  getblocks_diagnostics(s2b)

  testsites <- testpoints_10[2,]
  s2b <- getblocksnearby(testsites, radius = 3.1)
  getblocks_diagnostics(s2b)
  plotblocksnearby(s2b)

  # if doing just 2nd step of ejamit()
  #  get summaries of all indicators based on table of distances
  out <- doaggregate(s2b, testsites)

# }