This is the main function in EJAM that runs the analysis. It does essentially what the web app does: it analyzes and summarizes the places near a set of points, within a set of polygons from a shapefile, or within a list of Census units such as counties.

Usage

ejamit(
  sitepoints,
  radius = 3,
  radius_donut_lower_edge = 0,
  maxradius = 31.07,
  avoidorphans = FALSE,
  quadtree = NULL,
  fips = NULL,
  shapefile = NULL,
  countcols = NULL,
  popmeancols = NULL,
  calculatedcols = NULL,
  subgroups_type = "nh",
  include_ejindexes = TRUE,
  calculate_ratios = TRUE,
  extra_demog = TRUE,
  need_proximityscore = FALSE,
  infer_sitepoints = FALSE,
  need_blockwt = TRUE,
  thresholds = list(80, 80),
  threshnames = list(c(names_ej_pctile, names_ej_state_pctile), c(names_ej_supp_pctile,
    names_ej_supp_state_pctile)),
  threshgroups = list("EJ-US-or-ST", "Supp-US-or-ST"),
  updateProgress = NULL,
  updateProgress_getblocks = NULL,
  in_shiny = FALSE,
  quiet = TRUE,
  parallel = FALSE,
  silentinteractive = FALSE,
  called_by_ejamit = TRUE,
  testing = FALSE,
  showdrinkingwater = TRUE,
  showpctowned = TRUE,
  ...
)

Arguments

sitepoints

data.table with columns lat and lon giving the point locations of sites or facilities; a circular buffer is drawn around each point

radius

in miles, defining circular buffer around a site point

radius_donut_lower_edge

radius of the lower (inner) edge of the donut ring, if analyzing a ring rather than a full circle
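
As a rough illustration of the ring idea, here is a minimal base R sketch. The table of distances and its column names are hypothetical, not EJAM's actual output, and whether each edge is inclusive is an assumption of the sketch.

```r
# Hypothetical distances (miles) from one site to nearby block points
blocks <- data.frame(
  blockid  = 1:6,
  distance = c(0.4, 0.9, 1.5, 2.2, 2.9, 3.4)
)

radius <- 3                    # outer edge of the ring
radius_donut_lower_edge <- 1   # inner edge of the ring

# Keep blocks beyond the inner edge but no farther than the outer edge
# (inclusive/exclusive edges are an assumption here)
in_ring <- blocks$distance > radius_donut_lower_edge &
  blocks$distance <= radius
ring_blocks <- blocks[in_ring, ]
```

With radius_donut_lower_edge = 0 (the default) the ring is simply the full circular buffer.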

maxradius

maximum distance, in miles, to check if not even 1 block point is within radius

avoidorphans

logical. If TRUE, then where not even 1 block internal point is within radius of a site, it keeps looking past radius, up to maxradius, to find the nearest 1 block. EJScreen appears to report NA in that case, so it is an open question whether EJAM needs to report stats on residents presumed to be within radius when no block centroid is within radius. The best estimate might be to report indicators from the nearest block centroid, which is probably almost always the block the site is sitting inside of; ideally the total count would then be adjusted to a fraction of blockwt, based on the area of the circular buffer as a fraction of the area of the block it is apparently inside of. Setting this to TRUE can produce unexpected results that will not match EJScreen numbers. By contrast, when creating a proximity score, you instead want to find the nearest 1 site if none is within radius of a given block.
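
The search logic described above can be sketched in base R as follows. The helper find_blocks() is hypothetical and only illustrates the fallback rule; it is not EJAM's implementation.

```r
# Hypothetical helper illustrating the search logic only; not EJAM's code
find_blocks <- function(distance, radius, maxradius, avoidorphans = FALSE) {
  within <- which(distance <= radius)
  if (length(within) > 0 || !avoidorphans) {
    return(within)  # may be empty when avoidorphans is FALSE
  }
  # No block within radius: fall back to the single nearest block,
  # but only if it lies within maxradius
  nearest <- which.min(distance)
  if (length(nearest) == 1 && distance[nearest] <= maxradius) {
    nearest
  } else {
    integer(0)
  }
}

d <- c(4.2, 3.6, 7.9)  # every block point is farther than 3 miles
orphan_default <- find_blocks(d, radius = 3, maxradius = 31.07)
orphan_avoided <- find_blocks(d, radius = 3, maxradius = 31.07,
                              avoidorphans = TRUE)
```

In this made-up case the default returns no blocks, while avoidorphans = TRUE returns the single nearest block even though it lies outside radius.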

quadtree

(a pointer to the large quadtree object) created using indexblocks() which uses the SearchTree package. Takes about 2-5 seconds to create this each time it is needed. It can be automatically created when the package is attached via the .onAttach() function

fips

optional FIPS code vector to provide if using FIPS instead of sitepoints to specify places to analyze, such as a list of US Counties or tracts. Passed to getblocksnearby_from_fips()

shapefile

optional. An sf shapefile object, or a path to a .zip, .gdb, or folder that has a shapefile, to analyze polygons. If in RStudio you want it to interactively prompt you to pick a file, use shapefile=1 (otherwise it assumes you want to pick a latlon file).

countcols

character vector of names of variables to aggregate within a buffer using a sum of counts, like, for example, the number of people for whom a poverty ratio is known, the count of which is the exact denominator needed to correctly calculate percent low income.

popmeancols

character vector of names of variables to aggregate within a buffer using population weighted mean.

calculatedcols

character vector of names of variables to aggregate within a buffer using formulas that have to be specified.
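
The three aggregation styles (countcols, popmeancols, calculatedcols) can be illustrated with a minimal base R sketch. The data and column names below are made up for illustration, and EJAM's own formulas and weights may differ.

```r
# Hypothetical block-group values inside one buffer; names are illustrative
bg <- data.frame(
  pop      = c(1000, 500, 1500),  # residents counted in the buffer
  povknown = c(950, 480, 1400),   # people with a known poverty ratio
  lowinc   = c(300, 240, 280),    # people counted as low income
  pm25     = c(8.0, 10.0, 9.0)    # an environmental indicator
)

# countcols style: sum the counts
totals <- colSums(bg[, c("pop", "povknown", "lowinc")])

# popmeancols style: population-weighted mean
pm25_avg <- weighted.mean(bg$pm25, w = bg$pop)

# calculatedcols style: apply a formula to the aggregated counts,
# using the exact denominator rather than total population
pctlowinc <- totals[["lowinc"]] / totals[["povknown"]]
```

Dividing the summed low-income count by the summed count of people with a known poverty ratio is what makes the percentage correct; averaging the block-group percentages would generally give a different number.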

subgroups_type

Optional (uses default). Set this to "nh" for non-hispanic race subgroups as in Non-Hispanic White Alone, nhwa and others in names_d_subgroups_nh; "alone" for race subgroups like White Alone, wa and others in names_d_subgroups_alone; "both" for both versions. Possibly another option is "original" or "default". "Alone" means single race.

include_ejindexes

whether to try to include Summary Indexes (assuming dataset is available) - passed to doaggregate()

calculate_ratios

whether to calculate and return ratio of each indicator to US and State overall averages - passed to doaggregate()
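
A ratio here is simply the value of an indicator divided by the corresponding overall average. A minimal base R sketch, with made-up values and averages (not real EJAM or EJScreen numbers):

```r
# Made-up indicator values at one site and made-up US averages
site_value <- c(pctlowinc = 0.45, pm25 = 9.6)
us_avg     <- c(pctlowinc = 0.30, pm25 = 8.0)

# Ratio of each indicator to the US overall average
ratio_to_us <- site_value / us_avg
```

A ratio above 1 means the indicator near the site is higher than the overall average.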

extra_demog

whether to include more indicators from the v2.2 report, on language etc.

need_proximityscore

whether to calculate proximity scores

infer_sitepoints

set to TRUE to try to infer the lat,lon of each site around which the blocks in sites2blocks were found. The lat,lon of each site will be approximated as the average of nearby blocks, although a more accurate but slower way would be to use the reported distance of each of 3 of the furthest block points and triangulate
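
The averaging approach can be sketched in base R as below. The table is a hypothetical stand-in for sites2blocks with block coordinates attached, and the unweighted mean is an assumption; the source does not say whether EJAM weights the average.

```r
# Hypothetical site-to-block table with block point coordinates
s2b <- data.frame(
  ejam_uniq_id = c(1, 1, 1, 2, 2),
  lat = c(40.00, 40.02, 40.04, 35.10, 35.12),
  lon = c(-75.00, -75.02, -75.04, -90.00, -90.04)
)

# Approximate each site's location as the mean of its nearby block points
inferred <- aggregate(cbind(lat, lon) ~ ejam_uniq_id, data = s2b, FUN = mean)
```

The estimate is only as good as the spread of block points around the site, which is why the text above calls it an approximation.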

need_blockwt

if fips parameter is used, passed to getblocksnearby_from_fips()

thresholds

list of percentiles like list(80,90) passed to batch.summarize(), to be counted to report how many of each set of indicators exceed thresholds at each site. (see default)

threshnames

list of groups of variable names (see default)

threshgroups

list of text names of the groups (see default)
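
To show how thresholds, threshnames, and threshgroups fit together, here is a minimal base R sketch of counting, at each site, how many indicators in each group are at or above the group's threshold. The percentile table and its column names are made up, and this is not the code used by batch.summarize().

```r
# Hypothetical percentile columns for two groups of indicators at 3 sites
pctiles <- data.frame(
  ej_a   = c(85, 40, 92), ej_b   = c(81, 79, 95),
  supp_a = c(60, 88, 91), supp_b = c(82, 90, 70)
)
threshnames  <- list(c("ej_a", "ej_b"), c("supp_a", "supp_b"))
threshgroups <- list("EJ-US-or-ST", "Supp-US-or-ST")
thresholds   <- list(80, 80)

# One column per group: number of indicators at or above the cutoff per site
counts <- mapply(
  function(cols, cutoff) rowSums(pctiles[, cols, drop = FALSE] >= cutoff),
  threshnames, thresholds
)
colnames(counts) <- unlist(threshgroups)
```

Each of the three lists has one element per group, so the nth threshold applies to the nth set of variable names and is labeled with the nth group name.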

updateProgress

progress bar function passed to doaggregate() in shiny app

updateProgress_getblocks

progress bar function passed to getblocksnearby() in shiny app

in_shiny

if fips parameter is used, passed to getblocksnearby_from_fips()

quiet

Optional. passed to getblocksnearby() and batch.summarize(). set to TRUE to avoid message about using getblocks_diagnostics(), which is relevant only if a user saved the output of this function.

parallel

whether to use parallel processing in getblocksnearby() (not implemented yet).

silentinteractive

to prevent long output showing in console in RStudio when in interactive mode, passed to doaggregate() also. app server sets this to TRUE when calling doaggregate() but ejamit() default is to set this to FALSE when calling doaggregate().

called_by_ejamit

Set to TRUE by ejamit() to suppress some outputs even if ejamit(silentinteractive=F)

testing

used while testing this function, passed to doaggregate()

...

passed to getblocksnearby() etc. such as report_progress_every_n = 0

Value

This returns a named list of results.

# To see the structure of the outputs of ejamit()

structure.of.output.list(testoutput_ejamit_10pts_1miles)

dim(testoutput_ejamit_10pts_1miles$results_summarized$keystats)

dim(testoutput_ejamit_10pts_1miles$results_summarized$rows)

dim(testoutput_ejamit_10pts_1miles$results_summarized$cols)

dim(testoutput_ejamit_10pts_1miles$results_summarized$keyindicators)

  • results_overall a data.table with one row that provides the summary across all sites, the aggregated results for all unique residents.

  • results_bysite results for individual sites (buffers) - a data.table of results, one row per ejam_uniq_id (i.e., each site analyzed), one column per indicator

  • results_bybg_people results for each block group, to allow for showing the distribution of each indicator across everyone, including the distribution within a single residential population group, for example.

  • longnames descriptive long names for the indicators in the above outputs

  • count_of_blocks_near_multiple_sites additional detail

  • sitetype indicates if analysis used latlon, fips, or shp

  • results_summarized named list with "rows", "cols", "keystats", "keyindicators", each providing additional summary stats. Each is a data.frame except x$results_summarized$keystats is a matrix/array.

    • x$results_summarized$cols provides, at each site, the count of Summary Indexes at or above a threshold like the 80th percentile.

    • x$results_summarized$keyindicators provides summary stats for a handful of indicators.

    • x$results_summarized$keystats provides, for each indicator, the average across all sites and average across all (unique) residents, one row per indicator (a "tall" format).

    • x$results_summarized$rows provides the same, but as one column per indicator, corresponding to the format used in results_bysite or results_overall.

  • formatted another tall format showing averages for all indicators

  • sitetype the type of analysis done: "latlon", "shp", "fips", etc.
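
The difference between the average across sites and the average across residents, as reported in keystats, can be illustrated with a small base R sketch. The numbers are made up, and weighting by site population is a simplification: it ignores that EJAM counts each unique resident once even when buffers overlap.

```r
# Made-up results for two sites of very different population size
bysite <- data.frame(pop = c(100, 900), pctlowinc = c(0.60, 0.20))

# Average across sites: each site counts equally
avg_site <- mean(bysite$pctlowinc)

# Average across residents: each person counts equally
avg_person <- weighted.mean(bysite$pctlowinc, w = bysite$pop)
```

Here the average site has 40 percent low income, but the average resident lives near a site where the figure is 24 percent, because most residents are near the larger site.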

Details

See examples in vignettes/ articles at https://usepa.github.io/EJAM/index.html

Examples


# See examples in vignettes/ articles at https://usepa.github.io/EJAM/index.html

 # All in one step, using functions not shiny app:
 out <- ejamit(testpoints_100_dt, 2)

 if (FALSE) { # \dontrun{
 # Do not specify sitepoints and it will prompt you for a file,
 # if in RStudio in interactive mode!
 out <- ejamit(radius = 3)

  # Specify facilities or sites as points for test data,
  # use 1000 test facility points from the R package
  testsites <- testpoints_1000
  # use facility points in an excel or csv file
   testsites <- latlon_from_anything(
     system.file(paste0("testdata/latlon/",
      "testpoints_10.xlsx"),
    package = "EJAM")
    )
   # head(testsites)
  # use facility points from a random sample of EPA-regulated facilities
  testsites <- testpoints_n(1e3)

  # Specify max distance from sites to look at (residents within X miles of site point)
  radius <- 3.1 # miles

  # Get summaries of all indicators near a set of points
  out <- ejamit(testsites, radius)
  # out <- ejamit("myfile.xlsx", 3.1)

  # out2 <- ejscreenit(testpoints_5)

  # View results overall
  round(t(out$results_overall), 3)

  # View plots
   plot_distance_by_group(results_bybg_people = out$results_bybg_people)
   distance_by_group(out$results_bybg_people)

  # View maps
  mapfast(out$results_bysite, radius = 3.1)

  # view results at a single site
  mapfast(out$results_bysite[1, ], radius = 3.1)
  # all the raw numbers at one site
  t(out$results_bysite[1, ])

  # if doing just 1st step of ejamit()
  #  get distance between each site and every nearby Census block

  s2b <- testoutput_getblocksnearby_100pts_1miles
  getblocks_diagnostics(s2b)

  testsites <- testpoints_10[2,]
  s2b <- getblocksnearby(testsites, radius = 3.1)
  getblocks_diagnostics(s2b)
  plotblocksnearby(s2b)

  # if doing just 2nd step of ejamit()
  #  get summaries of all indicators based on table of distances
  out <- doaggregate(s2b, testsites) # this works now and is simpler

} # }