
Uses a lookup table to convert a vector of raw indicator values to percentiles, either nationwide (US) or within a State.

Usage

pctile_from_raw_lookup(
  myvector,
  varname.in.lookup.table,
  lookup = usastats,
  zone = "USA",
  quiet = TRUE
)

Arguments

myvector

Numeric vector, required. Values to look for in the lookup table.

varname.in.lookup.table

Character element, required. Name of the column in the lookup table to search to find the interval in which each element of myvector falls.

If a vector is provided, it must be the same length as myvector, but only 1 value for zone can be provided.

lookup

The lookup table, passed as an unquoted object name. If lookup is not provided, a lookup table called usastats must already be in memory. The table is a data.frame with a PCTILE column, a REGION column, and a column whose name is the value of varname.in.lookup.table. To use state lookups, set lookup = statestats.

zone

Character element (or vector as long as myvector), optional. If specified, it must appear in a column called REGION within the lookup table; otherwise NA is returned for each item looked up and a warning is given. For example, it could be "NY" for New York State or "USA" for national percentiles.
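As a rough illustration of how zone relates to the REGION column, the sketch below builds a small stand-in for a state-level lookup table and selects the rows for one zone. The table toy_statestats, the column myindicator, and all of its values are hypothetical and are not the real statestats data.

```r
# Hypothetical stand-in for a state-level lookup table (not the real statestats).
# "myindicator" is a made-up indicator column with made-up cutpoints.
toy_statestats <- data.frame(
  PCTILE = rep(0:100, times = 2),
  REGION = rep(c("NY", "NJ"), each = 101),
  myindicator = c(0:100, 2 * (0:100))
)

# Rows that a lookup with zone = "NY" would draw on
ny_rows <- toy_statestats[toy_statestats$REGION == "NY", ]
nrow(ny_rows)

# A zone that is absent from REGION matches no rows. This is the situation in
# which the documentation says NA is returned and a warning is given.
zz_rows <- toy_statestats[toy_statestats$REGION == "ZZ", ]
nrow(zz_rows)
```

Here the "NY" selection has one row per percentile (0 through 100), while the unknown zone "ZZ" yields an empty selection.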

quiet

Logical. Set to FALSE to see details on cases where certain scores were all NA values, such as in 1 state.

Value

By default, returns a numeric vector the same length as myvector.

Details

This could be recoded to be more efficient, for example by using data.table.

The data.frame lookup table must have a field called "PCTILE" that has quantiles/percentiles and other column(s) with values that fall at those percentiles. usastats and statestats are such lookup tables.

This function uses a lookup table and finds the number in the PCTILE column that corresponds to where a specified value (in myvector) appears in the column called varname.in.lookup.table. The function just looks for where the specified value fits between values in the lookup table and returns the approximate percentile as found in the PCTILE column.

If the value is between the cutpoints listed as percentiles 89 and 90, it returns 89, for example. If the value is exactly equal to the cutpoint listed as percentile 90, it returns percentile 90.

If the value is exactly the same as the minimum in the lookup table and multiple percentiles in that lookup are listed as tied for having the same threshold value defining the percentile (i.e., a large percent of places have the same score and it is the minimum score), then the percentile gets reported as 0, not the percent of places tied for that minimum score. Note this is true whether they are tied at a value of 0 or are tied at some minimum value other than 0.

If the value is less than the cutpoint listed as percentile 0, which should be the minimum value in the dataset, it still returns 0 as the percentile, but with a warning that the value checked was less than the minimum in the dataset.

It also handles other odd cases, such as when a large percent of all raw scores are tied at the minimum value, in which case it reports 0 as the percentile, not that large percent.
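The lookup rules described above can be illustrated with a minimal base R sketch built on findInterval(). This is not the package's implementation: the helper sketch_pctile(), the table toy_lookup, and the column myindicator are hypothetical names, and the data are invented so that 41 percentile rows (0 through 40) are tied at the minimum score of 0.

```r
# Hypothetical lookup table: PCTILE 0..40 are tied at the minimum (0),
# then PCTILE 41..100 have cutpoints 1..60. Not real usastats data.
toy_lookup <- data.frame(
  PCTILE = 0:100,
  REGION = "USA",
  myindicator = c(rep(0, 41), 1:60)
)

# Hypothetical helper that mimics the documented rules (a sketch only).
sketch_pctile <- function(x, cutpoints, pctiles) {
  # Index of the last cutpoint that is <= x; 0 when x is below every cutpoint
  idx <- findInterval(x, cutpoints)
  if (any(idx == 0, na.rm = TRUE)) {
    warning("some values were less than the minimum in the lookup table")
  }
  idx[which(idx == 0)] <- 1L
  out <- pctiles[idx]
  # Values at (or below) the minimum are reported as percentile 0,
  # even when many percentiles are tied at that minimum cutpoint
  out[which(x <= cutpoints[1])] <- 0
  out
}

usa <- toy_lookup[toy_lookup$REGION == "USA", ]

# Between the cutpoints for 89 and 90, exactly at 90, and tied at the minimum
res <- sketch_pctile(c(49.5, 50, 0), usa$myindicator, usa$PCTILE)
res  # 89 90 0

# Below the minimum: still 0 (the warning is suppressed only to keep the demo quiet)
low <- suppressWarnings(sketch_pctile(-1, usa$myindicator, usa$PCTILE))
low  # 0
```

Without the special case for the minimum, findInterval() alone would report 40 for a score of 0 in this toy table, because it returns the last of the tied cutpoints.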

Examples

if (FALSE) { # \dontrun{

eg <- dput(round(as.vector(unlist(testoutput_ejamit_10pts_1miles$results_overall[ , ..names_d] )),3))

data.frame(value = eg, pctile = t(testoutput_ejamit_10pts_1miles$results_overall[ , ..names_d_pctile]))

data.frame(value = eg, pctile = lookup_pctile(eg, names_d))


  # compare ejscreen API output percentiles to those from this function:
  for (vname in c(names_d[c(1,3:6,8:10)] )) {
     print(pctile_from_raw_lookup(testoutput_ejscreenapi_plus_100[,vname] / 100, vname,
       lookup = usastats)
       - testoutput_ejscreenapi_plus_100[,paste0("pctile.",vname)] )
  }
  for (vname in c(names_e )) {
     print(pctile_from_raw_lookup(testoutput_ejscreenapi_plus_100[,vname], vname,
       lookup = usastats)
         - testoutput_ejscreenapi_plus_100[,paste0("pctile.",vname)] )
  }
} # }