
This table has one row per site-NAICS pair, so a site that falls under multiple NAICS codes appears in multiple rows. This file is not stored in the package; it is obtained via dataload_from_pins().

The EPA also provides an FRS Facility Industrial Classification Search tool where you can find facilities based on NAICS or SIC.

MOST SITES LACK NAICS INFO IN FRS! NAICS is missing for about 80 percent of these facilities.

frs here had about 2.5 million unique REGISTRY_ID values, but frs_by_naics had only about 700k rows, covering about 562,000 unique REGISTRY_ID values and about 2,900 unique NAICS codes.
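As a rough consistency check, the rounded counts quoted above can be compared with the "about 80 percent" figure. This is only a sketch: the two inputs are the approximate values from the text, not values read from the data.

```r
# Approximate counts quoted in the text (rounded, not computed from the data)
n_frs        <- 2.5e6    # unique REGISTRY_ID values in frs
n_with_naics <- 562000   # unique REGISTRY_ID values in frs_by_naics

# Share of facilities with no NAICS code at all
share_missing <- 1 - n_with_naics / n_frs
round(100 * share_missing)   # 78, i.e., roughly 80 percent
```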

length(unique(frs_by_naics$REGISTRY_ID))

length(unique(frs_by_naics[,REGISTRY_ID]))

length(frs_by_naics[, unique(REGISTRY_ID)])

frs_by_naics[,uniqueN(REGISTRY_ID)]

# e.g., 573,411 in mid 2024



        lat       lon  REGISTRY_ID  NAICS
1: 34.04722 -81.15136 110000854246 325211
2: 34.04722 -81.15136 110000854246 325220
3: 34.04722 -81.15136 110000854246 325222
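To illustrate the one-row-per-pair layout, here is a minimal base R sketch on a toy table. The first site is copied from the sample rows above; the second REGISTRY_ID and its NAICS code are made up for illustration. Storing both columns as character is an assumption here, and base R is used only so the sketch runs without data.table.

```r
# Toy long-format table mimicking frs_by_naics.
# The second site (999999999999) is hypothetical, for illustration only.
toy <- data.frame(
  REGISTRY_ID = c("110000854246", "110000854246", "110000854246", "999999999999"),
  NAICS       = c("325211", "325220", "325222", "562212"),
  stringsAsFactors = FALSE
)

nrow(toy)                        # 4 site-NAICS pairs
length(unique(toy$REGISTRY_ID))  # 2 distinct sites

# Number of NAICS codes listed for each site
naics_per_site <- table(toy$REGISTRY_ID)

# Collapse to one row per site, pasting that site's codes together
one_row_per_site <- aggregate(NAICS ~ REGISTRY_ID, data = toy,
                              FUN = paste, collapse = ", ")
one_row_per_site
```

The row count exceeds the count of unique sites whenever a site has more than one NAICS code, which is why frs_by_naics has more rows than unique REGISTRY_ID values.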

See also

frs, frs_from_naics(), naics_categories(), frs_by_programid; see also naics_from_any in the EJAM package.

Examples

# NAICS is missing for about 80 percent of facilities
frs[NAICS == "", .N] / frs[, .N]
# only about 562k facilities have some NAICS info
frs[NAICS != "", .N]
frs_by_naics[, uniqueN(REGISTRY_ID)] # almost exactly matches the above

dim(frs_by_naics)
# about 680k rows here, one per NAICS-registry ID pair,
#  since some IDs have 2 or more NAICS and so appear as 2 or more rows here.

# About 2,900 different NAICS codes appear here:
frs_by_naics[, uniqueN(NAICS)]
frs_by_naics[, .(sum(.N > 1)), by = NAICS][, sum(V1)]
#  2,457 NAICS codes are used to describe more than one Registry ID
frs_by_naics[, .(sum(.N == 1)), by = NAICS][, sum(V1)]
# [1] 425 NAICS codes appear only once, i.e., apply to only a single facility!

# Which 2-digit NAICS are found here most often?
frs_by_naics[, .N, keyby = substr(NAICS, 1, 2)]
# Sorted ascending by N, so the most common (33) is the last row
frs_by_naics[, .N, by = substr(NAICS, 1, 2)][order(N), ]
# Top 10 most common 3-digit NAICS here:
x <- tail(frs_by_naics[, .N, by = .(n3 = substr(NAICS, 1, 3))][order(N), ], 10)
cbind(x, industry = rownames(naics_categories(3))[match(x$n3, naics_categories(3))])