
Introduction

This vignette explores the CTX Exposure API.

NOTE: Please see the introductory vignette for an overview of the ctxR package and initial setup instructions, including API key storage.

Data provided by the Exposure API are broadly organized into three areas: Functional Use Information, Product Data, and List Presence Data. These data (except those from the Functional Use Probability endpoint) are developed from publicly available documents and are also accessible through the Chemical Exposure Knowledgebase (ChemExpo), an interactive web application developed by the United States Environmental Protection Agency (EPA). The underlying database for both the Exposure API and ChemExpo is the Chemicals and Products Database (CPDat). CPDat provides reported information on how chemicals are used in commerce and (where possible) at what quantities they occur in consumer and industrial products; see Dionisio et al. (2018) for more information on CPDat. The data provided by the Functional Use Probability endpoint are predictions from EPA’s Quantitative Structure Use Relationship (QSUR) models (Phillips et al., 2017).

Product Data are organized by harmonized Product Use Categories (PUCs). PUCs are assigned to products (which are associated with Composition Documents) and indicate the type of product associated with each data record. They are organized hierarchically, with General Category containing Product Family, which in turn contains Product Type. The Exposure API also provides information on how each PUC was assigned. Note that PUCs with “classificationmethod” equal to “Automatic” were assigned by a machine learning model, so these assignments may be incorrect. More information on PUCs can be found in Isaacs et al. (2020).

List Presence Data reflect the occurrence of chemicals on lists present in publicly available documents (sourced from a variety of federal and state agencies and trade associations). These lists are tagged with List Presence Keywords (LPKs) that together describe the information in the document relevant to how the chemical was used. LPKs are an updated version of the cassettes provided in the Chemical and Product Categories (CPCat) database; see Dionisio et al. (2015). For the most up-to-date information on the current LPKs and to see how the CPCat cassettes were updated, see Koval et al. (2022).

Both reported and predicted Functional Use Information are available. Reported functional use information is organized by harmonized Function Categories (FCs) that describe the role a chemical serves in a product or industrial process. The harmonized technical function categories and definitions were developed by the Organization for Economic Co-operation and Development (OECD), with the exception of a few categories unique to consumer products, which are noted as being developed by EPA. These categories have been augmented with additional categories needed to describe chemicals in personal care, pharmaceutical, or other commercial sectors. The reported function data form the basis for the QSUR models developed by EPA’s Office of Research and Development (ORD) (Phillips et al., 2017). These models provide the structure-based predictions of chemical function available in the Functional Use Probability endpoint. Note that these models were developed prior to the OECD function categories, so their function categories are not yet aligned with the harmonized categories used in the reported data. Updated models for the harmonized categories are under development.

Information on ChemExpo is sourced from:

Sakshi Handa, Katherine A. Phillips, Kenta Baron-Furuyama, and Kristin K. Isaacs. 2023. “ChemExpo Knowledgebase User Guide”. https://comptox.epa.gov/chemexpo/static/user_guide/index.html.

Functions

Several ctxR functions are used to access the CTX Exposure API data.

Functional Use Resource

Functional uses for chemicals may be searched.

Exposure Functional Use

get_exposure_functional_use() retrieves FCs and associated exposure data for a specific chemical (by DTXSID).

exp_fun_use <- get_exposure_functional_use(DTXSID = 'DTXSID7020182')
head(data.table::as.data.table(exp_fun_use))
#>       id        dtxsid               datatype   docid
#>    <int>        <char>                 <char>   <int>
#> 1: 22724 DTXSID7020182 Chemical presence list 1371471
#> 2: 22722 DTXSID7020182 Chemical presence list 1497376
#> 3: 22728 DTXSID7020182            Composition 1389481
#> 4: 22726 DTXSID7020182            Composition 1550827
#> 5: 22727 DTXSID7020182            Composition 1389695
#> 6: 22732 DTXSID7020182            Composition 1390773
#> 4 variable(s) not shown: [doctitle <char>, docdate <char>, reportedfunction <char>, functioncategory <char>]
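
As a small follow-up, the returned records can be tallied by their source data type. The sketch below does not call the API: it re-enters the six rows printed above into a plain data.frame (`exp_fun_use_demo` is a hypothetical name used only here) so that it runs offline. With a live result, the same `table()` call should apply to `exp_fun_use$datatype`, assuming that column is present as shown in the output above.

```r
# Offline stand-in for the six rows printed above.
# Assumption: a live result has the same `datatype` column.
exp_fun_use_demo <- data.frame(
  id = c(22724L, 22722L, 22728L, 22726L, 22727L, 22732L),
  dtxsid = "DTXSID7020182",
  datatype = c("Chemical presence list", "Chemical presence list",
               "Composition", "Composition", "Composition", "Composition")
)

# Count records per source data type
datatype_counts <- table(exp_fun_use_demo$datatype)
datatype_counts
```

For these six rows, four records come from Composition documents and two from Chemical presence lists.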

Exposure Functional Use Probability

get_exposure_functional_use_probability() retrieves the predicted probability of functional use within each FC for a given chemical (by DTXSID). Note that these values are not a probability distribution across categories; each value is the probability that the chemical is used within that particular FC, so the values need not sum to 1.

exp_fun_use_prob <- get_exposure_functional_use_probability(DTXSID = 'DTXSID7020182')
exp_fun_use_prob
#>    harmonizedFunctionalUse probability
#> 1            antimicrobial      0.3722
#> 2              antioxidant      0.8941
#> 3                 catalyst      0.2031
#> 4                 colorant      0.1560
#> 5              crosslinker      0.7743
#> 6          flame_retardant      0.2208
#> 7                flavorant      0.0314
#> 8                fragrance      0.2071
#> 9          heat_stabilizer      0.5119
#> 10        skin_conditioner      0.1168
#> 11         skin_protectant      0.3306
#> 12             uv_absorber      0.8046
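
To make the interpretation of these values concrete, the sketch below re-enters the printed output into a data.frame (`prob_demo` is a hypothetical name) and checks two things offline: the total of the probabilities, and which categories exceed a cutoff. The cutoff of 0.5 is an arbitrary value chosen for illustration, not a threshold recommended by ctxR or EPA.

```r
# Offline copy of the output printed above
prob_demo <- data.frame(
  harmonizedFunctionalUse = c("antimicrobial", "antioxidant", "catalyst",
                              "colorant", "crosslinker", "flame_retardant",
                              "flavorant", "fragrance", "heat_stabilizer",
                              "skin_conditioner", "skin_protectant",
                              "uv_absorber"),
  probability = c(0.3722, 0.8941, 0.2031, 0.1560, 0.7743, 0.2208,
                  0.0314, 0.2071, 0.5119, 0.1168, 0.3306, 0.8046)
)

# Each probability is specific to its own category, so the total exceeds 1
total_prob <- sum(prob_demo$probability)

# Keep categories above an arbitrary illustrative cutoff, highest first
cutoff <- 0.5
likely <- prob_demo[prob_demo$probability > cutoff, ]
likely <- likely[order(likely$probability, decreasing = TRUE), ]
likely$harmonizedFunctionalUse
```

Here the total is well above 1, and the categories above the cutoff are antioxidant, uv_absorber, crosslinker, and heat_stabilizer.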

Exposure Functional Use Categories

get_exposure_functional_use_category() retrieves all the FCs. The result is not specific to a chemical; it is a list of all FCs.

exp_fun_use_cat <- get_exposure_functional_use_category()
head(data.table::as.data.table(exp_fun_use_cat))
#>       id                title               description
#>    <int>               <char>                    <char>
#> 1:    28     Coalescing agent Chemical substance used i
#> 2:    29     Conductive agent Chemical substance used t
#> 3:    30  Corrosion inhibitor Chemical substance used t
#> 4:    16    Anti-static agent Chemical substance that p
#> 5:    17 Anti-streaking agent Chemical substance which 
#> 6:    18               Binder Chemical substances that

Product Data Resource

There are a few resources for retrieving product use data associated with chemical identifiers (DTXSID) or general use.

Exposure Product Data

get_exposure_product_data() retrieves the product data (PUCs and related data) for products that use the specified chemical (by DTXSID).

exp_prod_dat <- get_exposure_product_data(DTXSID = 'DTXSID7020182')
head(data.table::as.data.table(exp_prod_dat))
#>        id        dtxsid   docid                  doctitle     docdate
#>     <int>        <char>   <int>                    <char>      <char>
#> 1: 589086 DTXSID7020182 1297201 EPOCAST 87005 B-60, FPC21  08/07/1991
#> 2: 657348 DTXSID7020182 1314861 EPOCAST 87005 B-80, FPC22  08/04/1992
#> 3: 192133 DTXSID7020182 1178630 EPOCAST HARDENER 946, FPC  01/11/1990
#> 4: 655734 DTXSID7020182 1314342       EPOCAST HARDNER 946  10/04/1994
#> 5:  85935 DTXSID7020182 1135022 EPOLITE 1301 HARDENER(FOR  01/01/1985
#> 6:  25284 DTXSID7020182 1106842     EPOLITE 1350 HANDENER  08/09/1993
#> 14 variable(s) not shown: [productname <char>, gencat <char>, prodfam <char>, prodtype <char>, classificationmethod <char>, rawmincomp <char>, rawmaxcomp <char>, rawcentralcomp <char>, unittype <char>, lowerweightfraction <num>, ...]
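
Because PUCs with `classificationmethod` equal to "Automatic" are assigned by a machine learning model, it can be useful to separate those records before further analysis. The sketch below runs offline on a toy data.frame. Only the column names `productname` and `classificationmethod` and the value "Automatic" come from this vignette; the rows themselves, including the "Manual" value, are invented placeholders and not real API results.

```r
# Toy rows for illustration only; these are NOT real API results.
prod_demo <- data.frame(
  productname = c("Product A", "Product B", "Product C"),
  classificationmethod = c("Automatic", "Manual", "Automatic")
)

# Flag model-assigned PUCs; %in% treats a missing method as "not Automatic"
prod_demo$model_assigned <- prod_demo$classificationmethod %in% "Automatic"

# Split into records to review and records assigned by other methods
to_review <- prod_demo[prod_demo$model_assigned, ]
other <- prod_demo[!prod_demo$model_assigned, ]
nrow(to_review)
```

With a live result, the same flag could presumably be added to `exp_prod_dat` directly, since its `classificationmethod` column is listed in the output above.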

Exposure Product Use Category Data

get_exposure_product_data_puc() retrieves the PUCs. This is not specific to a chemical, but rather a list of all PUCs.

exp_prod_data_puc <- get_exposure_product_data_puc()
head(data.table::as.data.table(exp_prod_data_puc))
#>       id    kindName                    genCat                   prodfam
#>    <int>      <char>                    <char>                    <char>
#> 1:    45 Formulation Cleaning products and hou                      oven
#> 2:    44 Formulation Cleaning products and hou            metal specific
#> 3:    43 Formulation Cleaning products and hou laundry and fabric treatm
#> 4:    42 Formulation Cleaning products and hou laundry and fabric treatm
#> 5:   291 Formulation Cleaning products and hou                  bathroom
#> 6:   292 Formulation Cleaning products and hou                   jewelry
#> 2 variable(s) not shown: [prodtype <char>, definition <char>]

List Presence Resource

There are a few resources for retrieving list data for specific chemicals (by DTXSID) or general list presence information.

List Presence Tags

get_exposure_list_presence_tags() retrieves all the list presence tag information (including LPKs). The result is not specific to a chemical; it is a list of all the list presence tags.

exp_list_tags <- get_exposure_list_presence_tags()
head(data.table::as.data.table(exp_list_tags))
#>       id                   tagName             tagDefinition
#>    <int>                    <char>                    <char>
#> 1:    52                  detected chemicals measured or ide
#> 2:    53            drinking_water water intended for drinki
#> 3:    54 Electronics/small applian   ink for inkjet printers
#> 4:    25                    Canada Sources specific to Canad
#> 5:    26                      CEDI The U.S. FDA Cumulative E
#> 6:    27                  children pertaining to, or intende
#> 1 variable(s) not shown: [kindName <char>]

List Presence Tag Data

get_exposure_list_presence_tags_by_dtxsid() retrieves LPKs and associated data for a specific chemical (by DTXSID).

exp_list_tags_dat <- get_exposure_list_presence_tags_by_dtxsid(DTXSID = 'DTXSID7020182')
head(data.table::as.data.table(exp_list_tags_dat))
#>        id        dtxsid   docid                  doctitle
#>     <int>        <char>   <int>                    <char>
#> 1: 127967 DTXSID7020182 1557970 Experimental Small Molecu
#> 2:   9997 DTXSID7020182 1359540 Actively Registered AI's 
#> 3:  40538 DTXSID7020182 1372213 Indirect Additives used i
#> 4: 135524 DTXSID7020182 1558005 Chemicals of High Concern
#> 5: 113048 DTXSID7020182 1551584 Chemicals of high concern
#> 6:  76175 DTXSID7020182 1373540 Exposure of children and 
#> 7 variable(s) not shown: [docsubtitle <char>, docdate <char>, organization <char>, reportedfunction <char>, functioncategory <char>, component <char>, keywordset <char>]

Batch versions exist for several endpoints that gather data specific to a chemical: get_exposure_functional_use_batch(), get_exposure_functional_use_probability_batch(), get_exposure_product_data_batch(), and get_exposure_list_presence_tags_by_dtxsid_batch(). The function get_exposure_functional_use_probability_batch() returns a data.table in which each row corresponds to a unique chemical and each column represents a functional use category associated with at least one input chemical. The other three batch functions return a named list of data.frames, where the names are the unique input chemicals and each data.frame holds the information for the corresponding chemical.
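
When a batch function returns a named list of data.frames, a common next step is to stack the pieces into one table that records which chemical each row belongs to. The sketch below is a minimal offline illustration in base R. The list `batch_demo` and the helper `add_id()` are hypothetical names, the rows are placeholders rather than real API output, and the approach assumes every element of the list has the same columns.

```r
# Hand-built stand-in for a batch result: a named list of data.frames.
# The rows are placeholders, not real API output.
batch_demo <- list(
  DTXSID7020182 = data.frame(datatype = c("Composition", "Chemical presence list")),
  DTXSID0020232 = data.frame(datatype = "Composition")
)

# Hypothetical helper: record the chemical identifier on every row
add_id <- function(df, id) {
  df$DTXSID <- rep(id, nrow(df))
  df
}

# Tag each element with its list name, then stack the pieces row-wise
tagged <- Map(add_id, batch_demo, names(batch_demo))
stacked <- do.call(rbind, unname(tagged))
stacked
```

Using `rep(id, nrow(df))` keeps the helper safe for elements with zero rows, which could occur if a chemical has no records for an endpoint.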

Functional use probability batch

We demonstrate how the individual results differ from the batch results when retrieving functional use probabilities.

bpa_prob <- get_exposure_functional_use_probability(DTXSID = 'DTXSID7020182')
caf_prob <- get_exposure_functional_use_probability(DTXSID = 'DTXSID0020232')

bpa_caf_prob <- get_exposure_functional_use_probability_batch(DTXSID = c('DTXSID7020182', 'DTXSID0020232'))

# Print the two single-chemical results, then the batch result
bpa_prob
caf_prob
bpa_caf_prob
#>    harmonizedFunctionalUse probability
#> 1            antimicrobial      0.3722
#> 2              antioxidant      0.8941
#> 3                 catalyst      0.2031
#> 4                 colorant      0.1560
#> 5              crosslinker      0.7743
#> 6          flame_retardant      0.2208
#> 7                flavorant      0.0314
#> 8                fragrance      0.2071
#> 9          heat_stabilizer      0.5119
#> 10        skin_conditioner      0.1168
#> 11         skin_protectant      0.3306
#> 12             uv_absorber      0.8046
#>   harmonizedFunctionalUse probability
#> 1           antimicrobial      0.4808
#> 2                  buffer      0.6370
#> 3                colorant      0.3962
#> 4        skin_conditioner      0.9821
#>           DTXSID antimicrobial antioxidant catalyst colorant crosslinker
#>           <char>         <num>       <num>    <num>    <num>       <num>
#> 1: DTXSID7020182        0.3722      0.8941   0.2031   0.1560      0.7743
#> 2: DTXSID0020232        0.4808          NA       NA   0.3962          NA
#> 8 variable(s) not shown: [flame_retardant <num>, flavorant <num>, fragrance <num>, heat_stabilizer <num>, skin_conditioner <num>, skin_protectant <num>, uv_absorber <num>, buffer <num>]

Observe that caffeine (DTXSID0020232) has probabilities assigned to only four functional use categories, while bisphenol A (DTXSID7020182) has probabilities assigned to twelve. In a single-chemical search, each row corresponds to a functional use category. In the batch search, every category reported for at least one input chemical becomes a column, and each row corresponds to a chemical. If a chemical has no probability for a given functional use, the corresponding entry is NA.
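
The relationship between the two layouts can be reproduced offline with base R. The sketch below re-enters the two single-chemical outputs printed above (as `bpa_demo` and `caf_demo`, hypothetical names) and pivots them into one row per chemical. It is only an illustration of the reshaping idea, not a description of how ctxR builds the batch result: the output here is a matrix rather than a data.table, and its rows and columns are sorted alphabetically.

```r
# Offline copies of the two single-chemical outputs printed above
bpa_demo <- data.frame(
  harmonizedFunctionalUse = c("antimicrobial", "antioxidant", "catalyst",
                              "colorant", "crosslinker", "flame_retardant",
                              "flavorant", "fragrance", "heat_stabilizer",
                              "skin_conditioner", "skin_protectant",
                              "uv_absorber"),
  probability = c(0.3722, 0.8941, 0.2031, 0.1560, 0.7743, 0.2208,
                  0.0314, 0.2071, 0.5119, 0.1168, 0.3306, 0.8046)
)
caf_demo <- data.frame(
  harmonizedFunctionalUse = c("antimicrobial", "buffer", "colorant",
                              "skin_conditioner"),
  probability = c(0.4808, 0.6370, 0.3962, 0.9821)
)

# Stack into long form with a chemical identifier column
long <- rbind(
  data.frame(DTXSID = "DTXSID7020182", bpa_demo),
  data.frame(DTXSID = "DTXSID0020232", caf_demo)
)

# One row per chemical, one column per category; cells with no value are NA
wide <- tapply(
  long$probability,
  list(long$DTXSID, long$harmonizedFunctionalUse),
  function(p) p[1]
)
dim(wide)
```

The result has two rows and thirteen columns (the twelve bisphenol A categories plus buffer), with NA wherever a chemical has no value for a category, for example `wide["DTXSID0020232", "antioxidant"]`.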

Conclusion

The CTX Exposure API has several endpoints, and ctxR provides a function for each, along with batch versions for some. These functions allow users to access various types of exposure data associated with a given chemical. In this vignette, we explored all of the non-batch versions and discussed the batch versions. We encourage users to experiment with the different endpoints to better understand what data are available.