Introduction

This vignette explores the CTX Exposure API.

Data provided by the Exposure API are broadly organized into four areas: Functional Use Information, Product Data, List Presence Data, and Exposure Estimates. Data from the Functional Use, Product Data, and List Presence resources (aside from the Functional Use Probability endpoint) are developed from publicly available documents and are also accessible through the Chemical Exposure Knowledgebase (ChemExpo), an interactive web application developed by the United States Environmental Protection Agency (EPA). The underlying database for the Functional Use, Product Data, and List Presence endpoints of the Exposure API and ChemExpo is the Chemicals and Products Database (CPDat). CPDat provides reported information on how chemicals are used in commerce and (where possible) at what quantities they occur in consumer and industrial products; see (Dionisio et al. 2018) for more information on CPDat. The data provided by the Functional Use Probability endpoint are predictions from EPA’s Quantitative Structure Use Relationship (QSUR) models (Phillips et al. 2017). Exposure data consist of predictions from the httk R package, introduced in (Pearce, R. et al. 2017), and from several exposure models, including the SEEM models. Information on the SEEM2 model can be found in (Wambaugh, J. et al. 2014), and information on the SEEM3 model in (Ring, C. et al. 2018).

Product Data are organized by harmonized Product Use Categories (PUCs). The PUCs are assigned to products (which are associated with Composition Documents) and indicate the type of product associated with each data record. They are organized hierarchically, with General Category containing Product Family, which in turn contains Product Type. The Exposure API also provides information on how the PUC was assigned. Note that PUCs with “classificationmethod” equal to “Automatic” are assigned by a natural language processing model. As such, these assignments are less certain and may contain inaccuracies. More information on PUC categories can be found in (Isaacs et al. 2020).

List Presence Data reflect the occurrence of chemicals on lists present in publicly available documents (sourced from a variety of federal and state agencies and trade associations). These lists are tagged with List Presence Keywords (LPKs) that together describe the information in the document relevant to how the chemical was used. LPKs are an updated version of the cassettes provided in the Chemical and Product Categories (CPCat) database; see (Dionisio et al. 2015). For the most up-to-date information on the current LPKs and to see how the CPCat cassettes were updated, see (Koval et al. 2022).

Both reported and predicted Functional Use Information are available. Reported functional use information is organized by harmonized Function Categories (FCs) that describe the role a chemical serves in a product or industrial process. The harmonized technical function categories and definitions were developed by the Organisation for Economic Co-operation and Development (OECD), with the exception of a few categories unique to consumer products, which are noted as being developed by EPA. These categories have been augmented with additional categories needed to describe chemicals in personal care, pharmaceutical, or other commercial sectors. The reported function data form the basis for ORD’s QSUR models (Phillips et al. 2017). These models provide the structure-based predictions of chemical function available in the Functional Use Probability endpoint. Note that these models were developed prior to the OECD function categories, so their function categories are not yet aligned with the harmonized categories used in the reported data. Updated models for the harmonized categories are under development.

The R package httk provides users with a variety of tools to incorporate toxicokinetics and in vitro-in vivo extrapolation into bioinformatics, and it comes with pre-made models that can be used with specific chemical data. The SEEM models were developed to provide predictions of potential human exposure to chemicals with little or no exposure data. For SEEM2, Bayesian methods were used to infer ranges of exposure consistent with data from the National Health and Nutrition Examination Survey, and predictions were made for different demographic groups. For SEEM3, chemical exposures through four different pathways were predicted, and the models for these pathways were then weighted to produce consensus predictions.

Information for ChemExpo is sourced from: Sakshi Handa, Katherine A. Phillips, Kenta Baron-Furuyama, and Kristin K. Isaacs. 2023. “ChemExpo Knowledgebase User Guide”. https://comptox.epa.gov/chemexpo/static/user_guide/index.html.

NOTE: Please see the introductory vignette for an overview of the ctxR package and initial setup instructions, including API key storage.

Several ctxR functions can be used to access the CTX Exposure API data, as described in the following sections. Tables output in each example have been filtered to only display the first few rows of data.

Functional Use Resource

Functional uses for chemicals may be searched.

Functional Use

get_exposure_functional_use() retrieves FCs and associated metadata for a specific chemical (by DTXSID).

exp_fun_use <- get_exposure_functional_use(DTXSID = 'DTXSID7020182')
| id | dtxsid | datatype | docid | doctitle | docdate | reportedfunction | functioncategory |
|---|---|---|---|---|---|---|---|
| 22724 | DTXSID7020182 | Chemical presence list | 1371471 | The 25 Chemicals Found in All Nine of the Biosolids Studied | | fire retardant | Flame retardant |
| 22722 | DTXSID7020182 | Chemical presence list | 1497376 | A regional assessment of chemicals of concern in surface waters of four Midwestern United States national parks- Table 1 | 6 December 2016 | plastic component | NA |
| 22728 | DTXSID7020182 | Composition | 1389481 | halcyon radiant barrier shades | 2015-08-25 | monomer, polycarbonate | Monomers |
| 22726 | DTXSID7020182 | Composition | 1550827 | Thin-Set_Epoxy_Terrazzo_Flooring-Master_Terrazzo_Technologies-2017-02-02 | february 2, 2017 | curing agent | Hardener |
| 22727 | DTXSID7020182 | Composition | 1389695 | kerapoxy 410- part b | 2015-01-19 | resin amide | NA |
| 22732 | DTXSID7020182 | Composition | 1390773 | universal litter and recycling receptacle - 30 gallon with | 2016-03-15 | monomer, polycarbonate | Monomers |
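
Once functional use records have been retrieved, a common next step is to summarize them by harmonized category. The sketch below does not call the API; it assumes a small toy data.frame (the hypothetical `exp_fun_use_toy`) whose values are copied from the example rows above, and it uses only base R.

```r
# Toy stand-in for the result of get_exposure_functional_use();
# values are copied from the example rows shown above.
exp_fun_use_toy <- data.frame(
  id = c(22724, 22722, 22728, 22726, 22727, 22732),
  datatype = c("Chemical presence list", "Chemical presence list",
               "Composition", "Composition", "Composition", "Composition"),
  reportedfunction = c("fire retardant", "plastic component",
                       "monomer, polycarbonate", "curing agent",
                       "resin amide", "monomer, polycarbonate"),
  functioncategory = c("Flame retardant", NA, "Monomers",
                       "Hardener", NA, "Monomers"),
  stringsAsFactors = FALSE
)

# Count records per harmonized function category, keeping NA as its own group
fc_counts <- table(exp_fun_use_toy$functioncategory, useNA = "ifany")
fc_counts

# Reported functions that have no harmonized category assigned
unmapped <- unique(
  exp_fun_use_toy$reportedfunction[is.na(exp_fun_use_toy$functioncategory)]
)
unmapped
```

Keeping the NA group visible makes it easy to see how many reported functions have not been mapped to an FC.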

Functional Use Probability

get_exposure_functional_use_probability() retrieves the probability of functional use within different FCs for a given chemical (by DTXSID). Each value represents the probability of the chemical being classified as having this function, as predicted by the QSUR models.

exp_fun_use_prob <- get_exposure_functional_use_probability(DTXSID = 'DTXSID7020182')
| harmonizedFunctionalUse | probability |
|---|---|
| antimicrobial | 0.3722 |
| antioxidant | 0.8941 |
| catalyst | 0.2031 |
| colorant | 0.1560 |
| crosslinker | 0.7743 |
| flame_retardant | 0.2208 |
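
To focus on the most likely predicted functions, the probabilities can be filtered and ranked. The following is a minimal sketch using a toy data.frame (the hypothetical `qsur_toy`) built from the values shown above. The 0.5 cutoff is an illustrative assumption, not a threshold recommended by the QSUR authors.

```r
# Toy stand-in for the result of get_exposure_functional_use_probability();
# values are copied from the example output above.
qsur_toy <- data.frame(
  harmonizedFunctionalUse = c("antimicrobial", "antioxidant", "catalyst",
                              "colorant", "crosslinker", "flame_retardant"),
  probability = c(0.3722, 0.8941, 0.2031, 0.1560, 0.7743, 0.2208),
  stringsAsFactors = FALSE
)

# Illustrative cutoff (an assumption chosen for this example only)
cutoff <- 0.5

# Keep functions at or above the cutoff, most probable first
likely <- qsur_toy[qsur_toy$probability >= cutoff, ]
likely <- likely[order(likely$probability, decreasing = TRUE), ]
likely
```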

Functional Use Categories

get_exposure_functional_use_categories() retrieves definitions of all the available FCs. This is not specific to a chemical, but rather a list of all FCs.

| id | title | description |
|---|---|---|
| 28 | Coalescing agent | Chemical substance used in polymer emulsions that lower the glass-transition temperature (Tg) which results in the decrease in the minimum film-forming temperature (MFT) and upon evaporation, yields a hard film. Used in polishes; e.g., glycol; ether; pyrrolidines; and benzoates. Also referred to as a minimum film-forming temperature (MFT) modifier. |
| 29 | Conductive agent | Chemical substance used to conduct electrical current. Also referred to as an electrolyte; or electrode material. |
| 30 | Corrosion inhibitor | Chemical substance used to prevent or retard corrosion on metallic materials. Used in many products packaged in metal containers (such as aerosol products). Used in lubricants and other metal treatment products to provide protection to the substrates or surfaces on which the lubricants are used. Also referred to as a corrosion‑inhibiting additive; rust preventative; anticorrosion agent; or antirust agent. |
| 16 | Anti-static agent | Chemical substance that prevents or reduces the tendency of a material to accumulate a static charge or alters the electrical properties of materials by reducing their tendency to acquire an electrical charge. Used in diesel fuel to prevent the build-up of static electricity. Also referred to as a charge stabilizer. |
| 17 | Anti-streaking agent | Chemical substance which serves to enhance evaporation or reduce film formation in order to prevent the formation of streaks on a surface during cleaning. Also referred to as a film reducer. |
| 18 | Binder | Chemical substances that are either synthetic/polymeric resins that further polymerize, provide structure and cohesiveness or are substances added to compounded dry powders to provide adhesive qualities during and after compression to make tablets or cakes. Also referred to as a binding agent or resin. |

Product Data Resource

There are a few resources for retrieving product use data associated with chemical identifiers (DTXSID) or general use.

Product Data

get_exposure_product_data() retrieves the product data (PUCs and related data) for products that use the specified chemical (by DTXSID).

exp_prod_dat <- get_exposure_product_data(DTXSID = 'DTXSID7020182')
id dtxsid docid doctitle docdate productname gencat prodfam prodtype classificationmethod rawmincomp rawmaxcomp rawcentralcomp unittype lowerweightfraction upperweightfraction centralweightfraction weightfractiontype component
589086 DTXSID7020182 1297201 EPOCAST 87005 B-60, FPC2172 08/07/1991 epocast 87005 b-60_ fpc2172 Raw materials adhesives Manual 5 percent NA NA 0.05 reported
657348 DTXSID7020182 1314861 EPOCAST 87005 B-80, FPC2248 08/04/1992 epocast 87005 b-80_ fpc2248 Raw materials adhesives Manual 2 percent NA NA 0.02 reported
192133 DTXSID7020182 1178630 EPOCAST HARDENER 946, FPC 5000 01/11/1990 epocast hardener 946_ fpc 5000 NA NA NA NA 45 percent NA NA 0.45 reported
655734 DTXSID7020182 1314342 EPOCAST HARDNER 946 10/04/1994 epocast hardner 946 NA NA NA NA NA NA NA NA reported
85935 DTXSID7020182 1135022 EPOLITE 1301 HARDENER(FORM. 916 AD HARDENER 01/01/1985 epolite 1301 hardener(form. 916 ad hardener NA NA NA NA NA NA NA NA reported
25284 DTXSID7020182 1106842 EPOLITE 1350 HANDENER 08/09/1993 epolite 1350 handener NA NA NA NA 35 50 percent 0.35 0.5 NA reported
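
Because PUCs assigned by the natural language processing model are less certain, it can be useful to restrict an analysis to manually assigned records. The sketch below works on a toy data.frame (the hypothetical `prod_toy`). Its first three rows are loosely based on the example output above, and the fourth row is invented purely to illustrate an "Automatic" assignment.

```r
# Toy stand-in for the result of get_exposure_product_data().
# Rows 1-3 are based on the example output; row 4 is hypothetical.
prod_toy <- data.frame(
  productname = c("epocast 87005 b-60_ fpc2172",
                  "epocast 87005 b-80_ fpc2248",
                  "epocast hardener 946_ fpc 5000",
                  "hypothetical product"),
  gencat = c("Raw materials", "Raw materials", NA, "Raw materials"),
  classificationmethod = c("Manual", "Manual", NA, "Automatic"),
  centralweightfraction = c(0.05, 0.02, 0.45, 0.10),
  stringsAsFactors = FALSE
)

# Keep only manually assigned PUCs. %in% treats NA as a non-match,
# so records without any PUC assignment are dropped as well.
manual_only <- prod_toy[prod_toy$classificationmethod %in% "Manual", ]

# Range of central weight fractions among manually classified products
wf_range <- range(manual_only$centralweightfraction)
wf_range
```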

Product Use Category Data

get_exposure_product_data_puc() retrieves the definitions of all the PUCs. This is not specific to a chemical, but rather a list of all PUCs.

exp_prod_data_puc <- get_exposure_product_data_puc()
id kindName genCat prodfam prodtype definition
45 Formulation Cleaning products and household care oven cleaning or other products used in or on ovens that do not fit in a more refined category
44 Formulation Cleaning products and household care metal specific cleaning or care products specific to metals, which do not fit into a more refined category
43 Formulation Cleaning products and household care laundry and fabric treatment Cleaning or care products for laundry and fabric treatment which do not fit into a more refined category
42 Formulation Cleaning products and household care laundry and fabric treatment anti-static spray anti-static sprays for fabrics (spray formulation assumed)
291 Formulation Cleaning products and household care bathroom urinal cakes and deodorizers Includes urinal cakes and screens for use in urinals, also includes deodorizers used in portable toilets
292 Formulation Cleaning products and household care jewelry jewelry cleaner Cleaning solutions specifically for jewelry products

httk data

A single resource returns httk model data when available: get_httk_data() retrieves httk predictions for a given chemical (by DTXSID).

bpa_httk <- get_httk_data(DTXSID = 'DTXSID7020182')
head(bpa_httk)
#>       id        dtxsid    parameter measuredText measured predictedText
#> 1 101171 DTXSID7020182          Css       0.0083   0.0083         1.114
#> 2 101172 DTXSID7020182          Css       0.0083   0.0083        0.5297
#> 3 101173 DTXSID7020182          Css       0.0083   0.0083         1.076
#> 4 101174 DTXSID7020182          Css       0.0083   0.0083        0.5116
#> 5 101175 DTXSID7020182 TK.Half.Life         0.19   0.1900         139.5
#> 6 101176 DTXSID7020182     Days.Css           NA       NA           112
#>   predicted units          model              reference percentile species
#> 1    1.1140  mg/L           PBTK Wambaugh et al. (2018)        95%     Rat
#> 2    0.5297  mg/L           PBTK Wambaugh et al. (2018)        50%     Rat
#> 3    1.0760  mg/L 3compartmentss Wambaugh et al. (2018)        95%     Rat
#> 4    0.5116  mg/L 3compartmentss Wambaugh et al. (2018)        50%     Rat
#> 5  139.5000 hours   1compartment Wambaugh et al. (2018)         NA     Rat
#> 6  112.0000  Days           PBTK                     NA         NA     Rat
#>   dataSourceSpecies dataVersion                  importDate
#> 1               Rat          NA 2024-06-13T16:53:14.622350Z
#> 2               Rat          NA 2024-06-13T16:53:14.622350Z
#> 3               Rat          NA 2024-06-13T16:53:14.622350Z
#> 4               Rat          NA 2024-06-13T16:53:14.622350Z
#> 5               Rat          NA 2024-06-13T16:53:14.622350Z
#> 6               Rat          NA 2024-06-13T16:53:14.622350Z
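
The httk output mixes several parameters, models, and percentiles in one table, so comparisons are easier after subsetting to one parameter. The sketch below assumes a toy data.frame (the hypothetical `httk_toy`) holding a few columns copied from the output above, and cross-tabulates the steady-state concentration (Css) predictions by model and percentile using base R.

```r
# Toy stand-in for a few columns of the get_httk_data() result;
# values are copied from the example output above.
httk_toy <- data.frame(
  parameter = c("Css", "Css", "Css", "Css", "TK.Half.Life", "Days.Css"),
  predicted = c(1.1140, 0.5297, 1.0760, 0.5116, 139.5, 112),
  units = c("mg/L", "mg/L", "mg/L", "mg/L", "hours", "Days"),
  model = c("PBTK", "PBTK", "3compartmentss", "3compartmentss",
            "1compartment", "PBTK"),
  percentile = c("95%", "50%", "95%", "50%", NA, NA),
  stringsAsFactors = FALSE
)

# Restrict to one parameter so that all values share the same units
css <- httk_toy[httk_toy$parameter == "Css", ]

# One row per model, one column per percentile.
# Each cell holds a single value, so mean() simply returns it.
css_tab <- with(css, tapply(predicted, list(model, percentile), mean))
css_tab
```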

List Presence Resource

There are a few resources for retrieving list data for specific chemicals (by DTXSID) or general list presence information.

List Presence Tags

get_exposure_list_presence_tags() retrieves all the list presence keywords. This is not specific to a chemical, but rather a list of all the list presence keywords. Note that some List Presence Keywords align with PUCs, but the keywords are assigned to documents that refer to a product category as a whole, while PUCs are assigned to documents referring to specific products (e.g., an ingredient list).

| id | tagName | tagDefinition | kindName |
|---|---|---|---|
| 52 | detected | chemicals measured or identified in environmental media or products | Modifiers |
| 53 | drinking_water | water intended for drinking, or related to drinking water; includes bottled water, finished water from drinking water treatment plants, and untreated water that has been denoted as a drinking source | Media |
| 54 | Electronics/small appliances - computers and accessories/supplies - printer ink | ink for inkjet printers | PUC - formulation |
| 25 | Canada | Sources specific to Canada | Location |
| 26 | CEDI | The U.S. FDA Cumulative Estimated Daily Intake database of publicly available cumulative estimated daily intakes (CEDIs) for a large number of food contact substances | Specialty list |
| 27 | children | pertaining to, or intended for use specifically by children | Subpopulation |

List Presence Tag Data

get_exposure_list_presence_tags_by_dtxsid() retrieves LPKs and associated data for a specific chemical (by DTXSID).

exp_list_tags_dat <- get_exposure_list_presence_tags_by_dtxsid(DTXSID = 'DTXSID7020182')
id dtxsid docid doctitle docsubtitle docdate organization reportedfunction functioncategory component keywordset
127967 DTXSID7020182 1557970 Experimental Small Molecule Drugs DrugBank NA NA Canada; pharmaceutical
9997 DTXSID7020182 1359540 Actively Registered AI’s by Common Name California Department of Pesticide Regulation NA NA active_ingredient; Pesticides
40538 DTXSID7020182 1372213 Indirect Additives used in Food Contact Substances FDA authorizes Indirect Food Additives by identity, intended use, and conditions of use; the presence of a substance in this list indicates that only certain intended uses and use conditions are authorized by FDA regulations 10/4/2018 FDA NA NA Indirect additives food contact (10/2018)
135524 DTXSID7020182 1558005 Chemicals of High Concern to Children 9/1/2020 Vermont Department of Health NA NA children
113048 DTXSID7020182 1551584 Chemicals of high concern to children reporting list State of Washington Department of Ecology NA NA children; WA Children’s Safe Product Act (4/2020)
76175 DTXSID7020182 1373540 Exposure of children and unborn children to selected chemical substances - Table 4.2.3 Table 4.2.3 Regulation in the food area Apr-17 Danish Environmental Protection Agency NA NA Europe; Food contact items
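
Each record's keywordset column can hold several LPKs in one string. The sketch below splits them into individual keywords and counts them. It assumes, based only on the example output above, that keywords are separated by semicolons, and it uses a toy data.frame (the hypothetical `lpk_toy`) rather than a live API call.

```r
# Toy stand-in for two columns of the
# get_exposure_list_presence_tags_by_dtxsid() result;
# values are copied from the example output above.
lpk_toy <- data.frame(
  docid = c(1557970, 1359540, 1558005, 1373540),
  keywordset = c("Canada; pharmaceutical",
                 "active_ingredient; Pesticides",
                 "children",
                 "Europe; Food contact items"),
  stringsAsFactors = FALSE
)

# Split on the semicolon separator (an assumption drawn from the
# example output) and strip the surrounding whitespace
keywords <- trimws(unlist(strsplit(lpk_toy$keywordset, ";", fixed = TRUE)))

# Frequency of each individual keyword across documents
keyword_counts <- sort(table(keywords), decreasing = TRUE)
keyword_counts
```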

Exposure Predictions

There are two functions that provide access to exposure prediction data. The first provides general information on exposure pathways, while the second provides exposure predictions from a variety of exposure models. The general information corresponds to SEEM3 predictions of exposure pathways, while the exposure predictions feature SEEM2 predictions broken down by demographic group, general consensus predictions from SEEM3, and in some cases additional exposure predictions from other models.

General Exposure Predictions

get_general_exposure_prediction() returns general exposure information for a given chemical.

bpa_general_exposure <- get_general_exposure_prediction(DTXSID = 'DTXSID7020182')
head(bpa_general_exposure)
#>           dtxsid productionVolume  units stockholmConvention probabilityDietary
#>           <char>            <int> <char>               <int>              <num>
#> 1: DTXSID7020182          2780000 kg/day                   0                  1
#> 5 variable(s) not shown: [probabilityResidential <num>, probabilityPesticde <num>, probabilityIndustrial <num>, dataVersion <lgcl>, importDate <char>]

Demographic Exposure Predictions

get_demographic_exposure_prediction() returns exposure prediction information split across different demographics for a given chemical.

bpa_demographic_exposure <- get_demographic_exposure_prediction(DTXSID = 'DTXSID7020182')
bpa_demographic_exposure
#>        id        dtxsid        demographic       predictor       median
#> 1  768361 DTXSID7020182              Total    Food.Contact 1.766000e-02
#> 2  769393 DTXSID7020182              Total            FINE 9.460000e-06
#> 3  772655 DTXSID7020182              Total          RAIDAR 3.770000e+00
#> 4  784083 DTXSID7020182              Total     USETox.Pest 5.624000e-02
#> 5  785935 DTXSID7020182              Total   USETox.Indust 1.372000e-04
#> 6  749502 DTXSID7020182            Age 66+ SEEM2 Heuristic 6.608350e-05
#> 7  751534 DTXSID7020182           BMI > 30 SEEM2 Heuristic 7.073042e-05
#> 8  760855 DTXSID7020182              Total  SHEDS.Indirect 7.150000e-05
#> 9  761591 DTXSID7020182              Total    SHEDS.Direct 0.000000e+00
#> 10 763267 DTXSID7020182          BMI <= 30 SEEM2 Heuristic 6.245051e-05
#> 11 488214 DTXSID7020182              Total SEEM3 Consensus 5.497000e-05
#> 12 797784 DTXSID7020182              Total      USETox.Res 4.395000e-02
#> 13 807431 DTXSID7020182              Total     USETox.Diet 1.498000e-04
#> 14 709226 DTXSID7020182              Males SEEM2 Heuristic 3.867956e-05
#> 15 735410 DTXSID7020182          Age 12-19 SEEM2 Heuristic 5.871957e-05
#> 16 697139 DTXSID7020182 Repro. Age Females SEEM2 Heuristic 1.364275e-05
#> 17 711258 DTXSID7020182            Females SEEM2 Heuristic 1.244431e-05
#> 18 737451 DTXSID7020182          Age 20-65 SEEM2 Heuristic 5.675943e-05
#> 19 723306 DTXSID7020182           Age 6-11 SEEM2 Heuristic 6.296203e-05
#>              medianText          l95              l95Text         u95
#> 1               0.01766           NA                   NA          NA
#> 2              9.46e-06           NA                   NA          NA
#> 3                  3.77           NA                   NA          NA
#> 4               0.05624           NA                   NA          NA
#> 5             0.0001372           NA                   NA          NA
#> 6  6.60834995383669e-05 2.798634e-07  2.7986341540408e-07 0.019477870
#> 7  7.07304192271297e-05 3.136219e-07 3.13621919723853e-07 0.018576052
#> 8              7.15e-05           NA                   NA          NA
#> 9                     0           NA                   NA          NA
#> 10  6.2450508333388e-05 2.591822e-07 2.59182177179327e-07 0.013621125
#> 11            5.497e-05 1.923000e-07            1.923e-07 0.020440000
#> 12              0.04395           NA                   NA          NA
#> 13            0.0001498           NA                   NA          NA
#> 14 3.86795578537834e-05 2.846711e-07 2.84671057884619e-07 0.006306170
#> 15 5.87195691748974e-05 2.809632e-07 2.80963221822448e-07 0.017185596
#> 16 1.36427543462443e-05 5.637240e-08 5.63723993835891e-08 0.004176617
#> 17 1.24443070751952e-05 4.901108e-08 4.90110833197268e-08 0.002897798
#> 18 5.67594250809775e-05 2.080289e-07 2.08028872989558e-07 0.011509267
#> 19 6.29620332442998e-05 3.049913e-07 3.04991342892185e-07 0.010537090
#>                u95Text           units ad      reference dataVersion
#> 1                   NA       mg/kg/day  1    Biryol 2017          NA
#> 2                   NA          mg/day  1      Shin 2012          NA
#> 3                   NA       mg/kg/day  1     Arnot 2008          NA
#> 4                   NA intake fraction  1    Fantke 2013          NA
#> 5                   NA intake fraction  1 Rosenbaum 2008          NA
#> 6   0.0194778699251516       mg/kg/day  1  Wambaugh 2014          NA
#> 7   0.0185760522525412       mg/kg/day  1  Wambaugh 2014          NA
#> 8                   NA       mg/kg/day  1    Isaacs 2017          NA
#> 9                   NA       mg/kg/day  1    Isaacs 2017          NA
#> 10  0.0136211249503816       mg/kg/day  1  Wambaugh 2014          NA
#> 11             0.02044       mg/kg/day  1      Ring 2018          NA
#> 12                  NA intake fraction  1     Huang 2016          NA
#> 13                  NA intake fraction  1  Ernstoff 2016          NA
#> 14 0.00630617035849566       mg/kg/day  1  Wambaugh 2014          NA
#> 15  0.0171855959252902       mg/kg/day  1  Wambaugh 2014          NA
#> 16 0.00417661734132225       mg/kg/day  1  Wambaugh 2014          NA
#> 17 0.00289779809405841       mg/kg/day  1  Wambaugh 2014          NA
#> 18  0.0115092672875229       mg/kg/day  1  Wambaugh 2014          NA
#> 19  0.0105370896882791       mg/kg/day  1  Wambaugh 2014          NA
#>                     importDate
#> 1  2024-06-13T19:25:16.277317Z
#> 2  2024-06-13T19:25:16.277317Z
#> 3  2024-06-13T19:25:16.277317Z
#> 4  2024-06-13T19:25:16.277317Z
#> 5  2024-06-13T19:25:16.277317Z
#> 6  2024-06-13T19:25:16.277317Z
#> 7  2024-06-13T19:25:16.277317Z
#> 8  2024-06-13T19:25:16.277317Z
#> 9  2024-06-13T19:25:16.277317Z
#> 10 2024-06-13T19:25:16.277317Z
#> 11 2024-06-13T19:25:16.277317Z
#> 12 2024-06-13T19:25:16.277317Z
#> 13 2024-06-13T19:25:16.277317Z
#> 14 2024-06-13T19:25:16.277317Z
#> 15 2024-06-13T19:25:16.277317Z
#> 16 2024-06-13T19:25:16.277317Z
#> 17 2024-06-13T19:25:16.277317Z
#> 18 2024-06-13T19:25:16.277317Z
#> 19 2024-06-13T19:25:16.277317Z
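
The demographic table combines predictors that report different quantities (mg/kg/day, mg/day, and intake fraction), so medians should only be compared within a single predictor. The sketch below isolates the SEEM2 rows and ranks the demographic groups by median. It uses a toy data.frame (the hypothetical `demo_toy`) whose values are rounded copies of a subset of the rows above.

```r
# Toy stand-in for a few columns of the
# get_demographic_exposure_prediction() result;
# values are rounded copies of a subset of the rows shown above.
demo_toy <- data.frame(
  demographic = c("Total", "Total", "Total", "Age 66+", "Age 6-11",
                  "Age 12-19", "Age 20-65", "Males", "Females"),
  predictor = c("RAIDAR", "USETox.Pest", "SEEM3 Consensus",
                rep("SEEM2 Heuristic", 6)),
  median = c(3.77, 5.624e-02, 5.497e-05, 6.608e-05, 6.296e-05,
             5.872e-05, 5.676e-05, 3.868e-05, 1.244e-05),
  units = c("mg/kg/day", "intake fraction", rep("mg/kg/day", 7)),
  stringsAsFactors = FALSE
)

# Keep one predictor so that every row shares the same model and units
seem2 <- demo_toy[demo_toy$predictor == "SEEM2 Heuristic", ]

# Rank demographic groups from highest to lowest median prediction
seem2 <- seem2[order(seem2$median, decreasing = TRUE), ]
seem2[, c("demographic", "median", "units")]
```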

There are batch search versions of several endpoints that gather data specific to a chemical, namely get_exposure_functional_use_batch(), get_exposure_functional_use_probability_batch(), get_exposure_product_data_batch(), get_exposure_list_presence_tags_by_dtxsid_batch(), get_general_exposure_prediction_batch(), and get_demographic_exposure_prediction_batch(). The function get_exposure_functional_use_probability_batch() returns a data.table in which each row corresponds to a unique chemical and each column represents a functional use category associated with at least one input chemical. The other batch functions return a named list of data.frames or data.tables, where the names are the unique input chemicals and each element holds the information for the corresponding chemical.
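
A named list of per-chemical tables is often easier to analyze after being stacked into a single table with an identifier column. The sketch below shows one way to do this in base R. The list `batch_toy` is a hypothetical stand-in for a batch result, and its contents (including the caffeine row) are invented for illustration.

```r
# Hypothetical stand-in for a named-list batch result;
# the contents are invented for illustration.
batch_toy <- list(
  DTXSID7020182 = data.frame(
    functioncategory = c("Flame retardant", "Monomers"),
    stringsAsFactors = FALSE
  ),
  DTXSID0020232 = data.frame(
    functioncategory = "Flavorant",
    stringsAsFactors = FALSE
  )
)

# Prepend each list name as a dtxsid column. rep() keeps this safe
# for chemicals whose table has zero rows.
with_id <- Map(
  function(df, id) {
    data.frame(dtxsid = rep(id, nrow(df)), df, stringsAsFactors = FALSE)
  },
  batch_toy, names(batch_toy)
)

# Stack the per-chemical tables into one data.frame
combined <- do.call(rbind, with_id)
rownames(combined) <- NULL
combined
```

This assumes every element has the same columns. If the columns differ between chemicals, a function that fills missing columns, such as data.table::rbindlist() with fill = TRUE, would be needed instead.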

Functional Use Probability Batch

We demonstrate how the individual results differ from the batch results when retrieving functional use probabilities.

bpa_prob <- get_exposure_functional_use_probability(DTXSID = 'DTXSID7020182')
caf_prob <- get_exposure_functional_use_probability(DTXSID = 'DTXSID0020232')

bpa_caf_prob <- get_exposure_functional_use_probability_batch(DTXSID = c('DTXSID7020182', 'DTXSID0020232'))
#>    harmonizedFunctionalUse probability
#> 1            antimicrobial      0.3722
#> 2              antioxidant      0.8941
#> 3                 catalyst      0.2031
#> 4                 colorant      0.1560
#> 5              crosslinker      0.7743
#> 6          flame_retardant      0.2208
#> 7                flavorant      0.0314
#> 8                fragrance      0.2071
#> 9          heat_stabilizer      0.5119
#> 10        skin_conditioner      0.1168
#> 11         skin_protectant      0.3306
#> 12             uv_absorber      0.8046
#>   harmonizedFunctionalUse probability
#> 1           antimicrobial      0.4808
#> 2                  buffer      0.6370
#> 3                colorant      0.3962
#> 4        skin_conditioner      0.9821
#>           DTXSID antimicrobial antioxidant catalyst colorant crosslinker
#>           <char>         <num>       <num>    <num>    <num>       <num>
#> 1: DTXSID7020182        0.3722      0.8941   0.2031   0.1560      0.7743
#> 2: DTXSID0020232        0.4808          NA       NA   0.3962          NA
#> 8 variable(s) not shown: [flame_retardant <num>, flavorant <num>, fragrance <num>, heat_stabilizer <num>, skin_conditioner <num>, skin_protectant <num>, uv_absorber <num>, buffer <num>]

Observe that caffeine has probabilities assigned to only four functional use categories, while bisphenol A has probabilities assigned to twelve. For a single-chemical search, each row corresponds to a functional use category. When using the batch search function, however, all reported categories are included as columns, with one row per chemical. If a chemical does not have a probability associated with a functional use, the corresponding entry is NA.
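
The relationship between the two layouts can be reproduced by hand. The sketch below stacks two single-chemical tables in long form and pivots them to one row per chemical with base R. It is only an illustration of the reshaping and is not claimed to be how the batch function works internally. The toy tables use a subset of the probabilities shown above.

```r
# Toy single-chemical results; values are copied from a subset
# of the example output above.
bpa <- data.frame(
  harmonizedFunctionalUse = c("antimicrobial", "antioxidant",
                              "colorant", "skin_conditioner"),
  probability = c(0.3722, 0.8941, 0.1560, 0.1168),
  stringsAsFactors = FALSE
)
caf <- data.frame(
  harmonizedFunctionalUse = c("antimicrobial", "buffer",
                              "colorant", "skin_conditioner"),
  probability = c(0.4808, 0.6370, 0.3962, 0.9821),
  stringsAsFactors = FALSE
)

# Long form: one row per chemical-category pair
long <- rbind(
  data.frame(DTXSID = "DTXSID7020182", bpa, stringsAsFactors = FALSE),
  data.frame(DTXSID = "DTXSID0020232", caf, stringsAsFactors = FALSE)
)

# Wide form: one row per chemical, one column per category.
# tapply() fills chemical-category pairs with no data with NA.
wide <- with(long, tapply(probability,
                          list(DTXSID, harmonizedFunctionalUse), mean))
wide
```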

Conclusion

ctxR provides a function for each CTX Exposure API endpoint, along with batch versions for some of them. These functions allow users to access various types of exposure data associated with a given chemical. In this vignette, we explored all of the non-batch versions and discussed the batch versions. We encourage users to experiment with the different endpoints to better understand what kinds of data are available.