Map BEIS and MEGAN species to CRACMM


author: Havala Pye
date: 2024-08-08

updated: Nash Skipper
date: 2024-08-09

updated: Michael Pye
date: 2025-02-27

Notebook Description

This notebook identifies the CRACMM species for each BEIS/MEGAN species using the CRACMM mapper utilities. The mapper modules (cracmm1_mapper and cracmm2_mapper) depend on rdkit.
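
Because the mapper depends on rdkit, it can help to confirm that the package is importable before importing the mapper modules. The following is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical and is not part of the CRACMM utilities.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if has_module("rdkit"):
    print("rdkit is available")
else:
    print("rdkit not found; install it, e.g. with: python -m pip install --user rdkit")
```

The check only looks the module up and does not import it, so it is cheap to run at the top of a notebook.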

Download Notebook

Click here to access the Jupyter Notebook file directly in GitHub where it can be downloaded.

Setup

import pandas as pd
import os
## Install rdkit if not already installed

# !python -m pip install --user rdkit

# to install in the current kernel:
# %pip install rdkit
# set location of mapper downloaded from https://github.com/USEPA/CRACMM/
# import sys
# utildir = '/path/to/cracmm/utilities/directory'   
# sys.path.append(utildir)

# Import the python utilities
import cracmm1_mapper as cracmm1   # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 1)
import cracmm2_mapper as cracmm2   # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 2)
datadir = '../emissions/BiogenicMappings/'    # data files of mappings
outputdir = os.path.join(os.getcwd(), 'output/')
os.makedirs(outputdir, exist_ok=True)         # create the output directory if it does not exist yet
pd.set_option('display.max_rows', None)
pd.options.mode.copy_on_write = True
csvout_kw = dict(sep=',', na_rep='', float_format=None, columns=None, header=True, index=False)
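
The commented-out `sys.path` lines in the setup cell can be wrapped in a small check that the chosen directory really contains the mapper files before it is added to the import path. This is a sketch under assumptions: the helper `add_mapper_dir` is hypothetical, and the file names are inferred from the `import` statements in the setup cell.

```python
import os
import sys


def add_mapper_dir(utildir, required=("cracmm1_mapper.py", "cracmm2_mapper.py")):
    """Add utildir to sys.path if it contains the expected mapper files.

    Returns the list of missing files (empty when everything was found).
    """
    missing = [f for f in required if not os.path.isfile(os.path.join(utildir, f))]
    if not missing and utildir not in sys.path:
        sys.path.append(utildir)
    return missing


# Placeholder location; replace with the path to your CRACMM utilities directory.
missing = add_mapper_dir('/path/to/cracmm/utilities/directory')
if missing:
    print('mapper files not found:', missing)
```

Returning the missing file names, rather than raising, lets the notebook print a readable message before the `import` statements fail.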

BEIS

Input BEIS mapping from https://github.com/USEPA/CRACMM/tree/main/emissions/BiogenicMappings

filename = datadir + 'bvoc_beis_tocracmm.csv' 
dfbeis = pd.read_csv(filename)
# for checking if any species mapping changed
orig_map_colname = 'CRACMM1' # an existing version in file to compare to, options: CRACMM1, CRACMM2
dfbeis = dfbeis.rename(columns=dict(CRACMMorig=orig_map_colname))

# run cracmm2 mapper
smiles_k = 'SMILES'
koh_k    = 'ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED'
cstar_k  = 'log10Cstar_ugm3'
dfbeis['CRACMMnew'] = dfbeis.apply(lambda x: cracmm2.get_cracmm_roc(x[smiles_k], x[koh_k], x[cstar_k]), axis=1)

# check if any species mappings changed
dfbeis_checkmatch = dfbeis.eval(f'match = {orig_map_colname}==CRACMMnew')
show_cols = ['SPECIES_NAME', orig_map_colname, 'CRACMMnew']
changed = dfbeis_checkmatch.loc[~dfbeis_checkmatch['match'], show_cols]   # rows whose mapping differs
if not changed.empty:
    print(f'the species mappings below changed from {orig_map_colname}')
    display(changed)
else:
    print(f'all species matched {orig_map_colname} mapping')

# save output
#dfbeis = dfbeis.drop(columns=orig_map_colname)
dfbeis.to_csv(outputdir+'bvoc_beis_tocracmm.csv', **csvout_kw)

/work/MOD3DEV/mpye/cracmm_sphinx/utilities/cracmm2_mapper.py:284: UserWarning: Species with SMILES [N]=O is unknown in CRACMM and has been mapped to UNKCRACMM.
  warnings.warn(unkcracmm_msg)
/work/MOD3DEV/mpye/cracmm_sphinx/utilities/cracmm2_mapper.py:284: UserWarning: Species with SMILES [C-]#[O+] is unknown in CRACMM and has been mapped to UNKCRACMM.
  warnings.warn(unkcracmm_msg)
the species mappings below changed from CRACMM1
SPECIES_NAME CRACMM1 CRACMMnew
1 nitric oxide UNKKOH UNKCRACMM
12 para-cymene ROCP6ARO VROCP6ARO
32 carbon monoxide SLOWROC UNKCRACMM
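
The UserWarning messages above flag species that the mapper could not place in a CRACMM species and assigned to UNKCRACMM. One way to collect such warnings programmatically is sketched below with the standard-library `warnings` module. The function `toy_mapper` is a hypothetical stand-in that only mimics the warning text shown above; it is not the CRACMM mapper, and its return labels are illustrative only.

```python
import warnings


def toy_mapper(smiles):
    """Hypothetical stand-in for a mapper: warns and returns UNKCRACMM for listed SMILES."""
    unknown = {"[N]=O", "[C-]#[O+]"}   # SMILES flagged in the output above
    if smiles in unknown:
        warnings.warn(f"Species with SMILES {smiles} is unknown in CRACMM "
                      "and has been mapped to UNKCRACMM.")
        return "UNKCRACMM"
    return "MAPPED"   # placeholder label, not a real CRACMM species


def map_and_collect(smiles_list, mapper):
    """Apply mapper to each SMILES; return (results, warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")   # record every warning, including repeats
        results = [mapper(s) for s in smiles_list]
    return results, [str(w.message) for w in caught]


results, messages = map_and_collect(["[N]=O", "CC(C)c1ccc(C)cc1", "[C-]#[O+]"], toy_mapper)
print(results)
for m in messages:
    print(m)
```

Passing the real mapper as a small lambda in place of `toy_mapper` would, under the same assumptions, return the unknown-species messages as a list instead of printing them between the cell outputs.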

MEGAN

Input MEGAN mapping from https://github.com/USEPA/CRACMM/tree/main/emissions/BiogenicMappings

filename = datadir + 'bvoc_megan_tocracmm.csv'
dfmegan = pd.read_csv(filename)
# for checking if any species mapping changed
orig_map_colname = 'CRACMM1' # an existing version in file to compare to, options: CRACMM1, CRACMM2
dfmegan = dfmegan.rename(columns=dict(CRACMMorig=orig_map_colname))

# run cracmm2 mapper
smiles_k = 'SMILES'
koh_k    = 'ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED'
cstar_k  = 'log10Cstar_ugm3'
dfmegan['CRACMMnew'] = dfmegan.apply(lambda x: cracmm2.get_cracmm_roc(x[smiles_k], x[koh_k], x[cstar_k]), axis=1)

# check if any species mappings changed
dfmegan_checkmatch = dfmegan.eval(f'match = {orig_map_colname}==CRACMMnew')
show_cols = ['REPRESENTATIVE_COMPOUND_NAME', orig_map_colname, 'CRACMMnew']
changed = dfmegan_checkmatch.loc[~dfmegan_checkmatch['match'], show_cols]   # rows whose mapping differs
if not changed.empty:
    print(f'the species mappings below changed from {orig_map_colname}')
    display(changed)
else:
    print(f'all species matched {orig_map_colname} mapping')

# save output
#dfmegan = dfmegan.drop(columns=orig_map_colname)
dfmegan.to_csv(outputdir+'bvoc_megan_tocracmm.csv', **csvout_kw)   # MEGAN file name, so the BEIS output is not overwritten

/work/MOD3DEV/mpye/cracmm_sphinx/utilities/cracmm2_mapper.py:284: UserWarning: Species with SMILES S is unknown in CRACMM and has been mapped to UNKCRACMM.
  warnings.warn(unkcracmm_msg)
/work/MOD3DEV/mpye/cracmm_sphinx/utilities/cracmm2_mapper.py:284: UserWarning: Species with SMILES [C-]#[O+] is unknown in CRACMM and has been mapped to UNKCRACMM.
  warnings.warn(unkcracmm_msg)
/work/MOD3DEV/mpye/cracmm_sphinx/utilities/cracmm2_mapper.py:284: UserWarning: Species with SMILES [N]=O is unknown in CRACMM and has been mapped to UNKCRACMM.
  warnings.warn(unkcracmm_msg)
the species mappings below changed from CRACMM1
REPRESENTATIVE_COMPOUND_NAME CRACMM1 CRACMMnew
23 1-Isopropyl-2-methylbenzene XYE XYL
24 p-Cymene ROCP6ARO VROCP6ARO
25 m-Cymene XYE XYL
26 1-Methyl-4-(prop-1-en-2-yl)benzene XYM XYL
27 dl-Borneol ROCIOXY VROCIOXY
28 Bornyl acetate ROCIOXY VROCIOXY
30 Estragole ROCP6ARO VROCP6ARO
32 beta-Ionone ROCP6ARO VROCP6ARO
34 6,6-Dimethylbicyclo[3.1.1]heptane-2-carbaldehyde ROCIOXY VROCIOXY
35 1-Octanol ROCIOXY VROCIOXY
36 1-Octen-3-ol ROCP6ARO VROCP6ARO
55 Farnesol ROCP5ARO VROCP5ARO
62 cis-Nerolidol ROCP5ARO VROCP5ARO
63 trans-Nerolidol ROCP5ARO VROCP5ARO
67 2-Ethylhexyl salicylate ROCP2ALK VROCP2OXY2
70 (-)-alpha-Cadinol ROCP5ARO VROCP5ARO
72 (+)-Cedrol ROCP5ALK VROCP5OXY1
77 3,3,5-Trimethylcyclohexyl salicylate ROCP1ALK VROCP1OXY1
79 (-)-Kaur-16-ene ROCP5ARO VROCP5ARO
86 (+)-Longicyclene ROCP4ALK VROCP4ALK
103 Decanal ROCIOXY VROCIOXY
104 (E)-6,10-Dimethylundeca-5,9-dien-2-one ROCP6ARO VROCP6ARO
105 (E)-6,10-Dimethylundeca-5,9-dien-2-one ROCP6ARO VROCP6ARO
108 6,10-Dimethyl-5,9-undecadiene-2-one ROCP6ARO VROCP6ARO
109 Nonanal ROCIOXY VROCIOXY
110 2-Nonenal ROCP6ARO VROCP6ARO
126 7-heptadecene ROCP5ARO VROCP5ARO
127 Acetophenone ROCP6ARO VROCP6ARO
128 Anisole XYM XYL
131 Benzyl benzoate ROCP5ARO VROCP5ARO
132 Benzyl acetate ROCP6ARO VROCP6ARO
138 Cinnamic acid ROCP2ALK VROCP2OXY2
139 Coniferyl alcohol ROCP1ALK VROCP1OXY3
142 Ethyl cinnamate ROCP5ARO VROCP5ARO
159 Jasmone ROCP6ARO VROCP6ARO
162 Linalool oxide pyranoid, cis-(+-)- ROCP5ARO VROCP5ARO
163 Linalool oxide pyranoid, cis-(+-)- ROCP5ARO VROCP5ARO
165 Methyl benzoate ROCP6ARO VROCP6ARO
166 Methyl jasmonate ROCP5ARO VROCP5ARO
171 (2E)-3-(4-Hydroxyphenyl)-2-propenoic acid ROCP1ALK VROCP1OXY3
174 Safrole ROCP5ARO VROCP5ARO
178 m-Xylene XYM XYL
181 (Z)-Hex-3-enyl butyrate ROCP6ARO VROCP6ARO
187 Diallyl disulfide ROCP6ARO VROCP6ARO
190 1-Dodecene ROCP6ARO VROCP6ARO
195 Hydrogen sulfide UNKKOH UNKCRACMM
196 Indole ROCP5ARO VROCP5ARO
204 3-Methylbut-3-en-1-ol OLI OLT
207 Allyl propyl disulfide ROCP6ARO VROCP6ARO
209 3-Methylindole ROCP5ARO VROCP5ARO
210 alpha-Terpinyl acetate ROCP6ARO VROCP6ARO
211 alpha-Terpinyl acetate ROCP6ARO VROCP6ARO
212 1-Tetradecene ROCP5ARO VROCP5ARO
214 Carbon monoxide SLOWROC UNKCRACMM
215 Nitric oxide UNKKOH UNKCRACMM
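
Many of the changes listed above differ only by a leading V on the new name (for example ROCP6ARO to VROCP6ARO), while others point to a different species (for example XYE to XYL). The sketch below separates the two cases in plain Python. Treating the V-prefix cases as renames rather than reassignments is an inference from the table, not something the notebook states, and the helper name `classify_change` is hypothetical.

```python
def classify_change(old, new):
    """Label a mapping change as 'unchanged', 'prefix_only' (new == 'V' + old), or 'reassigned'."""
    if old == new:
        return "unchanged"
    if new == "V" + old:
        return "prefix_only"
    return "reassigned"


# A few (name, CRACMM1, CRACMMnew) rows copied from the MEGAN output above
rows = [
    ("p-Cymene", "ROCP6ARO", "VROCP6ARO"),
    ("m-Cymene", "XYE", "XYL"),
    ("(+)-Cedrol", "ROCP5ALK", "VROCP5OXY1"),
    ("Hydrogen sulfide", "UNKKOH", "UNKCRACMM"),
]
for name, old, new in rows:
    print(f"{name}: {old} -> {new} ({classify_change(old, new)})")
```

Applied to the full table, a grouping like this would make it easier to review the reassigned rows separately from the ones that only gained the prefix.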