Map BEIS and MEGAN species to CRACMM
author: Havala Pye
date: 2024-08-08
updated: Nash Skipper
date: 2024-08-09
updated: Michael Pye
date: 2025-02-27
Notebook Description
This notebook identifies the CRACMM species for each BEIS and MEGAN species using the CRACMM mapper utilities. The mapper modules (cracmm1_mapper and cracmm2_mapper) depend on rdkit.
Download Notebook
Click here to access the Jupyter Notebook file directly in GitHub where it can be downloaded.
Setup
import pandas as pd
import os
## Install rdkit if not already installed
# !python -m pip install --user rdkit
# to install in the current kernel:
# %pip install rdkit
# set location of mapper downloaded from https://github.com/USEPA/CRACMM/
# import sys
# utildir = '/path/to/cracmm/utilities/directory'
# sys.path.append(utildir)
# Import the python utilities
import cracmm1_mapper as cracmm1 # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 1)
import cracmm2_mapper as cracmm2 # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 2)
datadir = '../emissions/BiogenicMappings/' # data files of mappings
outputdir = os.path.join(os.getcwd(), 'output/')
os.makedirs(outputdir, exist_ok=True) # create the output directory if it does not exist yet
pd.set_option('display.max_rows', None)
pd.options.mode.copy_on_write = True
csvout_kw = dict(sep=',', na_rep='', float_format=None, columns=None, header=True, index=False)
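The commented-out sys.path lines in the setup cell can be wrapped in a small check that reports which mapper modules are still not importable. This is a minimal sketch, not part of the CRACMM utilities: the helper name add_utility_dir is hypothetical, and the path is the same placeholder used in the setup cell.

```python
import importlib.util
import sys
from pathlib import Path


def add_utility_dir(utildir, modules=("cracmm1_mapper", "cracmm2_mapper")):
    """Append utildir to sys.path and return the module names that cannot be found.

    Hypothetical helper for illustration; it only checks that the modules are
    locatable, it does not import them.
    """
    utildir = str(Path(utildir).expanduser().resolve())
    if utildir not in sys.path:
        sys.path.append(utildir)
    # Make sure the import system notices files in the newly added directory.
    importlib.invalidate_caches()
    return [name for name in modules if importlib.util.find_spec(name) is None]


# Placeholder path from the setup cell; replace it with the real utilities directory.
missing = add_utility_dir("/path/to/cracmm/utilities/directory")
if missing:
    print("mapper modules not found:", ", ".join(missing))
```

An empty list means both modules can be located, so the import statements in the setup cell should succeed, provided rdkit is also installed.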
BEIS
Read the BEIS mapping from https://github.com/USEPA/CRACMM/tree/main/emissions/BiogenicMappings
filename = datadir + 'bvoc_beis_tocracmm.csv'
dfbeis = pd.read_csv(filename)
# for checking if any species mapping changed
orig_map_colname = 'CRACMM1' # an existing version in file to compare to, options: CRACMM1, CRACMM2
dfbeis = dfbeis.rename(columns=dict(CRACMMorig=orig_map_colname))
# run cracmm2 mapper
smiles_k = 'SMILES'
koh_k = 'ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED'
cstar_k = 'log10Cstar_ugm3'
dfbeis['CRACMMnew'] = dfbeis.apply(lambda x: cracmm2.get_cracmm_roc(x[smiles_k], x[koh_k], x[cstar_k]), axis=1)
# check if any species mappings changed
dfbeis_checkmatch = dfbeis.eval(f'match = {orig_map_colname}==CRACMMnew')
show_cols = ['SPECIES_NAME',orig_map_colname,'CRACMMnew']
if len(dfbeis_checkmatch[dfbeis_checkmatch.match==False])>0:
    print(f'the species mappings below changed from {orig_map_colname}')
    display(dfbeis_checkmatch[show_cols][dfbeis_checkmatch.match==False])
else:
    print(f'all species matched {orig_map_colname} mapping')
# save output
#dfbeis = dfbeis.drop(columns=orig_map_colname)
dfbeis.to_csv(outputdir+'bvoc_beis_tocracmm.csv', **csvout_kw)
the species mappings below changed from CRACMM1
/work/MOD3DEV/mpye/cracmm_sphinx/utilities/cracmm2_mapper.py:284: UserWarning: Species with SMILES [N]=O is unknown in CRACMM and has been mapped to UNKCRACMM.
warnings.warn(unkcracmm_msg)
/work/MOD3DEV/mpye/cracmm_sphinx/utilities/cracmm2_mapper.py:284: UserWarning: Species with SMILES [C-]#[O+] is unknown in CRACMM and has been mapped to UNKCRACMM.
warnings.warn(unkcracmm_msg)
index | SPECIES_NAME | CRACMM1 | CRACMMnew |
---|---|---|---|
1 | nitric oxide | UNKKOH | UNKCRACMM |
12 | para-cymene | ROCP6ARO | VROCP6ARO |
32 | carbon monoxide | SLOWROC | UNKCRACMM |
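The UserWarning messages above are easy to lose among other cell output. One way to keep them is to record the warnings while the mapper runs and review them afterwards. The sketch below uses a hypothetical stand_in_mapper with a toy lookup in place of cracmm2.get_cracmm_roc, so it does not reproduce the real mapping logic; only the warning text is copied from the output above.

```python
import warnings


def stand_in_mapper(smiles):
    """Hypothetical stand-in for the real mapper; NOT the CRACMM mapping logic."""
    known = {"CC(C)c1ccc(C)cc1": "VROCP6ARO"}  # toy lookup: para-cymene only
    if smiles not in known:
        warnings.warn(
            f"Species with SMILES {smiles} is unknown in CRACMM "
            "and has been mapped to UNKCRACMM."
        )
        return "UNKCRACMM"
    return known[smiles]


def map_and_collect(smiles_list, mapper):
    """Apply mapper to each SMILES string; return (mappings, warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record repeated warnings too
        mapped = [mapper(s) for s in smiles_list]
    return mapped, [str(w.message) for w in caught]


mapped, messages = map_and_collect(
    ["[N]=O", "CC(C)c1ccc(C)cc1", "[C-]#[O+]"], stand_in_mapper
)
for msg in messages:
    print(msg)
```

The same pattern should work around the dfbeis.apply call, assuming the real mapper raises its messages through warnings.warn as the traceback lines above suggest.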
MEGAN
Read the MEGAN mapping from https://github.com/USEPA/CRACMM/tree/main/emissions/BiogenicMappings
filename = datadir + 'bvoc_megan_tocracmm.csv'
dfmegan = pd.read_csv(filename)
# for checking if any species mapping changed
orig_map_colname = 'CRACMM1' # an existing version in file to compare to, options: CRACMM1, CRACMM2
dfmegan = dfmegan.rename(columns=dict(CRACMMorig=orig_map_colname))
# run cracmm2 mapper
smiles_k = 'SMILES'
koh_k = 'ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED'
cstar_k = 'log10Cstar_ugm3'
dfmegan['CRACMMnew'] = dfmegan.apply(lambda x: cracmm2.get_cracmm_roc(x[smiles_k], x[koh_k], x[cstar_k]), axis=1)
# check if any species mappings changed
dfmegan_checkmatch = dfmegan.eval(f'match = {orig_map_colname}==CRACMMnew')
show_cols = ['REPRESENTATIVE_COMPOUND_NAME',orig_map_colname,'CRACMMnew']
if len(dfmegan_checkmatch[dfmegan_checkmatch.match==False])>0:
    print(f'the species mappings below changed from {orig_map_colname}')
    display(dfmegan_checkmatch[show_cols][dfmegan_checkmatch.match==False])
else:
    print(f'all species matched {orig_map_colname} mapping')
# save output
#dfmegan = dfmegan.drop(columns=orig_map_colname)
dfmegan.to_csv(outputdir+'bvoc_megan_tocracmm.csv', **csvout_kw) # write to a MEGAN-specific file so the BEIS output is not overwritten
the species mappings below changed from CRACMM1
/work/MOD3DEV/mpye/cracmm_sphinx/utilities/cracmm2_mapper.py:284: UserWarning: Species with SMILES S is unknown in CRACMM and has been mapped to UNKCRACMM.
warnings.warn(unkcracmm_msg)
/work/MOD3DEV/mpye/cracmm_sphinx/utilities/cracmm2_mapper.py:284: UserWarning: Species with SMILES [C-]#[O+] is unknown in CRACMM and has been mapped to UNKCRACMM.
warnings.warn(unkcracmm_msg)
/work/MOD3DEV/mpye/cracmm_sphinx/utilities/cracmm2_mapper.py:284: UserWarning: Species with SMILES [N]=O is unknown in CRACMM and has been mapped to UNKCRACMM.
warnings.warn(unkcracmm_msg)
index | REPRESENTATIVE_COMPOUND_NAME | CRACMM1 | CRACMMnew |
---|---|---|---|
23 | 1-Isopropyl-2-methylbenzene | XYE | XYL |
24 | p-Cymene | ROCP6ARO | VROCP6ARO |
25 | m-Cymene | XYE | XYL |
26 | 1-Methyl-4-(prop-1-en-2-yl)benzene | XYM | XYL |
27 | dl-Borneol | ROCIOXY | VROCIOXY |
28 | Bornyl acetate | ROCIOXY | VROCIOXY |
30 | Estragole | ROCP6ARO | VROCP6ARO |
32 | beta-Ionone | ROCP6ARO | VROCP6ARO |
34 | 6,6-Dimethylbicyclo[3.1.1]heptane-2-carbaldehyde | ROCIOXY | VROCIOXY |
35 | 1-Octanol | ROCIOXY | VROCIOXY |
36 | 1-Octen-3-ol | ROCP6ARO | VROCP6ARO |
55 | Farnesol | ROCP5ARO | VROCP5ARO |
62 | cis-Nerolidol | ROCP5ARO | VROCP5ARO |
63 | trans-Nerolidol | ROCP5ARO | VROCP5ARO |
67 | 2-Ethylhexyl salicylate | ROCP2ALK | VROCP2OXY2 |
70 | (-)-alpha-Cadinol | ROCP5ARO | VROCP5ARO |
72 | (+)-Cedrol | ROCP5ALK | VROCP5OXY1 |
77 | 3,3,5-Trimethylcyclohexyl salicylate | ROCP1ALK | VROCP1OXY1 |
79 | (-)-Kaur-16-ene | ROCP5ARO | VROCP5ARO |
86 | (+)-Longicyclene | ROCP4ALK | VROCP4ALK |
103 | Decanal | ROCIOXY | VROCIOXY |
104 | (E)-6,10-Dimethylundeca-5,9-dien-2-one | ROCP6ARO | VROCP6ARO |
105 | (E)-6,10-Dimethylundeca-5,9-dien-2-one | ROCP6ARO | VROCP6ARO |
108 | 6,10-Dimethyl-5,9-undecadiene-2-one | ROCP6ARO | VROCP6ARO |
109 | Nonanal | ROCIOXY | VROCIOXY |
110 | 2-Nonenal | ROCP6ARO | VROCP6ARO |
126 | 7-heptadecene | ROCP5ARO | VROCP5ARO |
127 | Acetophenone | ROCP6ARO | VROCP6ARO |
128 | Anisole | XYM | XYL |
131 | Benzyl benzoate | ROCP5ARO | VROCP5ARO |
132 | Benzyl acetate | ROCP6ARO | VROCP6ARO |
138 | Cinnamic acid | ROCP2ALK | VROCP2OXY2 |
139 | Coniferyl alcohol | ROCP1ALK | VROCP1OXY3 |
142 | Ethyl cinnamate | ROCP5ARO | VROCP5ARO |
159 | Jasmone | ROCP6ARO | VROCP6ARO |
162 | Linalool oxide pyranoid, cis-(+-)- | ROCP5ARO | VROCP5ARO |
163 | Linalool oxide pyranoid, cis-(+-)- | ROCP5ARO | VROCP5ARO |
165 | Methyl benzoate | ROCP6ARO | VROCP6ARO |
166 | Methyl jasmonate | ROCP5ARO | VROCP5ARO |
171 | (2E)-3-(4-Hydroxyphenyl)-2-propenoic acid | ROCP1ALK | VROCP1OXY3 |
174 | Safrole | ROCP5ARO | VROCP5ARO |
178 | m-Xylene | XYM | XYL |
181 | (Z)-Hex-3-enyl butyrate | ROCP6ARO | VROCP6ARO |
187 | Diallyl disulfide | ROCP6ARO | VROCP6ARO |
190 | 1-Dodecene | ROCP6ARO | VROCP6ARO |
195 | Hydrogen sulfide | UNKKOH | UNKCRACMM |
196 | Indole | ROCP5ARO | VROCP5ARO |
204 | 3-Methylbut-3-en-1-ol | OLI | OLT |
207 | Allyl propyl disulfide | ROCP6ARO | VROCP6ARO |
209 | 3-Methylindole | ROCP5ARO | VROCP5ARO |
210 | alpha-Terpinyl acetate | ROCP6ARO | VROCP6ARO |
211 | alpha-Terpinyl acetate | ROCP6ARO | VROCP6ARO |
212 | 1-Tetradecene | ROCP5ARO | VROCP5ARO |
214 | Carbon monoxide | SLOWROC | UNKCRACMM |
215 | Nitric oxide | UNKKOH | UNKCRACMM |
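Many rows in the table differ only by a leading V in the new name, while others move to a differently named species or to UNKCRACMM. A quick tally can separate these cases. The sketch below uses a few (CRACMM1, CRACMMnew) pairs copied from the table; the category labels and the classify_change helper are illustrative assumptions, not CRACMM terminology.

```python
from collections import Counter

# A few (CRACMM1, CRACMMnew) pairs copied from the table above.
changes = [
    ("XYE", "XYL"),
    ("ROCP6ARO", "VROCP6ARO"),
    ("ROCIOXY", "VROCIOXY"),
    ("ROCP2ALK", "VROCP2OXY2"),
    ("ROCP5ALK", "VROCP5OXY1"),
    ("UNKKOH", "UNKCRACMM"),
    ("SLOWROC", "UNKCRACMM"),
    ("OLI", "OLT"),
]


def classify_change(old, new):
    """Label a mapping change; the labels are illustrative, not official."""
    if new == "V" + old:
        return "V-prefix only"
    if new == "UNKCRACMM":
        return "now unknown"
    return "different species"


summary = Counter(classify_change(old, new) for old, new in changes)
for label, count in summary.most_common():
    print(f"{label}: {count}")
```

Applied to the full CRACMM1 and CRACMMnew columns, the same function would show how many of the changes are name-only and how many need a closer look.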