{
"cells": [
{
"cell_type": "markdown",
"id": "0cfa8e4e-8935-4017-b8ff-965ba562f422",
"metadata": {},
"source": [
"# Map BEIS and MEGAN species to CRACMM\n",
"\n",
"---\n",
" author: Havala Pye\n",
" date: 2024-08-08\n",
"\n",
" updated: Nash Skipper\n",
" date: 2024-08-09\n",
"\n",
" updated: Michael Pye\n",
" date: 2025-02-27\n",
"---\n",
"## Notebook Description\n",
"This Notebook identifies the CRACMM species for each BEIS/MEGAN species using the mapper. The cracmm_mapper function depends on [rdkit](https://www.rdkit.org/).\n",
"\n",
"## Download Notebook\n",
"Click [here](https://github.com/USEPA/CRACMM/blob/main/utilities/BEISMEGAN_biogenicmapping2cracmm.ipynb) to access the Jupyter Notebook file directly in GitHub where it can be downloaded. \n"
]
},
{
"cell_type": "markdown",
"id": "6f34a062-2b03-4a18-a0e1-fa37e26ec722",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84fc493b-a369-4729-8317-e9401f76a869",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import os"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f4562478-e1bf-4902-8297-5dfe9e5e0c18",
"metadata": {},
"outputs": [],
"source": [
"## Install rdkit if not already installed\n",
"\n",
"# !python -m pip install --user rdkit\n",
"\n",
"# to install in the current kernel:\n",
"# %pip install rdkit"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f83cc394-9140-4266-8347-cfd84cf04bdd",
"metadata": {},
"outputs": [],
"source": [
"# set location of mapper downloaded from https://github.com/USEPA/CRACMM/\n",
"# import sys\n",
"# utildir = '/path/to/cracmm/utilities/directory' \n",
"# sys.path.append(utildir)\n",
"\n",
"# Import the python utilities\n",
"import cracmm1_mapper as cracmm1 # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 1)\n",
"import cracmm2_mapper as cracmm2 # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "193f4b5f-54b3-4a84-ab36-9b99c6281f54",
"metadata": {},
"outputs": [],
"source": [
"datadir = '../emissions/BiogenicMappings/' # data files of mappings\n",
"outputdir = os.path.join(os.getcwd(), 'output/')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d3608277-b1f5-40ac-9783-f21bbd3078e7",
"metadata": {},
"outputs": [],
"source": [
"pd.set_option('display.max_rows', None)\n",
"pd.options.mode.copy_on_write = True\n",
"csvout_kw = dict(sep=',', na_rep='', float_format=None, columns=None, header=True, index=False)"
]
},
{
"cell_type": "markdown",
"id": "ab6828cf-4cd7-46e5-a52c-e49581d560c2",
"metadata": {},
"source": [
"## BEIS\n",
"input beis mapping from https://github.com/USEPA/CRACMM/tree/main/emissions/BiogenicMappings"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1dd4414-363a-40c2-b575-a8560231b230",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"the species mappings below changed from CRACMM2alpha\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/work/MOD3DEV/tskipper/cracmm_hcho/CRACMM_REPO/emissions/cracmm2/cracmm_mapper.py:285: UserWarning: Species with SMILES [N]=O is unknown in CRACMM and has been mapped to UNKCRACMM.\n",
" warnings.warn(unkcracmm_msg)\n",
"/work/MOD3DEV/tskipper/cracmm_hcho/CRACMM_REPO/emissions/cracmm2/cracmm_mapper.py:285: UserWarning: Species with SMILES [C-]#[O+] is unknown in CRACMM and has been mapped to UNKCRACMM.\n",
" warnings.warn(unkcracmm_msg)\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" SPECIES_NAME | \n",
" CRACMM2alpha | \n",
" CRACMM2 | \n",
"
\n",
" \n",
" \n",
" \n",
" 12 | \n",
" para-cymene | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 32 | \n",
" carbon monoxide | \n",
" SLOWROC | \n",
" UNKCRACMM | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" SPECIES_NAME CRACMM2alpha CRACMM2\n",
"12 para-cymene ROCP6ARO VROCP6ARO\n",
"32 carbon monoxide SLOWROC UNKCRACMM"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"filename = datadir + 'bvoc_beis_tocracmm.csv' \n",
"dfbeis = pd.read_csv(filename)\n",
"# for checking if any species mapping changed\n",
"orig_map_colname = 'CRACMM1' # an existing version in file to compare to, options: CRACMM1, CRACMM2\n",
"dfbeis = dfbeis.rename(columns=dict(CRACMMorig=orig_map_colname))\n",
"\n",
"# run cracmm2 mapper\n",
"smiles_k = 'SMILES'\n",
"koh_k = 'ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED'\n",
"cstar_k = 'log10Cstar_ugm3'\n",
"dfbeis['CRACMMnew'] = dfbeis.apply(lambda x: cracmm2.get_cracmm_roc(x[smiles_k], x[koh_k], x[cstar_k]), axis=1)\n",
"\n",
"# check if any species mappings changed\n",
"dfbeis_checkmatch = dfbeis.eval(f'match = {orig_map_colname}==CRACMMnew')\n",
"show_cols = ['SPECIES_NAME',orig_map_colname,'CRACMMnew']\n",
"if len(dfbeis_checkmatch[dfbeis_checkmatch.match==False])>0:\n",
" print(f'the species mappings below changed from {orig_map_colname}')\n",
" display(dfbeis_checkmatch[show_cols][dfbeis_checkmatch.match==False])\n",
"else:\n",
" print(f'all species matched {orig_map_colname} mapping')\n",
"\n",
"# save output\n",
"#dfbeis = dfbeis.drop(columns=orig_map_colname)\n",
"dfbeis.to_csv(outputdir+'bvoc_beis_tocracmm.csv', **csvout_kw)"
]
},
{
"cell_type": "markdown",
"id": "12d0c9c8-be4d-44ae-b20d-650f33207c71",
"metadata": {},
"source": [
"## MEGAN\n",
"input megan mapping from https://github.com/USEPA/CRACMM/tree/main/emissions/BiogenicMappings"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d69adc5d-bddb-478a-8af9-ad4dd4774dbe",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/work/MOD3DEV/tskipper/cracmm_hcho/CRACMM_REPO/emissions/cracmm2/cracmm_mapper.py:285: UserWarning: Species with SMILES S is unknown in CRACMM and has been mapped to UNKCRACMM.\n",
" warnings.warn(unkcracmm_msg)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"the species mappings below changed from CRACMM2alpha\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/work/MOD3DEV/tskipper/cracmm_hcho/CRACMM_REPO/emissions/cracmm2/cracmm_mapper.py:285: UserWarning: Species with SMILES [C-]#[O+] is unknown in CRACMM and has been mapped to UNKCRACMM.\n",
" warnings.warn(unkcracmm_msg)\n",
"/work/MOD3DEV/tskipper/cracmm_hcho/CRACMM_REPO/emissions/cracmm2/cracmm_mapper.py:285: UserWarning: Species with SMILES [N]=O is unknown in CRACMM and has been mapped to UNKCRACMM.\n",
" warnings.warn(unkcracmm_msg)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" REPRESENTATIVE_COMPOUND_NAME | \n",
" CRACMM2alpha | \n",
" CRACMM2 | \n",
"
\n",
" \n",
" \n",
" \n",
" 24 | \n",
" p-Cymene | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 30 | \n",
" Estragole | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 32 | \n",
" beta-Ionone | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 36 | \n",
" 1-Octen-3-ol | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 55 | \n",
" Farnesol | \n",
" ROCP5ARO | \n",
" VROCP5ARO | \n",
"
\n",
" \n",
" 62 | \n",
" cis-Nerolidol | \n",
" ROCP5ARO | \n",
" VROCP5ARO | \n",
"
\n",
" \n",
" 63 | \n",
" trans-Nerolidol | \n",
" ROCP5ARO | \n",
" VROCP5ARO | \n",
"
\n",
" \n",
" 67 | \n",
" 2-Ethylhexyl salicylate | \n",
" ROCP2OXY2 | \n",
" VROCP2OXY2 | \n",
"
\n",
" \n",
" 70 | \n",
" (-)-alpha-Cadinol | \n",
" ROCP5ARO | \n",
" VROCP5ARO | \n",
"
\n",
" \n",
" 72 | \n",
" (+)-Cedrol | \n",
" ROCP5OXY1 | \n",
" VROCP5OXY1 | \n",
"
\n",
" \n",
" 77 | \n",
" 3,3,5-Trimethylcyclohexyl salicylate | \n",
" ROCP1OXY1 | \n",
" VROCP1OXY1 | \n",
"
\n",
" \n",
" 79 | \n",
" (-)-Kaur-16-ene | \n",
" ROCP5ARO | \n",
" VROCP5ARO | \n",
"
\n",
" \n",
" 86 | \n",
" (+)-Longicyclene | \n",
" ROCP4ALK | \n",
" VROCP4ALK | \n",
"
\n",
" \n",
" 104 | \n",
" (E)-6,10-Dimethylundeca-5,9-dien-2-one | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 105 | \n",
" (E)-6,10-Dimethylundeca-5,9-dien-2-one | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 108 | \n",
" 6,10-Dimethyl-5,9-undecadiene-2-one | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 110 | \n",
" 2-Nonenal | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 126 | \n",
" 7-heptadecene | \n",
" ROCP5ARO | \n",
" VROCP5ARO | \n",
"
\n",
" \n",
" 127 | \n",
" Acetophenone | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 131 | \n",
" Benzyl benzoate | \n",
" ROCP5ARO | \n",
" VROCP5ARO | \n",
"
\n",
" \n",
" 132 | \n",
" Benzyl acetate | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 138 | \n",
" Cinnamic acid | \n",
" ROCP2OXY2 | \n",
" VROCP2OXY2 | \n",
"
\n",
" \n",
" 139 | \n",
" Coniferyl alcohol | \n",
" ROCP1OXY3 | \n",
" VROCP1OXY3 | \n",
"
\n",
" \n",
" 142 | \n",
" Ethyl cinnamate | \n",
" ROCP5ARO | \n",
" VROCP5ARO | \n",
"
\n",
" \n",
" 159 | \n",
" Jasmone | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 162 | \n",
" Linalool oxide pyranoid, cis-(+-)- | \n",
" ROCP5ARO | \n",
" VROCP5ARO | \n",
"
\n",
" \n",
" 163 | \n",
" Linalool oxide pyranoid, cis-(+-)- | \n",
" ROCP5ARO | \n",
" VROCP5ARO | \n",
"
\n",
" \n",
" 165 | \n",
" Methyl benzoate | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 166 | \n",
" Methyl jasmonate | \n",
" ROCP5ARO | \n",
" VROCP5ARO | \n",
"
\n",
" \n",
" 171 | \n",
" (2E)-3-(4-Hydroxyphenyl)-2-propenoic acid | \n",
" ROCP1OXY3 | \n",
" VROCP1OXY3 | \n",
"
\n",
" \n",
" 174 | \n",
" Safrole | \n",
" ROCP5ARO | \n",
" VROCP5ARO | \n",
"
\n",
" \n",
" 181 | \n",
" (Z)-Hex-3-enyl butyrate | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 187 | \n",
" Diallyl disulfide | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 190 | \n",
" 1-Dodecene | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 196 | \n",
" Indole | \n",
" ROCP5ARO | \n",
" VROCP5ARO | \n",
"
\n",
" \n",
" 207 | \n",
" Allyl propyl disulfide | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 209 | \n",
" 3-Methylindole | \n",
" ROCP5ARO | \n",
" VROCP5ARO | \n",
"
\n",
" \n",
" 210 | \n",
" alpha-Terpinyl acetate | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 211 | \n",
" alpha-Terpinyl acetate | \n",
" ROCP6ARO | \n",
" VROCP6ARO | \n",
"
\n",
" \n",
" 212 | \n",
" 1-Tetradecene | \n",
" ROCP5ARO | \n",
" VROCP5ARO | \n",
"
\n",
" \n",
" 214 | \n",
" Carbon monoxide | \n",
" SLOWROC | \n",
" UNKCRACMM | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" REPRESENTATIVE_COMPOUND_NAME CRACMM2alpha CRACMM2\n",
"24 p-Cymene ROCP6ARO VROCP6ARO\n",
"30 Estragole ROCP6ARO VROCP6ARO\n",
"32 beta-Ionone ROCP6ARO VROCP6ARO\n",
"36 1-Octen-3-ol ROCP6ARO VROCP6ARO\n",
"55 Farnesol ROCP5ARO VROCP5ARO\n",
"62 cis-Nerolidol ROCP5ARO VROCP5ARO\n",
"63 trans-Nerolidol ROCP5ARO VROCP5ARO\n",
"67 2-Ethylhexyl salicylate ROCP2OXY2 VROCP2OXY2\n",
"70 (-)-alpha-Cadinol ROCP5ARO VROCP5ARO\n",
"72 (+)-Cedrol ROCP5OXY1 VROCP5OXY1\n",
"77 3,3,5-Trimethylcyclohexyl salicylate ROCP1OXY1 VROCP1OXY1\n",
"79 (-)-Kaur-16-ene ROCP5ARO VROCP5ARO\n",
"86 (+)-Longicyclene ROCP4ALK VROCP4ALK\n",
"104 (E)-6,10-Dimethylundeca-5,9-dien-2-one ROCP6ARO VROCP6ARO\n",
"105 (E)-6,10-Dimethylundeca-5,9-dien-2-one ROCP6ARO VROCP6ARO\n",
"108 6,10-Dimethyl-5,9-undecadiene-2-one ROCP6ARO VROCP6ARO\n",
"110 2-Nonenal ROCP6ARO VROCP6ARO\n",
"126 7-heptadecene ROCP5ARO VROCP5ARO\n",
"127 Acetophenone ROCP6ARO VROCP6ARO\n",
"131 Benzyl benzoate ROCP5ARO VROCP5ARO\n",
"132 Benzyl acetate ROCP6ARO VROCP6ARO\n",
"138 Cinnamic acid ROCP2OXY2 VROCP2OXY2\n",
"139 Coniferyl alcohol ROCP1OXY3 VROCP1OXY3\n",
"142 Ethyl cinnamate ROCP5ARO VROCP5ARO\n",
"159 Jasmone ROCP6ARO VROCP6ARO\n",
"162 Linalool oxide pyranoid, cis-(+-)- ROCP5ARO VROCP5ARO\n",
"163 Linalool oxide pyranoid, cis-(+-)- ROCP5ARO VROCP5ARO\n",
"165 Methyl benzoate ROCP6ARO VROCP6ARO\n",
"166 Methyl jasmonate ROCP5ARO VROCP5ARO\n",
"171 (2E)-3-(4-Hydroxyphenyl)-2-propenoic acid ROCP1OXY3 VROCP1OXY3\n",
"174 Safrole ROCP5ARO VROCP5ARO\n",
"181 (Z)-Hex-3-enyl butyrate ROCP6ARO VROCP6ARO\n",
"187 Diallyl disulfide ROCP6ARO VROCP6ARO\n",
"190 1-Dodecene ROCP6ARO VROCP6ARO\n",
"196 Indole ROCP5ARO VROCP5ARO\n",
"207 Allyl propyl disulfide ROCP6ARO VROCP6ARO\n",
"209 3-Methylindole ROCP5ARO VROCP5ARO\n",
"210 alpha-Terpinyl acetate ROCP6ARO VROCP6ARO\n",
"211 alpha-Terpinyl acetate ROCP6ARO VROCP6ARO\n",
"212 1-Tetradecene ROCP5ARO VROCP5ARO\n",
"214 Carbon monoxide SLOWROC UNKCRACMM"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"filename = datadir + 'bvoc_megan_tocracmm.csv'\n",
"dfmegan = pd.read_csv(filename)\n",
"# for checking if any species mapping changed\n",
"orig_map_colname = 'CRACMM1' # an existing version in file to compare to, options: CRACMM1, CRACMM2\n",
"dfmegan = dfmegan.rename(columns=dict(CRACMMorig=orig_map_colname))\n",
"\n",
"# run cracmm2 mapper\n",
"smiles_k = 'SMILES'\n",
"koh_k = 'ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED'\n",
"cstar_k = 'log10Cstar_ugm3'\n",
"dfmegan['CRACMMnew'] = dfmegan.apply(lambda x: cracmm2.get_cracmm_roc(x[smiles_k], x[koh_k], x[cstar_k]), axis=1)\n",
"\n",
"# check if any species mappings changed\n",
"dfmegan_checkmatch = dfmegan.eval(f'match = {orig_map_colname}==CRACMMnew')\n",
"show_cols = ['REPRESENTATIVE_COMPOUND_NAME',orig_map_colname,'CRACMMnew']\n",
"if len(dfmegan_checkmatch[dfmegan_checkmatch.match==False])>0:\n",
" print(f'the species mappings below changed from {orig_map_colname}')\n",
" display(dfmegan_checkmatch[show_cols][dfmegan_checkmatch.match==False])\n",
"else:\n",
" print(f'all species matched {orig_map_colname} mapping')\n",
"\n",
"# save output\n",
"#dfmegan = dfmegan.drop(columns=orig_map_colname)\n",
"dfmegan.to_csv(outputdir+'bvoc_beis_tocracmm.csv', **csvout_kw)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "xarray_env_kernel",
"language": "python",
"name": "xarray_env_kernel"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.15"
}
},
"nbformat": 4,
"nbformat_minor": 5
}