{ "cells": [ { "cell_type": "markdown", "id": "0cfa8e4e-8935-4017-b8ff-965ba562f422", "metadata": {}, "source": [ "# Map BEIS and MEGAN species to CRACMM\n", "\n", "---\n", " author: Havala Pye\n", " date: 2024-08-08\n", "\n", " updated: Nash Skipper\n", " date: 2024-08-09\n", "\n", " updated: Michael Pye\n", " date: 2025-02-27\n", "---\n", "## Notebook Description\n", "This Notebook identifies the CRACMM species for each BEIS/MEGAN species using the mapper. The cracmm_mapper function depends on [rdkit](https://www.rdkit.org/).\n", "\n", "## Download Notebook\n", "Click [here](https://github.com/USEPA/CRACMM/blob/main/utilities/BEISMEGAN_biogenicmapping2cracmm.ipynb) to access the Jupyter Notebook file directly in GitHub where it can be downloaded. \n" ] }, { "cell_type": "markdown", "id": "6f34a062-2b03-4a18-a0e1-fa37e26ec722", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": null, "id": "84fc493b-a369-4729-8317-e9401f76a869", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import os" ] }, { "cell_type": "code", "execution_count": 2, "id": "f4562478-e1bf-4902-8297-5dfe9e5e0c18", "metadata": {}, "outputs": [], "source": [ "## Install rdkit if not already installed\n", "\n", "# !python -m pip install --user rdkit\n", "\n", "# to install in the current kernel:\n", "# %pip install rdkit" ] }, { "cell_type": "code", "execution_count": null, "id": "f83cc394-9140-4266-8347-cfd84cf04bdd", "metadata": {}, "outputs": [], "source": [ "# set location of mapper downloaded from https://github.com/USEPA/CRACMM/\n", "# import sys\n", "# utildir = '/path/to/cracmm/utilities/directory' \n", "# sys.path.append(utildir)\n", "\n", "# Import the python utilities\n", "import cracmm1_mapper as cracmm1 # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 1)\n", "import cracmm2_mapper as cracmm2 # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 2)" ] }, { "cell_type": "code", "execution_count": null, "id": 
"193f4b5f-54b3-4a84-ab36-9b99c6281f54", "metadata": {}, "outputs": [], "source": [ "datadir = '../emissions/BiogenicMappings/' # data files of mappings\n", "outputdir = os.path.join(os.getcwd(), 'output/')" ] }, { "cell_type": "code", "execution_count": 5, "id": "d3608277-b1f5-40ac-9783-f21bbd3078e7", "metadata": {}, "outputs": [], "source": [ "pd.set_option('display.max_rows', None)\n", "pd.options.mode.copy_on_write = True\n", "csvout_kw = dict(sep=',', na_rep='', float_format=None, columns=None, header=True, index=False)" ] }, { "cell_type": "markdown", "id": "ab6828cf-4cd7-46e5-a52c-e49581d560c2", "metadata": {}, "source": [ "## BEIS\n", "input beis mapping from https://github.com/USEPA/CRACMM/tree/main/emissions/BiogenicMappings" ] }, { "cell_type": "code", "execution_count": null, "id": "c1dd4414-363a-40c2-b575-a8560231b230", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the species mappings below changed from CRACMM2alpha\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/work/MOD3DEV/tskipper/cracmm_hcho/CRACMM_REPO/emissions/cracmm2/cracmm_mapper.py:285: UserWarning: Species with SMILES [N]=O is unknown in CRACMM and has been mapped to UNKCRACMM.\n", " warnings.warn(unkcracmm_msg)\n", "/work/MOD3DEV/tskipper/cracmm_hcho/CRACMM_REPO/emissions/cracmm2/cracmm_mapper.py:285: UserWarning: Species with SMILES [C-]#[O+] is unknown in CRACMM and has been mapped to UNKCRACMM.\n", " warnings.warn(unkcracmm_msg)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " SPECIES_NAME CRACMM2alpha CRACMM2\n", "12 para-cymene ROCP6ARO VROCP6ARO\n", "32 carbon monoxide SLOWROC UNKCRACMM" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "filename = datadir + 'bvoc_beis_tocracmm.csv' \n", "dfbeis = pd.read_csv(filename)\n", "# for checking if any species mapping changed\n", "orig_map_colname = 'CRACMM1' # an existing version in file to compare to, options: CRACMM1, CRACMM2\n", "dfbeis = dfbeis.rename(columns=dict(CRACMMorig=orig_map_colname))\n", "\n", "# run cracmm2 mapper\n", "smiles_k = 'SMILES'\n", "koh_k = 'ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED'\n", "cstar_k = 'log10Cstar_ugm3'\n", "dfbeis['CRACMMnew'] = dfbeis.apply(lambda x: cracmm2.get_cracmm_roc(x[smiles_k], x[koh_k], x[cstar_k]), axis=1)\n", "\n", "# check if any species mappings changed\n", "dfbeis_checkmatch = dfbeis.eval(f'match = {orig_map_colname}==CRACMMnew')\n", "show_cols = ['SPECIES_NAME',orig_map_colname,'CRACMMnew']\n", "if len(dfbeis_checkmatch[dfbeis_checkmatch.match==False])>0:\n", " print(f'the species mappings below changed from {orig_map_colname}')\n", " display(dfbeis_checkmatch[show_cols][dfbeis_checkmatch.match==False])\n", "else:\n", " print(f'all species matched {orig_map_colname} mapping')\n", "\n", "# save output\n", "#dfbeis = dfbeis.drop(columns=orig_map_colname)\n", "dfbeis.to_csv(outputdir+'bvoc_beis_tocracmm.csv', **csvout_kw)" ] }, { "cell_type": "markdown", "id": "12d0c9c8-be4d-44ae-b20d-650f33207c71", "metadata": {}, "source": [ "## MEGAN\n", "input megan mapping from https://github.com/USEPA/CRACMM/tree/main/emissions/BiogenicMappings" ] }, { "cell_type": "code", "execution_count": null, "id": "d69adc5d-bddb-478a-8af9-ad4dd4774dbe", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/work/MOD3DEV/tskipper/cracmm_hcho/CRACMM_REPO/emissions/cracmm2/cracmm_mapper.py:285: UserWarning: Species with SMILES S is unknown in CRACMM and has been mapped 
to UNKCRACMM.\n", " warnings.warn(unkcracmm_msg)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "the species mappings below changed from CRACMM2alpha\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/work/MOD3DEV/tskipper/cracmm_hcho/CRACMM_REPO/emissions/cracmm2/cracmm_mapper.py:285: UserWarning: Species with SMILES [C-]#[O+] is unknown in CRACMM and has been mapped to UNKCRACMM.\n", " warnings.warn(unkcracmm_msg)\n", "/work/MOD3DEV/tskipper/cracmm_hcho/CRACMM_REPO/emissions/cracmm2/cracmm_mapper.py:285: UserWarning: Species with SMILES [N]=O is unknown in CRACMM and has been mapped to UNKCRACMM.\n", " warnings.warn(unkcracmm_msg)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " REPRESENTATIVE_COMPOUND_NAME CRACMM2alpha CRACMM2\n", "24 p-Cymene ROCP6ARO VROCP6ARO\n", "30 Estragole ROCP6ARO VROCP6ARO\n", "32 beta-Ionone ROCP6ARO VROCP6ARO\n", "36 1-Octen-3-ol ROCP6ARO VROCP6ARO\n", "55 Farnesol ROCP5ARO VROCP5ARO\n", "62 cis-Nerolidol ROCP5ARO VROCP5ARO\n", "63 trans-Nerolidol ROCP5ARO VROCP5ARO\n", "67 2-Ethylhexyl salicylate ROCP2OXY2 VROCP2OXY2\n", "70 (-)-alpha-Cadinol ROCP5ARO VROCP5ARO\n", "72 (+)-Cedrol ROCP5OXY1 VROCP5OXY1\n", "77 3,3,5-Trimethylcyclohexyl salicylate ROCP1OXY1 VROCP1OXY1\n", "79 (-)-Kaur-16-ene ROCP5ARO VROCP5ARO\n", "86 (+)-Longicyclene ROCP4ALK VROCP4ALK\n", "104 (E)-6,10-Dimethylundeca-5,9-dien-2-one ROCP6ARO VROCP6ARO\n", "105 (E)-6,10-Dimethylundeca-5,9-dien-2-one ROCP6ARO VROCP6ARO\n", "108 6,10-Dimethyl-5,9-undecadiene-2-one ROCP6ARO VROCP6ARO\n", "110 2-Nonenal ROCP6ARO VROCP6ARO\n", "126 7-heptadecene ROCP5ARO VROCP5ARO\n", "127 Acetophenone ROCP6ARO VROCP6ARO\n", "131 Benzyl benzoate ROCP5ARO VROCP5ARO\n", "132 Benzyl acetate ROCP6ARO VROCP6ARO\n", "138 Cinnamic acid ROCP2OXY2 VROCP2OXY2\n", "139 Coniferyl alcohol ROCP1OXY3 VROCP1OXY3\n", "142 Ethyl cinnamate ROCP5ARO VROCP5ARO\n", "159 Jasmone ROCP6ARO VROCP6ARO\n", "162 Linalool oxide pyranoid, cis-(+-)- ROCP5ARO VROCP5ARO\n", "163 Linalool oxide pyranoid, cis-(+-)- ROCP5ARO VROCP5ARO\n", "165 Methyl benzoate ROCP6ARO VROCP6ARO\n", "166 Methyl jasmonate ROCP5ARO VROCP5ARO\n", "171 (2E)-3-(4-Hydroxyphenyl)-2-propenoic acid ROCP1OXY3 VROCP1OXY3\n", "174 Safrole ROCP5ARO VROCP5ARO\n", "181 (Z)-Hex-3-enyl butyrate ROCP6ARO VROCP6ARO\n", "187 Diallyl disulfide ROCP6ARO VROCP6ARO\n", "190 1-Dodecene ROCP6ARO VROCP6ARO\n", "196 Indole ROCP5ARO VROCP5ARO\n", "207 Allyl propyl disulfide ROCP6ARO VROCP6ARO\n", "209 3-Methylindole ROCP5ARO VROCP5ARO\n", "210 alpha-Terpinyl acetate ROCP6ARO VROCP6ARO\n", "211 alpha-Terpinyl acetate ROCP6ARO VROCP6ARO\n", "212 1-Tetradecene ROCP5ARO VROCP5ARO\n", "214 Carbon monoxide SLOWROC UNKCRACMM" ] }, 
"metadata": {}, "output_type": "display_data" } ], "source": [ "filename = datadir + 'bvoc_megan_tocracmm.csv'\n", "dfmegan = pd.read_csv(filename)\n", "# for checking if any species mapping changed\n", "orig_map_colname = 'CRACMM1' # an existing version in file to compare to, options: CRACMM1, CRACMM2\n", "dfmegan = dfmegan.rename(columns=dict(CRACMMorig=orig_map_colname))\n", "\n", "# run cracmm2 mapper\n", "smiles_k = 'SMILES'\n", "koh_k = 'ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED'\n", "cstar_k = 'log10Cstar_ugm3'\n", "dfmegan['CRACMMnew'] = dfmegan.apply(lambda x: cracmm2.get_cracmm_roc(x[smiles_k], x[koh_k], x[cstar_k]), axis=1)\n", "\n", "# check if any species mappings changed\n", "dfmegan_checkmatch = dfmegan.eval(f'match = {orig_map_colname}==CRACMMnew')\n", "show_cols = ['REPRESENTATIVE_COMPOUND_NAME',orig_map_colname,'CRACMMnew']\n", "if len(dfmegan_checkmatch[dfmegan_checkmatch.match==False])>0:\n", " print(f'the species mappings below changed from {orig_map_colname}')\n", " display(dfmegan_checkmatch[show_cols][dfmegan_checkmatch.match==False])\n", "else:\n", " print(f'all species matched {orig_map_colname} mapping')\n", "\n", "# save output\n", "#dfmegan = dfmegan.drop(columns=orig_map_colname)\n", "dfmegan.to_csv(outputdir+'bvoc_beis_tocracmm.csv', **csvout_kw)" ] } ], "metadata": { "kernelspec": { "display_name": "xarray_env_kernel", "language": "python", "name": "xarray_env_kernel" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" } }, "nbformat": 4, "nbformat_minor": 5 }