{
"cells": [
{
"cell_type": "markdown",
"id": "0cfa8e4e-8935-4017-b8ff-965ba562f422",
"metadata": {},
"source": [
"# Find CRACMM species based on SMILES\n",
"\n",
"---\n",
" author: Nash Skipper\n",
" date: 2024-02-14\n",
"\n",
" updated: Michael Pye\n",
" date: 2025-02-27\n",
"---\n",
"\n",
"## Notebook Description\n",
"This notebook provides examples of how to use the cracmm_mapper tool.\n",
"\n",
"## Download Notebook\n",
"Click [here](https://github.com/USEPA/CRACMM/blob/main/utilities/smiles2cracmm.ipynb) to access the Jupyter Notebook file directly in GitHub where it can be downloaded. \n",
"\n",
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84fc493b-a369-4729-8317-e9401f76a869",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# set location of mapper downloaded from https://github.com/USEPA/CRACMM/\n",
"# import sys\n",
"# utildir = '/path/to/cracmm/utilities/directory' \n",
"# sys.path.append(utildir)\n",
"\n",
"# Import the python utilities\n",
"import cracmm1_mapper as cracmm1 # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 1)\n",
"import cracmm2_mapper as cracmm2 # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 2)"
]
},
{
"cell_type": "markdown",
"id": "6f34a062-2b03-4a18-a0e1-fa37e26ec722",
"metadata": {},
"source": [
"## Install rdkit if not already installed\n",
"\n",
"The cracmm_mapper function depends on [rdkit](https://www.rdkit.org/)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f4562478-e1bf-4902-8297-5dfe9e5e0c18",
"metadata": {},
"outputs": [],
"source": [
"# !python -m pip install --user rdkit\n",
"\n",
"# to install in the current kernel:\n",
"# %pip install rdkit"
]
},
{
"cell_type": "markdown",
"id": "4dfc0677-5e41-4790-9812-191a6b5cecb1",
"metadata": {},
"source": [
"## Option 1: Interactively enter species properties"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8640c681-e93b-425a-84d4-31ead81026e9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"enter SMILES: C=CC1=CC=CC=C1\n",
"enter kOH (cm3 molecules-1 s-1): 5.79e-11\n",
"enter log10Cstar (Cstar in ug/m3): 7.55\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CRACMM species: STY\n"
]
}
],
"source": [
"#The current values of \"smiles\", \"kOH\", and \"log10Cstar\" are example user inputs.\n",
"#For true interactive behavior, replace the variable values with the commented \n",
"#code on the same line.\n",
"smiles = 'C=CC1=CC=CC=C1' #str(input('enter SMILES: '))\n",
"kOH = 5.79e-11 #float(input('enter kOH (cm3 molecules-1 s-1): '))\n",
"log10Cstar = 7.55 #float(input('enter log10Cstar (Cstar in ug/m3): '))\n",
"print(f'CRACMM species: {cracmm2.get_cracmm_roc(smiles, kOH, log10Cstar)}')"
]
},
{
"cell_type": "markdown",
"id": "f1a18e46-63e6-4bc6-a8f4-639985652fb5",
"metadata": {},
"source": [
"## Option 2: Same as option 1 but not interactive"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "578c4b9d-7bde-45cc-aaab-80cafc091028",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'STY'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"smiles = 'C=CC1=CC=CC=C1'\n",
"kOH = 5.79e-11 # [cm3/(molecule*s)]\n",
"log10Cstar = 7.55 # [Cstar in ug/m3]\n",
"cracmm2.get_cracmm_roc(smiles, kOH, log10Cstar)"
]
},
{
"cell_type": "markdown",
"id": "623199af-2de4-44c7-8de1-9d8e57f7fb2f",
"metadata": {},
"source": [
"## Option 3: Run multiple species in batch"
]
},
{
"cell_type": "markdown",
"id": "7bf7bd41-a9d1-4bce-9739-0a52a0c4b4d3",
"metadata": {},
"source": [
"### Create a pandas DataFrame with species properties\n",
"\n",
"This is a simple example for demonstration. A more typical application would be to have a csv or excel file containing the SMILES string, kOH, and log10(Cstar) which can be used to create the DataFrame instead."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e83eadfb-4e6f-4776-b9c9-7ddc26432ea6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" species | \n",
" SMILES | \n",
" koh | \n",
" log10cstar | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" styrene | \n",
" C=CC1=CC=CC=C1 | \n",
" 5.790000e-11 | \n",
" 7.55 | \n",
"
\n",
" \n",
" 1 | \n",
" cyclohexane | \n",
" C1CCCCC1 | \n",
" 7.480000e-12 | \n",
" 8.64 | \n",
"
\n",
" \n",
" 2 | \n",
" glyoxal | \n",
" O=CC=O | \n",
" 1.140000e-11 | \n",
" 8.90 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" species SMILES koh log10cstar\n",
"0 styrene C=CC1=CC=CC=C1 5.790000e-11 7.55\n",
"1 cyclohexane C1CCCCC1 7.480000e-12 8.64\n",
"2 glyoxal O=CC=O 1.140000e-11 8.90"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = {\n",
" 'species': ['styrene', 'cyclohexane', 'glyoxal'],\n",
" 'SMILES': ['C=CC1=CC=CC=C1', 'C1CCCCC1', 'O=CC=O'],\n",
" 'koh': [5.79e-11, 7.48e-12, 1.14e-11], # [cm3/(molecule*s)]\n",
" 'log10cstar': [7.55, 8.64, 8.90] # [Cstar in ug/m3]\n",
"}\n",
"df = pd.DataFrame(data)\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "7d3ead37-05fd-4668-ba57-38d5fc7f4a19",
"metadata": {},
"source": [
"### Add column for CRACMM species from cracmm_mapper"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "66ed31a2-ab94-4bdf-bc24-6e065046e904",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" species | \n",
" SMILES | \n",
" koh | \n",
" log10cstar | \n",
" CRACMM | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" styrene | \n",
" C=CC1=CC=CC=C1 | \n",
" 5.790000e-11 | \n",
" 7.55 | \n",
" STY | \n",
"
\n",
" \n",
" 1 | \n",
" cyclohexane | \n",
" C1CCCCC1 | \n",
" 7.480000e-12 | \n",
" 8.64 | \n",
" HC10 | \n",
"
\n",
" \n",
" 2 | \n",
" glyoxal | \n",
" O=CC=O | \n",
" 1.140000e-11 | \n",
" 8.90 | \n",
" GLY | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" species SMILES koh log10cstar CRACMM\n",
"0 styrene C=CC1=CC=CC=C1 5.790000e-11 7.55 STY\n",
"1 cyclohexane C1CCCCC1 7.480000e-12 8.64 HC10\n",
"2 glyoxal O=CC=O 1.140000e-11 8.90 GLY"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['CRACMM'] = df.apply(lambda x: cracmm2.get_cracmm_roc(x['SMILES'], x['koh'], x['log10cstar']), axis='columns')\n",
"df"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "xarray_env_kernel",
"language": "python",
"name": "xarray_env_kernel"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.15"
}
},
"nbformat": 4,
"nbformat_minor": 5
}