{ "cells": [ { "cell_type": "markdown", "id": "0cfa8e4e-8935-4017-b8ff-965ba562f422", "metadata": {}, "source": [ "# Find CRACMM species based on SMILES\n", "\n", "---\n", " author: Nash Skipper\n", " date: 2024-02-14\n", "\n", " updated: Michael Pye\n", " date: 2025-02-27\n", "---\n", "\n", "## Notebook Description\n", "This notebook provides examples of how to use the cracmm_mapper tool.\n", "\n", "## Download Notebook\n", "Click [here](https://github.com/USEPA/CRACMM/blob/main/utilities/smiles2cracmm.ipynb) to access the Jupyter Notebook file directly in GitHub where it can be downloaded. \n", "\n", "## Setup" ] }, { "cell_type": "code", "execution_count": null, "id": "84fc493b-a369-4729-8317-e9401f76a869", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# set location of mapper downloaded from https://github.com/USEPA/CRACMM/\n", "# import sys\n", "# utildir = '/path/to/cracmm/utilities/directory' \n", "# sys.path.append(utildir)\n", "\n", "# Import the python utilities\n", "import cracmm1_mapper as cracmm1 # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 1)\n", "import cracmm2_mapper as cracmm2 # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 2)" ] }, { "cell_type": "markdown", "id": "6f34a062-2b03-4a18-a0e1-fa37e26ec722", "metadata": {}, "source": [ "## Install rdkit if not already installed\n", "\n", "The cracmm_mapper function depends on [rdkit](https://www.rdkit.org/)." ] }, { "cell_type": "code", "execution_count": 2, "id": "f4562478-e1bf-4902-8297-5dfe9e5e0c18", "metadata": {}, "outputs": [], "source": [ "# !python -m pip install --user rdkit\n", "\n", "# to install in the current kernel:\n", "# %pip install rdkit" ] }, { "cell_type": "markdown", "id": "4dfc0677-5e41-4790-9812-191a6b5cecb1", "metadata": {}, "source": [ "## Option 1: Interactively enter species properties" ] }, { "cell_type": "code", "execution_count": null, "id": "8640c681-e93b-425a-84d4-31ead81026e9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "enter SMILES: C=CC1=CC=CC=C1\n", "enter kOH (cm3 molecules-1 s-1): 5.79e-11\n", "enter log10Cstar (Cstar in ug/m3): 7.55\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "CRACMM species: STY\n" ] } ], "source": [ "#The current values of \"smiles\", \"kOH\", and \"log10Cstar\" are example user inputs.\n", "#For true interactive behavior, replace the variable values with the commented \n", "#code on the same line.\n", "smiles = 'C=CC1=CC=CC=C1' #str(input('enter SMILES: '))\n", "kOH = 5.79e-11 #float(input('enter kOH (cm3 molecules-1 s-1): '))\n", "log10Cstar = 7.55 #float(input('enter log10Cstar (Cstar in ug/m3): '))\n", "print(f'CRACMM species: {cracmm2.get_cracmm_roc(smiles, kOH, log10Cstar)}')" ] }, { "cell_type": "markdown", "id": "f1a18e46-63e6-4bc6-a8f4-639985652fb5", "metadata": {}, "source": [ "## Option 2: Same as option 1 but not interactive" ] }, { "cell_type": "code", "execution_count": null, "id": "578c4b9d-7bde-45cc-aaab-80cafc091028", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'STY'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "smiles = 'C=CC1=CC=CC=C1'\n", "kOH = 5.79e-11 # [cm3/(molecule*s)]\n", "log10Cstar = 7.55 # [Cstar in ug/m3]\n", "cracmm2.get_cracmm_roc(smiles, kOH, log10Cstar)" ] }, { "cell_type": "markdown", "id": "623199af-2de4-44c7-8de1-9d8e57f7fb2f", "metadata": {}, "source": [ "## Option 3: Run multiple species in batch" ] }, { "cell_type": "markdown", "id": "7bf7bd41-a9d1-4bce-9739-0a52a0c4b4d3", "metadata": {}, "source": [ "### Create a pandas DataFrame with species properties\n", "\n", "This is a simple example for demonstration. A more typical application would be to have a csv or excel file containing the SMILES string, kOH, and log10(Cstar) which can be used to create the DataFrame instead." ] }, { "cell_type": "code", "execution_count": 5, "id": "e83eadfb-4e6f-4776-b9c9-7ddc26432ea6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
speciesSMILESkohlog10cstar
0styreneC=CC1=CC=CC=C15.790000e-117.55
1cyclohexaneC1CCCCC17.480000e-128.64
2glyoxalO=CC=O1.140000e-118.90
\n", "
" ], "text/plain": [ " species SMILES koh log10cstar\n", "0 styrene C=CC1=CC=CC=C1 5.790000e-11 7.55\n", "1 cyclohexane C1CCCCC1 7.480000e-12 8.64\n", "2 glyoxal O=CC=O 1.140000e-11 8.90" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = {\n", " 'species': ['styrene', 'cyclohexane', 'glyoxal'],\n", " 'SMILES': ['C=CC1=CC=CC=C1', 'C1CCCCC1', 'O=CC=O'],\n", " 'koh': [5.79e-11, 7.48e-12, 1.14e-11], # [cm3/(molecule*s)]\n", " 'log10cstar': [7.55, 8.64, 8.90] # [Cstar in ug/m3]\n", "}\n", "df = pd.DataFrame(data)\n", "df" ] }, { "cell_type": "markdown", "id": "7d3ead37-05fd-4668-ba57-38d5fc7f4a19", "metadata": {}, "source": [ "### Add column for CRACMM species from cracmm_mapper" ] }, { "cell_type": "code", "execution_count": null, "id": "66ed31a2-ab94-4bdf-bc24-6e065046e904", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
speciesSMILESkohlog10cstarCRACMM
0styreneC=CC1=CC=CC=C15.790000e-117.55STY
1cyclohexaneC1CCCCC17.480000e-128.64HC10
2glyoxalO=CC=O1.140000e-118.90GLY
\n", "
" ], "text/plain": [ " species SMILES koh log10cstar CRACMM\n", "0 styrene C=CC1=CC=CC=C1 5.790000e-11 7.55 STY\n", "1 cyclohexane C1CCCCC1 7.480000e-12 8.64 HC10\n", "2 glyoxal O=CC=O 1.140000e-11 8.90 GLY" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['CRACMM'] = df.apply(lambda x: cracmm2.get_cracmm_roc(x['SMILES'], x['koh'], x['log10cstar']), axis='columns')\n", "df" ] } ], "metadata": { "kernelspec": { "display_name": "xarray_env_kernel", "language": "python", "name": "xarray_env_kernel" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" } }, "nbformat": 4, "nbformat_minor": 5 }