{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0cfa8e4e-8935-4017-b8ff-965ba562f422",
   "metadata": {},
   "source": [
    "# Find CRACMM species based on SMILES\n",
    "\n",
    "---\n",
    "    author: Nash Skipper\n",
    "    date: 2024-02-14\n",
    "\n",
    "    updated: Michael Pye\n",
    "    date: 2025-02-27\n",
    "---\n",
    "\n",
    "## Notebook Description\n",
    "This notebook provides examples of how to use the cracmm_mapper tool.\n",
    "\n",
    "## Download Notebook\n",
    "Click [here](https://github.com/USEPA/CRACMM/blob/main/utilities/smiles2cracmm.ipynb) to access the Jupyter Notebook file directly in GitHub where it can be downloaded.  \n",
    "\n",
    "## Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "84fc493b-a369-4729-8317-e9401f76a869",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# set location of mapper downloaded from https://github.com/USEPA/CRACMM/\n",
    "# import sys\n",
    "# utildir = '/path/to/cracmm/utilities/directory'   \n",
    "# sys.path.append(utildir)\n",
    "\n",
    "# Import the python utilities\n",
    "import cracmm1_mapper as cracmm1   # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 1)\n",
    "import cracmm2_mapper as cracmm2   # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f34a062-2b03-4a18-a0e1-fa37e26ec722",
   "metadata": {},
   "source": [
    "## Install rdkit if not already installed\n",
    "\n",
    "The cracmm_mapper function depends on [rdkit](https://www.rdkit.org/)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "f4562478-e1bf-4902-8297-5dfe9e5e0c18",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !python -m pip install --user rdkit\n",
    "\n",
    "# to install in the current kernel:\n",
    "# %pip install rdkit"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4dfc0677-5e41-4790-9812-191a6b5cecb1",
   "metadata": {},
   "source": [
    "## Option 1: Interactively enter species properties"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8640c681-e93b-425a-84d4-31ead81026e9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "enter SMILES:   C=CC1=CC=CC=C1\n",
      "enter kOH (cm3 molecules-1 s-1):   5.79e-11\n",
      "enter log10Cstar (Cstar in ug/m3):   7.55\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CRACMM species:  STY\n"
     ]
    }
   ],
   "source": [
    "#The current values of \"smiles\", \"kOH\", and \"log10Cstar\" are example user inputs.\n",
    "#For true interactive behavior, replace the variable values with the commented \n",
    "#code on the same line.\n",
    "smiles = 'C=CC1=CC=CC=C1'   #str(input('enter SMILES:  '))\n",
    "kOH = 5.79e-11  #float(input('enter kOH (cm3 molecules-1 s-1):  '))\n",
    "log10Cstar = 7.55   #float(input('enter log10Cstar (Cstar in ug/m3):  '))\n",
    "print(f'CRACMM species:  {cracmm2.get_cracmm_roc(smiles, kOH, log10Cstar)}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1a18e46-63e6-4bc6-a8f4-639985652fb5",
   "metadata": {},
   "source": [
    "## Option 2: Same as option 1 but not interactive"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "578c4b9d-7bde-45cc-aaab-80cafc091028",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'STY'"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "smiles = 'C=CC1=CC=CC=C1'\n",
    "kOH = 5.79e-11      # [cm3/(molecule*s)]\n",
    "log10Cstar = 7.55   # [Cstar in ug/m3]\n",
    "cracmm2.get_cracmm_roc(smiles, kOH, log10Cstar)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "623199af-2de4-44c7-8de1-9d8e57f7fb2f",
   "metadata": {},
   "source": [
    "## Option 3: Run multiple species in batch"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7bf7bd41-a9d1-4bce-9739-0a52a0c4b4d3",
   "metadata": {},
   "source": [
    "### Create a pandas DataFrame with species properties\n",
    "\n",
    "This is a simple example for demonstration. A more typical application would be to have a csv or excel file containing the SMILES string, kOH, and log10(Cstar) which can be used to create the DataFrame instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "e83eadfb-4e6f-4776-b9c9-7ddc26432ea6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>SMILES</th>\n",
       "      <th>koh</th>\n",
       "      <th>log10cstar</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>styrene</td>\n",
       "      <td>C=CC1=CC=CC=C1</td>\n",
       "      <td>5.790000e-11</td>\n",
       "      <td>7.55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>cyclohexane</td>\n",
       "      <td>C1CCCCC1</td>\n",
       "      <td>7.480000e-12</td>\n",
       "      <td>8.64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>glyoxal</td>\n",
       "      <td>O=CC=O</td>\n",
       "      <td>1.140000e-11</td>\n",
       "      <td>8.90</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       species          SMILES           koh  log10cstar\n",
       "0      styrene  C=CC1=CC=CC=C1  5.790000e-11        7.55\n",
       "1  cyclohexane        C1CCCCC1  7.480000e-12        8.64\n",
       "2      glyoxal          O=CC=O  1.140000e-11        8.90"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {\n",
    "    'species':    ['styrene', 'cyclohexane', 'glyoxal'],\n",
    "    'SMILES':     ['C=CC1=CC=CC=C1', 'C1CCCCC1', 'O=CC=O'],\n",
    "    'koh':        [5.79e-11, 7.48e-12, 1.14e-11], # [cm3/(molecule*s)]\n",
    "    'log10cstar': [7.55, 8.64, 8.90] # [Cstar in ug/m3]\n",
    "}\n",
    "df = pd.DataFrame(data)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d3ead37-05fd-4668-ba57-38d5fc7f4a19",
   "metadata": {},
   "source": [
    "### Add column for CRACMM species from cracmm_mapper"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66ed31a2-ab94-4bdf-bc24-6e065046e904",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>SMILES</th>\n",
       "      <th>koh</th>\n",
       "      <th>log10cstar</th>\n",
       "      <th>CRACMM</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>styrene</td>\n",
       "      <td>C=CC1=CC=CC=C1</td>\n",
       "      <td>5.790000e-11</td>\n",
       "      <td>7.55</td>\n",
       "      <td>STY</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>cyclohexane</td>\n",
       "      <td>C1CCCCC1</td>\n",
       "      <td>7.480000e-12</td>\n",
       "      <td>8.64</td>\n",
       "      <td>HC10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>glyoxal</td>\n",
       "      <td>O=CC=O</td>\n",
       "      <td>1.140000e-11</td>\n",
       "      <td>8.90</td>\n",
       "      <td>GLY</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       species          SMILES           koh  log10cstar CRACMM\n",
       "0      styrene  C=CC1=CC=CC=C1  5.790000e-11        7.55    STY\n",
       "1  cyclohexane        C1CCCCC1  7.480000e-12        8.64   HC10\n",
       "2      glyoxal          O=CC=O  1.140000e-11        8.90    GLY"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['CRACMM'] = df.apply(lambda x: cracmm2.get_cracmm_roc(x['SMILES'], x['koh'], x['log10cstar']), axis='columns')\n",
    "df"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "xarray_env_kernel",
   "language": "python",
   "name": "xarray_env_kernel"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}