Find CRACMM species based on SMILES


author: Nash Skipper
date: 2024-02-14

updated: Michael Pye
date: 2025-02-27

Notebook Description

This notebook provides examples of how to use the cracmm_mapper tool.

Download Notebook

The Jupyter Notebook file for this example can be downloaded directly from GitHub.

Setup

import pandas as pd

# set location of mapper downloaded from https://github.com/USEPA/CRACMM/
# import sys
# utildir = '/path/to/cracmm/utilities/directory'   
# sys.path.append(utildir)

# Import the python utilities
import cracmm1_mapper as cracmm1   # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 1)
import cracmm2_mapper as cracmm2   # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 2)

Install rdkit if not already installed

The cracmm_mapper function depends on rdkit.

# !python -m pip install --user rdkit

# to install in the current kernel:
# %pip install rdkit
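
Before importing the mapper, it can help to confirm that rdkit is visible to the current kernel. The following is a minimal sketch using only the Python standard library; the variable name has_rdkit is illustrative and is not part of the CRACMM utilities.

```python
import importlib.util

# find_spec returns None when a top-level package cannot be located
has_rdkit = importlib.util.find_spec("rdkit") is not None
print("rdkit available:", has_rdkit)
```

If this prints False, one of the install commands above can be uncommented and run.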

Option 1: Interactively enter species properties

# The current values of "smiles", "kOH", and "log10Cstar" are example user inputs.
# For true interactive behavior, replace the variable values with the commented
# code on the same line.
smiles = 'C=CC1=CC=CC=C1'   #str(input('enter SMILES:  '))
kOH = 5.79e-11  #float(input('enter kOH (cm3 molecules-1 s-1):  '))
log10Cstar = 7.55   #float(input('enter log10Cstar (Cstar in ug/m3):  '))
print(f'CRACMM species:  {cracmm2.get_cracmm_roc(smiles, kOH, log10Cstar)}')
CRACMM species:  STY

Option 2: Same as option 1 but not interactive

smiles = 'C=CC1=CC=CC=C1'
kOH = 5.79e-11      # [cm3/(molecule*s)]
log10Cstar = 7.55   # [Cstar in ug/m3]
cracmm2.get_cracmm_roc(smiles, kOH, log10Cstar)
'STY'
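
The mapper takes the base-10 logarithm of Cstar, with Cstar in ug/m3. If a data source reports Cstar itself, it needs to be converted first. A minimal sketch follows, assuming a hypothetical Cstar value chosen to be close to the styrene example; the variable names are illustrative.

```python
import math

cstar = 3.55e7  # hypothetical Cstar in ug/m3, close to the styrene example
log10Cstar_converted = math.log10(cstar)  # base-10 logarithm, the form used above
print(round(log10Cstar_converted, 2))
```

The converted value can then be passed as the third argument to get_cracmm_roc in the same way as log10Cstar above.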

Option 3: Run multiple species in batch

Create a pandas DataFrame with species properties

This is a simple example for demonstration. A more typical application would read the SMILES string, kOH, and log10(Cstar) for each species from a CSV or Excel file and use that to create the DataFrame.
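
As a sketch of the file-based approach, the example below reads the same three species from CSV text with pandas. The column names and the in-memory text are assumptions for illustration; with a real file, its path would be passed to pd.read_csv instead of the StringIO object.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (hypothetical contents matching this example)
csv_text = """species,SMILES,koh,log10cstar
styrene,C=CC1=CC=CC=C1,5.79e-11,7.55
cyclohexane,C1CCCCC1,7.48e-12,8.64
glyoxal,O=CC=O,1.14e-11,8.90
"""

# Parse the text into a DataFrame; numeric columns are inferred as floats
df_from_csv = pd.read_csv(io.StringIO(csv_text))
print(df_from_csv.dtypes)
```

An Excel file could be read in a similar way with pd.read_excel, which requires an additional engine such as openpyxl.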

data = {
    'species':    ['styrene', 'cyclohexane', 'glyoxal'],
    'SMILES':     ['C=CC1=CC=CC=C1', 'C1CCCCC1', 'O=CC=O'],
    'koh':        [5.79e-11, 7.48e-12, 1.14e-11], # [cm3/(molecule*s)]
    'log10cstar': [7.55, 8.64, 8.90] # [Cstar in ug/m3]
}
df = pd.DataFrame(data)
df
       species          SMILES           koh  log10cstar
0      styrene  C=CC1=CC=CC=C1  5.790000e-11        7.55
1  cyclohexane        C1CCCCC1  7.480000e-12        8.64
2      glyoxal          O=CC=O  1.140000e-11        8.90

Add column for CRACMM species from cracmm_mapper

df['CRACMM'] = df.apply(lambda x: cracmm2.get_cracmm_roc(x['SMILES'], x['koh'], x['log10cstar']), axis='columns')
df
       species          SMILES           koh  log10cstar CRACMM
0      styrene  C=CC1=CC=CC=C1  5.790000e-11        7.55    STY
1  cyclohexane        C1CCCCC1  7.480000e-12        8.64   HC10
2      glyoxal          O=CC=O  1.140000e-11        8.90    GLY
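
With larger batches, a common next step is to tally how many input compounds map to each CRACMM species. The sketch below does this with pandas; to stay runnable without the mapper installed, it copies the three results from the table above into a small DataFrame named mapped, which is an illustrative name and not part of the notebook.

```python
import pandas as pd

# Mapping results copied from the output table above
mapped = pd.DataFrame({
    'species': ['styrene', 'cyclohexane', 'glyoxal'],
    'CRACMM':  ['STY', 'HC10', 'GLY'],
})

# Number of input compounds assigned to each CRACMM species
counts = mapped['CRACMM'].value_counts()
print(counts)
```

In the notebook itself, the same call can be applied directly as df['CRACMM'].value_counts().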