Find CRACMM species based on SMILES
author: Nash Skipper
date: 2024-02-14
updated: Michael Pye
date: 2025-02-27
Notebook Description
This notebook provides examples of how to use the cracmm_mapper tool.
Download Notebook
The Jupyter Notebook file for this example is available on GitHub, where it can be downloaded.
Setup
import pandas as pd
# set location of mapper downloaded from https://github.com/USEPA/CRACMM/
# import sys
# utildir = '/path/to/cracmm/utilities/directory'
# sys.path.append(utildir)
# Import the CRACMM mapper utilities (one module per CRACMM version)
import cracmm1_mapper as cracmm1 # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 1)
import cracmm2_mapper as cracmm2 # includes: get_cracmm_roc(smiles,koh,log10cstar) (Version 2)
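If the mapper modules are not already importable, the commented-out `sys.path` lines in the setup cell need to point at the downloaded utilities directory. Here is a minimal sketch of that step. It assumes a local clone of https://github.com/USEPA/CRACMM/. The helper name `add_utility_dir` and the example path are hypothetical.

```python
import sys
from pathlib import Path


def add_utility_dir(utildir):
    """Prepend a directory to sys.path so its modules can be imported.

    Hypothetical helper: raises if the directory does not exist and
    does not add the same path twice.
    """
    path = Path(utildir).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f'utility directory not found: {path}')
    if str(path) not in sys.path:
        sys.path.insert(0, str(path))
    return str(path)


# Usage (hypothetical path to a local clone of the CRACMM repository):
# add_utility_dir('/path/to/cracmm/utilities/directory')
# import cracmm2_mapper as cracmm2
```

Checking that the directory exists turns a confusing `ModuleNotFoundError` at import time into a clear message about the path.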
Install RDKit if not already installed
The cracmm_mapper utilities depend on RDKit (imported as rdkit).
# !python -m pip install --user rdkit
# to install in the current kernel:
# %pip install rdkit
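Before running either install command, you can check whether RDKit is already importable in the current kernel. The sketch below is minimal. It uses only the standard library's `importlib.util.find_spec`, and the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(package):
    """Return True if the top-level `package` can be imported in this environment."""
    return importlib.util.find_spec(package) is not None


if not is_installed('rdkit'):
    print('rdkit not found; install it with: %pip install rdkit')
```

`find_spec` locates the module without importing it, so the check is cheap and has no side effects.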
Option 1: Interactively enter species properties
# The current values of "smiles", "kOH", and "log10Cstar" are example user inputs.
# For true interactive behavior, replace each value with the commented
# code on the same line.
smiles = 'C=CC1=CC=CC=C1' #str(input('enter SMILES: '))
kOH = 5.79e-11 #float(input('enter kOH (cm3 molecules-1 s-1): '))
log10Cstar = 7.55 #float(input('enter log10Cstar (Cstar in ug/m3): '))
print(f'CRACMM species: {cracmm2.get_cracmm_roc(smiles, kOH, log10Cstar)}')
CRACMM species: STY
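The commented `input()` calls in the Option 1 cell return strings, so kOH and log10Cstar must be converted to numbers before they are passed to `get_cracmm_roc`. The sketch below gathers all three values in one place. The function name `read_species_inputs` and the empty-SMILES check are assumptions for illustration; they are not checks performed by the mapper itself. Passing `prompt` as a parameter lets the same function be driven either by `input` or by canned answers.

```python
def read_species_inputs(prompt=input):
    """Collect SMILES, kOH, and log10(Cstar) from the user (hypothetical helper).

    `prompt` is any callable that takes a message and returns a string;
    the default is the built-in `input`.
    """
    smiles = prompt('enter SMILES: ').strip()
    if not smiles:
        raise ValueError('SMILES string must not be empty')
    kOH = float(prompt('enter kOH (cm3 molecules-1 s-1): '))
    log10Cstar = float(prompt('enter log10Cstar (Cstar in ug/m3): '))
    return smiles, kOH, log10Cstar


# Drive the function with canned answers instead of keyboard input:
answers = iter(['C=CC1=CC=CC=C1', '5.79e-11', '7.55'])
smiles, kOH, log10Cstar = read_species_inputs(lambda msg: next(answers))
# print(f'CRACMM species: {cracmm2.get_cracmm_roc(smiles, kOH, log10Cstar)}')
```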
Option 2: Same as Option 1, but not interactive
smiles = 'C=CC1=CC=CC=C1'
kOH = 5.79e-11 # [cm3/(molecule*s)]
log10Cstar = 7.55 # [Cstar in ug/m3]
cracmm2.get_cracmm_roc(smiles, kOH, log10Cstar)
'STY'
Option 3: Run multiple species in batch
Create a pandas DataFrame with species properties
This is a simple example for demonstration. In a more typical application, the SMILES string, kOH, and log10(Cstar) for each species would be stored in a CSV or Excel file and read into the DataFrame instead (for example, with pd.read_csv or pd.read_excel).
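Here is a minimal sketch of the file-based route. An in-memory CSV stands in for a real file so the cell runs on its own. The file names `species.csv` and `species.xlsx` are hypothetical, and the column names simply match the ones used by the `df.apply` call below.

```python
import io

import pandas as pd

# Stand-in for a file on disk; with a real file use pd.read_csv('species.csv')
# or pd.read_excel('species.xlsx') (reading .xlsx requires openpyxl).
csv_text = """species,SMILES,koh,log10cstar
styrene,C=CC1=CC=CC=C1,5.79e-11,7.55
cyclohexane,C1CCCCC1,7.48e-12,8.64
glyoxal,O=CC=O,1.14e-11,8.90
"""

df_from_file = pd.read_csv(io.StringIO(csv_text))

# Fail early if a column used in the mapper call is missing.
required = {'SMILES', 'koh', 'log10cstar'}
missing = required - set(df_from_file.columns)
if missing:
    raise KeyError(f'missing columns: {sorted(missing)}')
```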
data = {
'species': ['styrene', 'cyclohexane', 'glyoxal'],
'SMILES': ['C=CC1=CC=CC=C1', 'C1CCCCC1', 'O=CC=O'],
'koh': [5.79e-11, 7.48e-12, 1.14e-11], # [cm3/(molecule*s)]
'log10cstar': [7.55, 8.64, 8.90] # [Cstar in ug/m3]
}
df = pd.DataFrame(data)
df
| | species | SMILES | koh | log10cstar |
|---|---|---|---|---|
| 0 | styrene | C=CC1=CC=CC=C1 | 5.790000e-11 | 7.55 |
| 1 | cyclohexane | C1CCCCC1 | 7.480000e-12 | 8.64 |
| 2 | glyoxal | O=CC=O | 1.140000e-11 | 8.90 |
Add column for CRACMM species from cracmm_mapper
df['CRACMM'] = df.apply(lambda x: cracmm2.get_cracmm_roc(x['SMILES'], x['koh'], x['log10cstar']), axis='columns')
df
| | species | SMILES | koh | log10cstar | CRACMM |
|---|---|---|---|---|---|
| 0 | styrene | C=CC1=CC=CC=C1 | 5.790000e-11 | 7.55 | STY |
| 1 | cyclohexane | C1CCCCC1 | 7.480000e-12 | 8.64 | HC10 |
| 2 | glyoxal | O=CC=O | 1.140000e-11 | 8.90 | GLY |
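The setup cell imports both `cracmm1_mapper` and `cracmm2_mapper`, and both provide `get_cracmm_roc(smiles, koh, log10cstar)`. That means the same batch can be mapped with each version side by side. The sketch below is minimal and assumes any callable with that signature can be passed in. The helper `add_mapper_columns` and the column names are hypothetical. The stand-in lambdas are not CRACMM logic; they only let the sketch run without RDKit or the mapper modules.

```python
import pandas as pd


def add_mapper_columns(df, mappers):
    """Return a copy of df with one new column per mapper (hypothetical helper).

    `mappers` maps a new column name to a callable taking
    (smiles, koh, log10cstar) and returning a species name.
    """
    out = df.copy()
    for column, mapper in mappers.items():
        out[column] = [
            mapper(smiles, koh, log10cstar)
            for smiles, koh, log10cstar in zip(out['SMILES'], out['koh'], out['log10cstar'])
        ]
    return out


# With the real modules:
# df_cmp = add_mapper_columns(df, {'CRACMM1': cracmm1.get_cracmm_roc,
#                                  'CRACMM2': cracmm2.get_cracmm_roc})

# Stand-in mappers so the sketch runs on its own (not real CRACMM logic):
demo = pd.DataFrame({'SMILES': ['C=CC1=CC=CC=C1', 'O=CC=O'],
                     'koh': [5.79e-11, 1.14e-11],
                     'log10cstar': [7.55, 8.90]})
demo_cmp = add_mapper_columns(demo, {
    'demo_cstar': lambda s, k, c: 'high' if c > 8 else 'low',
    'demo_len': lambda s, k, c: len(s),
})
```

Iterating over `zip` of the three columns avoids building a Series for every row, which `df.apply(..., axis='columns')` does. Returning a copy also leaves the original DataFrame unchanged.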