Customize an Excel Export

After executing a batch analysis, you may want to add some additional information to the default Excel exports.

The pybmds package stores all modeling information in a data structure that gives you access to both the underlying data and the generated reports. This guide demonstrates running a simple batch analysis and then augmenting the default Excel export with a few extra columns of information:

from pprint import pprint

import pandas as pd
import pybmds

As a simple example, the following code generates a batch analysis using a few option sets and a single dataset. You could adapt this code to run a custom analysis of your choosing:

def build_cont_sess(ds):
    def add_model(sess, Model, base, additions=None):
        settings = {
            "priors": pybmds.PriorClass.frequentist_restricted,
            "bmr_type": base[0],
            "bmr": base[1],
            "disttype": base[2],
        }
        if additions is not None:
            settings.update(additions)
        sess.add_model(Model, settings)

    # two option sets: 10% relative deviation BMR, normal and normal nonconstant variance
    option_sets = [
        (pybmds.ContinuousRiskType.RelativeDeviation, 0.1,
         pybmds.ContinuousDistType.normal),
        (pybmds.ContinuousRiskType.RelativeDeviation, 0.1,
         pybmds.ContinuousDistType.normal_ncv),
    ]
    sessions = []
    for option_set in option_sets:
        sess = pybmds.Session(dataset=ds)
        add_model(sess, pybmds.Models.ExponentialM3, option_set)
        add_model(sess, pybmds.Models.ExponentialM5, option_set)
        add_model(sess, pybmds.Models.Power, option_set)
        add_model(sess, pybmds.Models.Power, option_set)
        add_model(sess, pybmds.Models.Linear, option_set)
        add_model(sess, pybmds.Models.Polynomial, option_set, {"degree": 2})
        add_model(sess, pybmds.Models.Polynomial, option_set, {"degree": 3})
        sess.execute_and_recommend()
        sessions.append(sess.to_dict())
    return pybmds.BatchResponse(success=True, content=sessions)


datasets = [
    pybmds.ContinuousDataset(
        doses=[0, 10, 50, 150, 400],
        ns=[111, 142, 143, 93, 42],
        means=[2.112, 2.095, 1.956, 1.587, 1.254],
        stdevs=[0.235, 0.209, 0.231, 0.263, 0.159],
    )
]
sess_batch = pybmds.BatchSession.execute(datasets, build_cont_sess, nprocs=1)

To generate a standard Excel export, call this method:

sess_batch.to_excel("output/batch.xlsx")
'output/batch.xlsx'

However, you may want to customize this export to add more information. For example, you may want to include more detail from the Analysis of Deviance table than is generally shown in the summary exports.

To do this, first generate the summary dataframe, to which you will add more columns:

df_summary = sess_batch.df_summary()
df_summary.head()
session_index session_name session_description dataset_name dataset_dose_name dataset_dose_units dataset_response_name dataset_response_units dataset_doses dataset_ns ... p_value2 p_value3 p_value4 model_dof residual_of_interest residual_at_lowest_dose recommended recommendation_bin recommendation_notes selected
0 0 0.0,10.0,50.0,150.0,400.0 111.0,142.0,143.0,93.0,42.0 ... 0.003939 0.003939 5.559276e-05 4.0 0.301926 0.399997 False Questionable Goodness of fit p-value < 0.1\nConstant varian... False
1 0 0.0,10.0,50.0,150.0,400.0 111.0,142.0,143.0,93.0,42.0 ... 0.003939 0.003939 9.715399e-01 1.0 -0.008385 -0.022386 False Questionable Constant variance test failed (Test 2 p-value ... False
2 0 0.0,10.0,50.0,150.0,400.0 111.0,142.0,143.0,93.0,42.0 ... 0.003939 0.003939 6.429235e-11 3.0 -0.313869 1.424190 False Questionable Goodness of fit p-value < 0.1\nConstant varian... False
3 0 0.0,10.0,50.0,150.0,400.0 111.0,142.0,143.0,93.0,42.0 ... 0.003939 0.003939 6.429235e-11 3.0 -0.313869 1.424190 False Questionable Goodness of fit p-value < 0.1\nConstant varian... False
4 0 0.0,10.0,50.0,150.0,400.0 111.0,142.0,143.0,93.0,42.0 ... 0.003939 0.003939 6.429235e-11 3.0 -0.313869 1.424190 False Questionable Goodness of fit p-value < 0.1\nConstant varian... False

5 rows × 37 columns

The preview shows the first 5 rows; the full summary dataframe has 37 columns.
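
If you want to confirm the shape or see exactly which columns are already present before adding more, standard pandas inspection works; a quick sketch (output not shown):

# list the dimensions and column names of the default summary export
print(df_summary.shape)
print(list(df_summary.columns))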

Now, add some additional information to the summary table. The following code iterates over all the sessions in the batch and all the models in each session, and then creates a new dataframe that can be merged with the default summary dataframe:

rows = []
for i, session in enumerate(sess_batch.sessions):
    for j, model in enumerate(session.models):
        data = {
            "session_index": i,
            "bmds_model_index": j,
        }
        if model.has_results:
            res = model.results
            # deviance arrays are ordered: A1, A2, A3, fitted, reduced
            data.update(
                {
                    "A1_ll": res.deviance.loglikelihoods[0],
                    "A2_ll": res.deviance.loglikelihoods[1],
                    "A3_ll": res.deviance.loglikelihoods[2],
                    "fitted_ll": res.deviance.loglikelihoods[3],
                    "reduced_ll": res.deviance.loglikelihoods[4],
                    "A1_aic": res.deviance.aics[0],
                    "A2_aic": res.deviance.aics[1],
                    "A3_aic": res.deviance.aics[2],
                    "fitted_aic": res.deviance.aics[3],
                    "reduced_aic": res.deviance.aics[4],
                }
            )
        rows.append(data)

df2 = pd.DataFrame(rows)
df2.head()
session_index bmds_model_index A1_ll A2_ll A3_ll fitted_ll reduced_ll A1_aic A2_aic A3_aic fitted_aic reduced_aic
0 0 0 35.380504 43.080749 35.380504 22.988380 -194.453242 -58.761009 -66.161499 -58.761009 -41.976760 392.906483
1 0 1 35.380504 43.080749 35.380504 35.379868 -194.453242 -58.761009 -66.161499 -58.761009 -60.759736 392.906483
2 0 2 35.380504 43.080749 35.380504 10.159028 -194.453242 -58.761009 -66.161499 -58.761009 -14.318056 392.906483
3 0 3 35.380504 43.080749 35.380504 10.159028 -194.453242 -58.761009 -66.161499 -58.761009 -14.318056 392.906483
4 0 4 35.380504 43.080749 35.380504 10.159028 -194.453242 -58.761009 -66.161499 -58.761009 -14.318056 392.906483

Now, merge the two dataframes, joining on the session and model index keys:

df_summary2 = pd.merge(df_summary, df2, on=["session_index", "bmds_model_index"])
df_summary2.head()
session_index session_name session_description dataset_name dataset_dose_name dataset_dose_units dataset_response_name dataset_response_units dataset_doses dataset_ns ... A1_ll A2_ll A3_ll fitted_ll reduced_ll A1_aic A2_aic A3_aic fitted_aic reduced_aic
0 0 0.0,10.0,50.0,150.0,400.0 111.0,142.0,143.0,93.0,42.0 ... 35.380504 43.080749 35.380504 22.988380 -194.453242 -58.761009 -66.161499 -58.761009 -41.976760 392.906483
1 0 0.0,10.0,50.0,150.0,400.0 111.0,142.0,143.0,93.0,42.0 ... 35.380504 43.080749 35.380504 35.379868 -194.453242 -58.761009 -66.161499 -58.761009 -60.759736 392.906483
2 0 0.0,10.0,50.0,150.0,400.0 111.0,142.0,143.0,93.0,42.0 ... 35.380504 43.080749 35.380504 10.159028 -194.453242 -58.761009 -66.161499 -58.761009 -14.318056 392.906483
3 0 0.0,10.0,50.0,150.0,400.0 111.0,142.0,143.0,93.0,42.0 ... 35.380504 43.080749 35.380504 10.159028 -194.453242 -58.761009 -66.161499 -58.761009 -14.318056 392.906483
4 0 0.0,10.0,50.0,150.0,400.0 111.0,142.0,143.0,93.0,42.0 ... 35.380504 43.080749 35.380504 10.159028 -194.453242 -58.761009 -66.161499 -58.761009 -14.318056 392.906483

5 rows × 47 columns

Now, the summary dataframe has 47 columns instead of 37.
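
With the log-likelihood columns in place, you can also derive new values directly in the merged dataframe. For example, twice the difference between the A3 and fitted log-likelihoods should reproduce the Test 4 likelihood ratio reported in the model results; a quick sketch (the test4_ll_ratio column name is only for illustration):

# 2 * (A3 - fitted) log-likelihood; this should match res.tests.ll_ratios[3] ("Test 4")
df_summary2["test4_ll_ratio"] = 2 * (df_summary2["A3_ll"] - df_summary2["fitted_ll"])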

Finally, write the Excel export, with multiple tabs for different types of data:

with pd.ExcelWriter("output/report.xlsx") as writer:
    data = {
        "summary": df_summary2,
        "datasets": sess_batch.df_dataset(),
        "parameters": sess_batch.df_params(),
    }
    for name, df in data.items():
        df.to_excel(writer, sheet_name=name, index=False)

This export includes your additional values!
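
If you also want a tab limited to recommended models, you can filter the merged summary on its recommended column before writing. A minimal sketch, assuming the boolean recommended column shown in the preview above and a hypothetical output filename:

# keep only rows where the model was recommended, and write them to their own tab
df_recommended = df_summary2[df_summary2["recommended"]]
with pd.ExcelWriter("output/report-recommended.xlsx") as writer:
    df_summary2.to_excel(writer, sheet_name="summary", index=False)
    df_recommended.to_excel(writer, sheet_name="recommended", index=False)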

Model result introspection

This section describes how to introspect the available model results, along with their data structures and text summaries.

Let’s grab the first model that was executed and look at its results object:

model = sess_batch.sessions[0].models[0]
res = model.results

For example, to show the BMD, BMDL, and AIC:

print(f"{res.bmd=}")
print(f"{res.bmdl=}")
print(f"{res.fit.aic=}")
res.bmd=69.7260856628418
res.bmdl=64.69549594366183
res.fit.aic=-39.976759721476874
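
Executed models can also produce a readable multi-line report if you prefer that over individual attributes; a quick sketch, assuming the text() report method on pybmds models (output omitted):

# print a formatted text summary of the fitted model
print(model.text())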

To better understand the structure, we can “pretty print” the Python dictionary:

from pprint import pprint

data = res.model_dump()

# truncate a few fields so they print better... (you can ignore this code)
data["fit"]["bmd_dist"][0] = data["fit"]["bmd_dist"][0][:5]
data["fit"]["bmd_dist"][1] = data["fit"]["bmd_dist"][1][:5]
data["plotting"]["dr_x"] = data["plotting"]["dr_x"][:5]
data["plotting"]["dr_y"] = data["plotting"]["dr_y"][:5]

pprint(data, indent=2, width=100, sort_dicts=True)
{ 'bmd': 69.7260856628418,
  'bmdl': 64.69549594366183,
  'bmdu': 75.46896474983512,
  'deviance': { 'aics': [ -58.761008514924924,
                          -66.16149887942285,
                          -58.761008514924924,
                          -41.976759721476874,
                          392.9064834355403],
                'loglikelihoods': [ 35.38050425746246,
                                    43.080749439711425,
                                    35.38050425746246,
                                    22.988379860738437,
                                    -194.45324171777014],
                'names': ['A1', 'A2', 'A3', 'fitted', 'reduced'],
                'num_params': [6, 10, 6, 2, 2]},
  'fit': { 'aic': -39.976759721476874,
           'bic_equiv': 48.878035205148194,
           'bmd_dist': [ [ 64.69549594366183,
                           64.88798251662926,
                           65.08404336270584,
                           65.2814119494681,
                           65.47782174449254],
                         [0.05, 0.06, 0.07, 0.08, 0.09]],
           'bmds_model_df': 4.0,
           'chisq': 0.0,
           'dist': 1,
           'loglikelihood': 22.988379860738437,
           'total_df': 1.0},
  'gof': { 'calc_mean': [2.112, 2.095, 1.956, 1.587, 1.254],
           'calc_sd': [0.235, 0.209, 0.231, 0.263, 0.159],
           'dose': [0.0, 10.0, 50.0, 150.0, 400.0],
           'eb_lower': [ 2.05007080082289,
                         2.046304233630524,
                         1.9023668811812215,
                         1.5112812622766696,
                         1.1858820498658582],
           'eb_upper': [ 2.1739291991771097,
                         2.143695766369476,
                         2.0096331188187784,
                         1.6627187377233301,
                         1.3221179501341416],
           'est_mean': [ 2.1031875016962718,
                         2.071645917144676,
                         1.9501394721711907,
                         1.6766446741907262,
                         1.1491571827135112],
           'est_sd': [ 0.23211524527179625,
                       0.23211524527179625,
                       0.23211524527179625,
                       0.23211524527179625,
                       0.23211524527179625],
           'obs_mean': [2.112, 2.095, 1.956, 1.587, 1.254],
           'obs_sd': [0.235, 0.209, 0.231, 0.263, 0.159],
           'residual': [ 0.39999712520806624,
                         1.1989562145340822,
                         0.30192639775574026,
                         -3.724451314975327,
                         2.927248971211141],
           'roi': 0.30192639775574026,
           'size': [111.0, 142.0, 143.0, 93.0, 42.0]},
  'has_completed': True,
  'parameters': { 'bounded': [0.0, 0.0, 1.0, 0.0],
                  'cov': [ [ 0.00017144128882919536,
                             4.918989843420932e-07,
                             -1.5201975784980042e-12,
                             -6.715949683264648e-06],
                           [ 4.918989843420931e-07,
                             4.848157168875482e-09,
                             1.0024164806790948e-14,
                             -5.171252479548764e-10],
                           [ -1.5201975784980042e-12,
                             1.0024164806790948e-14,
                             9.378157638964157e-12,
                             1.7225957143579764e-12],
                           [ -6.715949683264648e-06,
                             -5.171252479548764e-10,
                             1.7225957143579764e-12,
                             0.0037797771004043678]],
                  'lower_ci': [ 2.077524597308932,
                                0.0013745933940738698,
                                -9999.0,
                                -3.041540942311279],
                  'names': ['a', 'b', 'd', 'log-alpha'],
                  'prior_initial_value': [0.0, 0.0, 0.0, 0.0],
                  'prior_max_value': [100.0, 100.0, 18.0, 18.0],
                  'prior_min_value': [0.0, 0.0, 1.0, -18.0],
                  'prior_stdev': [0.0, 0.0, 0.0, 0.0],
                  'prior_type': [0, 0, 0, 0],
                  'se': [0.013093559058911193, 6.962870937246706e-05, -9999.0, 0.06147989183793647],
                  'upper_ci': [ 2.1288504060836115,
                                0.0016475329215468659,
                                -9999.0,
                                -2.800544192858781],
                  'values': [ 2.1031875016962718,
                              0.0015110631578103678,
                              1.0000000010292514,
                              -2.92104256758503]},
  'plotting': { 'bmd_y': 1.8928687453150286,
                'bmdl_y': 1.9073122875049748,
                'bmdu_y': 1.8765137742516325,
                'dr_x': [ 1e-08,
                          4.040404040404041,
                          8.080808080808081,
                          12.121212121212121,
                          16.161616161616163],
                'dr_y': [ 2.1031875016644914,
                          2.0903860173979343,
                          2.0776624519546885,
                          2.0650163311087786,
                          2.052447183480147]},
  'tests': { 'dfs': [8.0, 4.0, 4.0, 4.0],
             'll_ratios': [ 475.06798231496316,
                            15.400490364497927,
                            15.400490364497927,
                            24.78424879344805],
             'names': ['Test 1', 'Test 2', 'Test 3', 'Test 4'],
             'p_values': [0.0, 0.003938741688359837, 0.003938741688359837, 5.5592759270473024e-05]}}

This may be helpful when trying to find a particular value in the nested results.
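
For example, the goodness-of-fit arrays shown above can be collected into a small dataframe for easier comparison; a sketch using fields from the printed structure (the dumped dictionary mirrors the attribute names):

# assemble selected goodness-of-fit fields into a dataframe
pd.DataFrame(
    {
        "dose": res.gof.dose,
        "size": res.gof.size,
        "obs_mean": res.gof.obs_mean,
        "est_mean": res.gof.est_mean,
        "residual": res.gof.residual,
    }
)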

Build a text table

The following is a helpful pattern for printing data in a tabular format:

for name, df, ll, p_value in zip(
    res.tests.names, res.tests.dfs, res.tests.ll_ratios, res.tests.p_values, strict=True
):
    print(f"{name:10} {df: <6} {ll: <10.4f} {p_value: <8.6f}")
Test 1     8.0    475.0680   0.000000
Test 2     4.0    15.4005    0.003939
Test 3     4.0    15.4005    0.003939
Test 4     4.0    24.7842    0.000056

This pattern uses Python's built-in zip function, which iterates over multiple same-length sequences in parallel; passing strict=True raises an error if the lengths differ.
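
The same values can also be collected into a dataframe, which may be easier to sort, filter, or export than a printed table:

# build the test summary as a dataframe instead of printing it
pd.DataFrame(
    {
        "name": res.tests.names,
        "df": res.tests.dfs,
        "ll_ratio": res.tests.ll_ratios,
        "p_value": res.tests.p_values,
    }
)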