Customize an Excel Export¶
After executing a batch analysis, you may want to add additional information to the default Excel exports. The pybmds package stores all modeling information in a data structure that gives you access to both the underlying data and the reports. This guide demonstrates running a simple batch analysis and then augmenting the default Excel export with a few extra columns of information:
from pprint import pprint
import pandas as pd
import pybmds
As a simple example, the following code generates a batch analysis using a few option sets and a single dataset. You could adapt this code to run a custom analysis of your choosing:
def build_cont_sess(ds):
    def add_model(sess, Model, base, additions=None):
        # shared settings for every model, plus any model-specific additions
        settings = {
            "priors": pybmds.PriorClass.frequentist_restricted,
            "bmr_type": base[0],
            "bmr": base[1],
            "disttype": base[2],
        }
        if additions is not None:
            settings.update(additions)
        sess.add_model(Model, settings)

    option_sets = [
        (pybmds.ContinuousRiskType.RelativeDeviation, 0.1, pybmds.ContinuousDistType.normal),
        (pybmds.ContinuousRiskType.RelativeDeviation, 0.1, pybmds.ContinuousDistType.normal_ncv),
    ]

    sessions = []
    for option_set in option_sets:
        # build one session per option set, each with the same suite of models
        sess = pybmds.Session(dataset=ds)
        add_model(sess, pybmds.Models.ExponentialM3, option_set)
        add_model(sess, pybmds.Models.ExponentialM5, option_set)
        add_model(sess, pybmds.Models.Power, option_set)
        add_model(sess, pybmds.Models.Power, option_set)
        add_model(sess, pybmds.Models.Linear, option_set)
        add_model(sess, pybmds.Models.Polynomial, option_set, {"degree": 2})
        add_model(sess, pybmds.Models.Polynomial, option_set, {"degree": 3})
        sess.execute_and_recommend()
        sessions.append(sess.to_dict())

    return pybmds.BatchResponse(success=True, content=sessions)
datasets = [
    pybmds.ContinuousDataset(
        doses=[0, 10, 50, 150, 400],
        ns=[111, 142, 143, 93, 42],
        means=[2.112, 2.095, 1.956, 1.587, 1.254],
        stdevs=[0.235, 0.209, 0.231, 0.263, 0.159],
    )
]
sess_batch = pybmds.BatchSession.execute(datasets, build_cont_sess, nprocs=1)
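Before exporting, it can be worth confirming what the batch contains. As a quick sanity check (a minimal sketch using the same sessions and models attributes that appear later in this guide):

# two sessions are expected here: one per option set defined in build_cont_sess
print(len(sess_batch.sessions))

# each session holds the seven models added above
print(len(sess_batch.sessions[0].models))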
To generate a standard Excel export, call the to_excel method:
sess_batch.to_excel("output/batch.xlsx")
'output/batch.xlsx'
However, you may want to customize this export with additional information. In this example, we add more detail from the Analysis of Deviance table than is generally included in the summary exports.
To do this, first generate the summary dataframe, which you will then extend:
df_summary = sess_batch.df_summary()
df_summary.head()
 | session_index | session_name | session_description | dataset_name | dataset_dose_name | dataset_dose_units | dataset_response_name | dataset_response_units | dataset_doses | dataset_ns | ... | p_value2 | p_value3 | p_value4 | model_dof | residual_of_interest | residual_at_lowest_dose | recommended | recommendation_bin | recommendation_notes | selected
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 0.003939 | 0.003939 | 5.559276e-05 | 4.0 | 0.301926 | 0.399997 | False | Questionable | Goodness of fit p-value < 0.1\nConstant varian... | False
1 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 0.003939 | 0.003939 | 9.715399e-01 | 1.0 | -0.008385 | -0.022386 | False | Questionable | Constant variance test failed (Test 2 p-value ... | False
2 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 0.003939 | 0.003939 | 6.429235e-11 | 3.0 | -0.313869 | 1.424190 | False | Questionable | Goodness of fit p-value < 0.1\nConstant varian... | False
3 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 0.003939 | 0.003939 | 6.429235e-11 | 3.0 | -0.313869 | 1.424190 | False | Questionable | Goodness of fit p-value < 0.1\nConstant varian... | False
4 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 0.003939 | 0.003939 | 6.429235e-11 | 3.0 | -0.313869 | 1.424190 | False | Questionable | Goodness of fit p-value < 0.1\nConstant varian... | False
5 rows × 37 columns
The preview above shows the first 5 rows of the summary table, which has 37 columns in total.
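If you want to see exactly which columns are already present before deciding what to add, you can list them directly (standard pandas, nothing pybmds-specific):

# shape of the full summary dataframe and the complete column list
print(df_summary.shape)
print(df_summary.columns.tolist())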
Now, add some additional information to the summary table. The following code iterates over all the sessions in the batch and all the models in each session, and builds a new dataframe that can be merged with the default summary dataframe:
rows = []
for i, session in enumerate(sess_batch.sessions):
    for j, model in enumerate(session.models):
        data = {
            "session_index": i,
            "bmds_model_index": j,
        }
        if model.has_results:
            res = model.results
            # deviance values are ordered [A1, A2, A3, fitted, reduced]
            data.update(
                {
                    "A1_ll": res.deviance.loglikelihoods[0],
                    "A2_ll": res.deviance.loglikelihoods[1],
                    "A3_ll": res.deviance.loglikelihoods[2],
                    "fitted_ll": res.deviance.loglikelihoods[3],
                    "reduced_ll": res.deviance.loglikelihoods[4],
                    "A1_aic": res.deviance.aics[0],
                    "A2_aic": res.deviance.aics[1],
                    "A3_aic": res.deviance.aics[2],
                    "fitted_aic": res.deviance.aics[3],
                    "reduced_aic": res.deviance.aics[4],
                }
            )
        rows.append(data)
df2 = pd.DataFrame(rows)
df2.head()
 | session_index | bmds_model_index | A1_ll | A2_ll | A3_ll | fitted_ll | reduced_ll | A1_aic | A2_aic | A3_aic | fitted_aic | reduced_aic
---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 0 | 0 | 35.380504 | 43.080749 | 35.380504 | 22.988380 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -41.976760 | 392.906483 |
1 | 0 | 1 | 35.380504 | 43.080749 | 35.380504 | 35.379868 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -60.759736 | 392.906483 |
2 | 0 | 2 | 35.380504 | 43.080749 | 35.380504 | 10.159028 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -14.318056 | 392.906483 |
3 | 0 | 3 | 35.380504 | 43.080749 | 35.380504 | 10.159028 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -14.318056 | 392.906483 |
4 | 0 | 4 | 35.380504 | 43.080749 | 35.380504 | 10.159028 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -14.318056 | 392.906483 |
Now, merge the two dataframes, joining on the session and model index keys:
df_summary2 = pd.merge(df_summary, df2, on=["session_index", "bmds_model_index"])
df_summary2.head()
 | session_index | session_name | session_description | dataset_name | dataset_dose_name | dataset_dose_units | dataset_response_name | dataset_response_units | dataset_doses | dataset_ns | ... | A1_ll | A2_ll | A3_ll | fitted_ll | reduced_ll | A1_aic | A2_aic | A3_aic | fitted_aic | reduced_aic
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 35.380504 | 43.080749 | 35.380504 | 22.988380 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -41.976760 | 392.906483
1 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 35.380504 | 43.080749 | 35.380504 | 35.379868 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -60.759736 | 392.906483
2 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 35.380504 | 43.080749 | 35.380504 | 10.159028 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -14.318056 | 392.906483
3 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 35.380504 | 43.080749 | 35.380504 | 10.159028 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -14.318056 | 392.906483
4 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 35.380504 | 43.080749 | 35.380504 | 10.159028 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -14.318056 | 392.906483
5 rows × 47 columns
The merged summary dataframe now has 47 columns instead of the original 37.
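Because the join is one-to-one on the session and model indices, the merge should not add or drop any rows; a quick assertion (a defensive check, not part of the default workflow) can catch surprises before writing the export:

# the merge on (session_index, bmds_model_index) should preserve the row count
assert len(df_summary2) == len(df_summary), "merge changed the number of rows"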
Finally, write the Excel export, with multiple tabs for different types of data:
with pd.ExcelWriter("output/report.xlsx") as writer:
    data = {
        "summary": df_summary2,
        "datasets": sess_batch.df_dataset(),
        "parameters": sess_batch.df_params(),
    }
    for name, df in data.items():
        df.to_excel(writer, sheet_name=name, index=False)
This export includes your additional values!
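If you'd like to confirm the workbook was written as expected, you can read it back with pandas (a minimal check; the path and sheet names match those used above):

# list the sheets in the workbook and confirm the new deviance columns are present
workbook = pd.ExcelFile("output/report.xlsx")
print(workbook.sheet_names)

summary = pd.read_excel(workbook, sheet_name="summary")
print([c for c in summary.columns if c.endswith("_ll") or c.endswith("_aic")])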
Model result introspection¶
This section describes how to introspect the available model results, including their data structures and text summaries.
Let’s grab the first model that was executed and look at its results object:
model = sess_batch.sessions[0].models[0]
res = model.results
For example, to show the BMD, BMDL, and AIC:
print(f"{res.bmd=}")
print(f"{res.bmdl=}")
print(f"{res.fit.aic=}")
res.bmd=69.7260856628418
res.bmdl=64.69549594366183
res.fit.aic=-39.976759721476874
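Other results can be read the same way. For example, assuming the attribute names mirror the dictionary keys shown in the dump below, the parameter names and their estimates can be paired up:

# parameter names and maximum likelihood estimates for this model
for name, value in zip(res.parameters.names, res.parameters.values, strict=True):
    print(f"{name}: {value:.6g}")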
To better understand the structure, we can “pretty print” the Python dictionary:
from pprint import pprint
data = res.model_dump()
# truncate a few fields so they print better... (you can ignore this code)
data["fit"]["bmd_dist"][0] = data["fit"]["bmd_dist"][0][:5]
data["fit"]["bmd_dist"][1] = data["fit"]["bmd_dist"][1][:5]
data["plotting"]["dr_x"] = data["plotting"]["dr_x"][:5]
data["plotting"]["dr_y"] = data["plotting"]["dr_y"][:5]
pprint(data, indent=2, width=100, sort_dicts=True)
{ 'bmd': 69.7260856628418,
'bmdl': 64.69549594366183,
'bmdu': 75.46896474983512,
'deviance': { 'aics': [ -58.761008514924924,
-66.16149887942285,
-58.761008514924924,
-41.976759721476874,
392.9064834355403],
'loglikelihoods': [ 35.38050425746246,
43.080749439711425,
35.38050425746246,
22.988379860738437,
-194.45324171777014],
'names': ['A1', 'A2', 'A3', 'fitted', 'reduced'],
'num_params': [6, 10, 6, 2, 2]},
'fit': { 'aic': -39.976759721476874,
'bic_equiv': 48.878035205148194,
'bmd_dist': [ [ 64.69549594366183,
64.88798251662926,
65.08404336270584,
65.2814119494681,
65.47782174449254],
[0.05, 0.06, 0.07, 0.08, 0.09]],
'bmds_model_df': 4.0,
'chisq': 0.0,
'dist': 1,
'loglikelihood': 22.988379860738437,
'total_df': 1.0},
'gof': { 'calc_mean': [2.112, 2.095, 1.956, 1.587, 1.254],
'calc_sd': [0.235, 0.209, 0.231, 0.263, 0.159],
'dose': [0.0, 10.0, 50.0, 150.0, 400.0],
'eb_lower': [ 2.05007080082289,
2.046304233630524,
1.9023668811812215,
1.5112812622766696,
1.1858820498658582],
'eb_upper': [ 2.1739291991771097,
2.143695766369476,
2.0096331188187784,
1.6627187377233301,
1.3221179501341416],
'est_mean': [ 2.1031875016962718,
2.071645917144676,
1.9501394721711907,
1.6766446741907262,
1.1491571827135112],
'est_sd': [ 0.23211524527179625,
0.23211524527179625,
0.23211524527179625,
0.23211524527179625,
0.23211524527179625],
'obs_mean': [2.112, 2.095, 1.956, 1.587, 1.254],
'obs_sd': [0.235, 0.209, 0.231, 0.263, 0.159],
'residual': [ 0.39999712520806624,
1.1989562145340822,
0.30192639775574026,
-3.724451314975327,
2.927248971211141],
'roi': 0.30192639775574026,
'size': [111.0, 142.0, 143.0, 93.0, 42.0]},
'has_completed': True,
'parameters': { 'bounded': [0.0, 0.0, 1.0, 0.0],
'cov': [ [ 0.00017144128882919536,
4.918989843420932e-07,
-1.5201975784980042e-12,
-6.715949683264648e-06],
[ 4.918989843420931e-07,
4.848157168875482e-09,
1.0024164806790948e-14,
-5.171252479548764e-10],
[ -1.5201975784980042e-12,
1.0024164806790948e-14,
9.378157638964157e-12,
1.7225957143579764e-12],
[ -6.715949683264648e-06,
-5.171252479548764e-10,
1.7225957143579764e-12,
0.0037797771004043678]],
'lower_ci': [ 2.077524597308932,
0.0013745933940738698,
-9999.0,
-3.041540942311279],
'names': ['a', 'b', 'd', 'log-alpha'],
'prior_initial_value': [0.0, 0.0, 0.0, 0.0],
'prior_max_value': [100.0, 100.0, 18.0, 18.0],
'prior_min_value': [0.0, 0.0, 1.0, -18.0],
'prior_stdev': [0.0, 0.0, 0.0, 0.0],
'prior_type': [0, 0, 0, 0],
'se': [0.013093559058911193, 6.962870937246706e-05, -9999.0, 0.06147989183793647],
'upper_ci': [ 2.1288504060836115,
0.0016475329215468659,
-9999.0,
-2.800544192858781],
'values': [ 2.1031875016962718,
0.0015110631578103678,
1.0000000010292514,
-2.92104256758503]},
'plotting': { 'bmd_y': 1.8928687453150286,
'bmdl_y': 1.9073122875049748,
'bmdu_y': 1.8765137742516325,
'dr_x': [ 1e-08,
4.040404040404041,
8.080808080808081,
12.121212121212121,
16.161616161616163],
'dr_y': [ 2.1031875016644914,
2.0903860173979343,
2.0776624519546885,
2.0650163311087786,
2.052447183480147]},
'tests': { 'dfs': [8.0, 4.0, 4.0, 4.0],
'll_ratios': [ 475.06798231496316,
15.400490364497927,
15.400490364497927,
24.78424879344805],
'names': ['Test 1', 'Test 2', 'Test 3', 'Test 4'],
'p_values': [0.0, 0.003938741688359837, 0.003938741688359837, 5.5592759270473024e-05]}}
This may be helpful in trying to find a particular value in the nested results.
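If you are not sure where a value lives, a small helper (a hypothetical find_key utility, not part of pybmds) can walk the dumped dictionary and report the path to any key:

def find_key(data, target, path=""):
    # recursively yield (path, value) pairs for every occurrence of `target`
    if isinstance(data, dict):
        for key, value in data.items():
            new_path = f"{path}.{key}" if path else key
            if key == target:
                yield new_path, value
            yield from find_key(value, target, new_path)
    elif isinstance(data, list):
        for i, value in enumerate(data):
            yield from find_key(value, target, f"{path}[{i}]")

# for example, locate the residual of interest in the nested results
for path, value in find_key(res.model_dump(), "roi"):
    print(path, value)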
Build a text table¶
This is a helpful pattern to print data in a tabular format:
for name, df, ll, p_value in zip(
    res.tests.names, res.tests.dfs, res.tests.ll_ratios, res.tests.p_values, strict=True
):
    print(f"{name:10} {df: <6} {ll: <10.4f} {p_value: <8.6f}")
Test 1 8.0 475.0680 0.000000
Test 2 4.0 15.4005 0.003939
Test 3 4.0 15.4005 0.003939
Test 4 4.0 24.7842 0.000056
This pattern uses Python's built-in zip function, which lets you iterate over multiple same-length lists in parallel; strict=True raises an error if the lengths differ.
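If you prefer to work with the table programmatically rather than print it, the same fields can be collected into a pandas dataframe (the attributes are the same ones used in the loop above):

# collect the test results into a dataframe for further filtering or export
df_tests = pd.DataFrame(
    {
        "name": res.tests.names,
        "df": res.tests.dfs,
        "ll_ratio": res.tests.ll_ratios,
        "p_value": res.tests.p_values,
    }
)
print(df_tests)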