Customize an Excel Export¶
After executing a batch analysis, you may want to add additional information to the default Excel exports. The pybmds package stores all modeling information in a data structure that gives you access to both the underlying data and the reports. This guide demonstrates running a simple batch analysis and then augmenting the default Excel export with a few extra columns of information:
from pprint import pprint
import pandas as pd
import pybmds
As a simple example, the following code generates a batch analysis using a few option sets and a single dataset. You could adapt this code to run a custom analysis of your choosing:
def build_cont_sess(ds):
    def add_model(sess, Model, base, additions=None):
        # shared settings for every model, plus any model-specific additions
        settings = {
            "priors": pybmds.PriorClass.frequentist_restricted,
            "bmr_type": base[0],
            "bmr": base[1],
            "disttype": base[2],
        }
        if additions is not None:
            settings.update(additions)
        sess.add_model(Model, settings)

    option_sets = [
        (pybmds.ContinuousRiskType.RelativeDeviation, 0.1, pybmds.ContinuousDistType.normal),
        (pybmds.ContinuousRiskType.RelativeDeviation, 0.1, pybmds.ContinuousDistType.normal_ncv),
    ]

    sessions = []
    for option_set in option_sets:
        # build one session per option set, each with the same suite of models
        sess = pybmds.Session(dataset=ds)
        add_model(sess, pybmds.Models.ExponentialM3, option_set)
        add_model(sess, pybmds.Models.ExponentialM5, option_set)
        add_model(sess, pybmds.Models.Power, option_set)
        add_model(sess, pybmds.Models.Power, option_set)
        add_model(sess, pybmds.Models.Linear, option_set)
        add_model(sess, pybmds.Models.Polynomial, option_set, {"degree": 2})
        add_model(sess, pybmds.Models.Polynomial, option_set, {"degree": 3})
        sess.execute_and_recommend()
        sessions.append(sess.to_dict())

    return pybmds.BatchResponse(success=True, content=sessions)
datasets = [
    pybmds.ContinuousDataset(
        doses=[0, 10, 50, 150, 400],
        ns=[111, 142, 143, 93, 42],
        means=[2.112, 2.095, 1.956, 1.587, 1.254],
        stdevs=[0.235, 0.209, 0.231, 0.263, 0.159],
    )
]
sess_batch = pybmds.BatchSession.execute(datasets, build_cont_sess, nprocs=1)
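Before exporting, it can be worth confirming what the batch contains. As a quick sanity check (a minimal sketch using the same sessions and models attributes that appear later in this guide):

# two sessions are expected here: one per option set defined in build_cont_sess
print(len(sess_batch.sessions))

# each session holds the seven models added above
print(len(sess_batch.sessions[0].models))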
To generate a standard Excel export, call the to_excel method:
sess_batch.to_excel("output/batch.xlsx")
'output/batch.xlsx'
However, you may want to customize this export with additional information. In this example, we add more detail from the Analysis of Deviance table than is generally included in the summary exports.
To do this, first generate the summary dataframe, which you will then extend:
df_summary = sess_batch.df_summary()
df_summary.head()
 | session_index | session_name | session_description | dataset_name | dataset_dose_name | dataset_dose_units | dataset_response_name | dataset_response_units | dataset_doses | dataset_ns | ... | p_value2 | p_value3 | p_value4 | model_dof | residual_of_interest | residual_at_lowest_dose | recommended | recommendation_bin | recommendation_notes | selected
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 0.003939 | 0.003939 | 5.559276e-05 | 4.0 | 0.301926 | 0.399997 | False | Questionable | Goodness of fit p-value < 0.1\nConstant varian... | False
1 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 0.003939 | 0.003939 | 9.715399e-01 | 1.0 | -0.008385 | -0.022386 | False | Questionable | Constant variance test failed (Test 2 p-value ... | False
2 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 0.003939 | 0.003939 | 6.429235e-11 | 3.0 | -0.313869 | 1.424190 | False | Questionable | Goodness of fit p-value < 0.1\nConstant varian... | False
3 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 0.003939 | 0.003939 | 6.429235e-11 | 3.0 | -0.313869 | 1.424190 | False | Questionable | Goodness of fit p-value < 0.1\nConstant varian... | False
4 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 0.003939 | 0.003939 | 6.429235e-11 | 3.0 | -0.313869 | 1.424190 | False | Questionable | Goodness of fit p-value < 0.1\nConstant varian... | False
5 rows × 37 columns
The preview above shows the first 5 rows of the summary table, which has 37 columns in total.
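If you want to see exactly which columns are already present before deciding what to add, you can list them directly (standard pandas, nothing pybmds-specific):

# shape of the full summary dataframe and the complete column list
print(df_summary.shape)
print(df_summary.columns.tolist())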
Now, add some additional information to the summary table. The following code iterates over all the sessions in the batch and all the models in each session, and builds a new dataframe that can be merged with the default summary dataframe:
rows = []
for i, session in enumerate(sess_batch.sessions):
    for j, model in enumerate(session.models):
        data = {
            "session_index": i,
            "bmds_model_index": j,
        }
        if model.has_results:
            res = model.results
            # deviance values are ordered [A1, A2, A3, fitted, reduced]
            data.update(
                {
                    "A1_ll": res.deviance.loglikelihoods[0],
                    "A2_ll": res.deviance.loglikelihoods[1],
                    "A3_ll": res.deviance.loglikelihoods[2],
                    "fitted_ll": res.deviance.loglikelihoods[3],
                    "reduced_ll": res.deviance.loglikelihoods[4],
                    "A1_aic": res.deviance.aics[0],
                    "A2_aic": res.deviance.aics[1],
                    "A3_aic": res.deviance.aics[2],
                    "fitted_aic": res.deviance.aics[3],
                    "reduced_aic": res.deviance.aics[4],
                }
            )
        rows.append(data)
df2 = pd.DataFrame(rows)
df2.head()
 | session_index | bmds_model_index | A1_ll | A2_ll | A3_ll | fitted_ll | reduced_ll | A1_aic | A2_aic | A3_aic | fitted_aic | reduced_aic
---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 0 | 0 | 35.380504 | 43.080749 | 35.380504 | 22.988380 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -41.976760 | 392.906483 |
1 | 0 | 1 | 35.380504 | 43.080749 | 35.380504 | 35.379868 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -60.759736 | 392.906483 |
2 | 0 | 2 | 35.380504 | 43.080749 | 35.380504 | 10.159028 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -14.318056 | 392.906483 |
3 | 0 | 3 | 35.380504 | 43.080749 | 35.380504 | 10.159028 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -14.318056 | 392.906483 |
4 | 0 | 4 | 35.380504 | 43.080749 | 35.380504 | 10.159028 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -14.318056 | 392.906483 |
Now, merge the two dataframes, joining on the session and model index keys:
df_summary2 = pd.merge(df_summary, df2, on=["session_index", "bmds_model_index"])
df_summary2.head()
 | session_index | session_name | session_description | dataset_name | dataset_dose_name | dataset_dose_units | dataset_response_name | dataset_response_units | dataset_doses | dataset_ns | ... | A1_ll | A2_ll | A3_ll | fitted_ll | reduced_ll | A1_aic | A2_aic | A3_aic | fitted_aic | reduced_aic
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 35.380504 | 43.080749 | 35.380504 | 22.988380 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -41.976760 | 392.906483
1 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 35.380504 | 43.080749 | 35.380504 | 35.379868 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -60.759736 | 392.906483
2 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 35.380504 | 43.080749 | 35.380504 | 10.159028 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -14.318056 | 392.906483
3 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 35.380504 | 43.080749 | 35.380504 | 10.159028 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -14.318056 | 392.906483
4 | 0 | | | | | | | | 0.0,10.0,50.0,150.0,400.0 | 111.0,142.0,143.0,93.0,42.0 | ... | 35.380504 | 43.080749 | 35.380504 | 10.159028 | -194.453242 | -58.761009 | -66.161499 | -58.761009 | -14.318056 | 392.906483
5 rows × 47 columns
The merged summary dataframe now has 47 columns instead of the original 37.
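Because the join is one-to-one on the session and model indices, the merge should not add or drop any rows; a quick assertion (a defensive check, not part of the default workflow) can catch surprises before writing the export:

# the merge on (session_index, bmds_model_index) should preserve the row count
assert len(df_summary2) == len(df_summary), "merge changed the number of rows"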
Finally, write the Excel export, with multiple tabs for different types of data:
with pd.ExcelWriter("output/report.xlsx") as writer:
    data = {
        "summary": df_summary2,
        "datasets": sess_batch.df_dataset(),
        "parameters": sess_batch.df_params(),
    }
    for name, df in data.items():
        df.to_excel(writer, sheet_name=name, index=False)
This export includes your additional values!
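If you'd like to confirm the workbook was written as expected, you can read it back with pandas (a minimal check; the path and sheet names match those used above):

# list the sheets in the workbook and confirm the new deviance columns are present
workbook = pd.ExcelFile("output/report.xlsx")
print(workbook.sheet_names)

summary = pd.read_excel(workbook, sheet_name="summary")
print([c for c in summary.columns if c.endswith("_ll") or c.endswith("_aic")])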
Model result introspection¶
This section describes how to introspect the available model results, including their data structures and text summaries.
Let’s grab the first model that was executed and look at its results object:
model = sess_batch.sessions[0].models[0]
res = model.results
For example, to show the BMD, BMDL, and AIC:
print(f"{res.bmd=}")
print(f"{res.bmdl=}")
print(f"{res.fit.aic=}")
res.bmd=69.7260856628418
res.bmdl=64.69549594366183
res.fit.aic=-39.976759721476874
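Other results can be read the same way. For example, assuming the attribute names mirror the dictionary keys shown in the dump below, the parameter names and their estimates can be paired up:

# parameter names and maximum likelihood estimates for this model
for name, value in zip(res.parameters.names, res.parameters.values, strict=True):
    print(f"{name}: {value:.6g}")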
To better understand the structure, we can “pretty print” the Python dictionary:
from pprint import pprint
data = res.model_dump()
# truncate a few fields so they print better... (you can ignore this code)
data["fit"]["bmd_dist"][0] = data["fit"]["bmd_dist"][0][:5]
data["fit"]["bmd_dist"][1] = data["fit"]["bmd_dist"][1][:5]
data["plotting"]["dr_x"] = data["plotting"]["dr_x"][:5]
data["plotting"]["dr_y"] = data["plotting"]["dr_y"][:5]
pprint(data, indent=2, width=100, sort_dicts=True)
{ 'bmd': 69.7260856628418,
'bmdl': 64.69549594366183,
'bmdu': 75.46896474983512,
'deviance': { 'aics': [ -58.761008514924924,
-66.16149887942285,
-58.761008514924924,
-41.976759721476874,
392.9064834355403],
'loglikelihoods': [ 35.38050425746246,
43.080749439711425,
35.38050425746246,
22.988379860738437,
-194.45324171777014],
'names': ['A1', 'A2', 'A3', 'fitted', 'reduced'],
'num_params': [6, 10, 6, 2, 2]},
'fit': { 'aic': -39.976759721476874,
'bic_equiv': 48.878035205148194,
'bmd_dist': [ [ 64.69549594366183,
64.88798251662926,
65.08404336270584,
65.2814119494681,
65.47782174449254],
[0.05, 0.06, 0.07, 0.08, 0.09]],
'bmds_model_df': 4.0,
'chisq': 0.0,
'dist': 1,
'loglikelihood': 22.988379860738437,
'total_df': 1.0},
'gof': { 'calc_mean': [2.112, 2.095, 1.956, 1.587, 1.254],
'calc_sd': [0.235, 0.209, 0.231, 0.263, 0.159],
'dose': [0.0, 10.0, 50.0, 150.0, 400.0],
'eb_lower': [ 2.05007080082289,
2.046304233630524,
1.9023668811812215,
1.5112812622766696,
1.1858820498658582],
'eb_upper': [ 2.1739291991771097,
2.143695766369476,
2.0096331188187784,
1.6627187377233301,
1.3221179501341416],
'est_mean': [ 2.1031875016962718,
2.071645917144676,
1.9501394721711907,
1.6766446741907262,
1.1491571827135112],
'est_sd': [ 0.23211524527179625,
0.23211524527179625,
0.23211524527179625,
0.23211524527179625,
0.23211524527179625],
'obs_mean': [2.112, 2.095, 1.956, 1.587, 1.254],
'obs_sd': [0.235, 0.209, 0.231, 0.263, 0.159],
'residual': [ 0.39999712520806624,
1.1989562145340822,
0.30192639775574026,
-3.724451314975327,
2.927248971211141],
'roi': 0.30192639775574026,
'size': [111.0, 142.0, 143.0, 93.0, 42.0]},
'has_completed': True,
'parameters': { 'bounded': [0.0, 0.0, 1.0, 0.0],
'cov': [ [ 0.00017144128882919536,
4.918989843420932e-07,
-1.5201975784980042e-12,
-6.715949683264648e-06],
[ 4.918989843420931e-07,
4.848157168875482e-09,
1.0024164806790948e-14,
-5.171252479548764e-10],
[ -1.5201975784980042e-12,
1.0024164806790948e-14,
9.378157638964157e-12,
1.7225957143579764e-12],
[ -6.715949683264648e-06,
-5.171252479548764e-10,
1.7225957143579764e-12,
0.0037797771004043678]],
'lower_ci': [ 2.077524597308932,
0.0013745933940738698,
-9999.0,
-3.041540942311279],
'names': ['a', 'b', 'd', 'log-alpha'],
'prior_initial_value': [0.0, 0.0, 0.0, 0.0],
'prior_max_value': [100.0, 100.0, 18.0, 18.0],
'prior_min_value': [0.0, 0.0, 1.0, -18.0],
'prior_stdev': [0.0, 0.0, 0.0, 0.0],
'prior_type': [0, 0, 0, 0],
'se': [0.013093559058911193, 6.962870937246706e-05, -9999.0, 0.06147989183793647],
'upper_ci': [ 2.1288504060836115,
0.0016475329215468659,
-9999.0,
-2.800544192858781],
'values': [ 2.1031875016962718,
0.0015110631578103678,
1.0000000010292514,
-2.92104256758503]},
'plotting': { 'bmd_y': 1.8928687453150286,
'bmdl_y': 1.9073122875049748,
'bmdu_y': 1.8765137742516325,
'dr_x': [ 1e-08,
4.040404040404041,
8.080808080808081,
12.121212121212121,
16.161616161616163],
'dr_y': [ 2.1031875016644914,
2.0903860173979343,
2.0776624519546885,
2.0650163311087786,
2.052447183480147]},
'tests': { 'dfs': [8.0, 4.0, 4.0, 4.0],
'll_ratios': [ 475.06798231496316,
15.400490364497927,
15.400490364497927,
24.78424879344805],
'names': ['Test 1', 'Test 2', 'Test 3', 'Test 4'],
'p_values': [0.0, 0.003938741688359837, 0.003938741688359837, 5.5592759270473024e-05]}}
This may be helpful in trying to find a particular value in the nested results.
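If you are not sure where a value lives, a small helper (a hypothetical find_key utility, not part of pybmds) can walk the dumped dictionary and report the path to any key:

def find_key(data, target, path=""):
    # recursively yield (path, value) pairs for every occurrence of `target`
    if isinstance(data, dict):
        for key, value in data.items():
            new_path = f"{path}.{key}" if path else key
            if key == target:
                yield new_path, value
            yield from find_key(value, target, new_path)
    elif isinstance(data, list):
        for i, value in enumerate(data):
            yield from find_key(value, target, f"{path}[{i}]")

# for example, locate the residual of interest in the nested results
for path, value in find_key(res.model_dump(), "roi"):
    print(path, value)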
Build a text table¶
This is a helpful pattern to print data in a tabular format:
for name, df, ll, p_value in zip(
    res.tests.names, res.tests.dfs, res.tests.ll_ratios, res.tests.p_values, strict=True
):
    print(f"{name:10} {df: <6} {ll: <10.4f} {p_value: <8.6f}")
Test 1 8.0 475.0680 0.000000
Test 2 4.0 15.4005 0.003939
Test 3 4.0 15.4005 0.003939
Test 4 4.0 24.7842 0.000056
This pattern uses Python's built-in zip function, which lets you iterate over multiple same-length lists in parallel; strict=True raises an error if the lengths differ.
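If you prefer to work with the table programmatically rather than print it, the same fields can be collected into a pandas dataframe (the attributes are the same ones used in the loop above):

# collect the test results into a dataframe for further filtering or export
df_tests = pd.DataFrame(
    {
        "name": res.tests.names,
        "df": res.tests.dfs,
        "ll_ratio": res.tests.ll_ratios,
        "p_value": res.tests.p_values,
    }
)
print(df_tests)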