Continuous Data

Quickstart

To run a continuous dataset:

import pybmds

# create a continuous dataset
dataset = pybmds.ContinuousDataset(
    doses=[0, 25, 50, 75, 100],
    ns=[20, 20, 20, 20, 20],
    means=[6, 8, 13, 25, 30],
    stdevs=[4, 4.3, 3.8, 4.4, 3.7],
)

# create a session
session = pybmds.Session(dataset=dataset)
session.add_default_models()

# execute the session; recommend best-fitting model
session.execute_and_recommend()

# save excel report
df = session.to_df()
df.to_excel("output/report.xlsx")

# save to a word report
report = session.to_docx()
report.save("output/report.docx")
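The `output/` directory must already exist for the save calls above to succeed; a minimal guard using only the standard library:

```python
from pathlib import Path

# Create the output directory if it does not already exist
Path("output").mkdir(parents=True, exist_ok=True)
```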

Continuous datasets

Continuous datasets can be defined using group-level summary data or individual response data.

A continuous summary dataset requires a list of doses, mean responses, standard deviations, and the total number of subjects. All lists must be the same length, with one item in each list for each dose group.
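As a quick sanity check before constructing the dataset, you can verify the lists line up (plain Python, not part of the pybmds API):

```python
# Group-level summary inputs; every list must have one entry per dose group
doses = [0, 25, 50, 75, 100]
ns = [20, 20, 20, 20, 20]
means = [6, 8, 13, 25, 30]
stdevs = [4, 4.3, 3.8, 4.4, 3.7]

# One item in each list for each dose group
assert len(doses) == len(ns) == len(means) == len(stdevs)
```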

You can also add optional attributes, such as name, dose_name, dose_units, response_name, response_units, etc.

For example:

dataset = pybmds.ContinuousDataset(
    name="Body Weight from ChemX Exposure",
    dose_name="Concentration",
    dose_units="ppm",
    response_units="kg",
    doses=[0, 25, 50, 75, 100],
    ns=[20, 20, 20, 20, 20],
    means=[6, 8, 13, 25, 30],
    stdevs=[4, 4.3, 3.8, 4.4, 3.7],
)
fig = dataset.plot(figsize=(6,4))

A continuous individual dataset requires the dose and response for each individual. Both lists must be the same size. Additional options can also be added such as the name or units:

dataset2 = pybmds.ContinuousIndividualDataset(
    name="Effect Y from ChemX Exposure",
    dose_name="Dose",
    dose_units="mg/kg/d",
    response_units="g",
    doses=[
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
        20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
        40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
        80, 80, 80, 80, 80, 80, 80, 80, 80, 80,
    ],
    responses=[
        8.57, 8.71, 7.99, 9.27, 9.47, 10.06, 8.65, 9.13, 9.01, 8.35,
        8.66, 8.45, 9.26, 8.76, 7.85, 9.01, 9.09, 9.57, 10.14, 10.49,
        9.87, 10.26, 9.76, 10.24, 10.54, 10.72, 8.83, 10.02, 9.18, 10.87,
        11.56, 10.59, 11.99, 11.59, 10.59, 10.52, 11.32, 12.0, 12.12, 12.64,
        10.96, 12.13, 12.77, 13.06, 13.16, 14.04, 14.01, 13.9, 12.99, 13.13,
    ],
)
fig = dataset2.plot(figsize=(6,4))

Single model fit

You can fit a specific model to the dataset and plot/print the results. The printed results will include the BMD, BMDL, BMDU, p-value, AIC, etc.

For example, to fit the Hill model, run the following code and plot the results:

from pybmds.models import continuous

model = continuous.Hill(dataset)
model.execute()
fig = model.plot()

To view an output report:

print(model.text())
          Hill Model          
══════════════════════════════

Version: pybmds 25.2a2 (bmdscore 25.2)

Input Summary:
╒══════════════════════════════╤════════════════════════════╕
│ BMR                          │ 1.0 Standard Deviation     │
│ Distribution                 │ Normal + Constant variance │
│ Modeling Direction           │ Up (↑)                     │
│ Confidence Level (one sided) │ 0.95                       │
│ Modeling Approach            │ MLE                        │
╘══════════════════════════════╧════════════════════════════╛

Parameter Settings:
╒═════════════╤═══════════╤═══════╤═══════╕
│ Parameter   │   Initial │   Min │   Max │
╞═════════════╪═══════════╪═══════╪═══════╡
│ g           │         0 │  -100 │   100 │
│ v           │         0 │  -100 │   100 │
│ k           │         0 │     0 │     5 │
│ n           │         1 │     1 │    18 │
│ alpha       │         0 │   -18 │    18 │
╘═════════════╧═══════════╧═══════╧═══════╛

Modeling Summary:
╒════════════════╤═════════════╕
│ BMD            │   44.3603   │
│ BMDL           │   37.4378   │
│ BMDU           │   49.4299   │
│ AIC            │  570.3      │
│ Log-Likelihood │ -280.15     │
│ P-Value        │    0.163921 │
│ Model d.f.     │    1        │
╘════════════════╧═════════════╛

Model Parameters:
╒════════════╤════════════╤════════════╤═════════════╕
│ Variable   │   Estimate │ On Bound   │   Std Error │
╞════════════╪════════════╪════════════╪═════════════╡
│ g          │    6.80589 │ no         │    0.689269 │
│ v          │   25.6632  │ no         │    2.65373  │
│ k          │   62.7872  │ no         │    3.59334  │
│ n          │    4.87539 │ no         │    1.08357  │
│ alpha      │   15.881   │ no         │   35.6668   │
╘════════════╧════════════╧════════════╧═════════════╛

Goodness of Fit:
╒════════╤═════╤═══════════════╤═════════════════════╤═══════════════════╕
│   Dose │   N │   Sample Mean │   Model Fitted Mean │   Scaled Residual │
╞════════╪═════╪═══════════════╪═════════════════════╪═══════════════════╡
│      0 │  20 │             6 │             6.80589 │        -0.904381  │
│     25 │  20 │             8 │             7.09076 │         1.02037   │
│     50 │  20 │            13 │            13.1658  │        -0.186064  │
│     75 │  20 │            25 │            24.8734  │         0.14202   │
│    100 │  20 │            30 │            30.0641  │        -0.0719426 │
╘════════╧═════╧═══════════════╧═════════════════════╧═══════════════════╛
╒════════╤═════╤═════════════╤═══════════════════╕
│   Dose │   N │   Sample SD │   Model Fitted SD │
╞════════╪═════╪═════════════╪═══════════════════╡
│      0 │  20 │         4   │           3.98509 │
│     25 │  20 │         4.3 │           3.98509 │
│     50 │  20 │         3.8 │           3.98509 │
│     75 │  20 │         4.4 │           3.98509 │
│    100 │  20 │         3.7 │           3.98509 │
╘════════╧═════╧═════════════╧═══════════════════╛

Likelihoods:
╒═════════╤══════════════════╤════════════╤═════════╕
│ Model   │   Log-Likelihood │   # Params │     AIC │
╞═════════╪══════════════════╪════════════╪═════════╡
│ A1      │         -279.181 │          6 │ 570.362 │
│ A2      │         -278.726 │         10 │ 577.452 │
│ A3      │         -279.181 │          6 │ 570.362 │
│ fitted  │         -280.15  │          5 │ 570.3   │
│ reduced │         -374.79  │          2 │ 753.579 │
╘═════════╧══════════════════╧════════════╧═════════╛

Tests of Mean and Variance Fits:
╒════════╤══════════════════════════════╤═════════════╤═══════════╕
│ Name   │   -2 * Log(Likelihood Ratio) │   Test d.f. │   P-Value │
╞════════╪══════════════════════════════╪═════════════╪═══════════╡
│ Test 1 │                   192.127    │           8 │  0        │
│ Test 2 │                     0.909828 │           4 │  0.923147 │
│ Test 3 │                     0.909828 │           4 │  0.923147 │
│ Test 4 │                     1.93767  │           1 │  0.163921 │
╘════════╧══════════════════════════════╧═════════════╧═══════════╛
Test 1: Test the null hypothesis that responses and variances don't differ among dose levels
(A2 vs R).  If this test fails to reject the null hypothesis (p-value > 0.05), there may not be
a dose-response.

Test 2: Test the null hypothesis that variances are homogenous (A1 vs A2).  If this test fails to
reject the null hypothesis (p-value > 0.05), the simpler constant variance model may be appropriate.

Test 3: Test the null hypothesis that the variances are adequately modeled (A3 vs A2). If this test
fails to reject the null hypothesis (p-value > 0.05), it may be inferred that the variances have
been modeled appropriately.

Test 4: Test the null hypothesis that the model for the mean fits the data (Fitted vs A3). If this
test fails to reject the null hypothesis (p-value > 0.1), the user has support for use of the
selected model.

You can change the range and initial value of any model parameter; see the Change parameter settings section below.

Change input settings

The default settings for a continuous analysis are a BMR of 1 standard deviation, a 95% confidence interval, and a normal distribution with constant variance across dose groups.

If you fit a single model to your dataset, you can change these settings:

model = continuous.Hill(dataset)
print(model.settings.tbl())
print(model.priors_tbl())
╒══════════════════════════════╤════════════════════════════╕
│ BMR                          │ 1.0 Standard Deviation     │
│ Distribution                 │ Normal + Constant variance │
│ Modeling Direction           │ Up (↑)                     │
│ Confidence Level (one sided) │ 0.95                       │
│ Modeling Approach            │ MLE                        │
│ Degree                       │ 0                          │
╘══════════════════════════════╧════════════════════════════╛
╒═════════════╤═══════════╤═══════╤═══════╕
│ Parameter   │   Initial │   Min │   Max │
╞═════════════╪═══════════╪═══════╪═══════╡
│ g           │         0 │  -100 │   100 │
│ v           │         0 │  -100 │   100 │
│ k           │         0 │     0 │     5 │
│ n           │         1 │     1 │    18 │
│ alpha       │         0 │   -18 │    18 │
╘═════════════╧═══════════╧═══════╧═══════╛

You can change individual model settings:

model = continuous.Hill(
    dataset,
    settings={
        "bmr": 0.1,
        "bmr_type": pybmds.ContinuousRiskType.RelativeDeviation,
        "disttype": pybmds.ContinuousDistType.normal_ncv,
    },
)
model.settings.priors.update("k", max_value=3)
print(model.settings.tbl())
print(model.priors_tbl())
╒══════════════════════════════╤═══════════════════════════════╕
│ BMR                          │ 10% Relative Deviation        │
│ Distribution                 │ Normal + Nonconstant variance │
│ Modeling Direction           │ Up (↑)                        │
│ Confidence Level (one sided) │ 0.95                          │
│ Modeling Approach            │ MLE                           │
│ Degree                       │ 0                             │
╘══════════════════════════════╧═══════════════════════════════╛
╒═════════════╤═══════════╤═══════╤═══════╕
│ Parameter   │   Initial │   Min │   Max │
╞═════════════╪═══════════╪═══════╪═══════╡
│ g           │         0 │  -100 │   100 │
│ v           │         0 │  -100 │   100 │
│ k           │         0 │     0 │     3 │
│ n           │         1 │     1 │    18 │
│ rho         │         0 │   -18 │    18 │
│ alpha       │         0 │   -18 │    18 │
╘═════════════╧═══════════╧═══════╧═══════╛

For the Polynomial model, the degree can be increased up to the lesser of n-1 or 8, where n is the number of dose groups.

continuous.Polynomial(dataset, settings={"degree": 2})
continuous.Polynomial(dataset, settings={"degree": 3})
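The degree cap above can be computed directly from the dataset; a plain-Python sketch (not a pybmds API call), using the five-dose-group dataset from this page:

```python
# Maximum polynomial degree: the lesser of (number of dose groups - 1) and 8
doses = [0, 25, 50, 75, 100]
max_degree = min(len(doses) - 1, 8)
print(max_degree)  # 4
```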

Change parameter settings

To see a preview of the initial parameter settings:

model = continuous.Hill(dataset)
print(model.settings.tbl())
print(model.priors_tbl())
╒══════════════════════════════╤════════════════════════════╕
│ BMR                          │ 1.0 Standard Deviation     │
│ Distribution                 │ Normal + Constant variance │
│ Modeling Direction           │ Up (↑)                     │
│ Confidence Level (one sided) │ 0.95                       │
│ Modeling Approach            │ MLE                        │
│ Degree                       │ 0                          │
╘══════════════════════════════╧════════════════════════════╛
╒═════════════╤═══════════╤═══════╤═══════╕
│ Parameter   │   Initial │   Min │   Max │
╞═════════════╪═══════════╪═══════╪═══════╡
│ g           │         0 │  -100 │   100 │
│ v           │         0 │  -100 │   100 │
│ k           │         0 │     0 │     5 │
│ n           │         1 │     1 │    18 │
│ alpha       │         0 │   -18 │    18 │
╘═════════════╧═══════════╧═══════╧═══════╛

You can also change the initial parameter settings shown above for any run of a single continuous model. Continuing with the Hill model example, you can set the power parameter n to be equal to 1. You do this by changing the initial, minimum, and maximum values of that parameter:

model = continuous.Hill(dataset)
model.settings.priors.update("n", initial_value=1, min_value=1, max_value=1)
print(model.priors_tbl())
╒═════════════╤═══════════╤═══════╤═══════╕
│ Parameter   │   Initial │   Min │   Max │
╞═════════════╪═══════════╪═══════╪═══════╡
│ g           │         0 │  -100 │   100 │
│ v           │         0 │  -100 │   100 │
│ k           │         0 │     0 │     5 │
│ n           │         1 │     1 │     1 │
│ alpha       │         0 │   -18 │    18 │
╘═════════════╧═══════════╧═══════╧═══════╛

Multiple model fit (sessions) and model recommendation

To run all the default models, save the results, and save the plot of the fit of the recommended model with the data:

# create a continuous dataset
dataset = pybmds.ContinuousDataset(
    doses=[0, 25, 50, 75, 100],
    ns=[20, 20, 20, 20, 20],
    means=[6, 8, 13, 25, 30],
    stdevs=[4, 4.3, 3.8, 4.4, 3.7],
)

session = pybmds.Session(dataset=dataset)
session.add_default_models()

# execute the session
session.execute()

# recommend a best-fitting model
session.recommend()

# print recommended model and plot recommended model with dataset
model_index = session.recommender.results.recommended_model_index
if model_index is not None:
    model = session.models[model_index]
    fig = model.plot()

You can also plot all models:

fig = session.plot()
fig = session.plot(colorize=False)

To print a summary table of modeling results, create a custom function:

import pandas as pd

def summary_table(session):
    data = []
    for model in session.models:
        data.append([
            model.name(),
            model.results.bmdl,
            model.results.bmd,
            model.results.bmdu,
            model.results.tests.p_values[1],
            model.results.tests.p_values[2],
            model.results.tests.p_values[3],
            model.results.fit.aic
        ])

    df = pd.DataFrame(
        data=data,
        columns=["Model", "BMDL", "BMD", "BMDU", "P-Value 2", "P-Value 3", "P-Value 4", "AIC"]
    )
    return df

summary_table(session)
Model BMDL BMD BMDU P-Value 2 P-Value 3 P-Value 4 AIC
0 Linear 15.366633 17.590090 20.476330 0.923147 0.923147 0.000002 593.840104
1 Polynomial 2 21.259623 28.096820 37.150471 0.923147 0.923147 0.000183 583.575133
2 Polynomial 3 21.251732 28.285377 37.143031 0.923147 0.923147 0.000183 583.577496
3 Power 23.273696 29.484932 36.666906 0.923147 0.923147 0.000567 581.313910
4 Hill 37.437820 44.360266 49.429919 0.923147 0.923147 0.163921 570.299792
5 Exponential 3 30.113771 33.000669 37.536866 0.923147 0.923147 0.000026 588.296076
6 Exponential 5 36.037483 42.818144 48.801014 0.923147 0.923147 0.240277 569.741084
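Since `summary_table` returns a pandas DataFrame, standard pandas operations apply. For example, you can sort by AIC so the best penalized fit appears first; a sketch using values copied from the table above (subset of models for brevity):

```python
import pandas as pd

# Model names and AICs copied from the summary table above
df = pd.DataFrame({
    "Model": ["Linear", "Power", "Hill", "Exponential 5"],
    "AIC": [593.840104, 581.313910, 570.299792, 569.741084],
})

# Sort ascending by AIC; lower AIC indicates a better penalized fit
best_first = df.sort_values("AIC").reset_index(drop=True)
print(best_first.loc[0, "Model"])  # Exponential 5
```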

Change model settings

To change model settings to all default models added to a session:

session = pybmds.Session(dataset=dataset)
session.add_default_models(
    settings={
        "disttype": pybmds.ContinuousDistType.normal_ncv,
        "bmr_type": pybmds.ContinuousRiskType.AbsoluteDeviation,
        "bmr": 2,
        "alpha": 0.1,
    }
)

Run subset of models and select best fit

You can select a set of models and find the best fit, rather than using all of the default continuous models.

For example, to fit only the Linear, Exponential 3, and Hill models:

session = pybmds.Session(dataset=dataset)
session.add_model(pybmds.Models.Linear)
session.add_model(pybmds.Models.ExponentialM3)
session.add_model(pybmds.Models.Hill)

session.execute()

Model recommendation and selection

The pybmds package may recommend a best-fitting model based on a decision tree, but expert judgment may be required for model selection. To run model recommendation and view a recommended model, if one is recommended:

session.recommend()

if session.recommended_model is not None:
    display(session.recommended_model.plot())
    print(session.recommended_model.text())
          Hill Model          
══════════════════════════════

Version: pybmds 25.2a2 (bmdscore 25.2)

Input Summary:
╒══════════════════════════════╤════════════════════════════╕
│ BMR                          │ 1.0 Standard Deviation     │
│ Distribution                 │ Normal + Constant variance │
│ Modeling Direction           │ Up (↑)                     │
│ Confidence Level (one sided) │ 0.95                       │
│ Modeling Approach            │ MLE                        │
╘══════════════════════════════╧════════════════════════════╛

Parameter Settings:
╒═════════════╤═══════════╤═══════╤═══════╕
│ Parameter   │   Initial │   Min │   Max │
╞═════════════╪═══════════╪═══════╪═══════╡
│ g           │         0 │  -100 │   100 │
│ v           │         0 │  -100 │   100 │
│ k           │         0 │     0 │     5 │
│ n           │         1 │     1 │    18 │
│ alpha       │         0 │   -18 │    18 │
╘═════════════╧═══════════╧═══════╧═══════╛

Modeling Summary:
╒════════════════╤═════════════╕
│ BMD            │   44.3603   │
│ BMDL           │   37.4378   │
│ BMDU           │   49.4299   │
│ AIC            │  570.3      │
│ Log-Likelihood │ -280.15     │
│ P-Value        │    0.163921 │
│ Model d.f.     │    1        │
╘════════════════╧═════════════╛

Model Parameters:
╒════════════╤════════════╤════════════╤═════════════╕
│ Variable   │   Estimate │ On Bound   │   Std Error │
╞════════════╪════════════╪════════════╪═════════════╡
│ g          │    6.80589 │ no         │    0.689269 │
│ v          │   25.6632  │ no         │    2.65373  │
│ k          │   62.7872  │ no         │    3.59334  │
│ n          │    4.87539 │ no         │    1.08357  │
│ alpha      │   15.881   │ no         │   35.6668   │
╘════════════╧════════════╧════════════╧═════════════╛

Goodness of Fit:
╒════════╤═════╤═══════════════╤═════════════════════╤═══════════════════╕
│   Dose │   N │   Sample Mean │   Model Fitted Mean │   Scaled Residual │
╞════════╪═════╪═══════════════╪═════════════════════╪═══════════════════╡
│      0 │  20 │             6 │             6.80589 │        -0.904381  │
│     25 │  20 │             8 │             7.09076 │         1.02037   │
│     50 │  20 │            13 │            13.1658  │        -0.186064  │
│     75 │  20 │            25 │            24.8734  │         0.14202   │
│    100 │  20 │            30 │            30.0641  │        -0.0719426 │
╘════════╧═════╧═══════════════╧═════════════════════╧═══════════════════╛
╒════════╤═════╤═════════════╤═══════════════════╕
│   Dose │   N │   Sample SD │   Model Fitted SD │
╞════════╪═════╪═════════════╪═══════════════════╡
│      0 │  20 │         4   │           3.98509 │
│     25 │  20 │         4.3 │           3.98509 │
│     50 │  20 │         3.8 │           3.98509 │
│     75 │  20 │         4.4 │           3.98509 │
│    100 │  20 │         3.7 │           3.98509 │
╘════════╧═════╧═════════════╧═══════════════════╛

Likelihoods:
╒═════════╤══════════════════╤════════════╤═════════╕
│ Model   │   Log-Likelihood │   # Params │     AIC │
╞═════════╪══════════════════╪════════════╪═════════╡
│ A1      │         -279.181 │          6 │ 570.362 │
│ A2      │         -278.726 │         10 │ 577.452 │
│ A3      │         -279.181 │          6 │ 570.362 │
│ fitted  │         -280.15  │          5 │ 570.3   │
│ reduced │         -374.79  │          2 │ 753.579 │
╘═════════╧══════════════════╧════════════╧═════════╛

Tests of Mean and Variance Fits:
╒════════╤══════════════════════════════╤═════════════╤═══════════╕
│ Name   │   -2 * Log(Likelihood Ratio) │   Test d.f. │   P-Value │
╞════════╪══════════════════════════════╪═════════════╪═══════════╡
│ Test 1 │                   192.127    │           8 │  0        │
│ Test 2 │                     0.909828 │           4 │  0.923147 │
│ Test 3 │                     0.909828 │           4 │  0.923147 │
│ Test 4 │                     1.93767  │           1 │  0.163921 │
╘════════╧══════════════════════════════╧═════════════╧═══════════╛
Test 1: Test the null hypothesis that responses and variances don't differ among dose levels
(A2 vs R).  If this test fails to reject the null hypothesis (p-value > 0.05), there may not be
a dose-response.

Test 2: Test the null hypothesis that variances are homogenous (A1 vs A2).  If this test fails to
reject the null hypothesis (p-value > 0.05), the simpler constant variance model may be appropriate.

Test 3: Test the null hypothesis that the variances are adequately modeled (A3 vs A2). If this test
fails to reject the null hypothesis (p-value > 0.05), it may be inferred that the variances have
been modeled appropriately.

Test 4: Test the null hypothesis that the model for the mean fits the data (Fitted vs A3). If this
test fails to reject the null hypothesis (p-value > 0.1), the user has support for use of the
selected model.

You can select a best-fitting model, and generated output reports will indicate this selection. This is a manual action. The example below selects the recommended model, but any model in the session could be selected.

session.select(model=session.recommended_model, notes="Lowest AIC; recommended model")

Changing how parameters are counted

For the purposes of statistical calculations (e.g., calculating the AIC value and p-values), BMDS by default counts only the parameters that are off their respective boundaries. However, users have the option of parameterizing BMDS to count all parameters in a model, regardless of whether they were estimated on or off their boundaries.

Counting all parameters will impact AIC and p-value calculations, and may therefore affect which model the automatic selection logic recommends. For example, consider the following dataset, run using the default setting of counting only parameters that are off a boundary.

import pybmds
import pandas as pd

dataset = pybmds.ContinuousDataset(
    doses=[0, 25, 50, 100, 200],
    ns=[20, 20, 20, 20, 20],
    means=[6.1, 6.3, 9.5, 15.2, 30.1],
    stdevs=[3.5, 3.3, 4.1, 3.6, 4.5],
)

# create a BMD session
session = pybmds.Session(dataset=dataset)

# add all default models
session.add_default_models()

# execute the session
session.execute()

# recommend a best-fitting model
session.recommend()

# Print out a summary table for the recommended model

model = session.recommended_model
if model is not None:
    df = pd.DataFrame([[
        model.name(),
        model.results.bmdl,
        model.results.bmd,
        model.results.bmdu,
        model.results.tests.dfs[3],
        model.results.tests.p_values[1],
        model.results.tests.p_values[2],
        model.results.tests.p_values[3],
        model.results.fit.aic
    ]], columns=["Model", "BMDL", "BMD", "BMDU", "DF", "P-Value 2", "P-Value 3", "P-Value 4", "AIC"])
    df = df.T  # Transpose for vertical display
    df.columns = ["Value"]
    display(df)
    
else:
    print("No recommended model.")

fig = session.recommended_model.plot()
                   Value
Model              Power
BMDL           39.398522
BMD            51.656812
BMDU           66.659426
DF                   2.0
P-Value 2        0.62439
P-Value 3        0.62439
P-Value 4       0.554879
AIC            556.15748

So, for this dataset, we can see that the automated model selection workflow chose the Power model. We can also summarize the modeling results for all models in the session by using the summary_table function defined above:

summary_table(session)
Model BMDL BMD BMDU P-Value 2 P-Value 3 P-Value 4 AIC
0 Linear 27.666465 31.528232 36.478962 0.62439 0.62439 0.008986 564.555495
1 Polynomial 2 36.344045 48.525378 65.479921 0.62439 0.62439 0.356536 557.042107
2 Polynomial 3 35.291785 47.083971 48.058609 0.62439 0.62439 0.337252 557.153318
3 Power 39.398522 51.656812 66.659426 0.62439 0.62439 0.554879 556.157480
4 Hill 41.381979 55.124484 72.138850 0.62439 0.62439 0.350988 557.849350
5 Exponential 3 54.863034 60.386583 70.355170 0.62439 0.62439 0.142384 558.417644
6 Exponential 5 40.393539 55.053391 72.461275 0.62439 0.62439 0.347962 557.860344

Further, we can also define a custom function to print out a table of models that had at least one parameter estimated on a bound:

def summary_parameter_bounds_table(session):
    rows = []
    for model in session.models:
        params = model.results.parameters
        for name, value, bounded in zip(params.names, params.values, params.bounded):
            if bool(bounded):
                rows.append({
                    "Model": model.name(),
                    "Parameter": name,
                    "Value": value,
                    "On Bound": bool(bounded)
                })
    df = pd.DataFrame(rows)
    display(df)

# Usage:
summary_parameter_bounds_table(session)
Model Parameter Value On Bound
0 Polynomial 3 b3 1.211492e-07 True
1 Exponential 3 d 1.000000e+00 True

We can see above that only two models, Polynomial 3 and Exponential 3, had parameters estimated on a bound, so only their AIC and p-value calculations should change when all parameters are counted. To confirm, we can re-model this dataset with BMDS parameterized to count all parameters in a model:

session2 = pybmds.Session(dataset=dataset)
session2.add_default_models(
    settings={
        "count_all_parameters_on_boundary": True
    }
)
session2.execute()

# recommend a best-fitting model
session2.recommend()

# Print out a summary table for the recommended model
model = session2.recommended_model
if model is not None:
    df2 = pd.DataFrame([[
        model.name(),
        model.results.bmdl,
        model.results.bmd,
        model.results.bmdu,
        model.results.tests.dfs[3],
        model.results.tests.p_values[1],
        model.results.tests.p_values[2],
        model.results.tests.p_values[3],
        model.results.fit.aic
    ]], columns=["Model", "BMDL", "BMD", "BMDU", "DF", "P-Value 2", "P-Value 3", "P-Value 4", "AIC"])
    df2 = df2.T  # Transpose for vertical display
    df2.columns = ["Value"]
    display(df2)
    
else:
    print("No recommended model.")

fig = session2.recommended_model.plot()
                   Value
Model              Power
BMDL           39.398522
BMD            51.656812
BMDU           66.659426
DF                   2.0
P-Value 2        0.62439
P-Value 3        0.62439
P-Value 4       0.554879
AIC            556.15748

The Power model was still selected as the best-fitting model, even when counting all parameters on a boundary. Printing a summary table that compares the p-values and AICs for session and session2 lets us examine why:

def compare_pvalue_aic(session_a, session_b):
    rows = []
    for model_a, model_b in zip(session_a.models, session_b.models):
        rows.append({
            "Model": model_a.name(),
            "Session P-Value": model_a.results.tests.p_values[3],
            "Session AIC": model_a.results.fit.aic,
            "Session2 P-Value": model_b.results.tests.p_values[3],
            "Session2 AIC": model_b.results.fit.aic,
        })
    df = pd.DataFrame(rows)
    display(df)

# Usage:
compare_pvalue_aic(session, session2)
Model Session P-Value Session AIC Session2 P-Value Session2 AIC
0 Linear 0.008986 564.555495 0.008986 564.555495
1 Polynomial 2 0.356536 557.042107 0.356536 557.042107
2 Polynomial 3 0.337252 557.153318 0.140374 559.153318
3 Power 0.554879 556.157480 0.554879 556.157480
4 Hill 0.350988 557.849350 0.350988 557.849350
5 Exponential 3 0.142384 558.417644 0.065935 560.417644
6 Exponential 5 0.347962 557.860344 0.347962 557.860344

We can see that for the models with parameters on a boundary (as reported above: Polynomial 3 and Exponential 3), the session2 AICs are higher than the session AICs because more parameters are counted in the penalty term. The p-values for those models are also lower, given the decrease in degrees of freedom. Because neither of these models was selected previously, the final model selection did not differ between the session and session2 analyses.
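The 2-unit AIC shift for those two models follows directly from the AIC formula, AIC = 2k - 2 * log-likelihood: counting one additional on-bound parameter increases k by 1, and therefore AIC by exactly 2. A quick check using the Polynomial 3 AIC from the comparison table above (k = 6 is assumed here purely for illustration; the identity holds for any k):

```python
def aic(log_likelihood, n_params):
    """AIC = 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_likelihood

# Polynomial 3, session AIC from the table above
session_aic = 557.153318

# Back out the log-likelihood assuming k = 6 parameters were counted
log_likelihood = -(session_aic - 2 * 6) / 2

# Counting the one on-bound parameter (k + 1) adds exactly 2 to the AIC
session2_aic = aic(log_likelihood, 6 + 1)
print(round(session2_aic, 6))  # 559.153318
```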

Jonckheere-Terpstra Trend Test

Individual Response Data

To test for a trend in a continuous dataset with responses measured in individuals using the Jonckheere-Terpstra trend test, first load an individual response continuous dataset. Note that this dataset has an apparent strong increasing trend; the Jonckheere-Terpstra test will provide results that support or refute that observation.

dataset1 = pybmds.ContinuousIndividualDataset(
    name="Effect Y from ChemX Exposure",
    dose_name="Dose",
    dose_units="mg/kg-day",
    response_units="g",
    doses=[
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
        20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
        40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
        80, 80, 80, 80, 80, 80, 80, 80, 80, 80,
    ],
    responses=[
        8.57, 8.71, 7.99, 9.27, 9.47, 10.06, 8.65, 9.13, 9.01, 8.35,
        8.66, 8.45, 9.26, 8.76, 7.85, 9.01, 9.09, 9.57, 10.14, 10.49,
        9.87, 10.26, 9.76, 10.24, 10.54, 10.72, 8.83, 10.02, 9.18, 10.87,
        11.56, 10.59, 11.99, 11.59, 10.59, 10.52, 11.32, 12.0, 12.12, 12.64,
        10.96, 12.13, 12.77, 13.06, 13.16, 14.04, 14.01, 13.9, 12.99, 13.13,
    ],
)

fig = dataset1.plot(figsize=(6,4))
# Run the Jonckheere-Terpstra trend test for an increasing trend
# Note that dataset.trend will use the Jonckheere-Terpstra trend test given
# that the dataset is continuous.
result1 = dataset1.trend(hypothesis="increasing")

# Display the results for the increasing trend test
# The alternative hypothesis for the increasing trend test is that there is a monotonic increasing trend in the data.
print("Jonckheere-Terpstra Trend Test Result:")
print(result1.tbl())

# Run the two-sided test
# The alternative hypothesis for the two-sided trend test is that there is a monotonic trend in the data, either increasing or decreasing.
result1_two_sided = dataset1.trend(hypothesis="two-sided")

# Display the results for the two-sided trend test
print("Jonckheere-Terpstra Trend Test Result:")
print(result1_two_sided.tbl())
Jonckheere-Terpstra Trend Test Result:
╒════════════════════╤═══════════════════════╕
│ Hypothesis         │ increasing            │
│ Statistic          │ 914.0                 │
│ Approach (P-Value) │ approximate           │
│ P-Value            │ 7.068789997788372e-13 │
╘════════════════════╧═══════════════════════╛
Jonckheere-Terpstra Trend Test Result:
╒════════════════════╤════════════════════════╕
│ Hypothesis         │ two-sided              │
│ Statistic          │ 914.0                  │
│ Approach (P-Value) │ approximate            │
│ P-Value            │ 1.4137579995576743e-12 │
╘════════════════════╧════════════════════════╛

Alternatively, you can load an individual continuous response dataset from a CSV file and run the Jonckheere test on that data.

# Load the CSV file (make sure data/JT_strong_trend.csv exists relative to the working directory)
df = pd.read_csv('data/JT_strong_trend.csv')

dataset2 = pybmds.ContinuousIndividualDataset(
    doses=df['doses'].values,
    responses=df['responses'].values,
)

fig = dataset2.plot(figsize=(6,4))
# Run the Jonckheere-Terpstra trend test for an increasing trend
result2 = dataset2.trend(hypothesis="increasing")

# Display the results for the increasing trend test
print("Jonckheere-Terpstra Trend Test Result:")
print(result2.tbl())

# Run the two-sided test
result2_two_sided = dataset2.trend(hypothesis="two-sided")

# Display the results for the two-sided trend test
print("Jonckheere-Terpstra Trend Test Result:")
print(result2_two_sided.tbl())
Jonckheere-Terpstra Trend Test Result:
╒════════════════════╤════════════════════════╕
│ Hypothesis         │ increasing             │
│ Statistic          │ 3065.0                 │
│ Approach (P-Value) │ approximate            │
│ P-Value            │ 4.5442649643234745e-11 │
╘════════════════════╧════════════════════════╛
Jonckheere-Terpstra Trend Test Result:
╒════════════════════╤═══════════════════════╕
│ Hypothesis         │ two-sided             │
│ Statistic          │ 3065.0                │
│ Approach (P-Value) │ approximate           │
│ P-Value            │ 9.088529928646949e-11 │
╘════════════════════╧═══════════════════════╛

As a further example, the Jonckheere test can be run on a dataset with no apparent trend:

df2 = pd.read_csv('data/JT_no_trend.csv')

dataset3 = pybmds.ContinuousIndividualDataset(
    doses=df2['doses'].values,
    responses=df2['responses'].values,
)

fig = dataset3.plot(figsize=(6,4))
# Run the Jonckheere-Terpstra trend test for an increasing trend
result3 = dataset3.trend(hypothesis="increasing")

# Display the results for the increasing trend test
print("Jonckheere-Terpstra Trend Test Result:")
print(result3.tbl())

# Run the two-sided test
result3_two_sided = dataset3.trend(hypothesis="two-sided")

# Display the results for the two-sided trend test
print("Jonckheere-Terpstra Trend Test Result:")
print(result3_two_sided.tbl())
Jonckheere-Terpstra Trend Test Result:
╒════════════════════╤════════════════════╕
│ Hypothesis         │ increasing         │
│ Statistic          │ 2250.0             │
│ Approach (P-Value) │ approximate        │
│ P-Value            │ 0.0640732806328399 │
╘════════════════════╧════════════════════╛
Jonckheere-Terpstra Trend Test Result:
╒════════════════════╤════════════════════╕
│ Hypothesis         │ two-sided          │
│ Statistic          │ 2250.0             │
│ Approach (P-Value) │ approximate        │
│ P-Value            │ 0.1281465612656798 │
╘════════════════════╧════════════════════╛

Summary Response Data

Often, only summary continuous data (reported as means and standard deviations) is available to risk assessors. The Jonckheere-Terpstra trend test can still be applied to such data after synthetic individual response data is generated. First, load and plot a summary dataset:

dataset4 = pybmds.ContinuousDataset(
    doses=[0, 50, 100, 200, 400],
    ns=[20, 20, 20, 20, 20],
    means=[5.26, 5.76, 6.13, 8.24, 10.23],
    stdevs=[2.03, 1.97, 2.47, 2.24, 1.8],
)
fig = dataset4.plot()

To run the Jonckheere-Terpstra trend test on summary data, “synthetic” individual responses must first be generated. These synthetic data are generated so that the summary statistics (i.e., means and standard deviations) calculated from them match the user-provided summary statistics.
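One simple way to generate such values can be sketched in plain Python (an illustration only, not the pybmds implementation): draw random values for a group, then standardize and rescale them so the sample mean and standard deviation exactly match the targets.

```python
# Sketch: generate synthetic individual responses whose sample mean and
# sample standard deviation exactly match supplied summary statistics.
import random
from statistics import mean, stdev


def simulate_group(n, target_mean, target_sd, rng):
    # Draw random normal values, then standardize to mean 0 / SD 1
    # and rescale to the target summary statistics
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m, s = mean(draws), stdev(draws)
    return [(x - m) / s * target_sd + target_mean for x in draws]


rng = random.Random(42)  # fixed seed for reproducibility
values = simulate_group(20, target_mean=5.26, target_sd=2.03, rng=rng)
print(round(mean(values), 2), round(stdev(values), 2))  # 5.26 2.03
```

Repeating this for each dose group yields an individual dataset consistent with the reported group means and standard deviations, on which a rank-based trend test can then be run.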

Call the trend() method on the summary dataset to generate the synthetic individual responses and run the Jonckheere-Terpstra trend test. Optional inputs for generating the synthetic individual responses include seed (a random number seed that can be set to make results reproducible), impose_positivity (force individual responses to be positive, useful for endpoints where negative values are not possible), a tolerance value (how closely the generated summary statistics must match the entered values), and max_iterations (the maximum number of iterations the function will use to match the synthetic summary statistics to the entered values).

# Run the Jonckheere-Terpstra trend test for an increasing trend
result4 = dataset4.trend(hypothesis="increasing")

# Display the results for the increasing trend test
print("Jonckheere-Terpstra Trend Test Result:")
print(result4.tbl())

# Run the two-sided test
result4_two_sided = dataset4.trend(hypothesis="two-sided")

# Display the results for the two-sided trend test
print("Jonckheere-Terpstra Trend Test Result:")
print(result4_two_sided.tbl())
Jonckheere-Terpstra Trend Test Result:
╒════════════════════╤════════════╕
│ Hypothesis         │ increasing │
│ Statistic          │ 3094.0     │
│ Approach (P-Value) │ exact      │
│ P-Value            │ 1.0        │
╘════════════════════╧════════════╛
Jonckheere-Terpstra Trend Test Result:
╒════════════════════╤═══════════╕
│ Hypothesis         │ two-sided │
│ Statistic          │ 3057.0    │
│ Approach (P-Value) │ exact     │
│ P-Value            │ 0.0       │
╘════════════════════╧═══════════╛

Notice that the Jonckheere test statistic differs slightly between the “increasing” and “two-sided” hypothesis results. That is because each time the trend function is called on summary data, the simulate_individual_dataset function is called and a new individual dataset is generated with a fresh random seed. To run multiple trend tests on the same simulated dataset, first call the simulate_individual_dataset function separately and then call the trend function on the resulting individual dataset:

dataset5 = dataset4.simulate_individual_dataset(seed=42)

# Run the Jonckheere-Terpstra trend test for an increasing trend
result_increasing = dataset5.trend(hypothesis="increasing")

# Display the results for the increasing trend test
print("Jonckheere-Terpstra Trend Test Result:")
print(result_increasing.tbl())

# Run the two-sided test
result_two_sided = dataset5.trend(hypothesis="two-sided")

# Display the results for the two-sided trend test
print("Jonckheere-Terpstra Trend Test Result:")
print(result_two_sided.tbl())

# Run the two-sided test with permutations for the simulated individual dataset
result_two_sided_perms = dataset5.trend(nperm=20000, hypothesis="two-sided")

# Display the results for the two-sided trend test with permutations
print("Jonckheere-Terpstra Trend Test Result with Permutations:")
print(result_two_sided_perms.tbl())
Jonckheere-Terpstra Trend Test Result:
╒════════════════════╤════════════╕
│ Hypothesis         │ increasing │
│ Statistic          │ 3078.0     │
│ Approach (P-Value) │ exact      │
│ P-Value            │ 1.0        │
╘════════════════════╧════════════╛
Jonckheere-Terpstra Trend Test Result:
╒════════════════════╤═══════════╕
│ Hypothesis         │ two-sided │
│ Statistic          │ 3078.0    │
│ Approach (P-Value) │ exact     │
│ P-Value            │ 0.0       │
╘════════════════════╧═══════════╛
Jonckheere-Terpstra Trend Test Result with Permutations:
╒════════════════════╤═════════════╕
│ Hypothesis         │ two-sided   │
│ Statistic          │ 3078.0      │
│ Approach (P-Value) │ permutation │
│ P-Value            │ 0.0001      │
╘════════════════════╧═════════════╛
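For intuition on the permutation approach shown above, here is a self-contained sketch (independent of pybmds, whose tie handling and small-sample adjustments may differ): shuffle the pooled responses across the dose groups many times and count how often the permuted statistic is at least as large as the observed one.

```python
# Sketch of a one-sided (increasing) permutation p-value for the
# Jonckheere-Terpstra statistic: permute responses across dose groups
# and count permutations with a statistic >= the observed statistic.
import random
from itertools import combinations


def jt_statistic(groups):
    return sum(
        (y > x) + 0.5 * (y == x)
        for low, high in combinations(groups, 2)
        for x in low
        for y in high
    )


def jt_permutation_pvalue(groups, nperm=2000, seed=0):
    sizes = [len(g) for g in groups]
    pooled = [x for g in groups for x in g]
    observed = jt_statistic(groups)
    rng = random.Random(seed)
    hits = 0
    for _ in range(nperm):
        rng.shuffle(pooled)
        permuted, start = [], 0
        for n in sizes:  # re-split the shuffled pool into groups
            permuted.append(pooled[start:start + n])
            start += n
        if jt_statistic(permuted) >= observed:
            hits += 1
    return hits / nperm


groups = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
print(jt_permutation_pvalue(groups))  # small p-value for a clear trend
```

More permutations (e.g., the nperm=20000 used above) reduce the Monte Carlo error of the estimated p-value at the cost of extra computation.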