Dataset Preparation

This section collects a few patterns for loading data from a tabular file in preparation for modeling in pybmds.

Processing long datasets

If you have several dose-response datasets, you can run them as a batch. As an example, consider a CSV or Excel file of dichotomous datasets that looks something like this, with one row per dose group:

ID	Dose	Incidence	N
1	0	0	5
1	0.5	3	5
1	1	5	5
2	0	0	5
2	0.33	0	5
2	0.67	4	5
2	1	5	5
3	0	0	5
3	0.25	0	5
3	0.5	3	5
3	1	5	5

You can start by loading the data into a pandas dataframe.

import pandas as pd

# if it's a CSV file
df = pd.read_csv('./dataset.csv')

# if it's an XLSX file:
df = pd.read_excel('./dataset.xlsx', sheet_name='datasets')

After loading the data from a file, you need to convert the dataframe rows into pybmds datasets, one per ID:

import pybmds

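datasets = []
# group the rows by ID so each group becomes one dataset;
# `dataset_id` avoids shadowing the built-in `id`
for dataset_id, rows in df.groupby('ID'):
    dataset = pybmds.DichotomousDataset(
        id=dataset_id,
        doses=rows.Dose.tolist(),
        incidences=rows.Incidence.tolist(),
        ns=rows.N.tolist(),
    )
    datasets.append(dataset)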

print(len(datasets))
print(datasets[0].tbl())

3
╒════════╤═════════════╤═════╕
│   Dose │   Incidence │   N │
╞════════╪═════════════╪═════╡
│    0   │           0 │   5 │
│    0.5 │           3 │   5 │
│    1   │           5 │   5 │
╘════════╧═════════════╧═════╛

The end result is a list of datasets ready for BMD modeling.
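The grouping step can also be sketched without pandas. The example below is a minimal illustration using only the standard library; the in-memory `LONG_CSV` text and the helper name `group_long_rows` are hypothetical stand-ins, not part of pybmds. Each resulting dictionary uses the same keyword names (`doses`, `incidences`, `ns`) that are passed to `pybmds.DichotomousDataset` above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory stand-in for a long-format CSV file.
LONG_CSV = """ID,Dose,Incidence,N
1,0,0,5
1,0.5,3,5
1,1,5,5
2,0,0,5
2,0.33,0,5
2,0.67,4,5
2,1,5,5
"""


def group_long_rows(handle):
    """Collect one dict of dose, incidence, and N lists per dataset ID."""
    groups = defaultdict(lambda: {"doses": [], "incidences": [], "ns": []})
    for row in csv.DictReader(handle):
        group = groups[row["ID"]]  # IDs are kept as strings here
        group["doses"].append(float(row["Dose"]))
        group["incidences"].append(int(row["Incidence"]))
        group["ns"].append(int(row["N"]))
    return dict(groups)


grouped = group_long_rows(io.StringIO(LONG_CSV))
print(len(grouped))           # number of distinct IDs
print(grouped["1"]["doses"])  # doses for the dataset with ID 1
```

Each value in `grouped` could then be unpacked into the dataset constructor together with its ID, in the same way as the pandas loop.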

Processing wide datasets

Consider a CSV or Excel file of dichotomous datasets that looks something like this (with one row per dataset):

ID	Dose	Incidence	N
1	0;0.5;1	0;3;5	5;5;5
2	0;0.33;0.67;1	0;0;4;5	5;5;5;5
3	0;0.25;0.5;1	0;0;3;5	5;5;5;5
4	0;0.33;0.67;1	0;0;1;1	5;5;5;5
5	0;0.25;0.5;1	0;0;1;1	5;5;5;5
6	0;0.33;0.67;1	0;0;1;1	5;5;5;5
7	0;0.25;0.5;1	0;0;1;1	5;5;5;5
8	0;0.25;0.5;0.75;1	0;0;1;3;1	5;5;5;5;5

You can start by loading the data into a pandas dataframe.

import pandas as pd

# if it's a CSV file
df = pd.read_csv('./dataset.csv')

# if it's an XLSX file:
df = pd.read_excel('./dataset.xlsx', sheet_name='datasets')

After loading the data from a file, you need to convert each dataframe row into a pybmds dataset by splitting its semicolon-delimited cells:

import pybmds

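def create(row):
    # each cell holds semicolon-delimited values, one per dose group
    return pybmds.DichotomousDataset(
        id=row.ID,
        doses=list(map(float, row.Dose.split(';'))),
        ns=list(map(int, row.N.split(';'))),
        incidences=list(map(int, row.Incidence.split(';'))),
    )


# apply returns a pandas Series; convert it to a plain list
datasets = df.apply(create, axis=1).tolist()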

print(len(datasets))
print(datasets[0].tbl())

8
╒════════╤═════════════╤═════╕
│   Dose │   Incidence │   N │
╞════════╪═════════════╪═════╡
│    0   │           0 │   5 │
│    0.5 │           3 │   5 │
│    1   │           5 │   5 │
╘════════╧═════════════╧═════╛

The end result is a list of datasets ready for BMD modeling.
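Because the wide format packs several values into one cell, a malformed cell can easily go unnoticed. The sketch below shows one possible pre-check that could run before the dataset is built; `parse_wide_row` is a hypothetical helper rather than a pybmds function, and the two rules it enforces (equal list lengths, and incidence between 0 and N) are assumptions about what counts as a sensible dichotomous row.

```python
def parse_wide_row(dose, incidence, n, sep=";"):
    """Split delimited cells into aligned lists, raising ValueError on bad input."""
    doses = [float(x) for x in dose.split(sep)]
    incidences = [int(x) for x in incidence.split(sep)]
    ns = [int(x) for x in n.split(sep)]

    # every dose group needs exactly one incidence and one N
    if not (len(doses) == len(incidences) == len(ns)):
        raise ValueError(
            f"unequal lengths: {len(doses)}, {len(incidences)}, {len(ns)}"
        )

    # an incidence cannot be negative or exceed its group size
    for inc, total in zip(incidences, ns):
        if not 0 <= inc <= total:
            raise ValueError(f"incidence {inc} outside 0..{total}")

    return {"doses": doses, "incidences": incidences, "ns": ns}


parsed = parse_wide_row("0;0.5;1", "0;3;5", "5;5;5")
print(parsed["doses"])
```

The returned dictionary uses the same keyword names as the `create` function above, so it could be unpacked into the dataset constructor alongside the row's ID.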