Dataset Preparation

This section collects a few patterns for loading dose-response data from a tabular file (CSV or Excel) and converting it into pybmds datasets for modeling.

Processing long datasets

If you have several dose-response datasets, you can load and model them as a batch. In a long layout, each row holds one dose group, and all rows that share an ID belong to the same dataset. For example, a CSV or Excel file of dichotomous datasets might look something like this:

ID   Dose   Incidence   N
1    0      0           5
1    0.5    3           5
1    1      5           5
2    0      0           5
2    0.33   0           5
2    0.67   4           5
2    1      5           5
3    0      0           5
3    0.25   0           5
3    0.5    3           5
3    1      5           5
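To reproduce this example without an existing spreadsheet, the table can be written to disk with Python's standard csv module. This is a minimal sketch: long_csv_text and write_long_csv are hypothetical helper names, and the rows simply transcribe the table above.

```python
import csv
import io

# Rows transcribed from the long-format table above: (ID, Dose, Incidence, N),
# one row per dose group.
LONG_ROWS = [
    (1, 0, 0, 5), (1, 0.5, 3, 5), (1, 1, 5, 5),
    (2, 0, 0, 5), (2, 0.33, 0, 5), (2, 0.67, 4, 5), (2, 1, 5, 5),
    (3, 0, 0, 5), (3, 0.25, 0, 5), (3, 0.5, 3, 5), (3, 1, 5, 5),
]


def long_csv_text(rows=LONG_ROWS):
    """Return the rows as CSV text, preceded by a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["ID", "Dose", "Incidence", "N"])
    writer.writerows(rows)
    return buffer.getvalue()


def write_long_csv(path):
    """Write the example table to `path` (for instance "dataset.csv")."""
    with open(path, "w", newline="") as f:
        f.write(long_csv_text())
```

Calling write_long_csv("dataset.csv") creates a file that pd.read_csv('./dataset.csv') should load with the same four columns.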

You can start by loading the data into a pandas dataframe.

import pandas as pd

# if it's a CSV file:
df = pd.read_csv('./dataset.csv')

# or, if it's an XLSX file with a sheet named "datasets" (reading .xlsx requires openpyxl):
df = pd.read_excel('./dataset.xlsx', sheet_name='datasets')

After loading the data, convert the dataframe into pybmds datasets by grouping its rows by ID, creating one dataset per group:

import pybmds

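datasets = []
# each group of rows sharing an ID becomes one dataset
# (dataset_id avoids shadowing Python's built-in id)
for dataset_id, rows in df.groupby('ID'):
    dataset = pybmds.DichotomousDataset(
        id=dataset_id,
        doses=rows.Dose.tolist(),
        incidences=rows.Incidence.tolist(),
        ns=rows.N.tolist(),
    )
    datasets.append(dataset)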

print(len(datasets))
print(datasets[0].tbl())
8
╒════════╤═════════════╤═════╕
│   Dose │   Incidence │   N │
╞════════╪═════════════╪═════╡
│    0   │           0 │   5 │
│    0.5 │           3 │   5 │
│    1   │           5 │   5 │
╘════════╧═════════════╧═════╛

The end result is a list of datasets ready for BMD modeling.
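Conceptually, the groupby('ID') loop collects each dataset's dose groups into three parallel lists. The sketch below reproduces that grouping with only the standard library, which can be handy for inspecting a file without pandas or pybmds installed. group_long_rows is a hypothetical helper, and it assumes the CSV has the ID, Dose, Incidence, and N columns shown above; the sample is a subset of that table.

```python
import csv
import io
from collections import defaultdict


def group_long_rows(text):
    """Group long-format CSV rows by ID into parallel dose/incidence/N lists.

    Mirrors what the pandas groupby loop passes to the dataset constructor,
    but returns plain dictionaries so it runs without pandas or pybmds.
    """
    groups = defaultdict(lambda: {"doses": [], "incidences": [], "ns": []})
    for row in csv.DictReader(io.StringIO(text)):
        group = groups[int(row["ID"])]
        group["doses"].append(float(row["Dose"]))
        group["incidences"].append(int(row["Incidence"]))
        group["ns"].append(int(row["N"]))
    return dict(groups)


SAMPLE = """ID,Dose,Incidence,N
1,0,0,5
1,0.5,3,5
1,1,5,5
2,0,0,5
2,0.33,0,5
2,0.67,4,5
2,1,5,5
"""

grouped = group_long_rows(SAMPLE)
print(grouped[1])
# {'doses': [0.0, 0.5, 1.0], 'incidences': [0, 3, 5], 'ns': [5, 5, 5]}
```

Each value uses the same keyword names as the loop above, so it could be passed along as pybmds.DichotomousDataset(id=key, **value).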

Processing wide datasets

In a wide layout, each row holds a complete dataset, and the per-dose values in each cell are separated by semicolons. For example, a CSV or Excel file of dichotomous datasets might look something like this:

ID   Dose                Incidence   N
1    0;0.5;1             0;3;5       5;5;5
2    0;0.33;0.67;1       0;0;4;5     5;5;5;5
3    0;0.25;0.5;1        0;0;3;5     5;5;5;5
4    0;0.33;0.67;1       0;0;1;1     5;5;5;5
5    0;0.25;0.5;1        0;0;1;1     5;5;5;5
6    0;0.33;0.67;1       0;0;1;1     5;5;5;5
7    0;0.25;0.5;1        0;0;1;1     5;5;5;5
8    0;0.25;0.5;0.75;1   0;0;1;3;1   5;5;5;5;5
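Because the per-dose values are separated by semicolons rather than commas, each list fits in a single CSV cell without quoting. As a minimal sketch for reproducing the example, the table can be written with the standard csv module; wide_csv_text and write_wide_csv are hypothetical helper names, and the rows transcribe the table above.

```python
import csv
import io

# Rows transcribed from the wide-format table: one row per dataset, with the
# per-dose values of each column packed into a semicolon-delimited string.
WIDE_ROWS = [
    (1, "0;0.5;1", "0;3;5", "5;5;5"),
    (2, "0;0.33;0.67;1", "0;0;4;5", "5;5;5;5"),
    (3, "0;0.25;0.5;1", "0;0;3;5", "5;5;5;5"),
    (4, "0;0.33;0.67;1", "0;0;1;1", "5;5;5;5"),
    (5, "0;0.25;0.5;1", "0;0;1;1", "5;5;5;5"),
    (6, "0;0.33;0.67;1", "0;0;1;1", "5;5;5;5"),
    (7, "0;0.25;0.5;1", "0;0;1;1", "5;5;5;5"),
    (8, "0;0.25;0.5;0.75;1", "0;0;1;3;1", "5;5;5;5;5"),
]


def wide_csv_text(rows=WIDE_ROWS):
    """Return the rows as CSV text, preceded by a header row.

    Semicolons are not the CSV delimiter, so the default quoting leaves
    these cells unquoted.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["ID", "Dose", "Incidence", "N"])
    writer.writerows(rows)
    return buffer.getvalue()


def write_wide_csv(path):
    """Write the example table to `path` (for instance "dataset.csv")."""
    with open(path, "w", newline="") as f:
        f.write(wide_csv_text())
```

Reading the file back yields one row per dataset whose Dose, Incidence, and N cells are still semicolon-delimited strings, which is what the conversion step has to split.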

You can start by loading the data into a pandas dataframe.

import pandas as pd

# if it's a CSV file:
df = pd.read_csv('./dataset.csv')

# or, if it's an XLSX file with a sheet named "datasets" (reading .xlsx requires openpyxl):
df = pd.read_excel('./dataset.xlsx', sheet_name='datasets')

After loading the data, split each semicolon-delimited cell into a list of numbers and convert each row into a pybmds dataset:

import pybmds

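def create(row):
    # split each semicolon-delimited cell into a list of numbers
    return pybmds.DichotomousDataset(
        id=row.ID,
        doses=list(map(float, row.Dose.split(';'))),
        incidences=list(map(int, row.Incidence.split(';'))),
        ns=list(map(int, row.N.split(';'))),
    )


# apply create to every row; .tolist() turns the resulting Series into a list
datasets = df.apply(create, axis=1).tolist()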

print(len(datasets))
print(datasets[0].tbl())
8
╒════════╤═════════════╤═════╕
│   Dose │   Incidence │   N │
╞════════╪═════════════╪═════╡
│    0   │           0 │   5 │
│    0.5 │           3 │   5 │
│    1   │           5 │   5 │
╘════════╧═════════════╧═════╛

The end result is a list of datasets ready for BMD modeling.
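The create function assumes every row is well formed. Since a missing or extra semicolon produces lists of different lengths, it can help to check each row before building datasets. The sketch below is a hypothetical helper, parse_wide_row, not part of pybmds; its checks (equal list lengths and 0 <= incidence <= N in every dose group) are assumptions about what a valid dichotomous row looks like, and pybmds may perform its own validation as well.

```python
def parse_wide_row(dose, incidence, n, sep=";"):
    """Split semicolon-delimited cells into parallel lists and sanity-check them.

    Returns a dict with the doses/incidences/ns keywords used in `create`,
    or raises ValueError if the row looks malformed.
    """
    # str() guards against cells that pandas parsed as plain numbers
    doses = [float(x) for x in str(dose).split(sep)]
    incidences = [int(x) for x in str(incidence).split(sep)]
    ns = [int(x) for x in str(n).split(sep)]

    # every dose group needs exactly one incidence and one group size
    if not len(doses) == len(incidences) == len(ns):
        raise ValueError(
            f"length mismatch: {len(doses)} doses, "
            f"{len(incidences)} incidences, {len(ns)} ns"
        )

    # an incidence count cannot be negative or exceed its group size
    for i, (inc, size) in enumerate(zip(incidences, ns)):
        if not 0 <= inc <= size:
            raise ValueError(f"dose group {i}: incidence {inc} not in [0, {size}]")

    return {"doses": doses, "incidences": incidences, "ns": ns}
```

Inside create, the three map calls could then be replaced by pybmds.DichotomousDataset(id=row.ID, **parse_wide_row(row.Dose, row.Incidence, row.N)), so a malformed row fails with a message that names the problem.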