Dataset Preparation
This section compiles a few patterns for loading data from a tabular file in preparation for modeling in pybmds.
Processing long datasets
If you have several dose-response datasets, you can run them as a batch. As an example, consider a CSV or Excel file of dichotomous datasets in long format, with one row per dose group. An excerpt covering three of the datasets looks something like this:
ID | Dose | Incidence | N |
---|---|---|---|
1 | 0 | 0 | 5 |
1 | 0.5 | 3 | 5 |
1 | 1 | 5 | 5 |
2 | 0 | 0 | 5 |
2 | 0.33 | 0 | 5 |
2 | 0.67 | 4 | 5 |
2 | 1 | 5 | 5 |
3 | 0 | 0 | 5 |
3 | 0.25 | 0 | 5 |
3 | 0.5 | 3 | 5 |
3 | 1 | 5 | 5 |
You can start by loading the data into a pandas dataframe.
import pandas as pd
# if it's a CSV file
df = pd.read_csv('./dataset.csv')
# if it's an XLSX file:
df = pd.read_excel('./dataset.xlsx', sheet_name='datasets')
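Before converting anything, it can help to confirm that the file loaded the way you expect. The following is a minimal sketch, not part of pybmds: it reads an inline CSV (standing in for `./dataset.csv` and containing only the first dataset from the table above), checks that the four expected columns are present, and flags rows where the incidence exceeds the group size. The names `csv_text`, `required`, `missing`, and `bad_rows` are illustrative.

```python
import io

import pandas as pd

# Inline CSV standing in for ./dataset.csv (first dataset from the table above).
csv_text = """ID,Dose,Incidence,N
1,0,0,5
1,0.5,3,5
1,1,5,5
"""
df = pd.read_csv(io.StringIO(csv_text))

# Fail early if an expected column is missing or misspelled.
required = ["ID", "Dose", "Incidence", "N"]
missing = [col for col in required if col not in df.columns]
if missing:
    raise ValueError(f"missing columns: {missing}")

# Rows where incidence exceeds the group size cannot be valid dichotomous data.
bad_rows = df[df["Incidence"] > df["N"]]
print(len(bad_rows))
```

If `bad_rows` is not empty, inspecting it directly usually points to the offending lines in the source file.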
After loading the data from a file, you need to convert the dataframe-style data into pybmds datasets:
import pybmds
datasets = []
# each group holds the dose-group rows for a single dataset
for dataset_id, rows in df.groupby('ID'):
    dataset = pybmds.DichotomousDataset(
        id=dataset_id,
        doses=rows.Dose.tolist(),
        incidences=rows.Incidence.tolist(),
        ns=rows.N.tolist()
    )
    datasets.append(dataset)
print(len(datasets))
print(datasets[0].tbl())
8
╒════════╤═════════════╤═════╕
│   Dose │   Incidence │   N │
╞════════╪═════════════╪═════╡
│    0   │           0 │   5 │
│    0.5 │           3 │   5 │
│    1   │           5 │   5 │
╘════════╧═════════════╧═════╛
The end result is a list of datasets ready for BMD modeling.
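When batching many datasets, a per-dataset summary of the long-format dataframe can surface problems before any modeling starts. The sketch below uses only pandas and rebuilds datasets 1 and 2 from the table above in memory. The checks themselves (number of dose groups, presence of a zero-dose control, duplicated doses) are illustrative choices, not requirements documented by pybmds, and the helper names `has_control` and `has_duplicates` are hypothetical.

```python
import pandas as pd

# Long-format data: datasets 1 and 2 from the table above.
df = pd.DataFrame(
    {
        "ID": [1, 1, 1, 2, 2, 2, 2],
        "Dose": [0, 0.5, 1, 0, 0.33, 0.67, 1],
        "Incidence": [0, 3, 5, 0, 0, 4, 5],
        "N": [5, 5, 5, 5, 5, 5, 5],
    }
)


def has_control(doses: pd.Series) -> bool:
    """Return True if the dataset includes a zero-dose group."""
    return bool((doses == 0).any())


def has_duplicates(doses: pd.Series) -> bool:
    """Return True if any dose value appears more than once."""
    return bool(doses.duplicated().any())


# One row per dataset ID, one column per check.
summary = df.groupby("ID").agg(
    dose_groups=("Dose", "size"),
    has_control=("Dose", has_control),
    duplicate_doses=("Dose", has_duplicates),
)
print(summary)
```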
Processing wide datasets
Consider a CSV or Excel file of dichotomous datasets that looks something like this (with one row per dataset):
ID | Dose | Incidence | N |
---|---|---|---|
1 | 0;0.5;1 | 0;3;5 | 5;5;5 |
2 | 0;0.33;0.67;1 | 0;0;4;5 | 5;5;5;5 |
3 | 0;0.25;0.5;1 | 0;0;3;5 | 5;5;5;5 |
4 | 0;0.33;0.67;1 | 0;0;1;1 | 5;5;5;5 |
5 | 0;0.25;0.5;1 | 0;0;1;1 | 5;5;5;5 |
6 | 0;0.33;0.67;1 | 0;0;1;1 | 5;5;5;5 |
7 | 0;0.25;0.5;1 | 0;0;1;1 | 5;5;5;5 |
8 | 0;0.25;0.5;0.75;1 | 0;0;1;3;1 | 5;5;5;5;5 |
You can start by loading the data into a pandas dataframe.
import pandas as pd
# if it's a CSV file
df = pd.read_csv('./dataset.csv')
# if it's an XLSX file:
df = pd.read_excel('./dataset.xlsx', sheet_name='datasets')
After loading the data from a file, you need to convert the dataframe-style data into pybmds datasets:
import pybmds
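def create(row):
    # each cell holds semicolon-delimited values, one per dose group
    return pybmds.DichotomousDataset(
        id=row.ID,
        doses=list(map(float, row.Dose.split(';'))),
        ns=list(map(int, row.N.split(';'))),
        incidences=list(map(int, row.Incidence.split(';'))),
    )

# apply returns a pandas Series; convert it to a plain list
datasets = df.apply(create, axis=1).tolist()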
print(len(datasets))
print(datasets[0].tbl())
8
╒════════╤═════════════╤═════╕
│   Dose │   Incidence │   N │
╞════════╪═════════════╪═════╡
│    0   │           0 │   5 │
│    0.5 │           3 │   5 │
│    1   │           5 │   5 │
╘════════╧═════════════╧═════╛
The end result is a list of datasets ready for BMD modeling.
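The wide and long layouts carry the same information, so one option is to normalize a wide file into the long layout and reuse the `groupby` approach from the previous section. The following is a minimal sketch using only pandas and plain Python. It rebuilds datasets 1 and 2 from the table above in memory, and the names `wide`, `records`, and `long_df` are illustrative. It also raises an error when the semicolon-delimited cells in a row have different lengths, which is an added safeguard rather than anything pybmds requires.

```python
import pandas as pd

# Wide-format rows: datasets 1 and 2 from the table above.
wide = pd.DataFrame(
    {
        "ID": [1, 2],
        "Dose": ["0;0.5;1", "0;0.33;0.67;1"],
        "Incidence": ["0;3;5", "0;0;4;5"],
        "N": ["5;5;5", "5;5;5;5"],
    }
)

records = []
for row in wide.itertuples(index=False):
    doses = [float(x) for x in row.Dose.split(";")]
    incidences = [int(x) for x in row.Incidence.split(";")]
    ns = [int(x) for x in row.N.split(";")]
    # Mismatched cell lengths usually mean a typo in the source file.
    if not (len(doses) == len(incidences) == len(ns)):
        raise ValueError(f"dataset {row.ID}: cell lengths differ")
    # Emit one long-format record per dose group.
    for dose, incidence, n in zip(doses, incidences, ns):
        records.append({"ID": row.ID, "Dose": dose, "Incidence": incidence, "N": n})

long_df = pd.DataFrame(records)
print(long_df.shape)
```

The resulting `long_df` has one row per dose group and the same four columns as the long-format table.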