
This function groups a TADA dataframe by user-defined columns and aggregates TADA.ResultMeasureValue within each group to a minimum, maximum, or mean value.

Usage

TADA_AggregateMeasurements(
  .data,
  grouping_cols = c("ActivityStartDate", "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText", "ActivityTypeCode",
    "TADA.ResultMeasure.MeasureUnitCode"),
  agg_fun = c("max", "min", "mean"),
  clean = FALSE
)

Arguments

.data

A TADA dataframe

grouping_cols

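Character vector of column names used to group the data. The default groups by activity start date, monitoring location, comparable data identifier, result detection condition, activity type code, and measure unit.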

agg_fun

The aggregation function applied to the grouped data. Must be one of 'min', 'max', or 'mean'.
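
The default for agg_fun is a three-element vector, so it helps to know how such defaults are commonly resolved in R. The sketch below uses a hypothetical helper, `pick_agg`, to show the usual `match.arg()` idiom, under which an omitted argument falls back to the first choice ("max") and any value outside the set is rejected. Whether TADA_AggregateMeasurements resolves its default this way is an assumption, not something this page states; passing agg_fun explicitly avoids relying on it.

```r
# Hypothetical helper (not part of TADA) illustrating the match.arg() idiom.
pick_agg <- function(agg_fun = c("max", "min", "mean")) {
  # match.arg() returns the first choice when the argument is omitted
  # and raises an error for any value outside the allowed set.
  match.arg(agg_fun)
}

default_choice <- pick_agg()          # argument omitted
explicit_choice <- pick_agg("mean")   # argument given
invalid_choice <- tryCatch(pick_agg("median"), error = function(e) "error")
```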

clean

Boolean. Determines whether the other measurements in each aggregated group are removed from (clean = TRUE) or kept in (clean = FALSE) the dataframe. If clean = FALSE, the additional measurements that were considered are indicated in the TADA.ResultValueAggregation.Flag column. The default is clean = FALSE.

Value

A TADA dataframe in which each group's aggregated value is represented by one row. If agg_fun is 'min' or 'max', the function selects the row matching the aggregation condition and flags it as the selected measurement. If agg_fun is 'mean', the function selects a random row from the aggregated rows to carry the metadata associated with the mean value and gives that row a unique ResultIdentifier: the original ResultIdentifier with the prefix "TADA-". The function adds a TADA.ResultValueAggregation.Flag column to indicate which rows have been aggregated.
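
The group-then-flag behavior described above can be illustrated on a toy data frame with base R. This is a minimal sketch, not the package's implementation: the column names (`id`, `site`, `date`, `value`, `flag`) and the flag labels are invented stand-ins, ties are not handled specially, and the mean case simply collapses each group to one row rather than choosing a random row for metadata.

```r
# Toy data; column names are shortened stand-ins, not real TADA columns.
toy <- data.frame(
  id    = c("r1", "r2", "r3", "r4", "r5"),
  site  = c("A", "A", "A", "B", "B"),
  date  = c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02"),
  value = c(2.0, 5.0, 3.0, 7.0, 1.0),
  stringsAsFactors = FALSE
)

# One group key per row, built from the grouping columns.
key <- interaction(toy$site, toy$date, drop = TRUE)

# Group size and group maximum, aligned to the rows of `toy`.
group_n   <- ave(toy$value, key, FUN = length)
group_max <- ave(toy$value, key, FUN = max)

# Single-row groups need no aggregation; in larger groups the row
# holding the maximum is the selected one.
# The flag labels are illustrative and do not reproduce TADA's wording.
toy$flag <- ifelse(group_n == 1, "no aggregation needed",
  ifelse(toy$value == group_max, "selected", "not selected")
)

# Analogue of clean = TRUE: drop rows that were considered but not selected.
toy_max_clean <- toy[toy$flag != "not selected", ]

# Analogue of agg_fun = "mean": one row per group carrying the group mean.
toy_mean <- aggregate(value ~ site + date, data = toy, FUN = mean)
```

Leaving `toy` unfiltered corresponds to clean = FALSE: every original row is kept, and the `flag` column records which row represents each group.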

Examples

# Load example dataset
utils::data(Data_6Tribes_5y)
# Select the maximum value per day, site, comparable data identifier, unit,
# result detection condition, and activity type code.
# Remove all non-maximum measurements from the grouped data.
Data_6Tribes_5y_max <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c(
    "ActivityStartDate",
    "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier",
    "ResultDetectionConditionText",
    "ActivityTypeCode",
    "TADA.ResultMeasure.MeasureUnitCode"
  ),
  agg_fun = "max",
  clean = TRUE
)
#> Aggregation results:
#> 5406110659

# Calculate a mean value per day, site, comparable data identifier, unit,
# result detection condition, and activity type code.
# Keep all measurements used to calculate the mean.
Data_6Tribes_5y_mean <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c(
    "ActivityStartDate", "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText",
    "ActivityTypeCode", "TADA.ResultMeasure.MeasureUnitCode"
  ),
  agg_fun = "mean",
  clean = FALSE
)
#> Aggregation results:
#> 818715406110659