Aggregate multiple result values to a min, max, or mean — TADA

This function groups a TADA dataframe by user-defined columns and, within each group, aggregates TADA.ResultMeasureValue to a minimum, maximum, or mean value.

Usage

TADA_AggregateMeasurements(
  .data,
  grouping_cols = c("ActivityStartDate", "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText", "ActivityTypeCode"),
  agg_fun = c("max", "min", "mean"),
  clean = TRUE
)

Arguments

.data: A TADA dataframe
grouping_cols: Character vector of the column names used to group the data.
agg_fun: The aggregation function applied to each group: one of 'min', 'max', or 'mean'.
clean: Boolean. Determines whether the non-selected measurements in each aggregated group are removed (clean = TRUE) or kept (clean = FALSE) in the dataframe. When kept, they are flagged in TADA.ResultValueAggregation.Flag as used in the aggregation function but not selected (for agg_fun = 'mean', the flag reads "Used in mean aggregation function but not selected").
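To make the min/max selection and the clean argument concrete, here is a minimal base R sketch on made-up toy data. It is not the TADA implementation: aggregate_minmax_sketch() is a hypothetical helper, the toy columns are invented, and the "Used in max/min aggregation function but not selected" wording is an assumption modeled on the mean example's printed output. Ties resolve to the first row because which.max() and which.min() return the first match.

```r
# Hypothetical sketch of min/max aggregation; NOT TADA's actual implementation.
aggregate_minmax_sketch <- function(df, grouping_cols, agg_fun = c("max", "min"),
                                    clean = TRUE) {
  agg_fun <- match.arg(agg_fun)
  # which.max()/which.min() return the first position on ties
  pick <- if (agg_fun == "max") which.max else which.min
  # One key per row; paste() turns NA into "NA", so NA values form their own group
  key <- do.call(paste, c(df[grouping_cols], sep = "\r"))
  df$TADA.ResultValueAggregation.Flag <- "No aggregation needed"
  for (idx in split(seq_len(nrow(df)), key)) {
    if (length(idx) < 2) next  # single-row groups need no aggregation
    chosen <- idx[pick(df$TADA.ResultMeasureValue[idx])]
    # Assumed flag wording, modeled on the "mean" example output
    df$TADA.ResultValueAggregation.Flag[idx] <-
      paste("Used in", agg_fun, "aggregation function but not selected")
    df$TADA.ResultValueAggregation.Flag[chosen] <-
      paste("Selected as", agg_fun, "aggregate value")
  }
  if (clean) {
    # Drop the non-selected members of each aggregated group
    df <- df[!startsWith(df$TADA.ResultValueAggregation.Flag, "Used in"), ]
  }
  df
}

# Toy data (made up): three same-day values at site A, one at site B
toy <- data.frame(
  Site = c("A", "A", "A", "B"),
  Date = rep("2020-06-01", 4),
  TADA.ResultMeasureValue = c(1.5, 4.0, 2.5, 7.0)
)
aggregate_minmax_sketch(toy, c("Site", "Date"), agg_fun = "max", clean = TRUE)
```

With clean = TRUE the result keeps the site A row holding 4.0 (flagged as the selected max) and the lone site B row; with clean = FALSE all four rows remain and the two non-selected site A rows carry the "Used in ..." flag.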

Value

A TADA dataframe in which each group of aggregated values is represented by one selected row. If agg_fun is 'min' or 'max', the function selects the row holding the minimum or maximum value and flags it as the selected measurement. If agg_fun is 'mean', the function randomly selects one row from the group to supply the metadata associated with the mean value and gives that row a unique ResultIdentifier: the original ResultIdentifier prefixed with "TADA-". The function adds a TADA.ResultValueAggregation.Flag column indicating how each row was handled (for example, "No aggregation needed" or "Selected as max aggregate value"). If clean = FALSE, the other measurements in each group are also retained and flagged.
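The mean path described above can be sketched in base R on made-up toy data. aggregate_mean_sketch() is a hypothetical helper, not the package's code. One assumption to flag: the sketch appends the mean as a new row copied from a randomly chosen group member (with "TADA-" prefixed to its ResultIdentifier) and flags every original group member as used but not selected; the description above does not say whether the package copies the chosen row or edits it in place.

```r
# Hypothetical sketch of mean aggregation; NOT TADA's actual implementation.
aggregate_mean_sketch <- function(df, grouping_cols, clean = TRUE) {
  # One key per row; paste() turns NA into "NA", so NA values form their own group
  key <- do.call(paste, c(df[grouping_cols], sep = "\r"))
  df$TADA.ResultValueAggregation.Flag <- "No aggregation needed"
  summary_rows <- list()
  for (idx in split(seq_len(nrow(df)), key)) {
    if (length(idx) < 2) next  # single-row groups need no aggregation
    # Copy one randomly chosen row to carry the group's metadata
    row <- df[idx[sample.int(length(idx), 1L)], ]
    row$TADA.ResultMeasureValue <- mean(df$TADA.ResultMeasureValue[idx])
    row$ResultIdentifier <- paste0("TADA-", row$ResultIdentifier)
    row$TADA.ResultValueAggregation.Flag <- paste(
      "Selected as mean aggregate value,",
      "with randomly selected metadata from a row in the aggregate group"
    )
    df$TADA.ResultValueAggregation.Flag[idx] <-
      "Used in mean aggregation function but not selected"
    summary_rows[[length(summary_rows) + 1L]] <- row
  }
  if (clean) {
    # Keep only rows that were never part of a multi-row group
    df <- df[df$TADA.ResultValueAggregation.Flag == "No aggregation needed", ]
  }
  # Append one summary row per aggregated group
  do.call(rbind, c(list(df), summary_rows))
}

set.seed(42)  # makes the random metadata row reproducible
toy <- data.frame(
  ResultIdentifier = c("r1", "r2", "r3", "r4"),
  Site = c("A", "A", "A", "B"),
  TADA.ResultMeasureValue = c(1, 2, 6, 10)
)
aggregate_mean_sketch(toy, "Site", clean = FALSE)
```

Here site A's three values average to 3. With clean = FALSE the output has the four original rows plus one summary row; with clean = TRUE only the site B row and the site A summary row remain.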

Examples

# Load example dataset
data(Data_6Tribes_5y)
# Select maximum value per day, site, comparable data identifier, result detection condition,
# and activity type code. Clean all non-maximum measurements from grouped data.
Data_6Tribes_5y_agg <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c(
    "ActivityStartDate", "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText",
    "ActivityTypeCode"
  ),
  agg_fun = "max", clean = TRUE
)
#> [1] "Aggregation results:"
#> 
#>           No aggregation needed Selected as max aggregate value 
#>                           54061                           10659 

# Calculate a mean value per day, site, comparable data identifier, result detection condition,
# and activity type code. Keep all measurements used to calculate the mean.
Data_6Tribes_5y_agg <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c(
    "ActivityStartDate", "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText",
    "ActivityTypeCode"
  ),
  agg_fun = "mean", clean = FALSE
)
#> [1] "Aggregation results:"
#> 
#>                                                                               No aggregation needed 
#>                                                                                               54061 
#> Selected as mean aggregate value, with randomly selected metadata from a row in the aggregate group 
#>                                                                                               10659 
#>                                                  Used in mean aggregation function but not selected 
#>                                                                                               81871