This function groups TADA data by user-defined columns and aggregates the TADA.ResultMeasureValue to a minimum, maximum, or average value.

Usage

TADA_AggregateMeasurements(
  .data,
  grouping_cols = c("ActivityStartDate", "MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText", "ActivityTypeCode"),
  agg_fun = c("max", "min", "mean"),
  clean = TRUE
)

Arguments

.data

A TADA dataframe

grouping_cols

The column names used to group the data

agg_fun

The aggregation function applied to the grouped data. Must be one of 'min', 'max', or 'mean'.

clean

Boolean. Determines whether the other measurements in each aggregated group are removed from the dataframe (clean = TRUE) or kept in it (clean = FALSE). If clean = FALSE, the additional measurements are flagged in TADA.ResultValueAggregation.Flag as "Used in aggregation function but not selected".
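The effect of clean can be illustrated with a minimal base R sketch on toy data. This is not the package's implementation: the data frame toy, the column name flag, and the values are hypothetical, only two grouping columns are used for brevity, and the flag strings are copied from the wording used on this page.

```r
# Toy data: hypothetical values, not taken from any TADA dataset
toy <- data.frame(
  ActivityStartDate = c("2020-01-01", "2020-01-01", "2020-01-02"),
  MonitoringLocationIdentifier = c("SITE-1", "SITE-1", "SITE-1"),
  TADA.ResultMeasureValue = c(2.5, 4.0, 1.0),
  stringsAsFactors = FALSE
)

group_cols <- c("ActivityStartDate", "MonitoringLocationIdentifier")

# Build one grouping key per row from the grouping columns
key <- do.call(paste, c(toy[group_cols], sep = "|"))

# Group size and group maximum, repeated for every row of the group
n_in_group <- ave(toy$TADA.ResultMeasureValue, key, FUN = length)
group_max <- ave(toy$TADA.ResultMeasureValue, key, FUN = max)

# Flag each row (ties for the maximum would all be flagged as selected here)
toy$flag <- ifelse(
  n_in_group == 1,
  "No aggregation needed",
  ifelse(
    toy$TADA.ResultMeasureValue == group_max,
    "Selected as max aggregate value",
    "Used in aggregation function but not selected"
  )
)

# Analogue of clean = TRUE: drop the rows that were not selected
toy_clean <- toy[toy$flag != "Used in aggregation function but not selected", ]
```

In this sketch, toy corresponds to the clean = FALSE result (all rows kept and flagged), while toy_clean corresponds to clean = TRUE.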

Value

A TADA dataframe with aggregated values combined into one row per group. If agg_fun is 'min' or 'max', the function selects the row matching the aggregation condition and flags it as the selected measurement. If agg_fun is 'mean', the function selects a random row from the aggregated rows to carry the metadata associated with the mean value and gives that row a unique ResultIdentifier: the original ResultIdentifier with the prefix "TADA-". The function adds a TADA.ResultValueAggregation.Flag column to indicate which rows have been aggregated.
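The 'mean' behaviour described above can be sketched in base R for a single group. This is an illustration under stated assumptions rather than the function's actual code: the data frame grp and its identifiers are hypothetical, and the seed is set only so the random pick is repeatable.

```r
set.seed(1) # make the random pick reproducible for this illustration

# One hypothetical group of three measurements sharing all grouping columns
grp <- data.frame(
  ResultIdentifier = c("R-001", "R-002", "R-003"),
  TADA.ResultMeasureValue = c(1, 2, 6),
  stringsAsFactors = FALSE
)

# Pick one row at random to carry the metadata for the mean value
pick <- sample(seq_len(nrow(grp)), size = 1)
mean_row <- grp[pick, ]

# Replace the value with the group mean and mark the identifier as derived
mean_row$TADA.ResultMeasureValue <- mean(grp$TADA.ResultMeasureValue)
mean_row$ResultIdentifier <- paste0("TADA-", mean_row$ResultIdentifier)
```

Because the metadata row is chosen at random, columns other than the measurement value may differ between runs unless a seed is fixed.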

Examples

# Load example dataset
data(Data_6Tribes_5y)
# Select maximum value per day, site, comparable data identifier, result detection condition,
# and activity type code. Clean all non-maximum measurements from grouped data.
Data_6Tribes_5y_agg <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c("ActivityStartDate", "MonitoringLocationIdentifier",
                    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText",
                    "ActivityTypeCode"),
  agg_fun = "max", clean = TRUE)
#> [1] "Aggregation results:"
#> 
#>           No aggregation needed Selected as max aggregate value 
#>                           54036                           10437 

# Calculate a mean value per day, site, comparable data identifier, result detection condition,
# and activity type code. Keep all measurements used to calculate mean measurement.
Data_6Tribes_5y_agg <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c("ActivityStartDate", "MonitoringLocationIdentifier", 
                  "TADA.ComparableDataIdentifier", "ResultDetectionConditionText",
                  "ActivityTypeCode"), 
  agg_fun = "mean", clean = FALSE)
#> [1] "Aggregation results:"
#> 
#>                                                                               No aggregation needed 
#>                                                                                               54036 
#> Selected as mean aggregate value, with randomly selected metadata from a row in the aggregate group 
#>                                                                                               10437 
#>                                                  Used in mean aggregation function but not selected 
#>                                                                                               80014