Aggregate multiple result values to a min, max, or mean
Source: R/Utilities.R
TADA_AggregateMeasurements.Rd
This function groups TADA data by user-defined columns and aggregates TADA.ResultMeasureValue within each group to a minimum, maximum, or mean value.
Arguments
- .data
A TADA dataframe
- grouping_cols
The column names used to group the data
- agg_fun
The aggregation function applied to the grouped data. Must be one of 'min', 'max', or 'mean'.
- clean
Boolean. Determines whether the other measurements in each aggregated group are removed from (clean = TRUE) or kept in (clean = FALSE) the dataframe. If clean = FALSE, the additional measurements are indicated in TADA.ResultValueAggregation.Flag as "Used in aggregation function but not selected".
Value
A TADA dataframe in which each group's values are aggregated into one row. If agg_fun is 'min' or 'max', the function selects the row matching the aggregation condition and flags it as the selected measurement. If agg_fun is 'mean', the function selects a random row from the aggregated rows to carry the metadata associated with the mean value and gives that row a unique ResultIdentifier: the original ResultIdentifier with the prefix "TADA-". The function adds a TADA.ResultValueAggregation.Flag column to indicate which rows have been aggregated.
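To make the 'max' behaviour described above concrete, here is a minimal base R sketch on a toy dataframe. It is not the package's implementation: the toy data, the single grouping column, and the helper variables are invented for illustration, and the label used for non-selected rows is assumed by analogy with the 'mean' output shown in the examples.

```r
# Toy data (hypothetical): site "A" has three measurements, site "B" has one.
toy <- data.frame(
  ResultIdentifier = c("r1", "r2", "r3", "r4"),
  MonitoringLocationIdentifier = c("A", "A", "A", "B"),
  TADA.ResultMeasureValue = c(2, 5, 3, 7),
  stringsAsFactors = FALSE
)

# Per-row group size and group maximum, computed with base R's ave().
n_in_group <- ave(toy$TADA.ResultMeasureValue,
                  toy$MonitoringLocationIdentifier, FUN = length)
group_max <- ave(toy$TADA.ResultMeasureValue,
                 toy$MonitoringLocationIdentifier, FUN = max)

# Flag every row. The "not selected" label is an assumption for this sketch.
not_selected <- "Used in max aggregation function but not selected"
toy$TADA.ResultValueAggregation.Flag <- ifelse(
  n_in_group == 1,
  "No aggregation needed",
  ifelse(toy$TADA.ResultMeasureValue == group_max,
         "Selected as max aggregate value",
         not_selected)
)

# Rough analogue of clean = TRUE: drop the rows that were not selected.
toy_clean <- toy[toy$TADA.ResultValueAggregation.Flag != not_selected, ]
```

Note that this sketch keeps every row tied for the maximum within a group; how the real function resolves ties is not stated here.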
Examples
# Load example dataset
data(Data_6Tribes_5y)
# Select maximum value per day, site, comparable data identifier, result detection condition,
# and activity type code. Clean all non-maximum measurements from grouped data.
Data_6Tribes_5y_agg <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c(
    "ActivityStartDate", "MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText",
    "ActivityTypeCode"
  ),
  agg_fun = "max", clean = TRUE
)
#> [1] "Aggregation results:"
#>
#> No aggregation needed Selected as max aggregate value
#>                 54056                           10427
# Calculate a mean value per day, site, comparable data identifier, result detection condition,
# and activity type code. Keep all measurements used to calculate mean measurement.
Data_6Tribes_5y_agg <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c(
    "ActivityStartDate", "MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText",
    "ActivityTypeCode"
  ),
  agg_fun = "mean", clean = FALSE
)
#> [1] "Aggregation results:"
#>
#> No aggregation needed
#> 54056
#> Selected as mean aggregate value, with randomly selected metadata from a row in the aggregate group
#> 10427
#> Used in mean aggregation function but not selected
#> 79994
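The 'mean' case differs from 'min' and 'max' because no existing row holds the aggregated value. The base R sketch below illustrates one way the described behaviour could work on a toy dataframe: a randomly picked row from each group supplies the metadata, its value is replaced by the group mean, and its ResultIdentifier gets the "TADA-" prefix. The toy data and loop are invented for illustration, and appending the mean as a new row (rather than modifying the picked row in place) is an assumption of this sketch, not something the documentation states.

```r
set.seed(1)  # only so the randomly picked metadata row is reproducible here

# Toy data (hypothetical): site "A" has three measurements, site "B" has one.
toy <- data.frame(
  ResultIdentifier = c("r1", "r2", "r3", "r4"),
  MonitoringLocationIdentifier = c("A", "A", "A", "B"),
  TADA.ResultMeasureValue = c(2, 5, 3, 7),
  stringsAsFactors = FALSE
)
toy$TADA.ResultValueAggregation.Flag <- "No aggregation needed"

# Row indices belonging to each group.
groups <- split(seq_len(nrow(toy)), toy$MonitoringLocationIdentifier)

mean_rows <- list()
for (idx in groups) {
  if (length(idx) < 2) next  # single-row groups need no aggregation

  # Mark the original measurements as inputs to the mean.
  toy$TADA.ResultValueAggregation.Flag[idx] <-
    "Used in mean aggregation function but not selected"

  # Pick one row at random to carry the metadata for the mean value.
  pick <- idx[sample.int(length(idx), 1)]
  new_row <- toy[pick, ]
  new_row$TADA.ResultMeasureValue <- mean(toy$TADA.ResultMeasureValue[idx])
  new_row$ResultIdentifier <- paste0("TADA-", new_row$ResultIdentifier)
  new_row$TADA.ResultValueAggregation.Flag <- paste(
    "Selected as mean aggregate value, with randomly selected",
    "metadata from a row in the aggregate group"
  )
  mean_rows[[length(mean_rows) + 1]] <- new_row
}

# Analogue of clean = FALSE: keep the inputs alongside the new mean rows.
toy_agg <- rbind(toy, do.call(rbind, mean_rows))

# Tabulate the flags, similar in spirit to the printed summary above.
flag_counts <- table(toy_agg$TADA.ResultValueAggregation.Flag)
```

Filtering `toy_agg` to exclude rows flagged "Used in mean aggregation function but not selected" would give the analogue of clean = TRUE.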