Aggregate multiple result values to a min, max, or mean
Source: R/Transformations.R, R/Utilities.R
TADA_AggregateMeasurements.Rd
This function groups TADA data by user-defined columns and aggregates the TADA.ResultMeasureValue to a minimum, maximum, or mean value.
Usage
TADA_AggregateMeasurements(
  .data,
  grouping_cols = c("ActivityStartDate", "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText", "ActivityTypeCode"),
  agg_fun = c("max", "min", "mean"),
  clean = TRUE
)
Arguments
- .data
A TADA dataframe
- grouping_cols
The column names used to group the data
- agg_fun
The aggregation function applied to the grouped data. Must be one of 'min', 'max', or 'mean'.
- clean
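Boolean. Determines whether the other measurements in each aggregated group are removed from (clean = TRUE) or kept in (clean = FALSE) the returned dataframe. If clean = FALSE, the retained measurements are marked in TADA.ResultValueAggregation.Flag as considered in the aggregation function but not selected (for example, "Considered in mean aggregation function but not selected" when agg_fun = "mean").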
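To make the interplay of grouping_cols and clean concrete, here is a minimal base R sketch on a toy data frame. It is not the TADA implementation: the column names (date, site, value, flag), the helper flag_max, and the "Not selected" label are hypothetical, and ties are assumed to be broken by row order.

```r
# Toy data: rows 1 and 2 share a group (same date and site); row 3 stands alone.
# Column names are hypothetical stand-ins, not the TADA schema.
toy <- data.frame(
  date  = c("2020-01-01", "2020-01-01", "2020-01-02"),
  site  = c("A", "A", "A"),
  value = c(1.5, 3.0, 2.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag the maximum row in each group, optionally drop the rest.
flag_max <- function(df, grouping_cols, clean = TRUE) {
  key <- interaction(df[grouping_cols], drop = TRUE)  # one group key per row
  row_id <- seq_len(nrow(df))
  group_size <- ave(row_id, key, FUN = length)
  # Row index of the first maximum within each group (ties broken by row order)
  max_row <- ave(row_id, key, FUN = function(i) i[which.max(df$value[i])])
  df$flag <- ifelse(
    group_size == 1, "No aggregation needed",
    ifelse(row_id == max_row, "Selected as max aggregate value", "Not selected")
  )
  # clean = TRUE removes the group members that were not selected
  if (clean) df <- df[df$flag != "Not selected", , drop = FALSE]
  df
}

flag_max(toy, c("date", "site"), clean = TRUE)   # keeps rows 2 and 3
flag_max(toy, c("date", "site"), clean = FALSE)  # keeps all rows, row 1 flagged as not selected
```

With clean = FALSE the row count is unchanged and only the flag column distinguishes selected from non-selected measurements, which is convenient for auditing what was dropped.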
Value
A TADA dataframe with aggregated values combined into one row per group. If agg_fun is 'min' or 'max', the function selects the row that matches the aggregation condition and flags it as the selected measurement. If agg_fun is 'mean', the function selects a random row from the aggregated rows to carry the metadata associated with the mean value and gives that row a unique ResultIdentifier: the original ResultIdentifier with the prefix "TADA-". The function adds a TADA.ResultValueAggregation.Flag column to indicate which rows have been aggregated.
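The 'mean' behavior described above (random representative row, mean value, "TADA-" prefixed identifier) can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the columns id, date, and value and the helper aggregate_mean are hypothetical, and the sketch corresponds to the clean = TRUE case, where only one row per group is returned.

```r
# Toy data: r1 and r2 fall in the same group; r3 is alone in its group.
# Column names are hypothetical stand-ins, not the TADA schema.
toy <- data.frame(
  id    = c("r1", "r2", "r3"),
  date  = c("2020-01-01", "2020-01-01", "2020-01-02"),
  value = c(1.0, 3.0, 2.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse each multi-row group to one row holding the mean.
aggregate_mean <- function(df, grouping_cols) {
  key <- interaction(df[grouping_cols], drop = TRUE)
  pieces <- lapply(split(df, key), function(g) {
    if (nrow(g) == 1) return(g)                 # single-row group: nothing to aggregate
    rep_row <- g[sample.int(nrow(g), 1), , drop = FALSE]  # random row supplies the metadata
    rep_row$value <- mean(g$value)              # value replaced by the group mean
    rep_row$id <- paste0("TADA-", rep_row$id)   # new identifier derived from the original
    rep_row
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

set.seed(1)  # only to make the random row choice repeatable
aggregate_mean(toy, "date")
```

Because the representative row is chosen at random, metadata columns that differ within a group can change between runs unless a seed is set; the mean value itself does not.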
Examples
# Load example dataset
data(Data_6Tribes_5y)
# Select the maximum value per day, site, comparable data identifier, unit,
# result detection condition, and activity type code.
# Clean all non-maximum measurements from grouped data.
Data_6Tribes_5y_max <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c(
    "ActivityStartDate",
    "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier",
    "ResultDetectionConditionText",
    "ActivityTypeCode",
    "TADA.ResultMeasure.MeasureUnitCode"
  ),
  agg_fun = "max",
  clean = TRUE
)
#> [1] "Aggregation results:"
#>
#> No aggregation needed Selected as max aggregate value
#> 54061 10659
# Calculate a mean value per day, site, comparable data identifier, unit,
# result detection condition, and activity type code.
# Keep all measurements used to calculate the mean.
Data_6Tribes_5y_mean <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c(
    "ActivityStartDate", "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText",
    "ActivityTypeCode", "TADA.ResultMeasure.MeasureUnitCode"
  ),
  agg_fun = "mean",
  clean = FALSE
)
#> [1] "Aggregation results:"
#>
#> Considered in mean aggregation function but not selected
#> 81871
#> No aggregation needed
#> 54061
#> Selected as mean aggregate value, with randomly selected metadata from a row in the aggregate group
#> 10659
# Select the maximum value per day, site, comparable data identifier,
# result detection condition, and activity type code.
# Clean all non-maximum measurements from grouped data.
Data_6Tribes_5y_agg <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c(
    "ActivityStartDate", "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText",
    "ActivityTypeCode"
  ),
  agg_fun = "max", clean = TRUE
)
#> [1] "Aggregation results:"
#>
#> No aggregation needed Selected as max aggregate value
#> 54061 10659
# Calculate a mean value per day, site, comparable data identifier,
# result detection condition, and activity type code.
# Keep all measurements used to calculate the mean.
Data_6Tribes_5y_agg <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c(
    "ActivityStartDate", "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText",
    "ActivityTypeCode"
  ),
  agg_fun = "mean", clean = FALSE
)
#> [1] "Aggregation results:"
#>
#> Considered in mean aggregation function but not selected
#> 81871
#> No aggregation needed
#> 54061
#> Selected as mean aggregate value, with randomly selected metadata from a row in the aggregate group
#> 10659