This function organizes input and output for the estimation of change between two samples (for categorical and continuous variables). The analysis data, dframe, can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.

change_analysis(
  dframe,
  vars_cat = NULL,
  vars_cont = NULL,
  test = "mean",
  subpops = NULL,
  surveyID = "surveyID",
  survey_names = NULL,
  siteID = "siteID",
  weight = "weight",
  revisitwgt = FALSE,
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  vartype = "Local",
  jointprob = "overton",
  conf = 95,
  All_Sites = FALSE
)

Arguments

dframe

Data to be analyzed (analysis data). A data frame or sf object containing survey design variables, response variables, and subpopulation (domain) variables.

vars_cat

Vector composed of character values that identify the names of categorical response variables in dframe. The default is NULL.

vars_cont

Vector composed of character values that identify the names of continuous response variables in dframe. The default is NULL.

test

Character string or character vector providing the location measure(s) to use for change estimation for continuous variables. The choices are "mean", "total", "median", or some combination of the three options (e.g., c("mean", "total")). The default is "mean".

subpops

Vector composed of character values that identify the names of subpopulation (domain) variables in dframe. If a value is not provided, the value "All_Sites" is assigned to the subpops argument and a factor variable named "All_Sites" that takes the value "All Sites" is added to dframe. The default value is NULL.

surveyID

Character value providing the name of the survey ID variable in dframe. The default value is "surveyID".

survey_names

Character vector of length two that provides the survey names contained in the surveyID variable in the dframe data frame. The two values in the vector identify the first survey and second survey, respectively. If a value is not provided, unique values of the surveyID variable are assigned to the survey_names argument. The default is NULL.
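When survey_names is left as NULL, the survey labels come from the unique values of the surveyID variable. The sketch below uses only base R and a small hypothetical data frame (toy) to show that unique() reports values in order of first appearance, so the row order of the data can decide which label is listed first. Whether change_analysis applies any further ordering is not stated here, so supplying survey_names explicitly removes the ambiguity.

```r
# Hypothetical data in which the later survey happens to appear first
toy <- data.frame(
  surveyID = c("Survey 2", "Survey 2", "Survey 1", "Survey 1"),
  siteID = paste0("Site", 1:4)
)

# unique() keeps the order of first appearance
found <- unique(as.character(toy$surveyID))

# State the order explicitly: first survey, then second survey
survey_names <- c("Survey 1", "Survey 2")

# Guard against labels that do not occur in the data
stopifnot(length(survey_names) == 2, all(survey_names %in% found))
```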

siteID

Character value providing the name of the site ID variable in dframe. For a two-stage sample, the site ID variable identifies stage two site IDs. If a site is visited in both surveys, it must have the same siteID value in both entries. The default value is "siteID".

weight

Character value providing the name of the design weight variable in dframe. For a two-stage sample, the weight variable identifies stage two weights. The default value is "weight".

revisitwgt

Logical value that indicates whether each repeat visit site has the same design weight in the two surveys, where TRUE = the weight for each repeat visit site is the same and FALSE = the weight for each repeat visit site is not the same. When this argument is FALSE, all of the repeat visit sites are assigned equal weights when calculating the covariance component of the change estimate standard error. The default is FALSE.
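One way to choose a value for revisitwgt is to compare the weights of the repeat visit sites directly. The following is a minimal base R sketch on a hypothetical data frame (toy, with a weight column named wgt); it identifies repeat visit sites as those whose siteID occurs in both surveys and checks whether their weights agree. It illustrates the definition above and is not part of the function itself.

```r
# Hypothetical two-survey data; Site3 and Site4 are visited in both surveys
toy <- data.frame(
  surveyID = rep(c("Survey 1", "Survey 2"), each = 4),
  siteID = c("Site1", "Site2", "Site3", "Site4",
             "Site3", "Site4", "Site5", "Site6"),
  wgt = c(10, 20, 30, 40, 30, 45, 50, 60),
  stringsAsFactors = FALSE
)

s1 <- toy[toy$surveyID == "Survey 1", ]
s2 <- toy[toy$surveyID == "Survey 2", ]

# Repeat visit sites share a siteID across the two surveys
repeat_sites <- intersect(s1$siteID, s2$siteID)

# Line up the weights of each repeat visit site in the two surveys
w1 <- s1$wgt[match(repeat_sites, s1$siteID)]
w2 <- s2$wgt[match(repeat_sites, s2$siteID)]

# TRUE only if every repeat visit site has the same weight in both surveys
revisitwgt <- isTRUE(all.equal(w1, w2))
```

In this toy data Site4 has weight 40 in the first survey and 45 in the second, so the check yields FALSE.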

xcoord

Character value providing the name of the x-coordinate variable in dframe. For a two-stage sample, the x-coordinate variable identifies stage two x-coordinates. Note that x-coordinates are required for calculation of the local mean variance estimator. If dframe is an sf object, this argument is not required (as the geometry column in dframe is used to find the x-coordinate). The default value is NULL.

ycoord

Character value providing the name of the y-coordinate variable in dframe. For a two-stage sample, the y-coordinate variable identifies stage two y-coordinates. Note that y-coordinates are required for calculation of the local mean variance estimator. If dframe is an sf object, this argument is not required (as the geometry column in dframe is used to find the y-coordinate). The default value is NULL.

stratumID

Character value providing the name of the stratum ID variable in dframe. The default value is NULL.

clusterID

Character value providing the name of the cluster (stage one) ID variable in dframe. Note that cluster IDs are required for a two-stage sample. The default value is NULL.

weight1

Character value providing the name of the stage one weight variable in dframe. The default value is NULL.

xcoord1

Character value providing the name of the stage one x-coordinate variable in dframe. Note that x-coordinates are required for calculation of the local mean variance estimator. The default value is NULL.

ycoord1

Character value providing the name of the stage one y-coordinate variable in dframe. Note that y-coordinates are required for calculation of the local mean variance estimator. The default value is NULL.

sizeweight

Logical value that indicates whether size weights should be used during estimation, where TRUE uses size weights and FALSE does not use size weights. To employ size weights for a single-stage sample, a value must be supplied for argument weight. To employ size weights for a two-stage sample, values must be supplied for arguments weight and weight1. The default value is FALSE.

sweight

Character value providing the name of the size weight variable in dframe. For a two-stage sample, the size weight variable identifies stage two size weights. The default value is NULL.

sweight1

Character value providing the name of the stage one size weight variable in dframe. The default value is NULL.

fpc

Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. For a single-stage unstratified design, the vector is a single numeric value. For a two-stage unstratified design, the object is a named vector whose length is one more than the number of clusters in the sample, where the first item specifies the number of clusters in the population and each subsequent item specifies the number of stage two units in the corresponding cluster. The name of the first item is arbitrary. Subsequent names identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where the names must match the stratum IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. The default value is NULL.

Example fpc for a single-stage unstratified survey design:

fpc <- 15000

Example fpc for a single-stage stratified survey design:

fpc <- list(
  Stratum_1 = 9000,
  Stratum_2 = 6000
)

Example fpc for a two-stage unstratified survey design:

fpc <- c(
  Ncluster = 150,
  Cluster_1 = 150,
  Cluster_2 = 75,
  Cluster_3 = 75,
  Cluster_4 = 125,
  Cluster_5 = 75
)

Example fpc for a two-stage stratified survey design:

fpc <- list(
  Stratum_1 = c(
    Ncluster_1 = 100,
    Cluster_1 = 125,
    Cluster_2 = 100,
    Cluster_3 = 100,
    Cluster_4 = 125,
    Cluster_5 = 50
  ),
  Stratum_2 = c(
    Ncluster_2 = 50,
    Cluster_1 = 75,
    Cluster_2 = 150,
    Cluster_3 = 75,
    Cluster_4 = 75,
    Cluster_5 = 125
  )
)
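In practice the two-stage vector is usually assembled from frame information, not typed by hand. The sketch below shows one possible way to do that in base R for the unstratified two-stage case. The data frame cluster_sizes, its column names, and n_clusters_in_population are hypothetical and stand in for whatever the sampling frame provides.

```r
# Hypothetical frame summary: one row per sampled cluster, with the number
# of stage two units in that cluster
cluster_sizes <- data.frame(
  clusterID = paste0("Cluster_", 1:5),
  n_units = c(150, 75, 75, 125, 75),
  stringsAsFactors = FALSE
)

# Number of clusters in the population (assumed known from the frame)
n_clusters_in_population <- 150

# First item: clusters in the population (its name is arbitrary).
# Remaining items: stage two units per cluster, named with the cluster IDs.
fpc <- c(
  Ncluster = n_clusters_in_population,
  setNames(cluster_sizes$n_units, cluster_sizes$clusterID)
)

# The vector has one more element than the number of sampled clusters
stopifnot(length(fpc) == nrow(cluster_sizes) + 1)
```

For a stratified two-stage design, the same construction could be repeated per stratum and the results collected in a list named by stratum ID.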

popsize

Object that provides values for the population argument of the calibrate or postStratify functions in the survey package. If a value is provided for popsize, then either the calibrate or postStratify function is used to modify the survey design object that is required by functions in the survey package. Whether to use the calibrate or postStratify function is dictated by the format of popsize, which is discussed below. Post-stratification adjusts the sampling and replicate weights so that the joint distribution of a set of post-stratifying variables matches the known population joint distribution. Calibration, generalized raking, or GREG estimators generalize post-stratification and raking by calibrating a sample to the marginal totals of variables in a linear regression model. For the calibrate function, the object is a named list, where the names identify factor variables in dframe. Each element of the list is a named vector containing the population total for each level of the associated factor variable. For the postStratify function, the object is either a data frame, table, or xtabs object that provides the population total for all combinations of selected factor variables in the dframe data frame. If a data frame is used for popsize, the variable containing population totals must be the last variable in the data frame. If a table is used for popsize, the table must have named dimnames where the names identify factor variables in the dframe data frame. If the popsize argument is equal to NULL, then neither calibration nor post-stratification is performed. The default value is NULL.

Example popsize for calibration:

popsize <- list(
  Ecoregion = c(
    East = 750,
    Central = 500,
    West = 250
  ),
  Type = c(
    Streams = 1150,
    Rivers = 350
  )
)

Example popsize for post-stratification using a data frame:

popsize <- data.frame(
  Ecoregion = rep(c("East", "Central", "West"), rep(2, 3)),
  Type = rep(c("Streams", "Rivers"), 3),
  Total = c(575, 175, 400, 100, 175, 75)
)

Example popsize for post-stratification using a table:

popsize <- with(MySurveyFrame, table(Ecoregion, Type))

Example popsize for post-stratification using an xtabs object:

popsize <- xtabs(~Ecoregion + Type, data = MySurveyFrame)
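The table and xtabs examples refer to MySurveyFrame, which is not defined on this page. The sketch below builds a hypothetical MySurveyFrame with one row per population unit, using the same totals as the data frame example, and shows that both calls give the same joint totals. It also derives the marginal totals in the named-list form described for calibration. All object names other than popsize are illustrative.

```r
# Hypothetical sampling frame: one row per unit in the target population
MySurveyFrame <- data.frame(
  Ecoregion = rep(c("East", "Central", "West"), times = c(750, 500, 250)),
  Type = c(
    rep(c("Streams", "Rivers"), times = c(575, 175)),  # East
    rep(c("Streams", "Rivers"), times = c(400, 100)),  # Central
    rep(c("Streams", "Rivers"), times = c(175, 75))    # West
  )
)

# Joint totals for every Ecoregion by Type combination (post-stratification)
pop_table <- with(MySurveyFrame, table(Ecoregion, Type))
pop_xtabs <- xtabs(~ Ecoregion + Type, data = MySurveyFrame)

# Marginal totals per factor level (named-list form used for calibration)
pop_margins <- list(
  Ecoregion = rowSums(pop_table),
  Type = colSums(pop_table)
)
```

Note that table() and xtabs() order factor levels alphabetically, so the levels appear as Central, East, West, not in the order typed.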

vartype

Character value providing the choice of the variance estimator, where "Local" indicates the local mean estimator and "SRS" indicates the simple random sampling estimator. The default value is "Local".

jointprob

Character value providing the choice of joint inclusion probability approximation for use with Horvitz-Thompson and Yates-Grundy variance estimators, where "overton" indicates the Overton approximation, "hr" indicates the Hartley-Rao approximation, and "brewer" indicates the Brewer approximation. The default value is "overton".

conf

Numeric value providing the Gaussian-based confidence level. The default value is 95.
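To make the role of conf concrete, the sketch below reproduces a margin of error and confidence bounds from a standard error using a standard normal multiplier. The numeric inputs are copied from the first row of the catsum output in the Examples section. That the function computes its bounds exactly this way is an assumption, although the printed output agrees with it to about four decimal places.

```r
conf <- 95
diff_est <- -1.474844   # DiffEst.P, first row of the example output
std_error <- 6.588741   # StdError.P, first row of the example output

# Two-sided standard normal multiplier for the chosen confidence level
mult <- qnorm(0.5 + (conf / 100) / 2)

# Margin of error and confidence bounds around the difference estimate
margin_of_error <- mult * std_error
lcb <- diff_est - margin_of_error
ucb <- diff_est + margin_of_error

c(MarginofError = margin_of_error, LCB = lcb, UCB = ucb)
```

The same construction with conf set to 90 would use a smaller multiplier and give narrower bounds.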

All_Sites

Logical value used when subpops is not NULL. If All_Sites is TRUE, output for all sites (ignoring subpopulations) is returned alongside the subpopulation output for each variable in vars_cat and vars_cont. If All_Sites is FALSE, only the subpopulation output is returned. The default is FALSE.

Value

List of change estimates composed of four items: (1) catsum contains change estimates for categorical variables, (2) contsum_mean contains estimates for continuous variables using the mean, (3) contsum_total contains estimates for continuous variables using the total, and (4) contsum_median contains estimates for continuous variables using the median. The items in the list will contain NULL for estimates that were not calculated. Each data frame includes estimates for all combinations of population Types, subpopulations within types, response variables, and categories within each response variable (for categorical variables and continuous variables using the median). Change estimates are provided plus standard error estimates and confidence interval estimates.

The catsum data frame contains the following variables:

Survey_1

first survey name

Survey_2

second survey name

Type

subpopulation (domain) name

Subpopulation

subpopulation name within a domain

Indicator

response variable

Category

category of response variable

DiffEst.P

proportion difference estimate (in %; second survey - first survey)

StdError.P

standard error of proportion difference estimate

MarginofError.P

margin of error of proportion difference estimate

LCBxxPct.P

xx% (default 95%) lower confidence bound of proportion difference estimate

UCBxxPct.P

xx% (default 95%) upper confidence bound of proportion difference estimate

DiffEst.U

total difference estimate (second survey - first survey)

StdError.U

standard error of total difference estimate

MarginofError.U

margin of error of total difference estimate

LCBxxPct.U

xx% (default 95%) lower confidence bound of total difference estimate

UCBxxPct.U

xx% (default 95%) upper confidence bound of total difference estimate

nResp_1

sample size in the first survey

Estimate.P_1

proportion estimate (in %) from the first survey

StdError.P_1

standard error of proportion estimate from the first survey

MarginofError.P_1

margin of error of proportion estimate from the first survey

LCBxxPct.P_1

xx% (default 95%) lower confidence bound of proportion estimate from the first survey

UCBxxPct.P_1

xx% (default 95%) upper confidence bound of proportion estimate from the first survey

Estimate.U_1

total estimate from the first survey

StdError.U_1

standard error of total estimate from the first survey

MarginofError.U_1

margin of error of total estimate from the first survey

LCBxxPct.U_1

xx% (default 95%) lower confidence bound of total estimate from the first survey

UCBxxPct.U_1

xx% (default 95%) upper confidence bound of total estimate from the first survey

nResp_2

sample size in the second survey

Estimate.P_2

proportion estimate (in %) from the second survey

StdError.P_2

standard error of proportion estimate from the second survey

MarginofError.P_2

margin of error of proportion estimate from the second survey

LCBxxPct.P_2

xx% (default 95%) lower confidence bound of proportion estimate from the second survey

UCBxxPct.P_2

xx% (default 95%) upper confidence bound of proportion estimate from the second survey

Estimate.U_2

total estimate from the second survey

StdError.U_2

standard error of total estimate from the second survey

MarginofError.U_2

margin of error of total estimate from the second survey

LCBxxPct.U_2

xx% (default 95%) lower confidence bound of total estimate from the second survey

UCBxxPct.U_2

xx% (default 95%) upper confidence bound of total estimate from the second survey
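The difference columns are defined as second survey minus first survey. As a quick arithmetic check, the sketch below recomputes the two differences from the per-survey estimates, using values copied from the first row of the catsum output in the Examples section. The variable names are illustrative only.

```r
# Values copied from the first row of the catsum output in the Examples
est_p_1 <- 50.85132    # Estimate.P_1
est_p_2 <- 49.37648    # Estimate.P_2
est_u_1 <- 2941.4984   # Estimate.U_1
est_u_2 <- 2664.9065   # Estimate.U_2

# Differences are second survey minus first survey
diff_p <- est_p_2 - est_p_1   # compare with DiffEst.P in the output
diff_u <- est_u_2 - est_u_1   # compare with DiffEst.U in the output

c(diff_p = diff_p, diff_u = diff_u)
```

Both results match the printed DiffEst.P and DiffEst.U values up to the rounding of the displayed inputs; a negative value means the estimate decreased from the first survey to the second.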

The contsum_mean data frame contains the following variables:

Survey_1

first survey name

Survey_2

second survey name

Type

subpopulation (domain) name

Subpopulation

subpopulation name within a domain

Indicator

response variable

Statistic

value of percentile

nResp

sample size at or below Value

DiffEst

mean difference estimate

StdError

standard error of mean difference estimate

MarginofError

margin of error of mean difference estimate

LCBxxPct

xx% (default 95%) lower confidence bound of mean difference estimate

UCBxxPct

xx% (default 95%) upper confidence bound of mean difference estimate

nResp_1

sample size in the first survey

Estimate_1

mean estimate from the first survey

StdError_1

standard error of mean estimate from the first survey

MarginofError_1

margin of error of mean estimate from the first survey

LCBxxPct_1

xx% (default 95%) lower confidence bound of mean estimate from the first survey

UCBxxPct_1

xx% (default 95%) upper confidence bound of mean estimate from the first survey

nResp_2

sample size in the second survey

Estimate_2

mean estimate from the second survey

StdError_2

standard error of mean estimate from the second survey

MarginofError_2

margin of error of mean estimate from the second survey

LCBxxPct_2

xx% (default 95%) lower confidence bound of mean estimate from the second survey

UCBxxPct_2

xx% (default 95%) upper confidence bound of mean estimate from the second survey

The contsum_total data frame contains the following variables:

Survey_1

first survey name

Survey_2

second survey name

Type

subpopulation (domain) name

Subpopulation

subpopulation name within a domain

Indicator

response variable

Statistic

value of percentile

nResp

sample size at or below Value

DiffEst

total difference estimate

StdError

standard error of total difference estimate

MarginofError

margin of error of total difference estimate

LCBxxPct

xx% (default 95%) lower confidence bound of total difference estimate

UCBxxPct

xx% (default 95%) upper confidence bound of total difference estimate

nResp_1

sample size in the first survey

Estimate_1

total estimate from the first survey

StdError_1

standard error of total estimate from the first survey

MarginofError_1

margin of error of total estimate from the first survey

LCBxxPct_1

xx% (default 95%) lower confidence bound of total estimate from the first survey

UCBxxPct_1

xx% (default 95%) upper confidence bound of total estimate from the first survey

nResp_2

sample size in the second survey

Estimate_2

total estimate from the second survey

StdError_2

standard error of total estimate from the second survey

MarginofError_2

margin of error of total estimate from the second survey

LCBxxPct_2

xx% (default 95%) lower confidence bound of total estimate from the second survey

UCBxxPct_2

xx% (default 95%) upper confidence bound of total estimate from the second survey

The contsum_median data frame contains the following variables:

Survey_1

first survey name

Survey_2

second survey name

Type

subpopulation (domain) name

Subpopulation

subpopulation name within a domain

Indicator

response variable

Category

category of response variable

DiffEst.P

proportion above or below median difference estimate (in %; second survey - first survey)

StdError.P

standard error of proportion above or below median difference estimate

MarginofError.P

margin of error of proportion above or below median difference estimate

LCBxxPct.P

xx% (default 95%) lower confidence bound of proportion above or below median difference estimate

UCBxxPct.P

xx% (default 95%) upper confidence bound of proportion above or below median difference estimate

DiffEst.U

total above or below median difference estimate (second survey - first survey)

StdError.U

standard error of total above or below median difference estimate

MarginofError.U

margin of error of total above or below median difference estimate

LCBxxPct.U

xx% (default 95%) lower confidence bound of total above or below median difference estimate

UCBxxPct.U

xx% (default 95%) upper confidence bound of total above or below median difference estimate

nResp_1

sample size in the first survey

Estimate.P_1

proportion above or below median estimate (in %) from the first survey

StdError.P_1

standard error of proportion above or below median estimate from the first survey

MarginofError.P_1

margin of error of proportion above or below median estimate from the first survey

LCBxxPct.P_1

xx% (default 95%) lower confidence bound of proportion above or below median estimate from the first survey

UCBxxPct.P_1

xx% (default 95%) upper confidence bound of proportion above or below median estimate from the first survey

Estimate.U_1

total above or below median estimate from the first survey

StdError.U_1

standard error of total above or below median estimate from the first survey

MarginofError.U_1

margin of error of total above or below median estimate from the first survey

LCBxxPct.U_1

xx% (default 95%) lower confidence bound of total above or below median estimate from the first survey

UCBxxPct.U_1

xx% (default 95%) upper confidence bound of total above or below median estimate from the first survey

nResp_2

sample size in the second survey

Estimate.P_2

proportion above or below median estimate (in %) from the second survey

StdError.P_2

standard error of proportion above or below median estimate from the second survey

MarginofError.P_2

margin of error of proportion above or below median estimate from the second survey

LCBxxPct.P_2

xx% (default 95%) lower confidence bound of proportion above or below median estimate from the second survey

UCBxxPct.P_2

xx% (default 95%) upper confidence bound of proportion above or below median estimate from the second survey

Estimate.U_2

total above or below median estimate from the second survey

StdError.U_2

standard error of total above or below median estimate from the second survey

MarginofError.U_2

margin of error of total above or below median estimate from the second survey

LCBxxPct.U_2

xx% (default 95%) lower confidence bound of total above or below median estimate from the second survey

UCBxxPct.U_2

xx% (default 95%) upper confidence bound of total above or below median estimate from the second survey

See also

trend_analysis

for trend analysis

Author

Tom Kincaid Kincaid.Tom@epa.gov

Examples

# Categorical variable example for three resource classes
dframe <- data.frame(
  surveyID = rep(c("Survey 1", "Survey 2"), c(100, 100)),
  siteID = paste0("Site", 1:200),
  wgt = runif(200, 10, 100),
  xcoord = runif(200),
  ycoord = runif(200),
  stratum = rep(rep(c("Stratum 1", "Stratum 2"), c(2, 2)), 50),
  CatVar = rep(c("North", "South"), 100),
  All_Sites = rep("All Sites", 200),
  Resource_Class = sample(c("Good", "Fair", "Poor"), 200, replace = TRUE)
)
myvars <- c("CatVar")
mysubpops <- c("All_Sites", "Resource_Class")
change_analysis(dframe,
  vars_cat = myvars, subpops = mysubpops,
  surveyID = "surveyID", siteID = "siteID", weight = "wgt",
  xcoord = "xcoord", ycoord = "ycoord", stratumID = "stratum"
)
#> $catsum
#>   Survey_1 Survey_2           Type Subpopulation Indicator Category DiffEst.P
#> 1 Survey 1 Survey 2      All_Sites     All Sites    CatVar    North -1.474844
#> 2 Survey 1 Survey 2      All_Sites     All Sites    CatVar    South  1.474844
#> 3 Survey 1 Survey 2 Resource_Class          Fair    CatVar    North -8.424944
#> 4 Survey 1 Survey 2 Resource_Class          Fair    CatVar    South  8.424944
#> 5 Survey 1 Survey 2 Resource_Class          Good    CatVar    North -2.700343
#> 6 Survey 1 Survey 2 Resource_Class          Good    CatVar    South  2.700343
#> 7 Survey 1 Survey 2 Resource_Class          Poor    CatVar    North  9.876690
#> 8 Survey 1 Survey 2 Resource_Class          Poor    CatVar    South -9.876690
#>   StdError.P MarginofError.P LCB95Pct.P UCB95Pct.P  DiffEst.U StdError.U
#> 1   6.588741        12.91369  -14.38854   11.43885 -276.59196   399.9584
#> 2   6.588741        12.91369  -11.43885   14.38854 -110.79795   405.3858
#> 3  11.629073        22.79256  -31.21751   14.36762 -135.23095   224.1637
#> 4  11.629073        22.79256  -14.36762   31.21751  187.97610   248.9814
#> 5  11.535577        22.60932  -25.30966   19.90897  -18.47734   242.7810
#> 6  11.535577        22.60932  -19.90897   25.30966   91.80035   238.9779
#> 7  10.718029        21.00695  -11.13026   30.88364 -122.88367   233.7347
#> 8  10.718029        21.00695  -30.88364   11.13026 -390.57440   200.6060
#>   MarginofError.U LCB95Pct.U UCB95Pct.U nResp_1 Estimate.P_1 StdError.P_1
#> 1        783.9040 -1060.4959 507.312026      50     50.85132     4.735320
#> 2        794.5416  -905.3396 683.743675      50     49.14868     4.735320
#> 3        439.3527  -574.5837 304.121778      18     52.86449     8.718695
#> 4        487.9946  -300.0185 675.970720      14     47.13551     8.718695
#> 5        475.8419  -494.3193 457.364609      14     46.31958     8.760600
#> 6        468.3881  -376.5877 560.188433      18     53.68042     8.760600
#> 7        458.1116  -580.9953 335.227975      18     53.15233     7.543663
#> 8        393.1805  -783.7549   2.606108      18     46.84767     7.543663
#>   MarginofError.P_1 LCB95Pct.P_1 UCB95Pct.P_1 Estimate.U_1 StdError.U_1
#> 1          9.281057     41.57026     60.13238    2941.4984     306.2000
#> 2          9.281057     39.86762     58.42974    2843.0090     286.4790
#> 3         17.088327     35.77616     69.95282     995.6200     167.0792
#> 4         17.088327     30.04718     64.22384     887.7237     180.9941
#> 5         17.170460     29.14912     63.49004     865.5563     186.1827
#> 6         17.170460     36.50996     70.85088    1003.1055     165.3294
#> 7         14.785308     38.36702     67.93764    1080.3222     185.9011
#> 8         14.785308     32.06236     61.63298     952.1798     147.9496
#>   MarginofError.U_1 LCB95Pct.U_1 UCB95Pct.U_1 nResp_2 Estimate.P_2 StdError.P_2
#> 1          600.1409    2341.3575     3541.639      50     49.37648     4.581293
#> 2          561.4884    2281.5206     3404.497      50     50.62352     4.581293
#> 3          327.4692     668.1508     1323.089      14     44.43954     7.695434
#> 4          354.7419     532.9818     1242.466      18     55.56046     7.695434
#> 5          364.9115     500.6448     1230.468      17     43.61924     7.504761
#> 6          324.0397     679.0658     1327.145      22     56.38076     7.504761
#> 7          364.3594     715.9628     1444.682      19     63.02902     7.613757
#> 8          289.9759     662.2039     1242.156      10     36.97098     7.613757
#>   MarginofError.P_2 LCB95Pct.P_2 UCB95Pct.P_2 Estimate.U_2 StdError.U_2
#> 1          8.979169     40.39731     58.35565    2664.9065     257.3097
#> 2          8.979169     41.64435     59.60269    2732.2111     286.8231
#> 3         15.082774     29.35677     59.52232     860.3891     149.4453
#> 4         15.082774     40.47768     70.64323    1075.6998     170.9763
#> 5         14.709060     28.91018     58.32830     847.0789     155.8159
#> 6         14.709060     41.67170     71.08982    1094.9058     172.5590
#> 7         14.922689     48.10633     77.95171     957.4385     141.6782
#> 8         14.922689     22.04829     51.89367     561.6054     135.4757
#>   MarginofError.U_2 LCB95Pct.U_2 UCB95Pct.U_2
#> 1          504.3177    2160.5888     3169.224
#> 2          562.1629    2170.0482     3294.374
#> 3          292.9074     567.4816     1153.296
#> 4          335.1073     740.5925     1410.807
#> 5          305.3935     541.6855     1152.472
#> 6          338.2095     756.6963     1433.115
#> 7          277.6842     679.7543     1235.123
#> 8          265.5276     296.0779      827.133
#> 
#> $contsum_mean
#> NULL
#> 
#> $contsum_total
#> NULL
#> 
#> $contsum_median
#> NULL
#>