This function organizes input and output for the estimation of change between
two samples (for categorical and continuous variables). The analysis data,
dframe, can be either a data frame or a simple features (sf) object. If an sf
object is used, coordinates are extracted from the geometry column in the
object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord",
respectively, and the geometry column is dropped from the object.
change_analysis(
  dframe,
  vars_cat = NULL,
  vars_cont = NULL,
  test = "mean",
  subpops = NULL,
  surveyID = "surveyID",
  survey_names = NULL,
  siteID = "siteID",
  weight = "weight",
  revisitwgt = FALSE,
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  vartype = "Local",
  jointprob = "overton",
  conf = 95,
  All_Sites = FALSE
)
dframe
Data to be analyzed (analysis data). A data frame or sf object containing
survey design variables, response variables, and subpopulation (domain)
variables.
vars_cat
Vector composed of character values that identify the names of categorical
response variables in dframe. The default is NULL.
vars_cont
Vector composed of character values that identify the names of continuous
response variables in dframe. The default is NULL.
test
Character string or character vector providing the location measure(s) to use
for change estimation for continuous variables. The choices are "mean",
"total", "median", or some combination of the three options (e.g.,
c("mean", "total")). The default is "mean".
subpops
Vector composed of character values that identify the names of subpopulation
(domain) variables in dframe. If a value is not provided, the value
"All_Sites" is assigned to the subpops argument and a factor variable named
"All_Sites" that takes the value "All Sites" is added to dframe. The default
value is NULL.
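The default handling of subpops described above can be mimicked by hand in base R. The following is a minimal sketch under stated assumptions, not the package's internal code; the small data frame is made up for illustration.

```r
# Hypothetical analysis data with no subpopulation (domain) variable
dframe <- data.frame(
  siteID = paste0("Site", 1:4),
  weight = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Assumed equivalent of the documented behavior when subpops = NULL:
# subpops becomes "All_Sites" and a one-level factor of that name is added
subpops <- NULL
if (is.null(subpops)) {
  subpops <- "All_Sites"
  dframe$All_Sites <- factor(rep("All Sites", nrow(dframe)))
}

levels(dframe$All_Sites)
```

Every row then belongs to the single subpopulation "All Sites", so estimates are produced for the whole sample.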
surveyID
Character value providing the name of the survey ID variable in dframe. The
default value is "surveyID".
survey_names
Character vector of length two that provides the survey names contained in the
surveyID variable in the dframe data frame. The two values in the vector
identify the first survey and second survey, respectively. If a value is not
provided, unique values of the surveyID variable are assigned to the
survey_names argument. The default is NULL.
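Because the differences are reported as second survey minus first survey, the order of the two names matters. The sketch below uses made-up data and assumes, from the description above, that the default is derived from the unique values of the survey ID variable. It shows why passing survey_names explicitly can be safer when the rows are not sorted by survey.

```r
# Hypothetical data in which the later survey happens to appear first
dframe <- data.frame(
  surveyID = rep(c("Survey 2", "Survey 1"), c(3, 3)),
  siteID = paste0("Site", 1:6),
  stringsAsFactors = FALSE
)

# unique() keeps values in order of first appearance
default_names <- unique(dframe$surveyID)

# Explicit order: first survey, then second survey
survey_names <- c("Survey 1", "Survey 2")

# Basic sanity checks before passing survey_names to change_analysis()
stopifnot(length(survey_names) == 2)
stopifnot(all(survey_names %in% dframe$surveyID))
```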
siteID
Character value providing the name of the site ID variable in dframe. For a
two-stage sample, the site ID variable identifies stage two site IDs. The
default value is "siteID". If a unique site is visited in both surveys, the
corresponding siteID should be the same for both entries.
weight
Character value providing the name of the design weight variable in dframe.
For a two-stage sample, the weight variable identifies stage two weights. The
default value is "weight".
revisitwgt
Logical value that indicates whether each repeat visit site has the same
design weight in the two surveys, where TRUE = the weight for each repeat
visit site is the same and FALSE = the weight for each repeat visit site is
not the same. When this argument is FALSE, all of the repeat visit sites are
assigned equal weights when calculating the covariance component of the change
estimate standard error. The default is FALSE.
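Whether revisitwgt should be TRUE can be checked directly from the data. The sketch below uses made-up data and the hypothetical column name wgt. It finds the sites present in both surveys by matching site IDs and compares their design weights.

```r
# Hypothetical data: sites "A" and "B" are visited in both surveys
dframe <- data.frame(
  surveyID = rep(c("Survey 1", "Survey 2"), each = 4),
  siteID = c("A", "B", "C", "D", "A", "B", "E", "F"),
  wgt = c(10, 20, 30, 40, 10, 25, 30, 40),
  stringsAsFactors = FALSE
)

s1 <- dframe[dframe$surveyID == "Survey 1", ]
s2 <- dframe[dframe$surveyID == "Survey 2", ]

# Repeat visit sites share a siteID across the two surveys
repeat_sites <- intersect(s1$siteID, s2$siteID)

# Design weights of the repeat visit sites in each survey, in matching order
w1 <- s1$wgt[match(repeat_sites, s1$siteID)]
w2 <- s2$wgt[match(repeat_sites, s2$siteID)]

# TRUE only if every repeat visit site carries the same weight in both surveys
revisitwgt <- isTRUE(all.equal(w1, w2))
revisitwgt
```

Here site "B" has weight 20 in the first survey and 25 in the second, so revisitwgt is FALSE.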
xcoord
Character value providing the name of the x-coordinate variable in dframe. For
a two-stage sample, the x-coordinate variable identifies stage two
x-coordinates. Note that x-coordinates are required for calculation of the
local mean variance estimator. If dframe is an sf object, this argument is not
required (as the geometry column in dframe is used to find the x-coordinate).
The default value is NULL.
ycoord
Character value providing the name of the y-coordinate variable in dframe. For
a two-stage sample, the y-coordinate variable identifies stage two
y-coordinates. Note that y-coordinates are required for calculation of the
local mean variance estimator. If dframe is an sf object, this argument is not
required (as the geometry column in dframe is used to find the y-coordinate).
The default value is NULL.
stratumID
Character value providing the name of the stratum ID variable in dframe. The
default value is NULL.
clusterID
Character value providing the name of the cluster (stage one) ID variable in
dframe. Note that cluster IDs are required for a two-stage sample. The default
value is NULL.
weight1
Character value providing the name of the stage one weight variable in dframe.
The default value is NULL.
xcoord1
Character value providing the name of the stage one x-coordinate variable in
dframe. Note that x-coordinates are required for calculation of the local mean
variance estimator. The default value is NULL.
ycoord1
Character value providing the name of the stage one y-coordinate variable in
dframe. Note that y-coordinates are required for calculation of the local mean
variance estimator. The default value is NULL.
sizeweight
Logical value that indicates whether size weights should be used during
estimation, where TRUE uses size weights and FALSE does not use size weights.
To employ size weights for a single-stage sample, a value must be supplied for
argument weight. To employ size weights for a two-stage sample, values must be
supplied for arguments weight and weight1. The default value is FALSE.
sweight
Character value providing the name of the size weight variable in dframe. For
a two-stage sample, the size weight variable identifies stage two size
weights. The default value is NULL.
sweight1
Character value providing the name of the stage one size weight variable in
dframe. The default value is NULL.
fpc
Object that specifies values required for calculation of the finite population
correction factor used during variance estimation. The object must match the
survey design in terms of stratification and whether the design is
single-stage or two-stage.

For an unstratified design, the object is a vector. The vector is composed of
a single numeric value for a single-stage design. For a two-stage unstratified
design, the object is a named vector containing one more than the number of
clusters in the sample, where the first item in the vector specifies the
number of clusters in the population and each subsequent item specifies the
number of stage two units for the cluster. The name for the first item in the
vector is arbitrary. Subsequent names in the vector identify clusters and must
match the cluster IDs.

For a stratified design, the object is a named list of vectors, where names
must match the strata IDs. For each stratum, the format of the vector is
identical to the format described for unstratified single-stage and two-stage
designs.

Note that the finite population correction factor is not used with the local
mean variance estimator.

Example fpc for a single-stage unstratified survey design:
fpc <- 15000

Example fpc for a single-stage stratified survey design:
fpc <- list(
  Stratum_1 = 9000,
  Stratum_2 = 6000)

Example fpc for a two-stage unstratified survey design:
fpc <- c(
  Ncluster = 150,
  Cluster_1 = 150,
  Cluster_2 = 75,
  Cluster_3 = 75,
  Cluster_4 = 125,
  Cluster_5 = 75)

Example fpc for a two-stage stratified survey design:
fpc <- list(
  Stratum_1 = c(
    Ncluster_1 = 100,
    Cluster_1 = 125,
    Cluster_2 = 100,
    Cluster_3 = 100,
    Cluster_4 = 125,
    Cluster_5 = 50),
  Stratum_2 = c(
    Ncluster_2 = 50,
    Cluster_1 = 75,
    Cluster_2 = 150,
    Cluster_3 = 75,
    Cluster_4 = 75,
    Cluster_5 = 125))
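When cluster sizes are already stored in a table, the two-stage unstratified fpc vector can be assembled rather than typed by hand. This is a minimal base R sketch; the object names cluster_sizes, n_stage2, and n_clusters_population are hypothetical, and the numbers repeat the two-stage unstratified example above.

```r
# Hypothetical lookup table: number of stage two units in each sampled cluster
cluster_sizes <- data.frame(
  clusterID = paste0("Cluster_", 1:5),
  n_stage2 = c(150, 75, 75, 125, 75),
  stringsAsFactors = FALSE
)

# Number of clusters in the population (first item; its name is arbitrary)
n_clusters_population <- 150

# Named vector: population cluster count first, then one entry per cluster
fpc <- c(
  Ncluster = n_clusters_population,
  setNames(cluster_sizes$n_stage2, cluster_sizes$clusterID)
)

# Check the documented requirements: one more item than sampled clusters,
# and names after the first must match the cluster IDs
sample_clusters <- paste0("Cluster_", 1:5)
stopifnot(length(fpc) == length(sample_clusters) + 1)
stopifnot(setequal(names(fpc)[-1], sample_clusters))

fpc
```

For a stratified design, the same construction could be repeated per stratum and the results collected in a list named by stratum ID.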
popsize
Object that provides values for the population argument of the calibrate or
postStratify functions in the survey package. If a value is provided for
popsize, then either the calibrate or postStratify function is used to modify
the survey design object that is required by functions in the survey package.
Whether to use the calibrate or postStratify function is dictated by the
format of popsize, which is discussed below.

Post-stratification adjusts the sampling and replicate weights so that the
joint distribution of a set of post-stratifying variables matches the known
population joint distribution. Calibration, generalized raking, or GREG
estimators generalize post-stratification and raking by calibrating a sample
to the marginal totals of variables in a linear regression model.

For the calibrate function, the object is a named list, where the names
identify factor variables in dframe. Each element of the list is a named
vector containing the population total for each level of the associated factor
variable.

For the postStratify function, the object is either a data frame, table, or
xtabs object that provides the population total for all combinations of
selected factor variables in the dframe data frame. If a data frame is used
for popsize, the variable containing population totals must be the last
variable in the data frame. If a table is used for popsize, the table must
have named dimnames where the names identify factor variables in the dframe
data frame.

If the popsize argument is equal to NULL, then neither calibration nor
post-stratification is performed. The default value is NULL.
Example popsize for calibration:
popsize <- list(
  Ecoregion = c(
    East = 750,
    Central = 500,
    West = 250),
  Type = c(
    Streams = 1150,
    Rivers = 350))

Example popsize for post-stratification using a data frame:
popsize <- data.frame(
  Ecoregion = rep(c("East", "Central", "West"),
    rep(2, 3)),
  Type = rep(c("Streams", "Rivers"), 3),
  Total = c(575, 175, 400, 100, 175, 75))

Example popsize for post-stratification using a table:
popsize <- with(MySurveyFrame,
  table(Ecoregion, Type))

Example popsize for post-stratification using an xtabs object:
popsize <- xtabs(~Ecoregion + Type,
  data = MySurveyFrame)
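The table and xtabs examples above rely on an object called MySurveyFrame that is not defined on this page. The sketch below builds a stand-in with one row per population unit. Its contents are an assumption, chosen so that the counts reproduce the totals in the data frame example. It then derives each popsize format from it.

```r
# Hypothetical survey frame: one row per unit in the population
MySurveyFrame <- data.frame(
  Ecoregion = rep(c("East", "Central", "West"), c(750, 500, 250)),
  Type = c(
    rep(c("Streams", "Rivers"), c(575, 175)), # East
    rep(c("Streams", "Rivers"), c(400, 100)), # Central
    rep(c("Streams", "Rivers"), c(175, 75))   # West
  ),
  stringsAsFactors = FALSE
)

# Post-stratification formats: joint counts for every Ecoregion x Type cell
popsize_table <- with(MySurveyFrame, table(Ecoregion, Type))
popsize_xtabs <- xtabs(~ Ecoregion + Type, data = MySurveyFrame)

# Data frame format: the totals column (Freq) is the last variable
popsize_df <- as.data.frame(popsize_table)

# Calibration format: marginal totals per factor, as named vectors
popsize_calib <- list(
  Ecoregion = rowSums(popsize_table),
  Type = colSums(popsize_table)
)

popsize_table
popsize_calib
```

The marginal totals in popsize_calib equal those in the calibration example (East 750, Central 500, West 250; Streams 1150, Rivers 350), which shows how the two formats relate: calibration needs only the margins, while post-stratification needs every cell.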
vartype
Character value providing the choice of the variance estimator, where "Local"
indicates the local mean estimator and "SRS" indicates the simple random
sampling estimator. The default value is "Local".
jointprob
Character value providing the choice of joint inclusion probability
approximation for use with Horvitz-Thompson and Yates-Grundy variance
estimators, where "overton" indicates the Overton approximation, "hr"
indicates the Hartley-Rao approximation, and "brewer" indicates the Brewer
approximation. The default value is "overton".
conf
Numeric value providing the Gaussian-based confidence level. The default value
is 95.
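A Gaussian-based interval is conventionally the estimate plus or minus a normal quantile times the standard error. The sketch below assumes that form; it is not taken from the package source. It applies the formula to the estimate and standard error in the first row of the catsum example output on this page (DiffEst.P and StdError.P), and the results agree with the MarginofError.P, LCB95Pct.P, and UCB95Pct.P values printed there.

```r
conf <- 95

# Two-sided normal multiplier for the requested confidence level
z <- qnorm(1 - (1 - conf / 100) / 2)

# Values copied from row 1 of the catsum example output
est <- -1.474844 # DiffEst.P
se <- 6.588741   # StdError.P

moe <- z * se    # margin of error
lcb <- est - moe # lower confidence bound
ucb <- est + moe # upper confidence bound

round(c(MarginofError = moe, LCB = lcb, UCB = ucb), 5)
```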
All_Sites
A logical variable used when subpops is not NULL. If All_Sites is TRUE, then
alongside the subpopulation output, output for all sites (ignoring
subpopulations) is returned for each variable in vars_cat and vars_cont. If
All_Sites is FALSE, only the subpopulation output is returned. The default is
FALSE.
List of change estimates composed of four items: (1) catsum contains change
estimates for categorical variables, (2) contsum_mean contains estimates for
continuous variables using the mean, (3) contsum_total contains estimates for
continuous variables using the total, and (4) contsum_median contains
estimates for continuous variables using the median. An item is NULL if the
corresponding estimates were not calculated. Each data frame includes
estimates for all combinations of population types, subpopulations within
types, response variables, and categories within each response variable (for
categorical variables and continuous variables using the median). Change
estimates are provided along with standard error estimates and confidence
interval estimates.
The catsum data frame contains the following variables:
first survey name
second survey name
subpopulation (domain) name
subpopulation name within a domain
response variable
category of response variable
proportion difference estimate (in %; second survey - first survey)
standard error of proportion difference estimate
margin of error of proportion difference estimate
xx% (default 95%) lower confidence bound of proportion difference estimate
xx% (default 95%) upper confidence bound of proportion difference estimate
total difference estimate (second survey - first survey)
standard error of total difference estimate
margin of error of total difference estimate
xx% (default 95%) lower confidence bound of total difference estimate
xx% (default 95%) upper confidence bound of total difference estimate
sample size in the first survey
proportion estimate (in %) from the first survey
standard error of proportion estimate from the first survey
margin of error of proportion estimate from the first survey
xx% (default 95%) lower confidence bound of proportion estimate from the first survey
xx% (default 95%) upper confidence bound of proportion estimate from the first survey
total estimate from the first survey
standard error of total estimate from the first survey
margin of error of total estimate from the first survey
xx% (default 95%) lower confidence bound of total estimate from the first survey
xx% (default 95%) upper confidence bound of total estimate from the first survey
sample size in the second survey
proportion estimate (in %) from the second survey
standard error of proportion estimate from the second survey
margin of error of proportion estimate from the second survey
xx% (default 95%) lower confidence bound of proportion estimate from the second survey
xx% (default 95%) upper confidence bound of proportion estimate from the second survey
total estimate from the second survey
standard error of total estimate from the second survey
margin of error of total estimate from the second survey
xx% (default 95%) lower confidence bound of total estimate from the second survey
xx% (default 95%) upper confidence bound of total estimate from the second survey
The contsum_mean data frame contains the following variables:
first survey name
second survey name
subpopulation (domain) name
subpopulation name within a domain
response variable
value of percentile
sample size at or below Value
mean difference estimate
standard error of mean difference estimate
margin of error of mean difference estimate
xx% (default 95%) lower confidence bound of mean difference estimate
xx% (default 95%) upper confidence bound of mean difference estimate
sample size in the first survey
mean estimate from the first survey
standard error of mean estimate from the first survey
margin of error of mean estimate from the first survey
xx% (default 95%) lower confidence bound of mean estimate from the first survey
xx% (default 95%) upper confidence bound of mean estimate from the first survey
sample size in the second survey
mean estimate from the second survey
standard error of mean estimate from the second survey
margin of error of mean estimate from the second survey
xx% (default 95%) lower confidence bound of mean estimate from the second survey
xx% (default 95%) upper confidence bound of mean estimate from the second survey
The contsum_total data frame contains the following variables:
first survey name
second survey name
subpopulation (domain) name
subpopulation name within a domain
response variable
value of percentile
sample size at or below Value
total difference estimate
standard error of total difference estimate
margin of error of total difference estimate
xx% (default 95%) lower confidence bound of total difference estimate
xx% (default 95%) upper confidence bound of total difference estimate
sample size in the first survey
total estimate from the first survey
standard error of total estimate from the first survey
margin of error of total estimate from the first survey
xx% (default 95%) lower confidence bound of total estimate from the first survey
xx% (default 95%) upper confidence bound of total estimate from the first survey
sample size in the second survey
total estimate from the second survey
standard error of total estimate from the second survey
margin of error of total estimate from the second survey
xx% (default 95%) lower confidence bound of total estimate from the second survey
xx% (default 95%) upper confidence bound of total estimate from the second survey
The contsum_median data frame contains the following variables:
first survey name
second survey name
subpopulation (domain) name
subpopulation name within a domain
response variable
category of response variable
proportion above or below median difference estimate (in %; second survey - first survey)
standard error of proportion above or below median difference estimate
margin of error of proportion above or below median difference estimate
xx% (default 95%) lower confidence bound of proportion above or below median difference estimate
xx% (default 95%) upper confidence bound of proportion above or below median difference estimate
total above or below median difference estimate (second survey - first survey)
standard error of total above or below median difference estimate
margin of error of total above or below median difference estimate
xx% (default 95%) lower confidence bound of total above or below median difference estimate
xx% (default 95%) upper confidence bound of total above or below median difference estimate
sample size in the first survey
proportion above or below median estimate (in %) from the first survey
standard error of proportion above or below median estimate from the first survey
margin of error of proportion above or below median estimate from the first survey
xx% (default 95%) lower confidence bound of proportion above or below median estimate from the first survey
xx% (default 95%) upper confidence bound of proportion above or below median estimate from the first survey
total above or below median estimate from the first survey
standard error of total above or below median estimate from the first survey
margin of error of total above or below median estimate from the first survey
xx% (default 95%) lower confidence bound of total above or below median estimate from the first survey
xx% (default 95%) upper confidence bound of total above or below median estimate from the first survey
sample size in the second survey
proportion above or below median estimate (in %) from the second survey
standard error of proportion above or below median estimate from the second survey
margin of error of proportion above or below median estimate from the second survey
xx% (default 95%) lower confidence bound of proportion above or below median estimate from the second survey
xx% (default 95%) upper confidence bound of proportion above or below median estimate from the second survey
total above or below median estimate from the second survey
standard error of total above or below median estimate from the second survey
margin of error of total above or below median estimate from the second survey
xx% (default 95%) lower confidence bound of total above or below median estimate from the second survey
xx% (default 95%) upper confidence bound of total above or below median estimate from the second survey
trend_analysis for trend analysis
# Categorical variable example for three resource classes
dframe <- data.frame(
  surveyID = rep(c("Survey 1", "Survey 2"), c(100, 100)),
  siteID = paste0("Site", 1:200),
  wgt = runif(200, 10, 100),
  xcoord = runif(200),
  ycoord = runif(200),
  stratum = rep(rep(c("Stratum 1", "Stratum 2"), c(2, 2)), 50),
  CatVar = rep(c("North", "South"), 100),
  All_Sites = rep("All Sites", 200),
  Resource_Class = sample(c("Good", "Fair", "Poor"), 200, replace = TRUE)
)
myvars <- c("CatVar")
mysubpops <- c("All_Sites", "Resource_Class")
change_analysis(dframe,
  vars_cat = myvars, subpops = mysubpops,
  surveyID = "surveyID", siteID = "siteID", weight = "wgt",
  xcoord = "xcoord", ycoord = "ycoord", stratumID = "stratum"
)
#> $catsum
#> Survey_1 Survey_2 Type Subpopulation Indicator Category DiffEst.P
#> 1 Survey 1 Survey 2 All_Sites All Sites CatVar North -1.474844
#> 2 Survey 1 Survey 2 All_Sites All Sites CatVar South 1.474844
#> 3 Survey 1 Survey 2 Resource_Class Fair CatVar North -8.424944
#> 4 Survey 1 Survey 2 Resource_Class Fair CatVar South 8.424944
#> 5 Survey 1 Survey 2 Resource_Class Good CatVar North -2.700343
#> 6 Survey 1 Survey 2 Resource_Class Good CatVar South 2.700343
#> 7 Survey 1 Survey 2 Resource_Class Poor CatVar North 9.876690
#> 8 Survey 1 Survey 2 Resource_Class Poor CatVar South -9.876690
#> StdError.P MarginofError.P LCB95Pct.P UCB95Pct.P DiffEst.U StdError.U
#> 1 6.588741 12.91369 -14.38854 11.43885 -276.59196 399.9584
#> 2 6.588741 12.91369 -11.43885 14.38854 -110.79795 405.3858
#> 3 11.629073 22.79256 -31.21751 14.36762 -135.23095 224.1637
#> 4 11.629073 22.79256 -14.36762 31.21751 187.97610 248.9814
#> 5 11.535577 22.60932 -25.30966 19.90897 -18.47734 242.7810
#> 6 11.535577 22.60932 -19.90897 25.30966 91.80035 238.9779
#> 7 10.718029 21.00695 -11.13026 30.88364 -122.88367 233.7347
#> 8 10.718029 21.00695 -30.88364 11.13026 -390.57440 200.6060
#> MarginofError.U LCB95Pct.U UCB95Pct.U nResp_1 Estimate.P_1 StdError.P_1
#> 1 783.9040 -1060.4959 507.312026 50 50.85132 4.735320
#> 2 794.5416 -905.3396 683.743675 50 49.14868 4.735320
#> 3 439.3527 -574.5837 304.121778 18 52.86449 8.718695
#> 4 487.9946 -300.0185 675.970720 14 47.13551 8.718695
#> 5 475.8419 -494.3193 457.364609 14 46.31958 8.760600
#> 6 468.3881 -376.5877 560.188433 18 53.68042 8.760600
#> 7 458.1116 -580.9953 335.227975 18 53.15233 7.543663
#> 8 393.1805 -783.7549 2.606108 18 46.84767 7.543663
#> MarginofError.P_1 LCB95Pct.P_1 UCB95Pct.P_1 Estimate.U_1 StdError.U_1
#> 1 9.281057 41.57026 60.13238 2941.4984 306.2000
#> 2 9.281057 39.86762 58.42974 2843.0090 286.4790
#> 3 17.088327 35.77616 69.95282 995.6200 167.0792
#> 4 17.088327 30.04718 64.22384 887.7237 180.9941
#> 5 17.170460 29.14912 63.49004 865.5563 186.1827
#> 6 17.170460 36.50996 70.85088 1003.1055 165.3294
#> 7 14.785308 38.36702 67.93764 1080.3222 185.9011
#> 8 14.785308 32.06236 61.63298 952.1798 147.9496
#> MarginofError.U_1 LCB95Pct.U_1 UCB95Pct.U_1 nResp_2 Estimate.P_2 StdError.P_2
#> 1 600.1409 2341.3575 3541.639 50 49.37648 4.581293
#> 2 561.4884 2281.5206 3404.497 50 50.62352 4.581293
#> 3 327.4692 668.1508 1323.089 14 44.43954 7.695434
#> 4 354.7419 532.9818 1242.466 18 55.56046 7.695434
#> 5 364.9115 500.6448 1230.468 17 43.61924 7.504761
#> 6 324.0397 679.0658 1327.145 22 56.38076 7.504761
#> 7 364.3594 715.9628 1444.682 19 63.02902 7.613757
#> 8 289.9759 662.2039 1242.156 10 36.97098 7.613757
#> MarginofError.P_2 LCB95Pct.P_2 UCB95Pct.P_2 Estimate.U_2 StdError.U_2
#> 1 8.979169 40.39731 58.35565 2664.9065 257.3097
#> 2 8.979169 41.64435 59.60269 2732.2111 286.8231
#> 3 15.082774 29.35677 59.52232 860.3891 149.4453
#> 4 15.082774 40.47768 70.64323 1075.6998 170.9763
#> 5 14.709060 28.91018 58.32830 847.0789 155.8159
#> 6 14.709060 41.67170 71.08982 1094.9058 172.5590
#> 7 14.922689 48.10633 77.95171 957.4385 141.6782
#> 8 14.922689 22.04829 51.89367 561.6054 135.4757
#> MarginofError.U_2 LCB95Pct.U_2 UCB95Pct.U_2
#> 1 504.3177 2160.5888 3169.224
#> 2 562.1629 2170.0482 3294.374
#> 3 292.9074 567.4816 1153.296
#> 4 335.1073 740.5925 1410.807
#> 5 305.3935 541.6855 1152.472
#> 6 338.2095 756.6963 1433.115
#> 7 277.6842 679.7543 1235.123
#> 8 265.5276 296.0779 827.133
#>
#> $contsum_mean
#> NULL
#>
#> $contsum_total
#> NULL
#>
#> $contsum_median
#> NULL
#>
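The example above covers only vars_cat, so contsum_mean, contsum_total, and contsum_median are all NULL. A continuous-variable call might look like the sketch below. It is an assumption-laden illustration: the variable name ContVar and the simulated values are made up, and only arguments that appear in the usage section are passed. The call is guarded so the data setup still runs when the spsurvey package is not installed.

```r
set.seed(1) # simulated data only; any seed works

dframe <- data.frame(
  surveyID = rep(c("Survey 1", "Survey 2"), c(100, 100)),
  siteID = paste0("Site", 1:200),
  wgt = runif(200, 10, 100),
  xcoord = runif(200),
  ycoord = runif(200),
  stratum = rep(rep(c("Stratum 1", "Stratum 2"), c(2, 2)), 50),
  ContVar = rnorm(200, 10, 1), # hypothetical continuous response
  All_Sites = rep("All Sites", 200)
)

# Run the analysis only if spsurvey is available
if (requireNamespace("spsurvey", quietly = TRUE)) {
  result <- spsurvey::change_analysis(dframe,
    vars_cont = "ContVar", test = "mean", subpops = "All_Sites",
    surveyID = "surveyID", siteID = "siteID", weight = "wgt",
    xcoord = "xcoord", ycoord = "ycoord", stratumID = "stratum"
  )
  # With test = "mean", the estimates are expected in result$contsum_mean
  names(result)
}
```

Passing test = c("mean", "total") would request both location measures in a single call, as described for the test argument.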