Utility to load datasets from AWS DMAP Data Commons into memory
Source: R/dataload_from_aws.R
dataload_from_aws.Rd
Usage
dataload_from_aws(
varnames = .arrow_ds_names[1:3],
ext = c(".arrow", ".rda")[2],
fun = c("arrow::read_ipc_file", "load")[2],
envir = globalenv(),
mybucket = "dmap-data-commons-oa",
mybucketfolder = "EJAM",
folder_local_source = "./data/",
justchecking = FALSE,
check_server_even_if_justchecking = TRUE,
testing = FALSE
)
Arguments
- varnames
character vector of the quoted names of the data objects, like "blockwts" or "quaddata"
- ext
file extension, like ".arrow" or ".rda"
- fun
name of the function to use when reading the file, like "arrow::read_ipc_file" or "load"
- envir
e.g., globalenv() or parent.frame()
- mybucket
name of the AWS bucket (see the default value under Usage)
- mybucketfolder
folder within the AWS bucket, like EJAM
- folder_local_source
path of folder (not ending in forward slash) to look in for locally saved copies during development to avoid waiting for download from a server.
- justchecking
set to TRUE to get object size (and confirm file is accessible/exists)
- check_server_even_if_justchecking
set this to FALSE to skip checking the server for the files when justchecking = TRUE. The server is always checked if justchecking = FALSE.
- testing
only for testing
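How the ext, fun, and envir arguments fit together can be sketched in base R. This is a minimal illustration, not the EJAM implementation: resolve_reader() is a hypothetical helper, and the way dataload_from_aws() actually turns the fun string into a function call is an assumption here. Note that load() places objects directly into envir, whereas a reader such as arrow::read_ipc_file returns the table, so its result would have to be assigned into envir separately.

```r
# Hypothetical helper (not part of EJAM): turn a string such as "load" or
# "pkg::name" into the function it names.
resolve_reader <- function(fun) {
  parts <- strsplit(fun, "::", fixed = TRUE)[[1]]
  if (length(parts) == 2) {
    # "pkg::name" form: look up the exported function in that namespace
    getExportedValue(parts[1], parts[2])
  } else {
    # plain name such as "load"
    match.fun(fun)
  }
}

# Save a toy object as .rda, then read it back into a chosen environment
tmp <- file.path(tempdir(), "toy.rda")
toy <- data.frame(id = 1:3)
save(toy, file = tmp)

target <- new.env()
reader <- resolve_reader("load")
reader(tmp, envir = target)   # load() creates `toy` inside `target`
exists("toy", envir = target, inherits = FALSE)
```

Loading into a dedicated environment rather than globalenv() keeps the example from overwriting anything in the user's workspace.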
Details
See source code for details.
Tries dataload_from_local() first (at least during development) to avoid slow downloads.
Also see https://shiny.posit.co/r/articles/improve/scoping/
These files are public-facing – no credentials required.
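The local-first behavior described above can be sketched as follows. This is an assumption-laden illustration, not the code of dataload_from_local() or dataload_from_aws(): load_local_or_remote() and stub_download() are hypothetical names, and the stub stands in for a real download so the example runs offline.

```r
# Hypothetical helper: use a locally saved .rda copy if one exists,
# otherwise fall back to a caller-supplied download function.
load_local_or_remote <- function(varname, folder_local, fetch_remote,
                                 ext = ".rda", envir = globalenv()) {
  local_path <- file.path(folder_local, paste0(varname, ext))
  if (file.exists(local_path)) {
    load(local_path, envir = envir)
    return("local")
  }
  fetch_remote(varname, envir)
  "remote"
}

# Set up a demo folder holding one locally saved object
demo_dir <- file.path(tempdir(), "local_copies")
dir.create(demo_dir, showWarnings = FALSE)
bgid2fips_demo <- data.frame(bgid = 1:2)
save(bgid2fips_demo, file = file.path(demo_dir, "bgid2fips_demo.rda"))

# Stub standing in for a server download (assumption: no network access here)
stub_download <- function(varname, envir) {
  assign(varname, "downloaded placeholder", envir = envir)
}

target <- new.env()
source_a <- load_local_or_remote("bgid2fips_demo", demo_dir, stub_download, envir = target)
source_b <- load_local_or_remote("not_saved_locally", demo_dir, stub_download, envir = target)
c(source_a, source_b)
```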
Use EJAM:::dataload_from_aws(justchecking=TRUE)
or EJAM:::datapack("EJAM") to get info
or tables()
or object.size(quaddata)
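As a complement to object.size(quaddata), a small base R sketch can report which of several named objects are loaded and how much memory each takes. report_sizes() is a hypothetical helper, not an EJAM function, and the demo object is made up.

```r
# Hypothetical helper: for each name, report whether it exists in `envir`
# and, if so, its in-memory size in MB.
report_sizes <- function(varnames, envir = globalenv()) {
  found <- vapply(varnames, exists, logical(1), envir = envir, inherits = FALSE)
  mb <- rep(NA_real_, length(varnames))
  mb[found] <- vapply(
    varnames[found],
    function(v) as.numeric(object.size(get(v, envir = envir))) / 1024^2,
    numeric(1)
  )
  data.frame(object = varnames, loaded = unname(found), MB = round(mb, 3),
             row.names = NULL)
}

# Demo with a toy environment containing one made-up object
demo_env <- new.env()
assign("quaddata_demo", data.frame(x = runif(1000)), envir = demo_env)
sizes <- report_sizes(c("quaddata_demo", "blockwts_demo"), envir = demo_env)
sizes
```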
blockid2fips was used only in state_from_blockid(), which testpoints_n() no longer uses, so it is not loaded unless and until needed. This avoids loading the huge "blockid2fips" file (100MB) and instead uses "bgid2fips" (3MB) as needed, which is only about 3% as large in memory. blockid2fips took roughly 600 MB in RAM because it stores 8 million block FIPS codes as text.
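Why millions of FIPS codes stored as text take so much RAM can be checked with a rough base R estimate. The sketch below uses made-up 15-character strings (not real FIPS codes), measures a smaller vector, and scales the per-element cost to 8 million elements; the exact figures depend on the R build and platform, so no output is shown.

```r
n <- 100000L

# Made-up, unique 15-character codes standing in for block FIPS
fake_fips <- sprintf("%015d", seq_len(n))

# Size of the same number of elements stored as text vs. as plain numbers
bytes_chr <- as.numeric(object.size(fake_fips))
bytes_num <- as.numeric(object.size(as.numeric(seq_len(n))))

# Scale the per-element cost up to 8 million elements, in MB
est_mb_chr <- bytes_chr / n * 8e6 / 1024^2
est_mb_num <- bytes_num / n * 8e6 / 1024^2
round(c(text = est_mb_chr, numeric = est_mb_num))
```

Each unique string carries its own per-object overhead in R, which is why the text version comes out several times larger than a numeric vector of the same length.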
Files may include the following:
frs (150 MB .arrow file, approx 700 MB RAM)
frs_by_programid (approx 500 MB RAM)
frs_by_sic (approx 63 MB RAM)
frs_by_naics (approx 60 MB RAM)
frs_by_mact
quaddata (168 MB on disk, 229 MB RAM)
blockid2fips (20 MB on disk, 621 MB RAM!) No longer needed.
blockpoints (86 MB on disk, 164 MB RAM)
blockwts (31 MB on disk, 196 MB RAM)
bgej (123 MB RAM)
bgid2fips (18 MB RAM)
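For a rough sense of the total footprint, the approximate RAM figures listed above can be added up. This sketch simply copies those numbers; it leaves out blockid2fips (no longer needed) and frs_by_mact (no size listed), and whether all of these are ever loaded at once is an assumption.

```r
# Approximate RAM use in MB, copied from the list above
ram_mb <- c(
  frs = 700,
  frs_by_programid = 500,
  frs_by_sic = 63,
  frs_by_naics = 60,
  quaddata = 229,
  blockpoints = 164,
  blockwts = 196,
  bgej = 123,
  bgid2fips = 18
)

total_mb <- sum(ram_mb)
total_mb                         # 2053
round(total_mb / 1024, 1)        # about 2 GB if everything were loaded together

# Largest objects first
sort(ram_mb, decreasing = TRUE)
```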