Sample images at points where we have field-collected data, creating a data table for modeling.
Usage
sample(
site,
pattern = "{*}",
n = NULL,
p = NULL,
d = NULL,
classes = NULL,
balance = TRUE,
balance_excl = c(7, 33),
result = NULL,
transects = NULL,
drop_corr = NULL,
reuse = FALSE,
resources = NULL,
local = FALSE,
trap = TRUE,
comment = NULL
)
Arguments
- site
One or more site names, using the 3-letter abbreviation. Use all to process all sites. In batch mode, each named site will be run in a separate job.
- pattern
File names, portable names, regex matching either, or search names selecting files to sample. See Image naming in README for details. The default is {*}, which will include all variables.
- n
Number of total samples to return.
- p
Proportion of total samples to return. Use p = 1 to sample all.
- d
Mean distance in cells between samples. No minimum spacing is guaranteed.
- classes
Class or vector of classes in transects to sample. Default is all classes.
- balance
If TRUE, balance number of samples for each class. Points will be randomly selected to match the sparsest class.
- balance_excl
Vector of classes to exclude when determining sample size when balancing. Include classes with low samples we don't care much about.
- result
Name of result file. If not specified, file will be constructed from site, number of X vars, and strategy.
- transects
Name of transects file; default is transects.
- drop_corr
Drop one of any pair of variables with correlation more than drop_corr.
- reuse
Reuse the named file (ending in _all.txt) from previous run, rather than resampling. Saves a whole lot of time if you're changing n, p, d, balance, balance_excl, or drop_corr.
- resources
Slurm launch resources. See launch. These take priority over the function's defaults.
- local
If TRUE, run locally; otherwise, spawn a batch run on Unity.
- trap
If TRUE, trap errors in local mode; if FALSE, use normal R error handling. Use this for debugging. If you get unrecovered errors, the job won't be added to the jobs database. Has no effect if local = FALSE.
- comment
Optional slurmcollie comment
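The effect of drop_corr can be pictured with a small base-R sketch. This is not the package's implementation: the helper name drop_correlated, the greedy left-to-right order, the choice of which variable in a pair to drop, and the use of absolute Pearson correlation are all assumptions made for illustration.

```r
# Hypothetical helper (not part of the package): walk the columns left to
# right and drop any later column whose absolute correlation with a kept
# column exceeds `threshold`.
drop_correlated <- function(x, threshold) {
  r <- abs(cor(x))
  keep <- rep(TRUE, ncol(x))
  for (i in seq_len(ncol(x))) {
    if (!keep[i]) next
    for (j in seq_len(ncol(x))) {
      if (j > i && keep[j] && r[i, j] > threshold) keep[j] <- FALSE
    }
  }
  x[, keep, drop = FALSE]
}

# Made-up data: b is a linear function of a, c is only weakly related to a
demo <- data.frame(
  a = 1:10,
  b = 2 * (1:10) + 1,
  c = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)
)

kept <- names(drop_correlated(demo, threshold = 0.9))
print(kept)
```

Here b is dropped because it is perfectly correlated with a, while c survives.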
Details
There are three mutually exclusive sampling strategies (n, p, and d); you must choose exactly one. n samples the total number of points provided. p samples the given proportion of total points (after balancing, if balance is selected). d samples points at a mean distance of d cells; no minimum spacing is guaranteed.
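As a rough illustration of the "exactly one strategy" rule and of applying p after balancing, here is a minimal base-R sketch. The helpers pick_strategy and balance_rows are hypothetical and are not the package's code. How classes listed in balance_excl are treated once the target size is known is also an assumption: here they are thinned like any other class if they exceed the target, and otherwise kept whole.

```r
# Hypothetical helper: require exactly one of n, p, d and name the strategy.
pick_strategy <- function(n = NULL, p = NULL, d = NULL) {
  supplied <- c(n = !is.null(n), p = !is.null(p), d = !is.null(d))
  if (sum(supplied) != 1) {
    stop("choose exactly one of n, p, or d")
  }
  names(supplied)[supplied]
}

# Hypothetical helper: thin every class to the size of the sparsest class,
# ignoring classes listed in `excl` when finding that size.
balance_rows <- function(classes, excl = NULL) {
  counts <- table(classes)
  target <- min(counts[!names(counts) %in% as.character(excl)])
  by_class <- split(seq_along(classes), classes)
  picked <- lapply(by_class, function(i) {
    # sample.int sidesteps base sample()'s special case for length-1 input
    if (length(i) > target) i[sample.int(length(i), target)] else i
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(42)
classes <- rep(c(1, 2, 7), times = c(50, 20, 4))  # made-up class codes
strategy <- pick_strategy(p = 0.5)

rows <- balance_rows(classes, excl = 7)           # 20 + 20 + 4 rows remain
n_final <- round(0.5 * length(rows))              # p applied after balancing
final <- rows[sample.int(length(rows), n_final)]
```

Class 7 is ignored when finding the target size, so classes 1 and 2 are balanced to 20 rows each rather than being cut down to 4.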
Portable names are used for variable names in the resulting data files. Dashes from modifications are changed to underscores to avoid causing trouble.
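The dash-to-underscore substitution can be sketched in one line of base R. The portable names below are invented for illustration, and the package may perform the replacement differently.

```r
# Made-up portable names; the part after a dash stands in for a modification
portable <- c("ortho_mica-NDVI", "dem_lidar-slope-mean", "ortho_swir")

# Replace every literal dash with an underscore
clean <- gsub("-", "_", portable, fixed = TRUE)
print(clean)
```

One kind of trouble this avoids: in an R model formula, an unquoted dash in a column name would be parsed as the minus operator.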
Results are saved in four files, plus a metadata file:
- _all.txt - A text version of the full dataset (selected by pattern but not subset by n, p, d, balance, or drop_corr). Readable by any software.
- _all.RDS - An RDS version of the full dataset; far faster to read than a text file in R (1.1 s vs. 14.4 s in one example).
- .txt - A text version of the final selected and subset dataset.
- .RDS - An RDS version of the final dataset.
- _vars.txt - Lists the portable names used for variables in the sample alongside the file names on disk. This disambiguates when there are duplicate portable names in a flights directory.
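For readers unfamiliar with the two formats, the following sketch writes a toy table both ways and reads it back using only base R. The result name demo, the column names, the tab delimiter, and the assumption that the suffixes above are appended to the result name are all illustrative guesses rather than documented behavior.

```r
# Work in a throwaway directory so nothing real is overwritten
out_dir <- tempfile("sample_demo_")
dir.create(out_dir)

result <- "demo"  # hypothetical result name
full <- data.frame(subclass = c(1, 2, 2), ortho_red = c(0.12, 0.30, 0.28))

txt_file <- file.path(out_dir, paste0(result, "_all.txt"))
rds_file <- file.path(out_dir, paste0(result, "_all.RDS"))

# Text version: readable by any software (tab-delimited is an assumption)
write.table(full, txt_file, sep = "\t", quote = FALSE, row.names = FALSE)

# RDS version: R's native serialization, which preserves column types
saveRDS(full, rds_file)

from_txt <- read.table(txt_file, sep = "\t", header = TRUE)
from_rds <- readRDS(rds_file)
```

The RDS copy comes back as the same object that was saved, whereas the text copy has to be re-parsed and its column types re-inferred on every read.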
Memory requirements: I've measured up to 28.5 GB.