Given one or more sites and a model specification, builds a model of vegetation cover and reports a model assessment.

Usage

fit(
  site = NULL,
  datafile = "data",
  name = "",
  method = "rf",
  vars = "{*}",
  exclude_vars = "",
  exclude_classes = NULL,
  reclass = c(13, 2),
  max_samples = NULL,
  years = NULL,
  minscore = 0,
  maxmissing = 20,
  max_miss_train = 0.2,
  top_importance = 20,
  holdout = 0.2,
  auc = FALSE,
  hyper = NULL,
  resources = NULL,
  local = FALSE,
  trap = TRUE,
  comment = NULL
)

Arguments

site

Three-letter site code, or a vector of site names if fitting multiple sites

datafile

Name of the data file. The file must be an .RDS file; give the name without the extension. If fitting multiple sites, either use a single datafile name shared among sites, or a vector matching site.

name

Optional model name

method

One of rf (Random Forest) or boost (AdaBoost). Default = rf.

vars

Vector of variables to restrict analysis to. Default = {*}, all variables. vars is processed by find_orthos, and may include file names, portable names, search names and regular expressions of file and portable names.

exclude_vars

An optional vector of variables to exclude. As with vars, variables are processed by find_orthos.

exclude_classes

Numeric vector of subclasses to exclude

reclass

Vector of paired classes to reclassify, given as (from, to) pairs. For example, reclass = c(13, 2, 3, 4) would reclassify all 13s to 2 and all 3s to 4, lumping each pair of classes.
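The pairing can be sketched in a few lines of R. This is only an illustration, not the package's code: apply_reclass is a hypothetical helper, and it assumes each pair is read as (from, to), as the default c(13, 2) suggests.

```r
# Hypothetical helper (not part of the package): apply paired
# (from, to) reclassification to a vector of class codes.
apply_reclass <- function(classes, reclass) {
  stopifnot(length(reclass) %% 2 == 0)              # pairs must be complete
  pairs <- matrix(reclass, ncol = 2, byrow = TRUE)  # one row per (from, to) pair
  original <- classes                               # match on original codes so pairs do not chain
  for (i in seq_len(nrow(pairs))) {
    classes[which(original == pairs[i, 1])] <- pairs[i, 2]
  }
  classes
}

subclass <- c(1, 2, 13, 3, 4, 13)
apply_reclass(subclass, c(13, 2))   # every 13 becomes 2; other codes are unchanged
```

Matching against the original codes means a later pair never re-maps the output of an earlier one; whether the package behaves this way is an assumption.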

max_samples

Maximum number of samples to use; the data are subsampled if necessary
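A minimal sketch of what such a cap might look like, assuming simple random subsampling without replacement (the package may sample differently, for example by class). subsample_rows is a hypothetical name.

```r
# Hypothetical helper: cap the number of training rows by random subsampling.
subsample_rows <- function(data, max_samples = NULL) {
  # NULL (the default) or a small enough data set: use everything
  if (is.null(max_samples) || nrow(data) <= max_samples) {
    return(data)
  }
  keep <- sample.int(nrow(data), max_samples)   # rows drawn without replacement
  data[keep, , drop = FALSE]
}

pts <- data.frame(id = 1:200, subclass = rep(1:4, 50))
nrow(subsample_rows(pts, max_samples = 50))
nrow(subsample_rows(pts))
```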

years

Vector of years to restrict variables to

minscore

Minimum score for orthos. Files with a minimum score of less than this are excluded from results. Default is 0, but rejected orthos are always excluded.

maxmissing

Maximum percent missing in orthos. Files with percent missing greater than this are excluded.

max_miss_train

Maximum proportion of missing training points allowed before a variable is dropped
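As an illustration of this threshold, the sketch below drops any predictor whose share of missing training values exceeds the limit. drop_sparse_vars and the column names are hypothetical, and treating a variable exactly at the limit as kept is an assumption.

```r
# Hypothetical helper: drop predictors with too many missing training values.
drop_sparse_vars <- function(train, max_miss_train = 0.2) {
  miss <- colMeans(is.na(train))                 # proportion of NA per column
  train[, miss <= max_miss_train, drop = FALSE]  # keep columns at or under the limit
}

train <- data.frame(
  ortho_a = c(0.1, 0.4, NA, 0.3, 0.2, 0.5, 0.6, 0.2, 0.1, 0.9),  # 10% missing
  ortho_b = c(NA, NA, 0.5, 0.7, NA, 0.1, NA, 0.3, 0.2, 0.8),     # 40% missing
  ortho_c = 1:10                                                  # none missing
)
names(drop_sparse_vars(train))   # ortho_b is dropped at the default of 0.2
```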

top_importance

Number of variables to keep for variable importance

holdout

Proportion of points to hold out. For Random Forest, this specifies the size of the single validation set, while for boosting, it is the size of each of the testing and validation sets.
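The difference between the two methods can be sketched as follows. split_points is a hypothetical helper that labels rows at random; it assumes the held-out sets are drawn by simple random sampling, which the package may not do.

```r
# Hypothetical helper: label each of n points as train, validate, or test.
split_points <- function(n, holdout = 0.2, method = "rf") {
  stopifnot(holdout > 0, holdout < 0.5)
  k <- round(n * holdout)                  # size of each held-out set
  idx <- sample.int(n)                     # random permutation of row indices
  part <- rep("train", n)
  part[idx[seq_len(k)]] <- "validate"      # both methods get a validation set
  if (method == "boost") {
    part[idx[k + seq_len(k)]] <- "test"    # boosting gets a second set of the same size
  }
  part
}

table(split_points(100, 0.2, "rf"))      # 80 train, 20 validate
table(split_points(100, 0.2, "boost"))   # 60 train, 20 validate, 20 test
```

With holdout = 0.2, boosting therefore trains on 60% of the points rather than 80%.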

auc

If TRUE, calculate class probabilities so that AUC can be computed

hyper

Hyperparameters. To be defined.

resources

Slurm launch resources. See launch. These take priority over the function's defaults.

local

If TRUE, run locally; otherwise, spawn a batch run on Unity

trap

If TRUE, trap errors in local mode; if FALSE, use normal R error handling. Set this to FALSE for debugging. If an error is not recovered, the job won't be added to the jobs database. Has no effect if local = FALSE.

comment

Optional launch / slurmcollie comment