Provide a model assessment. Normally called by fit
, but may be called separately for models applied to new sites.
Usage
assess(
fitid = NULL,
model = NULL,
newdata = NULL,
top_importance = 20,
summary = TRUE,
confusion = TRUE,
importance = TRUE
)
Arguments
- fitid
id of a model in the fits database. If using this, omit
model
, as this info will be extracted from the database.- model
Only when called by do_fit; named list of:
- fit
model fit oject
- confuse
Confusion matrix
- nvalidate
Number of cases in validation set
- id
Model id
- name
Model name
- newdata
An alternate validation set (e.g., from a different site). Variables must conform with the original dataset.
- top_importance
Number of variables to keep for variable importance
- summary
Print model summary info if TRUE
- confusion
Print the confusion matrix and complete statistics if TRUE, and skip if FALSE
- importance
Print variable importance if TRUE, and skip printing if FALSE
Value
Invisibly, a named list of
- confusion
Confusion matrix and complete statistics
- importance
Variable importance data frame
Details
Called by do_fit
, but also may be called by the user. Either provide fitid
for
the model you want to assess (the normal approach), or model
, a list with necessary
arguments (the approach used by do_fit
, becaue the model is not yet in the database).
When you call assess
from the console, the fits database is not updated with the new
assessment.
You may supply newdata
to assess a model on sites different from what the model
was built on. newdata
is a data frame that conforms to the data the model was
built on. (how exactly?)
Assessments are returned invisibly; by default, they are printed to the console.
Explanations
1. Model info
Model fit id and name, if supplied
Number of variables fit
Sample size for training and validation holdout set. The confusion matrix and all statistics are derived from the holdout set.
Correct classification rate, the percent of cases that were predicted correctly.
Kappa, a refined version of the CCR that takes the probability of chance agreement into account.
2. Confusion matrix
Shows which classification errors were made. Values falling on the diagonal were predicted correctly.
3. Overall statistics
Accuracy is the correct classification rate (also known as CCR), the percent of cases that fall on the diagonal in the confusion matrix.
The No Information Rate is the CCR you'd get if you always bet the majority class.
Kappa is a refined version of the CCR that takes the probability of chance agreement into account.
Mcnemar's test only applies to two-class data.
4. Statistics by class
Lists the follwing statistics for each of the subclasses. These all scale from 0 to 1, with 1 generally indicating higher performance (except for prevalence, detection rate, and detection prevalence).
Precision, the proportion of cases predicted to be in the class that actually were (true positives / (true positives + false positives))
Recall, the proportion of cases actually in the class that were predicted to be in the class (true positives / (true positives + false negatives))
F1, the harmonic mean of precision and recall; a combined metric of model performance
Prevalence, the proportion of all cases that are in this class
Detection Rate, the proportion of all cases that are correctly predicted to be in this class
Detection Prevalence, the proportion of all cases predicted to be in this class
Balanced Accuracy, mean of true positive rate and true negative rate; a combined metric of model performance
AUC (Area Under the Curve) is the probability that the model, for a particular class, when given a random case in the class and a random case from another class, will rate the case in the class higher. Unlike the other statistics, AUC is independent of the particluar cutpoint chosen, and is telling us about the performance of the probabilities produced by the model.
5. Variable importance
Scaled from 0 to 100, gives the relative contribution of each variable to the model fit. Less-important variables will be trimmed based on the top_importance option. Note that variables are imagery bands, not an entire orthoimage; thus, for instance, an RGB true color image represents three varaibles, any of which may come into the model separately.