Random forest model with randomForest

Usage

pipe_randomForest(
  df,
  predInput = NULL,
  responseVar = 1,
  caseClass = NULL,
  idVars = character(),
  weight = "class",
  crossValStrategy = c("Kfold", "bootstrap"),
  k = 5,
  replicates = 10,
  crossValRatio = c(train = 0.6, test = 0.2, validate = 0.2),
  ntree = 500,
  importance = TRUE,
  shap = TRUE,
  aggregate_shap = TRUE,
  repVi = 5,
  summarizePred = TRUE,
  scaleDataset = FALSE,
  RFmodel = FALSE,
  DALEXexplainer = FALSE,
  variableResponse = FALSE,
  save_validateset = FALSE,
  filenameRasterPred = NULL,
  tempdirRaster = NULL,
  nCoresRaster = parallel::detectCores()%/%2,
  verbose = 0,
  ...
)

Arguments

df

a data.frame with the data.

predInput

a data.frame or a Raster with the input variables for the model as columns or layers. The column or layer names must match the column names of df.

responseVar

response variable as column name or index on df.

caseClass

class of the samples used to weight cases. Column names or indexes on df, or a vector with the class for each row in df.

idVars

id column names or indexes on df. These columns will not be used for training.

weight

Optional array of the same length as nrow(df), containing weights to apply to the model's loss for each sample.

crossValStrategy

Kfold or bootstrap.

k

number of data partitions when crossValStrategy="Kfold".
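
As an illustration of what partitioning the data into k folds means, here is a minimal base R sketch. The toy data and the names fold and splits are hypothetical, and the package's internal partitioning may differ.

```r
# Hypothetical toy data; not part of the package.
set.seed(42)
df <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
k <- 5

# Assign each row to one of k folds at random, keeping fold sizes balanced.
fold <- sample(rep(seq_len(k), length.out = nrow(df)))

# Each fold is held out once; the remaining rows are available for fitting.
splits <- lapply(seq_len(k), function(i) {
  list(fit = df[fold != i, ], holdout = df[fold == i, ])
})
```

With 20 rows and k = 5, every fold holds 4 rows and each fitting set holds the other 16.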

replicates

number of replicates for crossValStrategy="bootstrap" and crossValStrategy="Kfold" (replicates * k-1, 1 fold for validation).

crossValRatio

Proportion of the dataset used to train, test and validate the model when crossValStrategy="bootstrap". Defaults to c(train=0.6, test=0.2, validate=0.2). If only one value is given, it is taken as the train proportion and the test set is used for validation.
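
A rough base R sketch of splitting a dataset by these proportions is shown below. It only illustrates the idea: the toy data and the names trainSet, testSet and validateSet are assumptions, and the package may sample and round differently.

```r
# Hypothetical toy data; not part of the package.
set.seed(1)
df <- data.frame(y = rnorm(50), x = rnorm(50))
crossValRatio <- c(train = 0.6, test = 0.2, validate = 0.2)

n <- nrow(df)
idx <- sample(n)  # random permutation of the row indexes
nTrain <- round(crossValRatio[["train"]] * n)
nTest <- round(crossValRatio[["test"]] * n)

# Consecutive, non-overlapping slices of the permutation.
trainSet <- df[idx[seq_len(nTrain)], ]
testSet <- df[idx[nTrain + seq_len(nTest)], ]
validateSet <- df[idx[(nTrain + nTest + 1):n], ]
```

In a bootstrap-style strategy a split like this would be redrawn for each replicate, so every replicate sees a different random partition.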

ntree

Number of trees to grow.

importance

parameter for randomForest::randomForest() indicating if importance of predictors should be assessed.

shap

if TRUE, return the SHAP values as shapviz::shapviz() objects.

aggregate_shap

if TRUE, and shap is also TRUE, aggregate SHAP from all replicates.

repVi

replicates of the permutations to calculate the importance of the variables. 0 to avoid calculating variable importance.

summarizePred

if TRUE, return the mean, sd and se of the predictions across replicates. If FALSE, return the predictions for each replicate.
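
To make the three summary statistics concrete, the sketch below summarizes a hypothetical matrix of per-replicate predictions in base R. The matrix layout (cases in rows, replicates in columns) and the name predSummary are assumptions, not the package's actual return structure.

```r
# Hypothetical predictions: 4 cases (rows) x 10 replicates (columns).
set.seed(7)
pred <- matrix(rnorm(40, mean = 5), nrow = 4)

# Per-case summary across replicates.
predSd <- apply(pred, 1, sd)
predSummary <- data.frame(
  mean = rowMeans(pred),
  sd = predSd,
  se = predSd / sqrt(ncol(pred))  # standard error of the mean
)
```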

scaleDataset

if TRUE, scale the whole dataset only once instead of scaling the train set at each replicate. This reduces processing time for predictions with large rasters.

RFmodel

if TRUE, return the model with the result.

DALEXexplainer

if TRUE, return an explainer for the models from the DALEX::explain() function. It doesn't work with multisession future plans.

variableResponse

if TRUE, return an aggregated_profiles_explainer object from ingredients::partial_dependency() and the coefficients of the adjusted linear model.

save_validateset

if TRUE, save the validateset (independent data not used for training).

filenameRasterPred

if not missing, save the predictions in a RasterBrick to this file.

tempdirRaster

path to a directory to save temporary raster files.

nCoresRaster

number of cores used for parallelized raster processing. Uses half of the available cores by default.

verbose

If > 0, print progress messages. The value is also passed to the randomForest functions.

...

extra parameters for randomForest::randomForest(), future.apply::future_replicate() and ingredients::feature_importance().