eXtreme Gradient Boosted models
Usage
pipe_xgboost(
  df,
  predInput = NULL,
  responseVars = 1,
  caseClass = NULL,
  idVars = character(),
  weight = "class",
  crossValStrategy = c("Kfold", "bootstrap"),
  k = 5,
  replicates = 10,
  crossValRatio = c(train = 0.6, test = 0.2, validate = 0.2),
  params = list(),
  nrounds = 5,
  shap = TRUE,
  aggregate_shap = TRUE,
  repVi = 5,
  summarizePred = TRUE,
  scaleDataset = FALSE,
  XGBmodel = FALSE,
  DALEXexplainer = FALSE,
  variableResponse = FALSE,
  save_validateset = FALSE,
  baseFilenameXDG = NULL,
  filenameRasterPred = NULL,
  tempdirRaster = NULL,
  nCoresRaster = parallel::detectCores()%/%2,
  verbose = 0,
  ...
)
Arguments
- df
- a data.frame with the data.
- predInput
- a data.frame or a Raster with the input variables for the model as columns or layers. The column or layer names must match the column names of df.
- responseVars
- response variables as column names or indexes on df.
- caseClass
- class of the samples used to weight cases. Column names or indexes on df, or a vector with the class for each row in df.
- idVars
- id column names or indexes on df. These columns will not be used for training.
- weight
- Optional array of the same length as nrow(df), containing weights to apply to the model's loss for each sample.
- crossValStrategy
- Kfold or bootstrap.
- k
- number of data partitions when crossValStrategy="Kfold".
- replicates
- number of replicates for crossValStrategy="bootstrap" and crossValStrategy="Kfold" (replicates * k-1, 1 fold for validation).
- crossValRatio
- proportion of the dataset used to train, test and validate the model when crossValStrategy="bootstrap". Defaults to c(train=0.6, test=0.2, validate=0.2). If only one value is given, it is taken as the train proportion and the test set is also used for validation.
- params
- the list of parameters passed to xgboost::xgb.train(). The complete list of parameters is available in the online documentation.
- nrounds
- max number of boosting iterations. 
- shap
- if TRUE, return the SHAP values as shapviz::shapviz() objects.
- aggregate_shap
- if TRUE, and shap is also TRUE, aggregate the SHAP values from all replicates.
- repVi
- replicates of the permutations to calculate the importance of the variables. 0 to avoid calculating variable importance. 
- summarizePred
- if TRUE, return the mean, sd and se of the predictions. If FALSE, return the predictions for each replicate.
- scaleDataset
- if TRUE, scale the whole dataset only once instead of scaling the train set at each replicate. This reduces processing time for predictions with large rasters.
- XGBmodel
- if TRUE, return the model with the result.
- DALEXexplainer
- if TRUE, return an explainer for the models from the DALEX::explain() function. It doesn't work with multisession future plans.
- variableResponse
- if TRUE, return an aggregated_profiles_explainer object from ingredients::partial_dependency() and the coefficients of the adjusted linear model.
- save_validateset
- save the validateset (independent data not used for training). 
- baseFilenameXDG
- if not missing, save the NN in hdf5 format to this path with the iteration appended.
- filenameRasterPred
- if not missing, save the predictions in a RasterBrick to this file.
- tempdirRaster
- path to a directory to save temporary raster files.
- nCoresRaster
- number of cores used for parallelized raster computations. Uses half of the available cores by default.
- verbose
- if > 0, print the state. The higher the value, the more information is printed.
- ...
- extra parameters for xgboost::xgb.train(), future.apply::future_replicate() or ingredients::feature_importance().
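Examples
The following is a minimal, untested sketch of how the arguments above fit together. The toy data frame, its column names (y, x1, x2, x3, id) and the chosen parameter values are assumptions made for illustration only. The row partition shows what the default crossValRatio proportions mean for a dataset of 100 rows; it is not the function's internal code. The call to pipe_xgboost() runs only if the function is available in the session, and its arguments are taken from the Usage section.

```r
# Toy dataset (hypothetical): one numeric response, three predictors, one id column.
set.seed(42)
n <- 100
df <- data.frame(
  y  = rnorm(n),
  x1 = runif(n),
  x2 = runif(n),
  x3 = rnorm(n),
  id = seq_len(n)
)

# Illustration only: assign each row to train, test or validate following the
# default proportions. This is NOT how the function partitions data internally.
crossValRatio <- c(train = 0.6, test = 0.2, validate = 0.2)
sizes <- round(crossValRatio * n)
sizes["train"] <- n - sum(sizes[c("test", "validate")])  # absorb rounding in train
partition <- sample(rep(names(sizes), times = sizes))    # shuffled set labels
split_sizes <- table(partition)

# Run the pipeline only if pipe_xgboost() is loaded in the current session.
if (exists("pipe_xgboost", mode = "function")) {
  res <- pipe_xgboost(
    df,
    responseVars = "y",
    idVars = "id",                            # excluded from training
    crossValStrategy = "bootstrap",
    replicates = 3,
    crossValRatio = crossValRatio,
    params = list(max_depth = 3, eta = 0.1),  # standard xgboost parameters
    nrounds = 20,
    shap = FALSE,
    repVi = 0,                                # skip variable importance
    verbose = 0
  )
}
```

Setting shap = FALSE and repVi = 0 keeps the run short by skipping SHAP values and permutation importance; re-enable them once the basic call works on your data.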