eXtreme Gradient Boosted models
Usage
pipe_xgboost(
  df,
  predInput = NULL,
  responseVars = 1,
  caseClass = NULL,
  idVars = character(),
  weight = "class",
  crossValStrategy = c("Kfold", "bootstrap"),
  k = 5,
  replicates = 10,
  crossValRatio = c(train = 0.6, test = 0.2, validate = 0.2),
  params = list(),
  nrounds = 5,
  shap = TRUE,
  aggregate_shap = TRUE,
  repVi = 5,
  summarizePred = TRUE,
  scaleDataset = FALSE,
  XGBmodel = FALSE,
  DALEXexplainer = FALSE,
  variableResponse = FALSE,
  save_validateset = FALSE,
  baseFilenameXDG = NULL,
  filenameRasterPred = NULL,
  tempdirRaster = NULL,
  nCoresRaster = parallel::detectCores() %/% 2,
  verbose = 0,
  ...
)
Arguments
- df
a data.frame with the data.
- predInput
a data.frame or a Raster with the input variables for the model as columns or layers. The column or layer names must match the names of the df columns.
- responseVars
response variables as column names or indexes on df.
- caseClass
class of the samples used to weight cases. Column names or indexes on df, or a vector with the class for each row in df.
- idVars
id column names or indexes on df. These columns will not be used for training.
- weight
Optional array of the same length as nrow(df), containing weights to apply to the model's loss for each sample.
- crossValStrategy
Kfold or bootstrap.
- k
number of data partitions when crossValStrategy="Kfold".
- replicates
number of replicates for crossValStrategy="bootstrap" and crossValStrategy="Kfold" (replicates * k-1, 1 fold for validation).
- crossValRatio
proportion of the dataset used to train, test and validate the model when crossValStrategy="bootstrap". Defaults to c(train=0.6, test=0.2, validate=0.2). If only one value is given, it is taken as the train proportion and the test set is used for validation.
- params
the list of parameters to xgboost::xgb.train(). The complete list of parameters is available in the online documentation.
- nrounds
maximum number of boosting iterations.
- shap
if TRUE, return the SHAP values as shapviz::shapviz() objects.
- aggregate_shap
if TRUE, and shap is also TRUE, aggregate SHAP from all replicates.
- repVi
replicates of the permutations to calculate the importance of the variables. Set to 0 to skip calculating variable importance.
- summarizePred
if TRUE, return the mean, sd and se of the predictions. If FALSE, return the predictions for each replicate.
- scaleDataset
if TRUE, scale the whole dataset only once instead of the train set at each replicate. Optimizes processing time for predictions with large rasters.
- XGBmodel
if TRUE, return the model with the result.
- DALEXexplainer
if TRUE, return an explainer for the models from the DALEX::explain() function. It doesn't work with multisession future plans.
- variableResponse
if TRUE, return the aggregated_profiles_explainer object from ingredients::partial_dependency() and the coefficients of the adjusted linear model.
- save_validateset
save the validateset (independent data not used for training).
- baseFilenameXDG
if not missing, save the NN in hdf5 format on this path with the iteration appended.
- filenameRasterPred
if not missing, save the predictions in a RasterBrick to this file.
- tempdirRaster
path to a directory to save temporary raster files.
- nCoresRaster
number of cores used for parallelized raster operations. Uses half of the available cores by default.
- verbose
if > 0, print the state. The larger the value, the more information is printed.
- ...
extra parameters for xgboost::xgb.train(), future.apply::future_replicate() or ingredients::feature_importance().
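Examples
The sketch below illustrates two of the arguments described above: how a crossValRatio vector could translate into train, test and validate row partitions, and what a params list for xgboost::xgb.train() might look like. The split logic is a base R illustration written for this page, not the function's internal code. The iris data, the seed, the chosen xgboost parameters and the final call are all assumptions for demonstration; the call is only attempted if pipe_xgboost() is already available in the session, and its behaviour with these arguments has not been verified here.

```r
# Sketch only: base-R illustration of a train/test/validate split driven by
# crossValRatio. This is NOT the internal code of pipe_xgboost().
set.seed(42)

df <- iris[, 1:4]   # example data (assumption); Sepal.Length as the response
crossValRatio <- c(train = 0.6, test = 0.2, validate = 0.2)
n <- nrow(df)

# Partition sizes; the train set absorbs any rounding remainder.
sizes <- floor(crossValRatio * n)
sizes["train"] <- n - sum(sizes[c("test", "validate")])

# Shuffle the row indices and cut them into the three partitions.
idx <- sample.int(n)
groups <- factor(rep(names(sizes), times = sizes), levels = names(sizes))
split_idx <- split(idx, groups)
print(lengths(split_idx))

# Parameters forwarded to xgboost::xgb.train() through `params`
# (illustrative values, not recommended defaults).
params <- list(objective = "reg:squarederror", max_depth = 4, eta = 0.1)

# Hypothetical call, only attempted when pipe_xgboost() is already available
# in the session (i.e. the user has attached the package that provides it).
if (exists("pipe_xgboost", mode = "function")) {
  res <- pipe_xgboost(
    df = df,
    responseVars = "Sepal.Length",
    crossValStrategy = "bootstrap",
    replicates = 3,
    crossValRatio = crossValRatio,
    params = params,
    nrounds = 5,
    shap = FALSE,
    repVi = 0
  )
}
```

Setting shap = FALSE and repVi = 0 in the hypothetical call keeps the run short by skipping SHAP values and permutation importance; both can be switched back on once the basic pipeline works.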