Neural network model with keras

Usage

pipe_keras(
  df,
  predInput = NULL,
  responseVars = 1,
  caseClass = NULL,
  idVars = character(),
  weight = "class",
  crossValStrategy = c("Kfold", "bootstrap"),
  k = 5,
  replicates = 10,
  crossValRatio = c(train = 0.6, test = 0.2, validate = 0.2),
  hidden_shape = 50,
  epochs = 500,
  maskNA = NULL,
  batch_size = "all",
  shap = TRUE,
  aggregate_shap = TRUE,
  repVi = 5,
  summarizePred = TRUE,
  scaleDataset = FALSE,
  NNmodel = FALSE,
  DALEXexplainer = FALSE,
  variableResponse = FALSE,
  save_validateset = FALSE,
  baseFilenameNN = NULL,
  filenameRasterPred = NULL,
  tempdirRaster = NULL,
  nCoresRaster = parallel::detectCores()%/%2,
  verbose = 0,
  ...
)

Arguments

df: a data.frame with the data.
predInput: a data.frame or a Raster with the input variables for the model as columns or layers. The columns or layer names must match the names of df columns.
responseVars: response variables as column names or indexes on df.
caseClass: class of the samples used to weight cases. Column names or indexes on df, or a vector with the class for each row in df.
idVars: id column names or indexes on df. These columns are not used for training.
weight: Optional array of the same length as nrow(df), containing weights to apply to the model's loss for each sample.
crossValStrategy: Kfold or bootstrap.
k: number of data partitions when crossValStrategy="Kfold".
replicates: number of replicates for crossValStrategy="bootstrap" and crossValStrategy="Kfold" (replicates * k-1, 1 fold for validation).
crossValRatio: proportion of the dataset used to train, test and validate the model when crossValStrategy="bootstrap". Defaults to c(train=0.6, test=0.2, validate=0.2). If only one value is given, it is taken as the train proportion and the test set is used for validation.
hidden_shape: number of neurons in the hidden layers of the neural network model. Can be a vector with values for each hidden layer.
epochs: parameter for keras::fit().
maskNA: value to assign to NAs after scaling and passed to keras::layer_masking().
batch_size: batch size for the fit and predict functions, as an integer or "all". Larger values are better as long as they fit in the available memory.
shap: if TRUE, return the SHAP values as a shapviz::shapviz() object (or shapviz::mshapviz() for multioutput models).
aggregate_shap: if TRUE, and shap is also TRUE, aggregate SHAP from all replicates.
repVi: replicates of the permutations to calculate the importance of the variables. 0 to avoid calculating variable importance.
summarizePred: if TRUE, return the mean, sd and se of the predictions. If FALSE, return the predictions for each replicate.
scaleDataset: if TRUE, scale the whole dataset only once instead of the train set at each replicate. This reduces processing time for predictions with large rasters.
NNmodel: if TRUE, return the serialized model with the result. Use keras::unserialize_model() to get the model.
DALEXexplainer: if TRUE, return an explainer for the models from the DALEX::explain() function. It does not work with multisession future plans.
variableResponse: if TRUE, return aggregated_profiles_explainer objects from ingredients::partial_dependency() and the coefficients of the fitted linear model.
save_validateset: if TRUE, save the validation set (independent data not used for training).
baseFilenameNN: if not missing, save the NN in hdf5 format on this path with the iteration appended.
filenameRasterPred: if not missing, save the predictions in a RasterBrick to this file.
tempdirRaster: path to a directory to save temporary raster files.
nCoresRaster: number of cores used for parallelized raster computations. Uses half of the available cores by default.
verbose: if > 0, print progress messages. The value is also passed to keras functions.
...: extra parameters for future.apply::future_replicate() and ingredients::feature_importance().
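
Examples

A minimal sketch of a call, assuming the package that provides pipe_keras() is attached and a working keras backend is installed. The toy data frame, its column names (id, x1, x2, y) and the chosen argument values are illustrative assumptions and have not been checked against the package; only argument names listed under Usage are used. The call itself is guarded, so the block runs without error when pipe_keras() is not available.

```r
# Toy dataset (hypothetical): one response, two predictors and an id column.
set.seed(42)
n <- 100
toy <- data.frame(
  id = seq_len(n),
  x1 = rnorm(n),
  x2 = runif(n)
)
toy$y <- 2 * toy$x1 - toy$x2 + rnorm(n, sd = 0.1)

# Collect the arguments in a list so they can be inspected before the call.
# All values below are illustrative choices, not recommended settings.
args <- list(
  df = toy,
  responseVars = "y",          # response given as a column name of df
  idVars = "id",               # excluded from training
  crossValStrategy = "Kfold",
  k = 5,
  replicates = 2,
  hidden_shape = c(16, 8),     # one value per hidden layer
  epochs = 20,
  batch_size = "all",
  shap = FALSE,
  repVi = 0,                   # 0 skips variable importance
  verbose = 0
)

# Run only if pipe_keras() is available in the current session.
if (exists("pipe_keras", mode = "function")) {
  res <- do.call(pipe_keras, args)
  str(res, max.level = 1)      # overview of the returned elements
}
```

Small values for epochs and replicates keep the run short while checking that the pipeline works; with shap = FALSE and repVi = 0 the slower explanation steps are skipped.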