Neural network model with keras

Usage

pipe_keras(
  df,
  predInput = NULL,
  responseVars = 1,
  caseClass = NULL,
  idVars = character(),
  weight = "class",
  crossValStrategy = c("Kfold", "bootstrap"),
  k = 5,
  replicates = 10,
  crossValRatio = c(train = 0.6, test = 0.2, validate = 0.2),
  hidden_shape = 50,
  epochs = 500,
  maskNA = NULL,
  batch_size = "all",
  shap = TRUE,
  aggregate_shap = TRUE,
  repVi = 5,
  summarizePred = TRUE,
  scaleDataset = FALSE,
  NNmodel = FALSE,
  DALEXexplainer = FALSE,
  variableResponse = FALSE,
  save_validateset = FALSE,
  baseFilenameNN = NULL,
  filenameRasterPred = NULL,
  tempdirRaster = NULL,
  nCoresRaster = parallel::detectCores() %/% 2,
  verbose = 0,
  ...
)

Arguments

df

a data.frame with the data.

predInput

a data.frame or a Raster with the input variables for the model as columns or layers. The column or layer names must match the column names of df.

responseVars

response variables as column names or indexes on df.

caseClass

class of the samples used to weight cases. Column names or indexes on df, or a vector with the class for each row in df.

idVars

id column names or indexes on df. These columns are not used for training.

weight

Optional array of the same length as nrow(df), containing weights to apply to the model's loss for each sample.

crossValStrategy

Kfold or bootstrap.

k

number of data partitions when crossValStrategy="Kfold".
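
To illustrate what k data partitions mean, here is a minimal base R sketch. It is not the function's internal code: the data, the seed and the names `fold` and `splits` are hypothetical.

```r
# Hypothetical sketch: assign each row of a data.frame to one of k folds.
set.seed(42)
df <- data.frame(x = rnorm(20), y = rnorm(20))
k <- 5

# Shuffle recycled fold labels so that fold sizes differ by at most one row
fold <- sample(rep_len(seq_len(k), nrow(df)))

# Hold out each fold in turn; the remaining rows are available for fitting
splits <- lapply(seq_len(k), function(i) {
  list(heldOut = which(fold == i), rest = which(fold != i))
})

table(fold)
```

Each row is held out exactly once across the k splits.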

replicates

number of replicates for crossValStrategy="bootstrap" and crossValStrategy="Kfold" (replicates * k-1, 1 fold for validation).

crossValRatio

proportion of the dataset used to train, test and validate the model when crossValStrategy="bootstrap". Defaults to c(train=0.6, test=0.2, validate=0.2). If only one value is given, it is taken as the train proportion and the test set is also used for validation.
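
A three-way split with the default ratio can be sketched in base R as follows. This only illustrates the proportions; the way the function actually draws its samples is not documented here, and `n`, `sizes` and `parts` are hypothetical names.

```r
# Hypothetical sketch: split n row indexes according to crossValRatio.
set.seed(1)
n <- 100
crossValRatio <- c(train = 0.6, test = 0.2, validate = 0.2)

# Rows per subset; the last subset absorbs any rounding difference
sizes <- round(n * crossValRatio)
sizes[length(sizes)] <- n - sum(sizes[-length(sizes)])

# Shuffle the row indexes and cut them into the three subsets
idx <- sample(n)
label <- factor(rep(names(sizes), times = sizes), levels = names(sizes))
parts <- split(idx, label)

lengths(parts)
```

With these values the subsets hold 60, 20 and 20 rows and do not overlap.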

hidden_shape

number of neurons in the hidden layers of the neural network model. Can be a vector with values for each hidden layer.

epochs

parameter for keras::fit().

maskNA

value to assign to NAs after scaling and passed to keras::layer_masking().
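
The replacement of NAs after scaling can be pictured with this base R sketch. It assumes standard centering and scaling with `scale()`, which may differ from what the function does internally, and the mask value -999 is a hypothetical choice.

```r
# Hypothetical sketch: scale the predictors, then replace NAs with a mask value.
x <- data.frame(a = c(1, 2, NA, 4), b = c(10, NA, 30, 40))
maskNA <- -999  # assumed value, chosen to lie far outside the scaled data range

# scale() ignores NAs when computing column means and standard deviations
xs <- scale(x)

# Replace the remaining NAs so that a masking layer can recognize them
xs[is.na(xs)] <- maskNA
xs
```

The mask value should not occur among the real scaled values, otherwise genuine observations would be confused with missing ones.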

batch_size

batch size for the fit and predict functions. Larger values are better as long as they fit in the available memory. Integer or "all".

shap

if TRUE, return the SHAP values as a shapviz::shapviz() object (or shapviz::mshapviz() for multioutput models).

aggregate_shap

if TRUE, and shap is also TRUE, aggregate SHAP from all replicates.

repVi

replicates of the permutations to calculate the importance of the variables. 0 to avoid calculating variable importance.

summarizePred

if TRUE, return the mean, sd and se of the predictions. If FALSE, return the predictions for each replicate.
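
The kind of summary described here can be sketched in base R as below. The prediction matrix is simulated, and the standard error is assumed to be sd / sqrt(number of replicates); the function's own definition is not stated in this page.

```r
# Hypothetical sketch: summarize predictions across replicates.
set.seed(7)
# Simulated predictions: 4 cases (rows) x 10 replicates (columns)
pred <- matrix(rnorm(40, mean = 10), nrow = 4)

summaryPred <- data.frame(
  mean = rowMeans(pred),
  sd = apply(pred, 1, sd),
  # Assumed definition of the standard error of the mean
  se = apply(pred, 1, sd) / sqrt(ncol(pred))
)

summaryPred
```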

scaleDataset

if TRUE, scale the whole dataset only once instead of scaling the train set at each replicate. Optimizes processing time for predictions with large rasters.

NNmodel

if TRUE, return the serialized model with the result. Use keras::unserialize_model() to get the model.

DALEXexplainer

if TRUE, return an explainer for the models from the DALEX::explain() function. It doesn't work with multisession future plans.

variableResponse

if TRUE, return aggregated_profiles_explainer objects from ingredients::partial_dependency() and the coefficients of the adjusted linear model.

save_validateset

if TRUE, save the validation set (independent data not used for training).

baseFilenameNN

if not missing, save the NN in HDF5 format to this path with the iteration appended.

filenameRasterPred

if not missing, save the predictions in a RasterBrick to this file.

tempdirRaster

path to a directory to save temporary raster files.

nCoresRaster

number of cores used for parallelized raster processing. Uses half of the available cores by default.
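
The default can be reproduced, with a guard added, as in the sketch below. The fallback to a single core is an addition for illustration and is not part of the documented default.

```r
# Sketch: half of the available cores, with a hypothetical fallback to one core.
nCores <- parallel::detectCores()

# detectCores() returns NA when the number of cores is unknown,
# and integer division yields 0 on a single-core machine
nCoresRaster <- if (is.na(nCores)) 1L else max(1L, nCores %/% 2L)

nCoresRaster
```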

verbose

if > 0, print the progress state. The value is also passed to the keras functions.

...

extra parameters for future.apply::future_replicate() and ingredients::feature_importance().