Neural network model with keras

Usage

pipe_keras_timeseries(
  df,
  predInput = NULL,
  responseVars = 1,
  caseClass = NULL,
  idVars = character(),
  weight = "class",
  timevar = NULL,
  responseTime = "LAST",
  regex_time = ".+",
  staticVars = NULL,
  crossValStrategy = c("Kfold", "bootstrap"),
  k = 5,
  replicates = 10,
  crossValRatio = c(train = 0.6, test = 0.2, validate = 0.2),
  hidden_shape.RNN = c(32, 32),
  hidden_shape.static = c(32, 32),
  hidden_shape.main = 32,
  epochs = 500,
  maskNA = NULL,
  batch_size = "all",
  repVi = 5,
  perm_dim = 2:3,
  comb_dims = FALSE,
  summarizePred = TRUE,
  scaleDataset = FALSE,
  NNmodel = FALSE,
  DALEXexplainer = FALSE,
  variableResponse = FALSE,
  save_validateset = FALSE,
  baseFilenameNN = NULL,
  filenameRasterPred = NULL,
  tempdirRaster = NULL,
  nCoresRaster = parallel::detectCores()%/%2,
  verbose = 0,
  ...
)

Arguments

df

a data.frame with the data in a long format (time variable in the timevar column).
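As an illustration of the long format, here is a minimal sketch. The column names (id, time, temp, habitat) are hypothetical and not taken from the package; only the layout, one row per case and time step, reflects the description above.

```r
# Hypothetical long-format data: one row per case (id) and time step (time).
longData <- data.frame(
  id = rep(1:3, each = 4),
  time = rep(1:4, times = 3),
  # time-varying predictor
  temp = c(10.1, 11.3, 12.0, 12.8,
           9.5, 9.9, 10.4, 11.0,
           14.2, 14.0, 13.7, 13.1),
  # static predictor: constant within each id
  habitat = rep(c(0.2, 0.5, 0.9), each = 4)
)
head(longData)
```

With data shaped like this, timevar would presumably be "time", idVars "id" and staticVars "habitat".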

predInput

a data.frame with the input variables used to make predictions. The column names must match the column names of df.

responseVars

response variables as column names or indexes on df in wide format (e.g. respVar_time).
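To show what a wide-format name such as respVar_time looks like, the sketch below reshapes a small hypothetical long table with base R's reshape(). It only illustrates the naming pattern; the package may do its own reshaping differently.

```r
# Hypothetical long table with one response measured at three time steps.
longResp <- data.frame(
  id = rep(1:2, each = 3),
  time = rep(1:3, times = 2),
  respVar = c(0.1, 0.4, 0.7, 0.2, 0.3, 0.9)
)

# Long -> wide: one row per id, one column per variable and time step.
wideResp <- reshape(longResp, idvar = "id", timevar = "time",
                    direction = "wide", sep = "_")
names(wideResp)
```

The resulting columns are named by joining the variable name and the time value with "_".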

caseClass

class of the samples used to weight cases. Column names or indexes on df, or a vector with the class for each row in df.

idVars

id column names or indexes on df. Should uniquely identify a row in wide format; otherwise, values will be averaged.

weight

Optional array of the same length as nrow(df), containing weights to apply to the model's loss for each sample.

timevar

column name of the variable containing the time.

responseTime

the timevar value at which responseVars is taken as the response, or the default "LAST" for the last timestep available (max(df[, timevar])).

regex_time

regular expression matching the timevar values format.

staticVars

predictor variables as column names or indexes on df indicating fixed vars that don't change over time.

crossValStrategy

Kfold or bootstrap.

k

number of data partitions when crossValStrategy="Kfold".

replicates

number of replicates for crossValStrategy="bootstrap" and crossValStrategy="Kfold" (replicates * k-1, 1 fold for validation).

crossValRatio

Proportion of the dataset used to train, test and validate the model when crossValStrategy="bootstrap". Defaults to c(train=0.6, test=0.2, validate=0.2). If only one value is given, it is taken as the train proportion and the test set is used for validation.
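A minimal sketch of how such proportions can be turned into a random train/test/validate assignment. This is an illustration in base R under the default ratio, not the package's actual partitioning code; n and setLabel are hypothetical names.

```r
set.seed(1)
crossValRatio <- c(train = 0.6, test = 0.2, validate = 0.2)
n <- 100 # hypothetical number of cases

# Set sizes proportional to crossValRatio; train absorbs any rounding error.
sizes <- round(n * crossValRatio)
sizes["train"] <- n - sum(sizes[c("test", "validate")])

# Randomly assign each case to one of the three sets.
setLabel <- sample(rep(names(sizes), times = sizes))
table(setLabel)
```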

hidden_shape.RNN

number of neurons in the hidden layers of the Recurrent Neural Network model (time series data). Can be a vector with values for each hidden layer.

hidden_shape.static

number of neurons in the hidden layers of the densely connected neural network model (static data). Can be a vector with values for each hidden layer.

hidden_shape.main

number of neurons in the hidden layers of the densely connected neural network model connecting static and time series data. Can be a vector with values for each hidden layer.

epochs

parameter for keras::fit().

maskNA

value to assign to NAs after scaling and passed to keras::layer_masking().
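The replacement step can be sketched in base R as below. The sentinel value -999 and the array contents are hypothetical; the idea is to pick a value that cannot occur in the scaled data, so that a masking layer receiving the same value can skip those entries.

```r
# Hypothetical scaled input with dimensions [case, time, variable].
xScaled <- array(c(1, NA, 3, 4, 5, NA, 7, 8), dim = c(2, 2, 2))

maskNA <- -999 # assumed sentinel, outside the range of the scaled data
xScaled[is.na(xScaled)] <- maskNA
```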

batch_size

batch size for the fit and predict functions. An integer or "all". The larger the better, as long as it fits in your available memory.

repVi

replicates of the permutations to calculate the importance of the variables. 0 to avoid calculating variable importance.

perm_dim

dimension to perform the permutations to calculate the importance of the variables (data dimensions [case, time, variable]). If perm_dim = 2:3, it calculates the importance for each combination of the 2nd and 3rd dimensions.
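The sketch below shows one reading of this description for a [case, time, variable] array: shuffle the cases within a single time-variable slice and leave everything else untouched, giving one importance value per combination of the 2nd and 3rd dimensions. The helper permuteSlice and the dimension names are hypothetical, and the package's actual permutation scheme may differ.

```r
set.seed(42)
# Hypothetical input: 5 cases, 3 time steps, 2 variables.
x3d <- array(rnorm(5 * 3 * 2), dim = c(5, 3, 2),
             dimnames = list(NULL, paste0("t", 1:3), c("temp", "rain")))

# Shuffle cases within one [time, variable] slice; other slices are unchanged.
permuteSlice <- function(x, time, variable) {
  x[, time, variable] <- sample(x[, time, variable])
  x
}
xPerm <- permuteSlice(x3d, time = "t2", variable = "rain")

# Under this reading, one importance value per time-variable combination.
combos <- expand.grid(time = dimnames(x3d)[[2]], variable = dimnames(x3d)[[3]])
nrow(combos)
```

Comparing model performance on x3d and xPerm, repeated repVi times, would then estimate the importance of that slice.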

comb_dims

for variable importance calculations. If TRUE, perform the permutations for each combination of the levels of the 2nd and 3rd dimensions for input data with 3 dimensions. FALSE by default.

summarizePred

if TRUE, return the mean, sd and se of the predictions. If FALSE, return the predictions for each replicate.
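A minimal sketch of how the mean, sd and se across replicates can be computed, assuming a hypothetical matrix with one row per case and one column per replicate. The standard error is taken as sd / sqrt(number of replicates), which is an assumption about how the package defines it.

```r
# Hypothetical predictions: rows are cases, columns are replicates.
preds <- matrix(c(0.2, 0.4, 0.3,
                  0.8, 0.7, 0.9), nrow = 2, byrow = TRUE)

summaryPred <- data.frame(
  mean = rowMeans(preds),
  sd = apply(preds, 1, sd),
  se = apply(preds, 1, sd) / sqrt(ncol(preds)) # assumed definition of se
)
summaryPred
```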

scaleDataset

if TRUE, scale the whole dataset only once instead of scaling the train set at each replicate. This reduces processing time for predictions with large rasters.

NNmodel

if TRUE, return the serialized model with the result.

DALEXexplainer

if TRUE, return an explainer for the models from the DALEX::explain() function. It doesn't work with multisession future plans.

variableResponse

if TRUE, return an aggregated_profiles_explainer object from ingredients::partial_dependency() and the coefficients of the adjusted linear model.

save_validateset

if TRUE, save the validation set (independent data not used for training).

baseFilenameNN

if not missing, save the NN in hdf5 format to this path, with the iteration appended.

filenameRasterPred

if not missing, save the predictions as a RasterBrick to this file.

tempdirRaster

path to a directory in which to save temporary raster files.

nCoresRaster

number of cores used for parallelized raster processing. Uses half of the available cores by default.

verbose

if > 0, print progress information. The value is also passed to the keras functions.

...

extra parameters for future.apply::future_replicate() and ingredients::feature_importance().