Neural network model with keras
Usage
pipe_keras_timeseries(
df,
predInput = NULL,
responseVars = 1,
caseClass = NULL,
idVars = character(),
weight = "class",
timevar = NULL,
responseTime = "LAST",
regex_time = ".+",
staticVars = NULL,
crossValStrategy = c("Kfold", "bootstrap"),
k = 5,
replicates = 10,
crossValRatio = c(train = 0.6, test = 0.2, validate = 0.2),
hidden_shape.RNN = c(32, 32),
hidden_shape.static = c(32, 32),
hidden_shape.main = 32,
epochs = 500,
maskNA = NULL,
batch_size = "all",
repVi = 5,
perm_dim = 2:3,
comb_dims = FALSE,
summarizePred = TRUE,
scaleDataset = FALSE,
NNmodel = FALSE,
DALEXexplainer = FALSE,
variableResponse = FALSE,
save_validateset = FALSE,
baseFilenameNN = NULL,
filenameRasterPred = NULL,
tempdirRaster = NULL,
nCoresRaster = parallel::detectCores()%/%2,
verbose = 0,
...
)
Arguments
- df
a data.frame with the data in long format (time variable in the timevar column).
- predInput
a data.frame with the input variables used to make predictions. The column names must match the column names of df.
- responseVars
response variables as column names or indexes on df in wide format (e.g. respVar_time).
- caseClass
class of the samples, used to weight cases. Column names or indexes on df, or a vector with the class for each row in df.
- idVars
id column names or indexes on df. Should be a unique identifier for a row in wide format; otherwise, values will be averaged.
- weight
Optional array of the same length as nrow(df), containing weights to apply to the model's loss for each sample.
- timevar
column name of the variable containing the time.
- responseTime
a timevar value used as a response var for responseVars, or the default "LAST" for the last timestep available (max(df[, timevar])).
- regex_time
regular expression matching the format of the timevar values.
- staticVars
predictor variables as column names or indexes on df indicating fixed vars that don't change over time.
- crossValStrategy
Kfold or bootstrap.
- k
number of data partitions when crossValStrategy="Kfold".
- replicates
number of replicates for crossValStrategy="bootstrap" and crossValStrategy="Kfold" (replicates * k-1, 1 fold for validation).
- crossValRatio
Proportion of the dataset used to train, test and validate the model when crossValStrategy="bootstrap". Defaults to c(train=0.6, test=0.2, validate=0.2). If only one value is given, it is taken as the train proportion and the test set is used for validation.
- hidden_shape.RNN
number of neurons in the hidden layers of the Recurrent Neural Network model (time series data). Can be a vector with values for each hidden layer.
- hidden_shape.static
number of neurons in the hidden layers of the densely connected neural network model (static data). Can be a vector with values for each hidden layer.
- hidden_shape.main
number of neurons in the hidden layers of the densely connected neural network model connecting static and time series data. Can be a vector with values for each hidden layer.
- epochs
parameter for keras::fit().
- maskNA
value to assign to NAs after scaling and passed to keras::layer_masking().
- batch_size
batch size for the fit and predict functions. The bigger the better, as long as it fits in your available memory. Integer or "all".
- repVi
replicates of the permutations to calculate the importance of the variables. 0 to avoid calculating variable importance.
- perm_dim
dimension on which to perform the permutations to calculate the importance of the variables (data dimensions [case, time, variable]). If perm_dim = 2:3, it calculates the importance for each combination of the 2nd and 3rd dimensions.
- comb_dims
for variable importance calculations; if TRUE, do the permutations for each combination of the levels of the variables from the 2nd and 3rd dimensions, for input data with 3 dimensions. FALSE by default.
- summarizePred
if TRUE, return the mean, sd and se of the predictions. If FALSE, return the predictions for each replicate.
- scaleDataset
if TRUE, scale the whole dataset only once instead of the train set at each replicate. Optimizes processing time for predictions with large rasters.
- NNmodel
if TRUE, return the serialized model with the result.
- DALEXexplainer
if TRUE, return an explainer for the models from the DALEX::explain() function. It doesn't work with multisession future plans.
- variableResponse
if TRUE, return the aggregated_profiles_explainer object from ingredients::partial_dependency() and the coefficients of the adjusted linear model.
- save_validateset
save the validateset (independent data not used for training).
- baseFilenameNN
if not missing, save the NN in hdf5 format to this path with the iteration appended.
- filenameRasterPred
if not missing, save the predictions in a RasterBrick to this file.
- tempdirRaster
path to a directory to save temporary raster files.
- nCoresRaster
number of cores used for parallelized raster operations. Uses half of the available cores by default.
- verbose
If > 0, print the state; the value is also passed to keras functions.
- ...
extra parameters for future.apply::future_replicate() and ingredients::feature_importance().
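
The relationship between the long-format df, timevar, responseVars and responseTime = "LAST" described in the arguments above can be sketched with base R. This is only an illustration under assumptions: the column names (id, time, temp, y, elev) are hypothetical, stats::reshape() stands in for whatever conversion the function performs internally, and the "_" separator is taken from the respVar_time example.

```r
# Hypothetical toy data in long format: 3 cases observed at 4 time steps.
set.seed(1)
df <- data.frame(
  id   = rep(c("a", "b", "c"), each = 4),  # would be passed as idVars
  time = rep(1:4, times = 3),              # would be passed as timevar
  temp = rnorm(12, mean = 15, sd = 3),     # time-varying predictor
  y    = runif(12),                        # response variable
  elev = rep(c(100, 250, 900), each = 4)   # static predictor (staticVars)
)

# Widen with base R: one row per id, time-varying columns named <var>_<time>.
# This is NOT the package's own code, only a way to see the wide naming.
wide <- reshape(
  df,
  idvar     = "id",
  timevar   = "time",
  v.names   = c("temp", "y"),
  direction = "wide",
  sep       = "_"
)

# responseTime = "LAST" is documented as max(df[, timevar]).
last_time    <- max(df[, "time"])
response_col <- paste0("y_", last_time)

# A pattern describing the time values (the role regex_time plays) can be
# used to pick out every wide column that belongs to the response variable.
time_pattern  <- "[0-9]+"
response_cols <- grep(paste0("^y_", time_pattern, "$"), names(wide), value = TRUE)

print(names(wide))
print(response_col)
```

Under these assumptions, the wide-format name of the response at the last time step is what responseVars would refer to, while elev stays as a single column because it does not change over time.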