Keras with Eager Execution

Eager execution is a way to train a Keras model without building a graph. Operations return values, not tensors. Consequently, you can inspect what goes in and comes out of an operation simply by printing a variable’s contents. This is an important advantage in model development and debugging.

You can use eager execution with Keras as long as you use the TensorFlow implementation. This guide gives an outline of the workflow by way of a simple regression example. Specifically, you will see how to:

  • Set up your environment for eager execution
  • Define the main ingredients: a Keras model, an optimizer and a loss function
  • Feed data to the training routine
  • Write a simple training loop that does backprop on the model’s weights
  • Make predictions on the test set
  • Save the model’s weights

Requirements

To use eager execution with Keras, you need a current version of the R package keras with a TensorFlow backend of version at least 1.9.

The following preamble is required when using eager execution:

When in doubt, check if you are in fact using eager execution:

Define a model

Models for use with eager execution are defined as Keras custom models.

Custom models are usually made up of normal Keras layers, which you configure as usual. However, you are free to implement custom logic in the model’s (implicit) call function.

Our simple regression example will use iris to predict Sepal.Width from Petal.Length, Sepal.Length and Petal.Width.

Here is a model that can be used for that purpose:

The model is created simply by instantiating it via its wrapper:

At this point, the shapes of the model’s weights are still unknown (note how no input_shape has been defined for its first layer). You can, however, already call the model on some dummy data:

model(k_constant(matrix(1:6, nrow = 2, ncol = 3)))
tf.Tensor(
[[-1.1474639]
 [-1.0472134]], shape=(2, 1), dtype=float32)

After that call, you can inspect the model’s weights using

This will not just display the tensor shapes, but the actual weight values.

Losses and optimizers

An appropriate loss function for a regression task like this is mean squared error:

Note how we have to use loss functions from TensorFlow, not the Keras equivalents. In the same vein, we need to use an optimizer from the tf$train module.

Use tfdatasets to feed the data

In eager execution, you use tfdatasets to stream input and target data to the model. In our simple iris example, we use tensor_slices_dataset to directly create a dataset from the underlying R matrices x_train and y_train.

However, a wide variety of other dataset creation functions is available. Datasets also allow for a variety of pre-processing transformations.

Data is accessed from a dataset via make_iterator_one_shot (to create an iterator) and iterator_get_next (to obtain the next batch).

Datasets are available in non-eager (graph) execution as well. However, in eager mode, we can examine the actual values returned from the iterator:

[[1]]
tf.Tensor(
[[1.4 5.1 0.2]
 [1.4 4.9 0.2]
 [1.3 4.7 0.2]
 [1.5 4.6 0.2]
 [1.4 5.  0.2]
 [1.7 5.4 0.4]
 [1.4 4.6 0.3]
 [1.5 5.  0.2]
 [1.4 4.4 0.2]
 [1.5 4.9 0.1]], shape=(10, 3), dtype=float32)

[[2]]
tf.Tensor(
[[3.5]
 [3. ]
 [3.2]
 [3.1]
 [3.6]
 [3.9]
 [3.4]
 [3.4]
 [2.9]
 [3.1]], shape=(10, 1), dtype=float32)

Training loop

With eager execution, you take full control over the training process.

In general, you will have at least two loops: an outer loop over epochs, and an inner loop over batches of data returned by the iterator (implemented implicitly by until_out_of_range). The iterator is recreated at the start of each new epoch.

Filling in the missing pieces in the above outline, we will see that

  • Forward propagation is simply a call to model().
  • This call has to happen inside the context of a GradientTape that records all operations.
  • Loss is calculated using the loss function defined before.
  • From the loss on the one hand and the model’s current weights on the other hand, GradientTape then determines the gradients.
  • Finally, the optimizer applies the gradients to the weights in its algorithm-specific way.

Here is the complete code for the training loop:

Predictions on the test set

Getting predictions on the test set is just a call to model, just like training has been.

Saving and restoring model weights

To save model weights, create an instance of tf$Checkpoint and pass it the objects to be saved: in our case, the model and the optimizer. This has to happen after the respective objects have been created, but before the training loop.

Then at the end of each epoch, you save the model’s current weights, like so:

This call saves model weights only, not the complete graph. Thus on restore, you re-create all components in the same way as above, and then load saved the model weights using e.g.

You can then obtain predictions from the restored model, on the test set as a whole or batch-wise, using an iterator.

Complete example

Here is the complete example.

library(keras)
use_implementation("tensorflow")

library(tensorflow)
tfe_enable_eager_execution(device_policy = "silent")

library(tfdatasets)


# Prepare training and test sets ------------------------------------------

x_train <-
  iris[1:120, c("Petal.Length", "Sepal.Length", "Petal.Width")] %>% as.matrix()
x_train <- k_constant(x_train)
y_train <-
  iris[1:120, c("Sepal.Width")] %>% as.matrix()
y_train <- k_constant(y_train)

x_test <-
  iris[121:150, c("Petal.Length", "Sepal.Length", "Petal.Width")] %>% as.matrix()
x_test <- k_constant(x_test)
y_test <-
  iris[121:150, c("Sepal.Width")] %>% as.matrix()
y_test <- k_constant(y_test)



# Create datasets for training and testing --------------------------------

train_dataset <- tensor_slices_dataset(list (x_train, y_train)) %>%
  dataset_batch(10)
test_dataset <- tensor_slices_dataset(list (x_test, y_test)) %>%
  dataset_batch(10)


# Create model ------------------------------------------------------------

iris_regression_model <- function(name = NULL) {
  keras_model_custom(name = name, function(self) {
    self$dense1 <- layer_dense(units = 32, input_shape = 3)
    self$dropout <- layer_dropout(rate = 0.5)
    self$dense2 <- layer_dense(units = 1)
    
    function (x, mask = NULL) {
      self$dense1(x) %>%
        self$dropout() %>%
        self$dense2()
    }
  })
}

model <- iris_regression_model()


# Define loss function and optimizer --------------------------------------

mse_loss <- function(y_true, y_pred, x) {
  mse <- tf$losses$mean_squared_error(y_true, y_pred)
  mse
}

optimizer <- tf$train$AdamOptimizer()


# Set up checkpointing ----------------------------------------------------

checkpoint_dir <- "./checkpoints"
checkpoint_prefix <- file.path(checkpoint_dir, "ckpt")
checkpoint <-
  tf$train$Checkpoint(optimizer = optimizer,
                      model = model)

n_epochs <- 10

# change to TRUE if you want to restore weights
restore <- FALSE

if (!restore) {
  for (i in seq_len(n_epochs)) {
    iter <- make_iterator_one_shot(train_dataset)
    total_loss <- 0
    
    until_out_of_range({
      batch <-  iterator_get_next(iter)
      x <- batch[[1]]
      y <- batch[[2]]
      
      with(tf$GradientTape() %as% tape, {
        preds <- model(x)
        loss <- mse_loss(y, preds, x)
      })
      
      total_loss <- total_loss + loss
      gradients <- tape$gradient(loss, model$variables)
      
      optimizer$apply_gradients(purrr::transpose(list(gradients, model$variables)),
                                global_step = tf$train$get_or_create_global_step())
      
    })
    
    cat("Total loss (epoch): ", i, ": ", as.numeric(total_loss), "\n")
    
    checkpoint$save(file_prefix = checkpoint_prefix)
  }
} else {
  checkpoint$restore(tf$train$latest_checkpoint(checkpoint_dir))
}


# Get model predictions on test set ---------------------------------------

model(x_test)

iter <- make_iterator_one_shot(test_dataset)
until_out_of_range({
  batch <-  iterator_get_next(iter)
  preds <- model(batch[[1]])
  print(preds)
})

Where to from here

In this guide, the task - and consequently, the custom model, associated loss and training routine - have been chosen for their simplicity. Visit the TensorFlow for R blog for case studies and paper implementations that use more intricate custom logic.