Automatic differentiation and gradient tape

    In this tutorial we will cover automatic differentiation, a key technique for optimizing machine learning models.

    Setup

    We will use the TensorFlow R package:

    library(tensorflow)

    Gradient Tapes

    TensorFlow provides the tf$GradientTape API for automatic differentiation: computing the gradient of a computation with respect to its input variables.

    TensorFlow “records” all operations executed inside the context of a tf$GradientTape onto a “tape”. TensorFlow then uses that tape and the gradients associated with each recorded operation to compute the gradients of a “recorded” computation using reverse-mode differentiation.

    For example:
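    A sketch of a computation that yields the gradient printed below: we record the square of the sum of a 2 x 2 tensor of ones, then ask the tape for the derivative with respect to the input. The with(... %as% ...) form is the tensorflow package's context-manager helper; the particular operations are an assumption chosen to match the printed result.

    x <- tf$ones(shape(2, 2))

    with(tf$GradientTape() %as% t, {
      t$watch(x)                 # explicitly watch the (non-variable) input tensor
      y <- tf$reduce_sum(x)      # y = 4
      z <- tf$multiply(y, y)     # z = y^2 = 16
    })

    # Derivative of z with respect to the original input tensor x:
    # dz/dx = 2 * y = 8 for every element.
    t$gradient(z, x)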

    ## tf.Tensor(
    ## [[8. 8.]
    ##  [8. 8.]], shape=(2, 2), dtype=float32)

    You can also request gradients of the output with respect to intermediate values computed during a “recorded” tf$GradientTape context.
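    For instance, reusing the same sum-and-square computation as above (again an assumption matched to the printed value), the derivative of z with respect to the intermediate value y:

    x <- tf$ones(shape(2, 2))

    with(tf$GradientTape() %as% t, {
      t$watch(x)
      y <- tf$reduce_sum(x)      # intermediate value
      z <- tf$multiply(y, y)
    })

    # dz/dy = 2 * y = 8
    t$gradient(z, y)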

    ## tf.Tensor(8.0, shape=(), dtype=float32)

    By default, the resources held by a GradientTape are released as soon as the GradientTape$gradient() method is called. To compute multiple gradients over the same computation, create a persistent gradient tape. This allows multiple calls to the gradient() method; the resources are released only when the tape object is garbage collected. For example:
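    A sketch with a persistent tape, assuming an input of x = 3 (chosen to match the printed values: dz/dx = 4 * x^3 = 108 and dy/dx = 2 * x = 6):

    x <- tf$constant(3)

    with(tf$GradientTape(persistent = TRUE) %as% t, {
      t$watch(x)
      y <- x * x      # y = x^2
      z <- y * y      # z = x^4
    })

    t$gradient(z, x)  # 4 * x^3 at x = 3
    t$gradient(y, x)  # 2 * x at x = 3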

    ## tf.Tensor(108.0, shape=(), dtype=float32)
    ## tf.Tensor(6.0, shape=(), dtype=float32)
    rm(t)  # Drop the reference to the tape

    Recording control flow

    Because tapes record operations as they are executed, R control flow (for example, if statements and while loops) is naturally handled:
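    For example, a function whose number of multiplications depends on an ordinary R for loop and if condition; the specific loop bounds here are an assumption chosen so the gradients match the values printed below:

    f <- function(x, y) {
      output <- 1
      for (i in seq_len(y)) {
        # Only multiply for i = 3, 4, 5
        if (i > 2 && i <= 5)
          output <- tf$multiply(output, x)
      }
      output
    }

    grad <- function(x, y) {
      with(tf$GradientTape() %as% t, {
        t$watch(x)
        out <- f(x, y)
      })
      t$gradient(out, x)
    }

    x <- tf$constant(2)

    grad(x, 6)  # three multiplications: d(x^3)/dx = 3 * x^2 = 12
    grad(x, 5)  # still three multiplications: 12
    grad(x, 4)  # two multiplications: d(x^2)/dx = 2 * x = 4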

    ## tf.Tensor(12.0, shape=(), dtype=float32)
    ## tf.Tensor(12.0, shape=(), dtype=float32)
    ## tf.Tensor(4.0, shape=(), dtype=float32)

    Higher-order gradients

    Operations inside the GradientTape context manager are recorded for automatic differentiation. If gradients are computed in that context, then the gradient computation is recorded as well. As a result, the exact same API works for higher-order gradients. For example:
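    A sketch with nested tapes, assuming y = x^3 at x = 1 (so dy/dx = 3 * x^2 = 3 and d2y/dx2 = 6 * x = 6, matching the values printed below):

    x <- tf$Variable(1)  # variables are watched automatically

    with(tf$GradientTape() %as% t, {
      with(tf$GradientTape() %as% t2, {
        y <- x * x * x
      })
      # Computing the gradient inside the outer tape's context
      # records the gradient computation itself.
      dy_dx <- t2$gradient(y, x)
    })
    d2y_dx2 <- t$gradient(dy_dx, x)

    dy_dx
    d2y_dx2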

    ## tf.Tensor(3.0, shape=(), dtype=float32)
    ## tf.Tensor(6.0, shape=(), dtype=float32)