The Functional API
Setup
library(tensorflow)
library(keras)
Introduction
The Keras functional API is a way to create models that are more flexible than the sequential API. The functional API can handle models with non-linear topology, shared layers, and even multiple inputs or outputs.
The main idea is that a deep learning model is usually a directed acyclic graph (DAG) of layers. So the functional API is a way to build graphs of layers.
Consider the following model:
(input: 784-dimensional vectors)
       ↧
[Dense (64 units, relu activation)]
       ↧
[Dense (64 units, relu activation)]
       ↧
[Dense (10 units, softmax activation)]
       ↧
(output: logits of a probability distribution over 10 classes)
This is a basic graph with three layers. To build this model using the functional API, start by creating an input node:
inputs <- layer_input(shape = c(784))
The shape of the data is set as a 784-dimensional vector. The batch size is always omitted since only the shape of each sample is specified.
If, for example, you have an image input with a shape of (32, 32, 3), you would use:
# Just for demonstration purposes.
img_inputs <- layer_input(shape = c(32, 32, 3))
The inputs object that is returned contains information about the shape and dtype of the input data that you feed to your model. Here’s the shape:
inputs$shape
TensorShape([None, 784])
Here’s the dtype:
inputs$dtype
tf.float32
You create a new node in the graph of layers by calling a layer on this inputs object:
dense <- layer_dense(units = 64, activation = "relu")
x <- dense(inputs)
The “layer call” action is like drawing an arrow from “inputs” to this layer you created. You’re “passing” the inputs to the dense layer, and you get x as the output.
You can also conveniently create the layer and compose it with inputs in one step, like this:
x <- inputs %>%
  layer_dense(units = 64, activation = "relu")
Let’s add a few more layers to the graph of layers:
outputs <- x %>%
  layer_dense(64, activation = "relu") %>%
  layer_dense(10)
At this point, you can create a Model by specifying its inputs and outputs in the graph of layers:
model <- keras_model(inputs = inputs, outputs = outputs,
                     name = "mnist_model")
Let’s check out what the model summary looks like:
model
Model: "mnist_model"
____________________________________________________________________________
Layer (type) Output Shape Param #
============================================================================
input_1 (InputLayer) [(None, 784)] 0
dense_1 (Dense) (None, 64) 50240
dense_3 (Dense) (None, 64) 4160
dense_2 (Dense) (None, 10) 650
============================================================================
Total params: 55050 (215.04 KB)
Trainable params: 55050 (215.04 KB)
Non-trainable params: 0 (0.00 Byte)
____________________________________________________________________________
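The parameter counts in this summary can be checked by hand. The sketch below uses only base R; dense_params() is a hypothetical helper introduced here for illustration, not part of keras. It assumes each dense layer holds one weight per (input, unit) pair plus one bias per unit.

```r
# Parameters of a dense layer: one weight per (input, unit) pair plus one bias per unit.
dense_params <- function(n_inputs, units) n_inputs * units + units

layer_params <- c(
  dense_params(784, 64),  # first hidden layer
  dense_params(64, 64),   # second hidden layer
  dense_params(64, 10)    # output layer
)
layer_params       # per-layer counts, in the order the layers are applied
sum(layer_params)  # should match "Total params" in the summary
```

The three values correspond to the 50240, 4160, and 650 parameters listed in the summary.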
You can also plot the model as a graph:
plot(model)
And, optionally, display the input and output shapes of each layer in the plotted graph:
plot(model, show_shapes = TRUE)
This figure and the code are almost identical. In the code version, the connection arrows are replaced by the %>% operator.
A “graph of layers” is an intuitive mental image for a deep learning model, and the functional API is a way to create models that closely mirrors this.
Training, evaluation, and inference
Training, evaluation, and inference work exactly the same way for models built using the functional API as for Sequential models.
The Model class offers a built-in training loop (the fit() method) and a built-in evaluation loop (the evaluate() method). Note that you can easily customize these loops to implement training routines beyond supervised learning (e.g. GANs).
Here, load the MNIST image data, reshape it into vectors, fit the model on the data (while monitoring performance on a validation split), then evaluate the model on the test data:
c(c(x_train, y_train), c(x_test, y_test)) %<-% keras::dataset_mnist()
x_train <- array_reshape(x_train, c(60000, 784)) / 255
x_test <- array_reshape(x_test, c(10000, 784)) / 255
model %>% compile(
  loss = loss_sparse_categorical_crossentropy(from_logits = TRUE),
  optimizer = optimizer_rmsprop(),
  metrics = "accuracy"
)
history <- model %>% fit(
  x_train, y_train,
  batch_size = 64, epochs = 2, validation_split = 0.2)
Epoch 1/2
750/750 - 2s - loss: 0.3521 - accuracy: 0.9020 - val_loss: 0.1825 - val_accuracy: 0.9477 - 2s/epoch - 3ms/step
Epoch 2/2
750/750 - 1s - loss: 0.1668 - accuracy: 0.9510 - val_loss: 0.1585 - val_accuracy: 0.9538 - 1s/epoch - 2ms/step
test_scores <- model %>% evaluate(x_test, y_test, verbose = 2)
313/313 - 0s - loss: 0.1489 - accuracy: 0.9510 - 351ms/epoch - 1ms/step
print(test_scores)
loss accuracy
0.1489451 0.9510000
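The step counts in the progress output above (750 per training epoch, 313 for evaluation) follow from the data sizes. A minimal base R sketch, assuming that the validation split is removed before batching and that evaluate() uses a batch size of 32 when none is given; steps() is a hypothetical helper, not a keras function.

```r
# Number of batches needed to cover the samples that remain after the validation split.
steps <- function(n_samples, batch_size, validation_split = 0) {
  n_val <- floor(n_samples * validation_split)  # samples held out for validation
  ceiling((n_samples - n_val) / batch_size)     # the last batch may be partial
}

train_steps <- steps(60000, batch_size = 64, validation_split = 0.2)
eval_steps <- steps(10000, batch_size = 32)  # assumed default batch size
c(train = train_steps, evaluate = eval_steps)
```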
For further reading, see the training and evaluation guide.
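Because the last layer of the model has no activation, the loss is built with from_logits = TRUE and has to turn the raw outputs into probabilities itself. The following base R sketch illustrates what such a loss computes conceptually: a softmax over the logits followed by the mean negative log-likelihood of the true class. It is an illustration, not the Keras implementation; log_softmax() and sparse_cce_from_logits() are hypothetical names.

```r
# Numerically stable log-softmax over the rows of a logits matrix.
log_softmax <- function(logits) {
  shifted <- logits - apply(logits, 1, max)  # subtract each row's maximum
  shifted - log(rowSums(exp(shifted)))
}

# Mean negative log-likelihood of the true class; `labels` are 0-based integers.
sparse_cce_from_logits <- function(labels, logits) {
  log_probs <- log_softmax(logits)
  picked <- log_probs[cbind(seq_along(labels), labels + 1)]
  -mean(picked)
}

logits <- rbind(c(2, 0, -1),
                c(0, 0, 0))
labels <- c(0, 2)
sparse_cce_from_logits(labels, logits)
```

With all-zero logits over three classes the predicted distribution is uniform, so the loss for that sample alone is log(3).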
Save and serialize
Saving the model and serialization work the same way for models built using the functional API as they do for Sequential models. The standard way to save a functional model is to call save_model_tf(), which saves the entire model to a single location on disk (in the SavedModel format, a directory). You can later recreate the same model from this saved copy, even if the code that built the model is no longer available.
The saved model includes the:
- model architecture
- model weight values (that were learned during training)
- model training config, if any (as passed to compile)
- optimizer and its state, if any (to restart training where you left off)
path_to_my_model <- tempfile()
save_model_tf(model, path_to_my_model)
rm(model)
# Recreate the exact same model purely from the saved copy:
model <- load_model_tf(path_to_my_model)
For details, read the model serialization & saving guide.
Use the same graph of layers to define multiple models
In the functional API, models are created by specifying their inputs and outputs in a graph of layers. That means that a single graph of layers can be used to generate multiple models.
In the example below, you use the same stack of layers to instantiate two models: an encoder model that turns image inputs into 16-dimensional vectors, and an end-to-end autoencoder model for training.
encoder_input <- layer_input(shape = c(28, 28, 1),
                             name = "img")
encoder_output <- encoder_input %>%
  layer_conv_2d(16, 3, activation = "relu") %>%
  layer_conv_2d(32, 3, activation = "relu") %>%
  layer_max_pooling_2d(3) %>%
  layer_conv_2d(32, 3, activation = "relu") %>%
  layer_conv_2d(16, 3, activation = "relu") %>%
  layer_global_max_pooling_2d()
encoder <- keras_model(encoder_input, encoder_output,
                       name = "encoder")
encoder
Model: "encoder"
____________________________________________________________________________
Layer (type) Output Shape Param #
============================================================================
img (InputLayer) [(None, 28, 28, 1)] 0
conv2d_3 (Conv2D) (None, 26, 26, 16) 160
conv2d_2 (Conv2D) (None, 24, 24, 32) 4640
max_pooling2d (MaxPooling2D) (None, 8, 8, 32) 0
conv2d_1 (Conv2D) (None, 6, 6, 32) 9248
conv2d (Conv2D) (None, 4, 4, 16) 4624
global_max_pooling2d (GlobalMax (None, 16) 0
Pooling2D)
============================================================================
Total params: 18672 (72.94 KB)
Trainable params: 18672 (72.94 KB)
Non-trainable params: 0 (0.00 Byte)
____________________________________________________________________________
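The output shapes and parameter counts in the encoder summary follow from simple arithmetic. A base R sketch, assuming stride-1 convolutions with "valid" padding and non-overlapping pooling windows; the helper names are hypothetical.

```r
conv_out <- function(n, kernel = 3) n - kernel + 1  # "valid" padding, stride 1
pool_out <- function(n, pool = 3) n %/% pool         # non-overlapping pooling windows
# Weights per filter (kernel x kernel x input channels) plus one bias, times the filters.
conv_params <- function(kernel, channels_in, filters) (kernel^2 * channels_in + 1) * filters

size <- 28
size <- conv_out(size)  # 26
size <- conv_out(size)  # 24
size <- pool_out(size)  # 8
size <- conv_out(size)  # 6
size <- conv_out(size)  # 4

encoder_params <- c(
  conv_params(3, 1, 16),
  conv_params(3, 16, 32),
  conv_params(3, 32, 32),
  conv_params(3, 32, 16)
)
sum(encoder_params)  # should match "Total params" in the summary
```

Global max pooling then keeps one value per channel, which is why the encoder output is a 16-dimensional vector.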
decoder_output <- encoder_output %>%
  layer_reshape(c(4, 4, 1)) %>%
  layer_conv_2d_transpose(16, 3, activation = "relu") %>%
  layer_conv_2d_transpose(32, 3, activation = "relu") %>%
  layer_upsampling_2d(3) %>%
  layer_conv_2d_transpose(16, 3, activation = "relu") %>%
  layer_conv_2d_transpose(1, 3, activation = "relu")
autoencoder <- keras_model(encoder_input, decoder_output,
                           name = "autoencoder")
autoencoder
Model: "autoencoder"
____________________________________________________________________________
Layer (type) Output Shape Param #
============================================================================
img (InputLayer) [(None, 28, 28, 1)] 0
conv2d_3 (Conv2D) (None, 26, 26, 16) 160
conv2d_2 (Conv2D) (None, 24, 24, 32) 4640
max_pooling2d (MaxPooling2D) (None, 8, 8, 32) 0
conv2d_1 (Conv2D) (None, 6, 6, 32) 9248
conv2d (Conv2D) (None, 4, 4, 16) 4624
global_max_pooling2d (GlobalMax (None, 16) 0
Pooling2D)
reshape (Reshape) (None, 4, 4, 1) 0
conv2d_transpose_3 (Conv2DTrans (None, 6, 6, 16) 160
pose)
conv2d_transpose_2 (Conv2DTrans (None, 8, 8, 32) 4640
pose)
up_sampling2d (UpSampling2D) (None, 24, 24, 32) 0
conv2d_transpose_1 (Conv2DTrans (None, 26, 26, 16) 4624
pose)
conv2d_transpose (Conv2DTranspo (None, 28, 28, 1) 145
se)
============================================================================
Total params: 28241 (110.32 KB)
Trainable params: 28241 (110.32 KB)
Non-trainable params: 0 (0.00 Byte)
____________________________________________________________________________
Here, the decoding architecture is strictly symmetrical to the encoding architecture, so the output shape is the same as the input shape (28, 28, 1).
The reverse of a Conv2D layer is a Conv2DTranspose layer, and the reverse of a MaxPooling2D layer is an UpSampling2D layer.
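To see why the decoder lands back on a 28 x 28 image, the spatial sizes can be traced by hand. This base R sketch assumes stride-1 transposed convolutions with "valid" padding, each of which grows the size by kernel - 1, and an upsampling factor of 3; the helper names are hypothetical.

```r
conv_transpose_out <- function(n, kernel = 3) n + kernel - 1  # stride 1, "valid" padding
upsample_out <- function(n, factor = 3) n * factor

size <- 4
size <- conv_transpose_out(size)  # 6
size <- conv_transpose_out(size)  # 8
size <- upsample_out(size)        # 24
size <- conv_transpose_out(size)  # 26
size <- conv_transpose_out(size)  # 28
size
```

The sequence 4, 6, 8, 24, 26, 28 is the encoder's sequence of spatial sizes in reverse.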
All models are callable, just like layers
You can treat any model as if it were a layer by invoking it on an Input or on the output of another layer. By calling a model you aren’t just reusing the architecture of the model, you’re also reusing its weights.
To see this in action, here’s a different take on the autoencoder example that creates an encoder model, a decoder model, and chains them in two calls to obtain the autoencoder model:
encoder_input <- layer_input(shape = c(28, 28, 1), name = "original_img")
encoder_output <- encoder_input %>%
  layer_conv_2d(16, 3, activation = "relu") %>%
  layer_conv_2d(32, 3, activation = "relu") %>%
  layer_max_pooling_2d(3) %>%
  layer_conv_2d(32, 3, activation = "relu") %>%
  layer_conv_2d(16, 3, activation = "relu") %>%
  layer_global_max_pooling_2d()
encoder <- keras_model(encoder_input, encoder_output, name = "encoder")
encoder
Model: "encoder"
____________________________________________________________________________
Layer (type) Output Shape Param #
============================================================================
original_img (InputLayer) [(None, 28, 28, 1)] 0
conv2d_7 (Conv2D) (None, 26, 26, 16) 160
conv2d_6 (Conv2D) (None, 24, 24, 32) 4640
max_pooling2d_1 (MaxPooling2D) (None, 8, 8, 32) 0
conv2d_5 (Conv2D) (None, 6, 6, 32) 9248
conv2d_4 (Conv2D) (None, 4, 4, 16) 4624
global_max_pooling2d_1 (GlobalM (None, 16) 0
axPooling2D)
============================================================================
Total params: 18672 (72.94 KB)
Trainable params: 18672 (72.94 KB)
Non-trainable params: 0 (0.00 Byte)
____________________________________________________________________________
decoder_input <- layer_input(shape = c(16), name = "encoded_img")
decoder_output <- decoder_input %>%
  layer_reshape(c(4, 4, 1)) %>%
  layer_conv_2d_transpose(16, 3, activation = "relu") %>%
  layer_conv_2d_transpose(32, 3, activation = "relu") %>%
  layer_upsampling_2d(3) %>%
  layer_conv_2d_transpose(16, 3, activation = "relu") %>%
  layer_conv_2d_transpose(1, 3, activation = "relu")
decoder <- keras_model(decoder_input, decoder_output,
                       name = "decoder")
decoder
Model: "decoder"
____________________________________________________________________________
Layer (type) Output Shape Param #
============================================================================
encoded_img (InputLayer) [(None, 16)] 0
reshape_1 (Reshape) (None, 4, 4, 1) 0
conv2d_transpose_7 (Conv2DTrans (None, 6, 6, 16) 160
pose)
conv2d_transpose_6 (Conv2DTrans (None, 8, 8, 32) 4640
pose)
up_sampling2d_1 (UpSampling2D) (None, 24, 24, 32) 0
conv2d_transpose_5 (Conv2DTrans (None, 26, 26, 16) 4624
pose)
conv2d_transpose_4 (Conv2DTrans (None, 28, 28, 1) 145
pose)
============================================================================
Total params: 9569 (37.38 KB)
Trainable params: 9569 (37.38 KB)
Non-trainable params: 0 (0.00 Byte)
____________________________________________________________________________
autoencoder_input <- layer_input(shape = c(28, 28, 1), name = "img")
encoded_img <- encoder(autoencoder_input)
decoded_img <- decoder(encoded_img)
autoencoder <- keras_model(autoencoder_input, decoded_img,
                           name = "autoencoder")
autoencoder
Model: "autoencoder"
____________________________________________________________________________
Layer (type) Output Shape Param #
============================================================================
img (InputLayer) [(None, 28, 28, 1)] 0
encoder (Functional) (None, 16) 18672
decoder (Functional) (None, 28, 28, 1) 9569
============================================================================
Total params: 28241 (110.32 KB)
Trainable params: 28241 (110.32 KB)
Non-trainable params: 0 (0.00 Byte)
____________________________________________________________________________
As you can see, the model can be nested: a model can contain sub-models (since a model is just like a layer). A common use case for model nesting is ensembling. For example, here’s how to ensemble a set of models into a single model that averages their predictions:
get_model <- function() {
  inputs <- layer_input(shape = c(128))
  outputs <- inputs %>% layer_dense(1)
  keras_model(inputs, outputs)
}
model1 <- get_model()
model2 <- get_model()
model3 <- get_model()
inputs <- layer_input(shape = c(128))
y1 <- model1(inputs)
y2 <- model2(inputs)
y3 <- model3(inputs)
outputs <- layer_average(list(y1, y2, y3))
ensemble_model <- keras_model(inputs = inputs, outputs = outputs)
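Numerically, the averaging step amounts to an element-wise mean over the member models' predictions. A small base R illustration with made-up prediction values for a batch of two samples (the numbers are placeholders, not model output):

```r
# Stand-in predictions from three models for a batch of two samples (made-up values).
y1 <- matrix(c(0.2, 0.9), ncol = 1)
y2 <- matrix(c(0.4, 0.7), ncol = 1)
y3 <- matrix(c(0.6, 0.8), ncol = 1)

# Element-wise mean across the three prediction matrices.
ensemble_pred <- Reduce(`+`, list(y1, y2, y3)) / 3
ensemble_pred
```

The result keeps the shape of a single model's output, so the ensemble can be used wherever one of its members could.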
Manipulate complex graph topologies
Models with multiple inputs and outputs
The functional API makes it easy to manipulate multiple inputs and outputs. This cannot be handled with the Sequential API.
For example, if you’re building a system for ranking customer issue tickets by priority and routing them to the correct department, then the model will have three inputs:
- the title of the ticket (text input),
- the text body of the ticket (text input), and
- any tags added by the user (categorical input)
This model will have two outputs:
- the priority score between 0 and 1 (scalar sigmoid output), and
- the department that should handle the ticket (softmax output over the set of departments).
You can build this model in a few lines with the functional API:
num_tags <- 12 # Number of unique issue tags
num_words <- 10000 # Size of vocabulary obtained when preprocessing text data
num_departments <- 4 # Number of departments for predictions
title_input <- layer_input(shape = c(NA), name = "title") # Variable-length sequence of ints
body_input <- layer_input(shape = c(NA), name = "body") # Variable-length sequence of ints
tags_input <- layer_input(shape = c(num_tags), name = "tags") # Binary vectors of size `num_tags`
# Embed each word in the title into a 64-dimensional vector
title_features <- title_input %>% layer_embedding(num_words, 64)
# Embed each word in the text into a 64-dimensional vector
body_features <- body_input %>% layer_embedding(num_words, 64)
# Reduce sequence of embedded words in the title into a single 128-dimensional vector
title_features <- title_features %>% layer_lstm(128)
# Reduce sequence of embedded words in the body into a single 32-dimensional vector
body_features <- body_features %>% layer_lstm(32)
# Merge all available features into a single large vector via concatenation
x <- layer_concatenate(list(title_features, body_features, tags_input))
# Stick a logistic regression for priority prediction on top of the features
priority_pred <- x %>% layer_dense(1, name = "priority")
# Stick a department classifier on top of the features
department_pred <- x %>% layer_dense(num_departments, name = "department")
# Instantiate an end-to-end model predicting both priority and department
model <- keras_model(
  inputs = list(title_input, body_input, tags_input),
  outputs = list(priority_pred, department_pred)
)
Now plot the model:
plot(model, show_shapes = TRUE)
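The shapes shown in the plot can be anticipated with a little arithmetic: concatenation lays the three feature vectors side by side, and each output head is a dense layer on top of the result. A base R sketch using the sizes from the code above (the variable names are chosen here for illustration):

```r
title_dim <- 128  # output size of the title LSTM
body_dim <- 32    # output size of the body LSTM
num_tags <- 12    # width of the tags input
num_departments <- 4

# Concatenation along the feature axis adds the widths.
concat_dim <- title_dim + body_dim + num_tags

# Each dense head has one weight per (feature, unit) pair plus one bias per unit.
priority_params <- concat_dim * 1 + 1
department_params <- concat_dim * num_departments + num_departments
c(concat = concat_dim, priority = priority_params, department = department_params)
```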
When compiling this model, you can assign different losses to each output. You can even assign different weights to each loss – to modulate their contribution to the total training loss.
model %>% compile(
  optimizer = optimizer_rmsprop(1e-3),
  loss = list(
    loss_binary_crossentropy(from_logits = TRUE),
    loss_categorical_crossentropy(from_logits = TRUE)
  ),
  loss_weights = c(1, 0.2)
)
Since the output layers have different names, you could also specify the losses and loss weights with the corresponding layer names:
model %>% compile(
  optimizer = optimizer_rmsprop(1e-3),
  loss = list(
    priority = loss_binary_crossentropy(from_logits = TRUE),
    department = loss_categorical_crossentropy(from_logits = TRUE)
  ),
  loss_weights = c(priority = 1.0, department = 0.2)
)
Train the model by passing lists of R arrays of inputs and targets:
# some helpers to generate dummy input data
random_uniform_array <- function(dim)
  array(runif(prod(dim)), dim)
random_vectorized_array <- function(num_words, dim)
  array(sample(0:(num_words - 1), prod(dim), replace = TRUE), dim)
# Dummy input data
title_data <- random_vectorized_array(num_words, c(1280, 10))
body_data <- random_vectorized_array(num_words, c(1280, 100))
tags_data <- random_vectorized_array(2, c(1280, num_tags))
# storage.mode(tags_data) <- "double" # from integer
# Dummy target data
priority_targets <- random_uniform_array(c(1280, 1))
dept_targets <- random_vectorized_array(2, c(1280, num_departments))
model %>% fit(
  list(title = title_data, body = body_data, tags = tags_data),
  list(priority = priority_targets, department = dept_targets),
  epochs = 2,
  batch_size = 32
)
Epoch 1/2
40/40 - 5s - loss: 1.2531 - priority_loss: 0.7012 - department_loss: 2.7599 - 5s/epoch - 137ms/step
Epoch 2/2
40/40 - 2s - loss: 1.2548 - priority_loss: 0.7022 - department_loss: 2.7629 - 2s/epoch - 50ms/step
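The total loss in this log is consistent with a weighted sum of the per-output losses, using the loss weights passed to compile (1.0 for priority, 0.2 for department). A base R check against the logged values, which are themselves rounded to four decimals; total_loss() is a hypothetical helper.

```r
loss_weights <- c(priority = 1.0, department = 0.2)

# Per-output losses copied from the training log above.
epoch_1 <- c(priority = 0.7012, department = 2.7599)
epoch_2 <- c(priority = 0.7022, department = 2.7629)

# Weighted sum of the per-output losses, matched by name.
total_loss <- function(losses) sum(loss_weights[names(losses)] * losses)
total_1 <- total_loss(epoch_1)
total_2 <- total_loss(epoch_2)
c(epoch_1 = total_1, epoch_2 = total_2)
```

Both totals agree with the logged loss values (1.2531 and 1.2548) to within the rounding of the log.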
When calling fit with a tfdataset object, it should yield either a tuple of lists like tuple(list(title_data, body_data, tags_data), list(priority_targets, dept_targets)) or a tuple of named lists like tuple(list(title = title_data, body = body_data, tags = tags_data), list(priority = priority_targets, department = dept_targets)).
For a more detailed explanation, refer to the training and evaluation guide.
A toy ResNet model
In addition to models with multiple inputs and outputs, the functional API makes it easy to manipulate non-linear connectivity topologies – these are models with layers that are not connected sequentially, which the Sequential API cannot handle.
A common use case for this is residual connections. Let’s build a toy ResNet model for CIFAR10 to demonstrate this:
inputs <- layer_input(shape = c(32, 32, 3), name = "img")
block_1_output <- inputs %>%
  layer_conv_2d(32, 3, activation = "relu") %>%
  layer_conv_2d(64, 3, activation = "relu") %>%
  layer_max_pooling_2d(3)
block_2_output <- block_1_output %>%
  layer_conv_2d(64, 3, activation = "relu", padding = "same") %>%
  layer_conv_2d(64, 3, activation = "relu", padding = "same") %>%
  layer_add(block_1_output)
block_3_output <- block_2_output %>%
  layer_conv_2d(64, 3, activation = "relu", padding = "same") %>%
  layer_conv_2d(64, 3, activation = "relu", padding = "same") %>%
  layer_add(block_2_output)
outputs <- block_3_output %>%
  layer_conv_2d(64, 3, activation = "relu") %>%
  layer_global_average_pooling_2d() %>%
  layer_dense(256, activation = "relu") %>%
  layer_dropout(0.5) %>%
  layer_dense(10)
model <- keras_model(inputs, outputs, name = "toy_resnet")
model
Model: "toy_resnet"
____________________________________________________________________________
Layer (type) Output Shape Param Connected to
#
============================================================================
img (InputLayer) [(None, 32, 32, 3)] 0 []
conv2d_9 (Conv2D) (None, 30, 30, 32) 896 ['img[0][0]']
conv2d_8 (Conv2D) (None, 28, 28, 64) 18496 ['conv2d_9[0][0]']
max_pooling2d_2 (Ma (None, 9, 9, 64) 0 ['conv2d_8[0][0]']
xPooling2D)
conv2d_11 (Conv2D) (None, 9, 9, 64) 36928 ['max_pooling2d_2[0][0]
']
conv2d_10 (Conv2D) (None, 9, 9, 64) 36928 ['conv2d_11[0][0]']
add (Add) (None, 9, 9, 64) 0 ['conv2d_10[0][0]',
'max_pooling2d_2[0][0]
']
conv2d_13 (Conv2D) (None, 9, 9, 64) 36928 ['add[0][0]']
conv2d_12 (Conv2D) (None, 9, 9, 64) 36928 ['conv2d_13[0][0]']
add_1 (Add) (None, 9, 9, 64) 0 ['conv2d_12[0][0]',
'add[0][0]']
conv2d_14 (Conv2D) (None, 7, 7, 64) 36928 ['add_1[0][0]']
global_average_pool (None, 64) 0 ['conv2d_14[0][0]']
ing2d (GlobalAverag
ePooling2D)
dense_8 (Dense) (None, 256) 16640 ['global_average_poolin
g2d[0][0]']
dropout (Dropout) (None, 256) 0 ['dense_8[0][0]']
dense_7 (Dense) (None, 10) 2570 ['dropout[0][0]']
============================================================================
Total params: 223242 (872.04 KB)
Trainable params: 223242 (872.04 KB)
Non-trainable params: 0 (0.00 Byte)
____________________________________________________________________________
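A residual connection adds two tensors element-wise, so both branches must have the same shape. That is why the convolutions inside blocks 2 and 3 use padding = "same". The base R sketch below traces the spatial size through the network under the usual stride-1 assumptions; conv_out() is a hypothetical helper.

```r
# Spatial size after a stride-1 convolution: unchanged for "same", shrunk for "valid".
conv_out <- function(n, kernel = 3, padding = c("valid", "same")) {
  padding <- match.arg(padding)
  if (padding == "same") n else n - kernel + 1
}

size <- 32
size <- conv_out(size)  # 30
size <- conv_out(size)  # 28
block_1 <- size %/% 3   # 9, after 3x3 max pooling
block_2 <- conv_out(conv_out(block_1, padding = "same"), padding = "same")
block_3 <- conv_out(conv_out(block_2, padding = "same"), padding = "same")

# Addition needs matching shapes on both branches of each residual connection.
stopifnot(block_2 == block_1, block_3 == block_2)
final_size <- conv_out(block_3)  # 7, after the last "valid" convolution
final_size
```

With "valid" padding inside the blocks, each convolution would shrink the feature map by 2 and the addition would fail on mismatched shapes.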
Plot the model:
plot(model, show_shapes = TRUE)
Now train the model:
c(c(x_train, y_train), c(x_test, y_test)) %<-% dataset_cifar10()
x_train <- x_train / 255
x_test <- x_test / 255
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
model %>% compile(
  optimizer = optimizer_rmsprop(1e-3),
  loss = loss_categorical_crossentropy(from_logits = TRUE),
  metrics = "acc"
)
# We restrict the data to the first 1000 samples so as to limit execution time
# for this guide. Try to train on the entire dataset until convergence!
model %>% fit(
  x_train[1:1000, , , ],
  y_train[1:1000, ],
  batch_size = 64,
  epochs = 1,
  validation_split = 0.2
)
13/13 - 2s - loss: 2.3036 - acc: 0.1013 - val_loss: 2.2991 - val_acc: 0.1250 - 2s/epoch - 166ms/step
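Unlike the MNIST example, which passed integer labels to the sparse variant of the loss, this example converts the labels with to_categorical() because loss_categorical_crossentropy() expects one row of class indicators per sample. A base R sketch of that one-hot encoding, assuming 0-based integer labels; one_hot() is a hypothetical stand-in, not the keras function.

```r
# Turn 0-based integer labels into rows with a single 1 in the label's column.
one_hot <- function(labels, num_classes) {
  encoded <- matrix(0, nrow = length(labels), ncol = num_classes)
  encoded[cbind(seq_along(labels), labels + 1)] <- 1  # shift to R's 1-based columns
  encoded
}

encoded <- one_hot(c(0, 3, 9), 10)
encoded
```

Each row sums to 1, and the position of the 1 (minus one) recovers the original label.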
Extract and reuse nodes in the graph of layers
Because the graph of layers you are manipulating is a static data structure, it can be accessed and inspected. And this is how you are able to plot functional models as images.
This also means that you can access the activations of intermediate layers (“nodes” in the graph) and reuse them elsewhere – which is very useful for something like feature extraction.
Let’s look at an example. This is a VGG19 model with weights pretrained on ImageNet:
vgg19 <- application_vgg19()
And these are the intermediate activations of the model, obtained by querying the graph data structure:
features_list <- lapply(vgg19$layers, \(layer) layer$output)
Use these features to create a new feature-extraction model that returns the values of the intermediate layer activations:
feat_extraction_model <- keras_model(inputs = vgg19$input,
                                     outputs = features_list)
img <- random_uniform_array(c(1, 224, 224, 3))
extracted_features <- feat_extraction_model(img)
This comes in handy for tasks like neural style transfer, among other things.
Extend the API using custom layers
tf$keras includes a wide range of built-in layers, for example:
- Convolutional layers: Conv1D, Conv2D, Conv3D, Conv2DTranspose
- Pooling layers: MaxPooling1D, MaxPooling2D, MaxPooling3D, AveragePooling1D
- RNN layers: GRU, LSTM, ConvLSTM2D
- BatchNormalization, Dropout, Embedding, etc.
But if you don’t find what you need, it’s easy to extend the API by creating your own layers. All layers subclass the Layer class and implement:
- a call method, which specifies the computation done by the layer.
- a build method, which creates the weights of the layer (this is just a style convention, since you can also create weights in initialize, the R counterpart of Python's __init__).
To learn more about creating layers from scratch, read the custom layers and models guide.
The following is a basic implementation of layer_dense():
library(tensorflow)
library(keras)
layer_custom_dense <- new_layer_class(
  "CustomDense",
  initialize = function(units = 32) {
    super$initialize()
    self$units <- as.integer(units)
  },
  build = function(input_shape) {
    self$w <- self$add_weight(
      shape = shape(tail(input_shape, 1), self$units),
      initializer = "random_normal",
      trainable = TRUE
    )
    self$b <- self$add_weight(
      shape = shape(self$units),
      initializer = "random_normal",
      trainable = TRUE
    )
  },
  call = function(inputs) {
    tf$matmul(inputs, self$w) + self$b
  }
)
inputs <- layer_input(c(4))
outputs <- inputs %>% layer_custom_dense(10)
model <- keras_model(inputs, outputs)
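The computation in the call method is a matrix product followed by a bias that is broadcast across the batch. The same arithmetic can be sketched in base R with ordinary matrices; the random values below are placeholders and none of this touches TensorFlow.

```r
set.seed(1)
inputs <- matrix(rnorm(2 * 4), nrow = 2)  # batch of 2 samples, 4 features each
w <- matrix(rnorm(4 * 10), nrow = 4)      # weights with shape (input_dim, units)
b <- rnorm(10)                            # one bias per unit

# sweep() adds the bias to every row, mirroring the broadcast in `+ self$b`.
outputs <- sweep(inputs %*% w, 2, b, `+`)
dim(outputs)
```

The weight shape (input_dim, units) is what build derives from the last element of input_shape and the units argument.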
For serialization support in your custom layer, define a get_config method that returns the constructor arguments of the layer instance:
layer_custom_dense <- new_layer_class(
  "CustomDense",
  initialize = function(units = 32) {
    super$initialize()
    self$units <- as.integer(units)
  },
  build = function(input_shape) {
    self$w <-
      self$add_weight(
        shape = shape(tail(input_shape, 1), self$units),
        initializer = "random_normal",
        trainable = TRUE
      )
    self$b <- self$add_weight(
      shape = shape(self$units),
      initializer = "random_normal",
      trainable = TRUE
    )
  },
  call = function(inputs) {
    tf$matmul(inputs, self$w) + self$b
  },
  get_config = function() {
    list(units = self$units)
  }
)
inputs <- layer_input(c(4))
outputs <- inputs %>% layer_custom_dense(10)
model <- keras_model(inputs, outputs)
config <- model %>% get_config()
new_model <- from_config(config, custom_objects = list(layer_custom_dense))
Optionally, implement the class method from_config(class_constructor, config), which is used when recreating a layer instance given its config. The default implementation of from_config is approximately:
from_config <- function(layer_constructor, config)
  do.call(layer_constructor, config)
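The round trip implied by this default can be tried in plain R. In the sketch below, make_layer() is a hypothetical stand-in for a layer constructor that only records its arguments; the point is that do.call() matches the entries of the config list to the constructor's arguments by name.

```r
# Hypothetical stand-in for a layer constructor; it only records its arguments.
make_layer <- function(units = 32) {
  structure(list(units = as.integer(units)), class = "toy_layer")
}

from_config <- function(layer_constructor, config)
  do.call(layer_constructor, config)

config <- list(units = 10)  # what a get_config method would return
restored <- from_config(make_layer, config)
restored$units
```

This is also why get_config should return a list whose names match the arguments of initialize.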
When to use the functional API
Should you use the Keras functional API to create a new model, or just subclass the Model class directly? In general, the functional API is higher-level, easier and safer, and has a number of features that subclassed models do not support.
However, model subclassing provides greater flexibility when building models that are not easily expressible as directed acyclic graphs of layers. For example, you could not implement a Tree-RNN with the functional API and would have to subclass Model directly.
For an in-depth look at the differences between the functional API and model subclassing, read What are Symbolic and Imperative APIs in TensorFlow 2.0?.
Functional API strengths:
The following properties are also true for Sequential models (which are also data structures), but are not true for subclassed models (which are R code, not data structures).
Less verbose
There is no super$initialize(...), no call <- function(...) { }, etc.
Compare:
inputs <- layer_input(shape = c(32))
outputs <- inputs %>%
  layer_dense(64, activation = 'relu') %>%
  layer_dense(10)
mlp <- keras_model(inputs, outputs)
With the subclassed version:
MLP <- new_model_class(
  classname = "MLP",
  initialize = function(...) {
    super$initialize(...)
    self$dense_1 <- layer_dense(units = 64, activation = 'relu')
    self$dense_2 <- layer_dense(units = 10)
  },
  call = function(inputs) {
    inputs %>%
      self$dense_1() %>%
      self$dense_2()
  }
)
# Instantiate the model.
mlp <- MLP()
# Necessary to create the model's state.
# The model doesn't have a state until it's called at least once.
invisible(mlp(tf$zeros(shape(1, 32))))
Model validation while defining its connectivity graph
In the functional API, the input specification (shape and dtype) is created in advance (using layer_input). Every time you call a layer, the layer checks that the specification passed to it matches its assumptions, and it will raise a helpful error message if not.
This guarantees that any model you can build with the functional API will run. All debugging – other than convergence-related debugging – happens statically during the model construction and not at execution time. This is similar to type checking in a compiler.
A functional model is plottable and inspectable
You can plot the model as a graph, and you can easily access intermediate nodes in this graph. For example, to extract and reuse the activations of intermediate layers (as seen in a previous example):
features_list <- lapply(vgg19$layers, \(layer) layer$output)
feat_extraction_model <- keras_model(inputs = vgg19$input,
                                     outputs = features_list)
A functional model can be serialized or cloned
Because a functional model is a data structure rather than a piece of code, it is safely serializable and can be saved as a single file that allows you to recreate the exact same model without having access to any of the original code. See the serialization & saving guide.
To serialize a subclassed model, it is necessary for the implementer to specify a get_config() and from_config() method at the model level.
Functional API weakness:
It does not support dynamic architectures
The functional API treats models as DAGs of layers. This is true for most deep learning architectures, but not all – for example, recursive networks or Tree RNNs do not follow this assumption and cannot be implemented in the functional API.
Mix-and-match API styles
Choosing between the functional API or Model subclassing isn’t a binary decision that restricts you into one category of models. All models in the tf$keras API can interact with each other, whether they’re Sequential models, functional models, or subclassed models that are written from scratch.
You can always use a functional model or Sequential model as part of a subclassed model or layer:
units <- 32L
timesteps <- 10L
input_dim <- 5L
# Define a Functional model
inputs <- layer_input(c(NA, units))
outputs <- inputs %>%
  layer_global_average_pooling_1d() %>%
  layer_dense(1)
model <- keras_model(inputs, outputs)
layer_custom_rnn <- new_layer_class(
  "CustomRNN",
  initialize = function() {
    super$initialize()
    self$units <- units
    self$projection_1 <-
      layer_dense(units = units, activation = "tanh")
    self$projection_2 <-
      layer_dense(units = units, activation = "tanh")
    # Our previously-defined Functional model
    self$classifier <- model
  },
  call = function(inputs) {
    message("inputs shape: ", format(inputs$shape))
    c(batch_size, timesteps, channels) %<-% dim(inputs)
    outputs <- vector("list", timesteps)
    state <- tf$zeros(shape(batch_size, self$units))
    for (t in 1:timesteps) {
      # iterate over each time_step
      outputs[[t]] <- state <-
        inputs[, t, ] %>%
        self$projection_1() %>%
        { . + self$projection_2(state) }
    }
    # Stack along the timestep axis (TensorFlow axes are 0-based, so axis = 1L
    # is the second dimension).
    features <- tf$stack(outputs, axis = 1L)
    message("features shape: ", format(features$shape))
    self$classifier(features)
  }
)
layer_custom_rnn(tf$zeros(shape(1, timesteps, input_dim)))
inputs shape: (1, 10, 5)
features shape: (1, 10, 32)
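The two shapes printed above can be reproduced by writing the recurrence with plain matrices. The base R sketch below mirrors the loop in call under simplifying assumptions: random weight matrices w1 and w2 stand in for the two dense projections, and biases are omitted.

```r
set.seed(1)
batch_size <- 1
timesteps <- 10
input_dim <- 5
units <- 32

inputs <- array(rnorm(batch_size * timesteps * input_dim),
                dim = c(batch_size, timesteps, input_dim))
w1 <- matrix(rnorm(input_dim * units), nrow = input_dim)  # stands in for projection_1
w2 <- matrix(rnorm(units * units), nrow = units)          # stands in for projection_2

state <- matrix(0, nrow = batch_size, ncol = units)
features <- array(0, dim = c(batch_size, timesteps, units))
for (t in seq_len(timesteps)) {
  x_t <- matrix(inputs[, t, ], nrow = batch_size)  # keep the batch dimension
  state <- tanh(x_t %*% w1) + tanh(state %*% w2)   # new state from input and old state
  features[, t, ] <- state
}
dim(features)
```

Each timestep contributes one (batch_size, units) slice, so stacking the slices along the second dimension gives (1, 10, 32).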
You can use any subclassed layer or model in the functional API as long as it implements a call method that follows one of the following patterns:
- call(inputs, ..., training = NULL, mask = NULL) – Where inputs is a tensor or a nested structure of tensors (e.g. a list of tensors), and where the optional named arguments training and mask can be present. The ... are non-tensor arguments (non-inputs).
- call(inputs, training = NULL, ...) – Where training is a boolean indicating whether the layer should behave in training mode or inference mode.
- call(inputs, mask = NULL, ...) – Where mask is a boolean mask tensor (useful for RNNs, for instance).
- call(inputs, training = NULL, mask = NULL, ...) – Of course, you can have both masking and training-specific behavior at the same time.
Additionally, if you implement the get_config method on your custom layer or model, the functional models you create will still be serializable and cloneable.
Here’s a quick example of a custom RNN, written from scratch, being used in a functional model:
units <- 32
timesteps <- 10
input_dim <- 5
batch_size <- 16
layer_custom_rnn <- new_layer_class(
  "CustomRNN",
  initialize = function() {
    super$initialize()
    self$units <- units
    self$projection_1 <- layer_dense(units = units, activation = "tanh")
    self$projection_2 <- layer_dense(units = units, activation = "tanh")
    self$classifier <- layer_dense(units = 1)
  },
  call = function(inputs) {
    c(batch_size, timesteps, channels) %<-% dim(inputs)
    outputs <- vector("list", timesteps)
    state <- tf$zeros(shape(batch_size, self$units))
    for (t in 1:timesteps) {
      # iterate over each time_step
      outputs[[t]] <- state <-
        inputs[, t, ] %>%
        self$projection_1() %>%
        { . + self$projection_2(state) }
    }
    # Stack along the timestep axis (TensorFlow axes are 0-based, so axis = 1L
    # is the second dimension).
    features <- tf$stack(outputs, axis = 1L)
    self$classifier(features)
  }
)
# Note that you specify a static batch size for the inputs with the `batch_shape`
# arg, because the inner computation of `CustomRNN` requires a static batch size
# (when you create the `state` zeros tensor).
inputs <- layer_input(batch_shape = c(batch_size, timesteps, input_dim))
outputs <- inputs %>%
  layer_conv_1d(32, 3) %>%
  layer_custom_rnn()
model <- keras_model(inputs, outputs)
model(tf$zeros(shape(1, 10, 5)))
tf.Tensor(
[[[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]]], shape=(1, 8, 1), dtype=float32)
Environment Details
tensorflow::tf_config()
TensorFlow v2.13.0 (~/.virtualenvs/r-tensorflow-website/lib/python3.10/site-packages/tensorflow)
Python v3.10 (~/.virtualenvs/r-tensorflow-website/bin/python)
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] keras_2.13.0.9000 tensorflow_2.13.0.9000
loaded via a namespace (and not attached):
[1] vctrs_0.6.3 cli_3.6.1 knitr_1.43
[4] zeallot_0.1.0 rlang_1.1.1 xfun_0.40
[7] png_0.1-8 generics_0.1.3 jsonlite_1.8.7
[10] glue_1.6.2 htmltools_0.5.6 fansi_1.0.4
[13] rmarkdown_2.24 grid_4.3.1 tfruns_1.5.1
[16] evaluate_0.21 tibble_3.2.1 base64enc_0.1-3
[19] fastmap_1.1.1 yaml_2.3.7 lifecycle_1.0.3
[22] whisker_0.4.1 compiler_4.3.1 htmlwidgets_1.6.2
[25] Rcpp_1.0.11 pkgconfig_2.0.3 rstudioapi_0.15.0
[28] lattice_0.21-8 digest_0.6.33 R6_2.5.1
[31] reticulate_1.31.0.9000 utf8_1.2.3 pillar_1.9.0
[34] magrittr_2.0.3 Matrix_1.5-4.1 tools_4.3.1
system2(reticulate::py_exe(), c("-m pip freeze"), stdout = TRUE) |> writeLines()
absl-py==1.4.0
array-record==0.4.1
asttokens==2.2.1
astunparse==1.6.3
backcall==0.2.0
bleach==6.0.0
cachetools==5.3.1
certifi==2023.7.22
charset-normalizer==3.2.0
click==8.1.7
decorator==5.1.1
dm-tree==0.1.8
etils==1.4.1
executing==1.2.0
flatbuffers==23.5.26
gast==0.4.0
google-auth==2.22.0
google-auth-oauthlib==1.0.0
google-pasta==0.2.0
googleapis-common-protos==1.60.0
grpcio==1.57.0
h5py==3.9.0
idna==3.4
importlib-resources==6.0.1
ipython==8.14.0
jedi==0.19.0
kaggle==1.5.16
keras==2.13.1
keras-tuner==1.3.5
kt-legacy==1.0.5
libclang==16.0.6
Markdown==3.4.4
MarkupSafe==2.1.3
matplotlib-inline==0.1.6
numpy==1.24.3
nvidia-cublas-cu11==11.11.3.6
nvidia-cudnn-cu11==8.6.0.163
oauthlib==3.2.2
opt-einsum==3.3.0
packaging==23.1
pandas==2.0.3
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
Pillow==10.0.0
promise==2.3
prompt-toolkit==3.0.39
protobuf==3.20.3
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pyasn1==0.5.0
pyasn1-modules==0.3.0
pydot==1.4.2
Pygments==2.16.1
pyparsing==3.1.1
python-dateutil==2.8.2
python-slugify==8.0.1
pytz==2023.3
requests==2.31.0
requests-oauthlib==1.3.1
rsa==4.9
scipy==1.11.2
six==1.16.0
stack-data==0.6.2
tensorboard==2.13.0
tensorboard-data-server==0.7.1
tensorflow==2.13.0
tensorflow-datasets==4.9.2
tensorflow-estimator==2.13.0
tensorflow-hub==0.14.0
tensorflow-io-gcs-filesystem==0.33.0
tensorflow-metadata==1.14.0
termcolor==2.3.0
text-unidecode==1.3
toml==0.10.2
tqdm==4.66.1
traitlets==5.9.0
typing_extensions==4.5.0
tzdata==2023.3
urllib3==1.26.16
wcwidth==0.2.6
webencodings==0.5.1
Werkzeug==2.3.7
wrapt==1.15.0
zipp==3.16.2
TF Devices:
- PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')
- PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
CPU cores: 12
Date rendered: 2023-08-28
Page render time: 27 seconds