TensorFlow Serving

In this tutorial you will learn how to deploy a TensorFlow model using TensorFlow Serving.

We will use the Docker image provided by the TensorFlow organization to deploy a model that classifies images of handwritten digits.

Using the Docker container is an easy way to test the API locally and then deploy it to any cloud provider.

Building the model

The first thing we are going to do is build our model using the Keras API. We will train it on the MNIST dataset of handwritten digits.

library(keras)
library(tensorflow)
mnist <- dataset_mnist()

# scale pixel values to [0, 1] and add an explicit channel dimension
mnist$train$x <- (mnist$train$x/255) %>% 
  array_reshape(., dim = c(dim(.), 1))

mnist$test$x <- (mnist$test$x/255) %>% 
  array_reshape(., dim = c(dim(.), 1))
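
After reshaping, both arrays have an explicit channel dimension; for the training set:

dim(mnist$train$x)
[1] 60000    28    28     1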

Now we are going to define our Keras model: a simple convolutional neural network.

model <- keras_model_sequential() %>% 
  layer_conv_2d(filters = 16, kernel_size = c(3,3), activation = "relu") %>% 
  layer_max_pooling_2d(pool_size = c(2,2)) %>% 
  layer_conv_2d(filters = 16, kernel_size = c(3,3), activation = "relu") %>% 
  layer_max_pooling_2d(pool_size = c(2,2)) %>% 
  layer_flatten() %>% 
  layer_dense(units = 128, activation = "relu") %>% 
  layer_dense(units = 10, activation = "softmax")

model %>% 
  compile(
    loss = "sparse_categorical_crossentropy",
    optimizer = "adam",
    metrics = "accuracy"
  )
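
We use sparse_categorical_crossentropy as the loss because the labels come as integers from 0 to 9, rather than one-hot encoded vectors:

range(mnist$train$y)
[1] 0 9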

Next, we fit the model on the training data:

model %>% 
  fit(
    x = mnist$train$x, y = mnist$train$y,
    batch_size = 32,
    epochs = 5,
    validation_split = 0.2,
    verbose = 2
  )
Epoch 1/5
1875/1875 - 7s - loss: 0.1920 - accuracy: 0.9425 - 7s/epoch - 4ms/step
Epoch 2/5
1875/1875 - 5s - loss: 0.0607 - accuracy: 0.9816 - 5s/epoch - 3ms/step
Epoch 3/5
1875/1875 - 5s - loss: 0.0439 - accuracy: 0.9861 - 5s/epoch - 3ms/step
Epoch 4/5
1875/1875 - 6s - loss: 0.0336 - accuracy: 0.9897 - 6s/epoch - 3ms/step
Epoch 5/5
1875/1875 - 5s - loss: 0.0264 - accuracy: 0.9916 - 5s/epoch - 3ms/step

Once we are happy with the model's accuracy on the validation dataset, we can evaluate it on the test dataset with:

model %>% evaluate(x = mnist$test$x, y = mnist$test$y)
313/313 - 0s - loss: 0.0342 - accuracy: 0.9876 - 463ms/epoch - 1ms/step
      loss   accuracy 
0.03422508 0.98760003 

OK, we have close to 99% accuracy on the test dataset and we want to deploy that model. First, let’s save the model in the SavedModel format:

save_model_tf(model, "cnn-mnist")
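
One detail that matters later: prediction requests must key the input data by the name recorded in the model's serving signature. Depending on your Keras/TensorFlow version this name can differ (e.g. input_1 or conv2d_input); one way to check it from R is:

# the name(s) the serving signature expects for the model inputs
model$input_names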

With the model built and saved, we can now serve it with TensorFlow Serving.

Running locally

You can run the tensorflow/serving Docker image locally using the great stevedore package. For example:

docker <- stevedore::docker_client()
container <- docker$container$run(
  image = "tensorflow/serving", # name of the image
  
  # host port and container port - if you set 4000:8501, the API 
  # will be accessible at localhost:4000
  ports = "8501:8501", 
  
  # a string "path/to/the/saved/model/locally:/models/modelname/version" -
  # the model must be mounted inside the container's /models/ folder.
  volumes = paste0(normalizePath("cnn-mnist"), ":/models/model/1"), 
  
  # the name of the model - it's the name of the folder inside `/models`
  # above.
  env = c("MODEL_NAME" = "model"),
  
  # run the container detached
  detach = TRUE
)

The container is now running and serving the model. You can see its logs with:

container$logs()
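
You can also check that the model loaded correctly by querying TensorFlow Serving's model status endpoint:

# should list version 1 with state "AVAILABLE"
httr::content(httr::GET("http://localhost:8501/v1/models/model"))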

Now you can make POST requests to the following endpoint: http://localhost:8501/v1/models/model/versions/1:predict. The input data must be passed in a specific JSON format (see the format definition here), which may seem unnatural for R users. Here is an example:

# convert each of the first 5 test images into nested lists and key
# them by the input name expected by the serving signature
instances <- purrr::array_tree(mnist$test$x[1:5,,,,drop=FALSE]) %>% 
  purrr::map(~list(input_1 = .x))
instances <- list(instances = instances)

req <- httr::POST(
  "http://localhost:8501/v1/models/model/versions/1:predict", 
  body = instances, 
  encode = "json"
)
httr::content(req)
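
The response follows TF Serving's row format: a predictions field holding one vector of 10 class probabilities per instance. A minimal sketch, assuming that format, to recover the predicted digits:

preds <- httr::content(req)$predictions
# take the class with the highest probability; R indices are 1-based,
# so subtract 1 to map back to the digits 0-9
purrr::map_int(preds, ~ which.max(unlist(.x)) - 1L)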

This is how you can serve TensorFlow models locally with TensorFlow Serving. Additionally, the same image can be deployed to multiple cloud providers. In the next section we will show how to deploy it to Google Cloud Run.

When done, you can stop the container with:

container$stop()

Deploying to Google Cloud Run

The first thing you need to do is follow the Before you begin section on this page.

Now let’s create a Dockerfile that copies the SavedModel into the container image. This section assumes some experience with Docker.

Here’s an example - create a file called Dockerfile in the same root folder as your SavedModel and paste the following:

FROM tensorflow/serving
COPY cnn-mnist /models/model/1
ENTRYPOINT ["/usr/bin/tf_serving_entrypoint.sh", "--rest_api_port=8080"]

We need to run the REST service on port 8080, the only port Google Cloud Run exposes by default. Now you can build this image and push it to gcr.io. Run the following in your terminal:

docker build -t gcr.io/PROJECT-ID/cnn-mnist .
docker push gcr.io/PROJECT-ID/cnn-mnist
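
Optionally, you can smoke-test the image locally before deploying. A sketch using stevedore again (with PROJECT-ID replaced as below):

test <- docker$container$run(
  image = "gcr.io/PROJECT-ID/cnn-mnist",
  ports = "8080:8080",
  detach = TRUE
)
# the status endpoint should report version 1 as AVAILABLE
httr::content(httr::GET("http://localhost:8080/v1/models/model"))
test$stop()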

You can get your PROJECT-ID by running:

gcloud config get-value project

Next, we can create the service in Google Cloud Run using:

gcloud run deploy --image gcr.io/PROJECT-ID/cnn-mnist --platform managed

You will be prompted to select a region, a name for the service, and whether you want to allow unauthenticated requests. If everything works correctly you will get a URL like https://cnn-mnist-ld4lzfalyq-ue.a.run.app, which you can now use to make requests to your model. For example:

req <- httr::POST(
  "https://cnn-mnist-ld4lzfalyq-ue.a.run.app/v1/models/model/versions/1:predict", 
  body = instances, 
  encode = "json"
)
httr::content(req)

Note that in this case, all pre-processing must be done in R before sending the data to the API, since TensorFlow Serving only runs the saved model graph.
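
For instance, to classify a single new digit you would scale and reshape it client-side before building the request body. A minimal sketch, reusing a test image to stand in for new data and the input name input_1 from above:

# `img` stands in for a raw 28x28 grayscale image with values in 0-255
img <- mnist$test$x[1, , , 1] * 255

# apply the same pre-processing used at training time:
# scale to [0, 1] and add back the channel dimension
x <- array(img / 255, dim = c(28, 28, 1))

body <- list(instances = list(list(input_1 = purrr::array_tree(x))))
req <- httr::POST(
  "https://cnn-mnist-ld4lzfalyq-ue.a.run.app/v1/models/model/versions/1:predict",
  body = body,
  encode = "json"
)
httr::content(req)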