R Interface to Google CloudML
The cloudml package provides an R interface to Google Cloud Machine Learning Engine, a managed service that enables:
On-demand access to training on GPUs, including the new Tesla P100 GPUs from NVIDIA®.
Hyperparameter tuning to optimize key attributes of model architectures in order to maximize predictive accuracy.
Deployment of trained models to the Google global prediction platform that can support thousands of users and TBs of data.
CloudML is a managed service where you pay only for the hardware resources that you use. Prices vary depending on configuration (e.g. CPU vs. GPU vs. multiple GPUs). See https://cloud.google.com/ml-engine/pricing for additional details.
Google Cloud Account
Before you can begin training models with CloudML you need to have a Google Cloud Account. If you don’t already have an account you can create one at https://console.cloud.google.com.
If you are a new customer of Google Cloud you will receive a 12-month, $300 credit that can be applied to your use of CloudML. In addition, Google is providing a $200 credit for users of the R interface to CloudML (this credit applies to both new and existing customers). Use this link to apply for the $200 credit.
Start by installing the cloudml R package from GitHub as follows:
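A minimal sketch of that installation step, assuming the package is hosted in the rstudio/cloudml GitHub repository and that devtools is used as the installer (neither detail is stated in the text above):

```r
# Install devtools first if it is not already available (assumed installer).
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install the cloudml package from GitHub (repository name is an assumption).
devtools::install_github("rstudio/cloudml")
```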
Then, install the Google Cloud SDK, a set of utilities that enable you to interact with your Google Cloud account from within R. You can install the SDK using the
Note that in order to ensure that the cloudml package can find your installation of the SDK you should accept the default installation location (~/) suggested within the installer.
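Assuming the installer referred to above is the cloudml package's gcloud_install() function, the SDK installation could be started from R roughly as follows:

```r
library(cloudml)

# Download and install the Google Cloud SDK (function name assumed to be
# the installer this section refers to).
gcloud_install()
```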
As part of the installation you are asked to specify a default account, project, and compute region for Google Cloud. These settings are then used automatically for all CloudML jobs. To change the default account, project, or region you can use the
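One way to revisit these defaults later, assuming the cloudml package's gcloud_init() function is what is meant here:

```r
library(cloudml)

# Re-run the interactive Google Cloud SDK setup to pick a different
# account, project, or compute region (assumed function).
gcloud_init()
```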
Once you’ve completed these steps you are ready to train models with CloudML!
Training on CloudML
To train a model on CloudML, first get the training script working locally (perhaps with a smaller sample of your dataset). The script can contain arbitrary R code that trains and/or evaluates a model. Once you’ve confirmed that things work as expected, you can submit a CloudML job to perform training in the cloud.
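As an illustration of checking a script locally on a smaller sample, here is a hypothetical train.R that uses only base R and the built-in mtcars data. A real script would more likely use keras or tensorflow; the file name, model, and sample sizes are all assumptions chosen to keep the sketch self-contained.

```r
# train.R (hypothetical): fit and evaluate a small model locally.
set.seed(42)

# Work on a subset first so that local iterations stay fast.
data <- mtcars[sample(nrow(mtcars), 24), ]

# Hold out a quarter of the sampled rows for evaluation.
test_idx <- sample(nrow(data), 6)
train <- data[-test_idx, ]
test  <- data[test_idx, ]

# Train: any modelling code could go here.
model <- lm(mpg ~ wt + hp, data = train)

# Evaluate: report held-out error so that runs can be compared.
pred <- predict(model, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
cat("Held-out RMSE:", round(rmse, 3), "\n")
```

Once a script like this runs cleanly from a fresh R session, the same file can be switched to the full dataset and submitted as a job.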
Submitting a Job
To submit a job, call the cloudml_train() function, specifying the R script to execute for training:
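For instance, assuming the training script is saved as train.R in the current working directory (the file name used in the GPU examples later in this document):

```r
library(cloudml)

# Submit train.R as a CloudML training job using the default machine type.
cloudml_train("train.R")
```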
All of the files within the current working directory will be bundled up and sent along with the script to CloudML.
Note that the very first time you submit a job to CloudML, the various packages required to run your script will be compiled from source. This will make the execution time of the job considerably longer than you might expect. Only the first job incurs this overhead, though: the package installations are cached, so subsequent jobs will run more quickly.
If you are using RStudio v1.1 or higher, then the CloudML training job is monitored (and its results collected) using a background terminal.
When the job is complete, training results can be collected back to your local system (this is done automatically when monitoring the job using a background terminal in RStudio). A run report is displayed after the job is collected.
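Outside RStudio, or if the background terminal was closed, a job can be checked and collected by hand. The sketch below assumes the cloudml package's job_status() and job_collect() functions, and assumes that both operate on the most recently submitted job when called without arguments:

```r
library(cloudml)

# Check whether the most recent job is still running (assumed default: latest job).
job_status()

# Download the results of the most recent job into the local runs directory.
job_collect()
```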
You can list all previous runs as a data frame using the
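Assuming the listing function meant here is ls_runs() from the tfruns package, the call would look roughly like this and would print a data frame of runs:

```r
library(tfruns)

# List all collected training runs, most recent first (assumed function).
ls_runs()
```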
# A tibble: 6 x 34
  run_dir                           metric_loss metric_acc metric_val_loss metric_val_acc
  <chr>                                   <dbl>      <dbl>           <dbl>          <dbl>
1 runs/cloudml_2017_12_15_182614794      0.0809     0.9763          0.0889         0.9786
2 runs/cloudml_2017_12_14_183247626      0.0806     0.9770          0.0919         0.9773
3 runs/cloudml_2017_12_14_144048138      0.0786     0.9772          0.0896         0.9777
4 runs/cloudml_2017_12_14_143427111      0.0803     0.9771          0.0940         0.9760
5 runs/cloudml_2017_12_14_124739611      0.0829     0.9766          0.0913         0.9782
6 runs/cloudml_2017_12_14_124625505      0.0805     0.9765          0.0981         0.9766
# ... with 29 more variables: flag_dropout1 <dbl>, flag_dropout2 <dbl>, samples <int>,
#   validation_samples <int>, batch_size <int>, epochs <int>, epochs_completed <int>,
#   metrics <chr>, model <chr>, loss_function <chr>, optimizer <chr>, learning_rate <dbl>,
#   script <chr>, start <dttm>, end <dttm>, completed <lgl>, output <chr>, source_code <chr>,
#   context <chr>, type <chr>, cloudml_console_url <chr>, cloudml_created <dttm>,
#   cloudml_end <dttm>, cloudml_job <chr>, cloudml_log_url <chr>, cloudml_ml_units <dbl>,
#   cloudml_scale_tier <chr>, cloudml_start <dttm>, cloudml_state <chr>
You can view run reports using the view_run() function:
# view the latest run
view_run()

# view a specific run
view_run("runs/cloudml_2017_12_15_182614794")
There are many tools available to list, filter, and compare training runs. For additional information see the documentation for the tfruns package.
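As a rough illustration of filtering and comparing, the sketch below assumes the tfruns functions ls_runs() and compare_runs(), and reuses the metric_val_acc column from the run listing shown earlier. The 0.978 threshold is arbitrary.

```r
library(tfruns)

# Keep only runs whose validation accuracy exceeds an arbitrary threshold,
# ordered by that metric.
ls_runs(metric_val_acc > 0.978, order = metric_val_acc)

# Open a side-by-side comparison of the two most recent runs.
compare_runs()
```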
Training with a GPU
By default, CloudML uses “standard” CPU-based instances suitable for training simple models with small to moderate datasets. You can request the use of other machine types, including ones with GPUs, using the master_type parameter of cloudml_train().
For example, the following would train the same model as above but with a Tesla K80 GPU:
cloudml_train("train.R", master_type = "standard_gpu")
To train using a Tesla P100 GPU you would specify:
cloudml_train("train.R", master_type = "standard_p100")
To train on a machine with 4 Tesla P100 GPUs you would specify:
cloudml_train("train.R", master_type = "complex_model_m_p100")
See the CloudML website for documentation on available machine types. Also note that GPU instances can be considerably more expensive than CPU ones! See the documentation on CloudML Pricing for details.
To learn more about using CloudML with R, see the following articles:
Training with CloudML goes into additional depth on managing training jobs and their output.
Hyperparameter Tuning explores how you can improve the performance of your models by running many trials with distinct hyperparameters (e.g. number and size of layers) to determine their optimal values.
Google Cloud Storage provides information on copying data between your local machine and Google Storage and also describes how to use data within Google Storage during training.
Deploying Models describes how to deploy trained models and generate predictions from them.