Boosted Trees Estimator

Construct a boosted trees estimator.

boosted_trees_regressor(feature_columns, n_batches_per_layer,
  model_dir = NULL, label_dimension = 1L, weight_column = NULL,
  n_trees = 100L, max_depth = 6L, learning_rate = 0.1,
  l1_regularization = 0, l2_regularization = 0, tree_complexity = 0,
  min_node_weight = 0, config = NULL)

boosted_trees_classifier(feature_columns, n_batches_per_layer,
  model_dir = NULL, n_classes = 2L, weight_column = NULL,
  label_vocabulary = NULL, n_trees = 100L, max_depth = 6L,
  learning_rate = 0.1, l1_regularization = 0, l2_regularization = 0,
  tree_complexity = 0, min_node_weight = 0, config = NULL)

Arguments

feature_columns

An R list containing all of the feature columns used by the model (typically generated by feature_columns()).

n_batches_per_layer

The number of batches over which statistics are collected for each layer.
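
A common choice is to let the statistics for one layer cover a full pass over the training data, so that the value equals the number of examples divided by the batch size. A minimal base R sketch, where n_examples and batch_size are hypothetical values chosen only for illustration:

```r
# Hypothetical training-set size and input batch size (not package defaults).
n_examples <- 1000L
batch_size <- 128L

# Number of batches in one full pass over the data, rounded up so that the
# final partial batch is counted as well.
n_batches_per_layer <- as.integer(ceiling(n_examples / batch_size))
n_batches_per_layer
```

Smaller values make each layer grow from fewer examples, which is faster but noisier.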

model_dir

Directory in which to save the model parameters, graph, and so on. This can also be used to load checkpoints from the directory into an estimator to continue training a previously saved model.

label_dimension

Number of regression targets per example. This is the size of the last dimension of the labels and logits Tensor objects (typically, these have shape [batch_size, label_dimension]).

weight_column

A string, or a numeric column created by column_numeric(), defining the feature column that represents weights. It is used to down-weight or boost examples during training, and is multiplied by the loss of the example. If it is a string, it is used as a key to fetch the weight tensor from the features argument. If it is a numeric column, the raw tensor is fetched by the key weight_column$key, and weight_column$normalizer_fn is then applied to it to obtain the weight tensor.
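
The effect of the weights on the loss can be illustrated in base R. The vectors below are hypothetical values, not output from the estimator:

```r
# Hypothetical per-example losses and the weights taken from the weight column.
example_loss <- c(0.5, 2.0, 1.0)
example_weight <- c(1.0, 0.5, 2.0)

# Each example's loss is multiplied by its weight: weights below 1 down-weight
# an example, weights above 1 boost it.
weighted_loss <- example_loss * example_weight
weighted_loss
```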

n_trees

Number of trees to be created.

max_depth

Maximum depth to which each tree is grown.

learning_rate

Shrinkage parameter to be used when a tree is added to the model.
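
As a rough illustration of shrinkage in gradient boosting generally, each tree's output is scaled by the learning rate before it is added to the running prediction. The tree outputs below are hypothetical numbers for a single example:

```r
# Hypothetical raw outputs of three trees for one example.
tree_outputs <- c(0.8, 0.4, 0.2)
learning_rate <- 0.1

# Each tree contributes only a fraction of its raw output, so later trees
# can still correct the remaining error.
prediction <- sum(learning_rate * tree_outputs)
prediction
```

A smaller learning rate usually needs a larger n_trees to reach the same fit.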

l1_regularization

Regularization multiplier applied to the absolute weights of the tree leaves.

l2_regularization

Regularization multiplier applied to the squared weights of the tree leaves.

tree_complexity

Regularization factor to penalize trees with more leaves.
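
The three regularization arguments can be related through the second-order objective commonly used for gradient boosted trees. The base R sketch below assumes that formulation (up to constant factors) and is not taken from the estimator's implementation; the helper names soft_threshold, leaf_weight, node_score and split_gain are hypothetical.

```r
# Soft-threshold a gradient sum: under an L1 penalty, small gradient sums
# are shrunk to exactly zero.
soft_threshold <- function(g, l1) sign(g) * pmax(abs(g) - l1, 0)

# Leaf weight for gradient sum g and hessian sum h (assumed formulation).
leaf_weight <- function(g, h, l1 = 0, l2 = 0) {
  -soft_threshold(g, l1) / (h + l2)
}

# Score of a single node under the same assumptions.
node_score <- function(g, h, l1 = 0, l2 = 0) {
  soft_threshold(g, l1)^2 / (h + l2)
}

# Gain of a split: children's scores minus the parent's score, minus the
# complexity penalty charged for the extra leaf.
split_gain <- function(g_left, h_left, g_right, h_right,
                       l1 = 0, l2 = 0, tree_complexity = 0) {
  node_score(g_left, h_left, l1, l2) +
    node_score(g_right, h_right, l1, l2) -
    node_score(g_left + g_right, h_left + h_right, l1, l2) -
    tree_complexity
}

leaf_weight(g = -4, h = 2)            # no regularization
leaf_weight(g = -4, h = 2, l2 = 2)    # L2 shrinks the weight
leaf_weight(g = -4, h = 2, l1 = 1)    # L1 shrinks the gradient sum first
split_gain(-4, 2, 4, 2, tree_complexity = 1)
```

Under these assumptions, larger l1_regularization and l2_regularization values pull leaf weights toward zero, while a larger tree_complexity makes every split harder to justify.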

min_node_weight

Minimum hessian a node must have for a split to be considered. The value will be compared with sum(leaf_hessian)/(batch_size * n_batches_per_layer).
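
The comparison can be worked through by hand in base R. The hessian values are hypothetical, and the sketch assumes that a split is considered when the normalized value is at least min_node_weight:

```r
# Hypothetical per-example hessians that fall into a candidate node.
leaf_hessian <- c(0.25, 0.25, 0.20, 0.10)
batch_size <- 4L
n_batches_per_layer <- 2L
min_node_weight <- 0.05

# Normalize the hessian sum by the number of examples seen per layer.
normalized_hessian <- sum(leaf_hessian) / (batch_size * n_batches_per_layer)

# Assumption: the split is considered when the normalized value reaches
# the threshold.
split_considered <- normalized_hessian >= min_node_weight
split_considered
```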

config

A run configuration created by run_config(), used to configure the runtime settings.

n_classes

The number of label classes.

label_vocabulary

A list of strings representing the possible label values. If given, labels must be of string type and take values in label_vocabulary. If not given, labels are assumed to be already encoded as integers or floats within [0, 1] for n_classes == 2, and as integer values in {0, 1, ..., n_classes - 1} for n_classes > 2. An error is raised if the vocabulary is not provided and the labels are strings.
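
The correspondence between string labels and integer class ids can be sketched in base R. The estimator handles this mapping itself when label_vocabulary is supplied; the snippet only illustrates the encoding, assuming ids follow the order of the vocabulary. The label values are hypothetical.

```r
# Hypothetical vocabulary and string labels for a two-class problem.
label_vocabulary <- c("no", "yes")
labels <- c("yes", "no", "yes")

# Zero-based class ids in vocabulary order: "no" -> 0, "yes" -> 1.
label_ids <- match(labels, label_vocabulary) - 1L

# A label that is missing from the vocabulary would produce NA here.
stopifnot(!anyNA(label_ids))
label_ids
```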

See also