Construct a Categorical Column with In-Memory Vocabulary

Use this when your inputs are strings or integers and you have an in-memory vocabulary that maps each value to an integer ID. By default, out-of-vocabulary values are ignored. Use either default_value or num_oov_buckets (but not both) to specify how out-of-vocabulary values are included. For the input dictionary features, features$key is either a tensor or a sparse tensor. If it is a tensor, missing values can be represented by -1 for integers and '' for strings.
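The lookup described above can be sketched in base R. This is only an illustration of the mapping, not the package's implementation: lookup_ids() is a hypothetical helper, and the zero-based IDs are an assumption that follows the underlying TensorFlow convention.

```r
# Hypothetical helper (not part of the package): map each value to the
# zero-based index of its entry in vocabulary_list, or to default_value
# when the value is out of vocabulary.
lookup_ids <- function(values, vocabulary_list, default_value = -1L) {
  # match() returns 1-based positions and NA for absent values;
  # subtracting 1 gives zero-based IDs and keeps the NAs.
  ids <- match(values, vocabulary_list) - 1L
  ids[is.na(ids)] <- default_value
  as.integer(ids)
}

vocab <- c("red", "green", "blue")
ids <- lookup_ids(c("green", "purple", "red"), vocab)
print(ids)  # IDs: 1, -1, 0 ("purple" is out of vocabulary)
```

With the default of -1, out-of-vocabulary values receive an ID that downstream layers can treat as "ignore", which matches the default behaviour described above.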

Usage

column_categorical_with_vocabulary_list(..., vocabulary_list,
  dtype = NULL, default_value = -1L, num_oov_buckets = 0L)



Arguments

...
Expression(s) identifying input feature(s). Used as the column name and the dictionary key for feature parsing configs, feature tensors, and feature columns.


vocabulary_list
An ordered iterable defining the vocabulary. Each feature is mapped to the index of its value (if present) in vocabulary_list. Must be castable to dtype.


dtype
The type of features. Only string and integer types are supported. If NULL, it will be inferred from vocabulary_list.


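default_value
The integer ID to return for feature values that are not in vocabulary_list. Defaults to -1. Cannot be combined with a positive num_oov_buckets.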


num_oov_buckets
Non-negative integer, the number of out-of-vocabulary buckets. All out-of-vocabulary inputs are assigned IDs in the range [vocabulary_size, vocabulary_size + num_oov_buckets) based on a hash of the input value. A positive num_oov_buckets cannot be combined with default_value.
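To make the ID range for out-of-vocabulary buckets concrete, here is a minimal base-R sketch. lookup_ids_with_oov() is a hypothetical helper, the zero-based IDs are an assumption that follows the TensorFlow convention, and the toy hash (sum of code points) is a stand-in chosen for illustration. The hash TensorFlow actually uses is different, so the exact bucket a value lands in will not match.

```r
# Hypothetical helper (not part of the package): in-vocabulary values get
# their zero-based index; out-of-vocabulary values are hashed into one of
# num_oov_buckets IDs placed directly after the vocabulary.
lookup_ids_with_oov <- function(values, vocabulary_list, num_oov_buckets) {
  stopifnot(num_oov_buckets > 0L)
  vocabulary_size <- length(vocabulary_list)
  ids <- match(values, vocabulary_list) - 1L
  oov <- is.na(ids)
  # Toy hash for illustration only: the sum of the string's code points.
  toy_hash <- vapply(values[oov], function(v) sum(utf8ToInt(v)), integer(1))
  # IDs fall in [vocabulary_size, vocabulary_size + num_oov_buckets).
  ids[oov] <- vocabulary_size + (toy_hash %% num_oov_buckets)
  as.integer(ids)
}

vocab <- c("red", "green", "blue")
oov_ids <- lookup_ids_with_oov(c("green", "purple", "teal"), vocab,
                               num_oov_buckets = 2L)
print(oov_ids)  # first ID is 1; the other two lie in {3, 4}
```

Because the bucket is derived from a hash, the same unseen value always maps to the same ID, while distinct unseen values may collide in one bucket.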


Value

A categorical column with in-memory vocabulary.


Note that the values used to represent missing inputs in a tensor (-1 for integers and '' for strings) are independent of the default_value argument.


Raises

  • ValueError: if vocabulary_list is empty or contains duplicate keys.

  • ValueError: if dtype is not integer or string.

See also