Construct a Categorical Column with a Vocabulary File

Use this when your inputs are strings or integers and you have a vocabulary file that maps each value to an integer ID. By default, out-of-vocabulary values are ignored. To include out-of-vocabulary values, specify either num_oov_buckets or default_value, but not both. For input dictionary features, features[key] is either a Tensor or a SparseTensor. If it is a Tensor, missing values can be represented by -1 for integer inputs and '' for string inputs. These missing-value markers are independent of the default_value argument.
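The lookup rules described above can be sketched in base R. This is a minimal illustration, not the tfestimators implementation: lookup_ids() and stand_in_hash() are hypothetical helpers, the sketch only handles string inputs (so only '' is treated as missing), and stand_in_hash() is a simple stand-in for TensorFlow's own hash, so real bucket assignments will differ. It shows vocabulary hits mapping to zero-based IDs, out-of-vocabulary values falling back to default_value (-1 unless given), and positive num_oov_buckets routing them into [vocabulary_size, vocabulary_size + num_oov_buckets).

```r
# Hypothetical stand-in hash; TensorFlow uses a different hash function.
stand_in_hash <- function(x) {
  codes <- as.numeric(utf8ToInt(x))
  sum(codes * seq_along(codes))
}

# Hypothetical helper: map string values to IDs the way the text describes.
lookup_ids <- function(values, vocabulary, num_oov_buckets = 0L,
                       default_value = NULL) {
  if (num_oov_buckets > 0L && !is.null(default_value)) {
    stop("num_oov_buckets and default_value cannot both be specified")
  }
  if (is.null(default_value)) default_value <- -1L
  # Missing string values ('') are dropped and never receive an ID.
  values <- values[values != ""]
  ids <- match(values, vocabulary) - 1L  # zero-based IDs; NA if out of vocabulary
  oov <- is.na(ids)
  if (num_oov_buckets > 0L) {
    # OOV values land in [vocabulary_size, vocabulary_size + num_oov_buckets).
    ids[oov] <- length(vocabulary) +
      vapply(values[oov], stand_in_hash, numeric(1)) %% num_oov_buckets
  } else {
    ids[oov] <- default_value
  }
  as.integer(ids)
}

vocab <- c("CA", "NY", "TX")
print(lookup_ids(c("NY", "CA", "ZZ"), vocab))
print(lookup_ids(c("NY", "ZZ"), vocab, num_oov_buckets = 2L))
```

With the defaults, the unknown value "ZZ" gets the ID -1, which is how out-of-vocabulary values end up ignored; with two OOV buckets it gets an ID of 3 or 4 instead.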

Usage

column_categorical_with_vocabulary_file(..., vocabulary_file,
  vocabulary_size, num_oov_buckets = 0L, default_value = NULL,
  dtype = tf$string)

Arguments

...: Expression(s) identifying input feature(s). Used as the column name and as the dictionary key for feature parsing configs, feature tensors, and feature columns.


vocabulary_file: The vocabulary file name.


vocabulary_size: Number of elements in the vocabulary. This must be no greater than the length of vocabulary_file; if it is less, later values are ignored.


num_oov_buckets: Non-negative integer, the number of out-of-vocabulary buckets. All out-of-vocabulary inputs are assigned IDs in the range [vocabulary_size, vocabulary_size + num_oov_buckets) based on a hash of the input value. A positive num_oov_buckets cannot be specified together with default_value.


default_value: The integer ID to return for out-of-vocabulary feature values; defaults to -1. This cannot be specified together with a positive num_oov_buckets.


dtype: The type of the features. Only string and integer types are supported.
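The effect of vocabulary_size on the file can be sketched as follows. This assumes the usual convention that the vocabulary file holds one entry per line and that an entry's ID is its zero-based line position; read_vocabulary() is a hypothetical helper, not a tfestimators function.

```r
# Hypothetical helper: read a vocabulary file and keep the first
# vocabulary_size entries, assuming one entry per line.
read_vocabulary <- function(vocabulary_file, vocabulary_size) {
  entries <- readLines(vocabulary_file)
  if (vocabulary_size > length(entries)) {
    stop("vocabulary_size exceeds the number of lines in vocabulary_file")
  }
  # Entries past vocabulary_size are ignored.
  vocab <- entries[seq_len(vocabulary_size)]
  # IDs are zero-based line positions.
  setNames(seq_along(vocab) - 1L, vocab)
}

path <- tempfile(fileext = ".txt")
writeLines(c("CA", "NY", "TX", "WA"), path)
ids <- read_vocabulary(path, vocabulary_size = 3L)
print(ids)
```

Here the fourth line, "WA", falls outside vocabulary_size and would be treated as out of vocabulary.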

Value

A categorical column with a vocabulary file.

Raises

  • ValueError: vocabulary_file is missing.

  • ValueError: vocabulary_size is missing or < 1.

  • ValueError: num_oov_buckets is not a non-negative integer.

  • ValueError: dtype is neither string nor integer.
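The argument checks listed above can be mirrored in a short validator. This is a hedged sketch: validate_vocab_args() is hypothetical, it signals plain R errors via stop() rather than the ValueError raised by the underlying library, and it assumes dtype is given as a type-name string (for example "string" or "int64") instead of a tf$ dtype object.

```r
# Hypothetical validator mirroring the documented ValueError conditions.
validate_vocab_args <- function(vocabulary_file, vocabulary_size,
                                num_oov_buckets = 0L, dtype = "string") {
  if (missing(vocabulary_file) || is.null(vocabulary_file) ||
      !nzchar(vocabulary_file)) {
    stop("vocabulary_file is missing")
  }
  if (missing(vocabulary_size) || is.null(vocabulary_size) ||
      vocabulary_size < 1) {
    stop("vocabulary_size is missing or < 1")
  }
  if (!is.numeric(num_oov_buckets) || length(num_oov_buckets) != 1 ||
      num_oov_buckets < 0 || num_oov_buckets != round(num_oov_buckets)) {
    stop("num_oov_buckets is not a non-negative integer")
  }
  # Assumption: dtype is a type-name string; only string and integer types pass.
  if (!dtype %in% c("string", "int8", "int16", "int32", "int64")) {
    stop("dtype is neither string nor integer")
  }
  invisible(TRUE)
}

validate_vocab_args("vocab.txt", vocabulary_size = 50L, num_oov_buckets = 5L)
```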

See also