Ragged tensors

    Overview

    Your data comes in many shapes; your tensors should too. Ragged tensors are the TensorFlow equivalent of nested variable-length lists. They make it easy to store and process data with non-uniform shapes, including:

    • Variable-length features, such as the set of actors in a movie.
    • Batches of variable-length sequential inputs, such as sentences or video clips.
    • Hierarchical inputs, such as text documents that are subdivided into sections, paragraphs, sentences, and words.
    • Individual fields in structured inputs, such as protocol buffers.

    What you can do with a ragged tensor

    Ragged tensors are supported by more than a hundred TensorFlow operations, including math operations (such as tf$add and tf$reduce_mean), array operations (such as tf$concat and tf$tile), string manipulation ops (such as tf$strings$substr), and many others:

    library(tensorflow)
    digits <- tf$ragged$constant(
      list(list(3, 1, 4, 1), list(), list(5, 9, 2), list(6), list())
    )
    words <- tf$ragged$constant(
      list(list("So", "long"), list("thanks", "for", "all", "the", "fish"))
    )
    tf$add(digits, 3)
    ## tf.RaggedTensor(values=Tensor("Add_1:0", shape=(8,), dtype=float32), row_splits=Tensor("RaggedConstant/RaggedFromRowSplits/row_splits:0", shape=(6,), dtype=int64))
    tf$reduce_mean(digits, axis=1L)
    ## Tensor("RaggedReduceMean/truediv:0", shape=(5,), dtype=float32)
    tf$concat(list(digits, list(list(5, 3))), axis=0L)
    ## tf.RaggedTensor(values=Tensor("RaggedConcat/concat:0", shape=(10,), dtype=float32), row_splits=Tensor("RaggedConcat/concat_1:0", shape=(7,), dtype=int64))
    tf$tile(digits, c(1L, 2L))
    ## tf.RaggedTensor(values=Tensor("RaggedTile/Tile:0", shape=(?,), dtype=float32), row_splits=Tensor("RaggedTile/concat_1:0", shape=(6,), dtype=int64))
    tf$strings$substr(words, 0L, 2L)
    ## tf.RaggedTensor(values=Tensor("Substr_1:0", shape=(7,), dtype=string), row_splits=Tensor("RaggedConstant_1/RaggedFromRowSplits/row_splits:0", shape=(3,), dtype=int64))

    There are also a number of methods and operations that are specific to ragged tensors, including factory methods, conversion methods, and value-mapping operations.

    As with normal tensors, you can use R-style indexing to access specific slices of a ragged tensor. For more information, see the section on Indexing below.

    ## Tensor("RaggedGetItem/strided_slice_5:0", shape=(4,), dtype=float32)
    ## tf.RaggedTensor(values=Tensor("RaggedGetItem_1/GatherV2:0", shape=(?,), dtype=float32), row_splits=Tensor("RaggedGetItem_1/RaggedRange:0", shape=(6,), dtype=int64))
    ## Warning: Negative numbers are interpreted python-style when subsetting tensorflow tensors.(they select items by counting from the back). For more details, see: https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#basic-slicing-and-indexing
    ## To turn off this warning, set 'options(tensorflow.extract.warn_negatives_pythonic = FALSE)'
    ## tf.RaggedTensor(values=Tensor("RaggedGetItem_2/GatherV2:0", shape=(?,), dtype=float32), row_splits=Tensor("RaggedGetItem_2/RaggedRange:0", shape=(6,), dtype=int64))
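
    As a plain-R analogy (not the TensorFlow API), the kinds of slices the output above appears to correspond to can be sketched on an ordinary nested list: one whole row, the first two values of each row, and the last two values of each row. The variable names here are illustrative assumptions, and the `digits` data is copied from the earlier example.

```r
# Plain-R analogy only: a ragged tensor modeled as a list of numeric vectors.
rows <- list(c(3, 1, 4, 1), numeric(0), c(5, 9, 2), 6, numeric(0))

# A single row (base R lists are 1-based).
first_row <- rows[[1]]

# First two values of each row; short or empty rows simply yield fewer values.
first_two <- lapply(rows, head, n = 2)

# Last two values of each row.
last_two <- lapply(rows, tail, n = 2)
```

    Note that the sliced results are still ragged: every row keeps its own length, which is why the TensorFlow results above are RaggedTensors rather than dense tensors.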

    And just like normal tensors, you can use R arithmetic and comparison operators to perform elementwise operations. For more information, see the section on Overloaded Operators below.

    digits + 3
    ## tf.RaggedTensor(values=Tensor("Add_3:0", shape=(8,), dtype=float32), row_splits=Tensor("RaggedConstant/RaggedFromRowSplits/row_splits:0", shape=(6,), dtype=int64))
    digits + tf$ragged$constant(list(list(1, 2, 3, 4), list(), list(5, 6, 7), list(8), list()))
    ## tf.RaggedTensor(values=Tensor("Add_5:0", shape=(?,), dtype=float32), row_splits=Tensor("RaggedTile_1/concat_1:0", shape=(6,), dtype=int64))

    If you need to apply an elementwise transformation to the values of a RaggedTensor, you can use tf$ragged$map_flat_values, which takes a function plus one or more arguments, and applies the function to transform the RaggedTensor’s values.

    ## tf.RaggedTensor(values=Tensor("Add_6:0", shape=(8,), dtype=float32), row_splits=Tensor("RaggedConstant/RaggedFromRowSplits/row_splits:0", shape=(6,), dtype=int64))
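
    To make the idea concrete, here is a minimal TensorFlow-free sketch of what a flat-values mapping does: the function is applied to the flat vector of values, and the row partition is carried over unchanged. The helpers `map_flat_values_sketch` and `to_rows` are hypothetical names introduced for illustration, and the transformation `x * 2 + 1` is an assumed example, not necessarily the one that produced the output above.

```r
# Plain-R model of the `digits` ragged tensor: flat values plus 0-based row boundaries.
values <- c(3, 1, 4, 1, 5, 9, 2, 6)
row_splits <- c(0, 4, 4, 7, 8, 8)

# Hypothetical helper: transform the flat values, leave the partition untouched.
map_flat_values_sketch <- function(f, values, row_splits, ...) {
  list(values = f(values, ...), row_splits = row_splits)
}

# Hypothetical helper: rebuild the nested rows for inspection.
to_rows <- function(values, row_splits) {
  lapply(seq_len(length(row_splits) - 1), function(i) {
    start <- row_splits[i]
    n <- row_splits[i + 1] - start
    values[start + seq_len(n)]
  })
}

result <- map_flat_values_sketch(function(x) x * 2 + 1, values, row_splits)
rows <- to_rows(result$values, result$row_splits)
```

    Because only the values change, the result has the same number of rows and the same row lengths as the input, including the two empty rows.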

    Constructing a ragged tensor

    The simplest way to construct a ragged tensor is using tf$ragged$constant, which builds the RaggedTensor corresponding to a given nested list:

    sentences <- tf$ragged$constant(list(
        list("Let's", "build", "some", "ragged", "tensors", "!"),
        list("We", "can", "use", "tf.ragged.constant", ".")))
    paragraphs <- tf$ragged$constant(list(
        list(list('I', 'have', 'a', 'cat'), list('His', 'name', 'is', 'Mat')),
        list(list('Do', 'you', 'want', 'to', 'come', 'visit'), list("I'm", 'free', 'tomorrow'))
    ))
    paragraphs
    ## tf.RaggedTensor(values=tf.RaggedTensor(values=Tensor("RaggedConstant_4/values:0", shape=(17,), dtype=string), row_splits=Tensor("RaggedConstant_4/RaggedFromRowSplits/row_splits:0", shape=(5,), dtype=int64)), row_splits=Tensor("RaggedConstant_4/RaggedFromRowSplits_1/row_splits:0", shape=(3,), dtype=int64))

    Ragged tensors can also be constructed by pairing flat values tensors with row-partitioning tensors indicating how those values should be divided into rows, using factory classmethods such as tf$RaggedTensor$from_value_rowids, tf$RaggedTensor$from_row_lengths, and tf$RaggedTensor$from_row_splits.
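
    The three row-partitioning encodings describe the same information, so each can be derived from the others. The following is a minimal plain-R sketch (no TensorFlow calls) that starts from row lengths and derives the other two; row ids and offsets are 0-based, as in TensorFlow, and the variable names are illustrative.

```r
# One partition of 8 values into 4 rows, starting from the row lengths.
row_lengths <- c(4L, 0L, 3L, 1L)

# row_splits: cumulative offsets starting at 0; one more entry than there are rows.
row_splits <- c(0L, cumsum(row_lengths))          # 0 4 4 7 8

# value_rowids: one 0-based row id per value.
value_rowids <- rep(seq_along(row_lengths) - 1L, times = row_lengths)

# Going back: lengths are the differences between consecutive splits ...
lengths_from_splits <- diff(row_splits)

# ... or the number of values carrying each row id.
lengths_from_rowids <- tabulate(value_rowids + 1L, nbins = length(row_lengths))
```

    One asymmetry is worth noting: value_rowids alone cannot express trailing empty rows, so the number of rows has to be supplied separately in that case (here via `nbins`).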

    tf$RaggedTensor$from_value_rowids

    If you know which row each value belongs in, then you can build a RaggedTensor using a value_rowids row-partitioning tensor:

    tf$RaggedTensor$from_value_rowids(
        values=as.integer(c(3, 1, 4, 1, 5, 9, 2, 6)),
        value_rowids=as.integer(c(0, 0, 0, 0, 2, 2, 2, 3)))
    ## tf.RaggedTensor(values=Tensor("RaggedFromValueRowIds/values:0", shape=(8,), dtype=int32), row_splits=Tensor("RaggedFromValueRowIds/concat:0", shape=(5,), dtype=int64))
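
    To see which rows the call above describes, the same grouping can be sketched in plain R with `split()`. This is only an illustration of the encoding, not how TensorFlow stores the data; the number of rows is assumed to be the largest row id plus one.

```r
values <- c(3L, 1L, 4L, 1L, 5L, 9L, 2L, 6L)
value_rowids <- c(0L, 0L, 0L, 0L, 2L, 2L, 2L, 3L)

# Assumption: no trailing empty rows, so the row count is max id + 1.
nrows <- max(value_rowids) + 1L

# Listing every row id as a factor level keeps rows that received no values.
rows <- unname(split(values, factor(value_rowids, levels = seq_len(nrows) - 1L)))
```

    No value carries row id 1, so the second row comes out empty: the rows are (3, 1, 4, 1), (), (5, 9, 2), and (6).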

    tf$RaggedTensor$from_row_lengths

    If you know how long each row is, then you can use a row_lengths row-partitioning tensor:

    ## tf.RaggedTensor(values=Tensor("RaggedFromRowLengths/values:0", shape=(8,), dtype=int32), row_splits=Tensor("RaggedFromRowLengths/concat:0", shape=(5,), dtype=int64))
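
    The output above is consistent with a call along the following lines. This is a sketch under assumptions: it reuses the eight values from the from_value_rowids example, and the row lengths c(4, 0, 3, 1) are chosen to describe the same four rows.

```r
library(tensorflow)

# Assumed data: same values as the from_value_rowids example,
# with row lengths that reproduce the same four rows.
rt_from_lengths <- tf$RaggedTensor$from_row_lengths(
  values = as.integer(c(3, 1, 4, 1, 5, 9, 2, 6)),
  row_lengths = as.integer(c(4, 0, 3, 1))
)
```

    The row lengths must sum to the number of values (here 4 + 0 + 3 + 1 = 8).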

    tf$RaggedTensor$from_row_splits

    If you know the index where each row starts and ends, then you can use a row_splits row-partitioning tensor:

    ## tf.RaggedTensor(values=Tensor("RaggedFromRowSplits/values:0", shape=(8,), dtype=int32), row_splits=Tensor("RaggedFromRowSplits/row_splits:0", shape=(5,), dtype=int64))
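
    The output above is consistent with a call along the following lines. As a sketch under assumptions, it reuses the eight values from the from_value_rowids example, with split points c(0, 4, 4, 7, 8) chosen to describe the same four rows.

```r
library(tensorflow)

# Assumed data: same values as the from_value_rowids example.
# Row i spans values from row_splits[i] up to (but not including) row_splits[i + 1],
# counting offsets from 0 as TensorFlow does.
rt_from_splits <- tf$RaggedTensor$from_row_splits(
  values = as.integer(c(3, 1, 4, 1, 5, 9, 2, 6)),
  row_splits = as.integer(c(0, 4, 4, 7, 8))
)
```

    The split vector starts at 0, ends at the number of values, and has one more entry than there are rows; a repeated entry (4, 4) marks an empty row.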

    See the tf.RaggedTensor class documentation for a full list of factory methods.

    What you can store in a ragged tensor

    As with normal Tensors, the values in a RaggedTensor must all have the same type, and they must all be at the same nesting depth (the rank of the tensor):

    tf$ragged$constant(list(list("Hi"), list("How", "are", "you"))) # ok: type=string, rank=2
    ## tf.RaggedTensor(values=Tensor("RaggedConstant_5/values:0", shape=(4,), dtype=string), row_splits=Tensor("RaggedConstant_5/RaggedFromRowSplits/row_splits:0", shape=(3,), dtype=int64))
    tf$ragged$constant(list(list("one", "two"), list(3, 4))) # bad: multiple types
    tf$ragged$constant(list("A", list("B", "C"))) # bad: multiple nesting depths
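
    The two rules can be checked on a nested R list before it is handed to TensorFlow. The following is a rough plain-R sketch; `leaf_depths`, `leaf_types`, and `is_uniform` are hypothetical helpers written for illustration, and TensorFlow's own validation may differ in detail (for example in how it treats mixed numeric types).

```r
# Hypothetical helper: nesting depth of every leaf value in a nested list.
leaf_depths <- function(x, depth = 0L) {
  if (!is.list(x)) return(depth)
  unlist(lapply(x, leaf_depths, depth = depth + 1L))
}

# Hypothetical helper: R class of every leaf value.
leaf_types <- function(x) {
  rapply(x, function(v) class(v)[1], how = "unlist")
}

# Uniform means: one leaf type and one nesting depth.
is_uniform <- function(x) {
  length(unique(leaf_depths(x))) <= 1 && length(unique(leaf_types(x))) <= 1
}

ok <- is_uniform(list(list("Hi"), list("How", "are", "you")))      # TRUE
mixed_types <- is_uniform(list(list("one", "two"), list(3, 4)))     # FALSE
mixed_depths <- is_uniform(list("A", list("B", "C")))               # FALSE
```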

    This is a brief introduction to ragged tensors in TensorFlow. See the complete tutorial (in Python) here.