One-hot encode a text into a list of word indexes in a vocabulary of size n.

    Splits a text into words and encodes each word as an integer index in a vocabulary of size n.

    text_one_hot(
      text,
      n,
      filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n",
      lower = TRUE,
      split = " "
    )

    Arguments

    text

    Input text (string).

    n

    Size of the vocabulary (integer).

    filters

    Sequence of characters to filter out, such as punctuation. The default includes basic punctuation, tabs, and newlines.

    lower

    Whether to convert the input to lowercase.

    split

    Separator used to split the text into words (string).
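
    To make the roles of filters, lower, and split concrete, here is a minimal base R sketch of the preprocessing they describe. The helper sketch_tokenize is hypothetical and not part of the package; it assumes that filtered characters are replaced by the split marker and that empty tokens are dropped, which may differ in detail from what text_one_hot does internally.

    ```r
    # Hypothetical helper, not part of the package: mimics the documented
    # roles of `filters`, `lower`, and `split` using base R only.
    sketch_tokenize <- function(text,
                                filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n",
                                lower = TRUE,
                                split = " ") {
      if (lower) text <- tolower(text)
      # Assumption: every filtered character is replaced by the split marker.
      filter_chars <- strsplit(filters, "")[[1]]
      chars <- strsplit(text, "")[[1]]
      chars[chars %in% filter_chars] <- split
      text <- paste(chars, collapse = "")
      # Split on the marker and drop the empty tokens left by repeated markers.
      words <- strsplit(text, split, fixed = TRUE)[[1]]
      words[nzchar(words)]
    }

    sketch_tokenize("The quick, brown fox!")
    ```

    Under these assumptions, the comma and the exclamation mark are removed and the remaining words are lowercased before any index is assigned.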

    Value

    List of integers in [1, n], each encoding one word of the text. Uniqueness is not guaranteed: different words may be assigned the same integer.
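
    The following base R sketch illustrates why two different words can receive the same integer when words are mapped into a fixed range [1, n]. The helper sketch_index and its toy polynomial hash are assumptions made for illustration only; they are not the hash used by text_one_hot, so the actual indexes returned by the package will differ.

    ```r
    # Hypothetical helper: a toy polynomial hash, NOT the package's hash.
    # It maps each word deterministically to an integer in [1, n].
    sketch_index <- function(words, n) {
      vapply(words, function(w) {
        h <- 0
        # Fold the Unicode code points of the word into a value in [0, n - 1].
        for (code in utf8ToInt(w)) h <- (h * 31 + code) %% n
        as.integer(h + 1)
      }, integer(1), USE.NAMES = FALSE)
    }

    words <- c("the", "cat", "sat", "on", "the", "mat")
    idx <- sketch_index(words, n = 3)

    # Five distinct words share only three possible indexes, so at least
    # two different words must collide.
    length(unique(words))
    length(unique(idx))
    ```

    The same word always maps to the same index, but with n smaller than the number of distinct words, collisions are unavoidable. Choosing n well above the expected vocabulary size makes them less likely.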

    See also