text_one_hot

One-hot encode a text into a list of word indexes in a vocabulary of size n.

 
text_one_hot( 
  input_text, 
  n, 
  filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n", 
  lower = TRUE, 
  split = " ", 
  text = NULL 
)

Arguments	Description
input_text	Input text (string).
n	Size of vocabulary (integer).
filters	Sequence of characters to filter out such as punctuation. Default includes basic punctuation, tabs, and newlines.
lower	Whether to convert the input to lowercase.
split	Separator used to split the text into words (string).
text	Kept for compatibility purposes. Use `input_text` instead.

A list of integers in [1, n]. Each integer encodes a word. Uniqueness is not guaranteed: two different words may be assigned the same integer.
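
The non-uniqueness noted above follows from mapping an open-ended set of words onto only n integers. The base-R sketch below illustrates this with a hypothetical helper, `sketch_one_hot`, which is not part of keras. It mirrors the documented arguments (`filters`, `lower`, `split`), but its hash (the sum of each word's code points, modulo n) is a toy assumption chosen so the result can be checked by hand. The hash used by the real `text_one_hot` is different, so actual indexes will not match these.

```r
# Hypothetical helper for illustration only; NOT the keras implementation.
sketch_one_hot <- function(input_text, n,
                           filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n",
                           lower = TRUE,
                           split = " ") {
  if (lower) input_text <- tolower(input_text)

  # Replace every filtered character with the split marker.
  filter_chars <- strsplit(filters, "", fixed = TRUE)[[1]]
  chars <- strsplit(input_text, "", fixed = TRUE)[[1]]
  chars[chars %in% filter_chars] <- split
  cleaned <- paste(chars, collapse = "")

  # Split into words and drop the empty strings left by repeated separators.
  words <- strsplit(cleaned, split, fixed = TRUE)[[1]]
  words <- words[nzchar(words)]

  # Toy hash (an assumption): sum of code points, folded into 1..n.
  vapply(
    words,
    function(w) as.integer(sum(utf8ToInt(w)) %% n + 1L),
    integer(1),
    USE.NAMES = FALSE
  )
}

sketch_one_hot("The cat sat on the mat.", n = 10)
# [1] 2 3 9 2 2 3
```

With n = 10, "the" and "on" both map to 2, and "cat" and "mat" both map to 3. Choosing n well above the number of distinct words makes such collisions less likely but does not rule them out.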