Reuters newswire topics classification

    Dataset of 11,228 newswires from Reuters, labeled over 46 topics. As with dataset_imdb() , each wire is encoded as a sequence of word indexes (same conventions).

    dataset_reuters(
      path = "reuters.npz",
      num_words = NULL,
      skip_top = 0L,
      maxlen = NULL,
      test_split = 0.2,
      seed = 113L,
      start_char = 1L,
      oov_char = 2L,
      index_from = 3L
    )
    
    dataset_reuters_word_index(path = "reuters_word_index.pkl")

    Arguments

    path

    Where to cache the data (relative to ~/.keras/dataset).

    num_words

    Max number of words to include. Words are ranked by how often they occur (in the training set) and only the most frequent words are kept

    skip_top

    Skip the top N most frequently occuring words (which may not be informative).

    maxlen

    Truncate sequences after this length.

    test_split

    Fraction of the dataset to be used as test data.

    seed

    Random seed for sample shuffling.

    start_char

    The start of a sequence will be marked with this character. Set to 1 because 0 is usually the padding character.

    oov_char

    words that were cut out because of the num_words or skip_top limit will be replaced with this character.

    index_from

    index actual words with this index and higher.

    Value

    Lists of training and test data: train$x, train$y, test$x, test$y with same format as dataset_imdb(). The dataset_reuters_word_index() function returns a list where the names are words and the values are integer. e.g. word_index[["giraffe"]] might return 1234.

    See also