dataset_reuters
Reuters newswire topics classification
Description
Dataset of 11,228 newswires from Reuters, labeled over 46 topics. As with dataset_imdb() , each wire is encoded as a sequence of word indexes (same conventions).
Usage
 
dataset_reuters( 
  path = "reuters.npz", 
  num_words = NULL, 
  skip_top = 0L, 
  maxlen = NULL, 
  test_split = 0.2, 
  seed = 113L, 
  start_char = 1L, 
  oov_char = 2L, 
  index_from = 3L 
) 
 
dataset_reuters_word_index(path = "reuters_word_index.pkl") Arguments
| Arguments | Description | 
|---|---|
| path | Where to cache the data (relative to ~/.keras/dataset). | 
| num_words | Max number of words to include. Words are ranked by how often they occur (in the training set) and only the most frequent words are kept | 
| skip_top | Skip the top N most frequently occuring words (which may not be informative). | 
| maxlen | Truncate sequences after this length. | 
| test_split | Fraction of the dataset to be used as test data. | 
| seed | Random seed for sample shuffling. | 
| start_char | The start of a sequence will be marked with this character. Set to 1 because 0 is usually the padding character. | 
| oov_char | words that were cut out because of the num_wordsorskip_toplimit will be replaced with this character. | 
| index_from | index actual words with this index and higher. | 
Value
Lists of training and test data: train$x, train$y, test$x, test$y with same format as dataset_imdb(). The dataset_reuters_word_index() function returns a list where the names are words and the values are integer. e.g. word_index[["giraffe"]] might return 1234.
See Also
Other datasets: dataset_boston_housing(), dataset_cifar100(), dataset_cifar10(), dataset_fashion_mnist(), dataset_imdb(), dataset_mnist()