Creates attention layer


Dot-product attention layer, a.k.a. Luong-style attention.
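
The computation behind dot-product attention can be sketched in base R for a single batch element. This is an illustrative approximation rather than the layer's actual implementation: the helper names `softmax_rows` and `dot_product_attention` are hypothetical, the value tensor is assumed to double as the key, and `scale` stands in for the scalar that `use_scale` would learn.

```r
# Row-wise softmax; subtracting each row's maximum keeps exp() numerically stable.
softmax_rows <- function(x) {
  e <- exp(x - apply(x, 1, max))
  e / rowSums(e)
}

# query: [Tq, dim] matrix, value: [Tv, dim] matrix (one batch element).
# Assumption: the value matrix is also used as the key.
dot_product_attention <- function(query, value, scale = 1) {
  scores <- scale * (query %*% t(value))  # [Tq, Tv] query-key dot products
  weights <- softmax_rows(scores)         # each row sums to 1
  weights %*% value                       # [Tq, dim] weighted sum of values
}

set.seed(1)
query <- matrix(rnorm(3 * 4), nrow = 3)  # 3 query positions, 4 features
value <- matrix(rnorm(5 * 4), nrow = 5)  # 5 value positions, 4 features
out <- dot_product_attention(query, value)
dim(out)
```

The output has one row per query position and the same feature dimension as the value matrix.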


layer_attention(
  inputs,
  use_scale = FALSE,
  causal = FALSE,
  batch_size = NULL,
  dtype = NULL,
  name = NULL,
  trainable = NULL,
  weights = NULL
)


Arguments Description
inputs A list of input tensors: the first should be the query tensor, the second the value tensor.
use_scale If TRUE, creates a scalar variable to scale the attention scores.
causal Boolean. Set to TRUE for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past.
batch_size Fixed batch size for layer
dtype The data type expected by the input, as a string (float32, float64, int32…)
name An optional name string for the layer. Should be unique in a model (do not reuse the same name twice). It will be autogenerated if it isn’t provided.
trainable Whether the layer weights will be updated during training.
weights Initial weights for layer.
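
The effect of the causal argument can be illustrated with a small base-R sketch. It is not the layer's own code: the function name `causal_attention_weights` is hypothetical, and masked scores are set to `-Inf` here for simplicity, which may differ from how the layer applies its mask internally.

```r
# Attention weights for self-attention with a causal mask.
# query, key: [T, dim] matrices for a single batch element.
causal_attention_weights <- function(query, key) {
  scores <- query %*% t(key)  # [T, T] dot-product scores
  # Mask every entry where key position j lies after query position i (j > i).
  scores[col(scores) > row(scores)] <- -Inf
  # Row-wise softmax; masked entries become exactly 0 because exp(-Inf) is 0.
  e <- exp(scores - apply(scores, 1, max))
  e / rowSums(e)
}

set.seed(1)
x <- matrix(rnorm(4 * 3), nrow = 4)  # 4 positions, 3 features
w <- causal_attention_weights(x, x)
round(w, 3)
```

Every entry above the diagonal of `w` is zero, so position i only draws on positions 1 through i, and the first position attends solely to itself.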

See Also

Other core layers: layer_activation(), layer_activity_regularization(), layer_dense_features(), layer_dense(), layer_dropout(), layer_flatten(), layer_input(), layer_lambda(), layer_masking(), layer_permute(), layer_repeat_vector(), layer_reshape()