layer_attention
Creates attention layer
Description
Dot-product attention layer, a.k.a. Luong-style attention.
Usage
layer_attention(
  inputs,
  use_scale = FALSE,
  causal = FALSE,
  batch_size = NULL,
  dtype = NULL,
  name = NULL,
  trainable = NULL,
  weights = NULL
)
Arguments
Arguments | Description |
---|---|
inputs | A list of inputs: the first element should be the query tensor, the second the value tensor. |
use_scale | If TRUE, creates a scalar variable that scales the attention scores. |
causal | Boolean. Set to TRUE for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past. |
batch_size | Fixed batch size for the layer. |
dtype | The data type expected by the input, as a string (float32, float64, int32, …). |
name | An optional name string for the layer. Should be unique in a model (do not reuse the same name twice). It will be autogenerated if it isn't provided. |
trainable | Whether the layer weights will be updated during training. |
weights | Initial weights for the layer. |
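Examples
A minimal sketch of wiring the layer into a functional model. The vocabulary size (1000), embedding dimension (64), and the pooling/dense head are illustrative assumptions, not part of this reference.
library(keras)

# Two variable-length integer sequences: one supplies the query, the other the value
query_input <- layer_input(shape = list(NULL), dtype = "int32")
value_input <- layer_input(shape = list(NULL), dtype = "int32")

# A shared embedding (illustrative sizes) applied to both inputs
embedding <- layer_embedding(input_dim = 1000, output_dim = 64)
query_embedded <- embedding(query_input)
value_embedded <- embedding(value_input)

# First element of the list is the query tensor, the second is the value tensor
attn_out <- layer_attention(list(query_embedded, value_embedded))

# Illustrative head: pool over the sequence dimension and classify
output <- attn_out %>%
  layer_global_average_pooling_1d() %>%
  layer_dense(units = 1, activation = "sigmoid")

model <- keras_model(inputs = list(query_input, value_input), outputs = output)
For decoder self-attention, pass the same tensor as both query and value and set causal = TRUE, e.g. layer_attention(list(x, x), causal = TRUE).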
See Also
Other core layers: layer_activation(), layer_activity_regularization(), layer_dense_features(), layer_dense(), layer_dropout(), layer_flatten(), layer_input(), layer_lambda(), layer_masking(), layer_permute(), layer_repeat_vector(), layer_reshape()