teras.layers.TransformerEncoderLayer
- class teras.layers.TransformerEncoderLayer(embedding_dim, num_heads=8, feedforward_dim=None, attention_dropout=0.0, feedforward_dropout=0.0, layer_norm_epsilon=1e-05, use_normalization=True, pre_normalization=False, **kwargs)
Transformer encoder layer as proposed in the original Transformer architecture in the “Attention Is All You Need” paper.
This layer is the building block of the encoder in that architecture. It is composed of MultiHeadAttention and TransformerFeedForward layers.
- Reference(s):
- Parameters:
embedding_dim (int) – Dimensionality of the embeddings used by the model. Also referred to as d_model or the model dimensionality.
num_heads (int) – Number of attention heads to use in the MultiHeadAttention layer. Defaults to 8.
feedforward_dim (int) – Hidden dimensionality to use in the TransformerFeedForward layer. Defaults to None.
attention_dropout (float) – Dropout value to use in the MultiHeadAttention layer. Defaults to 0.
feedforward_dropout (float) – Dropout value to use in the TransformerFeedForward layer. Defaults to 0.
layer_norm_epsilon (float) – Epsilon value to use in the LayerNormalization layer. Defaults to 1e-5.
use_normalization (bool) – Whether to use LayerNormalization. In some architectures, normalization isn’t applied to the very first layer; this parameter exists to accommodate them. Defaults to True.
pre_normalization (bool) – Whether to use the pre-normalization technique, whereby LayerNormalization is applied to the inputs of the MultiHeadAttention or FeedForward layer and the outputs of that layer are then added elementwise to the original inputs. Defaults to False, as the original Transformer architecture doesn’t use pre-normalization.
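The interaction between use_normalization and pre_normalization can be illustrated with a minimal pure-Python sketch. This is not the teras implementation: layer_norm, residual_block, and toy_sublayer are hypothetical helpers, the normalization has no learned scale or offset, and the behaviour with use_normalization=False (a plain residual connection) is an assumption.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, epsilon: float = 1e-5) -> Vector:
    """Normalize one embedding vector to zero mean and (near) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + epsilon) for v in x]


def residual_block(
    x: Vector,
    sublayer: Callable[[Vector], Vector],
    use_normalization: bool = True,
    pre_normalization: bool = False,
) -> Vector:
    """Wrap a sublayer (attention or feed-forward) in a residual connection."""
    if not use_normalization:
        # Assumed behaviour: residual connection only, no normalization.
        return [a + b for a, b in zip(x, sublayer(x))]
    if pre_normalization:
        # Pre-norm: normalize the sublayer input, then add to the original input.
        return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]
    # Post-norm (original Transformer): add first, then normalize the sum.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def toy_sublayer(x: Vector) -> Vector:
    """Hypothetical stand-in for attention or feed-forward: doubles each value."""
    return [2.0 * v for v in x]


x = [1.0, 2.0, 3.0, 4.0]
post = residual_block(x, toy_sublayer)
pre = residual_block(x, toy_sublayer, pre_normalization=True)
plain = residual_block(x, toy_sublayer, use_normalization=False)
```

With post-normalization the block output itself is normalized, whereas with pre-normalization the original input passes through the residual path unchanged.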
- Shapes:
Input Shape: (batch_size, num_features, embedding_dim)
Output Shape: (batch_size, num_features, embedding_dim)
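As a rough illustration of the shape contract and of the role of layer_norm_epsilon, the sketch below normalizes every embedding vector of a (batch_size, num_features, embedding_dim) nested list along its last axis. The helper names layer_norm_last_axis and shape are hypothetical, and the code stands in for the normalization step only, not for the full layer.

```python
import math


def layer_norm_last_axis(batch, epsilon=1e-5):
    """Normalize each embedding vector of a (batch, features, embedding) nested list."""
    out = []
    for sample in batch:
        normed_sample = []
        for vec in sample:
            mean = sum(vec) / len(vec)
            var = sum((v - mean) ** 2 for v in vec) / len(vec)
            # epsilon keeps the denominator non-zero when var == 0.
            normed_sample.append([(v - mean) / math.sqrt(var + epsilon) for v in vec])
        out.append(normed_sample)
    return out


def shape(t):
    """Return (batch_size, num_features, embedding_dim) of a nested list."""
    return (len(t), len(t[0]), len(t[0][0]))


# batch_size=2, num_features=3, embedding_dim=4
inputs = [[[float(f + d) for d in range(4)] for f in range(3)] for _ in range(2)]
inputs[0][0] = [5.0, 5.0, 5.0, 5.0]  # constant vector: zero variance

outputs = layer_norm_last_axis(inputs)
```

Because the normalization acts independently on each embedding vector, the output keeps the input's shape, and the constant vector maps to zeros rather than triggering a division by zero.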
- __init__(embedding_dim, num_heads=8, feedforward_dim=None, attention_dropout=0.0, feedforward_dropout=0.0, layer_norm_epsilon=1e-05, use_normalization=True, pre_normalization=False, **kwargs)
Methods
__init__(embedding_dim[, num_heads, ...])
add_loss(loss) – Can be called inside of the call() method to add a scalar loss.
add_metric()
add_variable(shape, initializer[, dtype, ...]) – Add a weight variable to the layer.
add_weight([shape, initializer, dtype, ...]) – Add a weight variable to the layer.
build(input_shape)
build_from_config(config) – Builds the layer's states with the supplied config dict.
call(inputs)
compute_mask(inputs, previous_mask)
compute_output_shape(*args, **kwargs)
compute_output_spec(*args, **kwargs)
count_params() – Count the total number of scalars composing the weights.
from_config(config) – Creates an operation from its config.
get_build_config() – Returns a dictionary with the layer's input shape.
get_config() – Returns the config of the object.
get_weights() – Return the values of layer.weights as a list of NumPy arrays.
load_own_variables(store) – Loads the state of the layer.
quantize(mode)
quantized_call(*args, **kwargs)
save_own_variables(store) – Saves the state of the layer.
set_weights(weights) – Sets the values of layer.weights from a list of NumPy arrays.
stateless_call(trainable_variables, ...[, ...]) – Call the layer without any side effects.
symbolic_call(*args, **kwargs)
Attributes
compute_dtype – The dtype of the computations performed by the layer.
dtype – Alias of layer.variable_dtype.
dtype_policy
input – Retrieves the input tensor(s) of a symbolic operation.
input_dtype – The dtype layer inputs should be converted to.
input_spec
losses – List of scalar losses from add_loss, regularizers and sublayers.
metrics – List of all metrics.
metrics_variables – List of all metric variables.
non_trainable_variables – List of all non-trainable layer state.
non_trainable_weights – List of all non-trainable weight variables of the layer.
output – Retrieves the output tensor(s) of a layer.
path – The path of the layer.
quantization_mode – The quantization mode of this layer, None if not quantized.
supports_masking – Whether this layer supports computing a mask using compute_mask.
trainable – Settable boolean, whether this layer should be trainable or not.
trainable_variables – List of all trainable layer state.
trainable_weights – List of all trainable weight variables of the layer.
variable_dtype – The dtype of the state (weights) of the layer.
variables – List of all layer state, including random seeds.
weights – List of all weight variables of the layer.