teras.models.TransformerEncoderBackbone


class teras.models.TransformerEncoderBackbone(input_dim, embedding_dim, num_layers=6, num_heads=8, feedforward_dim=None, attention_dropout=0.0, feedforward_dropout=0.0, layer_norm_epsilon=1e-05, unnormalized_layers=[], pre_normalization=False, **kwargs)

Transformer Encoder model as proposed in the “Attention is all you need” paper.

Reference(s):

https://arxiv.org/abs/1706.03762

Parameters:
  • input_dim (int) – dimensionality of the input data.

  • embedding_dim (int) – dimensionality of the embeddings used by the model, also referred to as d_model or the model dimensionality.

  • num_layers (int) – number of TransformerEncoderLayer layers to use in the encoder. Defaults to 6.

  • num_heads (int) – number of attention heads to use in the MultiHeadAttention layer. Defaults to 8.

  • feedforward_dim (int) – hidden dimensionality to use in the TransformerFeedForward layer. Defaults to None.

  • attention_dropout (float) – dropout value to use in the MultiHeadAttention layer. Defaults to 0.

  • feedforward_dropout (float) – dropout value to use in the TransformerFeedForward layer. Defaults to 0.

  • layer_norm_epsilon (float) – epsilon value to use in the LayerNormalization layer. Defaults to 1e-5.

  • unnormalized_layers (list) – indices of the layers in which LayerNormalization is not used. For instance, to skip normalization in the first TransformerEncoderLayer (as FT-Transformer does), pass [0]. To skip it in the first and second layers, pass [0, 1], and so on. Defaults to [] (empty list), because the original Transformer architecture and most others use normalization in all of their layers.

  • pre_normalization (bool) – whether to use the pre-normalization technique, whereby LayerNormalization is applied to the inputs of the MultiHeadAttention or FeedForward layer and the outputs of that layer are then added elementwise to the original inputs. Defaults to False, as the original Transformer architecture doesn’t use pre-normalization.
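
The interaction between unnormalized_layers and pre_normalization can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the teras implementation: layer_norm, residual_block and encoder_stack are hypothetical helpers, each encoder layer is reduced to a single residual sublayer (a toy stand-in for attention or feed-forward), and the learned scale and shift of LayerNormalization are omitted.

```python
import numpy as np


def layer_norm(x, epsilon=1e-5):
    """Normalize the last axis to zero mean and unit variance (no learned scale/shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + epsilon)


def residual_block(x, sublayer, normalize=True, pre_normalization=False, epsilon=1e-5):
    """Wrap `sublayer` in a residual connection (hypothetical helper)."""
    if not normalize:
        # Layer index was listed in unnormalized_layers: plain residual connection.
        return x + sublayer(x)
    if pre_normalization:
        # Pre-norm: normalize the sublayer's input, add its output to the raw input.
        return x + sublayer(layer_norm(x, epsilon))
    # Post-norm (original Transformer): normalize the sum of input and sublayer output.
    return layer_norm(x + sublayer(x), epsilon)


def encoder_stack(x, sublayers, unnormalized_layers=(), pre_normalization=False):
    """Apply the sublayers in order, skipping normalization for the listed indices."""
    for index, sublayer in enumerate(sublayers):
        x = residual_block(
            x,
            sublayer,
            normalize=index not in unnormalized_layers,
            pre_normalization=pre_normalization,
        )
    return x


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8))  # (batch, features, embedding_dim)

# Three toy sublayers standing in for the real attention/feed-forward layers.
weights = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(3)]
sublayers = [lambda h, w=w: np.tanh(h @ w) for w in weights]

post = encoder_stack(x, sublayers)                          # defaults: post-norm everywhere
pre = encoder_stack(x, sublayers, pre_normalization=True)   # pre-norm everywhere
skip_first = encoder_stack(                                 # pre-norm, layer 0 unnormalized
    x, sublayers, unnormalized_layers=[0], pre_normalization=True
)
```

In this sketch the post-norm output is normalized along the embedding axis, while the pre-norm output is not, because the last operation in a pre-norm layer is the residual addition.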

__init__(input_dim, embedding_dim, num_layers=6, num_heads=8, feedforward_dim=None, attention_dropout=0.0, feedforward_dropout=0.0, layer_norm_epsilon=1e-05, unnormalized_layers=[], pre_normalization=False, **kwargs)

Methods

__init__(input_dim, embedding_dim[, ...])

add_loss(loss)

Can be called inside of the call() method to add a scalar loss.

add_metric()

add_variable(shape, initializer[, dtype, ...])

Add a weight variable to the layer.

add_weight([shape, initializer, dtype, ...])

Add a weight variable to the layer.

build(input_shape)

build_from_config(config)

Builds the layer's states with the supplied config dict.

call(*args, **kwargs)

compile([optimizer, loss, loss_weights, ...])

Configures the model for training.

compile_from_config(config)

Compiles the model with the information given in config.

compiled_loss(y, y_pred[, sample_weight, ...])

compute_loss([x, y, y_pred, sample_weight, ...])

Compute the total loss, validate it, and return it.

compute_mask(inputs, previous_mask)

compute_metrics(x, y, y_pred[, sample_weight])

Update metric states and collect all metrics to be returned.

compute_output_shape(*args, **kwargs)

compute_output_spec(*args, **kwargs)

count_params()

Count the total number of scalars composing the weights.

evaluate([x, y, batch_size, verbose, ...])

Returns the loss value & metrics values for the model in test mode.

export(filepath[, format])

Create a TF SavedModel artifact for inference.

fit([x, y, batch_size, epochs, verbose, ...])

Trains the model for a fixed number of epochs (dataset iterations).

from_config(config)

Creates an operation from its config.

get_build_config()

Returns a dictionary with the layer's input shape.

get_compile_config()

Returns a serialized config with information for compiling the model.

get_config()

Returns the config of the object.

get_layer([name, index])

Retrieves a layer based on either its name (unique) or index.

get_metrics_result()

Returns the model's metrics values as a dict.

get_weights()

Return the values of layer.weights as a list of NumPy arrays.

load_own_variables(store)

Loads the state of the layer.

load_weights(filepath[, skip_mismatch])

Load weights from a file saved via save_weights().

loss(y, y_pred[, sample_weight])

make_predict_function([force])

make_test_function([force])

make_train_function([force])

predict(x[, batch_size, verbose, steps, ...])

Generates output predictions for the input samples.

predict_on_batch(x)

Returns predictions for a single batch of samples.

predict_step(data)

quantize(mode)

Quantize the weights of the model.

quantized_call(*args, **kwargs)

reset_metrics()

save(filepath[, overwrite, zipped])

Saves a model as a .keras file.

save_own_variables(store)

Saves the state of the layer.

save_weights(filepath[, overwrite])

Saves all layer weights to a .weights.h5 file.

set_weights(weights)

Sets the values of layer.weights from a list of NumPy arrays.

stateless_call(trainable_variables, ...[, ...])

Call the layer without any side effects.

stateless_compute_loss(trainable_variables, ...)

summary([line_length, positions, print_fn, ...])

Prints a string summary of the network.

symbolic_call(*args, **kwargs)

test_on_batch(x[, y, sample_weight, return_dict])

Test the model on a single batch of samples.

test_step(data)

to_json(**kwargs)

Returns a JSON string containing the network configuration.

train_on_batch(x[, y, sample_weight, ...])

Runs a single gradient update on a single batch of data.

train_step(data)

Attributes

compiled_metrics

compute_dtype

The dtype of the computations performed by the layer.

distribute_reduction_method

distribute_strategy

dtype

Alias of layer.variable_dtype.

dtype_policy

input

Retrieves the input tensor(s) of a symbolic operation.

input_dtype

The dtype layer inputs should be converted to.

input_spec

jit_compile

layers

losses

List of scalar losses from add_loss, regularizers and sublayers.

metrics

List of all metrics.

metrics_names

metrics_variables

List of all metric variables.

non_trainable_variables

List of all non-trainable layer state.

non_trainable_weights

List of all non-trainable weight variables of the layer.

output

Retrieves the output tensor(s) of a layer.

path

The path of the layer.

quantization_mode

The quantization mode of this layer, None if not quantized.

run_eagerly

supports_masking

Whether this layer supports computing a mask using compute_mask.

trainable

Settable boolean, whether this layer should be trainable or not.

trainable_variables

List of all trainable layer state.

trainable_weights

List of all trainable weight variables of the layer.

variable_dtype

The dtype of the state (weights) of the layer.

variables

List of all layer state, including random seeds.

weights

List of all weight variables of the layer.