teras.utils.get_metadata_for_embedding


teras.utils.get_metadata_for_embedding(dataframe, categorical_features=None, numerical_features=None)

Utility function that creates, from a given dataframe, the feature metadata required by the Categorical and Numerical embedding layers in Teras. For numerical features, it maps each feature name to its feature index. For categorical features, it maps each feature name to a tuple of the feature index and the vocabulary (the unique values) of that feature. This metadata is usually required by architectures that create embeddings of numerical or categorical features, such as TabTransformer, TabNet, FT-Transformer, etc.

Parameters:
  • dataframe (DataFrame) – Input dataframe

  • categorical_features – List of names of categorical features in the input dataset

  • numerical_features – List of names of numerical features in the input dataset

Returns:

A dictionary containing two sub-dictionaries, one for categorical features and one for numerical features. The categorical sub-dictionary maps each categorical feature name to a tuple of its feature index and the list of unique values (vocabulary) in that feature: {feature_name: (feature_idx, vocabulary)}. The numerical sub-dictionary maps each numerical feature name to its feature index: {feature_name: feature_idx}.
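
The shape of the returned metadata can be illustrated with a small stand-in written in plain pandas. This is a sketch of the behavior described on this page, not the Teras implementation: the function name sketch_metadata_for_embedding, the top-level keys "categorical" and "numerical", the use of column position as the feature index, and the sorted vocabulary order are all assumptions made for illustration.

```python
import pandas as pd


def sketch_metadata_for_embedding(dataframe, categorical_features=None,
                                  numerical_features=None):
    """Illustrative stand-in only; NOT the Teras implementation."""
    categorical_features = categorical_features or []
    numerical_features = numerical_features or []
    columns = list(dataframe.columns)

    # Assumption: the top-level key names are hypothetical.
    metadata = {"categorical": {}, "numerical": {}}

    for name in categorical_features:
        # Assumption: feature index = column position in the dataframe,
        # vocabulary = sorted unique non-null values of the column.
        vocabulary = sorted(dataframe[name].dropna().unique().tolist())
        metadata["categorical"][name] = (columns.index(name), vocabulary)

    for name in numerical_features:
        metadata["numerical"][name] = columns.index(name)

    return metadata


df = pd.DataFrame({
    "age": [25, 32, 47],
    "city": ["Paris", "Oslo", "Paris"],
    "income": [40.0, 55.5, 61.2],
})

meta = sketch_metadata_for_embedding(
    df,
    categorical_features=["city"],
    numerical_features=["age", "income"],
)
print(meta)
# {'categorical': {'city': (1, ['Oslo', 'Paris'])}, 'numerical': {'age': 0, 'income': 2}}
```

With the real function, the call would presumably look the same apart from the name, but the exact key names and vocabulary ordering should be checked against the value Teras actually returns.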