teras.preprocessing.TVAEDataSampler

class teras.preprocessing.TVAEDataSampler(metadata, categorical_features=None, continuous_features=None, batch_size=512, seed=1337)

Data sampler class for the TVAE architecture. It subclasses the CTGANDataSampler class from the CTGAN architecture.

The two classes share much of their functionality, since TVAE and CTGAN were proposed in the same paper and almost all preprocessing is the same for both. There are, however, a few differences in the get_dataset and generator methods, hence this subclass.

Reference(s):

https://arxiv.org/abs/1907.00503
sdv-dev/CTGAN

Parameters:
  • metadata (dict) – A dictionary of metadata computed during data transformation. You can obtain it from the .get_metadata() method of a TVAEDataTransformer instance.

  • categorical_features (Union[List[str], Tuple[str]]) – List of categorical feature names. CTGAN requires the dataset to have at least one categorical feature; if your dataset doesn’t contain any, consider using a different generative model.

  • continuous_features (Union[List[str], Tuple[str]]) – List of continuous feature names.

  • batch_size (int) – Batch size to use for the dataset. Defaults to 512.

  • seed (int) – Seed for random ops. Defaults to 1337.

__init__(metadata, categorical_features=None, continuous_features=None, batch_size=512, seed=1337)

Methods

  • __init__(metadata[, categorical_features, ...])

  • generator(x_transformed) – Used to create a TensorFlow dataset.

  • get_dataset(x_transformed[, x_original])

  • sample_cond_vectors_for_generation(batch_size) – Unlike sample_cond_vectors_for_training, this method samples indices purely at random rather than according to the calculated probabilities proposed in the paper.

  • sample_cond_vectors_for_training(batch_size)
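
The contrast between the two sample_cond_vectors_* methods can be illustrated with a small NumPy sketch. This is not the teras implementation: the helper names (log_frequency_probs, sample_category_indices, one_hot) are hypothetical, it handles a single categorical column only, and it assumes that "purely random" means uniform sampling and that the training probabilities are normalized log-frequencies (with a +1 inside the log to avoid log(0)).

```python
import numpy as np


def log_frequency_probs(counts):
    """Normalize log(count + 1) into a probability mass function.

    Assumption: this mirrors the log-frequency idea from the paper;
    the +1 is only there to keep log() defined for empty categories.
    """
    log_freq = np.log(np.asarray(counts, dtype=np.float64) + 1.0)
    return log_freq / log_freq.sum()


def sample_category_indices(counts, batch_size, rng, by_log_frequency):
    """Pick one category index per sample for a single categorical column."""
    num_categories = len(counts)
    if by_log_frequency:
        # Training-style sampling: weight categories by log-frequency.
        probs = log_frequency_probs(counts)
        return rng.choice(num_categories, size=batch_size, p=probs)
    # Generation-style sampling: every category is equally likely.
    return rng.integers(0, num_categories, size=batch_size)


def one_hot(indices, num_categories):
    """Encode the chosen indices as one-hot conditional vectors."""
    return np.eye(num_categories, dtype=np.float32)[indices]


rng = np.random.default_rng(1337)
counts = [900, 90, 10]  # hypothetical category counts for one column

train_idx = sample_category_indices(counts, 8, rng, by_log_frequency=True)
gen_idx = sample_category_indices(counts, 8, rng, by_log_frequency=False)

train_cond = one_hot(train_idx, len(counts))
gen_cond = one_hot(gen_idx, len(counts))
```

Taking the logarithm flattens the distribution: the dominant category here accounts for 90% of the rows but receives only about half of the sampling probability, so rare categories are seen more often during training than their raw frequency would allow.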