teras.preprocessing.CTGANDataSampler


class teras.preprocessing.CTGANDataSampler(metadata, categorical_features, continuous_features=None, batch_size=512, seed=1337)

CTGANDataSampler class based on the data sampler class in the official CTGAN implementation.

Reference(s):

  • https://arxiv.org/abs/1907.00503

  • sdv-dev/CTGAN

Parameters:
  • metadata (dict) – Dictionary of metadata computed during data transformation. It is available from the get_metadata() method of a CTGANDataTransformer instance.

  • categorical_features (Union[List[str], Tuple[str]]) – Names of the categorical features. CTGAN requires the dataset to have at least one categorical feature; if yours has none, consider using a different generative model.

  • continuous_features (Union[List[str], Tuple[str]]) – Names of the continuous features. Defaults to None.

  • batch_size (int) – Batch size to use for the dataset. Defaults to 512.

  • seed (int) – Seed for random operations. Defaults to 1337.

__init__(metadata, categorical_features, continuous_features=None, batch_size=512, seed=1337)

Methods

__init__(metadata, categorical_features[, ...])

generator(x_transformed)

Used to create a TensorFlow dataset.

get_dataset(x_transformed, x_original)

sample_cond_vectors_for_generation(batch_size)

Unlike sample_cond_vectors_for_training, this method samples indices uniformly at random rather than according to the probabilities calculated as proposed in the paper.

sample_cond_vectors_for_training(batch_size)
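
The contrast between the two sample_cond_vectors_* methods can be illustrated with a small standalone sketch. It does not use teras and is not its implementation. It assumes the training probabilities are the log-frequency weights described in the CTGAN paper, and that generation picks a category uniformly at random. The names log_frequency_probs and sample_cond_vector, and the example counts, are hypothetical.

```python
import math
import random


def log_frequency_probs(counts):
    """Turn raw category counts into probabilities proportional to log(count + 1)."""
    logs = [math.log(c + 1) for c in counts]
    total = sum(logs)
    return [v / total for v in logs]


def sample_cond_vector(category_counts, rng, for_training):
    """Build one one-hot conditional vector spanning every category of every feature.

    category_counts holds one list of per-category counts per categorical feature.
    """
    sizes = [len(counts) for counts in category_counts]
    # Both modes pick the categorical feature uniformly at random.
    feature_idx = rng.randrange(len(sizes))
    if for_training:
        # Assumed training behaviour: weight categories by log frequency.
        probs = log_frequency_probs(category_counts[feature_idx])
        category_idx = rng.choices(range(sizes[feature_idx]), weights=probs, k=1)[0]
    else:
        # Assumed generation behaviour: every category is equally likely.
        category_idx = rng.randrange(sizes[feature_idx])
    # Offset of the chosen feature's block inside the concatenated vector.
    offset = sum(sizes[:feature_idx])
    vec = [0.0] * sum(sizes)
    vec[offset + category_idx] = 1.0
    return vec


rng = random.Random(1337)
counts = [[900, 90, 10], [500, 500]]  # hypothetical counts for two features
train_vec = sample_cond_vector(counts, rng, for_training=True)
gen_vec = sample_cond_vector(counts, rng, for_training=False)
```

Under this weighting, a category seen in 1% of rows is sampled far more often than 1% of the time, which is why log-frequency sampling gives rare categories more exposure during training.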