teras.preprocessing.TVAEDataTransformer


class teras.preprocessing.TVAEDataTransformer(continuous_features=None, categorical_features=None, max_clusters=10, std_multiplier=4, weight_threshold=0.005, covariance_type='full', weight_concentration_prior_type='dirichlet_process', weight_concentration_prior=0.001)

TVAEDataTransformer behaves exactly like CTGANDataTransformer; it is a thin wrapper provided for convenience.

Reference(s):

https://arxiv.org/abs/1907.00503
sdv-dev/CTGAN (GitHub repository)

Parameters:
  • categorical_features (Union[List[str], Tuple[str]]) – Names of the categorical features in the dataset.

  • continuous_features (Union[List[str], Tuple[str]]) – Names of the continuous features in the dataset.

  • max_clusters (int) – Maximum number of clusters to use in ModeSpecificNormalization. Defaults to 10.

  • std_multiplier (int) – Factor by which the standard deviation is multiplied during normalization. Defaults to 4.

  • weight_threshold (float) – Minimum weight a mixture component must have to be considered valid. Components whose weights_ value falls below this threshold are ignored. (Taken from the official implementation.) Defaults to 0.005.

  • covariance_type (str) – Passed to scikit-learn's Bayesian Gaussian mixture model (sklearn.mixture.BayesianGaussianMixture). Defaults to “full”.

  • weight_concentration_prior_type (str) – Passed to scikit-learn's Bayesian Gaussian mixture model (sklearn.mixture.BayesianGaussianMixture). Defaults to “dirichlet_process”.

  • weight_concentration_prior (float) – Passed to scikit-learn's Bayesian Gaussian mixture model (sklearn.mixture.BayesianGaussianMixture). Defaults to 0.001.
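To make the roles of weight_threshold and std_multiplier concrete, the following is a minimal standard-library sketch of the mode-specific normalization idea described in the referenced paper. It is not the teras implementation: the Mode class, the helper functions, and the component values are all hypothetical, and the mode is picked by hand rather than sampled from component probabilities.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mode:
    """One mixture component. The values used below are made up, not fitted."""
    weight: float
    mean: float
    std: float


def valid_modes(modes: List[Mode], weight_threshold: float = 0.005) -> List[Mode]:
    """Keep only the components whose weight exceeds the threshold."""
    return [m for m in modes if m.weight > weight_threshold]


def normalize(value: float, mode: Mode, std_multiplier: int = 4) -> float:
    """Scale a value relative to one mode: (x - mean) / (std_multiplier * std)."""
    return (value - mode.mean) / (std_multiplier * mode.std)


def denormalize(scaled: float, mode: Mode, std_multiplier: int = 4) -> float:
    """Invert normalize() for the same mode."""
    return scaled * std_multiplier * mode.std + mode.mean


# Hypothetical components for a single continuous feature.
modes = [Mode(0.600, 10.0, 2.0), Mode(0.398, 50.0, 5.0), Mode(0.002, 90.0, 1.0)]

kept = valid_modes(modes)                # the 0.002-weight component is dropped
scaled = normalize(14.0, kept[0])        # (14 - 10) / (4 * 2)
restored = denormalize(scaled, kept[0])  # recovers the original value
```

With the default std_multiplier of 4, values within four standard deviations of the chosen mode's mean map into the interval [-1, 1].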

__init__(continuous_features=None, categorical_features=None, max_clusters=10, std_multiplier=4, weight_threshold=0.005, covariance_type='full', weight_concentration_prior_type='dirichlet_process', weight_concentration_prior=0.001)

Methods

__init__([continuous_features, ...])

fit(x)

fit_transform(x)

get_metadata()

load(filename)

Loads the saved state of CTGANDataTransformer from a JSON file.

reverse_transform(x_generated)

Reverse-transforms the generated data back to the original data format.

save(filename)

Saves the fitted state of a CTGANDataTransformer instance in JSON format for portability.

transform(**kwargs)
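The save and load methods listed above persist fitted state as JSON. The sketch below illustrates that round trip with the standard library only. The save_state and load_state helpers and the layout of the state dictionary are assumptions made for illustration; the actual keys written by teras are not documented here.

```python
import json
import os
import tempfile


def save_state(state: dict, filename: str) -> None:
    """Write a JSON-serializable state dictionary to disk."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)


def load_state(filename: str) -> dict:
    """Read a state dictionary back from a JSON file."""
    with open(filename, "r", encoding="utf-8") as f:
        return json.load(f)


# Hypothetical fitted state; the real structure may differ.
state = {
    "continuous_features": ["age"],
    "categorical_features": ["city"],
    "std_multiplier": 4,
    "modes": {"age": {"means": [10.0, 50.0], "stds": [2.0, 5.0]}},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "transformer_state.json")
    save_state(state, path)
    restored_state = load_state(path)
```

JSON has no tuple or array type, so tuples come back as lists and NumPy arrays would need converting to lists before saving.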

Attributes

metadata