DRBuilder

Tag: Modeling

DRBuilder creates dimension reduction models.

See also

GTDR guide
The guide to the Generic Tool for Dimension Reduction (GTDR), the pSeven Core component for data compression and decompression that DRBuilder uses to build dimension reduction models.

Sections

In sample-based mode, the block receives an uncompressed data sample on the x_sample input port and, depending on the selected reduction type, either the target compressed dimension on the dimension input port or the maximum magnitude of the reconstruction error on the error port. The block can also work in Feature Extraction mode, where it receives two data samples (variables and responses) on the x_sample and f_sample input ports.

The dimension reduction model may be output to the model port and/or saved to disk.

Options

  • Common options
  • Dimension-based and error-based DR options
  • Feature Extraction options
    • GTDR/Accelerator - five-position switch to control the trade-off between speed and accuracy.
    • GTDR/DiffFilterSize - preferred number of steps along each coordinate for numerical differentiation.
    • GTDR/NumDiffStep - relative numerical differentiation step.
    • GTDR/SurrogateModelType - specify the algorithm for the internal approximator used in sample-based Feature Extraction (added in 1.9.6).

GTDR/Accelerator

Five-position switch to control the trade-off between speed and accuracy.

Value: integer in range \([1, 5]\)
Default: 2

This option controls training time by changing the values of other options (currently it affects the Feature Extraction technique only). If the user later modifies any of these dependent options, the user's value overrides the one set by GTDR/Accelerator.

Possible values range from 1 (low speed, highest quality) to 5 (high speed, lower quality).

GTDR/DiffFilterSize

Set the preferred number of steps along each coordinate for numerical differentiation.

Value: integer in range \([1, 10]\)
Default: 1

GTDR/InputNanMode

Specifies how to handle non-numeric values in the input part of the training sample.

Value: "raise", "ignore"
Default: "raise"

New in version 6.8.

GTDR cannot obtain any information from non-numeric (NaN or infinity) values of variables. This option controls its behavior when such values are encountered. Default ("raise") means to raise an exception; "ignore" means to exclude data points with non-numeric values from the sample and continue training.
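The two behaviors can be illustrated with a minimal pure-Python sketch. It is not the GTDR implementation: the function name `filter_sample` and its `nan_mode` parameter are hypothetical and only mirror the documented "raise" and "ignore" semantics.

```python
import math

def filter_sample(points, nan_mode="raise"):
    """Hypothetical helper mimicking the documented "raise"/"ignore" policy."""
    if nan_mode not in ("raise", "ignore"):
        raise ValueError(f"unknown mode: {nan_mode!r}")
    clean = []
    for row in points:
        if all(math.isfinite(v) for v in row):
            clean.append(row)           # fully numeric point: keep it
        elif nan_mode == "raise":
            raise ValueError(f"non-numeric value in data point {row!r}")
        # nan_mode == "ignore": drop the point and continue
    return clean

sample = [[1.0, 2.0], [float("nan"), 3.0], [4.0, float("inf")], [5.0, 6.0]]
kept = filter_sample(sample, nan_mode="ignore")
```

With "ignore", only the two fully numeric points remain in `kept`; with "raise", the same call fails on the first point that contains NaN.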

GTDR/LogLevel

Set minimum log level.

Value: "Debug", "Info", "Warn", "Error", "Fatal"
Default: "Info"

If this option is set, only messages with a log level greater than or equal to the threshold are written to the log.
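As a rough illustration of the threshold rule, the sketch below orders the levels as they are listed above, from "Debug" (lowest) to "Fatal" (highest). The helper `is_logged` is hypothetical and is not part of any pSeven API.

```python
# Levels in ascending order of severity, as listed for GTDR/LogLevel.
LEVELS = ["Debug", "Info", "Warn", "Error", "Fatal"]

def is_logged(message_level, threshold="Info"):
    """Return True if a message at message_level passes the threshold."""
    return LEVELS.index(message_level) >= LEVELS.index(threshold)

# With the default threshold, "Debug" messages are suppressed.
visible = [level for level in LEVELS if is_logged(level, "Info")]
```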

GTDR/MaxParallel

Set the maximum number of parallel threads to use when building a model.

Value: positive integer or 0 (auto)
Default: 0 (auto)

New in version 5.0rc1.

GTDR can run in parallel to speed up model training. This option sets the maximum number of threads the builder is allowed to create. The default setting (0) uses the value given by the OMP_NUM_THREADS environment variable, which by default equals the number of virtual processors, including hyperthreading CPUs. Any other value overrides OMP_NUM_THREADS.
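The resolution order described above can be sketched as follows. This is only an approximation under stated assumptions: `resolve_thread_limit` is a hypothetical helper, it treats OMP_NUM_THREADS as a single positive integer (OpenMP also allows a comma-separated list), and it falls back to `os.cpu_count()` when the variable is unset.

```python
import os

def resolve_thread_limit(max_parallel, environ=None):
    """Hypothetical sketch: pick the thread limit for a GTDR/MaxParallel value."""
    env = os.environ if environ is None else environ
    if max_parallel < 0:
        raise ValueError("GTDR/MaxParallel must be 0 (auto) or a positive integer")
    if max_parallel > 0:
        return max_parallel             # explicit value overrides OMP_NUM_THREADS
    value = env.get("OMP_NUM_THREADS", "")
    if value.isdigit() and int(value) > 0:
        return int(value)               # auto: follow the environment variable
    return os.cpu_count() or 1          # assumed fallback: logical processor count

explicit = resolve_thread_limit(4, {"OMP_NUM_THREADS": "8"})
auto = resolve_thread_limit(0, {"OMP_NUM_THREADS": "8"})
```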

GTDR/MinImprove

Required significance of decrease in reconstruction error.

Value: floating point number in range \((0, 1]\)
Default: 0.01

The dimension-based DR procedure can increase reconstruction accuracy by approximating the nonlinear deviation of the reconstructed manifold from the linear hyperplane given by the principal components. This approximation is done iteratively, and the process stops when the relative decrease in reconstruction error becomes less than required, that is, when \((1 - \epsilon_{curr} / \epsilon_{prev}) < m\), where \(\epsilon_{curr}\) and \(\epsilon_{prev}\) are the reconstruction error values on the current and previous iterations, respectively, and \(m\) is the GTDR/MinImprove option value.
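The stopping rule can be checked numerically with a short sketch. The function names and the error sequence are made up for illustration; only the inequality \((1 - \epsilon_{curr} / \epsilon_{prev}) < m\) comes from the description above.

```python
def should_stop(err_prev, err_curr, min_improve=0.01):
    """Evaluate (1 - err_curr / err_prev) < m for one pair of iterations."""
    if err_prev <= 0:
        raise ValueError("previous reconstruction error must be positive")
    return (1.0 - err_curr / err_prev) < min_improve

def first_stop_index(errors, min_improve=0.01):
    """Index of the first iteration whose improvement is below the threshold."""
    for i in range(1, len(errors)):
        if should_stop(errors[i - 1], errors[i], min_improve):
            return i
    return None  # the criterion never triggered on this sequence

# Illustrative error values, not real GTDR output.
errors = [0.50, 0.30, 0.25, 0.249]
stop_at = first_stop_index(errors)
```

Here the relative improvements are 0.4, about 0.167, and 0.004, so with the default \(m = 0.01\) the criterion triggers on the last step.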

GTDR/Normalize

Enable or disable normalization.

Value: Boolean or "Auto"
Default: "Auto"

In some cases, components of the input vector should be normalized, i.e. centered and then divided by the corresponding standard deviation. Such a transformation is useful when components of the input vector have different physical meanings (for example, are expressed in different physical units).

Normalization is always applied when this option is on and never applied when it is off. If the option is left at its default ("Auto"), normalization is done automatically when needed.
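The transformation itself (centering, then dividing by the standard deviation) can be sketched in plain Python. This is an illustration only: `normalize_columns` is a hypothetical helper, and the use of the population standard deviation is an assumption, since the text does not say which estimator GTDR uses.

```python
import statistics

def normalize_columns(rows):
    """Center each column and divide it by its standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Assumption: population standard deviation; GTDR may use another estimator.
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Two components on very different scales (e.g. different physical units).
rows = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
normalized = normalize_columns(rows)
```

After normalization both columns have zero mean and unit spread, so neither dominates the other merely because of its units.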

GTDR/NumDiffStep

Relative numerical differentiation step.

Value: floating point number in range \((0, 0.1]\)
Default: \(10^{-7}\)

This option value sets the relative step size to use in numerical differentiation.
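To show what a relative step means in practice, here is a minimal forward-difference sketch. How GTDR actually scales the step is not stated here, so the rule \(h = s \cdot \max(|x|, 1)\) and the function `forward_difference` are assumptions made for illustration.

```python
def forward_difference(f, x, rel_step=1e-7):
    """Forward-difference derivative with a step proportional to |x|."""
    if not 0.0 < rel_step <= 0.1:
        raise ValueError("relative step must be in (0, 0.1]")
    # Assumed scaling rule: step relative to |x|, with a floor of 1 near zero.
    h = rel_step * max(abs(x), 1.0)
    return (f(x + h) - f(x)) / h

# d/dx of x^2 at x = 3 is 6; the estimate is close to that value.
slope = forward_difference(lambda x: x * x, 3.0)
```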

GTDR/SurrogateModelType

Specify the algorithm for the internal approximator used in sample-based Feature Extraction.

Value: "HDA", "GP", "HDAGP", "SGP", "TA", "iTA", "RSM", or "Auto"
Default: "Auto"

New in version 1.9.6.

Changed in version 1.10.0: now allows any technique except LR and SPLT.

When using sample-based Feature Extraction, GTDR first trains a surrogate model on the given sample; this model is then used to estimate the projection matrix (see the GTDR User manual for details). GTDR/SurrogateModelType sets the technique used to train the model. This option is directly analogous to GTApprox/Technique, except that it does not allow selecting the LR and SPLT techniques and has simplified default logic.

By default, the technique is selected automatically based on the sample size \(|S|\):

  • \(|S| < 1000\): selects Gaussian Processes (GP).
  • \(|S| \geq 1000\): selects High-Dimensional Approximation (HDA).
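The default selection rule above amounts to a single threshold on the sample size. The sketch below restates it; `auto_surrogate_type` is a hypothetical name, not a GTDR function.

```python
def auto_surrogate_type(sample_size):
    """Return the technique chosen by "Auto" for a given sample size |S|."""
    if sample_size < 0:
        raise ValueError("sample size cannot be negative")
    return "GP" if sample_size < 1000 else "HDA"

small = auto_surrogate_type(999)
large = auto_surrogate_type(1000)
```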

GTDR/Technique

Specify the technique for dimension reduction.

Value: "NLPCA" or "PCA"
Default: "NLPCA"

This option lets the user explicitly specify the technique to use for dimension reduction. By default, Non-Linear Principal Component Analysis ("NLPCA") is used; it can be changed to Principal Component Analysis ("PCA").

For details on these techniques, refer to the GTDR manual.