8.2. Option Reference

  • Common options
    • GTDR/LogLevel - minimum log level.
    • GTDR/InputNanMode - specifies how to handle non-numeric values in the input part of the training sample (added in 6.8).
    • GTDR/MaxParallel - maximum number of parallel threads (added in 5.0 Release Candidate 1, updated in 6.17).
    • GTDR/Normalize - enable or disable normalization.
  • Dimension-based and error-based DR options
    • GTDR/MinImprove - required significance of decrease in reconstruction error.
    • GTDR/Technique - specify the technique for dimension reduction.
  • Feature Extraction options
    • GTDR/Accelerator - five-position switch to control the trade-off between speed and accuracy.
    • GTDR/DiffFilterSize - preferred number of steps by each coordinate for numerical differentiation.
    • GTDR/NumDiffStep - relative numerical differentiation step.
    • GTDR/SurrogateModelType - specify the algorithm for the internal approximator used in sample-based Feature Extraction (added in 1.9.6).

GTDR/Accelerator

Five-position switch to control the trade-off between speed and accuracy.

Value: integer in range \([1, 5]\)
Default: 2

This option controls training time by adjusting the values of other options (currently it affects the Feature Extraction technique only). If any of these dependent options is later set explicitly by the user, the explicit setting overrides the value assigned through GTDR/Accelerator.

Possible values range from 1 (lowest speed, highest quality) to 5 (highest speed, lowest quality).
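
A minimal sketch of the override semantics, assuming the pSeven Core Python API (da.p7core.gtdr); the exact set of dependent options is internal to GTDR, so the calls below are illustrative only:

    # Assumes the da.p7core.gtdr builder API; option names are as documented.
    from da.p7core import gtdr

    builder = gtdr.Builder()
    # Coarse preset: 1 = slowest, most accurate; 5 = fastest, least accurate.
    builder.options.set("GTDR/Accelerator", 4)
    # Any dependent option set explicitly after this point keeps its
    # explicit value, overriding whatever the preset assigned to it.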

GTDR/DiffFilterSize

Set preferred number of steps by each coordinate for numerical differentiation.

Value: integer in range \([1, 10]\)
Default: 1

GTDR/InputNanMode

Specifies how to handle non-numeric values in the input part of the training sample.

Value:"raise", "ignore"
Default:"raise"

New in version 6.8.

GTDR cannot obtain any information from non-numeric (NaN or infinity) values of variables. This option controls the tool's behavior when such values are encountered in the input part of the training sample. The default ("raise") raises an exception; "ignore" excludes data points with non-numeric values from the sample and continues training.
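
A minimal sketch of the two modes, assuming the pSeven Core Python API (da.p7core.gtdr); the build() call is illustrative, see build() for the actual argument combinations:

    import numpy as np
    from da.p7core import gtdr  # assumed import path

    x = np.array([[0.0, 1.0],
                  [np.nan, 2.0],   # dropped under "ignore", fatal under "raise"
                  [3.0, 4.0]])

    builder = gtdr.Builder()
    builder.options.set("GTDR/InputNanMode", "ignore")
    model = builder.build(x)  # trains on the two fully numeric rows only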

GTDR/LogLevel

Set minimum log level.

Value:"Debug", "Info", "Warn", "Error", "Fatal"
Default:"Info"

If this option is set, only messages with log level greater than or equal to the threshold are written to the log.
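
A minimal sketch, assuming the pSeven Core Python API; loggers.StdOutLogger is assumed to be the stock console logger, so check the loggers reference for the exact name:

    from da.p7core import gtdr, loggers  # assumed import paths

    builder = gtdr.Builder()
    builder.set_logger(loggers.StdOutLogger())     # where messages go
    builder.options.set("GTDR/LogLevel", "Debug")  # which messages get through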

GTDR/MaxParallel

Sets the maximum number of parallel threads to use when training a model.

Value: integer in range \([1, 512]\), or 0 (auto)
Default: 0 (auto)

New in version 5.0 Release Candidate 1.

GTDR can run in parallel to speed up model training. This option sets the maximum number of threads the builder is allowed to create.

Changed in version 6.12: auto (0) tries to detect hyper-threading CPUs in order to use only physical cores.

Changed in version 6.15: added the upper limit for the option value, previously was any positive integer.

Changed in version 6.17: changed the upper limit to 512 (was 100000).

Default (auto) behavior depends on the value of the OMP_NUM_THREADS environment variable.

If OMP_NUM_THREADS is set to a valid value, that value becomes the default maximum number of threads. Note that OMP_NUM_THREADS must be set before the Python interpreter starts.

If OMP_NUM_THREADS is unset, set to 0, or set to an invalid value, the default maximum number of threads is equal to the number of cores detected by GTDR. However, if a hyper-threading CPU is detected, the default maximum number of threads is set to half the number of cores (so that only physical cores are used).

The behavior described above is only for the default (0) option value. If you set this option to a non-default value, it will be the maximum number of threads, regardless of your CPU.
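
A minimal sketch, assuming the pSeven Core Python API (da.p7core.gtdr). Since OMP_NUM_THREADS must be set before the interpreter starts, exporting it from within Python does not affect the auto default; setting the option explicitly avoids the environment dependence altogether:

    # In the shell (affects the auto default):  OMP_NUM_THREADS=4 python train.py
    from da.p7core import gtdr  # assumed import path

    builder = gtdr.Builder()
    builder.options.set("GTDR/MaxParallel", 4)  # explicit cap, ignores the environment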

GTDR/MinImprove

Required significance of decrease in reconstruction error.

Value: floating point number in range \((0, 1]\)
Default: 0.01

The dimension-based DR procedure can increase reconstruction accuracy by approximating the nonlinear deviation of the reconstructed manifold from the linear hyperplane spanned by the principal components. This approximation is performed iteratively, and the process stops when the decrease in reconstruction error becomes insignificant, that is, when \((1 - \epsilon_{curr} / \epsilon_{prev}) < m\), where \(\epsilon_{curr}\) and \(\epsilon_{prev}\) are the reconstruction error values on the current and previous iterations, respectively, and \(m\) is the GTDR/MinImprove option value.
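
A worked check of the stopping rule in plain Python (independent of the GTDR API):

    def keep_iterating(eps_prev, eps_curr, min_improve=0.01):
        """True while the relative decrease in error stays significant."""
        return (1.0 - eps_curr / eps_prev) >= min_improve

    print(keep_iterating(0.100, 0.090))   # True: 10% decrease >= 1%
    print(keep_iterating(0.090, 0.0895))  # False: ~0.6% decrease < 1%, stop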

GTDR/Normalize

Enable or disable normalization.

Value: Boolean or "Auto"
Default: "Auto"

In some cases, components of the input vector should be normalized, i.e. centered and then scaled by the corresponding standard deviation. Such a transformation is useful when components of the input vector have different physical meanings (are expressed in different physical units).

Normalization is always enabled if this option is on and always disabled if it is off. With the default value ("Auto"), normalization is applied automatically when needed.
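
What the transformation amounts to, shown in plain NumPy (an illustration, not the GTDR internals):

    import numpy as np

    x = np.array([[10.0, 0.001],
                  [20.0, 0.002],
                  [30.0, 0.003]])   # columns in very different units

    x_norm = (x - x.mean(axis=0)) / x.std(axis=0)
    # Both columns now have zero mean and unit variance, so neither
    # dominates the reduction merely because of its scale.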

GTDR/NumDiffStep

Relative numerical differentiation step.

Value: floating point number in range \((0, 0.1]\)
Default: \(10^{-7}\)

This option value sets the relative step size to use in numerical differentiation.
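
An illustration of a relative differentiation step in plain NumPy; the exact scaling GTDR applies internally may differ (the step floor below is an assumption):

    import numpy as np

    def central_diff(f, x, rel_step=1e-7):
        h = rel_step * max(abs(x), 1.0)   # step scales with |x|
        return (f(x + h) - f(x - h)) / (2.0 * h)

    print(central_diff(np.sin, 1.0))  # ~cos(1) = 0.5403...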

GTDR/SurrogateModelType

Specify the algorithm for the internal approximator used in sample-based Feature Extraction.

Value:"HDA", "GP", "HDAGP", "SGP", "TA", "iTA", "RSM", or "Auto"
Default:"Auto"

New in version 1.9.6.

Changed in version 1.10.0: now allows any technique except LR and SPLT.

When using sample-based Feature Extraction (see build() for the corresponding combination of arguments), GTDR first trains a surrogate model on the given sample, which is then used in projection matrix estimation (see GTDR User manual for details). GTDR/SurrogateModelType sets the technique used to train this model. The option is directly analogous to GTApprox/Technique, except that it does not allow selecting the LR and SPLT techniques and has simplified default logic.

By default, the technique is selected automatically based on the sample size \(|S|\):

  • \(|S| < 1000\): selects Gaussian Processes (GP).
  • \(|S| \geq 1000\): selects High-Dimensional Approximation (HDA).
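
The documented selection rule, expressed as plain Python for clarity:

    def default_surrogate(sample_size):
        """Mirror of the documented GTDR/SurrogateModelType default logic."""
        return "GP" if sample_size < 1000 else "HDA"

    print(default_surrogate(500))   # GP
    print(default_surrogate(5000))  # HDA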

GTDR/Technique

Specify the technique for dimension reduction.

Value:"NLPCA" or "PCA"
Default:"NLPCA"

This option allows the user to explicitly specify the technique used for dimension reduction. By default, Non-Linear Principal Component Analysis ("NLPCA") is used; it can be changed to Principal Component Analysis ("PCA").
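
A minimal sketch, assuming the pSeven Core Python API (da.p7core.gtdr); the build() arguments, including the target-dimension keyword, are illustrative assumptions:

    import numpy as np
    from da.p7core import gtdr  # assumed import path

    x = np.random.rand(200, 6)                    # 200 points in 6 dimensions

    builder = gtdr.Builder()
    builder.options.set("GTDR/Technique", "PCA")  # override the NLPCA default
    model = builder.build(x, dim=2)               # reduce to 2 dimensions (dim kwarg assumed)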

For details on these techniques, refer to the GTDR manual.