6.2. Option Reference
- Basic options:
- GTDF/Accelerator — five-position switch to control the trade-off between speed and accuracy.
- GTDF/AccuracyEvaluation — require accuracy evaluation.
- GTDF/ExactFitRequired — require the model to fit sample data exactly.
- GTDF/InternalValidation — enable or disable internal validation.
- GTDF/LogLevel — minimum log level.
- Advanced options:
- GTDF/Componentwise — perform componentwise approximation of the output (deprecated since 6.3).
- GTDF/DependentOutputs — assume that training outputs are dependent and do not use componentwise approximation (added in 6.3).
- GTDF/Deterministic — controls the behavior of randomized initialization algorithms in certain techniques (added in 5.2).
- GTDF/InputNanMode — specifies how to handle non-numeric values in the input part of the training sample (added in 6.8).
- GTDF/IVDeterministic — controls the behavior of the pseudorandom algorithm selecting data subsets in cross validation (added in 5.0).
- GTDF/IVSavePredictions — save model values calculated during internal validation (added in 3.0 Beta 1).
- GTDF/IVSeed — fixed seed used in the deterministic cross validation mode (added in 5.0).
- GTDF/IVSubsetCount — the number of subsets into which the high fidelity training sample is divided for cross validation (updated in 6.19).
- GTDF/IVSubsetSize — the size of a high fidelity training sample subset used as test data in a cross validation session (added in 6.19).
- GTDF/IVTrainingCount — an upper limit for the number of training sessions in cross validation (updated in 6.19).
- GTDF/MaxParallel — maximum number of parallel threads (added in 5.0 Release Candidate 1, updated in 6.17).
- GTDF/Seed — fixed seed used in the deterministic training mode (added in 5.2).
- GTDF/StoreTrainingSample — save a copy of training data with the model (added in 6.6).
- GTDF/Technique — specify the approximation algorithm to use.
- GTDF/UnbiasLowFidelityModel — try compensating the low-fidelity sample bias (added in 1.10.4).
- High Fidelity Approximation (HFA) options:
- GTDF/HFA/SurrogateModelType — specify the algorithm for the approximator used in the HFA technique (added in 1.10.2).
GTDF/Accelerator
Five-position switch to control the trade-off between speed and accuracy.
Value: integer in range \([1, 5]\)
Default: 1
This option controls training time by changing the values of other options. If any of these dependent options is later modified by the user, the user's changes override the settings previously made through GTDF/Accelerator.
Possible values are from 1 (low speed, highest quality) to 5 (high speed, lower quality).
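The override behavior can be sketched as follows. This is a minimal illustration of the documented precedence, not the actual GTDF internals, and the option names below are placeholders:

```python
def resolve_options(accelerator_implied, user_options):
    """Merge explicit user settings on top of the defaults implied by
    GTDF/Accelerator; per the docs, user changes always win."""
    merged = dict(accelerator_implied)
    merged.update(user_options)  # explicit user settings override the preset
    return merged

# Placeholder option names, purely illustrative:
implied = {"SomeOption/A": "fast", "SomeOption/B": 10}  # set by accelerator level
user = {"SomeOption/B": 50}                             # explicit user override
resolved = resolve_options(implied, user)
```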
GTDF/AccuracyEvaluation
Require accuracy evaluation.
Value: Boolean
Default: off
If on, then in addition to the approximation, the constructed model contains a function that estimates the approximation error over the design space.
GTDF/Componentwise
Perform componentwise approximation of the output.
Value: Boolean or "Auto"
Default: "Auto"
Deprecated since version 6.3: kept for compatibility, use GTDF/DependentOutputs instead.
Prior to 6.3, this option was used to enable componentwise approximation which was disabled by default.
Since 6.3, componentwise approximation is enabled by default and can be disabled with GTDF/DependentOutputs. Now if GTDF/Componentwise is default ("Auto"), GTDF/DependentOutputs takes priority. If GTDF/Componentwise is not default while GTDF/DependentOutputs is "Auto", then GTDF/Componentwise takes priority. In case of conflict (both options explicitly set on or off), GTDF raises InvalidOptionsError (but this conflict is ignored if the output is 1-dimensional).
GTDF/DependentOutputs
When training a model with multidimensional output, assume that training outputs are dependent and do not use the componentwise approximation mode.
Value: Boolean or "Auto"
Default: "Auto"
New in version 6.3.
Switches between the componentwise approximation mode and dependent outputs mode.
- When enabled (True): treat different components of the output as possibly dependent, and do not use componentwise approximation.
- When disabled (False): assume that output components are independent and use componentwise approximation.
- "Auto" (default): use componentwise approximation unless it is explicitly disabled by GTDF/Componentwise.
When GTDF/DependentOutputs is default ("Auto"), componentwise approximation is enabled unless GTDF/Componentwise is set to a non-default value, which takes priority. As a result, if GTDF/DependentOutputs is "Auto" but GTDF/Componentwise is False, componentwise approximation is disabled. This is done to avoid conflicts with older versions. Note that GTDF/Componentwise is a deprecated option, kept for version compatibility only, and should not be used since 6.3.
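The priority rules between the two options can be sketched as a small decision function. This is an illustration of the documented logic only, not the actual GTDF implementation:

```python
def use_componentwise(componentwise="Auto", dependent_outputs="Auto", output_dim=2):
    """Return True if componentwise approximation is used (sketch of the
    documented priority between GTDF/Componentwise and GTDF/DependentOutputs)."""
    if componentwise == "Auto":
        # GTDF/DependentOutputs takes priority; its "Auto" enables componentwise
        return dependent_outputs == "Auto" or not dependent_outputs
    if dependent_outputs == "Auto":
        return bool(componentwise)  # the deprecated option takes priority
    if componentwise == dependent_outputs and output_dim > 1:
        # Both explicitly on (or both off) contradict each other
        raise ValueError("InvalidOptionsError: conflicting settings")
    return bool(componentwise)
```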
GTDF/Deterministic
Controls the behavior of randomized initialization algorithms in certain techniques.
Value: Boolean
Default: on
New in version 5.2.
Several model training techniques in GTDF feature randomized initialization of their internal parameters. These techniques include:
- DA, which may automatically (after analyzing the training sample) select a randomized technique for the approximator used internally by GTDF.
- HFA, if GTDF/HFA/SurrogateModelType is set to use one of the randomized approximation techniques (HDA, HDAGP, or SGP, and TA in certain cases). Note that HFA can also select one of these techniques automatically if GTDF/HFA/SurrogateModelType is default.
- DA_BB and VFGP_BB — blackbox-based techniques which perform randomized sampling of a low-fidelity blackbox.
The determinacy of randomized techniques can be controlled in the following way:
- If GTDF/Deterministic is on (deterministic training mode, default), a fixed seed is used in all randomized algorithms. The seed is set by GTDF/Seed. This makes the technique behavior reproducible — for example, two models trained in deterministic mode with the same data, same GTDF/Seed and other settings will be exactly the same, since a training algorithm is initialized with the same parameters.
- Alternatively, if GTDF/Deterministic is off (non-deterministic training mode), a new seed is generated internally every time you train a model. As a result, models trained with randomized techniques may slightly differ even if all settings and training samples are the same. In this case, GTDF/Seed is ignored. The generated seed that was actually used for initialization can be found in model info, so later the training run can still be reproduced exactly by switching to the deterministic mode and setting GTDF/Seed to this value.
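The two modes can be illustrated with Python's random module. This is a conceptual sketch only; GTDF's internal generators are not specified here:

```python
import random

def randomized_init(seed=None):
    """Sketch: deterministic mode passes a fixed seed (as GTDF/Seed does);
    non-deterministic mode generates a fresh seed and reports it."""
    if seed is None:
        seed = random.SystemRandom().randrange(2**31)  # new seed every run
    rng = random.Random(seed)
    params = [rng.random() for _ in range(3)]          # randomized parameters
    return seed, params

# Deterministic mode: same seed, identical initialization
_, p1 = randomized_init(seed=15313)
_, p2 = randomized_init(seed=15313)
assert p1 == p2
# Non-deterministic mode: the used seed is reported, so the run can be replayed
used_seed, p3 = randomized_init()
_, p4 = randomized_init(seed=used_seed)
assert p3 == p4
```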
Note that GTDF/Deterministic and GTDF/Seed settings are passed to the approximator and (in case of blackbox-based techniques) sample generator used internally by GTDF; in fact, they indirectly set GTApprox/Deterministic, GTApprox/Seed, GTDoE/Deterministic, and GTDoE/Seed.
With randomized techniques, repeated non-deterministic training runs can be used to search for a more accurate approximation, since each run gives slightly different results. In contrast, deterministic techniques always produce exactly the same model given the same training data and settings, and are not affected by GTDF/Deterministic and GTDF/Seed. Deterministic techniques include:
- MFGP, SVFGP, VFGP — always deterministic.
- DA, which can be deterministic for certain training samples. In general, this technique is non-deterministic because its behavior depends on the automatic selection of the internal approximation technique (which can result in using a randomized technique).
- HFA, if GTDF/HFA/SurrogateModelType is set to use the LR, SPLT, GP, iTA, or RSM technique.
GTDF/ExactFitRequired
Require the model to fit sample data exactly.
Value: Boolean
Default: off
If on, the approximation fits the points of the training sample exactly. If GTDF/ExactFitRequired is off, no fitting condition is imposed, and the approximation may or may not fit the training points, depending on the training data (typically, noisy data means there will be no exact fit).
GTDF/HFA/SurrogateModelType
Specify the algorithm for the approximator used in the HFA technique.
Value: "LR", "SPLT", "HDA", "GP", "HDAGP", "SGP", "TA", "iTA", "RSM", "GBRT", "PLA", or "Auto"
Default: "Auto"
New in version 1.10.2.
This option allows you to explicitly specify the approximation algorithm used whenever the HFA technique is selected (manually or automatically). It is essentially the same as GTApprox/Technique, except that it does not allow selecting the Mixture of Approximators (MoA) technique. The default ("Auto"), as in GTApprox, means that the algorithm is selected automatically according to the GTApprox automatic technique selection logic (see the GTApprox user manual for details).
GTDF/InputNanMode
Specifies how to handle non-numeric values in the input part of the training sample.
Value: "raise", "ignore"
Default: "raise"
New in version 6.8.
GTDF cannot obtain any information from non-numeric (NaN or infinity) values of variables. This option controls its behavior when such values are encountered. The default ("raise") means to raise an exception; "ignore" means to exclude data points with non-numeric values from the sample and continue training.
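The two modes can be sketched with a plain-Python filter. The helper below is hypothetical, for illustration only, and is not a GTDF function:

```python
import math

def handle_input_nans(points, mode="raise"):
    """Sketch of GTDF/InputNanMode: 'raise' rejects non-numeric inputs,
    'ignore' drops the offending points and keeps the rest."""
    clean = [row for row in points if all(math.isfinite(v) for v in row)]
    if mode == "raise" and len(clean) != len(points):
        raise ValueError("non-numeric value in the input part of the sample")
    return clean

sample = [[0.0, 1.0], [float("nan"), 2.0], [3.0, float("inf")], [4.0, 5.0]]
# handle_input_nans(sample, "ignore") keeps only the two finite rows
```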
GTDF/InternalValidation
Enable or disable internal validation.
Value: Boolean
Default: off
If on, then in addition to the approximation, the constructed model contains a table of cross-validation errors of different types, which may serve as an indication of the expected accuracy of the approximation.
GTDF/IVDeterministic
Controls the behavior of the pseudorandom algorithm selecting data subsets in cross validation.
Value: Boolean
Default: on
New in version 5.0.
Cross validation involves partitioning the training sample into a number of subsets (defined by GTDF/IVSubsetCount) and randomized combination of these subsets for each training (validation) session. Since the algorithm that combines subsets is pseudorandom, its behavior can be controlled in the following way:
- If GTDF/IVDeterministic is on (deterministic cross validation mode, default), a fixed seed is used in the combination algorithm. The seed is set by GTDF/IVSeed. This makes cross-validation reproducible — a different combination is selected for each session, but if you repeat a cross validation run, for each session it will select the same combination as the first run.
- Alternatively, if GTDF/IVDeterministic is off (non-deterministic cross validation mode), a new seed is generated internally for every run, so cross validation results may slightly differ. In this case, GTDF/IVSeed is ignored. The generated seed that was actually used in cross validation can be found in model info, so results can still be reproduced exactly by switching to the deterministic mode and setting GTDF/IVSeed to this value.
The final model is never affected by GTDF/IVDeterministic, because it is always trained on the full sample.
GTDF/IVSavePredictions
Save model values calculated during internal validation.
Value: Boolean or "Auto"
Default: "Auto"
New in version 3.0 Beta 1.
If on, internal validation information, in addition to error values, also contains raw validation data: model values calculated during internal validation, as well as validation inputs and outputs.
GTDF/IVSeed
Fixed seed used in the deterministic cross validation mode.
Value: positive integer
Default: 15313
New in version 5.0.
Fixed seed for the pseudorandom algorithm that selects the combination of data subsets for each cross validation session. GTDF/IVSeed has an effect only if GTDF/IVDeterministic is on — see its description for more details.
GTDF/IVSubsetCount
The number of cross validation subsets.
Value: 0 (auto) or an integer in range \([2, |S|]\), where \(|S|\) is the high fidelity sample size
Default: 0 (auto)
Changed in version 6.19: GTDF/IVSubsetCount is no longer required to be less than GTDF/IVTrainingCount, since the latter now sets an upper limit for the number of cross validation sessions instead of the exact number of sessions.
The number of subsets into which the high fidelity training sample is divided for cross validation. The subsets are of approximately equal size.
GTDF/IVSubsetCount cannot be set together with GTDF/IVSubsetSize. Default (0) means that the number of subsets is determined by the sample size and GTDF/IVSubsetSize. If both options are default, the number and size of subsets are selected automatically based on the sample size.
GTDF/IVSubsetSize
The size of a cross validation subset.
Value: 0 (auto) or an integer in range \([1, \frac{2}{3}|S|]\), where \(|S|\) is the high fidelity training sample size
Default: 0 (auto)
New in version 6.19.
The size of a high fidelity sample subset used as test data in a cross validation session. This option may be more convenient than GTDF/IVSubsetCount when the high fidelity training sample size is not known in advance or is a parameter. In such cases, GTDF can automatically determine the required number of subsets, given their size. If the sample cannot be evenly divided into subsets of the given size, the sizes of some subsets are adjusted to fit. The maximum valid option value is \(\frac{2}{3}\) of the sample size; however, in this case the actual subset size is adjusted to \(\frac{1}{2}\) of the sample size.
Practically, this option configures leave-n-out cross validation, where n is the option value. Since the number of subsets, and hence the number of cross validation sessions, can become very high for small n, it is recommended to limit the number of sessions with GTDF/IVTrainingCount. Otherwise model training may take a long time, because each session trains a dedicated internal validation model.
GTDF/IVSubsetSize cannot be set together with GTDF/IVSubsetCount. Default (0) means that the subset size is determined by the high fidelity sample size and GTDF/IVSubsetCount. If both options are default, the number and size of subsets are selected automatically based on the high fidelity sample size.
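The relation between subset size and subset count can be sketched as follows. This is an approximation of the documented behavior; the exact adjustments GTDF applies are not specified here:

```python
def subsets_from_size(sample_size, subset_size):
    """Split sample indices into subsets of roughly the requested size:
    a sketch of the leave-n-out partitioning described above."""
    max_size = 2 * sample_size // 3
    if not 1 <= subset_size <= max_size:
        raise ValueError("GTDF/IVSubsetSize out of range")
    if subset_size > sample_size // 2:
        subset_size = sample_size // 2   # documented adjustment near the maximum
    count = max(2, round(sample_size / subset_size))
    # Even split with some subsets adjusted to absorb the remainder
    bounds = [sample_size * k // count for k in range(count + 1)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(count)]
```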
GTDF/IVTrainingCount
The maximum allowed number of training sessions in cross validation.
Value: positive integer or 0 (auto)
Default: 0 (auto)
Changed in version 6.19: now sets an upper limit instead of the exact number of sessions, and is no longer required to be less than GTDF/IVSubsetCount.
Each GTDF cross validation session includes the following steps:
- Select one of the cross validation subsets to be the test data. These subsets are taken from the high fidelity training sample only, since other samples contain lower quality data and must not be used for testing.
- Prepare the complement of the selected subset, which is the high fidelity training sample excluding the test data.
- Train an internal validation model, using this complement as the high fidelity training sample. All lower fidelity samples are used in full, only the high fidelity test data is excluded from training.
- Calculate error metrics for the validation model, using the previously selected test data subset.
Internal validation repeats such sessions with different test subsets, until the number of sessions reaches GTDF/IVTrainingCount, or there are no more subsets to test (each subset may be tested only once).
The number and sizes of cross validation subsets are determined by GTDF/IVSubsetCount and GTDF/IVSubsetSize, and are selected by GTDF if both these options are default. If GTDF/IVTrainingCount is also default, GTDF sets an appropriate limit for the number of sessions, based on the high fidelity training sample size.
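The session loop above can be sketched in Python. This is illustrative only; the actual subset combination in GTDF is pseudorandom and seeded (see GTDF/IVSeed), which is omitted here:

```python
def iv_sessions(hf_size, subset_count, training_count=0):
    """Enumerate cross validation sessions: each high fidelity subset is
    tested at most once, capped by training_count (0 means no explicit cap)."""
    bounds = [hf_size * k // subset_count for k in range(subset_count + 1)]
    subsets = [set(range(bounds[k], bounds[k + 1])) for k in range(subset_count)]
    limit = training_count if training_count > 0 else subset_count
    sessions = []
    for test in subsets[:limit]:
        train = sorted(set(range(hf_size)) - test)  # complement of the test data
        sessions.append((train, sorted(test)))      # train IV model, compute errors
    return sessions
```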
GTDF/LogLevel
Set minimum log level.
Value: "Debug", "Info", "Warn", "Error", "Fatal"
Default: "Info"
Only messages with a log level greater than or equal to this threshold are written to the log.
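The threshold behavior can be sketched as follows (an illustration, not the GTDF logger implementation):

```python
LOG_LEVELS = ["Debug", "Info", "Warn", "Error", "Fatal"]

def is_logged(message_level, threshold="Info"):
    """A message reaches the log only if its level is at or above the threshold."""
    return LOG_LEVELS.index(message_level) >= LOG_LEVELS.index(threshold)
```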
GTDF/MaxParallel
Sets the maximum number of parallel threads to use when training a model.
Value: integer in range \([1, 512]\), or 0 (auto)
Default: 0 (auto)
New in version 5.0 Release Candidate 1.
GTDF can run in parallel to speed up model training. This option sets the maximum number of threads the builder is allowed to create.
Changed in version 6.12: auto (0) tries to detect hyper-threading CPUs in order to use only physical cores.
Changed in version 6.15: added the upper limit for the option value, previously was any positive integer.
Changed in version 6.17: changed the upper limit to 512 (was 100000).
The default (auto) behavior depends on the value of the OMP_NUM_THREADS environment variable.
- If OMP_NUM_THREADS is set to a valid value, this value is the default maximum number of threads. Note that OMP_NUM_THREADS must be set before the Python interpreter starts.
- If OMP_NUM_THREADS is unset, set to 0, or set to an invalid value, the default maximum number of threads is equal to the number of cores detected by GTDF. However, if a hyper-threading CPU is detected, the default maximum number of threads is set to half the number of cores (to use only physical cores).
The behavior described above applies only to the default (0) option value. If you set this option to a non-default value, it becomes the maximum number of threads regardless of your CPU.
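The auto (0) resolution can be sketched as follows. This is a simplification: core count and hyper-threading detection are taken as inputs rather than detected:

```python
import os

def default_max_threads(detected_cores, hyperthreading=False):
    """Sketch of the default GTDF/MaxParallel behavior described above."""
    try:
        value = int(os.environ.get("OMP_NUM_THREADS", ""))
    except ValueError:
        value = 0                         # unset or invalid
    if value > 0:
        return value                      # a valid OMP_NUM_THREADS wins
    if hyperthreading:
        return detected_cores // 2        # use physical cores only
    return detected_cores
```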
GTDF/Seed
Fixed seed used in the deterministic training mode.
Value: positive integer
Default: 15313
New in version 5.2.
In the deterministic training mode, GTDF/Seed sets the seed for randomized initialization algorithms in certain techniques. See GTDF/Deterministic for more details.
GTDF/StoreTrainingSample
Save a copy of training data with the model.
Value: Boolean or "Auto"
Default: "Auto"
New in version 6.6.
If on, the trained model stores copies of the training samples, sorted in order of increasing fidelity, in training_sample. If off, this attribute is an empty list. The "Auto" setting currently defaults to off.
GTDF/Technique
Specify the approximation algorithm to use.
Value: "DA", "DA_BB", "HFA", "MFGP", "SVFGP", "VFGP", "VFGP_BB", or "Auto"
Default: "Auto"
This option allows you to specify the algorithm used in approximation. It only affects build() and build_BB() (see below for method compatibility). The dedicated build_MF() method disregards GTDF/Technique completely.
- Sample-based techniques, compatible with build() only:
  - "DA" — Difference Approximation
  - "HFA" — High Fidelity Approximation
  - "MFGP" — Multiple Fidelity Gaussian Process
  - "SVFGP" — Sparse Variable Fidelity Gaussian Process
  - "VFGP" — Variable Fidelity Gaussian Process
- Blackbox-based techniques, compatible with build_BB() only:
  - "DA_BB" — blackbox-based Difference Approximation
  - "VFGP_BB" — blackbox-based Variable Fidelity Gaussian Process
Note: the MFGP technique is compatible with build(), though it naturally limits the number of training samples to 2. That is, build(x_hf, f_hf, x_lf, f_lf, options={"GTDF/Technique": "MFGP"}) is effectively the same as build_MF([{"x": x_lf, "f": f_lf}, {"x": x_hf, "f": f_hf}]).
The default value ("Auto") means that the best algorithm is determined automatically. Sample size and blackbox budget requirements that take effect when the technique is selected manually are described in section Sample Size and Budget Requirements.
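The compatibility rules above can be captured in a small lookup (a sketch mirroring the lists in this section, not a GTDF API):

```python
SAMPLE_BASED = {"DA", "HFA", "MFGP", "SVFGP", "VFGP"}    # build() only
BLACKBOX_BASED = {"DA_BB", "VFGP_BB"}                    # build_BB() only

def compatible_methods(technique):
    """Which builder methods accept a manually selected technique."""
    if technique == "Auto":
        return {"build", "build_BB"}     # auto-selection works for both methods
    if technique in SAMPLE_BASED:
        return {"build"}
    if technique in BLACKBOX_BASED:
        return {"build_BB"}
    raise ValueError("unknown GTDF/Technique value: %r" % technique)
```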
GTDF/UnbiasLowFidelityModel
Try compensating the low-fidelity sample bias.
Value: Boolean or "Auto"
Default: "Auto"
New in version 1.10.4.
If on, then after building an initial low-fidelity model (the approximation model trained using the low-fidelity sample only), GTDF will try to find and compensate its bias, using the high-fidelity sample.
For example, consider a high-fidelity sample generated by function \(f_{hf}(x)\) and a low-fidelity sample generated by \(f_{lf}(x) \approx f_{hf}(x+e)\). If GTDF/UnbiasLowFidelityModel is on, GTDF will use the algorithm that compensates the bias \(e\), resulting in a more accurate final model.
This option affects all techniques except HFA. If GTDF/Technique is set to "HFA", the GTDF/UnbiasLowFidelityModel option value is ignored. The "Auto" setting currently defaults to off (no bias compensation).
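As a toy illustration of bias compensation, consider estimating and removing a constant output offset between a low-fidelity model and high-fidelity data. The actual GTDF algorithm is more general (the example above involves an input shift \(e\), not an output offset), and the helpers below are hypothetical:

```python
def estimate_output_bias(lf_model, x_hf, f_hf):
    """Mean residual of the low-fidelity model on the high-fidelity sample:
    a toy stand-in for the bias estimation step."""
    residuals = [fh - lf_model(x) for x, fh in zip(x_hf, f_hf)]
    return sum(residuals) / len(residuals)

def unbiased(lf_model, bias):
    """Corrected low-fidelity model."""
    return lambda x: lf_model(x) + bias

lf = lambda x: 2.0 * x                 # toy low-fidelity model
x_hf = [1.0, 2.0, 3.0]
f_hf = [2.5, 4.5, 6.5]                 # high fidelity behaves like 2*x + 0.5
bias = estimate_output_bias(lf, x_hf, f_hf)
corrected = unbiased(lf, bias)
```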