5.5. Option Reference


GTSDA/Checker/PValues/SignificanceLevel

Sets the threshold used to decide whether a correlation is statistically significant (see Testing for significance).

Value:floating point number in range \([0.001, 1]\)
Default:\(0.05\)
The option works as follows:
  • GTSDA Checker estimates the p-value (see Testing for significance) of the correlation on the given sample.
  • If the estimated p-value is smaller than GTSDA/Checker/PValues/SignificanceLevel, the correlation is assumed to be statistically significant, and the corresponding value in the \(decisions\) list in CheckResult is set to 1.
  • Otherwise, the corresponding value in the decisions list in CheckResult is set to 0.
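
The decision rule above can be sketched in a few lines of plain Python. This is only an illustration of the comparison, not GTSDA code: the function name `decide` and its arguments are hypothetical, and the p-values are made-up numbers.

```python
def decide(p_values, significance_level=0.05):
    """Map each p-value to a decision: 1 = significant, 0 = not significant.

    Hypothetical helper; mirrors the rule "p-value smaller than the
    significance level means the correlation is significant".
    """
    if not 0.001 <= significance_level <= 1:
        raise ValueError("significance level must be in [0.001, 1]")
    return [1 if p < significance_level else 0 for p in p_values]


# Illustrative p-values; note that a p-value equal to the level is not significant.
decisions = decide([0.001, 0.05, 0.2])
print(decisions)  # [1, 0, 0]
```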

GTSDA/Checker/Technique

Technique to compute correlation.

Value:"PearsonCorrelation", "RobustPearsonCorrelation", "PearsonPartialCorrelation", "SpearmanCorrelation", "KendallCorrelation", "DistanceCorrelation", "DistancePartialCorrelation", "MutualInformation"
Default:"PearsonCorrelation"

New in version 6.2: added Robust Pearson Correlation technique.

This option selects the technique used to compute correlation. All techniques may be used in two modes:

  • both inputs \(x\) and outputs \(y\) are provided (in this case the correlation between each input and each output is computed)
  • only inputs \(x\) are provided (in this case the correlation between each pair of inputs is computed)

Techniques with “Partial” in their name also need control variables \(z\) to be specified.

For details on techniques, see Methods.

GTSDA/Checker/PValues/Enable

If on, Checker calculates p-values (see Testing for significance) and decisions.

Value:Boolean
Default:on

This option specifies whether the checker should compute p-values in addition to correlation scores.

The p-value is used to check whether a correlation is statistically significant (that is, did not occur in the sample just by chance), so it is advised to keep this option on. However, computing p-values increases computation time.

GTSDA/Checker/DistanceCorrelation/Unbiased

If on, distance correlation score is unbiased.

Value:Boolean
Default:off

This option specifies whether a bias correction term should be used when computing Distance Correlation. The unbiased estimate tends to have higher variance.

GTSDA/Checker/PValues/Method

Specifies the algorithm to use when calculating p-values (see Testing for significance).

Value:"Permutations", "Asymptotic", "Auto"
Default:"Auto"
  • The "Permutations" approach works for all techniques and gives good results; its only drawback is that it is considerably slower.
  • "Asymptotic" algorithm works much faster, but some techniques do not support it.
  • "Auto" selects the most appropriate algorithm automatically based on the sample size and the correlation type specified by GTSDA/Checker/Technique.
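
To show why a permutation-based p-value is general but slow, here is a minimal sketch of a generic permutation test for the Pearson correlation, using only the Python standard library. It is not the GTSDA implementation: the function names, the number of permutations, and the add-one correction are assumptions made for the illustration.

```python
import random
from statistics import mean


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(x, y, n_permutations=2000, seed=100):
    """Estimate a two-sided p-value by shuffling y and recomputing the score.

    Each permutation needs a full recomputation of the correlation,
    which is what makes this approach slow on large samples.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive (an assumption here).
    return (hits + 1) / (n_permutations + 1)


x = list(range(20))
y = [2 * v + 1 for v in x]  # perfectly linear dependence
p_value = permutation_p_value(x, y)
```

Because nothing in the loop depends on the particular score function, the same scheme applies to any correlation technique, which matches the trade-off described above.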

GTSDA/Checker/MutualInformation/Normalize

Value:Boolean
Default:on

If on, Mutual Information returns a score in \([0, 1]\) instead of \([0, +\infty)\).

GTSDA/Checker/MutualInformation/BinsMethod

Value:"FullSearch", "Scott"
Default:"Scott"

New in version 6.2.

Method to calculate optimal bins sizes for histogram Mutual Information estimation.

GTSDA/Deterministic

Require SDA process to be deterministic.

Value:Boolean
Default:on

If on, the same fixed seed is used in all randomized GTSDA algorithms, ensuring that the result is the same on every run.

GTSDA/LogLevel

Set minimum log level.

Value:"Debug", "Info", "Warn", "Error", "Fatal"
Default:"Info"

If this option is set, only messages with log level greater than or equal to the threshold are dumped into log.
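
The threshold rule can be sketched as follows. The level names come from the option values above and are assumed to be listed in order of increasing severity; the helper `is_logged` is hypothetical and is not part of GTSDA.

```python
# Levels ordered from least to most severe, as listed for this option.
LEVELS = ["Debug", "Info", "Warn", "Error", "Fatal"]


def is_logged(message_level, threshold="Info"):
    """Return True if a message with the given level passes the threshold."""
    return LEVELS.index(message_level) >= LEVELS.index(threshold)


# With the threshold set to "Warn", only these levels reach the log.
visible = [level for level in LEVELS if is_logged(level, "Warn")]
print(visible)  # ['Warn', 'Error', 'Fatal']
```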

GTSDA/MaxParallel

Set the maximum number of parallel threads to use when running.

Value:integer in range \([1, 512]\), or 0 (auto)
Default:0 (auto)

New in version 6.12 Service Pack 1.

Some GTSDA methods can run parallel calculations in certain cases. This option sets the maximum number of threads GTSDA is allowed to create.

Changed in version 6.17: added the upper limit for the option value.

Default (auto) behavior depends on the value of the OMP_NUM_THREADS environment variable.

If OMP_NUM_THREADS is set to a valid value, this value is the maximum number of threads by default. Note that OMP_NUM_THREADS must be set before the Python interpreter starts.

If OMP_NUM_THREADS is unset, set to 0 or an invalid value, the default maximum number of threads is equal to the number of cores detected by GTSDA. However if a hyper-threading CPU is detected, the default maximum number of threads is set to half the number of cores (to use only physical cores).

The behavior described above is only for the default (0) option value. If you set this option to a non-default value, it will be the maximum number of threads, regardless of your CPU.
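
The rules above can be summarized in a short sketch. It is an illustration only: the function and its parameters are hypothetical, core detection is replaced by explicit arguments, and a "valid" OMP_NUM_THREADS value is assumed to mean a positive integer.

```python
def resolve_max_threads(option_value, detected_cores, hyper_threading,
                        omp_num_threads=None):
    """Return the maximum number of threads implied by the rules above.

    option_value     -- value of GTSDA/MaxParallel (0 means auto)
    detected_cores   -- number of cores detected (passed in for illustration)
    hyper_threading  -- True if a hyper-threading CPU is detected
    omp_num_threads  -- raw OMP_NUM_THREADS string, or None if unset
    """
    if option_value != 0:
        # A non-default value is used as is, regardless of the CPU.
        return option_value
    try:
        env_threads = int(omp_num_threads)
    except (TypeError, ValueError):
        env_threads = 0  # unset or invalid value
    if env_threads > 0:
        return env_threads
    if hyper_threading:
        # Use only physical cores.
        return max(1, detected_cores // 2)
    return detected_cores


print(resolve_max_threads(0, 8, True))        # 4
print(resolve_max_threads(0, 8, True, "6"))   # 6
print(resolve_max_threads(3, 8, True, "6"))   # 3
```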

GTSDA/NanMode

Specifies how to handle non-numeric values in the input sample.

Value:"raise" or "ignore"
Default:"raise"

New in version 6.15.

Non-numeric (NaN or infinity) values of variables have no meaning in GTSDA methods. This option controls its behavior when such values are encountered. Default ("raise") means to raise an exception and stop; "ignore" means to exclude data points with non-numeric values from the sample before analysis.
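
A minimal sketch of the two modes, written in plain Python: the helper `apply_nan_mode` is hypothetical, and the exception type is chosen only for the illustration (the exception GTSDA actually raises is not specified here).

```python
import math


def apply_nan_mode(points, nan_mode="raise"):
    """Filter or reject data points that contain NaN or infinite values."""
    def is_numeric(point):
        return all(math.isfinite(value) for value in point)

    if nan_mode == "ignore":
        # Drop every point that has at least one non-numeric value.
        return [point for point in points if is_numeric(point)]
    if nan_mode == "raise":
        if not all(is_numeric(point) for point in points):
            raise ValueError("non-numeric value found in the sample")
        return list(points)
    raise ValueError("unknown mode: %r" % (nan_mode,))


sample = [(1.0, 2.0), (float("nan"), 3.0), (4.0, float("inf"))]
cleaned = apply_nan_mode(sample, "ignore")
print(cleaned)  # [(1.0, 2.0)]
```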

GTSDA/Ranker/Technique

The type of indices to calculate.

Value:"Screening", "Sobol", "Taguchi"
Default:"Screening"

New in version 6.1: added Taguchi indices.

This option selects between the following types of indices (sensitivity scores):

  • Screening Indices allow estimation of feature scores with a very limited budget. Their values may be considered a crude estimate of the average partial derivative for each input.
  • Sobol Indices require a significantly larger budget; they estimate the amount of output variance explained by each input variable.
  • Taguchi Indices work with noisy problems and use a small budget, but require a special sample structure (orthogonal array).

For details on techniques, see Techniques.

GTSDA/Ranker/NoiseCorrection

Reduce effects of additive noise when calculating sensitivity indices. Works for main, total and interaction Sobol indices.

Value:Boolean or "Auto"
Default:"Auto"

New in version 6.6.

This option tells the algorithm to take into account additive noise in function outputs. Noise is interpreted as a variable that is not present among the inputs, which allows Sobol Indices to be corrected.

GTSDA/Ranker/NormalizeInputs

Scale continuous inputs to the unit hypercube to estimate importance of their relative changes.

Value:Boolean
Default:on

If enabled, GTSDA scales each continuous input variable to the \([0, 1]\) range when computing score values (variables of other types are not affected).

If you use Screening Indices, such scaling allows you to estimate importance of continuous input features with regard to their relative change instead of the absolute change in a feature value. However GTSDA/Ranker/NormalizeInputs does not affect the feature ranks of non-continuous inputs.

If you use Sobol Indices, this option has no effect on their values.
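
The scaling of a continuous input to the \([0, 1]\) range can be sketched as a simple min-max transform. The function below is hypothetical, and it assumes the variable bounds are known; how GTSDA obtains the bounds is not covered here.

```python
def scale_to_unit(values, lower, upper):
    """Map values from [lower, upper] to [0, 1] (min-max scaling)."""
    if upper <= lower:
        raise ValueError("upper bound must be greater than lower bound")
    return [(value - lower) / (upper - lower) for value in values]


# An input ranging over [10, 20]: a change of 5 is half of its range.
print(scale_to_unit([10.0, 15.0, 20.0], 10.0, 20.0))  # [0.0, 0.5, 1.0]
```

After such scaling, a unit change means "the full range of the variable" for every continuous input, which is why scores computed on scaled inputs reflect relative rather than absolute changes.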

GTSDA/Ranker/Screening/MorrisGridJump

Specifies size of jump (in grid steps) when computing elementary effect.

Value:integer in range \([1, 2^{31}-2]\)
Default:5

This option works together with the GTSDA/Ranker/Screening/MorrisGridLevels option and specifies the size of the jump of each elementary effect over the grid. Note that the jump size should always be less than or equal to half the number of grid levels.

GTSDA/Ranker/Screening/MorrisGridLevels

The number of discrete levels to use for each continuous input.

Value:integer in range \([2, 2^{31}-2]\)
Default:10

When computing Screening Indices, the range of each continuous factor is transformed to a discrete mesh. This option specifies the mesh size.

GTSDA/Ranker/Screening/MorrisGridLevels applies to continuous variables only, as a variable of other (non-continuous) type already defines its discrete mesh.
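
The relation between GTSDA/Ranker/Screening/MorrisGridJump and GTSDA/Ranker/Screening/MorrisGridLevels can be checked with a small sketch. The function is hypothetical; the returned step assumes the textbook Morris design, in which a unit interval is split into equally spaced levels, and this may differ from GTSDA internals.

```python
def morris_step(grid_jump=5, grid_levels=10):
    """Validate the jump/levels pair and return the jump in [0, 1] units.

    Assumes levels are equally spaced on [0, 1], so that one grid step
    equals 1 / (grid_levels - 1).
    """
    if grid_levels < 2:
        raise ValueError("at least 2 grid levels are required")
    if grid_jump < 1:
        raise ValueError("grid jump must be at least 1")
    if 2 * grid_jump > grid_levels:
        raise ValueError("grid jump must not exceed half the grid levels")
    return grid_jump / (grid_levels - 1)


step = morris_step()   # defaults: jump 5 on a 10-level grid
# morris_step(6, 10) would raise ValueError: 6 is more than half of 10.
```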

GTSDA/Ranker/Screening/Method

Method to use for computation of Screening Indices.

Value:"Morris", "Auto"
Default:"Auto"

This option selects the technique used to compute Screening Indices (at the moment only the Morris technique is available).

GTSDA/Ranker/Sobol/IndicesType

Select type of Sobol index to be computed.

Value:"total", "main" or "interactions"
Default:"total"

New in version 6.6.

This option selects the type of Sobol indices to compute: main, total, or interaction indices (see section Sobol Indices for details). The estimate of the main index is usually more reliable, but the main index takes into account only the sole influence of the considered feature on the output, ignoring the influence of cross-feature interactions. The total index estimates the total influence of the variable on the output, taking into account all possible interactions between the considered feature and other input features, but its estimate is generally less reliable. The interactions index estimates only the variable's influence coming from interaction terms, ignoring the variable's sole influence.

GTSDA/Ranker/Sobol/FAST/NumberCurves

The number of multistart FAST curves (see Sobol Indices: FAST Method).

Value:integer in range \([1, 2^{31}-2]\)
Default:1

This option allows performing multistart when building FAST space filling curves. It can potentially increase accuracy at the cost of increasing the budget requirements. If GTSDA/Ranker/Sobol/FAST/NumberCurves is set to \(n\), the minimum budget becomes \(65 \cdot dim(X) \cdot n\) instead of default \(65 \cdot dim(X)\) (see the budget requirements in the rank() method description).
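
The budget formula above is easy to evaluate directly. The helper below is hypothetical and only restates \(65 \cdot dim(X) \cdot n\).

```python
def fast_min_budget(input_dim, number_curves=1):
    """Minimum budget for the FAST method: 65 * dim(X) * n."""
    if input_dim < 1 or number_curves < 1:
        raise ValueError("dimension and number of curves must be positive")
    return 65 * input_dim * number_curves


print(fast_min_budget(4))      # 260
print(fast_min_budget(4, 3))   # 780
```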

GTSDA/Ranker/Sobol/Method

Method to use for computation of Sobol Indices.

Value:"CSTA", "EASI", "FAST", "Auto"
Default:"Auto"

This option selects the technique used to compute Sobol Indices. Note that the EASI technique (see Sobol Indices: EASI Method) can only be selected for sample input, while the FAST (see Sobol Indices: FAST Method) and CSTA (see Sobol Indices: CSTA Method) techniques can be selected for both blackbox and sample input (in the latter case, a pSeven Core GT Approx model is constructed internally and used as the blackbox model).

GTSDA/Ranker/Taguchi/LevelsNumber

Specifies the number of levels for variables in the orthogonal array design that is the input sample for Taguchi analysis.

Value:list of integers specifying the number of levels for each variable
Default:[]

New in version 6.1.

Changed in version 6.42: this option is no longer required for the Taguchi technique: if levels are not specified, the technique selects them automatically.

Taguchi analysis is intended to work with samples of an orthogonal array design (see Taguchi Indices). When you use the Taguchi technique in blackbox-based mode, that design is generated by GTSDA. In that case, you can optionally specify GTSDA/Ranker/Taguchi/LevelsNumber to select a certain number of levels for each continuous variable or only some of those (levels of non-continuous variables are defined when you add such variables). If this option is default, the technique assigns levels to variables automatically.

This option has the same syntax and features as GTDoE/OrthogonalArray/LevelsNumber — see its description for more details.

GTSDA/Ranker/Taguchi/Method

Technique to compute Taguchi Indices.

Value:"Auto", "Maximum", "Minimum", "Signal_to_noise"
Default:"Auto"

New in version 6.1.

This option selects the technique used to compute Taguchi Indices. Auto is equivalent to Maximum. Note that Signal_to_noise needs more than one measurement at each point.

GTSDA/Ranker/Taguchi/RepeatsNumber

Number of blackbox computations in each point for Taguchi Indices sample.

Value:integer in range \([1, min(budget, 99)]\)
Default:\(1\)

New in version 6.1.

The number of blackbox computations at each point. This value should not exceed the budget. It is recommended to set the number of repeats to a divisor of the budget.
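
To pick a value that follows this recommendation, you can list the admissible repeat counts for a given budget. The helper below is a hypothetical convenience, not a GTSDA function; it applies the range \([1, min(budget, 99)]\) and the divisor recommendation stated above.

```python
def suggested_repeats(budget):
    """List repeat counts in [1, min(budget, 99)] that divide the budget."""
    upper = min(budget, 99)
    return [repeats for repeats in range(1, upper + 1) if budget % repeats == 0]


print(suggested_repeats(36))  # [1, 2, 3, 4, 6, 9, 12, 18, 36]
```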

GTSDA/Ranker/VarianceEstimateRequired

Specifies if SDA should estimate confidence interval.

Value:Boolean
Default:off

New in version 6.6.

This option enables computing confidence intervals for Screening Indices and for first-order Sobol Indices computed with the CSTA technique. Changing it does not affect the values of Screening Indices and Sobol Indices.

GTSDA/SaveBlackboxData

Require SDA to save sample generated with provided blackbox (if any was created).

Value:Boolean
Default:on

If a blackbox is provided as input, GTSDA uses it to generate the sample for analysis. If the blackbox is expensive, it may be useful to keep the evaluations GTSDA performed with it. This option specifies whether GTSDA should save the sample generated with the blackbox.

GTSDA/SaveModel

Require SDA to save surrogate model constructed by tool (if any was created).

Value:Boolean
Default:on

If the selected technique requires a specific design of experiment and only a predefined sample is given, GTSDA may construct a surrogate model from the provided sample and use it as a blackbox in further computations. This option specifies whether the constructed model should be returned by the procedure.

GTSDA/Seed

Fixed random seed.

Value:integer in range \([1, 2^{31}-2]\)
Default:100

This option sets fixed seed value, which is used in all randomized algorithms if GTSDA/Deterministic option is on. If GTSDA/Deterministic is off, the GTSDA/Seed value is ignored.
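
The interplay of the two options can be pictured with Python's standard random generator. This is only an analogy: the function `make_rng` is hypothetical and says nothing about how GTSDA generates random numbers internally.

```python
import random


def make_rng(deterministic=True, seed=100):
    """Return a seeded generator if deterministic, else an unseeded one."""
    if deterministic:
        return random.Random(seed)
    # The seed value is ignored in the non-deterministic case.
    return random.Random()


first_run = [make_rng().random() for _ in range(3)]
second_run = [make_rng().random() for _ in range(3)]
print(first_run == second_run)  # True: same seed, same sequence
```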

GTSDA/Selector/AddDel/FirstStep

Direction of search on first iteration of Add-Del algorithm.

Value:"Del", "Add"
Default:"Add"

This option specifies the starting conditions for the Add-Del algorithm:

  • "Add" – algorithm starts with empty feature subset (or subset of always selected features if specified)
  • "Del" – algorithm starts with all features included in feature subset

After that common "AddDel" routine is run.

GTSDA/Selector/Criteria/ErrorAggregationType

Type of averaging to be used for aggregating errors from different outputs.

Works if GTSDA/Selector/QualityMeasure is set to "Error" (that is, if Error-based feature selection is done).

Value:"Max", "Mean", "RMS"
Default:"Mean"

If several outputs are given, the tool needs to know how to aggregate the errors for different outputs into a single metric. This metric is then used to select the feature subset.
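
The three aggregation types can be sketched as follows. The function is hypothetical, and the formulas are the usual definitions of maximum, arithmetic mean, and root mean square, assumed here to match the option names.

```python
import math


def aggregate_errors(errors, aggregation="Mean"):
    """Combine per-output errors into a single number."""
    if aggregation == "Max":
        return max(errors)
    if aggregation == "Mean":
        return sum(errors) / len(errors)
    if aggregation == "RMS":
        return math.sqrt(sum(error ** 2 for error in errors) / len(errors))
    raise ValueError("unknown aggregation type: %r" % (aggregation,))


per_output_errors = [3.0, 4.0]  # illustrative errors for two outputs
for kind in ("Max", "Mean", "RMS"):
    print(kind, aggregate_errors(per_output_errors, kind))
```

For these two values, "Max" gives 4.0 and "Mean" gives 3.5, while "RMS" falls between them, since it weights larger errors more heavily than the mean does.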

GTSDA/Selector/Criteria/MinImprovement

Maximum relative error increase which is allowed during iterations of feature selector.

Value:floating point number in range \([0, 100]\)
Default:\(0.05\)

This option sets the minimum relative improvement that the tool considers significant. If no significant improvement is observed for GTSDA/Selector/Criteria/MaxLookAheadSteps steps, the algorithm is considered to have converged and stops.

GTSDA/Selector/Criteria/ErrorType

Type of error to be used as a quality criterion for selection.

Works if GTSDA/Selector/QualityMeasure is set to "Error" (that is, if Error-based feature selection is done).

Value:"Max", "Mean", "Median", "Q_0.95", "Q_0.99", "RMS", "RRMS"
Default:"RRMS"

Option specifies type of error to consider as a quality criterion in the feature selection process. Different errors would emphasize different properties of the models and may lead to different feature subsets selected.

See Componentwise errors for error types description.

GTSDA/Selector/RequiredFeatures

Set of the input variables, which should be included in the selected subset.

Value:list of unique unsigned integers in range \([0, dim(X)-1]\) each
Default:[]

This option allows the user to specify a set of input feature indices. Features from the list are always selected and cannot be removed from the active subset during the search.

GTSDA/Selector/Criteria/MaxLookAheadSteps

Number of iterations with error increase which are allowed during feature selection.

Value:integer in range \([0, 2^{31}-2]\)
Default:3

This option specifies the number of steps the algorithm makes before it stops if no improvement of at least GTSDA/Selector/Criteria/MinImprovement in the error value is observed.
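
How GTSDA/Selector/Criteria/MinImprovement and GTSDA/Selector/Criteria/MaxLookAheadSteps interact can be sketched on a recorded error history. This is one plausible reading of the rule, not the GTSDA implementation: the function is hypothetical, and measuring the improvement relative to the best error seen so far is an assumption.

```python
def stopping_step(errors, min_improvement=0.05, max_look_ahead_steps=3):
    """Return the index of the step at which the search would stop.

    errors[0] is the initial error; errors[k] is the error after step k.
    A step counts as significant if it lowers the best error so far by at
    least min_improvement in relative terms.
    """
    best = errors[0]
    stalled = 0
    for step, error in enumerate(errors[1:], start=1):
        if best > 0 and (best - error) / best >= min_improvement:
            best = error
            stalled = 0
        else:
            stalled += 1
            if stalled >= max_look_ahead_steps:
                return step
    return len(errors) - 1  # history ended before the rule triggered


history = [1.0, 0.8, 0.79, 0.795, 0.78, 0.5]
print(stopping_step(history))  # 4
```

In this example the first step is significant, the next three are not, so the search stops at step 4 and never reaches the larger improvement at step 5. A larger look-ahead value would let it continue.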

GTSDA/Selector/Criteria/TargetError

Maximum error of approximation which is not distinguished from zero.

Works if GTSDA/Selector/QualityMeasure is set to "Error" (that is, if Error-based feature selection is done).

Value:floating point number in range \([0, 100)\)
Default:\(10^{-5}\)

This option specifies the error that is considered satisfactory for the model. If the algorithm is adding features to the active subset, it stops once the target error is achieved. If the algorithm is removing features, then it continues removing them until the model error is smaller than the target error.

GTSDA/Selector/Dependency/SignificanceLevel

Sets the threshold used to decide whether a correlation is statistically significant.

Works if GTSDA/Selector/QualityMeasure is set to "Dependency" (that is, if Dependency-based feature selection is done).

Value:floating point number in range \([0.001, 1)\)
Default:\(0.05\)
The option works as follows:
  • GTSDA Checker estimates the p-value (see Testing for significance) of the correlation on the given sample.
  • If the estimated p-value is smaller than GTSDA/Selector/Dependency/SignificanceLevel, the correlation is assumed to be statistically significant (see Testing for significance); otherwise it is considered not significant.

GTSDA/Selector/Dependency/Type

Type of dependency to be checked when performing Dependency-based feature selection.

Value:"Auto", "Linear", "General"
Default:"Auto"

Possible types of dependency:

  • "Linear" – statistical tests will be performed for linear types of dependency
  • "General" – statistical tests will be performed for general types of dependency

GTSDA/Selector/QualityMeasure

Quality measure to use in feature selection.

Value:"Dependency", "Error"
Default:"Error"

Sets the type of quality measure that GTSDA will use when selecting features to be included into the optimal subset.

GTSDA/Selector/Technique

Technique which is used for feature selection.

Value:"Del", "Add", "AddDel", "Full"
Default:"AddDel"

Option specifies strategy for feature search.

  • "Del" – (Del) starts with all features selected and removes features one by one starting from the ones that have the smallest impact on model accuracy
  • "Add" – (Add) starts with no features selected and adds features one by one starting from the ones that have the highest impact on model accuracy
  • "AddDel" – (Add-Del) combines previous two approaches
  • "Full" – (Full Search) performs a full search by checking all possible feature combinations. Note that this technique can be used only for Error-based feature selection.

For details on techniques, see Methods.

GTSDA/Selector/TryAllFeaturesEveryStep

Forces algorithms of iterative feature selection to look at all features at each Add or Del step.

Value:Boolean
Default:off

If the option is on, the algorithm tries all features at each Add or Del iteration. If the option is off, the algorithm picks the first feature found that improves the error by more than GTSDA/Selector/Criteria/MinImprovement.

GTSDA/Selector/ValidationType

Type of validation used for computing the approximation error.

Value:"Internal", "TrainSample", "TestSample"
Default:"Internal"

This option specifies how the model error is computed:

  • "Internal" – the error is computed using internal validation
  • "TrainSample" – the error is computed on train sample
  • "TestSample" – the error is computed on test sample (to do this test sample should be provided to the function)