11.11. da.p7core.gtsda

Generic Tool for Sensitivity and Dependency Analysis (GTSDA) module.

New in version 4.0.

>>> from da.p7core import gtsda

Classes

da.p7core.gtsda.Analyzer([backend])             GTSDA interface.
da.p7core.gtsda.CheckResult(status, info, …)    SDA Checker final results.
da.p7core.gtsda.RankResult(status, info, …)     SDA Ranker final results.
da.p7core.gtsda.SelectResult(status, info, …)   SDA Selector final results.

11.11.1. Analyzer — analysis interface

class da.p7core.gtsda.Analyzer

GTSDA interface.

Allows user to perform SDA procedures:

  • Ranker.
  • Selector.
  • Checker.

check(x, y=None, **kwargs)

Perform SDA Checker procedure.

Parameters:
  • x (array-like) – input sample
  • y (array-like) – optional output sample
  • z (array-like) – optional control variables sample
  • options (dict) – a set of options
Returns:

SDA CheckResult

Return type:

CheckResult

license

Analyzer license.

Type:License

General license information interface. See section License Usage for details.

options

Analyzer options.

Type:Options

General options interface for the analyzer. See section Options Interface for usage and the GTSDA option reference.

rank(**kwargs)

Perform SDA Ranker procedure.

Parameters:
  • x (array-like) – input sample
  • y (array-like) – output sample
  • blackbox (Blackbox, gtopt.ProblemGeneric, or gtdoe.ProblemGeneric) – the blackbox to rank, incompatible with model
  • budget (int) – maximum number of blackbox evaluations
  • model (gtapprox.Model or gtdf.Model) – the model to rank, incompatible with blackbox
  • bounds (tuple(list[float], list[float])) – analysis space bounds (lower, upper)
  • options (dict) – a set of Ranker options
  • approx_options (dict) – a set of GTApprox options (used only in sample mode)
Returns:

analysis result

Return type:

RankResult

Changed in version 6.20: supports gtopt.ProblemGeneric or gtdoe.ProblemGeneric as blackbox; supports ranking GTApprox and GTDF models; added bounds to specify the lower and upper bounds for ranked inputs.

There are two ranking modes:

  • Sample-based: ranks variables using the feature (input) and response (output) data samples.
  • Blackbox-based: ranks inputs of the blackbox (Blackbox or a problem class) or model (GTApprox or GTDF model).

Valid argument combinations for modes:

Mode                     Required arguments   Ignored arguments
sample-based             x, y                 budget
blackbox-based           blackbox, budget     approx_options
blackbox-based (model)   model, budget        approx_options

Arguments required by different modes must not be combined: for example, specifying x, y, and model together raises an exception.
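The combination rules can be sketched as a small standalone check. This is only an illustration of the table: ranking_mode is a hypothetical helper, not part of GTSDA, and the use of ValueError is an assumption, since the exception type raised by rank() is not stated here.

```python
def ranking_mode(x=None, y=None, blackbox=None, model=None, budget=None):
    """Return the ranking mode implied by the arguments.

    Hypothetical helper mirroring the table of valid combinations;
    it is not the GTSDA implementation.
    """
    has_sample = x is not None or y is not None
    sources = [has_sample, blackbox is not None, model is not None]
    if sum(sources) != 1:
        # Assumption: ValueError stands in for the library's exception.
        raise ValueError("specify exactly one of: (x, y), blackbox, model")
    if has_sample:
        if x is None or y is None:
            raise ValueError("sample-based mode requires both x and y")
        return "sample-based"  # budget is ignored in this mode
    if budget is None:
        raise ValueError("blackbox-based mode requires budget")
    if blackbox is not None:
        return "blackbox-based"
    return "blackbox-based (model)"


print(ranking_mode(x=[[0.0], [1.0]], y=[[0.0], [2.0]]))
print(ranking_mode(model=object(), budget=100))
```

Calling ranking_mode(x=..., y=..., model=...) in this sketch raises the error, matching the example of an invalid combination given above.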

Note

When ranking a model with input constraints or manually specified box bounds, pass bounds to rank(); otherwise the analysis space bounds are determined from the model training sample data.

The optional bounds argument may be used in all modes and works as follows:

  • If you omit bounds:
    • In the sample-based mode: an internal model of responses is trained, using the x, y sample and applying approx_options, then model inputs are ranked in bounds determined from the sample (minimum and maximum sample values of inputs).
    • In the blackbox-based mode with Blackbox: inputs are ranked in the blackbox bounds (see da.p7core.blackbox.Blackbox.variables_bounds()).
    • In the blackbox-based mode with a model: inputs are ranked in bounds determined from the model’s training data (minimum and maximum training sample values of inputs).
  • If you specify bounds:
    • In the sample-based mode: an internal response model is trained, then model inputs are ranked in the specified bounds.
    • In the blackbox-based mode, with either a Blackbox or a model: scores are calculated in the specified bounds.

Note that bounds are intended to specify a limited analysis domain, that is, to contract the blackbox or model bounds. Specifying bounds that intersect or extend beyond the blackbox (model) input domain may lead to unexpected results.
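To make the (lower, upper) format concrete, the following sketch derives default bounds from a sample as per-column minimum and maximum, and checks whether user bounds only contract a given domain. Both functions are hypothetical helpers written for this illustration; they are not GTSDA calls.

```python
def bounds_from_sample(x):
    """Return (lower, upper): per-column minimum and maximum of a 2D sample."""
    columns = list(zip(*x))
    lower = [float(min(col)) for col in columns]
    upper = [float(max(col)) for col in columns]
    return lower, upper


def is_within(bounds, domain):
    """True if bounds lie inside domain, that is, they only contract it."""
    (lo, hi), (d_lo, d_hi) = bounds, domain
    return all(dl <= l <= h <= dh for l, h, dl, dh in zip(lo, hi, d_lo, d_hi))


x = [[0.0, 10.0], [1.0, 12.0], [0.5, 11.0]]
domain = bounds_from_sample(x)
print(domain)  # ([0.0, 10.0], [1.0, 12.0])

print(is_within(([0.2, 10.5], [0.8, 11.5]), domain))   # True: contracts the domain
print(is_within(([-1.0, 10.5], [0.8, 11.5]), domain))  # False: extends beyond it
```

A tuple that passes is_within in this sketch is the kind of value the bounds argument is intended for.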

score2rank(scores, method='average')

Compute ranking based on scores.

Parameters:
  • scores (array-like) – sensitivity indices
  • method ('max' or 'average') – method for aggregation of scores
Returns:

ranking

Return type:

array-like

The method transforms scores obtained by rank() into ranks which sort aggregated scores in descending order. In other words, ranks are the feature indices in order of decreasing importance.

Aggregation is performed by one of two methods, average or max, and applies only to multidimensional output.

If average is selected, the method averages scores over outputs before computing ranks. If max is selected, the method selects the maximum scores over outputs.
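The aggregation and ordering described above can be sketched in plain Python. The function score_to_rank below is a hypothetical stand-in written for illustration, not the library's score2rank(); in particular, it does not address how nan scores are ordered.

```python
def score_to_rank(scores, method="average"):
    """Aggregate a dim(Y) x dim(X) score matrix over outputs and return
    feature indices in order of decreasing aggregated score."""
    if method not in ("average", "max"):
        raise ValueError("method must be 'average' or 'max'")
    # Treat a flat list of scores as a single-output matrix.
    if scores and not isinstance(scores[0], (list, tuple)):
        scores = [scores]
    columns = list(zip(*scores))  # one tuple of scores per feature
    if method == "average":
        aggregated = [sum(col) / len(col) for col in columns]
    else:
        aggregated = [max(col) for col in columns]
    return sorted(range(len(aggregated)), key=lambda j: aggregated[j], reverse=True)


scores = [[0.1, 0.6, 0.2],   # scores of three features for output 0
          [0.9, 0.5, 0.1]]   # scores of three features for output 1
print(score_to_rank(scores, "average"))  # [1, 0, 2]
print(score_to_rank(scores, "max"))      # [0, 1, 2]
```

The two methods can disagree, as here: feature 1 has the highest average score, while feature 0 has the highest single score.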

select(**kwargs)

Perform SDA Selector procedure.

Parameters:
  • x (array-like) – input sample
  • y (array-like) – output sample
  • x_test (array-like) – test input sample
  • y_test (array-like) – test output sample
  • ranking – a list of feature indices in order of decreasing importance
  • options (dict) – a set of options
  • approx_options (dict) – a set of options for GTApprox
Returns:

SDA SelectResult

Return type:

SelectResult

There are three modes for error computation:

  • IV-based
  • Train sample-based
  • Test sample-based (test sample is needed)

Valid argument combinations for modes:

Passed arguments       Mode                 Ignored arguments
x, y                   IV-based             x_test, y_test
x, y                   Train sample-based   x_test, y_test
x, y, x_test, y_test   Test sample-based

set_logger()

Set logger.

Parameters:logger – logger object
Returns:none

set_watcher()

Set watcher.

Parameters:watcher – watcher object
Returns:none

11.11.2. CheckResult — correlation check result

class da.p7core.gtsda.CheckResult

SDA Checker final results.

An object of this class is only returned by Analyzer.check() and should never be instantiated by the user.

decisions
Type:ndarray, 1D or 2D, bool

Boolean values: True (1) means that the correlation is statistically significant, False (0) means that it is not.

info
Type:dict

SDA Checker procedure information.

p_values
Type:ndarray, 1D or 2D, float

List of p-values of the scores.

scores
Type:ndarray, 1D or 2D, float

List of correlation scores of input variables.

The scores, p_values, and decisions matrices each contain \(dim(Y)\) rows and \(dim(X)\) columns (\(dim\) is dimensionality). In general, each score is a float number, except in some special cases:

  • In the sample-based mode, if the value of the j-th feature in the sample is constant (the input sample, x in Analyzer.check(), contains a constant column), all scores of this feature (j-th score matrix column) will be nan since there is no way to estimate the sensitivity of the output to a constant component.
  • In the sample-based mode, if the value of the i-th response component in the sample is constant (the output sample, y in Analyzer.check(), contains a constant column), the scores of all features vs this output (i-th score matrix row) will be 0.0 — it is assumed that this output is insensitive to all features since its value is constant.
  • The first of the above rules has priority: if the sample contains both a constant feature \(x_j\) and a constant output \(y_i\), the \(s_{ij}\) score is nan.
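The three rules above can be sketched as a post-processing step on a raw score matrix. This only illustrates the stated rules, including their priority; apply_constant_rules and is_constant are hypothetical names, and the sketch says nothing about how GTSDA computes the scores themselves.

```python
import math


def is_constant(column):
    """True if every value in the column equals the first one."""
    return all(v == column[0] for v in column)


def apply_constant_rules(scores, x, y):
    """Return a copy of scores (dim(Y) rows, dim(X) columns) with the
    constant-feature and constant-output rules applied."""
    const_x = [is_constant(col) for col in zip(*x)]
    const_y = [is_constant(col) for col in zip(*y)]
    result = []
    for i, row in enumerate(scores):
        new_row = []
        for j, s in enumerate(row):
            if const_x[j]:
                new_row.append(math.nan)  # constant feature: takes priority
            elif const_y[i]:
                new_row.append(0.0)       # constant output
            else:
                new_row.append(float(s))
        result.append(new_row)
    return result


x = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]  # second feature is constant
y = [[0.1, 7.0], [0.4, 7.0], [0.9, 7.0]]  # second output is constant
raw = [[0.8, 0.3], [0.5, 0.2]]            # made-up scores for illustration
print(apply_constant_rules(raw, x, y))    # [[0.8, nan], [0.0, nan]]
```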
status

Finish status.

Type:Status

For details, see section Status.

11.11.3. RankResult — feature ranking result

class da.p7core.gtsda.RankResult

SDA Ranker final results.

An object of this class is only returned by Analyzer.rank() and should never be instantiated by the user.

approx_model
Type:gtapprox.Model

If the method builds a GTApprox model internally (this usually happens when a sample was provided as input and the selected technique requires a specific design of experiment), this field holds the constructed model. The field is present only if a surrogate model was built with GTApprox inside the tool.

The GTSDA/SaveModel option enables or disables saving the constructed model; if it is disabled, the approx_model field is not created.

generated_sample
Type:dict, with fields "inputs" and "outputs", which are ndarray, 1D or 2D, float

If the tool uses the provided blackbox to generate a new sample, this field holds that sample. The field is present only if a sample was generated from the blackbox inside the tool.

The GTSDA/SaveBlackboxData option enables or disables saving the generated sample; if it is disabled, the generated_sample field is not created.

info
Type:dict

SDA Ranker procedure information.

Note that if the method computes more than one type of scores during its run, the values of all scores can be retrieved from the RankResult.info['Ranker']['Detailed info'] field.

scores
Type:ndarray, 1D or 2D, float

Resulting scores for the input variables (features).

Note that the scores field contains only a single most informative number per input: \(\mu^*\) for screening indices, total indices for the Sobol indices FAST technique, and main indices for the Sobol indices EASI technique. For deeper analysis, all other coefficients can be taken from the RankResult.info['Ranker']['Detailed info'] field.

The scores matrix contains \(dim(Y)\) rows and \(dim(X)\) columns (\(dim\) is dimensionality). Each element of this matrix \(s_{ij}\) is the sensitivity of the i-th output component to the j-th component of the input (a feature). In general, \(s_{ij}\) is a positive float number, except some special cases:

  • In the sample-based mode, if the value of the j-th feature in the sample is constant (the input sample, x in Analyzer.rank(), contains a constant column), all scores of this feature (j-th score matrix column) will be nan since there is no way to estimate the sensitivity of the output to a constant component.
  • In the sample-based mode, if the value of the i-th response component in the sample is constant (the output sample, y in Analyzer.rank(), contains a constant column), the scores of all features vs this output (i-th score matrix row) will be 0.0 — it is assumed that this output is insensitive to all features since its value is constant.
  • The first of the above rules has priority: if the sample contains both a constant feature \(x_j\) and a constant output \(y_i\), the \(s_{ij}\) score is nan.
  • In the blackbox-based mode, if the lower and upper bounds of a feature are equal (see bounds in Blackbox.add_variable()), it is interpreted as a constant input, so the resulting score for this feature will be nan, similarly to the sample-based mode with a constant column.

status

Finish status.

Type:Status

For details, see section Status.

std
Type:ndarray, 1D or 2D, float

Scores standard deviation.

New in version 6.6.

The standard deviation matrix is structurally similar to the scores matrix: it also contains \(dim(Y)\) rows and \(dim(X)\) columns, and each element \(\sigma_{ij}\) is the standard deviation of the \(s_{ij}\) score. In general, \(\sigma_{ij}\) is a non-negative float number, except in the following special cases:

  • If \(s_{ij}\) score is nan (such as the scores of constant features, see scores for details), \(\sigma_{ij}\) is also set to nan.
  • Estimation of std may fail due to insufficient data; such \(\sigma\) values are also set to nan. This may happen even if there is enough data to estimate the corresponding score (so the score is not nan, but its deviation is). One example is a blackbox-based SDA run with a blackbox that frequently outputs nan values: the output data may be sufficient to estimate the score, but insufficient to perform the cross-validation process that is used to calculate the score deviation.

variances
Type:ndarray, 1D or 2D, float

Variances of scores.

New in version 6.6.

The variances matrix contains the elementwise squares of std.
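As a minimal illustration of this relation, the sketch below squares a small, made-up std matrix elementwise. The helper variances_from_std is hypothetical. Because nan squared is still nan under standard floating-point arithmetic, entries where std is nan stay nan in this sketch.

```python
import math


def variances_from_std(std):
    """Elementwise square of a 2D list of standard deviations."""
    return [[s * s for s in row] for row in std]


std = [[0.1, math.nan],
       [0.0, 0.5]]
variances = variances_from_std(std)
print(variances[1])                 # [0.0, 0.25]
print(math.isnan(variances[0][1]))  # True
```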

11.11.4. SelectResult — feature selection result

class da.p7core.gtsda.SelectResult

SDA Selector final results. An object of this class is only returned by Analyzer.select() and should never be instantiated by the user.

approx_model

Final GTApprox model constructed on the selected subset of features.

The GTSDA/SaveModel option enables or disables saving the final constructed model; if it is disabled, the approx_model field is not created.

Type:Model

feature_list
Type:ndarray, 1D, float

List of chosen input variables (features).

info
Type:dict

SDA Selector procedure information.

status

Finish status.

Type:Status

For details, see section Status.

validation_error
Type:float

Validation error for the model constructed on the selected subset of features.