11.11. `da.p7core.gtsda`¶

Generic Tool for Sensitivity and Dependency Analysis (GTSDA) module.

New in version 4.0.

>>> from da.p7core import gtsda

Classes

`da.p7core.gtsda.Analyzer`([backend])	GTSDA interface.
`da.p7core.gtsda.CheckResult`(status, info, …)	SDA Checker final results.
`da.p7core.gtsda.RankResult`(status, info, …)	SDA Ranker final results.
`da.p7core.gtsda.SelectResult`(status, info, …)	SDA Selector final results.

11.11.1. `Analyzer` — analysis interface¶

class da.p7core.gtsda.Analyzer¶

GTSDA interface.

Provides the following SDA procedures:

Ranker.

Selector.

Checker.

check(x, y=None, **kwargs)¶

Perform SDA Checker procedure.

Parameters:

x (array-like) – input sample

y (array-like) – optional output sample

z (array-like) – optional control variables sample

options (dict) – a set of options

Returns:
SDA CheckResult

Return type:
CheckResult

license¶

Analyzer license.

Type: License

General license information interface. See section License Usage for details.

options¶

Analyzer options.

Type: Options

General options interface for the analyzer. See section Options Interface for usage and the GTSDA option reference.

rank(**kwargs)¶

Perform SDA Ranker procedure.

Parameters:

x (array-like) – input sample

y (array-like) – output sample

blackbox (Blackbox, gtopt.ProblemGeneric, or gtdoe.ProblemGeneric) – the blackbox to rank, incompatible with model

budget (int) – maximum number of blackbox evaluations

model (gtapprox.Model or gtdf.Model) – the model to rank, incompatible with blackbox

bounds (tuple(list[float], list[float])) – analysis space bounds (lower, upper)

options (dict) – a set of Ranker options

approx_options (dict) – a set of GTApprox options (used only in sample mode)

Returns:
analysis result

Return type:
RankResult

Changed in version 6.20: supports gtopt.ProblemGeneric or gtdoe.ProblemGeneric as blackbox; supports ranking GTApprox and GTDF models; added bounds to specify the lower and upper bounds for ranked inputs.

There are two ranking modes:

Sample-based: ranks variables using the feature (input) and response (output) data samples.

Blackbox-based: ranks inputs of the blackbox (Blackbox or a problem class) or model (GTApprox or GTDF model).

Valid argument combinations for modes:

Mode	Required arguments	Ignored arguments
sample-based	x, y	budget
blackbox-based	blackbox, budget	approx_options
blackbox-based (model)	model, budget	approx_options

Arguments required by different modes must not be combined: for example, specifying x, y, and model together causes an exception.
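The argument rules above can be sketched as a small pre-check to run before calling rank(). This is an illustrative helper, not part of da.p7core.gtsda: the name `resolve_rank_mode`, the returned mode labels, and the use of ValueError are assumptions (this reference states only that an exception is raised).

```python
def resolve_rank_mode(x=None, y=None, blackbox=None, model=None, budget=None):
    """Hypothetical helper: tell which rank() mode a set of arguments selects."""
    has_sample = x is not None or y is not None
    # Collect every data source the caller supplied.
    sources = [name for name, given in (
        ("sample", has_sample),
        ("blackbox", blackbox is not None),
        ("model", model is not None),
    ) if given]
    if len(sources) != 1:
        raise ValueError(
            f"specify exactly one of: x and y, blackbox, model; got {sources}")
    if sources[0] == "sample":
        if x is None or y is None:
            raise ValueError("sample-based mode requires both x and y")
        return "sample-based"  # budget is ignored in this mode
    if budget is None:
        raise ValueError(f"{sources[0]} mode requires budget")
    if sources[0] == "blackbox":
        return "blackbox-based"
    return "blackbox-based (model)"
```

For example, `resolve_rank_mode(x=x, y=y)` selects the sample-based mode, while passing x, y, and model together raises an error in this sketch.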

Note

When ranking a model with input constraints or manually specified box bounds, pass bounds to rank() explicitly. Otherwise, the analysis space bounds are determined from the model training sample data.

The optional bounds argument may be used in all modes and works as follows:

If you omit bounds:

In the sample-based mode: an internal model of responses is trained, using the x, y sample and applying approx_options, then model inputs are ranked in bounds determined from the sample (minimum and maximum sample values of inputs).

In the blackbox-based mode with Blackbox: inputs are ranked in the blackbox bounds (see da.p7core.blackbox.Blackbox.variables_bounds()).

In the blackbox-based mode with a model: inputs are ranked in bounds determined from the model’s training data (minimum and maximum training sample values of inputs).

If you specify bounds:

In the sample-based mode: an internal response model is trained, then model inputs are ranked in the specified bounds.

In the blackbox-based mode, with either a Blackbox or a model: scores are calculated in the specified bounds.

Note that bounds are intended to specify a limited analysis domain, that is, to contract the blackbox or model bounds. Specifying bounds that only partially overlap or extend beyond the blackbox (model) input domain may lead to unexpected results.
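As a rough illustration of the default bounds in the sample-based mode, the sketch below computes the per-column minimum and maximum of an input sample in the (lower, upper) form expected by bounds, and checks that a user-specified domain contracts it. The helpers `bounds_from_sample` and `is_contraction` are hypothetical names introduced here; they are not library functions and do not reproduce the library's internal logic.

```python
def bounds_from_sample(x):
    """Return (lower, upper): per-column minimum and maximum of a 2D sample."""
    columns = list(zip(*x))  # transpose rows into columns
    lower = [float(min(col)) for col in columns]
    upper = [float(max(col)) for col in columns]
    return lower, upper


def is_contraction(inner, outer):
    """True if the inner (lower, upper) bounds lie within the outer ones."""
    (in_lo, in_up), (out_lo, out_up) = inner, outer
    return all(o_l <= i_l <= i_u <= o_u
               for i_l, i_u, o_l, o_u in zip(in_lo, in_up, out_lo, out_up))


x = [[0.0, 10.0], [0.5, 12.0], [1.0, 11.0]]
default_bounds = bounds_from_sample(x)        # ([0.0, 10.0], [1.0, 12.0])
narrowed_bounds = ([0.2, 10.5], [0.8, 11.5])  # contracts the default domain
assert is_contraction(narrowed_bounds, default_bounds)
```

A tuple such as `narrowed_bounds` has the shape that the bounds argument describes: a list of lower bounds followed by a list of upper bounds, one value per input.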

score2rank(scores, method='average')¶

Compute ranking based on scores.

Parameters:

scores (array-like) – sensitivity indices

method ('max' or 'average') – method for aggregation of scores

Returns:
ranking

Return type:
array-like

The method transforms the scores obtained by rank() into ranks by sorting the aggregated scores in descending order. In other words, ranks are the feature indices in order of decreasing importance.

Aggregation uses one of two methods, average or max, and applies only to multidimensional output.

If average is selected, the method averages scores over outputs before computing ranks. If max is selected, the method selects the maximum scores over outputs.
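The aggregation described above can be sketched in plain Python as follows. This is not the library implementation: the function name `score2rank_sketch` is hypothetical, feature indices are assumed to be zero-based, and the scores are assumed to contain no nan values.

```python
def score2rank_sketch(scores, method="average"):
    """Aggregate a dim(Y) x dim(X) score matrix over outputs, then rank features."""
    if method not in ("average", "max"):
        raise ValueError("method must be 'average' or 'max'")
    if scores and not isinstance(scores[0], (list, tuple)):
        scores = [scores]  # 1D input: a single output, nothing to aggregate
    n_features = len(scores[0])
    aggregated = []
    for j in range(n_features):
        column = [row[j] for row in scores]  # scores of feature j over all outputs
        if method == "max":
            aggregated.append(max(column))
        else:
            aggregated.append(sum(column) / len(column))
    # Feature indices sorted by decreasing aggregated score.
    return sorted(range(n_features), key=lambda j: aggregated[j], reverse=True)


scores = [[0.9, 0.5, 0.1],   # sensitivities of output 0
          [0.0, 0.5, 0.1]]   # sensitivities of output 1
by_average = score2rank_sketch(scores, "average")  # [1, 0, 2]
by_max = score2rank_sketch(scores, "max")          # [0, 1, 2]
```

The two methods can disagree: feature 0 matters a lot for one output only, so it leads under max but falls behind feature 1 under average.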

select(**kwargs)¶

Perform SDA Selector procedure.

Parameters:

x (array-like) – input sample

y (array-like) – output sample

x_test (array-like) – test input sample

y_test (array-like) – test output sample

ranking – a list of feature indices in order of decreasing importance

options (dict) – a set of options

approx_options (dict) – a set of options for GTApprox

Returns:
SDA SelectResult

Return type:
SelectResult

There are three modes for error computation:

IV-based

Train sample-based

Test sample-based (test sample is needed)

Valid argument combinations for modes:

Passed arguments	Mode	Ignored arguments
x, y	IV-based	x_test, y_test
x, y	Train sample-based	x_test, y_test
x, y, x_test, y_test	Test sample-based	

set_logger(logger)¶

Set logger.

Parameters: logger – logger object

Returns: none

set_watcher(watcher)¶

Set watcher.

Parameters: watcher – watcher object

Returns: none

11.11.2. `CheckResult` — correlation check result¶

class da.p7core.gtsda.CheckResult¶

SDA Checker final results.

An object of this class is only returned by Analyzer.check() and should never be instantiated by the user.

decisions¶

Type: ndarray, 1D or 2D, bool

Array of booleans, where True means that the correlation is statistically significant and False means that it is not.

info¶

Type: dict

SDA Checker procedure information.

p_values¶

Type: ndarray, 1D or 2D, float

List of p-values of the scores.

scores¶

Type: ndarray, 1D or 2D, float

List of correlation scores of input variables.

The scores, p_values, and decisions matrices contain \(dim(Y)\) rows and \(dim(X)\) columns (\(dim\) is dimensionality). In general, each value is a float number, except in some special cases:

In the sample-based mode, if the value of the j-th feature in the sample is constant (the input sample, x in Analyzer.check(), contains a constant column), all scores of this feature (j-th score matrix column) will be nan since there is no way to estimate the sensitivity of the output to a constant component.

In the sample-based mode, if the value of the i-th response component in the sample is constant (the output sample, y in Analyzer.check(), contains a constant column), the scores of all features vs this output (i-th score matrix row) will be 0.0 — it is assumed that this output is insensitive to all features since its value is constant.

The first of the above rules has priority: if the sample contains both a constant feature \(x_j\) and a constant output \(y_i\), the \(s_{ij}\) score is nan.

status¶

Finish status.

Type: Status

For details, see section Status.

11.11.3. `RankResult` — feature ranking result¶

class da.p7core.gtsda.RankResult¶

SDA Ranker final results.

An object of this class is only returned by Analyzer.rank() and should never be instantiated by the user.

approx_model¶

Type: gtapprox.Model

If the method builds a GTApprox model internally to perform its computations, this field returns that model. This usually happens when a sample was provided as input and the selected technique requires a specific design of experiment. Note that this field is present only if a surrogate model was built with GTApprox inside the tool.

The GTSDA/SaveModel option enables or disables saving the constructed model. If saving is disabled, the approx_model field is not created.

generated_sample¶

Type: dict, with fields "inputs" and "outputs", which are ndarray, 1D or 2D, float

If the tool uses the provided blackbox to generate a new sample, this field provides access to that sample. Note that this field is present only if a sample was generated from the blackbox inside the tool.

The GTSDA/SaveBlackboxData option enables or disables saving the generated sample. If saving is disabled, the generated_sample field is not created.

info¶

Type: dict

SDA Ranker procedure information.

Note that if the method computes more than one type of scores during its run, the values of all scores can be retrieved from the RankResult.info['Ranker']['Detailed info'] field.

scores¶

Type: ndarray, 1D or 2D, float

Resulting scores for the input variables (features).

Note that the scores field contains only a single most informative number per score: the values of \(\mu^*\) for screening indices, total indices for the Sobol indices FAST technique, and main indices for the Sobol indices EASI technique. For deeper analysis, all other coefficients can be taken from the RankResult.info['Ranker']['Detailed info'] field.

The scores matrix contains \(dim(Y)\) rows and \(dim(X)\) columns (\(dim\) is dimensionality). Each element of this matrix \(s_{ij}\) is the sensitivity of the i-th output component to the j-th component of the input (a feature). In general, \(s_{ij}\) is a positive float number, except some special cases:

In the sample-based mode, if the value of the j-th feature in the sample is constant (the input sample, x in Analyzer.rank(), contains a constant column), all scores of this feature (j-th score matrix column) will be nan since there is no way to estimate the sensitivity of the output to a constant component.

In the sample-based mode, if the value of the i-th response component in the sample is constant (the output sample, y in Analyzer.rank(), contains a constant column), the scores of all features vs this output (i-th score matrix row) will be 0.0 — it is assumed that this output is insensitive to all features since its value is constant.

The first of the above rules has priority: if the sample contains both a constant feature \(x_j\) and a constant output \(y_i\), the \(s_{ij}\) score is nan.

In the blackbox-based mode, if the lower and upper bounds of a feature are equal (see bounds in Blackbox.add_variable()), it is interpreted as a constant input, so the resulting score for this feature will be nan, similarly to the sample-based mode with a constant column.
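The sample-based special cases above, including the priority rule, can be summarized by the following sketch. It only predicts where nan and 0.0 appear in the score matrix; it does not compute any sensitivity scores. The function name `special_score_mask` is hypothetical, and None is used here merely as a placeholder for entries that are computed normally.

```python
import math


def special_score_mask(x, y):
    """Return a dim(Y) x dim(X) matrix holding nan, 0.0, or None per entry."""
    def is_constant(column):
        return all(value == column[0] for value in column)

    const_x = [is_constant(col) for col in zip(*x)]  # constant features
    const_y = [is_constant(col) for col in zip(*y)]  # constant outputs
    # A constant feature takes priority: its whole column is nan.
    # Otherwise a constant output gives 0.0 across its row.
    return [[math.nan if cx else (0.0 if cy else None) for cx in const_x]
            for cy in const_y]


x = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]  # second feature is constant
y = [[0.1, 7.0], [0.2, 7.0], [0.3, 7.0]]  # second output is constant
mask = special_score_mask(x, y)
# mask[0] -> [None, nan]; mask[1] -> [0.0, nan]
```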

status¶

Finish status.

Type: Status

For details, see section Status.

std¶

Type: ndarray, 1D or 2D, float

Scores standard deviation.

New in version 6.6.

Standard deviation matrix is structurally similar to the scores matrix: it also contains \(dim(Y)\) rows and \(dim(X)\) columns, and each element \(\sigma_{ij}\) is the standard deviation of the \(s_{ij}\) score. In general, \(\sigma_{ij}\) is a non-negative float number, except the following special cases:

If \(s_{ij}\) score is nan (such as the scores of constant features, see scores for details), \(\sigma_{ij}\) is also set to nan.

Estimation of std may fail due to insufficient data. Such \(\sigma\) values are also set to nan. This may happen even if there is enough data to estimate the corresponding score (so the score is not nan, but its deviation is). One example is a blackbox-based SDA run with a blackbox that frequently outputs nan values: the output data may be sufficient to estimate the score, but insufficient to perform the cross-validation process used to calculate the score deviation.

variances¶

Type: ndarray, 1D or 2D, float

Variances of scores.

New in version 6.6.

The matrix of variances contains the elementwise squares of the std values.
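A minimal sketch of this relation, using nested lists with made-up values in place of the ndarray fields of a real RankResult: squaring elementwise keeps the matrix shape, and nan deviations stay nan.

```python
import math

# Illustrative std values only; a real run returns them in RankResult.std.
std = [[0.1, math.nan],
       [0.0, 0.3]]

# Elementwise squares: same shape as std, nan propagates.
variances = [[s * s for s in row] for row in std]
```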

11.11.4. `SelectResult` — feature selection result¶

class da.p7core.gtsda.SelectResult¶

SDA Selector final results. An object of this class is only returned by Analyzer.select() and should never be instantiated by the user.

approx_model¶

Final GTApprox model constructed on the selected subset of features.

The GTSDA/SaveModel option enables or disables saving the final constructed model. If saving is disabled, the approx_model field is not created.

Type: Model

feature_list¶

Type: ndarray, 1D, float

List of chosen input variables (features).

info¶

Type: dict

SDA Selector procedure information.

status¶

Finish status.

Type: Status

For details, see section Status.

validation_error¶

Type: float

Validation error for the model constructed on the selected subset of features.