11.11. da.p7core.gtsda
Generic Tool for Sensitivity and Dependency Analysis (GTSDA) module.
New in version 4.0.
>>> from da.p7core import gtsda
Classes
da.p7core.gtsda.Analyzer([backend]) | GTSDA interface.
da.p7core.gtsda.CheckResult(status, info, …) | SDA Check final results.
da.p7core.gtsda.RankResult(status, info, …) | SDA Rank final results.
da.p7core.gtsda.SelectResult(status, info, …) | SDA Select final results.
11.11.1. Analyzer: analysis interface
class da.p7core.gtsda.Analyzer
GTSDA interface.
Allows the user to perform the SDA procedures:
- Ranker.
- Selector.
- Checker.
check(x, y=None, **kwargs)
Perform SDA Checker procedure.
Parameters:
- x (array-like) – input sample
- y (array-like) – optional output sample
- z (array-like) – optional control variables sample
- options (dict) – a set of options
Returns: SDA CheckResult
Return type: CheckResult
license
Analyzer license.
Type: License
General license information interface. See section License Usage for details.
options
Analyzer options.
Type: Options
General options interface for the analyzer. See section Options Interface for usage and the GTSDA option reference.
rank(**kwargs)
Perform SDA Ranker procedure.
Parameters:
- x (array-like) – input sample
- y (array-like) – output sample
- blackbox (Blackbox, gtopt.ProblemGeneric, or gtdoe.ProblemGeneric) – the blackbox to rank, incompatible with model
- budget (int) – maximum number of blackbox evaluations
- model (gtapprox.Model or gtdf.Model) – the model to rank, incompatible with blackbox
- bounds (tuple(list[float], list[float])) – analysis space bounds (lower, upper)
- options (dict) – a set of Ranker options
- approx_options (dict) – a set of GTApprox options (used only in sample mode)
Returns: analysis result
Return type: RankResult
Changed in version 6.20: supports gtopt.ProblemGeneric or gtdoe.ProblemGeneric as blackbox; supports ranking GTApprox and GTDF models; added bounds to specify the lower and upper bounds for ranked inputs.
There are two ranking modes:
- Sample-based: ranks variables using the feature (input) and response (output) data samples.
- Blackbox-based: ranks inputs of the blackbox (Blackbox or a problem class) or model (GTApprox or GTDF model).
Valid argument combinations for modes:
Mode | Required arguments | Ignored arguments
sample-based | x, y | budget
blackbox-based | blackbox, budget | approx_options
blackbox-based (model) | model, budget | approx_options
Arguments required by different modes should not be combined: for example, specifying x, y, and model causes an exception.
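The argument combinations above can be sketched as a small pre-check performed before calling rank(). This is only an illustration of the table: resolve_rank_mode is a hypothetical helper, not part of da.p7core, and the ValueError it raises is an assumption, since this section does not name the exception type that rank() raises.

```python
def resolve_rank_mode(x=None, y=None, blackbox=None, model=None, budget=None):
    """Return the ranking mode implied by the passed arguments.

    Hypothetical helper that mirrors the table of valid argument
    combinations; it is not the library's own validation code.
    """
    has_sample = x is not None or y is not None
    sources = [has_sample, blackbox is not None, model is not None]
    # Arguments required by different modes must not be combined.
    if sum(sources) != 1:
        raise ValueError("pass exactly one of: (x, y), blackbox, model")
    if has_sample:
        if x is None or y is None:
            raise ValueError("sample-based mode requires both x and y")
        return "sample-based"  # budget is ignored in this mode
    if budget is None:
        raise ValueError("blackbox-based mode requires budget")
    if blackbox is not None:
        return "blackbox-based"
    return "blackbox-based (model)"


mode = resolve_rank_mode(x=[[0.0], [1.0]], y=[[2.0], [3.0]])
```

Here mode is "sample-based"; passing x, y, and model together makes the helper raise, which corresponds to the exception case described above.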
Note
When ranking a model with input constraints or manually specified box bounds, specify bounds to rank(); otherwise the analysis space bounds are determined from the model training sample data.
The optional bounds argument may be used in all modes and works as follows:
- If you omit bounds:
  - In the sample-based mode: an internal model of responses is trained, using the x, y sample and applying approx_options, then model inputs are ranked in bounds determined from the sample (minimum and maximum sample values of inputs).
  - In the blackbox-based mode with Blackbox: inputs are ranked in the blackbox bounds (see da.p7core.blackbox.Blackbox.variables_bounds()).
  - In the blackbox-based mode with a model: inputs are ranked in bounds determined from the model’s training data (minimum and maximum training sample values of inputs).
- If you specify bounds:
  - In the sample-based mode: an internal response model is trained, then model inputs are ranked in the specified bounds.
  - In the blackbox-based mode, with either a Blackbox or a model: scores are calculated in the specified bounds.
Note that bounds are intended to specify a limited analysis domain (contract the blackbox or model bounds). Specifying bounds that intersect or extend the blackbox (model) input domain may lead to unexpected results.
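As a rough illustration of the bounds rules, the sketch below derives default bounds from a sample (minimum and maximum values of each input) and checks whether user-specified bounds contract them. Both functions are hypothetical helpers written for this example; they are not da.p7core functions and only assume the (lower, upper) tuple layout described for the bounds argument.

```python
def bounds_from_sample(x):
    """Per-input (lower, upper) bounds: minimum and maximum sample values."""
    columns = list(zip(*x))  # transpose points (rows) into inputs (columns)
    lower = [float(min(col)) for col in columns]
    upper = [float(max(col)) for col in columns]
    return lower, upper


def contracts(bounds, reference):
    """True if bounds lie entirely inside the reference bounds."""
    (lo, up), (ref_lo, ref_up) = bounds, reference
    return all(
        rl <= l <= u <= ru
        for l, u, rl, ru in zip(lo, up, ref_lo, ref_up)
    )


x = [[0.0, 10.0], [1.0, 30.0], [0.5, 20.0]]
sample_bounds = bounds_from_sample(x)

inside = contracts(([0.2, 15.0], [0.8, 25.0]), sample_bounds)
extends = contracts(([0.2, 15.0], [1.5, 25.0]), sample_bounds)
```

The first candidate stays within the sample range of both inputs, so inside is True. The second raises the upper bound of the first input beyond the sample maximum, so extends is False; this is the kind of bounds the text warns may lead to unexpected results.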
score2rank(scores, method='average')
Compute ranking based on scores.
Parameters:
- scores (array-like) – sensitivity indices
- method ('max' or 'average') – method for aggregation of scores
Returns: ranking
Return type:
The method transforms scores obtained by rank() into ranks which sort aggregated scores in descending order. In other words, ranks are the feature indices in order of decreasing importance.
Aggregation can be performed by one of the methods: average or max. Aggregation applies only when the output is multidimensional.
If average is selected, the method averages scores over outputs before computing ranks. If max is selected, the method selects the maximum scores over outputs.
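The aggregation and ordering described for score2rank() can be sketched in plain Python as follows. This is a minimal illustration assuming a scores matrix with one row per output and one column per feature; aggregate_and_rank is a hypothetical name, and the sketch is not the library's implementation (in particular, how score2rank() treats nan scores or ties is not stated here and is not modelled).

```python
def aggregate_and_rank(scores, method="average"):
    """Aggregate scores over outputs, then order features by importance.

    scores: list of rows, one row per output, one column per feature.
    Returns feature indices in order of decreasing aggregated score.
    """
    n_features = len(scores[0])
    # Collect the scores of each feature over all outputs.
    columns = [[row[j] for row in scores] for j in range(n_features)]
    if method == "average":
        aggregated = [sum(col) / len(col) for col in columns]
    elif method == "max":
        aggregated = [max(col) for col in columns]
    else:
        raise ValueError("method must be 'average' or 'max'")
    return sorted(range(n_features), key=lambda j: aggregated[j], reverse=True)


scores = [
    [0.0, 6.0, 1.0],  # scores of three features for output 0
    [9.0, 6.0, 2.0],  # scores of three features for output 1
]
by_average = aggregate_and_rank(scores, method="average")
by_max = aggregate_and_rank(scores, method="max")
```

With these numbers the averaged scores are 4.5, 6.0, and 1.5, so by_average is [1, 0, 2], while the per-feature maxima are 9.0, 6.0, and 2.0, so by_max is [0, 1, 2]. The example shows that the two aggregation methods can order the same features differently.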
select(**kwargs)
Perform SDA Selector procedure.
Parameters:
- x (array-like) – input sample
- y (array-like) – output sample
- x_test (array-like) – test input sample
- y_test (array-like) – test output sample
- ranking – a list of feature indices in order of decreasing importance
- options (dict) – a set of options
- approx_options (dict) – a set of options for GTApprox
Returns: SDA SelectResult
Return type: SelectResult
There are three modes for error computation:
- IV-based
- Train sample-based
- Test sample-based (test sample is needed)
Valid argument combinations for modes:
Passed arguments | Mode | Ignored arguments
x, y | IV-based | x_test, y_test
x, y | Train sample-based | x_test, y_test
x, y, x_test, y_test | Test sample-based |
set_logger()
Set logger.
Parameters: logger – logger object
Returns: none
set_watcher()
Set watcher.
Parameters: watcher – watcher object
Returns: none
11.11.2. CheckResult: correlation check result
class da.p7core.gtsda.CheckResult
SDA Checker final results.
An object of this class is only returned by Analyzer.check() and should never be instantiated by the user.
decisions
Type: ndarray, 1D or 2D, bool
Array of bools, where 1 means that the correlation is statistically significant, while 0 means that it is not.
info
Type: dict
SDA Checker procedure information.
p_values
Type: ndarray, 1D or 2D, float
List of p-values of the scores.
scores
Type: ndarray, 1D or 2D, float
List of correlation scores of input variables.
The scores, p_values, and decisions matrices contain \(dim(Y)\) rows and \(dim(X)\) columns (\(dim\) is dimensionality). In general, each value is a float number, except in some special cases:
- In the sample-based mode, if the value of the j-th feature in the sample is constant (the input sample, x in Analyzer.check(), contains a constant column), all scores of this feature (j-th score matrix column) will be nan since there is no way to estimate the sensitivity of the output to a constant component.
- In the sample-based mode, if the value of the i-th response component in the sample is constant (the output sample, y in Analyzer.check(), contains a constant column), the scores of all features vs this output (i-th score matrix row) will be 0.0: it is assumed that this output is insensitive to all features since its value is constant.
- The first of the above rules has priority: if the sample contains both a constant feature \(x_j\) and a constant output \(y_i\), the \(s_{ij}\) score is nan.
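The special cases above can be made concrete with a small sketch that predicts, from the samples alone, which entries of the score matrix are fixed by these rules. The function special_case_scores is a hypothetical helper, not part of da.p7core; it uses None as a placeholder for entries whose score would actually be computed by the analysis, and it tests for constancy by exact equality, which is an assumption of this example.

```python
import math


def special_case_scores(x, y):
    """Build a dim(Y) x dim(X) matrix of the scores fixed by the rules above.

    Entries are nan for constant features, 0.0 for constant outputs,
    and None where a real score would have to be computed.
    """
    # A column is constant if it contains a single distinct value.
    const_x = [len(set(col)) == 1 for col in zip(*x)]
    const_y = [len(set(col)) == 1 for col in zip(*y)]
    result = []
    for y_is_const in const_y:
        row = []
        for x_is_const in const_x:
            if x_is_const:
                row.append(math.nan)  # constant feature: rule with priority
            elif y_is_const:
                row.append(0.0)  # constant output, non-constant feature
            else:
                row.append(None)  # placeholder for a computed score
        result.append(row)
    return result


x = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]  # feature 1 is constant
y = [[7.0, 0.1], [7.0, 0.2], [7.0, 0.3]]  # output 0 is constant
fixed = special_case_scores(x, y)
```

In this example the column of feature 1 is nan in both rows, including the row of the constant output 0, which reflects the priority of the first rule; the remaining entry of row 0 is 0.0.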
11.11.3. RankResult: feature ranking result
class da.p7core.gtsda.RankResult
SDA Ranker final results.
An object of this class is only returned by Analyzer.rank() and should never be instantiated by the user.
approx_model
Type: gtapprox.Model
If the method builds a GTApprox model internally (this usually happens when a sample is provided as input and the selected technique requires a specific design of experiment), this field returns the constructed model. This field is present only if a surrogate model was constructed with GTApprox inside the tool.
The GTSDA/SaveModel option may be used to enable or disable saving the constructed model (if it is disabled, the approx_model field is not created).
generated_sample
Type: dict, with fields "inputs" and "outputs", which are ndarray, 1D or 2D, float
If the tool uses the provided blackbox to generate a new sample, this field gives access to that sample. This field is present only if a sample was generated from the blackbox inside the tool.
The GTSDA/SaveBlackboxData option may be used to enable or disable saving the generated sample (if it is disabled, the generated_sample field is not created).
info
Type: dict
SDA Ranker procedure information.
Note that if the method computes more than one type of score during its run, the values of all scores may be retrieved from the RankResult.info['Ranker']['Detailed info'] field.
scores
Type: ndarray, 1D or 2D, float
Resulting scores for the input variables (features).
Note that the scores field contains only a single most informative number per score: the values of \(\mu^*\) for screening indices, total indices for the Sobol indices FAST technique, and main indices for the Sobol indices EASI technique. For deeper analysis, all other coefficients may be taken from the RankResult.info['Ranker']['Detailed info'] field.
The scores matrix contains \(dim(Y)\) rows and \(dim(X)\) columns (\(dim\) is dimensionality). Each element of this matrix \(s_{ij}\) is the sensitivity of the i-th output component to the j-th component of the input (a feature). In general, \(s_{ij}\) is a positive float number, except in some special cases:
- In the sample-based mode, if the value of the j-th feature in the sample is constant (the input sample, x in Analyzer.rank(), contains a constant column), all scores of this feature (j-th score matrix column) will be nan since there is no way to estimate the sensitivity of the output to a constant component.
- In the sample-based mode, if the value of the i-th response component in the sample is constant (the output sample, y in Analyzer.rank(), contains a constant column), the scores of all features vs this output (i-th score matrix row) will be 0.0: it is assumed that this output is insensitive to all features since its value is constant.
- The first of the above rules has priority: if the sample contains both a constant feature \(x_j\) and a constant output \(y_i\), the \(s_{ij}\) score is nan.
- In the blackbox-based mode, if the lower and upper bounds of a feature are equal (see bounds in Blackbox.add_variable()), it is interpreted as a constant input, so the resulting score for this feature will be nan, similarly to the sample-based mode with a constant column.
std
Type: ndarray, 1D or 2D, float
Scores standard deviation.
New in version 6.6.
The standard deviation matrix is structurally similar to the scores matrix: it also contains \(dim(Y)\) rows and \(dim(X)\) columns, and each element \(\sigma_{ij}\) is the standard deviation of the \(s_{ij}\) score. In general, \(\sigma_{ij}\) is a non-negative float number, except in the following special cases:
- If the \(s_{ij}\) score is nan (such as the scores of constant features, see scores for details), \(\sigma_{ij}\) is also set to nan.
- Estimation of std may fail due to insufficient data. Again, such \(\sigma\) values are set to nan. This may happen even if there is enough data to estimate the corresponding score (so the score is not nan, but its deviation is). One example where this is possible is a blackbox-based SDA run with a blackbox that frequently outputs nan values: the output data may be sufficient to estimate its score, but insufficient to perform the cross-validation process that is used to calculate score deviation.
11.11.4. SelectResult: feature selection result
class da.p7core.gtsda.SelectResult
SDA Selector final results.
An object of this class is only returned by Analyzer.select() and should never be instantiated by the user.
approx_model
Type: Model
Final GTApprox model constructed on the selected subset of features.
The GTSDA/SaveModel option may be used to enable or disable saving the final constructed model (if it is disabled, the approx_model field is not created).
feature_list
Type: ndarray, 1D, float
List of chosen input variables (features).
info
Type: dict
SDA Selector procedure information.
validation_error
Type: float
Validation error for the model constructed on the selected subset of features.