5.1. Introduction

Generic Tool for Sensitivity and Dependency Analysis (GTSDA) provides methods for sensitivity analysis, correlation tests and forward/backward feature selection.

GTSDA contains three main groups of methods: correlation analysis, sensitivity analysis and feature selection.

The methods are designed to be used together to obtain the most informative subset of features, with each group serving its own purpose:

  • Correlation analysis methods check whether a dependency between each input and the output exists and is statistically significant.
  • Sensitivity analysis methods rank features according to their importance.
  • Feature selection methods find the most suitable feature subset for constructing an approximation.

The detection of an informative subset of features is motivated by real-world applications. Modern data analysis problems involve modeling high-dimensional functions with a limited sample size. The performance of state-of-the-art approximation and optimization techniques is significantly influenced by these two parameters. More specifically, approximation methods work well only when the sample size is large compared to the dimension of the problem, and optimization algorithms also degrade as the input dimension increases. Normally, the sample size of a given problem is fixed, so the practical opportunity is to detect which input features are irrelevant to the problem and use only the informative features for modeling and optimization. We suggest doing this in a few steps, which are summarized in the figure below.

../../_images/intro.png

The first step is to detect relevant features with the correlation analysis methods and drop the irrelevant ones. These methods are described in section Correlation Analysis. The result of this procedure is a list of decisions on whether each feature is correlated with the output or not. These decisions are based on the computed correlation values and the significance level set by the user.
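As a rough illustration of this step (not the GTSDA API), the sketch below screens features by testing the Pearson correlation between each input column and the output at a user-chosen significance level; the function name, sample data and threshold are assumptions made for the example.

    # Conceptual sketch of correlation-based screening (not the GTSDA API).
    import numpy as np
    from scipy.stats import pearsonr

    def screen_features(X, y, alpha=0.05):
        """Return indices of inputs whose correlation with y is statistically significant."""
        relevant = []
        for j in range(X.shape[1]):
            r, p_value = pearsonr(X[:, j], y)
            if p_value < alpha:      # reject the "no correlation" hypothesis at level alpha
                relevant.append(j)
        return relevant

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)
    print(screen_features(X, y))     # typically prints [0, 2]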

The next step is to rank the relevant features detected at the previous step in order of their importance by means of the sensitivity analysis algorithms described in section Sensitivity Analysis. The result of sensitivity analysis is a set of sensitivity indices which measure how variations of the input parameters influence the variation of the outputs. Features with larger sensitivity indices are generally more important for the model in question. Sorting the input parameters in order of their importance allows feature selection to be performed efficiently.
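The following sketch shows one simple way to estimate such indices from a sample (again, not the GTSDA algorithms): a crude first-order index measuring which share of the output variance is explained by the conditional mean of the output over bins of a single input. All names and settings here are illustrative assumptions.

    # Conceptual sketch of sample-based sensitivity ranking (not the GTSDA API).
    import numpy as np

    def first_order_index(x, y, n_bins=10):
        """Share of Var(y) explained by the conditional mean of y over bins of x."""
        edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
        bins = np.digitize(x, edges[1:-1])       # bin labels 0 .. n_bins-1
        explained = 0.0
        for b in range(n_bins):
            mask = bins == b
            if mask.any():
                explained += mask.mean() * (y[mask].mean() - y.mean()) ** 2
        return explained / y.var()

    rng = np.random.default_rng(1)
    X = rng.uniform(-1.0, 1.0, size=(500, 3))
    y = 3.0 * X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.normal(size=500)
    scores = [first_order_index(X[:, j], y) for j in range(X.shape[1])]
    ranking = np.argsort(scores)[::-1]           # most influential input first
    print(ranking, np.round(scores, 3))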

Finally, if an approximation problem is being solved, feature selection should be performed to obtain the best subset of features for constructing the approximation, based on the importance ranking from the previous step. The accuracy of the approximation model is measured by a criterion set by the user. For details, see section Feature Selection.
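A minimal sketch of such a selection loop, assuming a linear approximation model and cross-validated mean squared error as the accuracy criterion (these are assumptions for the example, not the GTSDA feature selection procedure): features are added one by one in ranked order, and the subset with the lowest error is kept.

    # Conceptual sketch of forward selection over an importance ranking (not the GTSDA API).
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    def select_features(X, y, ranking):
        """Greedily grow the feature subset in ranked order, keeping the best one."""
        best_subset, best_error = [], np.inf
        for k in range(1, len(ranking) + 1):
            subset = list(ranking[:k])
            error = -cross_val_score(LinearRegression(), X[:, subset], y,
                                     scoring="neg_mean_squared_error", cv=5).mean()
            if error < best_error:
                best_subset, best_error = subset, error
        return best_subset, best_error

The selected subset is then used to build the final approximation model on the reduced set of inputs.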