Generic Tool for Dimension Reduction (GT DR)

Complex objects are often described by a large number of parameters, and it is frequently desirable to reduce the dimensionality of these descriptions, for example to simplify parameterization, optimization, or visualization. pSeven Core provides several tools that cover most aspects of dimensionality reduction.

Generalized Principal Component Analysis

Generic Tool for Generalized Principal Component Analysis (GT GPCA)

If a collection of objects is represented as a set of multi-dimensional points, the tool approximates this set with a smooth hyper-surface and produces compression and decompression procedures which allow the user to:

  • automatically re-parameterize the objects with a smaller number of parameters,
  • generate similar objects,
  • visualize the intrinsic geometry of the set.

The tool contains several different nonlinear unsupervised dimension reduction techniques, so that the most appropriate technique can be chosen depending on the amount, dimensionality, and noisiness of the data.

The reduced dimensionality can be selected manually, or it can be found automatically based on the desired compression/reconstruction error.
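The idea of paired compression and decompression procedures with an automatically chosen dimension can be illustrated with a minimal sketch. It does not use the pSeven Core API, and it substitutes ordinary linear PCA (in plain NumPy) for the tool's nonlinear techniques. The function name `fit_linear_reduction`, the parameter `max_rel_error`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_linear_reduction(points, max_rel_error):
    """Fit linear PCA and keep the fewest directions meeting the error target.

    Hypothetical helper, not part of pSeven Core. Assumes the points are not
    all identical, so the total variance is nonzero.
    """
    mean = points.mean(axis=0)
    centered = points - mean
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    energy = s**2
    # Relative squared reconstruction error when keeping the first k directions.
    rel_error = 1.0 - np.cumsum(energy) / energy.sum()
    acceptable = np.flatnonzero(rel_error <= max_rel_error)
    # Fall back to all directions if rounding keeps the error above the target.
    dim = int(acceptable[0]) + 1 if acceptable.size else len(s)
    basis = vt[:dim]

    def compress(x):
        return (x - mean) @ basis.T

    def decompress(z):
        return z @ basis + mean

    return dim, compress, decompress


# Synthetic data: 5-dimensional points lying close to a 2-dimensional plane.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.0, 1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0, 1.0, 0.0]])
points = latent @ mixing + 1e-3 * rng.normal(size=(200, 5))

dim, compress, decompress = fit_linear_reduction(points, max_rel_error=1e-3)
codes = compress(points)        # re-parameterized objects
restored = decompress(codes)    # approximate reconstruction of the originals
```

In the same spirit, new points sampled in the compressed space and passed through `decompress` would give objects similar to the originals, and a two- or three-dimensional `codes` array can be plotted directly to inspect the geometry of the set.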

Supervised Linear Variable Extraction

slve

This tool finds linear combinations of the original parameters (features) having the greatest influence on the response function.

  • If a collection of objects is represented as a set of multi-dimensional points, the tool approximates this set with a hyperplane and produces compression and decompression procedures approximately preserving the responses of the objects.
  • The number of required features is estimated automatically and may be manually changed without retraining the procedures.
  • The tool uses an original technique developed by Datadvance: VEGA - Variable Extraction via Gradient Approximation.
  • The tool can estimate scores showing which original variables contribute the most to the obtained features.
  • The tool can work with a user-provided dataset. Alternatively, if the user provides a means to compute the response function, the tool can generate a suitable dataset itself, which allows for more accurate results.
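
The general idea of extracting response-relevant linear features from gradient information can be sketched as follows. This is not the VEGA technique and not the pSeven Core API: it is a generic gradient-based approach (finite-difference gradients followed by an SVD) written in plain NumPy. The function names, the `step` parameter, the scoring formula, and the test response are all assumptions made for illustration.

```python
import numpy as np


def finite_difference_gradients(response, points, step=1e-6):
    """Estimate the response gradient at each point by central differences."""
    n, d = points.shape
    grads = np.empty((n, d))
    for j in range(d):
        shift = np.zeros(d)
        shift[j] = step
        grads[:, j] = [
            (response(x + shift) - response(x - shift)) / (2.0 * step)
            for x in points
        ]
    return grads


def extract_linear_features(response, points, n_features):
    """Return feature weights and normalized per-variable scores.

    Hypothetical helper. Assumes the response is not constant over the
    sample, so at least one gradient is nonzero.
    """
    grads = finite_difference_gradients(response, points)
    # Right singular vectors of the gradient matrix are the directions along
    # which the response changes the most on average.
    _, s, vt = np.linalg.svd(grads, full_matrices=False)
    weights = vt[:n_features]
    # Illustrative score: squared loadings weighted by squared singular values.
    scores = ((s[:n_features, None] ** 2) * weights**2).sum(axis=0)
    return weights, scores / scores.sum()


def response(x):
    # Test response that depends only on the combination x0 + 2*x1.
    return np.sin(x[0] + 2.0 * x[1])


rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(50, 4))
weights, scores = extract_linear_features(response, points, n_features=1)
features = points @ weights.T   # compressed description of each object
```

Because the directions come out ordered by importance, keeping more or fewer rows of `vt` changes the number of features without recomputing anything, which mirrors the behavior described in the list above. The `scores` vector indicates how strongly each original variable contributes to the extracted features.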