11.8. da.p7core.gtdr

Generic Tool for Dimension Reduction (GTDR) module.

>>> from da.p7core import gtdr

Classes

da.p7core.gtdr.Builder([backend]) Dimension reduction model builder.
da.p7core.gtdr.ExportedFormat Enumerates available export formats.
da.p7core.gtdr.GradMatrixOrder Enumerates available gradient output modes.
da.p7core.gtdr.Model([file]) Dimension reduction model.

11.8.1. Builder — model builder

class da.p7core.gtdr.Builder(backend=None)

Dimension reduction model builder.

build(x=None, y=None, dim=None, error=None, blackbox=None, budget=None, options=None, comment=None, annotations=None, x_meta=None)

Train a dimension reduction model.

Parameters:
  • x (array-like, 2D) – training sample, input part (values of variables)
  • y (array-like, 1D or 2D) – training sample, optional response part (function values)
  • dim (int, long) – output dimension, optional
  • error (float) – error threshold, optional
  • blackbox (Blackbox, gtapprox.Model, or gtdf.Model) – Feature Extraction blackbox, optional
  • budget (int, long) – blackbox budget
  • options (dict) – option settings
  • comment (str) – text comment
  • annotations (dict) – extended comment and notes
  • x_meta (list) – descriptions of inputs
Returns:

trained model

Return type:

Model

Changed in version 6.20: blackbox may be a gtapprox.Model or a gtdf.Model.

Changed in version 6.25: pandas.DataFrame and pandas.Series are supported as the x, y training samples.

Trains a dimension reduction model (codec) which provides vector compression and decompression methods.

In the sample-based modes, x is always a 2D array because a 1D array has no meaningful interpretation. However, a 1D y is supported as a simplified form for the case of 1D output when using sample-based Feature Extraction.

Valid argument combinations and mode selection:

Passed arguments    Technique             Optional arguments    Ignored arguments
x, dim              dimension-based       -                     y, error, blackbox, budget
x, error            error-based           -                     y, dim, blackbox, budget
x, y                Feature Extraction    dim                   error, blackbox, budget
blackbox, budget    Feature Extraction    dim                   x, y, error

All other combinations of arguments are invalid.
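The table can be read as a lookup from the set of passed arguments to a technique. The following is a minimal sketch of that reading, not the library's actual dispatch logic: select_technique and MODES are hypothetical names, and the sketch assumes that a row matches only when the required arguments are passed and nothing beyond the optional ones is added.

```python
# Hypothetical helper, not part of da.p7core: mirrors the table literally.
# Each row: (required arguments, optional arguments, technique).
MODES = [
    ({"x", "dim"}, set(), "dimension-based"),
    ({"x", "error"}, set(), "error-based"),
    ({"x", "y"}, {"dim"}, "Feature Extraction"),
    ({"blackbox", "budget"}, {"dim"}, "Feature Extraction"),
]


def select_technique(**kwargs):
    """Return the technique for the arguments that are not None, or None."""
    passed = {name for name, value in kwargs.items() if value is not None}
    for required, optional, technique in MODES:
        # A row matches when all required arguments are present and
        # nothing outside required + optional is passed (an assumption).
        if required <= passed <= required | optional:
            return technique
    return None


print(select_technique(x=[[0.1, 0.2]], dim=1))           # dimension-based
print(select_technique(x=[[0.1, 0.2]], y=[0.3], dim=1))  # Feature Extraction
print(select_technique(x=[[0.1, 0.2]]))                  # None
```

The sketch deliberately leaves out the "Ignored arguments" column, so it says nothing about how build() treats arguments listed there.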

Example:

>>> from da.p7core.gtdr import Builder
>>> sample = [[0.1, 0.2, 0.3, 0.4],
...           [0.2, 0.3, 0.4, 0.41],
...           [0.3, 0.4, 0.5, 0.39]]
>>> model = Builder().build(sample, dim=1)
>>> model.original_dim
4
>>> model.compressed_dim
1

Changed in version 6.14: added the comment, annotations, and x_meta parameters.

The comment and annotations parameters add optional notes to the model. The comment string is stored in the model's comment. The annotations dictionary can contain more notes or other supplementary information; all keys and values in annotations must be strings. Annotations are stored in the model's annotations. After training a model, you can also edit its comment and annotations using modify().

The x_meta parameter adds names and descriptions of model inputs. It is a list whose length equals the number of inputs, or the number of columns in x. Each list element can be a string (Unicode) or a dictionary. A string specifies a name for the respective input. The name must be a valid identifier according to the FMI standard, so certain restrictions apply (see below). A dictionary describes a single input and can have the following keys (all keys are optional, all values must be str or unicode):

  • "name": contains the name for this input. If this key is omitted, a default name will be saved to the model — "x[i]" where i is the index of the respective column in x.
  • "description": contains a brief description, any text.
  • "quantity": physical quantity, for example "Angle" or "Energy".
  • "unit": measurement units used for this input, for example "deg" or "J".
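As an illustration, an x_meta list for a sample with four columns could look like the sketch below. All names, descriptions, and units are made up for this example; only the key names and the list layout come from the description above.

```python
# Illustrative descriptions for a sample with four columns; all names and
# texts below are made up for this example.
x_meta = [
    "length",                      # a plain string sets only the name
    {"name": "angle",
     "description": "Flap deflection angle",
     "quantity": "Angle",
     "unit": "deg"},
    {"description": "Unnamed input, keeps its default name"},
    {"name": "energy", "quantity": "Energy", "unit": "J"},
]

ALLOWED_KEYS = {"name", "description", "quantity", "unit"}
for item in x_meta:
    if isinstance(item, dict):
        # Every key must be one of the documented ones, every value a string.
        assert set(item) <= ALLOWED_KEYS
        assert all(isinstance(value, str) for value in item.values())
    else:
        assert isinstance(item, str)

# With da.p7core installed, the list would be passed as:
# model = Builder().build(sample, dim=1, x_meta=x_meta)
```

The third element omits "name", so the model would store a default name for that input.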

Names of inputs and outputs must satisfy the following rules:

  • Name must not be empty.
  • All names must be unique.
  • The only whitespace character allowed in names is the ASCII space, so \t, \n, \r, and various Unicode whitespace characters are prohibited.
  • Name cannot contain leading or trailing spaces, and cannot contain two or more consecutive spaces.
  • Name cannot contain leading or trailing dots, and cannot contain two or more consecutive dots, since dots are commonly used as name separators.
  • Parts of the name separated by dots must not begin or end with a space, so the name cannot contain '. ' or ' .'.
  • Name cannot contain control characters and Unicode separators. Prohibited Unicode character categories are: Cc, Cf, Cn, Co, Cs, Zl, Zp, Zs.
  • Name cannot contain characters from this set: :"/\|?*.
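The rules above can be checked before training with a small pure-Python validator. This is a sketch written from the list, not the check the library itself performs; name_problems and duplicate_names are hypothetical helpers, and the message texts are arbitrary.

```python
import unicodedata

FORBIDDEN_CHARS = set(':"/\\|?*')
FORBIDDEN_CATEGORIES = {"Cc", "Cf", "Cn", "Co", "Cs", "Zl", "Zp", "Zs"}


def name_problems(name):
    """Return a list of rule violations for one name (empty if none)."""
    if not name:
        return ["name is empty"]
    problems = []
    if name != name.strip(" ") or "  " in name:
        problems.append("leading, trailing, or consecutive spaces")
    if name != name.strip(".") or ".." in name:
        problems.append("leading, trailing, or consecutive dots")
    if ". " in name or " ." in name:
        problems.append("space next to a dot")
    for ch in name:
        if ch == " ":
            continue  # the ASCII space is the one allowed whitespace character
        if ch.isspace() or unicodedata.category(ch) in FORBIDDEN_CATEGORIES:
            problems.append("prohibited character U+%04X" % ord(ch))
            break
    if FORBIDDEN_CHARS & set(name):
        problems.append("character from the prohibited set")
    return problems


def duplicate_names(names):
    """Return the sorted names that occur more than once."""
    seen, duplicates = set(), set()
    for name in names:
        (duplicates if name in seen else seen).add(name)
    return sorted(duplicates)


print(name_problems("wing.span"))  # []
print(name_problems("a..b"))       # ['leading, trailing, or consecutive dots']
```

The ASCII space is skipped explicitly in the character loop because it belongs to the Unicode category Zs, which is otherwise prohibited.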

Input descriptions are stored to model details (the "Input Variables" key). If you do not specify a name or description for some input, its information in details contains only the default name ("x[i]"). When you export a model, input descriptions are found in the comments in the exported code.

license

Builder license.

Type:License

General license information interface. See section License Usage for details.

options

Builder options.

Type:Options

General options interface for the builder. See section Options Interface for usage and the GTDR Option Reference.

set_logger(logger)

Set logger.

Parameters:logger – logger object
Returns:None

Used to set up a logger for the build process. See section Loggers for details.

set_watcher(watcher)

Set watcher.

Parameters:watcher – watcher object
Returns:None

Used to set up a watcher for the build process. See section Watchers for details.

11.8.2. ExportedFormat — model export formats

class da.p7core.gtdr.ExportedFormat

Enumerates available export formats.

OCTAVE

Octave format.

Alias: "octave".

OCTAVE_MEX

C source for a MEX file.

Aliases: "octave_mex", "mex".

C99_PROGRAM

C source with the main() function for a complete command-line based C program.

Aliases: "c99_program", "c_program", "program".

C99_HEADER

C header of the target function.

Aliases: "c99_header", "c_header", "header".

C99_SOURCE

C header and implementation of the target function.

Aliases: "c99_source", "c_source", "c".

EXCEL_DLL

C implementation of the model intended for creating a DLL compatible with Microsoft Excel.

Aliases: "excel_dll", "excel".

11.8.3. GradMatrixOrder — model gradients order

class da.p7core.gtdr.GradMatrixOrder

Enumerates available gradient output modes.

F_MAJOR

Indexed in function-major order (\(grad_{ij} = \frac{df_i}{dx_j}\)).

X_MAJOR

Indexed in variable-major order (\(grad_{ij} = \frac{df_j}{dx_i}\)).
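The two orders differ only by a transpose of the gradient matrix. The sketch below shows this with plain nested lists for a made-up pair of functions; it does not call the library.

```python
# Jacobian of f0 = x0 + 2*x1, f1 = 3*x0 + 4*x2 in function-major order:
# f_major[i][j] = d f_i / d x_j (one row per function).
f_major = [
    [1.0, 2.0, 0.0],
    [3.0, 0.0, 4.0],
]

# Variable-major order is the transpose: x_major[i][j] = d f_j / d x_i.
x_major = [list(column) for column in zip(*f_major)]

print(x_major)  # [[1.0, 3.0], [2.0, 0.0], [0.0, 4.0]]
```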

11.8.4. Model — dimension reduction model

class da.p7core.gtdr.Model(file=None, **kwargs)

Dimension reduction model.

Can be created by Builder or loaded from a file via the Model constructor.

Model objects are immutable. All methods which are meant to change the model return a new Model instance.

annotations

Extended comment or supplementary information.

Type:dict

New in version 6.14.

The annotations dictionary can optionally contain any number of notes. All dictionary keys and values are strings. You can add annotations when training a model and edit them using modify().

build_log

Model building log.

Type:str

comment

Text comment to the model.

Type:str

New in version 6.14.

Optional plain text comment to the model. You can add the comment when training a model and edit it using modify().

compress(vec, dim=None)

Compression method.

Parameters:
  • vec (array-like, 2D or 1D) – vector(s) to compress
  • dim (int, long) – required dimension
Returns:

compressed vectors

Return type:

pandas.DataFrame or pandas.Series if vec is a pandas type; otherwise ndarray, 2D

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if vec is a pandas data type.

Compresses a vector (1D) or each of the vectors in a batch (2D). Vector length must be equal to original_dim.

When using pandas, the return type is the same as the vec type, and the returned object keeps the index of vec.

compress_export_to(format, name, description, file, dim=None, single_file=None)

Export the compression procedure to a source file in the specified format.

Parameters:
  • format (ExportedFormat or str) – source code format
  • name (str) – exported function name
  • description (str) – additional comment
  • file (file-like, str, zipfile.ZipFile, tarfile.TarFile) – export file or path
  • dim (int or long) – required compressed dimension
  • single_file (bool) – export sources as a single file (default) or multiple files (False)
Returns:

None

Raise:

GTException if name is empty and format is not C99_PROGRAM

New in version 6.10: added str aliases for export formats.

Changed in version 6.24: added the support for exporting sources to archives; added the single_file parameter.

Generates the model source code in the specified format and saves it to a file or a set of source files. Supports packing the source code to various archive file formats.

The source code format can be specified using an enumeration or a string alias — see ExportedFormat for details.

By default, all source code is exported to a single file. This mode is not recommended for large models, since large source files can cause problems during compilation. To split the source code into multiple files, set single_file to False. In this case, the filename from file serves as a basename, and additional source files have names with an added suffix. In the multi-file mode, all exported files are required to compile the model.

To pack source files into an archive, you can pass a zipfile.ZipFile or tarfile.TarFile object as file, or specify a path to the file with an archive type extension. Recognized extensions are: .zip, .tar, .tgz, .tar.gz, .taz, .tbz, .tbz2, .tar.bz2.
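To see in advance whether a path would be treated as an archive, the extension list can be checked with a few lines of Python. This is only a sketch: archive_extension is a hypothetical helper, and the case-insensitive comparison is an assumption, since the text does not say how the library matches extensions.

```python
# Extensions listed in the text above.
ARCHIVE_EXTENSIONS = (".zip", ".tar", ".tgz", ".tar.gz", ".taz",
                      ".tbz", ".tbz2", ".tar.bz2")


def archive_extension(path):
    """Return the recognized archive extension of path, or None."""
    lowered = path.lower()  # assumption: matching ignores case
    # Check longer extensions first so the most specific one is reported.
    for ext in sorted(ARCHIVE_EXTENSIONS, key=len, reverse=True):
        if lowered.endswith(ext):
            return ext
    return None


print(archive_extension("model.tar.gz"))  # .tar.gz
print(archive_extension("model.c"))       # None
```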

The name argument is optional if format is C99_PROGRAM. For other source code formats, an empty name raises an exception.

The description provides an additional comment, which is added on top of the generated source file.

compressed_dim

Compressed vector dimension.

Type:long or None

This attribute is None if the model supports variable-dimension compression.

decompress(vec)

Decompression method.

Parameters:vec (array-like, 2D or 1D) – vector(s) to decompress
Returns:decompressed vectors
Return type:pandas.DataFrame or pandas.Series if vec is a pandas type; otherwise ndarray, 2D

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if vec is a pandas data type.

Decompresses a vector (1D) or each of the vectors in a batch (2D). The vector length should be equal to compressed_dim for a model with fixed-dimension compression, and no more than original_dim for a model with variable-dimension compression.
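The length rule can be written as a small check. The helper below is hypothetical and only restates the sentence above; it relies on compressed_dim being None for models with variable-dimension compression, as described for that attribute, and it does not check a lower bound because the text does not state one.

```python
def decompress_length_ok(length, original_dim, compressed_dim=None):
    """Check a vector length against the rule for decompress()."""
    if compressed_dim is not None:
        # Fixed-dimension compression: the length must match exactly.
        return length == compressed_dim
    # Variable-dimension compression: any length up to original_dim.
    return length <= original_dim


print(decompress_length_ok(1, original_dim=4, compressed_dim=1))  # True
print(decompress_length_ok(2, original_dim=4, compressed_dim=1))  # False
print(decompress_length_ok(3, original_dim=4))                    # True
```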

When using pandas, the return type is the same as the vec type, and the returned object keeps the index of vec.

decompress_export_to(format, name, description, file, dim=None, single_file=None)

Export the decompression procedure to a source file in the specified format.

Parameters:
  • format (ExportedFormat or str) – source code format
  • name (str) – exported function name
  • description (str) – additional comment
  • file (file-like, str, zipfile.ZipFile, tarfile.TarFile) – export file or path
  • dim (int or long) – required compressed dimension
  • single_file (bool) – export sources as a single file (default) or multiple files (False)
Returns:

None

Raise:

GTException if name is empty and format is not C99_PROGRAM

New in version 6.10: added str aliases for export formats.

Changed in version 6.24: added the support for exporting sources to archives; added the single_file parameter.

Generates the model source code in the specified format and saves it to a file or a set of source files. Supports packing the source code to various archive file formats.

The source code format can be specified using an enumeration or a string alias — see ExportedFormat for details.

By default, all source code is exported to a single file. This mode is not recommended for large models, since large source files can cause problems during compilation. To split the source code into multiple files, set single_file to False. In this case, the filename from file serves as a basename, and additional source files have names with an added suffix. In the multi-file mode, all exported files are required to compile the model.

To pack source files into an archive, you can pass a zipfile.ZipFile or tarfile.TarFile object as file, or specify a path to the file with an archive type extension. Recognized extensions are: .zip, .tar, .tgz, .tar.gz, .taz, .tbz, .tbz2, .tar.bz2.

The name argument is optional if format is C99_PROGRAM. For other source code formats, an empty name raises an exception.

The description provides an additional comment, which is added on top of the generated source file.

details

Detailed model information.

Type:dict

New in version 6.14.

Changed in version 6.14.3: added training time.

Changed in version 6.16: added training warnings.

Contains training information and descriptions of model inputs specified when training the model or added using modify().

The details dictionary has the following keys:

  • "Input Variables" — model input descriptions.
  • "Issues" — training warnings extracted from build_log.
  • "Training Time" — time statistics.

The value under the "Input Variables" key is a list of descriptions for the original model inputs; the list length is original_dim. List order follows the order of columns in the training data sample. Each list element is a dictionary describing a single input. This dictionary has the following keys:

  • "name" (str) — contains the name of the respective input. This key always exists. If a name for this input was never specified, a default name (x[i]) is stored here.
  • "description" (str) — contains a brief description for the input. This key exists only if the description was specified by the user.
  • "quantity" (str) — physical quantity of this input. This key exists only if the quantity was specified by the user.
  • "unit" (str) — measurement units used for this input. This key exists only if measurement units were specified by the user.

The value under the "Issues" key is a dictionary where each key is a string identifying the source of a warning, and each value is a list containing all warnings (as strings) collected from this source.

The value under the "Training Time" key is a dictionary with the following keys:

  • "Start" (str) — training start time.
  • "Finish" (str) — finish time.
  • "Total" (str) — the difference between the start and finish times.

Note that the total is wall time, which may be different from the real time spent in training. For example, if you run training on a laptop and it enters the suspend mode (sleeps) during training, the suspend period is included in the total time, while training was actually paused during suspend.
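The layout described above can be traversed with ordinary dictionary access. The sketch below uses a made-up dictionary that follows the documented structure instead of a real Model.details value; the input names, the warning source, and the warning texts are invented, and describe_inputs and count_warnings are hypothetical helpers.

```python
# Made-up dictionary that follows the documented layout of Model.details.
details = {
    "Input Variables": [
        {"name": "length", "unit": "m"},
        {"name": "angle", "description": "Flap deflection angle",
         "quantity": "Angle", "unit": "deg"},
        {"name": "width"},
    ],
    "Issues": {
        "example source": ["first warning text", "second warning text"],
    },
}


def describe_inputs(details):
    """List inputs as 'index: name [unit]'; the unit is optional."""
    lines = []
    for index, info in enumerate(details.get("Input Variables", [])):
        unit = info.get("unit")  # present only if the user specified it
        suffix = " [%s]" % unit if unit is not None else ""
        lines.append("%d: %s%s" % (index, info["name"], suffix))
    return lines


def count_warnings(details):
    """Count warnings over all sources under the 'Issues' key."""
    return sum(len(items) for items in details.get("Issues", {}).values())


print(describe_inputs(details))  # ['0: length [m]', '1: angle [deg]', '2: width']
print(count_warnings(details))   # 2
```

Only "name" is read with direct indexing, since it is the one key documented as always present.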

fromstring(modelString)

Deserialize a model from string.

Parameters:modelString (str) – serialized model
Returns:None

gradCompress(vec, dim=None, order=0)

Evaluate compression transformation gradient.

Parameters:
  • vec (array-like, 2D or 1D) – vector(s) to evaluate
  • dim (int, long) – required dimension
  • order (GradMatrixOrder) – gradient matrix order
Returns:

gradients

Return type:

pandas.DataFrame if vec is a pandas type; otherwise ndarray, 3D or 2D

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if vec is a pandas data type.

Evaluates compression transformation gradients for a data sample (if vec is a 2D array-like) or a single vector (if vec is 1D). The returned array is 3D if vec is a sample, and 2D if vec is a single vector.

When vec is a pandas.DataFrame, the 3D return value is represented by a pandas.DataFrame with a multi-level index (pandas.MultiIndex). The first element of the multi-index is the index of a vector from the input sample. The second element of the multi-index is:

  • the index or name of a model’s output, if order is F_MAJOR (default)
  • the index or name of a model’s input, if order is X_MAJOR

When vec is a pandas.Series, its index becomes the row index of the returned pandas.DataFrame.

gradDecompress(vec, order=0)

Evaluate decompression transformation gradient.

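Parameters:
  • vec (array-like, 2D or 1D) – vector(s) to evaluate
  • order (GradMatrixOrder) – gradient matrix order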
Returns:

gradients

Return type:

pandas.DataFrame if vec is a pandas type; otherwise ndarray, 3D or 2D

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if vec is a pandas data type.

Evaluates decompression transformation gradients for a data sample (if vec is a 2D array-like) or a single vector (if vec is 1D). The returned array is 3D if vec is a sample, and 2D if vec is a single vector.

When vec is a pandas.DataFrame, the 3D return value is represented by a pandas.DataFrame with a multi-level index (pandas.MultiIndex). The first element of the multi-index is the index of a vector from the input sample. The second element of the multi-index is:

  • the index or name of a model’s output, if order is F_MAJOR (default)
  • the index or name of a model’s input, if order is X_MAJOR

When vec is a pandas.Series, its index becomes the row index of the returned pandas.DataFrame.

has_variable_compression

Variable-dimension compression support.

Type:bool

If True, the model supports variable-dimension compression.

info

Model description.

Type:dict

Contains all technical information which can be gathered from the model.

license

Model license.

Type:License

General license information interface. See section License Usage for details.

load(file)

Load a model from file.

Parameters:file (file or str) – file object or path
Returns:None

Deprecated since version 6.29: use the Model constructor instead.

modify(comment=None, annotations=None, x_meta=None)

Create a copy of the model with modified metainformation.

Parameters:
  • comment (str) – new comment
  • annotations (dict) – new annotations
  • x_meta (list) – descriptions of inputs
Returns:

model copy with modified information

Return type:

Model

New in version 6.14.

This method is intended to edit model annotations, comment, and input descriptions found in details. Parameters are similar to build() — see the full description there. If a parameter is None, corresponding information in the modified model remains unchanged. If you specify a parameter, corresponding information in the modified model is fully replaced.

Note that modify() returns a new modified model, which is identical to the original except for your edits to the model metainformation.
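The "None keeps, a value fully replaces" behavior can be illustrated with a stand-in class. Meta below is not a da.p7core class; it is a hypothetical frozen dataclass that only mimics the documented semantics for comment and annotations.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Meta:
    """Stand-in for model metainformation; not a da.p7core class."""
    comment: str = ""
    annotations: dict = field(default_factory=dict)

    def modify(self, comment=None, annotations=None):
        changes = {}
        if comment is not None:
            changes["comment"] = comment
        if annotations is not None:
            # Full replacement: old keys are not merged into the new dict.
            changes["annotations"] = dict(annotations)
        # replace() builds a new object; the original stays untouched.
        return replace(self, **changes)


original = Meta(comment="draft", annotations={"author": "A", "stage": "test"})
edited = original.modify(annotations={"stage": "final"})

print(edited.comment)        # draft
print(edited.annotations)    # {'stage': 'final'}
print(original.annotations)  # {'author': 'A', 'stage': 'test'}
```

In this sketch the "author" key disappears from the edited copy, so keeping it would require passing the complete new dictionary.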

original_dim

Original (uncompressed) vector dimension.

Type:long

save(file)

Save the model to file.

Parameters:file (file or str) – file object or path
Returns:None

save_to_octave_compress(function, file)

Deprecated since version 3.0 Release Candidate 1: use compress_export_to() instead.

Since version 3.0 Release Candidate 1, this method no longer exists and is completely replaced by compress_export_to().

save_to_octave_decompress(function, file)

Deprecated since version 3.0 Release Candidate 1: use decompress_export_to() instead.

Since version 3.0 Release Candidate 1, this method no longer exists and is completely replaced by decompress_export_to().

tostring()

Serialize the model.

Returns:serialized model
Return type:str