11.8. `da.p7core.gtdr`¶

Generic Tool for Dimension Reduction (GTDR) module.

>>> from da.p7core import gtdr

Classes

`da.p7core.gtdr.Builder`([backend])	Dimension reduction model builder.
`da.p7core.gtdr.ExportedFormat`	Enumerates available export formats.
`da.p7core.gtdr.GradMatrixOrder`	Enumerates available gradient output modes.
`da.p7core.gtdr.Model`([file])	Dimension reduction model.

11.8.1. `Builder` — model builder¶

class da.p7core.gtdr.Builder(backend=None)¶

Dimension reduction model builder.

build(x=None, y=None, dim=None, error=None, blackbox=None, budget=None, options=None, comment=None, annotations=None, x_meta=None)¶

Train a dimension reduction model.

Parameters:
x (array-like, 2D) – training sample, input part (values of variables)
y (array-like, 1D or 2D) – training sample, optional response part (function values)
dim (`int`, `long`) – output dimension, optional
error (`float`) – error threshold, optional
blackbox (`Blackbox`, `gtapprox.Model`, or `gtdf.Model`) – Feature Extraction blackbox, optional
budget (`int`, `long`) – blackbox budget
options (`dict`) – option settings
comment (`str`) – text comment
annotations (`dict`) – extended comment and notes
x_meta (`list`) – descriptions of inputs
Returns:	trained model
Return type:	`Model`

Changed in version 6.20: blackbox may be a gtapprox.Model or a gtdf.Model.

Changed in version 6.25: pandas.DataFrame and pandas.Series are supported as the x, y training samples.

Trains a dimension reduction model (codec) which provides vector compression and decompression methods.

In the sample-based modes, x is always a 2D array because there is no meaningful interpretation of a 1D array. However, 1D y is supported as a simplified form for the case of 1D output when using sample-based Feature Extraction.

Valid argument combinations and mode selection:

Passed arguments	Technique	Optional arguments	Ignored arguments
x, dim	dimension-based	-	y, error, blackbox, budget
x, error	error-based	-	y, dim, blackbox, budget
x, y	Feature Extraction	dim	error, blackbox, budget
blackbox, budget	Feature Extraction	dim	x, y, error

All other combinations of arguments are invalid.
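
The table can be read as a lookup from the set of passed arguments to a technique. The sketch below is a minimal illustration of that reading: `select_technique` and `ROWS` are hypothetical names, not part of `da.p7core`, and the helper assumes a strict interpretation in which arguments from the "Ignored arguments" column are simply not passed. The library itself may accept and ignore them.

```python
# Each row of the table: (required arguments, optional arguments, technique).
ROWS = [
    ({"x", "dim"}, set(), "dimension-based"),
    ({"x", "error"}, set(), "error-based"),
    ({"x", "y"}, {"dim"}, "Feature Extraction"),
    ({"blackbox", "budget"}, {"dim"}, "Feature Extraction"),
]


def select_technique(**kwargs):
    """Hypothetical helper: name the technique for a set of build() arguments."""
    # Only arguments that are actually set (not None) count as passed.
    passed = {name for name, value in kwargs.items() if value is not None}
    for required, optional, technique in ROWS:
        # All required arguments are present and nothing else except the optional ones.
        if required <= passed <= required | optional:
            return technique
    raise ValueError("invalid argument combination: %s" % sorted(passed))


print(select_technique(x=[[0.1, 0.2]], dim=1))  # prints: dimension-based
```

Under this strict reading, passing x alone raises `ValueError`, which corresponds to the "all other combinations are invalid" rule.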

Example:

>>> from da.p7core.gtdr import Builder
>>> sample = [[ 0.1, 0.2, 0.3, 0.4],
...           [ 0.2, 0.3, 0.4, 0.41],
...           [ 0.3, 0.4, 0.5, 0.39]]
>>> model = Builder().build(sample, dim=1)
>>> model.original_dim
4
>>> model.compressed_dim
1

Changed in version 6.14: added the comment, annotations, and x_meta parameters.

The comment and annotations parameters add optional notes to the model. The comment string is stored in the model’s comment attribute. The annotations dictionary can contain further notes or other supplementary information; all keys and values in annotations must be strings. Annotations are stored in the model’s annotations attribute. After training a model, you can also edit its comment and annotations using modify().

The x_meta parameter adds names and descriptions of model inputs. It is a list of length equal to the number of inputs, or the number of columns in x. Each list element can be a string (Unicode) or a dictionary. A string specifies a name for the respective input. It must be a valid identifier according to the FMI standard, so names are subject to certain restrictions (see below). A dictionary describes a single input and can have the following keys (all keys are optional, all values must be str or unicode):

"name": contains the name for this input. If this key is omitted, a default name will be saved to the model — "x[i]" where i is the index of the respective column in x.
"description": contains a brief description, any text.
"quantity": physical quantity, for example "Angle" or "Energy".
"unit": measurement units used for this input, for example "deg" or "J".

Names of inputs and outputs must satisfy the following rules:

Name must not be empty.
All names must be unique.
The only whitespace character allowed in names is the ASCII space, so \t, \n, \r, and various Unicode whitespace characters are prohibited.
Name cannot contain leading or trailing spaces, and cannot contain two or more consecutive spaces.
Name cannot contain leading or trailing dots, and cannot contain two or more consecutive dots, since dots are commonly used as name separators.
Parts of the name separated by dots must not begin or end with a space, so the name cannot contain '. ' or ' .'.
Name cannot contain control characters and Unicode separators. Prohibited Unicode character categories are: Cc, Cf, Cn, Co, Cs, Zl, Zp, Zs.
Name cannot contain any of the following characters: : " / \ | ? *
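
The naming rules above can be checked before calling build(). The following is a minimal sketch using only the standard library; `is_valid_name` and `all_names_valid` are hypothetical helpers written for illustration, not functions provided by `da.p7core`, and the library's own validation may differ in details.

```python
import unicodedata

# Unicode categories and characters prohibited by the rules above.
PROHIBITED_CATEGORIES = {"Cc", "Cf", "Cn", "Co", "Cs", "Zl", "Zp", "Zs"}
PROHIBITED_CHARS = set(':"/\\|?*')


def is_valid_name(name):
    """Hypothetical helper: check one name against the listed rules."""
    if not name:
        return False
    # No leading or trailing spaces, no runs of two or more spaces.
    if name != name.strip(" ") or "  " in name:
        return False
    # No leading or trailing dots, no runs of two or more dots.
    if name.startswith(".") or name.endswith(".") or ".." in name:
        return False
    # Dot-separated parts must not begin or end with a space.
    if ". " in name or " ." in name:
        return False
    for ch in name:
        if ch == " ":
            continue  # the ASCII space is the only whitespace allowed
        if ch.isspace() or ch in PROHIBITED_CHARS:
            return False
        if unicodedata.category(ch) in PROHIBITED_CATEGORIES:
            return False
    return True


def all_names_valid(names):
    """Hypothetical helper: every name is valid and all names are unique."""
    return len(set(names)) == len(names) and all(is_valid_name(n) for n in names)
```

Note that the ASCII space itself belongs to category Zs, so the sketch exempts it explicitly before testing the category.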

Input descriptions are stored to model details (the "Input Variables" key). If you do not specify a name or description for some input, its information in details contains only the default name ("x[i]"). When you export a model, input descriptions are found in the comments in the exported code.
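
To illustrate the two element forms of x_meta and the default names, the sketch below expands a mixed list into dictionaries. `normalize_x_meta` is a hypothetical helper, not a `da.p7core` function; it assumes Python 3 `str` values and zero-based column indices in the default "x[i]" names, neither of which is stated explicitly in the text above.

```python
def normalize_x_meta(x_meta, n_inputs):
    """Hypothetical helper: expand x_meta elements to dictionaries with a "name" key."""
    if len(x_meta) != n_inputs:
        raise ValueError("x_meta must contain one element per input")
    result = []
    for i, item in enumerate(x_meta):
        # A plain string is shorthand for a dictionary holding only the name.
        entry = {"name": item} if isinstance(item, str) else dict(item)
        # Assumption: the default name uses the zero-based column index.
        entry.setdefault("name", "x[%d]" % i)
        result.append(entry)
    return result


x_meta = [
    "length",
    {"name": "angle", "quantity": "Angle", "unit": "deg"},
    {"description": "stored energy", "unit": "J"},
]
normalized = normalize_x_meta(x_meta, 3)
print([entry["name"] for entry in normalized])  # prints: ['length', 'angle', 'x[2]']
```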

license¶

Builder license.

Type:	`License`

General license information interface. See section License Usage for details.

options¶

Builder options.

Type:	`Options`

General options interface for the builder. See section Options Interface for usage and the GTDR Option Reference.

set_logger(logger)¶

Set logger.

Parameters:	logger – logger object
Returns:	`None`

Used to set up a logger for the build process. See section Loggers for details.

set_watcher(watcher)¶

Set watcher.

Parameters:	watcher – watcher object
Returns:	`None`

Used to set up a watcher for the build process. See section Watchers for details.

11.8.2. `ExportedFormat` — model export formats¶

class da.p7core.gtdr.ExportedFormat¶

Enumerates available export formats.

OCTAVE¶

Octave format.

Alias: "octave".

OCTAVE_MEX¶

C source for a MEX file.

Aliases: "octave_mex", "mex".

C99_PROGRAM¶

C source with the main() function for a complete command-line based C program.

Aliases: "c99_program", "c_program", "program".

C99_HEADER¶

C header of the target function.

Aliases: "c99_header", "c_header", "header".

C99_SOURCE¶

C header and implementation of the target function.

Aliases: "c99_source", "c_source", "c".

EXCEL_DLL¶

C implementation of the model intended for creating a DLL compatible with Microsoft Excel.

Aliases: "excel_dll", "excel".

11.8.3. `GradMatrixOrder` — model gradients order¶

class da.p7core.gtdr.GradMatrixOrder¶

Enumerates available gradient output modes.

F_MAJOR¶: Indexed in function-major order (\(grad_{ij} = \frac{df_i}{dx_j}\)).

X_MAJOR¶: Indexed in variable-major order (\(grad_{ij} = \frac{df_j}{dx_i}\)).
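
The two orders are transposes of each other. As a minimal illustration in plain Python (independent of `da.p7core`), consider a made-up function with three inputs and two outputs, f(x) = (x0 + 2*x1, x1*x2); the function and helper names are chosen only for this example.

```python
def gradients_f_major(x):
    """Analytic gradients of f(x) = (x0 + 2*x1, x1*x2) in function-major order.

    Row i holds the derivatives of output f_i with respect to every input.
    """
    x0, x1, x2 = x
    return [[1.0, 2.0, 0.0],
            [0.0, x2, x1]]


def to_x_major(grad):
    """Transpose a function-major matrix into variable-major order."""
    return [list(column) for column in zip(*grad)]


f_major = gradients_f_major([1.0, 3.0, 5.0])  # 2 rows (outputs) x 3 columns (inputs)
x_major = to_x_major(f_major)                 # 3 rows (inputs) x 2 columns (outputs)

# df_1/dx_2 is found at [1][2] in function-major and at [2][1] in variable-major order.
print(f_major[1][2], x_major[2][1])  # prints: 3.0 3.0
```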

11.8.4. `Model` — dimension reduction model¶

class da.p7core.gtdr.Model(file=None, **kwargs)¶

Dimension reduction model.

Can be created by Builder or loaded from a file via the Model constructor.

Model objects are immutable. All methods which are meant to change the model return a new Model instance.

annotations¶

Extended comment or supplementary information.

Type:	`dict`

New in version 6.14.

The annotations dictionary can optionally contain any number of notes. All dictionary keys and values are strings. You can add annotations when training a model and edit them using modify().

build_log¶

Model building log.

Type:	`str`

comment¶

Text comment to the model.

Type:	`str`

New in version 6.14.

Optional plain text comment to the model. You can add the comment when training a model and edit it using modify().

compress(vec, dim=None)¶

Compression method.

Parameters:
vec (array-like, 2D or 1D) – vector(s) to compress
dim (`int`, `long`) – required dimension
Returns:	compressed vectors
Return type:	`pandas.DataFrame` or `pandas.Series` if vec is a pandas type; otherwise `ndarray`, 2D

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if vec is a pandas data type.

Compresses a vector (1D) or each of the vectors in a batch (2D). The vector length must be equal to original_dim.

When using pandas, the return type is the same as the vec type. In this case, the returned object also keeps the index of vec.

compress_export_to(format, name, description, file, dim=None, single_file=None)¶

Export the compression procedure to a source file in the specified format.

Parameters:
format (`ExportedFormat` or `str`) – source code format
name (`str`) – exported function name
description (`str`) – additional comment
file (file-like, `str`, `zipfile.ZipFile`, `tarfile.TarFile`) – export file or path
dim (`int` or `long`) – required compressed dimension
single_file (`bool`) – export sources as a single file (default) or multiple files (`False`)
Returns:	`None`
Raise:	`GTException` if name is empty and format is not `C99_PROGRAM`

New in version 6.10: added str aliases for export formats.

Changed in version 6.24: added the support for exporting sources to archives; added the single_file parameter.

Generates the model source code in the specified format and saves it to a file or a set of source files. Supports packing the source code to various archive file formats.

The source code format can be specified using an enumeration or a string alias — see ExportedFormat for details.

By default, all source code is exported to a single file. This mode is not recommended for large models, since large source files can cause problems during compilation. To split the source code into multiple files, set single_file to False. In this case, the filename from file serves as a basename, and additional source files have names with an added suffix. In the multi-file mode, all exported files are required to compile the model.

To pack source files into an archive, you can pass a zipfile.ZipFile or tarfile.TarFile object as file, or specify a path to the file with an archive type extension. Recognized extensions are: .zip, .tar, .tgz, .tar.gz, .taz, .tbz, .tbz2, .tar.bz2.
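
Whether a given path would be treated as an archive can be checked against the extension list above. The sketch below is a hypothetical helper, not part of `da.p7core`; it assumes case-insensitive matching of extensions, which the text does not specify.

```python
# Extensions listed in the paragraph above.
ARCHIVE_EXTENSIONS = (".zip", ".tar", ".tgz", ".tar.gz", ".taz", ".tbz", ".tbz2", ".tar.bz2")


def archive_extension(path):
    """Hypothetical helper: return the recognized archive extension of path, or None."""
    lowered = path.lower()  # assumption: extensions are matched case-insensitively
    for ext in ARCHIVE_EXTENSIONS:
        if lowered.endswith(ext):
            return ext
    return None


print(archive_extension("codec.tar.gz"))  # prints: .tar.gz
print(archive_extension("codec.c"))       # prints: None
```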

The name argument is optional if format is C99_PROGRAM. For other source code formats, an empty name raises an exception.

The description provides an additional comment, which is added on top of the generated source file.

compressed_dim¶

Compressed vector dimension.

Type:	`long` or `None`

This attribute is None if the model supports variable-dimension compression.

decompress(vec)¶

Decompression method.

Parameters:	vec (array-like, 2D or 1D) – vector(s) to decompress
Returns:	decompressed vectors
Return type:	`pandas.DataFrame` or `pandas.Series` if vec is a pandas type; otherwise `ndarray`, 2D

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if vec is a pandas data type.

Decompresses a vector (1D) or each of the vectors in a batch (2D). The vector length should be equal to compressed_dim for a model with fixed-dimension compression, and no more than original_dim for a model with variable-dimension compression.
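
The length rule can be expressed as a small check against the model's compressed_dim and original_dim attributes. This is a sketch only: `can_decompress` is a hypothetical helper, not a `da.p7core` function, and it takes the two dimensions as plain arguments so that it runs without the library.

```python
def can_decompress(length, original_dim, compressed_dim=None):
    """Hypothetical helper: True if a vector of this length satisfies the rule above."""
    if compressed_dim is not None:
        # Fixed-dimension compression: the length must match compressed_dim exactly.
        return length == compressed_dim
    # Variable-dimension compression (compressed_dim is None):
    # the length must not exceed original_dim.
    return length <= original_dim


print(can_decompress(1, original_dim=4, compressed_dim=1))  # prints: True
print(can_decompress(5, original_dim=4))                    # prints: False
```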

When using pandas, the return type is the same as the vec type. In this case, the returned object also keeps the index of vec.

decompress_export_to(format, name, description, file, dim=None, single_file=None)¶

Export the decompression procedure to a source file in the specified format.

Parameters:
format (`ExportedFormat` or `str`) – source code format
name (`str`) – exported function name
description (`str`) – additional comment
file (file-like, `str`, `zipfile.ZipFile`, `tarfile.TarFile`) – export file or path
dim (`int` or `long`) – required compressed dimension
single_file (`bool`) – export sources as a single file (default) or multiple files (`False`)
Returns:	`None`
Raise:	`GTException` if name is empty and format is not `C99_PROGRAM`