11.8. da.p7core.gtdr
Generic Tool for Dimension Reduction (GTDR) module.
>>> from da.p7core import gtdr
Classes
da.p7core.gtdr.Builder([backend]) | Dimension reduction model builder.
da.p7core.gtdr.ExportedFormat | Enumerates available export formats.
da.p7core.gtdr.GradMatrixOrder | Enumerates available gradient output modes.
da.p7core.gtdr.Model([file]) | Dimension reduction model.
11.8.1. Builder - model builder
class da.p7core.gtdr.Builder(backend=None)
Dimension reduction model builder.
build(x=None, y=None, dim=None, error=None, blackbox=None, budget=None, options=None, comment=None, annotations=None, x_meta=None)
Train a dimension reduction model.
Parameters:
- x (array-like, 2D) – training sample, input part (values of variables)
- y (array-like, 1D or 2D) – training sample, optional response part (function values)
- dim (int, long) – output dimension, optional
- error (float) – error threshold, optional
- blackbox (Blackbox, gtapprox.Model, or gtdf.Model) – Feature Extraction blackbox, optional
- budget (int, long) – blackbox budget
- options (dict) – option settings
- comment (str) – text comment
- annotations (dict) – extended comment and notes
- x_meta (list) – descriptions of inputs
Returns: trained model
Return type: Model
Changed in version 6.20: blackbox may be a gtapprox.Model or a gtdf.Model.
Changed in version 6.25: pandas.DataFrame and pandas.Series are supported as the x, y training samples.
Trains a dimension reduction model (codec) which provides vector compression and decompression methods.
In the sample-based modes, x is always a 2D array because there is no meaningful interpretation of a 1D array. However, 1D y is supported as a simplified form for the case of 1D output when using sample-based Feature Extraction.
Valid argument combinations and mode selection:
Passed arguments | Technique | Optional arguments | Ignored arguments
x, dim | dimension-based | - | y, error, blackbox, budget
x, error | error-based | - | y, dim, blackbox, budget
x, y | Feature Extraction | dim | error, blackbox, budget
blackbox, budget | Feature Extraction | dim | x, y, error
All other combinations of arguments are invalid.
Example:
>>> from da.p7core.gtdr import Builder
>>> sample = [[0.1, 0.2, 0.3, 0.4],
...           [0.2, 0.3, 0.4, 0.41],
...           [0.3, 0.4, 0.5, 0.39]]
>>> model = Builder().build(sample, dim=1)
>>> model.original_dim
4
>>> model.compressed_dim
1
Changed in version 6.14: added the comment, annotations, and x_meta parameters.
The comment and annotations parameters add optional notes to the model. The comment string is stored to the model’s comment. The annotations dictionary can contain more notes or other supplementary information; all keys and values in annotations must be strings. Annotations are stored to the model’s annotations. After training a model, you can also edit its comment and annotations using modify().
The x_meta parameter adds names and descriptions of model inputs. It is a list of length equal to the number of inputs, that is, the number of columns in x. A list element can be a string (Unicode) or a dictionary. A string specifies a name for the respective input. It must be a valid identifier according to the FMI standard, so there are certain restrictions for names (see below). A dictionary describes a single input and can have the following keys (all keys are optional, all values must be str or unicode):
- "name": contains the name for this input. If this key is omitted, a default name is saved to the model: "x[i]", where i is the index of the respective column in x.
- "description": contains a brief description, any text.
- "quantity": physical quantity, for example "Angle" or "Energy".
- "unit": measurement units used for this input, for example "deg" or "J".
Names of inputs and outputs must satisfy the following rules:
- Name must not be empty.
- All names must be unique.
- The only whitespace character allowed in names is the ASCII space, so \t, \n, \r, and various Unicode whitespace characters are prohibited.
- Name cannot contain leading or trailing spaces, and cannot contain two or more consecutive spaces.
- Name cannot contain leading or trailing dots, and cannot contain two or more consecutive dots, since dots are commonly used as name separators.
- Parts of the name separated by dots must not begin or end with a space, so the name cannot contain '. ' or ' .'.
- Name cannot contain control characters and Unicode separators. Prohibited Unicode character categories are: Cc, Cf, Cn, Co, Cs, Zl, Zp, Zs.
- Name cannot contain characters from this set: :"/\|?*
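The naming rules above can be checked before calling build(). The following is a minimal sketch, not the library's own validator: the helper names is_valid_name and check_names are hypothetical, and the sketch assumes that the ASCII space is exempt from the Zs category ban because the rules explicitly allow it.

```python
import unicodedata

# Unicode general categories prohibited by the rules above.
PROHIBITED_CATEGORIES = {"Cc", "Cf", "Cn", "Co", "Cs", "Zl", "Zp", "Zs"}
# Individual characters prohibited by the last rule.
PROHIBITED_CHARS = set(':"/\\|?*')


def is_valid_name(name):
    """Return True if a single name passes the per-name rules (hypothetical helper)."""
    if not name:
        return False  # name must not be empty
    if name[0] in " ." or name[-1] in " .":
        return False  # no leading or trailing spaces or dots
    if any(pair in name for pair in ("  ", "..", ". ", " .")):
        return False  # no doubled spaces or dots, no space next to a dot
    for ch in name:
        if ch == " ":
            continue  # the ASCII space is the only whitespace allowed
        if ch.isspace() or ch in PROHIBITED_CHARS:
            return False
        if unicodedata.category(ch) in PROHIBITED_CATEGORIES:
            return False
    return True


def check_names(names):
    """Return True if all names are valid and unique (hypothetical helper)."""
    return len(set(names)) == len(names) and all(is_valid_name(n) for n in names)
```

Running such a check on an x_meta list before training is optional; it only helps to locate an offending name early.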
Input descriptions are stored to model details (the "Input Variables" key). If you do not specify a name or description for some input, its information in details contains only the default name ("x[i]"). When you export a model, input descriptions are found in the comments in the exported code.
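To make the default-name behavior concrete, the sketch below mimics how a mixed x_meta list could map to per-input description dictionaries. It is an illustration written for this section, not the library's code: normalize_x_meta is a hypothetical helper, and the length check reflects the stated requirement that x_meta has one element per input.

```python
def normalize_x_meta(x_meta, n_inputs):
    """Expand x_meta entries into dictionaries with a guaranteed "name" key."""
    if len(x_meta) != n_inputs:
        raise ValueError("x_meta must have one element per input")
    result = []
    for i, item in enumerate(x_meta):
        # A plain string is shorthand for {"name": <string>}.
        entry = {"name": item} if isinstance(item, str) else dict(item)
        # A missing name falls back to the documented default "x[i]".
        entry.setdefault("name", "x[%d]" % i)
        result.append(entry)
    return result


# The first input is given by name only, the second by a dictionary without a name.
x_meta = ["angle", {"description": "stored energy", "unit": "J"}]
described = normalize_x_meta(x_meta, 2)
```

Here the second entry receives the default name x[1], while the first entry carries only its name.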
license
Builder license.
Type: License
General license information interface. See section License Usage for details.
options
Builder options.
Type: Options
General options interface for the builder. See section Options Interface for usage and the GTDR Option Reference.
11.8.2. ExportedFormat - model export formats
class da.p7core.gtdr.ExportedFormat
Enumerates available export formats.
OCTAVE_MEX
C source for a MEX file.
Aliases: "octave_mex", "mex".
C99_PROGRAM
C source with the main() function for a complete command-line based C program.
Aliases: "c99_program", "c_program", "program".
C99_HEADER
C header of the target function.
Aliases: "c99_header", "c_header", "header".
C99_SOURCE
C header and implementation of the target function.
Aliases: "c99_source", "c_source", "c".
EXCEL_DLL
C implementation of the model intended for creating a DLL compatible with Microsoft Excel.
Aliases: "excel_dll", "excel".
11.8.3. GradMatrixOrder - model gradients order
11.8.4. Model - dimension reduction model
class da.p7core.gtdr.Model(file=None, **kwargs)
Dimension reduction model.
Can be created by Builder or loaded from a file via the Model constructor.
Model objects are immutable. All methods which are meant to change the model return a new Model instance.
annotations
Extended comment or supplementary information.
Type: dict
New in version 6.14.
The annotations dictionary can optionally contain any number of notes. All dictionary keys and values are strings. You can add annotations when training a model and edit them using modify().
build_log
Model building log.
Type: str
comment
Text comment to the model.
Type: str
New in version 6.14.
Optional plain text comment to the model. You can add the comment when training a model and edit it using modify().
compress(vec, dim=None)
Compression method.
Parameters:
- vec (array-like, 2D or 1D) – vector(s) to compress
- dim (int, long) – required dimension
Returns: compressed vectors
Return type: pandas.DataFrame or pandas.Series if vec is a pandas type; otherwise ndarray, 2D
Changed in version 3.0 Release Candidate 1: returns ndarray.
Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if vec is a pandas data type.
Compresses a vector (1D) or each of the vectors in a batch (2D). Vector length must be equal to original_dim.
When using pandas, the return type is the same as the vec type. In this case, the returned array also keeps the indexing of vec.
compress_export_to(format, name, description, file, dim=None, single_file=None)
Export the compression procedure to a source file in the specified format.
Parameters:
- format (ExportedFormat or str) – source code format
- name (str) – exported function name
- description (str) – additional comment
- file (file-like, str, zipfile.ZipFile, tarfile.TarFile) – export file or path
- dim (int or long) – required compressed dimension
- single_file (bool) – export sources as a single file (default) or multiple files (False)
Returns: None
Raise: GTException if name is empty and format is not C99_PROGRAM
New in version 6.10: added str aliases for export formats.
Changed in version 6.24: added the support for exporting sources to archives; added the single_file parameter.
Generates the model source code in the specified format and saves it to a file or a set of source files. Supports packing the source code to various archive file formats.
The source code format can be specified using an enumeration or a string alias; see ExportedFormat for details.
By default, all source code is exported to a single file. This mode is not recommended for large models, since large source files can cause problems during compilation. To split the source code into multiple files, set single_file to False. In this case, the filename from file serves as a basename, and additional source files have names with an added suffix. In the multi-file mode, all exported files are required to compile the model.
To pack source files into an archive, you can pass a zipfile.ZipFile or tarfile.TarFile object as file, or specify a path to the file with an archive type extension. Recognized extensions are: .zip, .tar, .tgz, .tar.gz, .taz, .tbz, .tbz2, .tar.bz2.
The name argument is optional if format is C99_PROGRAM. For other source code formats, an empty name raises an exception.
The description provides an additional comment, which is added on top of the generated source file.
compressed_dim
Compressed vector dimension.
Type: long or None
This attribute is None if the model supports variable-dimension compression.
decompress(vec)
Decompression method.
Parameters:
- vec (array-like, 2D or 1D) – vector(s) to decompress
Returns: decompressed vectors
Return type: pandas.DataFrame or pandas.Series if vec is a pandas type; otherwise ndarray, 2D
Changed in version 3.0 Release Candidate 1: returns ndarray.
Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if vec is a pandas data type.
Decompresses a vector (1D) or each of the vectors in a batch (2D). The vector length should be equal to compressed_dim for a model with fixed-dimension compression, and no more than original_dim for a model with variable-dimension compression.
When using pandas, the return type is the same as the vec type. In this case, the returned array also keeps the indexing of vec.
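A common check for a trained codec is the round-trip reconstruction error. The sketch below relies only on the documented compress() and decompress() methods and works with any object that provides them. The helper reconstruction_rmse is hypothetical, the root-mean-square metric is a choice made for this example, and the sample is assumed to be a 2D list or ndarray (not a pandas type).

```python
import math


def reconstruction_rmse(model, sample, dim=None):
    """Root-mean-square error between sample rows and their round-trip copies."""
    compressed = model.compress(sample, dim=dim)
    restored = model.decompress(compressed)
    squared, count = 0.0, 0
    # Compare every original value with its restored counterpart.
    for row, restored_row in zip(sample, restored):
        for value, restored_value in zip(row, restored_row):
            squared += (float(value) - float(restored_value)) ** 2
            count += 1
    if count == 0:
        raise ValueError("sample is empty")
    return math.sqrt(squared / count)
```

With a model that supports variable-dimension compression, calling the helper with several dim values gives a rough picture of how the error grows as the compressed dimension shrinks.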
decompress_export_to(format, name, description, file, dim=None, single_file=None)
Export the decompression procedure to a source file in the specified format.
Parameters:
- format (ExportedFormat or str) – source code format
- name (str) – exported function name
- description (str) – additional comment
- file (file-like, str, zipfile.ZipFile, tarfile.TarFile) – export file or path
- dim (int or long) – required compressed dimension
- single_file (bool) – export sources as a single file (default) or multiple files (False)
Returns: None
Raise: GTException if name is empty and format is not C99_PROGRAM
New in version 6.10: added str aliases for export formats.
Changed in version 6.24: added the support for exporting sources to archives; added the single_file parameter.
Generates the model source code in the specified format and saves it to a file or a set of source files. Supports packing the source code to various archive file formats.
The source code format can be specified using an enumeration or a string alias; see ExportedFormat for details.
By default, all source code is exported to a single file. This mode is not recommended for large models, since large source files can cause problems during compilation. To split the source code into multiple files, set single_file to False. In this case, the filename from file serves as a basename, and additional source files have names with an added suffix. In the multi-file mode, all exported files are required to compile the model.
To pack source files into an archive, you can pass a zipfile.ZipFile or tarfile.TarFile object as file, or specify a path to the file with an archive type extension. Recognized extensions are: .zip, .tar, .tgz, .tar.gz, .taz, .tbz, .tbz2, .tar.bz2.
The name argument is optional if format is C99_PROGRAM. For other source code formats, an empty name raises an exception.
The description provides an additional comment, which is added on top of the generated source file.
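As an illustration of the archive and multi-file options, the sketch below exports both procedures of a trained model to ZIP archives. It uses only the documented signatures of compress_export_to() and decompress_export_to(); the helper name export_codec, the exported function names, and the file naming scheme are hypothetical.

```python
def export_codec(model, path_prefix, func_prefix, description=""):
    """Export compression and decompression C sources to two ZIP archives."""
    paths = {}
    exporters = (
        ("compress", model.compress_export_to),
        ("decompress", model.decompress_export_to),
    )
    for kind, export in exporters:
        # The .zip extension requests packing the sources into an archive.
        path = "%s_%s.zip" % (path_prefix, kind)
        export(
            "c99_source",                    # string alias of C99_SOURCE
            "%s_%s" % (func_prefix, kind),   # name must not be empty for this format
            description,
            path,
            single_file=False,               # split the sources into several files
        )
        paths[kind] = path
    return paths
```

Passing an open zipfile.ZipFile object instead of a path would follow the same pattern, with the archive object given as the file argument.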
details
Detailed model information.
Type: dict
New in version 6.14.
Changed in version 6.14.3: added training time.
Changed in version 6.16: added training warnings.
Contains training information and descriptions of model inputs specified when training the model or added using modify().
The details dictionary has the following keys:
- "Input Variables": model input descriptions.
- "Issues": training warnings extracted from build_log.
- "Training Time": time statistics.
The value under the "Input Variables" key is a list of descriptions for original model inputs; the list length is original_dim. List order follows the order of columns in the training data sample. Each list element is a dictionary describing a single input. This dictionary has the following keys:
- "name" (str): contains the name of the respective input. This key always exists. If a name for this input was never specified, a default name (x[i]) is stored here.
- "description" (str): contains a brief description for the input. This key exists only if the description was specified by the user.
- "quantity" (str): physical quantity of this input. This key exists only if the variable’s quantity was specified by the user.
- "unit" (str): measurement units used for this input. This key exists only if measurement units were specified by the user.
The value under the "Issues" key is a dictionary where a key is a string identifying the source of a warning, and the value is a list containing all warnings (as strings) collected from this source.
The value under the "Training Time" key is a dictionary with the following keys:
- "Start" (str): training start time.
- "Finish" (str): finish time.
- "Total" (str): the difference between the start and finish times.
Note that the total is wall time, which may be different from the real time spent in training. For example, if you run training on a laptop and it enters the suspend mode (sleeps) during training, the suspend period is included in the total time, while training was actually paused during suspend.
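The nested structure described above can be walked with plain dictionary access. The following sketch formats a short report from a dictionary shaped like details. The helper summarize_details is hypothetical, the sample values are made up for illustration, and the top-level keys are read with fallbacks as a precaution, since this section does not state whether all of them are always present.

```python
def summarize_details(details):
    """Build a list of report lines from a details-like dictionary."""
    lines = []
    for item in details.get("Input Variables", []):
        text = item["name"]  # the "name" key always exists
        if "unit" in item:
            text += " [%s]" % item["unit"]
        if "description" in item:
            text += ": " + item["description"]
        lines.append("input " + text)
    # Each warning source maps to a list of warning strings.
    for source, warnings in details.get("Issues", {}).items():
        for warning in warnings:
            lines.append("warning from %s: %s" % (source, warning))
    total = details.get("Training Time", {}).get("Total")
    if total is not None:
        lines.append("training took " + total)
    return lines


# Made-up example values; real time strings may be formatted differently.
example = {
    "Input Variables": [
        {"name": "x[0]"},
        {"name": "load", "unit": "J", "description": "stored energy"},
    ],
    "Issues": {"hypothetical source": ["hypothetical warning text"]},
    "Training Time": {"Total": "0:00:02"},
}
report = summarize_details(example)
```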
fromstring(modelString)
Deserialize a model from string.
Parameters:
- modelString (str) – serialized model
Returns: None
gradCompress(vec, dim=None, order=0)
Evaluate compression transformation gradient.
Parameters:
- vec (array-like, 2D or 1D) – vector(s) to evaluate
- dim (int, long) – required dimension
- order (GradMatrixOrder) – gradient matrix order
Returns: gradients
Return type: pandas.DataFrame if vec is a pandas type; otherwise ndarray, 3D or 2D
Changed in version 3.0 Release Candidate 1: returns ndarray.
Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if vec is a pandas data type.
Evaluates compression transformation gradients for a data sample (if vec is a 2D array-like) or a single vector (if vec is 1D). The returned array is 3D if vec is a sample, and 2D if vec is a single vector.
When using pandas data samples (vec is a pandas.DataFrame), a 3D array in the return value is represented by a pandas.DataFrame with multi-indexing (pandas.MultiIndex). In this case, the first element of the multi-index is the index of a vector from the input sample. The second element of the multi-index is:
- the index or name of a model’s output, if order is F_MAJOR (default)
- the index or name of a model’s input, if order is X_MAJOR
When vec is a pandas.Series, its index becomes the row index of the returned pandas.DataFrame.
gradDecompress(vec, order=0)
Evaluate decompression transformation gradient.
Parameters:
- vec (array-like, 2D or 1D) – vector(s) to evaluate
- order (GradMatrixOrder) – gradient matrix order
Returns: gradients
Return type: pandas.DataFrame if vec is a pandas type; otherwise ndarray, 3D or 2D
Changed in version 3.0 Release Candidate 1: returns ndarray.
Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if vec is a pandas data type.
Evaluates decompression transformation gradients for a data sample (if vec is a 2D array-like) or a single vector (if vec is 1D). The returned array is 3D if vec is a sample, and 2D if vec is a single vector.
When using pandas data samples (vec is a pandas.DataFrame), a 3D array in the return value is represented by a pandas.DataFrame with multi-indexing (pandas.MultiIndex). In this case, the first element of the multi-index is the index of a vector from the input sample. The second element of the multi-index is:
- the index or name of a model’s output, if order is F_MAJOR (default)
- the index or name of a model’s input, if order is X_MAJOR
When vec is a pandas.Series, its index becomes the row index of the returned pandas.DataFrame.
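The two orders differ only in which index comes first within each per-vector gradient matrix. The sketch below is independent of the library and assumes, based on the pandas description above, that F_MAJOR places outputs in rows and X_MAJOR places inputs in rows. The helper to_x_major is hypothetical and operates on nested lists.

```python
def to_x_major(grad_f_major):
    """Swap the last two axes of a batch of gradient matrices (nested lists)."""
    return [[list(column) for column in zip(*matrix)] for matrix in grad_f_major]


# One input vector and a transformation with 2 outputs and 3 inputs.
# In the assumed F_MAJOR layout, entry [k][i][j] is d(output i)/d(input j).
f_major = [[[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]]]
# In the assumed X_MAJOR layout, entry [k][j][i] holds the same derivative.
x_major = to_x_major(f_major)
```

Applying the same swap twice restores the original layout, so the two orders carry identical information.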
has_variable_compression
Variable-dimension compression support.
Type: bool
If True, the model supports variable-dimension compression.
info
Model description.
Type: dict
Contains all technical information which can be gathered from the model.
license
Model license.
Type: License
General license information interface. See section License Usage for details.
load(file)
Load a model from file.
Parameters:
- file (file or str) – file object or path
Returns: None
Deprecated since version 6.29: use the Model constructor instead.
modify(comment=None, annotations=None, x_meta=None)
Create a copy of the model with modified metainformation.
Parameters:
- comment (str) – new comment
- annotations (dict) – new annotations
- x_meta (list) – descriptions of inputs
Returns: model copy with modified information
Return type: Model
New in version 6.14.
This method is intended to edit model annotations, comment, and input descriptions found in details. Parameters are similar to build(); see the full description there. If a parameter is None, the corresponding information in the modified model remains unchanged. If you specify a parameter, the corresponding information in the modified model is fully replaced.
Note that modify() returns a new modified model, which is identical to the original except for your edits to the model metainformation.
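The replace-or-keep rule can be pictured with plain dictionaries. This sketch is a stand-in for the documented behavior, not the library's implementation; modified_copy and the metadata layout are hypothetical.

```python
import copy


def modified_copy(meta, comment=None, annotations=None, x_meta=None):
    """Return a new metadata dict; None keeps a field, a value replaces it."""
    result = copy.deepcopy(meta)  # the original stays untouched
    updates = {"comment": comment, "annotations": annotations, "x_meta": x_meta}
    for key, value in updates.items():
        if value is not None:
            # Full replacement: old and new values are not merged.
            result[key] = copy.deepcopy(value)
    return result


original = {
    "comment": "draft",
    "annotations": {"author": "A", "stage": "test"},
    "x_meta": ["x[0]"],
}
edited = modified_copy(original, annotations={"stage": "release"})
```

In this example the edited copy keeps the comment and x_meta, while its annotations contain only the new "stage" entry; the "author" note is dropped because annotations are replaced as a whole.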
original_dim
Original (uncompressed) vector dimension.
Type: long
save(file)
Save the model to file.
Parameters:
- file (file or str) – file object or path
Returns: None
save_to_octave_compress(function, file)
Deprecated since version 3.0 Release Candidate 1: use compress_export_to() instead.
Since version 3.0 Release Candidate 1, this method no longer exists and is completely replaced by compress_export_to().
save_to_octave_decompress(function, file)
Deprecated since version 3.0 Release Candidate 1: use decompress_export_to() instead.
Since version 3.0 Release Candidate 1, this method no longer exists and is completely replaced by decompress_export_to().
tostring()
Serialize the model.
Returns: serialized model
Return type: str