11.4. da.p7core.gtapprox

Generic Tool for Approximation (GTApprox) module.

>>> from da.p7core import gtapprox

Classes

da.p7core.gtapprox.Builder() Approximation model builder.
da.p7core.gtapprox.ExportedFormat Enumerates available export formats.
da.p7core.gtapprox.GradMatrixOrder Enumerates available gradient output modes.
da.p7core.gtapprox.Model([file]) Approximation model.
da.p7core.gtapprox.Utilities Utility functions.

Functions

da.p7core.gtapprox.export_fmi_20(model, file) Export the model to a Functional Mock-up Unit for Model Exchange and Co-Simulation 2.0.
da.p7core.gtapprox.export_fmi_cs(model, file) Export the model to a Functional Mock-up Unit for Co-Simulation 1.0.
da.p7core.gtapprox.export_fmi_me(model, file) Export the model to a Functional Mock-up Unit for Model Exchange 1.0.
da.p7core.gtapprox.set_remote_build(builder) Configure a GTApprox builder to run on a remote host over SSH or to use an HPC cluster.
da.p7core.gtapprox.disable_remote_build(builder) Reset builder configuration to run on the local host only.
da.p7core.gtapprox.train_test_split(x, y[, …]) Split a data sample into train and test subsets optimized for model training.

11.4.1. Builder — model builder

class da.p7core.gtapprox.Builder

Approximation model builder.

build(x, y, options=None, outputNoiseVariance=None, comment=None, weights=None, initial_model=None, annotations=None, x_meta=None, y_meta=None)

Train an approximation model.

Parameters:
  • x (array-like, 1D or 2D) – training sample, input part (values of variables)
  • y (array-like, 1D or 2D) – training sample, response part (function values)
  • options (dict) – option settings
  • outputNoiseVariance (array-like, 1D or 2D) – optional y noise variance, supported by the GP, SGP, HDA, and HDAGP techniques
  • comment (str) – optional comment added to model info
  • weights (array-like, 1D) – optional weights of the training sample points, supported by the RSM, HDA, GP, SGP, HDAGP, iTA, and MoA techniques
  • initial_model (Model) – optional initial model, supported by the GBRT, HDAGP, MoA, and TBL techniques only
  • annotations (dict) – optional extended comment and notes
  • x_meta (list) – optional input variables information
  • y_meta (list) – optional output variables information
Returns:

trained model

Return type:

Model

Train a model using x and y as the training sample. 1D samples are supported as a simplified form for the case of 1D input and/or response.

Changed in version 6.25: pandas.DataFrame and pandas.Series are supported as the x, y training samples.

If information on the noise level in the response sample y is available, GTApprox accepts it as the outputNoiseVariance argument to build(). This array should specify a noise variance value for each element of the y array (that is, for each response component of every single point). Thus outputNoiseVariance has the same shape as y.

Changed in version v2024.04: added the output noise variance support for the HDA technique.

Output noise variance feature is supported by the following techniques:

  • Gaussian Processes (GP),
  • Sparse Gaussian Processes (SGP),
  • High Dimensional Approximation (HDA), and
  • High Dimensional Approximation combined with Gaussian Processes (HDAGP).

That is, to use output noise variance meaningfully, one of the techniques above has to be selected using GTApprox/Technique in addition to specifying outputNoiseVariance. If any other technique is selected, either manually or automatically, the outputNoiseVariance argument is ignored (but see the next note).
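
For example, a minimal sketch of passing noise variance to build() (the sample arrays here are illustrative):

    import numpy as np
    from da.p7core import gtapprox

    x = np.random.rand(50, 2)                # 50 points, 2 inputs
    y = np.sin(x[:, [0]]) + x[:, [1]] ** 2   # 50 points, 1 output
    noise = np.full_like(y, 0.01)            # noise variance for each element of y

    model = gtapprox.Builder().build(x, y,
                                     options={"GTApprox/Technique": "GP"},
                                     outputNoiseVariance=noise)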

Note

Output noise variance is not compatible with point weighting. If both outputNoiseVariance and weights are specified, build() raises an InvalidProblemError exception. This holds even if you select a technique that does not support output noise variance or point weighting and would normally ignore these arguments.

Note

Output noise variance is not compatible with GTApprox/ExactFitRequired. If outputNoiseVariance is not None and GTApprox/ExactFitRequired is on, build() raises an InvalidOptionsError exception.

Changed in version 3.0 Release Candidate 1: elements in outputNoiseVariance can have NaN values in special cases.

Since 3.0 Release Candidate 1, NaN values can be used in outputNoiseVariance to specify that noise variance data is not available. Valid uses are:

  • If noise variance data is not available for some point (a row in y), all elements of the corresponding row in outputNoiseVariance should be NaN. Note that the row cannot contain any numeric elements in this case.
  • Likewise, if noise variance data is not available for some output component (a column in y), the corresponding column in outputNoiseVariance should be filled with NaN values and cannot contain any numeric elements.
  • If some element in y is NaN (this is valid when GTApprox/OutputNanMode is set to "ignore" or "predict"), the corresponding element in outputNoiseVariance should be NaN. A numeric noise value in this case is not an error, but it will be ignored by GTApprox.

Changed in version 1.9.5: added the weights parameter.

Changed in version 5.0: added weights support to the LR, RSM, HDA, GP, SGP, HDAGP, and MoA techniques (previously was available in the iTA technique only).

Changed in version 5.0: point weight is no longer limited to range \([0, 1]\) and can be an arbitrary non-negative floating point value or infinity.

Changed in version 5.2: infinite weights are no longer allowed for numerical stability.

A number of GTApprox techniques support sample point weighting. Roughly, a point’s weight is a relative confidence characteristic that affects the model’s fit to the training sample. The model tries to fit points with greater weights better, possibly at the cost of decreased accuracy at points with lesser weights. Points with zero weight may be ignored completely when fitting the model.

Point weighting is supported in the following techniques:

  • Response Surface Model (RSM).
  • High Dimensional Approximation (HDA).
  • Gaussian Processes (GP).
  • Sparse Gaussian Processes (SGP).
  • High Dimensional Approximation + Gaussian Processes (HDAGP).
  • incomplete Tensor Approximation (iTA).
  • Mixture of Approximators (MoA).

That is, to use point weights meaningfully, one of the techniques above has to be selected using GTApprox/Technique in addition to specifying weights. If any other technique is selected, either manually or automatically, weights are ignored (but see the next note).

Note

Point weighting is not compatible with GTApprox/ExactFitRequired. If weights is not None and GTApprox/ExactFitRequired is on, build() raises an InvalidOptionsError exception.

Point weight is an arbitrary non-negative float value. This value has no specific meaning; it simply notes the relative “importance” of a point compared to other points in the training sample.

The weights argument should be a 1D array of point weights, and its length has to be equal to the number of training sample points.
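
For example, a minimal sketch that up-weights a single point (the sample data is illustrative):

    import numpy as np
    from da.p7core import gtapprox

    x = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
    y = x ** 2
    weights = np.ones(len(x))   # one weight per training point
    weights[-1] = 10.0          # ask the model to fit the last point better

    model = gtapprox.Builder().build(x, y,
                                     options={"GTApprox/Technique": "RSM"},
                                     weights=weights)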

Note

At least one weight has to be non-zero. If weights contains only zero values, build() raises an InvalidProblemError exception.

Note

Point weighting is not compatible with output noise variance. If both outputNoiseVariance and weights are specified, build() raises an InvalidProblemError exception. This holds even if you select a technique that does not support output noise variance or point weighting and would normally ignore these arguments.

Changed in version 5.3: added the incremental training (model update) support for GBRT models.

Changed in version 6.14: added the initial HDA model support for the HDAGP technique.

Changed in version 6.15.1: added the initial model support for the MoA technique.

Changed in version 6.25: added the incremental training (model update) support for TBL models.

Changed in version 6.47: added the incremental training (model update) support for GP models.

Changed in version 6.47: if you specify initial_model and manually select a technique that does not support model update, build() raises an InapplicableTechniqueException; previous versions could ignore the initial model in such cases.

A GP model can be updated with new data by specifying the existing GP model as initial_model and either selecting the GP technique manually or enabling the automatic technique selection (GTApprox/Technique set to "Auto", default). In this case, the resulting model is also a GP model.

A GBRT model, similarly, can be updated with new data by specifying it as initial_model and selecting the GBRT technique or enabling the automatic technique selection.
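
For example, a minimal GP update sketch (the batch arrays are placeholders):

    builder = gtapprox.Builder()

    # Train the initial GP model on the first batch of data.
    model = builder.build(x_batch1, y_batch1,
                          options={"GTApprox/Technique": "GP"})

    # Update it with the next batch; the result is again a GP model.
    model = builder.build(x_batch2, y_batch2,
                          options={"GTApprox/Technique": "GP"},
                          initial_model=model)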

When you use the HDAGP technique, you can add an existing HDA model as initial_model to use it as a trend, which provides noticeable savings in training time. Note that the training sample must be the same sample that was used to train the HDA model; otherwise, the new HDAGP model will be inaccurate and incorrect. The intent in this case is to speed up HDAGP model training by skipping the initial step of training a trend model internally.

The MoA technique can use a model trained by any technique as the initial one. MoA can improve model accuracy, update the model with new data, or do both. See section Initial Model for more information.

The TBL technique can use an existing TBL model as the initial one. This technique simply updates the model’s internal table with new input-output pairs from the training sample.

Other techniques do not support initial models and raise an exception if explicitly selected — for example, if you set GTApprox/Technique to "RSM" and specify initial_model, build() raises an InapplicableTechniqueException.

The MoA technique does not impose any specific limitations on initial models. For GBRT, GP, HDAGP, and TBL, if the initial_model does not match the selected technique, build() raises an exception — for example, if you specify the HDAGP technique but initial_model is not an HDA model. Also note the following limitations:

  • If you have trained a GBRT or HDA model with output transformation enabled, and you are using that model as an initial one, you must set the GTApprox/OutputTransformation option when updating the model, as explained in that option description.
  • When updating a GP model, you must get the GTApprox/GPType and GTApprox/GPPower option values from the initial model details and set those options to the same values in build(). Additionally, the GTApprox/GPInteractionCardinality option must be set to [] or to the value from the initial model.
  • Model update is not supported for GP models with the following features:
    • Models trained with heteroscedastic noise processing (GTApprox/Heteroscedastic set to True).
    • Models with categorical inputs.
  • GP model update is not compatible with point weighting: if initial_model is a GP model, and you specify weights, build() raises an exception.

Changed in version 6.14: added the annotations, x_meta, and y_meta parameters.

Changed in version 6.16: the x_meta parameter can specify input constraints.

Changed in version 6.17: the y_meta parameter can specify output thresholds.

Changed in version 6.17: training reuses the metainformation from an initial model.

The annotations dictionary adds optional notes or extended comments to the model. It can contain any number of notes; all keys and values must be strings. The x_meta and y_meta parameters provide additional details on model inputs and outputs (constraints, names, descriptions, and other details) — see Model Metainformation for details. Note that if you use an initial model that already contains metainformation, that metainformation is copied to the trained model. In this case, x_meta and y_meta can be used to edit metainformation: information specified in x_meta and y_meta overwrites the initial metainformation, while information not specified in these arguments is copied from the initial metainformation.
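
For example, a sketch of passing metainformation to build(). The metainformation keys follow the format described in section Model Metainformation, and all names here are illustrative:

    model = gtapprox.Builder().build(
        x, y,
        comment="demo model",
        annotations={"project": "wing design", "author": "jdoe"},
        x_meta=[{"name": "alpha", "description": "angle of attack"},
                {"name": "mach"}],
        y_meta=[{"name": "cl", "description": "lift coefficient"}],
    )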

build_smart(x, y, options=None, outputNoiseVariance=None, comment=None, weights=None, initial_model=None, hints=None, x_test=None, y_test=None, annotations=None, x_meta=None, y_meta=None)

Train an approximation model using smart training.

Parameters:
  • x (array-like, 1D or 2D) – training sample, input part (values of variables)
  • y (array-like, 1D or 2D) – training sample, response part (function values)
  • options (dict) – option settings which will be set fixed during parameter search
  • outputNoiseVariance (array-like, 1D or 2D) – optional y noise variance
  • comment (str) – text comment
  • weights (array-like, 1D) – training sample point weights
  • initial_model (Model) – initial model for incremental training
  • hints (dict) – user-provided hints on the data behaviour and desirable model properties
  • x_test (array-like, 1D or 2D) – testing sample, input part (values of variables)
  • y_test (array-like, 1D or 2D) – testing sample, response part (function values)
  • annotations (dict) – extended comment and notes
  • x_meta (list) – descriptions of inputs
  • y_meta (list) – descriptions of outputs
Returns:

trained model

Return type:

Model

New in version 6.6.

Changed in version 6.14: added the annotations, x_meta, and y_meta parameters.

Changed in version 6.25: pandas.DataFrame and pandas.Series are supported as the x, y training samples.

Train a model with x and y as the training sample using the smart training procedure. Arguments are the same as in build(), with three additional parameters: hints, x_test, and y_test.

  • hints: additional information about the data set or requirements to the model, and optional smart training settings. See section Hint Reference for details.
  • x_test and y_test: test samples which can be used to control model quality during training.

See section Smart Training for details on smart training.
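
For example, a minimal smart training sketch. The hint name and value are examples only; see Hint Reference for valid hints:

    model = gtapprox.Builder().build_smart(
        x, y,
        hints={"@GTApprox/Accelerator": 3},  # example hint: trade accuracy for speed
        x_test=x_test, y_test=y_test,        # optional test sample to control quality
    )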

license

Builder license.

Type:License

General license information interface. See section License Usage for details.

options

Builder options.

Type:Options

General options interface for the builder. See section Options Interface for usage and the GTApprox option reference.

static postprocess(model, train_x, train_y, hints={}, test_x=None, test_y=None)

Deprecated since version 6.6: it is recommended to use smart model training instead, see build_smart().

This method is deprecated since version 6.6 which added smart model training functionality to GTApprox. Pre- and post-processing features in GTApprox were completely removed in version 6.8, since smart model training is more convenient and provides better results. See build_smart() and section Smart Training for details.

static preprocess(train_x, train_y, hints={})

Deprecated since version 6.6: it is recommended to use smart model training instead, see build_smart().

This method is deprecated since version 6.6 which added smart model training functionality to GTApprox. Pre- and post-processing features in GTApprox were completely removed in version 6.8, since smart model training is more convenient and provides better results. See build_smart() and section Smart Training for details.

set_logger(logger)

Set logger.

Parameters:logger – logger object
Returns:None

Used to set up a logger for the build process. See section Loggers for details.
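
For example, a sketch assuming the da.p7core.loggers module provides a StdoutLogger class (see section Loggers):

    from da.p7core import gtapprox, loggers

    builder = gtapprox.Builder()
    builder.set_logger(loggers.StdoutLogger())  # print the training log to stdout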

set_watcher(watcher)

Set watcher.

Parameters:watcher – watcher object
Returns:None

Used to set up a watcher for the build process. See section Watchers for details.

11.4.2. ExportedFormat — model export formats

class da.p7core.gtapprox.ExportedFormat

Enumerates available export formats.

New in version 6.10: added str aliases for export formats.

Changed in version 6.16: added the C# source format, see CSHARP_SOURCE.

Changed in version 6.16.1: C# source export is supported for all GTApprox models but is not yet supported for GTDF models loaded to gtapprox.Model.

In export_to() you can specify format in two ways:

  1. Using enumeration, for example: my_model.export_to(gtapprox.ExportedFormat.C99_PROGRAM, "func_name", "comment", "my_model.c").
  2. Using str alias (added in 6.10), for example: my_model.export_to("c_program", "func_name", "comment", "my_model.c").
OCTAVE

Octave format.

Alias: "octave".

OCTAVE_MEX

C source for a MEX file.

Aliases: "octave_mex", "mex".

C99_PROGRAM

C source with the main() function for a complete command-line based C program.

Aliases: "c99_program", "c_program", "program".

C99_HEADER

C header of the model.

Aliases: "c99_header", "c_header", "header".

C99_SOURCE

C header and implementation of the model.

Aliases: "c99_source", "c_source", "c".

EXCEL_DLL

C implementation of the model intended for creating a DLL compatible with Microsoft Excel.

Aliases: "excel_dll", "excel".

CSHARP_SOURCE

New in version 6.16.

C# implementation of the model.

Alias: "c#".

Note

The C# source export is not yet supported for GTDF models loaded to gtapprox.Model.

Note

The C# source export requires an up-to-date license valid for pSeven Core 6.16 and above.

11.4.3. GradMatrixOrder — model gradients order

class da.p7core.gtapprox.GradMatrixOrder

Enumerates available gradient output modes.

F_MAJOR

Indexed in function-major order (\(grad_{ij} = \frac{df_i}{dx_j}\)).

X_MAJOR

Indexed in variable-major order (\(grad_{ij} = \frac{df_j}{dx_i}\)).

11.4.4. Model — approximation model

class da.p7core.gtapprox.Model(file=None, **kwargs)

Approximation model.

Can be created by Builder or loaded from a file via the Model constructor.

Changed in version 6.16: the file to load may also be a GTDF model saved with gtdf.Model.save(). Note that loading a GTDF model converts it into a GTApprox model, but the backward conversion is not supported.

Model objects are immutable. All methods which are meant to change the model return a new Model instance.

annotations

Extended comment or supplementary information.

Type:dict

New in version 6.6.

The annotations dictionary can optionally contain any number of notes. All dictionary keys and values are strings. You can add annotations when training a model and edit them using modify().

See also Model Metainformation.

static available_sections(**kwargs)

Get a list of available model sections.

Parameters:
  • file (file or str) – file object or path to load model from
  • string (str) – serialized model
  • model (Model) – model object
Returns:

available model sections

Return type:

list

New in version 6.11.

Returns a list of strings specifying which sections can be loaded from the model:

  • "model": main model section, required for model evaluation and smoothing methods.
  • "info": model information, required for info.
  • "comment": comment section, required for comment.
  • "annotations": annotations section, required for annotations.
  • "training_sample": a copy of training sample data, required for training_sample.
  • "iv_info": internal validation data, required for iv_info.
  • "build_log": model training log, required for build_log.

See Approximation Model Structure for details.

build_log

Model building log.

Type:str
calc(point)

Evaluate the model.

Parameters:point (float or array-like, 2D or 1D) – the sample or point to evaluate
Returns:model values
Return type:pandas.DataFrame or pandas.Series if point is a pandas type; otherwise ndarray, 2D or 1D

Changed in version 1.9.0: smoothness parameter is no longer supported (see Version Compatibility Issues).

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 6.22: returns ndarray with dtype=object if the model has string categorical outputs.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.

Evaluates a data sample or a single point. In general form, point is a 2D array-like (a data sample). Several simplified argument forms are also supported.

The returned array is 2D if point is a sample, and 1D if point is a single point. When point is a pandas.DataFrame or pandas.Series, the returned array keeps indexing of the point array.

  • In the case of 1D model input, a single float value is interpreted as a single point. A 1D array-like with a single element is also one point; other 1D array-likes are interpreted as a sample. A 2D array-like is always interpreted as a sample, even if it contains a single point actually. For example:

    model_1d.calc(0.0)             # a 1D point
    model_1d.calc([0.0])           # a 1D point
    model_1d.calc([[0.0]])         # a sample, one 1D point
    model_1d.calc([0.0, 1.0])      # a sample, two 1D points
    model_1d.calc([[0.0], [1.0]])  # a sample, two 1D points
    model_1d.calc([[0.0, 1.0]])    # incorrect: a sample with a single 2D point (model input is 1D)
    
  • If model input is multidimensional, a 1D array-like is interpreted as a single point, and 2D array-likes are interpreted as data samples. For example, if model input is 2D:

    model_2d.calc(0.0)                       # incorrect: point is 1D
    model_2d.calc([0.0])                     # incorrect: point is 1D
    model_2d.calc([[0.0]])                   # incorrect: sample contains one 1D point
    model_2d.calc([0.0, 0.0])                # a 2D point
    model_2d.calc([[0.0, 0.0]])              # a sample, one 2D point
    model_2d.calc([[0.0, 0.0], [1.0, 1.0]])  # a sample, two 2D points
    
calc_ae(point)

Calculate the accuracy evaluation estimate.

Parameters:point (float or array-like, 2D or 1D) – the sample or point to evaluate
Returns:estimates
Return type:pandas.DataFrame or pandas.Series if point is a pandas type; otherwise ndarray, 2D or 1D
Raise:FeatureNotAvailableError if the model does not provide accuracy evaluation

Changed in version 1.9.0: smoothness parameter is no longer supported (see Version Compatibility Issues).

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.

Check has_ae before using this method. It is available only if the model was trained with GTApprox/AccuracyEvaluation on.

Performs accuracy evaluation for a data sample or a single point. In general form, point is a 2D array-like (a data sample). Several simplified argument forms are also supported, similar to calc().

The returned array is 2D if point is a sample, and 1D if point is a single point. When point is a pandas.DataFrame or pandas.Series, the returned array keeps indexing of the point array.
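
For example, a minimal sketch combining the availability check with evaluation (x_new is a placeholder sample):

    if model.has_ae:
        estimates = model.calc_ae(x_new)  # same argument forms as calc()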

comment

Text comment to the model.

Type:str

New in version 6.6.

Optional plain text comment to the model. You can add the comment when training a model and edit it using modify().

See also Model Metainformation.

details

Detailed model information.

Type:dict

New in version 5.2.

A detailed description of the model. Includes model metainformation, accuracy data, training sample statistics, regression coefficients for RSM models, and other data.

See sections Model Details and Model Metainformation.

export_to(format, function, description, file, single_file=None)

Export the model to a source file in specified format.

Parameters:
  • format (ExportedFormat or str) – source code format
  • function (str) – exported function name
  • description (str) – additional comment
  • file (file-like, str, zipfile.ZipFile, tarfile.TarFile) – export file or path
  • single_file (bool) – export sources as a single file (default) or multiple files (False)
Returns:

None

Raise:

GTException if function is empty and format is not C99_PROGRAM

New in version 6.10: added str aliases for export formats.

Changed in version 6.24: added the support for exporting sources to archives; added the single_file parameter.

Generates the model source code in the specified format and saves it to a file or a set of source files. Supports packing the source code into various archive formats.

The source code format can be specified using an enumeration or a string alias — see ExportedFormat for details.

By default, all source code is exported to a single file. This mode is not recommended for large models, since large source files can cause problems during compilation. To split the source code into multiple files, set single_file to False. In this case, the filename from file serves as a basename, and additional source files have names with an added suffix. In the multi-file mode, all exported files are required to compile the model.

To pack source files into an archive, you can pass a zipfile.ZipFile or tarfile.TarFile object as file, or specify a path to a file with an archive-type extension. Recognized extensions are: .zip, .tar, .tgz, .tar.gz, .taz, .tbz, .tbz2, .tar.bz2.

The function argument is optional if format is C99_PROGRAM. For other source code formats, an empty function name raises an exception.

For the C# source format (CSHARP_SOURCE), the function argument sets the name of the model class and its namespace. There are two ways to use it:

  • If you specify a name without dots ., it becomes the namespace, and the class name remains default (Model). For example, if function is “myGTAmodel”:

    namespace myGTAmodel {
      public sealed class Model {
        // attributes and methods
      }
    }
    
  • If you specify a name with dots ., it is split by dots and the last part becomes the class name, while the remaining parts become a namespace hierarchy. For example, if function is “ns1.ns2.MyExportedModel”:

    namespace ns1 {
      namespace ns2 {
        public sealed class MyExportedModel {
          // attributes and methods
        }
      }
    }
    

The description provides an additional comment, which is added on top of the generated source file.

See also the Model Export example.
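
For example, two sketch calls (the model object and all names are illustrative):

    from da.p7core import gtapprox

    # Single-file C source export.
    model.export_to(gtapprox.ExportedFormat.C99_SOURCE,
                    "my_func", "demo export", "my_model.c")

    # Multi-file export packed into a zip archive.
    model.export_to("c_source", "my_func", "demo export",
                    "my_model.zip", single_file=False)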

fromstring(modelString, sections='all')

Deserialize a model from string.

Parameters:
  • modelString (str) – serialized model
  • sections (list or str) – model sections to load
Returns:

None

Changed in version 6.6: added the sections argument.

A model can be loaded (deserialized) partially, omitting certain sections to reduce memory usage. Note that availability of Model methods and attributes depends on which sections are loaded. This dependency is described in more detail in section Approximation Model Structure.

The sections argument can be a string or a list of strings specifying which sections to load:

  • "all": all sections (default).
  • "none": minimum model information, does not load any other section (the minimum load).
  • "model": main model section, required for model evaluation and smoothing methods.
  • "info": model information, required for info.
  • "comment": comment section, required for comment.
  • "annotations": annotations section, required for annotations.
  • "training_sample": a copy of training sample data, required for training_sample.
  • "iv_info": internal validation data, required for iv_info.
  • "build_log": model training log, required for build_log.

To get a list of sections available for load, use available_sections().
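
For example, a sketch of partial deserialization:

    blob = model.tostring()  # serialize the full model to a string

    lite = gtapprox.Model()
    lite.fromstring(blob, sections=["model", "info"])  # skip the heavy sections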

grad(point, order=0)

Evaluate model gradient.

Parameters:
  • point (float or array-like, 2D or 1D) – the sample or point to evaluate
  • order (GradMatrixOrder) – gradient matrix order
Returns:

model gradients

Return type:

pandas.DataFrame if point is a pandas type; otherwise ndarray, 3D or 2D

Changed in version 1.9.0: smoothness parameter is no longer supported (see Version Compatibility Issues).

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.

Evaluates model gradients for a data sample or a single point. In general form, point is a 2D array-like (a data sample). Several simplified argument forms are also supported, similar to calc().

The returned array is 3D if point is a sample, and 2D if point is a single point.

When using pandas data samples (point is a pandas.DataFrame), a 3D array in return value is represented by a pandas.DataFrame with multi-indexing (pandas.MultiIndex). In this case, the first element of the multi-index is the point index from the input sample. The second element of the multi-index is:

  • the index or name of a model’s output, if order is F_MAJOR (default)
  • the index or name of a model’s input, if order is X_MAJOR

When point is a pandas.Series, its index becomes the row index of the returned pandas.DataFrame.

grad_ae(point, order=0)

Calculate gradients of the accuracy evaluation function.

Parameters:
  • point (float or array-like, 2D or 1D) – the sample or point to evaluate
  • order (GradMatrixOrder) – gradient matrix order
Returns:

accuracy evaluation gradients

Return type:

pandas.DataFrame if point is a pandas type; otherwise ndarray, 3D or 2D

Raise:

FeatureNotAvailableError if the model does not provide accuracy evaluation

Changed in version 1.9.0: the smoothness argument is no longer supported (see Version Compatibility Issues).

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.

Check has_ae before using this method. It is available only if the model was trained with GTApprox/AccuracyEvaluation on.

Evaluates gradients of the accuracy evaluation function for a data sample or a single point. In general form, point is a 2D array (a data sample). Several simplified argument forms are also supported, similar to calc().

The returned array is 3D if point is a sample, and 2D if point is a single point.

When using pandas data samples (point is a pandas.DataFrame), a 3D array in return value is represented by a pandas.DataFrame with multi-indexing (pandas.MultiIndex). In this case, the first element of the multi-index is the point index from the input sample. The second element of the multi-index is:

  • the index or name of a model’s output, if order is F_MAJOR (default)
  • the index or name of a model’s input, if order is X_MAJOR

When point is a pandas.Series, its index becomes the row index of the returned pandas.DataFrame.

has_ae

Accuracy evaluation support.

Type:bool

Check this attribute before using calc_ae() or grad_ae(). If True, the model supports accuracy evaluation. If False, then accuracy evaluation is not available, and the methods above raise an exception.

has_ironing

Deprecated since version 1.9.0: in older versions this attribute was used to check if the model has already been smoothed using the ironing() method. It was replaced with is_smoothed following the replacement of ironing() with the advanced smoothing methods (see smooth(), smooth_anisotropic(), and smooth_errbased()).

has_smoothing

Smoothing support.

Type:bool

New in version 1.9.0.

Check this attribute before using smooth(), smooth_anisotropic(), or smooth_errbased(). If True, the model supports smoothing. If False, then smoothing is not available, and smoothing methods raise an exception.

has_smoothness

Deprecated since version 1.9.0: in older versions this attribute was used to check if the model supports dynamic smoothing (see section Version Compatibility Issues for details). It was replaced with has_smoothing following the replacement of the ironing() method with the advanced smoothing methods (see smooth(), smooth_anisotropic(), and smooth_errbased()).

info

Model description.

Type:dict

Contains all technical information which can be gathered from the model.

ironing(smoothness)

Deprecated since version 1.9.0: this method had been replaced by the advanced smoothing methods smooth(), smooth_anisotropic(), and smooth_errbased(). See section Version Compatibility Issues for details.

is_smoothed

Smoothed model.

Type:bool

New in version 1.9.0.

Check this attribute to see if the model is already smoothed. It is True for models returned by smooth(), smooth_errbased(), and smooth_anisotropic() methods, and False for other models.

iv_info

Internal validation results.

Type:dict

New in version 2.0 Release Candidate 1.

Changed in version 2.0 Release Candidate 2: also stores raw validation data.

A dictionary containing error values calculated during internal validation. Has the same structure as the details["Training Dataset"]["Accuracy"] dictionary in details — see section Accuracy in Model Details for a full description.

Additionally, if the model was trained with GTApprox/IVSavePredictions on, iv_info also contains raw validation data: model values calculated during internal validation, reference inputs, and reference outputs. This data is stored under the "Dataset" key.

If internal validation was not required when training the model (see GTApprox/InternalValidation), iv_info is an empty dictionary.
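
For example, a sketch of reading internal validation results. The key names here are illustrative, following the Accuracy structure described in Model Details:

    model = gtapprox.Builder().build(x, y,
                                     options={"GTApprox/InternalValidation": True})
    if model.iv_info:
        print(model.iv_info["Componentwise"]["RRMS"])  # illustrative key path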

license

Model license.

Type:License

General license information interface. See section License Usage for details.

load(file, sections='all')

Load a model from file.

Parameters:
  • file (file or str) – file object or path
  • sections (list or str) – model sections to load
Returns:

None

Changed in version 6.6: added the sections argument.

Deprecated since version 6.29: use Model constructor instead.

A model can be loaded partially, omitting certain sections to reduce memory usage and load time. Note that availability of Model methods and attributes depends on which sections are loaded. This dependency is described in more detail in section Approximation Model Structure.

The sections argument can be a string or a list of strings specifying which sections to load:

  • "all": all sections (default).
  • "none": minimum model information, does not load any other section (the minimum load).
  • "model": main model section, required for model evaluation and smoothing methods.
  • "info": model information, required for info.
  • "comment": comment section, required for comment.
  • "annotations": annotations section, required for annotations.
  • "training_sample": a copy of training sample data, required for training_sample.
  • "iv_info": internal validation data, required for iv_info.
  • "build_log": model training log, required for build_log.

To get a list of sections available for load, use available_sections().

modify()

Create a copy of the model with modified features or metainformation.

Parameters:
  • comment (str) – new comment
  • annotations (dict) – new annotations
  • x_meta (list) – descriptions of inputs
  • y_meta (list) – descriptions of outputs
  • strip (list or str) – optional list of features to strip from the model
Returns:

copy of this model with modifications

Return type:

Model

New in version 6.6.

Changed in version 6.14: can edit descriptions of inputs and outputs.

Changed in version 6.14.3: can remove the accuracy evaluation and smoothing features.

Changed in version 6.17: can disable the model output thresholds.

This method is intended to edit model annotations, comment, metainformation, and can be used to reduce model size by removing certain features. If a parameter is None, corresponding information in the modified model remains unchanged. If you specify a parameter, corresponding information in the modified model is fully replaced.

The x_meta and y_meta parameters that edit metainformation are similar to build() and are described in section Model Metainformation — however note that specifying any new input constraints or output thresholds in x_meta or y_meta does not change the effective (current) model constraints: changes in x_meta and y_meta only apply to model information stored in details. For example, if you set a new, more restrictive input constraint in x_meta in modify(), the model will still evaluate outputs for any input that is within the range previously set by x_meta in build(). Generally, it is not recommended to edit the model constraints information with modify() to avoid confusion.

The strip argument can be used to remove accuracy evaluation (AE) and smoothing features from the model. It can be a string or a list of strings specifying which features to remove:

  • "ae" — remove accuracy evaluation.
  • "smoothing" — remove smoothing.
  • "output_bounds" — disable the output bounds (thresholds), which were previously set with the y_meta parameter when training the model or using modify().

Removing AE may be useful for models trained with the GP, HDAGP, SGP, or TGP techniques (other techniques do not support AE). It reduces the size of the main model section (see Approximation Model Structure), thus decreasing the model size in memory. It also significantly reduces the volume of the C code generated by export_to(). The has_ae property of the modified model will be False.

Removing the smoothing feature reduces the size of the main model section only. It decreases the model size, but not the volume of exported code. The size reduction is most noticeable for models trained with the RSM and HDA techniques (up to 10 times for HDA). If the model was smoothed before modify(), the modified model remains smoothed. However, smoothing methods will no longer be available from the modified model (has_smoothing will be False).

Note that modify() returns a new modified model, which is identical to the original except for your modifications.

See also Model Metainformation.
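
For example, a minimal sketch:

    # Strip accuracy evaluation and smoothing data to reduce model size.
    small_model = model.modify(strip=["ae", "smoothing"])

    # Replace the comment; everything else is copied unchanged.
    renamed = model.modify(comment="validated on the May dataset")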

save(file, sections='all')

Save the model to file.

Parameters:
  • file (file or str) – file object or path
  • sections (list or str) – model sections to save
Returns:

None

Changed in version 6.6: sections argument added.

When saving, certain sections of the model can be skipped to reduce the model file size (see Approximation Model Structure for details). The sections argument can be a string or a list of strings specifying which sections to save:

  • "all": all sections (default).
  • "model": main model section, required for model evaluation. This section is always saved even if not specified. For some models, the size of this section can be additionally reduced by removing the accuracy evaluation or smoothing information with modify().
  • "info": model information, info.
  • "comment": comment section, comment.
  • "annotations": annotations section, annotations.
  • "training_sample": a copy of training sample data, training_sample.
  • "iv_info": internal validation data, iv_info.
  • "build_log": model training log, build_log.

Note that the main model section is always saved, so sections="model" and sections=[] are equivalent.

save_to_octave(function, file)

Deprecated since version 1.8.0: use export_to() instead.

Since version 3.0 Release Candidate 1, this method no longer exists and is completely replaced by export_to().

shap_value(point, data=None, interactions=False, approximate=False, shap_compatible=True)

Compute SHAP (SHapley Additive exPlanations) values.

Parameters:
  • point (float or array-like, 1D or 2D) – a point or sample to evaluate
  • data (float or array-like, 1D or 2D) – optional background data sample
  • interactions (bool) – if True, evaluate pairwise interactions (supported by GBRT models only)
  • approximate (bool) – if True, compute approximate SHAP values (fast but less accurate)
  • shap_compatible (bool) – if True, return shap.Explanation (requires shap)
Returns:

explanations

Return type:

shap.Explanation or tuple (elements depend on the point type)

New in version 6.20.

Evaluates SHAP, using an optimized internal implementation when possible. The following models support the internal method and do not require the shap module if you set shap_compatible to False:

  • All models trained with the GBRT technique.
  • All differentiable models — that is, all models without categorical variables.

Other models use shap.PermutationExplainer and require shap.

The point syntax is the same as in calc(): general form is a 2D array, and several simplified forms are supported. When shap_compatible is False, the return value is a pair (tuple) where elements depend on the point type:

  • If point is a single point, the return pair is a scalar base value and an ndarray — 1D or 2D, depending on interactions.
  • If point is a sample, the return pair is a list of base values for each output and an ndarray — 2D or 3D, also depending on interactions. In this case, a base value for an output is the average of this output over the training dataset.

Array structure in results is:

  • If interactions is False (default), resulting SHAP values form an \(n \times m\) matrix, where \(n\) is the number of points in point, and \(m\) is the model’s input dimension. Each matrix row contains the contributions of model inputs that push the model output away from the base value.
  • If interactions is True, contributions for each input point form an \(m \times m\) matrix, where main effects are on the diagonal and interaction effects are off-diagonal. Resulting SHAP values form an \(n \times m \times m\) array. Note that only GBRT models support pairwise interactions.

For convenience, if you have shap installed, set shap_compatible to True to return a shap.Explanation object.

GBRT models estimate SHAP values by a fast and exact method for tree models and ensembles of trees. Differentiable models (without categorical variables) approximate SHAP values using expected gradients (Sundararajan et al. 2017) — an extension of integrated gradients, a feature attribution method designed for differentiable models based on an extension of Shapley values to infinite player games (Aumann-Shapley values).
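
For example, a sketch of the tuple return form (x_sample is a placeholder 2D sample):

    base_values, shap_values = model.shap_value(x_sample, shap_compatible=False)
    # With interactions=False, shap_values has one row per point
    # and one column per model input.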

size_f

Model output dimension.

Type:long
size_x

Model input dimension.

Type:long
smooth(f_smoothness)

Apply smoothing to model.

Parameters:f_smoothness (float or array-like, 1D) – output smoothing factors
Returns:smoothed model
Return type:Model
Raise:GTException if the model does not support smoothing

New in version 1.9.0.

Check has_smoothing before using this method.

This method creates and returns a new smoothed model. The amount of smoothing is specified by the f_smoothness argument. Details on model smoothing can be found in section Model Smoothing.
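
For example, a minimal sketch:

    if model.has_smoothing:
        smoothed = model.smooth(0.5)  # one smoothing factor for all outputs
        assert smoothed.is_smoothed   # the original model stays unchanged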

smooth_anisotropic(f_smoothness, x_weights)

Apply anisotropic smoothing to model.

Parameters:
  • f_smoothness (float or array-like, 1D) – output smoothing factors
  • x_weights (array-like, 1D or 2D) – the amount of smoothing by different input components
Returns:

smoothed model

Return type:

Model

Raise:

GTException if the model does not support smoothing

New in version 1.9.0.

Check has_smoothing before using this method.

This method extends the simple smoothing functionality (see smooth()) by allowing anisotropic smoothing: x_weights specify relative smoothing by different components of the input.

Details on anisotropic smoothing can be found in section Anisotropic Smoothing.

smooth_errbased(x_sample, f_sample, error_type, error_thresholds, x_weights=None)

Apply error-based smoothing to the model, controlling model errors over a reference inputs-responses array.

Parameters:
  • x_sample (float or array-like, 1D or 2D) – reference inputs
  • f_sample (float or array-like, 1D or 2D) – reference responses
  • error_type (str or list[str]) – error types to calculate
  • error_thresholds (float or array-like, 1D) – error thresholds
  • x_weights (array-like, 1D or 2D) – the amount of smoothing for different input components
Returns:

smoothed model

Return type:

Model

Raise:

GTException if the model does not support smoothing

New in version 1.9.0.

Check has_smoothing before using this method.

This method creates and returns a model which has maximum smoothness while keeping the approximation errors of the model below the specified thresholds.

Details on error-based smoothing can be found in section Error-Based Smoothing.

tostring(sections='all')

Serialize the model.

Parameters:sections (list or str) – model sections to save
Returns:serialized model
Return type:str

Changed in version 6.6: sections argument added.

When serializing, certain sections of the model can be skipped to reduce the model size (see Approximation Model Structure for details). The sections argument can be a string or a list of strings specifying which sections to include:

  • "all": all sections (default).
  • "model": main model section, required for model evaluation. This section is always saved even if not specified. For some models, the size of this section can be additionally reduced by removing the accuracy evaluation or smoothing information with modify().
  • "info": model information, info.
  • "comment": comment section, comment.
  • "annotations": annotations section, annotations.
  • "training_sample": a copy of training sample data, training_sample.
  • "iv_info": internal validation data, iv_info.
  • "build_log": model training log, build_log.

Note that the main model section is always included, so sections="model" and sections=[] are equivalent.

training_sample

Model training sample optionally stored with the model.

Type:list

New in version 6.6.

If GTApprox/StoreTrainingSample was enabled when training the model, this attribute contains a copy of training data. Otherwise it will be an empty list.

Training data is a single dict stored as the only element of the list. This dictionary has the following keys:

  • "x" — the input part of the training sample (values of variables).
  • "f" — the response part of the training sample (function values).
  • "tol" — response noise variance. This key is present only if output noise variance was specified when training.
  • "weights" — sample point weights. This key is present only if point weights were specified when training.
  • "x_test" — the input part of the test sample (added in 6.8). This key is present only if a test sample was used when training.
  • "f_test" — the response part of the test sample (added in 6.8). This key is present only if a test sample was used when training.

Note that in the case of GBRT incremental training (see Incremental Training), only the last (most recent) training sample can be saved.

Note

Training sample data is stored in lightweight NumPy arrays that have a limited lifetime, which cannot exceed the lifetime of the model object. This means you should avoid assigning these arrays to new variables. Either use them directly, or, if you want to read this data without keeping the model object, create copies of the arrays: train_x = my_model.training_sample[0]["x"].copy().

validate(pointsX, pointsY, weights=None)

Validate the model using a reference inputs-responses array.

Parameters:
  • pointsX (float or array-like, 1D or 2D) – reference inputs
  • pointsY (float or array-like, 1D or 2D) – reference responses
  • weights (array-like, 1D) – optional weights of the reference points
Returns:

accuracy data

Return type:

dict

Validates the model against the reference array, evaluating model responses to pointsX and comparing them to pointsY.

Generally, pointsX and pointsY should be 2D arrays. Several simplified argument forms are also supported, similar to calc().

Returns a dictionary containing lists of error values calculated componentwise, with names of errors as keys. The returned dictionary has the same structure as the details["Training Dataset"]["Accuracy"]["Componentwise"] dictionary in details — see section Accuracy in Model Details for a full description.
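
For example, a sketch (the error-name key is illustrative; see Model Details for actual keys):

    errors = model.validate(x_test, y_test)
    print(errors["RRMS"])  # a list of componentwise error values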

11.4.5. Utilities — auxiliary functions

class da.p7core.gtapprox.Utilities

Utility functions.

static checkTensorStructure(trainPoints, userDefinedFactors=())

Check if the source data has proper structure so the Tensor Approximation technique may be used.

Parameters:
  • trainPoints (array-like, 2D) – training sample, input part (values of variables)
  • userDefinedFactors (list) – optional list of proposed tensor factors
Returns:

check result and (if no user-defined factors are given) calculated tensor factors, as a tuple

Return type:

tuple(bool, list[list])

The Tensor Approximation technique requires a specific design of experiment type (so-called gridded data). This function may be used to check whether the sample data structure allows TA usage. The user supplies the training sample and, optionally, a list of proposed tensor factors. The return value is a tuple of a Boolean check result (True means the sample is TA-compatible) and a list of tensor factors, which are either user-defined or calculated automatically if userDefinedFactors is an empty list.
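
For example, a sketch (the training arrays are placeholders):

    is_gridded, factors = gtapprox.Utilities.checkTensorStructure(x_train)
    if is_gridded:
        model = gtapprox.Builder().build(x_train, y_train,
                                         options={"GTApprox/Technique": "TA"})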

11.4.6. Functions

da.p7core.gtapprox.export_fmi_20(model, file, id=None, der_outputs=False, meta=None, inputs_meta=None, outputs_meta=None, compilers=None, single_file=None)

Export the model to a Functional Mock-up Unit for Model Exchange and Co-Simulation 2.0.

Parameters:
  • model (Model) – exported model
  • file (file or str) – file object or path where to export
  • id (str) – a string used in model and function names
  • der_outputs (bool) – if True, include partial derivatives of model outputs in the list of FMI model outputs
  • meta (dict) – model information
  • inputs_meta (list) – input variable information
  • outputs_meta (list) – output variable information
  • compilers (dict) – compiler settings to export an FMU with binary
  • single_file (bool) – pass sources to compilers as a single file (default) or multiple files (False)
Returns:

description of model variables

Return type:

list

New in version 6.31.

According to the FMI standard, in an FMU with source code the same string (id) is used as the model name and as a prefix in function names in the model code. If file is a path, omit id as it will be generated from the filename in the path. In this case, the filename must be a valid C identifier. If file is a file object, id is required and must be a valid C identifier. If you specify both file and id, the filename and id must be identical. The file extension should be .fmu by standard.

For the general model description, use meta. This argument is a dictionary that may contain the following keys:

  • "name": a string with the name of the model that will be shown in the modeling environment.
  • "description": a string with a brief model description; if omitted, the model’s comment is used.
  • "naming_convention": name convention used for names of variables. For details, see the FMI documentation. Currently included in the standard are:
    • "flat": a list of strings (default).
    • "structured": hierarchical names using dot separator, with array elements and derivative characterization.
  • "author": an string containing author’s name and organization.
  • "version": model version string.
  • "copyright": optional information on the intellectual property copyright for this FMU.
  • "license": optional information on the intellectual property licensing for this FMU.

For variable description, use inputs_meta and outputs_meta. If specified, variable description must be a list with length size_x (or size_f respectively). List element is a dictionary with the following keys (all keys are optional):

  • "name": name of the variable (string), optional. Default is "x[i]" for inputs, "f[i]" for outputs, where i is the index of this input or output in the training sample.
  • "description": a string containing brief variable description.
  • "quantity": physical quantity of the variable, for example "Angle" or "Energy".
  • "unit": measurement units used for this variable in model equations, for example "deg" or "J".
  • "min": the minimum value of the variable (float) or "training" to use the minimum value from the model’s training sample. If omitted, the minimum is the largest negative number that can be represented on the machine.
  • "max": the maximum value of the variable (float) or "training" to use the training sample value. If omitted, the maximum is the largest positive number that can be represented on the machine.

If some or all details for a variable are not specified, GTApprox also tries to get them from the model’s details. If some details are specified both in details and as parameters to export_fmi_20(), information from parameters takes priority.

By default, an FMU is exported with source code only. To export an FMU with binaries for one or more of the standard FMI platforms, specify compilers.

  • A key in compilers is a string identifying the target platform. Recognized platform names are: "win32", "win64", "linux32", "linux64". You can add compilers for different platforms to export an FMU with cross-platform support.
  • A value in compilers is a Python callable object that implements an FMU compiler for the platform specified by key.

Each callable in compilers should support three input parameters:

  • source_code is the source code to compile.
    • If single_file is True or not specified, source_code is a string.
    • If single_file is False, source_code is a list of string pairs (file_name, source_code). The file_name and source_code strings are the name and source code of a C translation unit. Together, these translation units form the FMU source code.
  • model_id is the model identifier and the name of the shared library (a .dll or .so file).
  • platform is the platform identifier, one of the following strings: "win32", "win64", "linux32", "linux64".

Each callable in compilers must return a string containing the binary code of the compiled shared library. Note that any exception raised by the callable is re-raised by export_fmi_20().
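
For example, a sketch of a single-file compiler callable for the "linux64" platform using gcc (paths and compiler flags are illustrative):

    import os
    import subprocess
    import tempfile

    def compile_linux64(source_code, model_id, platform):
        # single_file defaults to True, so source_code is a single string here
        with tempfile.TemporaryDirectory() as tmpdir:
            src = os.path.join(tmpdir, model_id + ".c")
            lib = os.path.join(tmpdir, model_id + ".so")
            with open(src, "w") as f:
                f.write(source_code)
            subprocess.check_call(["gcc", "-shared", "-fPIC", "-O2", "-o", lib, src])
            with open(lib, "rb") as f:
                return f.read()  # binary code of the compiled shared library

    variables = gtapprox.export_fmi_20(model, "my_model.fmu",
                                       compilers={"linux64": compile_linux64})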

On successful export, export_fmi_20() returns a description of the exported model variables in terms of FMI standard. The description is a list of dictionaries with the following keys:

  • "name": the name of the variable.
  • "causality": "input" or "output"; indicates how the variable is visible from the outside of the model.
  • "variability": "constant" or "parameter"; indicates when the value of the variable changes.
  • "type": "real" or "enum"; indicates type of the variable.
  • "value": "real" or "constant"; omitted for other types of variables.
  • "enumerators": list of enumerators if variable type is "enum"; omitted for other types of variables.
  • "origin": a tuple of two integers that are 0-based indices of the original model input and output components related to this FMU parameter:
    • (j, -1) is the j-th component of the original model input.
    • (-1, i) is the i-th component of the original model output.
    • (j, i) is the partial derivative of the i-th model output with respect to the j-th input.
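For example, the "origin" tuples can be mapped back to the original model components as in this sketch (description stands for the list returned by export_fmi_20()):

for var in description:
    j, i = var["origin"]
    if i == -1:
        print("%s maps to model input x[%d]" % (var["name"], j))
    elif j == -1:
        print("%s maps to model output f[%d]" % (var["name"], i))
    else:
        print("%s is the derivative of f[%d] with respect to x[%d]" % (var["name"], i, j))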
da.p7core.gtapprox.export_fmi_cs(model, file, id=None, der_outputs=False, meta=None, inputs_meta=None, outputs_meta=None, compilers=None, single_file=None)

Export the model to a Functional Mock-up Unit for Co-Simulation 1.0.

Parameters:
  • model (Model) – exported model
  • file (file or str) – file object or path where to export
  • id (str) – a string used in model and function names
  • der_outputs (bool) – if True, include partial derivatives of model outputs in the list of FMI model outputs
  • meta (dict) – model information
  • inputs_meta (list) – input variable information
  • outputs_meta (list) – output variable information
  • compilers (dict) – compiler settings to export an FMU with binary
  • single_file (bool) – pass sources to compilers as a single file (default) or multiple files (False)
Returns:

description of model variables

Return type:

list

New in version 6.9.

Changed in version 6.24: added the single_file parameter.

According to the FMI standard, in an FMU with source code the same string (id) is used as the model name and as a prefix in function names in the model code. If file is a path, omit id as it will be generated from the filename in the path. In this case, the filename must be a valid C identifier. If file is a file object, id is required and must be a valid C identifier. If you specify both file and id, the filename and id must be identical. Per the FMI standard, the file extension should be .fmu.
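For example, a minimal export to a path (my_model stands for a hypothetical trained Model; the filename stem cs_model is a valid C identifier, so id can be omitted):

from da.p7core import gtapprox

# id is generated from the filename stem, which must be a valid C identifier
variables = gtapprox.export_fmi_cs(my_model, "cs_model.fmu")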

For the general model description, use meta. This argument is a dictionary that may contain the following keys:

  • "name": a string with the name of the model that will be shown in the modeling environment.
  • "description": a string with a brief model description; if omitted, the model’s comment is used.
  • "naming_convention": name convention used for names of variables. For details, see the FMI documentation. Currently included in the standard are:
    • "flat": a list of strings (default).
    • "structured": hierarchical names using dot separator, with array elements and derivative characterization.
  • "author": an string containing author’s name and organization.
  • "version": model version string.

For variable description, use inputs_meta and outputs_meta. If specified, the variable description must be a list of length size_x (or size_f, respectively). Each list element is a dictionary with the following keys (all keys are optional):

  • "name": name of the variable (string), optional. Default is "x[i]" for inputs, "f[i]" for outputs, where i is the index of this input or output in the training sample.
  • "description": a string containing brief variable description.
  • "quantity": physical quantity of the variable, for example "Angle" or "Energy".
  • "unit": measurement units used for this variable in model equations, for example "deg" or "J".
  • "min": the minimum value of the variable (float) or "training" to use the minimum value from the model’s training sample. If omitted, the minimum is the largest negative number that can be represented on the machine.
  • "max": the maximum value of the variable (float) or "training" to use the training sample value. If omitted, the maximum is the largest positive number that can be represented on the machine.

If some or all details for a variable are not specified, GTApprox also tries to get them from the model’s details. If a detail is specified both in details and in a parameter to export_fmi_cs(), the parameter takes priority.

By default, an FMU is exported with source code only. To export an FMU with binaries for one or more of the standard FMI platforms, specify compilers.

  • A key in compilers is a string identifying the target platform. Recognized platform names are: "win32", "win64", "linux32", "linux64". You can add compilers for different platforms to export an FMU with cross-platform support.
  • A value in compilers is a Python callable object that implements an FMU compiler for the platform specified by key.

Each callable in compilers should support three input parameters:

  • source_code - the source code to compile.
    • If single_file is True or not specified, source_code is a string.
    • If single_file is False, source_code is a list of string pairs (file_name, source_code). The file_name and source_code strings are the name and source code of a C translation unit. Together, these translation units form the FMU source code.
  • model_id is the model identifier and the name of the shared library (a .dll or .so file).
  • platform is the platform identifier, one of the following strings: "win32", "win64", "linux32", "linux64".

Each callable in compilers must return a string containing the binary code of the compiled shared library. Note that any exception raised by the callable is re-raised by export_fmi_cs().

On successful export, export_fmi_cs() returns a description of the exported model variables in terms of the FMI standard. The description is a list of dictionaries with the following keys:

  • "name": the name of the variable.
  • "causality": "input" or "output"; indicates how the variable is visible from the outside of the model.
  • "variability": "constant" or "parameter"; indicates when the value of the variable changes.
  • "type": "real" or "enum"; indicates type of the variable.
  • "value": "real" or "constant"; omitted for other types of variables.
  • "enumerators": list of enumerators if variable type is "enum"; omitted for other types of variables.
  • "origin": a tuple of two integers that are 0-based indices of the original model input and output components related to this FMU parameter:
    • (j, -1) is the j-th component of the original model input.
    • (-1, i) is the i-th component of the original model output.
    • (j, i) is the partial derivative of the i-th model output with respect to the j-th input.
da.p7core.gtapprox.export_fmi_me(model, file, id=None, der_outputs=False, meta=None, inputs_meta=None, outputs_meta=None, compilers=None, single_file=None)

Export the model to a Functional Mock-up Unit for Model Exchange 1.0.

Parameters:
  • model (Model) – exported model
  • file (file or str) – file object or path where to export
  • id (str) – a string used in model and function names
  • der_outputs (bool) – if True, include partial derivatives of model outputs in the list of FMI model outputs
  • meta (dict) – model information
  • inputs_meta (list) – input variable information
  • outputs_meta (list) – output variable information
  • compilers (dict) – compiler settings to export an FMU with binary
  • single_file (bool) – pass sources to compilers as a single file (default) or multiple files (False)
Returns:

description of model variables

Return type:

list

New in version 6.14.3.

Changed in version 6.24: added the single_file parameter.

According to the FMI standard, in an FMU with source code the same string (id) is used as the model name and as a prefix in function names in the model code. If file is a path, omit id as it will be generated from the filename in the path. In this case, the filename must be a valid C identifier. If file is a file object, id is required and must be a valid C identifier. If you specify both file and id, the filename and id must be identical. Per the FMI standard, the file extension should be .fmu.
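For example, a sketch of exporting to an open file object (my_model stands for a hypothetical trained Model; binary mode is used because an FMU is a zip archive):

from da.p7core import gtapprox

# with a file object, id is required and must be a valid C identifier
with open("me_model.fmu", "wb") as fmu:
    variables = gtapprox.export_fmi_me(my_model, fmu, id="me_model")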

For the general model description, use meta. This argument is a dictionary that may contain the following keys:

  • "name": a string with the name of the model that will be shown in the modeling environment.
  • "description": a string with a brief model description; if omitted, the model’s comment is used.
  • "naming_convention": name convention used for names of variables. For details, see the FMI documentation. Currently included in the standard are:
    • "flat": a list of strings (default).
    • "structured": hierarchical names using dot separator, with array elements and derivative characterization.
  • "author": an string containing author’s name and organization.
  • "version": model version string.

For variable description, use inputs_meta and outputs_meta. If specified, the variable description must be a list of length size_x (or size_f, respectively). Each list element is a dictionary with the following keys (all keys are optional):

  • "name": name of the variable (string), optional. Default is "x[i]" for inputs, "f[i]" for outputs, where i is the index of this input or output in the training sample.
  • "description": a string containing brief variable description.
  • "quantity": physical quantity of the variable, for example "Angle" or "Energy".
  • "unit": measurement units used for this variable in model equations, for example "deg" or "J".
  • "min": the minimum value of the variable (float) or "training" to use the minimum value from the model’s training sample. If omitted, the minimum is the largest negative number that can be represented on the machine.
  • "max": the maximum value of the variable (float) or "training" to use the training sample value. If omitted, the maximum is the largest positive number that can be represented on the machine.

If some or all details for a variable are not specified, GTApprox also tries to get them from the model’s details. If a detail is specified both in details and in a parameter to export_fmi_me(), the parameter takes priority.

By default, an FMU is exported with source code only. To export an FMU with binaries for one or more of the standard FMI platforms, specify compilers.

  • A key in compilers is a string identifying the target platform. Recognized platform names are: "win32", "win64", "linux32", "linux64". You can add compilers for different platforms to export an FMU with cross-platform support.
  • A value in compilers is a Python callable object that implements an FMU compiler for the platform specified by key.

Each callable in compilers should support three input parameters:

  • source_code - the source code to compile.
    • If single_file is True or not specified, source_code is a string.
    • If single_file is False, source_code is a list of string pairs (file_name, source_code). The file_name and source_code strings are the name and source code of a C translation unit. Together, these translation units form the FMU source code.
  • model_id is the model identifier and the name of the shared library (a .dll or .so file).
  • platform is the platform identifier, one of the following strings: "win32", "win64", "linux32", "linux64".

Each callable in compilers must return a string containing the binary code of the compiled shared library. Note that any exception raised by the callable is re-raised by export_fmi_me().

On successful export, export_fmi_me() returns a description of the exported model variables in terms of the FMI standard. The description is a list of dictionaries with the following keys:

  • "name": the name of the variable.
  • "causality": "input" or "output"; indicates how the variable is visible from the outside of the model.
  • "variability": "constant" or "parameter"; indicates when the value of the variable changes.
  • "type": "real" or "enum"; indicates type of the variable.
  • "value": "real" or "constant"; omitted for other types of variables.
  • "enumerators": list of enumerators if variable type is "enum"; omitted for other types of variables.
  • "origin": a tuple of two integers that are 0-based indices of the original model input and output components related to this FMU parameter:
    • (j, -1) is the j-th component of the original model input.
    • (-1, i) is the i-th component of the original model output.
    • (j, i) is the partial derivative of the i-th model output with respect to the j-th input.
da.p7core.gtapprox.set_remote_build(builder, options={}, config_file=None)

Configure a GTApprox builder to run on a remote host over SSH or to use an HPC cluster.

Parameters:
  • builder (Builder) – model builder
  • options (dict) – configuration options
  • config_file (str) – optional path to a configuration file

New in version 4.3: initial support for remote model training and distributed training of MoA models on a cluster.

New in version 5.3: distributed training now supported for all componentwise models.

Changed in version 6.3: GTApprox now enables componentwise training by default, hence distributed training also becomes default when using a cluster.

New in version 6.6: for models with categorical variables, distributed training now supports parallelization over all unique combinations of their values found in the training sample.

Deprecated since version 6.35: this function is no longer updated and may lag behind build() and build_smart() with regard to certain features or training techniques; using it is not recommended, as it may be removed in future versions.

Configures a model builder to run remotely or to perform distributed model training on a cluster. Distributed training on a cluster means that a model is divided into several sub-models, which become separate cluster jobs, allowing a high degree of parallelization.

Note

The same version of pSeven Core has to be installed on the local and remote hosts or, in case of distributed training, on the local host and all cluster nodes.

Note

Remote training requires the paramiko module and its dependencies (pycrypto and ecdsa). These modules are not required for pSeven Core in general and hence are not listed in section System Requirements.

Distributed training is effective in the following cases:

  1. When using the Mixture of Approximators (MoA) technique (set GTApprox/Technique to "MoA"). This technique automatically partitions the training sample and trains several sub-models, which are then combined into the final model, so it naturally supports distributed training of its sub-models (see the sketch after the notes below).
  2. When a model has multidimensional output and componentwise training is enabled. The componentwise mode has been the default since version 6.3 (see GTApprox/DependentOutputs). Componentwise models can be trained in parallel, since each model component is trained independently.
  3. When you define one or more categorical variables (see GTApprox/CategoricalVariables) and the training sample contains two or more unique combinations of their values. In this case, an independent model can be trained for each such combination.

Note that a combination of the above cases is also supported — that is, GTApprox tries to achieve as high a parallelization ratio as possible. For example, if you train a componentwise model with categorical variables, the ratio can be higher than the number of model outputs.

If none of the above cases apply, cluster training is still available but will simply submit a single job to the cluster.
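For example, a sketch combining the MoA technique with cluster training (the host name, key file path, and training sample x_train, y_train are placeholders; the options used here are described in the reference below):

from da.p7core import gtapprox

builder = gtapprox.Builder()
gtapprox.set_remote_build(builder, {"ssh-hostname": "submit-node",
                                    "ssh-username": "user",
                                    "ssh-keyfile": "/home/user/.ssh/id_rsa",
                                    "cluster": "lsf"})
# each MoA sub-model becomes a separate cluster job
model = builder.build(x_train, y_train, options={"GTApprox/Technique": "MoA"})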

The options argument is a dictionary with the following recognized keys (all keys are str, value types are noted below):

  • "ssh-hostname" (str) — remote SSH host name.
  • "ssh-username" (str) — SSH username.
  • "ssh-password" (str) — SSH password (warning: unsafe).
  • "ssh-keyfile" (str) — path to an SSH private key file.
  • "environment" (dict) — dictionary of environment variables.
  • "workdir" (str) — path to the working directory (local or remote, depending on SSH configuration).
  • "cluster" (str) — cluster type. Currently the only supported type is LSF ("lsf"). If cluster type is None the model is trained on a remote host without using a HPC cluster.
  • "cluster-queue" (str) — name of the destination cluster queue.
  • "cluster-job-name" (str) — cluster job name.
  • "cluster-exclusive" (bool) — if True, cluster nodes are used exclusively by jobs (the destination queue must support exclusive jobs). Note that if exclusive jobs are disabled (False), it is recommended to set GTApprox/MaxParallel to 1 or 2 (in builder options) to avoid performance degradation in case of two or more jobs being allocated to the same node by a cluster manager. See section Multi-core Scalability for details.
  • "cluster-slot-limit" (int) — maximum number of jobs that can run simultaneously.

To train a model remotely over SSH, you have to specify "ssh-hostname" and either:

  • "ssh-username" and "ssh-password", or
  • "ssh-keyfile" ("ssh-username" may also be required when using a key file).

Using a key file is recommended, since storing an SSH password in your script is unsafe. If you have no key file, you can use the standard getpass module as a workaround. For example:

import getpass
from da.p7core import gtapprox

builder = gtapprox.Builder()
# getpass() prompts for the password and requires interactive input
gtapprox.set_remote_build(builder, {"ssh-hostname": "theserver",
                                    "ssh-username": "user",
                                    "ssh-password": getpass.getpass()})

To use a cluster, you have to specify "cluster"; "cluster-queue" and "cluster-job-name" may also be required, depending on your cluster manager configuration. If you connect to the cluster submit node over SSH, also specify "ssh-username" and "ssh-password" or "ssh-keyfile". For example:

# imports as in the previous example
builder = gtapprox.Builder()
# getpass() prompts for the password and requires interactive input
gtapprox.set_remote_build(builder, {"ssh-hostname": "submit-node",
                                    "ssh-username": "user",
                                    "ssh-password": getpass.getpass(),
                                    "cluster": "lsf"})

Instead of options, you can specify the path to a configuration file in config_file. You can also combine both: option values are read from the file first, then from options. If a conflict occurs, values set in options override those specified in the configuration file.

The configuration file should contain options and values in JSON format, for example:

{
    "ssh-hostname": "submit-node",
    "ssh-username": "user",
    "ssh-password": "password",

    "environment": {"OMP_NUM_THREADS": 8, "SHELL": "/bin/bash -i"},

    "cluster": "lsf",
    "cluster-queue": "normal",
    "cluster-exclusive": True
}
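For instance, a sketch combining a configuration file with an inline override (remote_config.json is a placeholder path):

# the inline "cluster-queue" value overrides the one read from the file
gtapprox.set_remote_build(builder, {"cluster-queue": "priority"},
                          config_file="remote_config.json")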
da.p7core.gtapprox.disable_remote_build(builder)

Reset builder configuration to run on the local host only.

Parameters: builder (Builder) – model builder

Used to cancel the set_remote_build() configuration.

da.p7core.gtapprox.train_test_split(x, y, train_size=None, test_size=None, options=None)

Split a data sample into train and test subsets optimized for model training.

Parameters:
  • x (array-like, 1D or 2D) – sample inputs (values of variables)
  • y (array-like, 1D or 2D) – sample responses (function values)
  • train_size (int or float) – optional number of training points (int) or portion of the sample to include in the train subset (float)
  • test_size (int or float) – optional number of test points (int) or portion of the sample to include in the test subset (float)
  • options (dict) – option settings
Returns:

tuple of train inputs, test inputs, train outputs and test outputs

Return type:

tuple

Performs an optimized split of the given data sample into two subsets to be used as model training and validation (test) data. The distribution of points between the train and test subsets is optimized so that both provide a good representation of input and response variance, avoiding the skew that a random split may introduce.
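For example, a minimal sketch on toy data (the 80/20 proportion is arbitrary):

import numpy as np
from da.p7core import gtapprox

x = np.random.rand(100, 3)            # toy inputs
y = np.sum(x, axis=1, keepdims=True)  # toy responses
# 80% of points go to the train subset; the rest form the test subset
x_train, x_test, y_train, y_test = gtapprox.train_test_split(x, y, train_size=0.8)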