11.4. da.p7core.gtapprox

Generic Tool for Approximation (GTApprox) module.

>>> from da.p7core import gtapprox

Classes

da.p7core.gtapprox.Builder() Approximation model builder.
da.p7core.gtapprox.ExportedFormat Enumerates available export formats.
da.p7core.gtapprox.GradMatrixOrder Enumerates available gradient output modes.
da.p7core.gtapprox.Model([file]) Approximation model.
da.p7core.gtapprox.Utilities Utility functions.

Functions

da.p7core.gtapprox.export_fmi_20(model, file) Export the model to a Functional Mock-up Unit for Model Exchange and Co-Simulation 2.0.
da.p7core.gtapprox.export_fmi_cs(model, file) Export the model to a Functional Mock-up Unit for Co-Simulation 1.0.
da.p7core.gtapprox.export_fmi_me(model, file) Export the model to a Functional Mock-up Unit for Model Exchange 1.0.
da.p7core.gtapprox.set_remote_build(builder) Configure a GTApprox builder to run on a remote host over SSH or to use an HPC cluster.
da.p7core.gtapprox.disable_remote_build(builder) Reset builder configuration to run on the local host only.
da.p7core.gtapprox.train_test_split(x, y[, …]) Split a data sample into train and test subsets optimized for model training.

11.4.1. Builder — model builder

class da.p7core.gtapprox.Builder

Approximation model builder.

build(x, y, options=None, outputNoiseVariance=None, comment=None, weights=None, initial_model=None, annotations=None, x_meta=None, y_meta=None)

Train an approximation model.

Parameters:
  • x (array-like, 1D or 2D) – training sample, input part (values of variables)
  • y (array-like, 1D or 2D) – training sample, response part (function values)
  • options (dict) – option settings
  • outputNoiseVariance (array-like, 1D or 2D) – optional y noise variance, supported by the GP, SGP, HDA, and HDAGP techniques
  • comment (str) – optional comment added to model info
  • weights (array-like, 1D) – optional weights of the training sample points, supported by the RSM, HDA, GP, SGP, HDAGP, iTA, and MoA techniques
  • initial_model (Model) – optional initial model, supported by the GBRT, HDAGP, MoA, and TBL techniques only
  • annotations (dict) – optional extended comment and notes
  • x_meta (list) – optional input variables information
  • y_meta (list) – optional output variables information
Returns:

trained model

Return type:

Model

Train a model using x and y as the training sample. 1D samples are supported as a simplified form for the case of 1D input and/or response.

Changed in version 6.25: pandas.DataFrame and pandas.Series are supported as the x, y training samples.

If information on the noise level in the response sample y is available, GTApprox accepts it as the outputNoiseVariance argument to build(). This array should specify a noise variance value for each element of the y array (that is, for each response component of every single point). Thus outputNoiseVariance has the same shape as y.

Changed in version v2024.04: added the output noise variance support for the HDA technique.

Output noise variance feature is supported by the following techniques:

  • Gaussian Processes (GP),
  • Sparse Gaussian Processes (SGP),
  • High Dimensional Approximation (HDA), and
  • High Dimensional Approximation combined with Gaussian Processes (HDAGP).

That is, to use output noise variance meaningfully, one of the techniques above has to be selected using GTApprox/Technique in addition to specifying outputNoiseVariance. If any other technique is selected, either manually or automatically, the outputNoiseVariance argument is ignored (but see the next note).
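
For example, a minimal sketch of passing noise variance to build() (the sample arrays here are illustrative):

    import numpy as np
    from da.p7core import gtapprox

    x = np.random.rand(50, 2)                # 50 points, 2 inputs
    y = np.sin(x[:, [0]]) + x[:, [1]] ** 2   # 50 points, 1 output
    noise = np.full_like(y, 0.01)            # noise variance for each element of y

    model = gtapprox.Builder().build(x, y,
                                     options={"GTApprox/Technique": "GP"},
                                     outputNoiseVariance=noise)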

Note

Output noise variance is not compatible with point weighting. If both outputNoiseVariance and weights are specified, build() raises an InvalidProblemError exception. This holds even if you select a technique that does not support output noise variance or point weighting and would normally ignore these arguments.

Note

Output noise variance is not compatible with GTApprox/ExactFitRequired. If outputNoiseVariance is not None and GTApprox/ExactFitRequired is on, build() raises an InvalidOptionsError exception.

Changed in version 3.0 Release Candidate 1: elements in outputNoiseVariance can have NaN values in special cases.

Since 3.0 Release Candidate 1, NaN values can be used in outputNoiseVariance to specify that noise variance data is not available. Valid uses are:

  • If noise variance data is not available for some point (a row in y), all elements of the corresponding row in outputNoiseVariance should be NaN. Note that the row cannot contain any numeric elements in this case.
  • Likewise, if noise variance data is not available for some output component (a column in y), the corresponding column in outputNoiseVariance should be filled with NaN values and cannot contain any numeric elements.
  • If some element in y is NaN (this is valid when GTApprox/OutputNanMode is set to "ignore" or "predict"), the corresponding element in outputNoiseVariance should be NaN. A numeric noise value in this case is not an error, but it will be ignored by GTApprox.

Changed in version 1.9.5: added the weights parameter.

Changed in version 5.0: added weights support to the LR, RSM, HDA, GP, SGP, HDAGP, and MoA techniques (previously was available in the iTA technique only).

Changed in version 5.0: point weight is no longer limited to range \([0, 1]\) and can be an arbitrary non-negative floating point value or infinity.

Changed in version 5.2: infinite weights are no longer allowed for numerical stability.

A number of GTApprox techniques support sample point weighting. Roughly, a point’s weight is a relative confidence characteristic that affects the model’s fit to the training sample. The model tries to fit points with greater weights better, possibly at the cost of decreased accuracy at points with lesser weights. Points with zero weight may be ignored completely when fitting the model.

Point weighting is supported in the following techniques:

  • Response Surface Model (RSM).
  • High Dimensional Approximation (HDA).
  • Gaussian Processes (GP).
  • Sparse Gaussian Processes (SGP).
  • High Dimensional Approximation + Gaussian Processes (HDAGP).
  • incomplete Tensor Approximation (iTA).
  • Mixture of Approximators (MoA).

That is, to use point weights meaningfully, one of the techniques above has to be selected using GTApprox/Technique in addition to specifying weights. If any other technique is selected, either manually or automatically, weights are ignored (but see the next note).

Note

Point weighting is not compatible with GTApprox/ExactFitRequired. If weights is not None and GTApprox/ExactFitRequired is on, build() raises an InvalidOptionsError exception.

Point weight is an arbitrary non-negative float value. This value has no specific meaning; it simply notes the relative “importance” of a point compared to other points in the training sample.

The weights argument should be a 1D array of point weights, and its length has to be equal to the number of training sample points.
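
For example, a minimal sketch that up-weights a single point (the sample data is illustrative):

    import numpy as np
    from da.p7core import gtapprox

    x = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
    y = x ** 2
    weights = np.ones(len(x))   # one weight per training point
    weights[-1] = 10.0          # ask the model to fit the last point better

    model = gtapprox.Builder().build(x, y,
                                     options={"GTApprox/Technique": "RSM"},
                                     weights=weights)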

Note

At least one weight has to be non-zero. If weights contains only zero values, build() raises an InvalidProblemError exception.

Note

Point weighting is not compatible with output noise variance. If both outputNoiseVariance and weights are specified, build() raises an InvalidProblemError exception. This holds even if you select a technique that does not support output noise variance or point weighting and would normally ignore these arguments.

Changed in version 5.3: added the incremental training (model update) support for GBRT models.

Changed in version 6.14: added the initial HDA model support for the HDAGP technique.

Changed in version 6.15.1: added the initial model support for the MoA technique.

Changed in version 6.25: added the incremental training (model update) support for TBL models.

Changed in version 6.47: added the incremental training (model update) support for GP models.

Changed in version 6.47: if you specify initial_model and manually select a technique that does not support model update, build() raises an InapplicableTechniqueException; previous versions could ignore the initial model in such cases.

A GP model can be updated with new data by specifying the existing GP model as initial_model and either selecting the GP technique manually or enabling the automatic technique selection (GTApprox/Technique set to "Auto", default). In this case, the resulting model is also a GP model.

A GBRT model, similarly, can be updated with new data by specifying it as initial_model and selecting the GBRT technique or enabling the automatic technique selection.
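
For example, a minimal GP update sketch (the batch arrays are placeholders):

    builder = gtapprox.Builder()

    # Train the initial GP model on the first batch of data.
    model = builder.build(x_batch1, y_batch1,
                          options={"GTApprox/Technique": "GP"})

    # Update it with the next batch; the result is again a GP model.
    model = builder.build(x_batch2, y_batch2,
                          options={"GTApprox/Technique": "GP"},
                          initial_model=model)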

When you use the HDAGP technique, you can add an existing HDA model as initial_model to use it as a trend, which provides noticeable savings in training time. Note that the training sample must be the same sample that was used to train the HDA model; otherwise, the new HDAGP model will be inaccurate and incorrect. The intent in this case is to speed up HDAGP model training by skipping the initial step of training a trend model internally.

The MoA technique can use a model trained by any technique as the initial one. MoA can improve model accuracy, update the model with new data, or do both. See section Initial Model for more information.

The TBL technique can use an existing TBL model as the initial one. This technique simply updates the model’s internal table with new input-output pairs from the training sample.

Other techniques do not support initial models and raise an exception if explicitly selected — for example, if you set GTApprox/Technique to "RSM" and specify initial_model, build() raises an InapplicableTechniqueException.

The MoA technique does not impose any specific limitations on initial models. For GBRT, GP, HDAGP, and TBL, if the initial_model does not match the selected technique, build() raises an exception — for example, if you specify the HDAGP technique but initial_model is not an HDA model. Also note the following limitations:

  • If you have trained a GBRT or HDA model with output transformation enabled, and you are using that model as an initial one, you must set the GTApprox/OutputTransformation option when updating the model, as explained in that option description.
  • When updating a GP model, you must get the GTApprox/GPType and GTApprox/GPPower option values from the initial model details and set those options to the same values in build(). Additionally, the GTApprox/GPInteractionCardinality option must be set to [] or to the value from the initial model.
  • Model update is not supported for GP models with the following features:
    • Models trained with heteroscedastic noise processing (GTApprox/Heteroscedastic set to True).
    • Models with categorical inputs.
  • GP model update is not compatible with point weighting: if initial_model is a GP model, and you specify weights, build() raises an exception.

Changed in version 6.14: added the annotations, x_meta, and y_meta parameters.

Changed in version 6.16: the x_meta parameter can specify input constraints.

Changed in version 6.17: the y_meta parameter can specify output thresholds.

Changed in version 6.17: training reuses the metainformation from an initial model.

The annotations dictionary adds optional notes or extended comments to the model. It can contain any number of notes; all keys and values must be strings. The x_meta and y_meta parameters provide additional details on model inputs and outputs (constraints, names, descriptions, and other details) — see Model Metainformation for details. Note that if you use an initial model that already contains metainformation, that metainformation is copied to the trained model. In this case, x_meta and y_meta can be used to edit metainformation: information specified in x_meta and y_meta overwrites the initial metainformation, while information not specified in these arguments is copied from the initial metainformation.
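
For example, a sketch of passing metainformation to build(). The metainformation keys follow the format described in section Model Metainformation, and all names here are illustrative:

    model = gtapprox.Builder().build(
        x, y,
        comment="demo model",
        annotations={"project": "wing design", "author": "jdoe"},
        x_meta=[{"name": "alpha", "description": "angle of attack"},
                {"name": "mach"}],
        y_meta=[{"name": "cl", "description": "lift coefficient"}],
    )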

build_smart(x, y, options=None, outputNoiseVariance=None, comment=None, weights=None, initial_model=None, hints=None, x_test=None, y_test=None, annotations=None, x_meta=None, y_meta=None)

Train an approximation model using smart training.

Parameters:
  • x (array-like, 1D or 2D) – training sample, input part (values of variables)
  • y (array-like, 1D or 2D) – training sample, response part (function values)
  • options (dict) – option settings which will be set fixed during parameter search
  • outputNoiseVariance (array-like, 1D or 2D) – optional y noise variance
  • comment (str) – text comment
  • weights (array-like, 1D) – training sample point weights
  • initial_model (Model) – initial model for incremental training
  • hints (dict) – user-provided hints on the data behaviour and desirable model properties
  • x_test (array-like, 1D or 2D) – testing sample, input part (values of variables)
  • y_test (array-like, 1D or 2D) – testing sample, response part (function values)
  • annotations (dict) – extended comment and notes
  • x_meta (list) – descriptions of inputs
  • y_meta (list) – descriptions of outputs
Returns:

trained model

Return type:

Model

New in version 6.6.

Changed in version 6.14: added the annotations, x_meta, and y_meta parameters.

Changed in version 6.25: pandas.DataFrame and pandas.Series are supported as the x, y training samples.

Train a model with x and y as the training sample using the smart training procedure. Arguments are the same as in build(), with three additional parameters: hints, x_test, and y_test.

  • hints: additional information about the data set or requirements to the model, and optional smart training settings. See section Hint Reference for details.
  • x_test and y_test: test samples which can be used to control model quality during training.

See section Smart Training for details on smart training.
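
For example, a minimal smart training sketch. The hint name and value are examples only; see Hint Reference for valid hints:

    model = gtapprox.Builder().build_smart(
        x, y,
        hints={"@GTApprox/Accelerator": 3},  # example hint: trade accuracy for speed
        x_test=x_test, y_test=y_test,        # optional test sample to control quality
    )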

license

Builder license.

Type:License

General license information interface. See section License Usage for details.

options

Builder options.

Type:Options

General options interface for the builder. See section Options Interface for usage and the GTApprox option reference.

static postprocess(model, train_x, train_y, hints={}, test_x=None, test_y=None)

Deprecated since version 6.6: it is recommended to use smart model training instead, see build_smart().

This method is deprecated since version 6.6 which added smart model training functionality to GTApprox. Pre- and post-processing features in GTApprox were completely removed in version 6.8, since smart model training is more convenient and provides better results. See build_smart() and section Smart Training for details.

static preprocess(train_x, train_y, hints={})

Deprecated since version 6.6: it is recommended to use smart model training instead, see build_smart().

This method is deprecated since version 6.6 which added smart model training functionality to GTApprox. Pre- and post-processing features in GTApprox were completely removed in version 6.8, since smart model training is more convenient and provides better results. See build_smart() and section Smart Training for details.

set_logger(logger)

Set logger.

Parameters:logger – logger object
Returns:None

Used to set up a logger for the build process. See section Loggers for details.
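
For example, a sketch assuming the da.p7core.loggers module provides a StdoutLogger class (see section Loggers):

    from da.p7core import gtapprox, loggers

    builder = gtapprox.Builder()
    builder.set_logger(loggers.StdoutLogger())  # print the training log to stdout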

set_watcher(watcher)

Set watcher.

Parameters:watcher – watcher object
Returns:None

Used to set up a watcher for the build process. See section Watchers for details.

11.4.2. ExportedFormat — model export formats

class da.p7core.gtapprox.ExportedFormat

Enumerates available export formats.

New in version 6.10: added str aliases for export formats.

Changed in version 6.16: added the C# source format, see CSHARP_SOURCE.

Changed in version 6.16.1: C# source export is supported for all GTApprox models but is not yet supported for GTDF models loaded to gtapprox.Model.

In export_to() you can specify format in two ways:

  1. Using enumeration, for example: my_model.export_to(gtapprox.ExportedFormat.C99_PROGRAM, "func_name", "comment", "my_model.c").
  2. Using str alias (added in 6.10), for example: my_model.export_to("c_program", "func_name", "comment", "my_model.c").
OCTAVE

Octave format.

Alias: "octave".

OCTAVE_MEX

C source for a MEX file.

Aliases: "octave_mex", "mex".

C99_PROGRAM

C source with the main() function for a complete command-line based C program.

Aliases: "c99_program", "c_program", "program".

C99_HEADER

C header of the model.

Aliases: "c99_header", "c_header", "header".

C99_SOURCE

C header and implementation of the model.

Aliases: "c99_source", "c_source", "c".

EXCEL_DLL

C implementation of the model intended for creating a DLL compatible with Microsoft Excel.

Aliases: "excel_dll", "excel".

CSHARP_SOURCE

New in version 6.16.

C# implementation of the model.

Alias: "c#".

Note

The C# source export is not yet supported for GTDF models loaded to gtapprox.Model.

Note

The C# source export requires an up-to-date license valid for pSeven Core 6.16 and above.

11.4.3. GradMatrixOrder — model gradients order

class da.p7core.gtapprox.GradMatrixOrder

Enumerates available gradient output modes.

F_MAJOR

Indexed in function-major order (\(grad_{ij} = \frac{df_i}{dx_j}\)).

X_MAJOR

Indexed in variable-major order (\(grad_{ij} = \frac{df_j}{dx_i}\)).

11.4.4. Model — approximation model

class da.p7core.gtapprox.Model(file=None, **kwargs)

Approximation model.

Can be created by Builder or loaded from a file via the Model constructor.

Changed in version 6.16: the file to load may also be a GTDF model saved with gtdf.Model.save(). Note that loading a GTDF model converts it into a GTApprox model, but the backward conversion is not supported.

Model objects are immutable. All methods which are meant to change the model return a new Model instance.

annotations

Extended comment or supplementary information.

Type:dict

New in version 6.6.

The annotations dictionary can optionally contain any number of notes. All dictionary keys and values are strings. You can add annotations when training a model and edit them using modify().

See also Model Metainformation.

static available_sections(**kwargs)

Get a list of available model sections.

Parameters:
  • file (file or str) – file object or path to load model from
  • string (str) – serialized model
  • model (Model) – model object
Returns:

available model sections

Return type:

list

New in version 6.11.

Returns a list of strings specifying which sections can be loaded from the model:

  • "model": main model section, required for model evaluation and smoothing methods.
  • "info": model information, required for info.
  • "comment": comment section, required for comment.
  • "annotations": annotations section, required for annotations.
  • "training_sample": a copy of training sample data, required for training_sample.
  • "iv_info": internal validation data, required for iv_info.
  • "build_log": model training log, required for build_log.

See Approximation Model Structure for details.

build_log

Model building log.

Type:str
calc(point)

Evaluate the model.

Parameters:point (float or array-like, 2D or 1D) – the sample or point to evaluate
Returns:model values
Return type:pandas.DataFrame or pandas.Series if point is a pandas type; otherwise ndarray, 2D or 1D

Changed in version 1.9.0: smoothness parameter is no longer supported (see Version Compatibility Issues).

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 6.22: returns ndarray with dtype=object if the model has string categorical outputs.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.

Evaluates a data sample or a single point. In general form, point is a 2D array-like (a data sample). Several simplified argument forms are also supported.

The returned array is 2D if point is a sample, and 1D if point is a single point. When point is a pandas.DataFrame or pandas.Series, the returned array keeps indexing of the point array.

  • In the case of 1D model input, a single float value is interpreted as a single point. A 1D array-like with a single element is also one point; other 1D array-likes are interpreted as a sample. A 2D array-like is always interpreted as a sample, even if it contains a single point actually. For example:

    model_1d.calc(0.0)             # a 1D point
    model_1d.calc([0.0])           # a 1D point
    model_1d.calc([[0.0]])         # a sample, one 1D point
    model_1d.calc([0.0, 1.0])      # a sample, two 1D points
    model_1d.calc([[0.0], [1.0]])  # a sample, two 1D points
    model_1d.calc([[0.0, 1.0]])    # incorrect: a sample with a single 2D point (model input is 1D)
    
  • If model input is multidimensional, a 1D array-like is interpreted as a single point, and 2D array-likes are interpreted as data samples. For example, if model input is 2D:

    model_2d.calc(0.0)                       # incorrect: point is 1D
    model_2d.calc([0.0])                     # incorrect: point is 1D
    model_2d.calc([[0.0]])                   # incorrect: sample contains one 1D point
    model_2d.calc([0.0, 0.0])                # a 2D point
    model_2d.calc([[0.0, 0.0]])              # a sample, one 2D point
    model_2d.calc([[0.0, 0.0], [1.0, 1.0]])  # a sample, two 2D points
    
calc_ae(point)

Calculate the accuracy evaluation estimate.

Parameters:point (float or array-like, 2D or 1D) – the sample or point to evaluate
Returns:estimates
Return type:pandas.DataFrame or pandas.Series if point is a pandas type; otherwise ndarray, 2D or 1D
Raise:FeatureNotAvailableError if the model does not provide accuracy evaluation

Changed in version 1.9.0: smoothness parameter is no longer supported (see Version Compatibility Issues).

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.

Check has_ae before using this method. It is available only if the model was trained with GTApprox/AccuracyEvaluation on.

Performs accuracy evaluation for a data sample or a single point. In general form, point is a 2D array-like (a data sample). Several simplified argument forms are also supported, similar to calc().

The returned array is 2D if point is a sample, and 1D if point is a single point. When point is a pandas.DataFrame or pandas.Series, the returned array keeps indexing of the point array.
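
For example, a minimal sketch combining the availability check with evaluation (x_new is a placeholder sample):

    if model.has_ae:
        estimates = model.calc_ae(x_new)  # same argument forms as calc()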

comment

Text comment to the model.

Type:str

New in version 6.6.

Optional plain text comment to the model. You can add the comment when training a model and edit it using modify().

See also Model Metainformation.

details

Detailed model information.

Type:dict

New in version 5.2.

A detailed description of the model. Includes model metainformation, accuracy data, training sample statistics, regression coefficients for RSM models, and other data.

See sections Model Details and Model Metainformation.

export_to(format, function, description, file, single_file=None)

Export the model to a source file in specified format.

Parameters:
  • format (ExportedFormat or str) – source code format
  • function (str) – exported function name
  • description (str) – additional comment
  • file (file-like, str, zipfile.ZipFile, tarfile.TarFile) – export file or path
  • single_file (bool) – export sources as a single file (default) or multiple files (False)
Returns:

None

Raise:

GTException if function is empty and format is not C99_PROGRAM

New in version 6.10: added str aliases for export formats.

Changed in version 6.24: added the support for exporting sources to archives; added the single_file parameter.

Generates the model source code in the specified format and saves it to a file or a set of source files. Supports packing the source code into various archive formats.

The source code format can be specified using an enumeration or a string alias — see ExportedFormat for details.

By default, all source code is exported to a single file. This mode is not recommended for large models, since large source files can cause problems during compilation. To split the source code into multiple files, set single_file to False. In this case, the filename from file serves as a basename, and additional source files have names with an added suffix. In the multi-file mode, all exported files are required to compile the model.

To pack source files into an archive, you can pass a zipfile.ZipFile or tarfile.TarFile object as file, or specify a path to a file with an archive-type extension. Recognized extensions are: .zip, .tar, .tgz, .tar.gz, .taz, .tbz, .tbz2, .tar.bz2.

The function argument is optional if format is C99_PROGRAM. For other source code formats, an empty function name raises an exception.

For the C# source format (CSHARP_SOURCE), the function argument sets the name of the model class and its namespace. There are two ways to use it:

  • If you specify a name without dots ., it becomes the namespace, and the class name remains default (Model). For example, if function is “myGTAmodel”:

    namespace myGTAmodel {
      public sealed class Model {
        // attributes and methods
      }
    }
    
  • If you specify a name with dots ., it is split by dots and the last part becomes the class name, while the remaining parts become a namespace hierarchy. For example, if function is “ns1.ns2.MyExportedModel”:

    namespace ns1 {
      namespace ns2 {
        public sealed class MyExportedModel {
          // attributes and methods
        }
      }
    }
    

The description provides an additional comment, which is added on top of the generated source file.

See also the Model Export example.
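
For example, two sketch calls (the model object and all names are illustrative):

    from da.p7core import gtapprox

    # Single-file C source export.
    model.export_to(gtapprox.ExportedFormat.C99_SOURCE,
                    "my_func", "demo export", "my_model.c")

    # Multi-file export packed into a zip archive.
    model.export_to("c_source", "my_func", "demo export",
                    "my_model.zip", single_file=False)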

fromstring(modelString, sections='all')

Deserialize a model from string.

Parameters:
  • modelString (str) – serialized model
  • sections (list or str) – model sections to load
Returns:

None

Changed in version 6.6: added the sections argument.

A model can be loaded (deserialized) partially, omitting certain sections to reduce memory usage. Note that availability of Model methods and attributes depends on which sections are loaded. This dependency is described in more detail in section Approximation Model Structure.

The sections argument can be a string or a list of strings specifying which sections to load:

  • "all": all sections (default).
  • "none": minimum model information, does not load any other section (the minimum load).
  • "model": main model section, required for model evaluation and smoothing methods.
  • "info": model information, required for info.
  • "comment": comment section, required for comment.
  • "annotations": annotations section, required for annotations.
  • "training_sample": a copy of training sample data, required for training_sample.
  • "iv_info": internal validation data, required for iv_info.
  • "build_log": model training log, required for build_log.

To get a list of sections available for load, use available_sections().
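
For example, a sketch of partial deserialization:

    blob = model.tostring()  # serialize the full model to a string

    lite = gtapprox.Model()
    lite.fromstring(blob, sections=["model", "info"])  # skip the heavy sections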

grad(point, order=0)

Evaluate model gradient.

Parameters:
  • point (float or array-like, 2D or 1D) – the sample or point to evaluate
  • order (GradMatrixOrder) – gradient matrix order
Returns:

model gradients

Return type:

pandas.DataFrame if point is a pandas type; otherwise ndarray, 3D or 2D

Changed in version 1.9.0: smoothness parameter is no longer supported (see Version Compatibility Issues).

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.

Evaluates model gradients for a data sample or a single point. In general form, point is a 2D array-like (a data sample). Several simplified argument forms are also supported, similar to calc().

The returned array is 3D if point is a sample, and 2D if point is a single point.

When using pandas data samples (point is a pandas.DataFrame), a 3D array in return value is represented by a pandas.DataFrame with multi-indexing (pandas.MultiIndex). In this case, the first element of the multi-index is the point index from the input sample. The second element of the multi-index is:

  • the index or name of a model’s output, if order is F_MAJOR (default)
  • the index or name of a model’s input, if order is X_MAJOR

When point is a pandas.Series, its index becomes the row index of the returned pandas.DataFrame.

grad_ae(point, order=0)

Calculate gradients of the accuracy evaluation function.

Parameters:
  • point (float or array-like, 2D or 1D) – the sample or point to evaluate
  • order (GradMatrixOrder) – gradient matrix order
Returns:

accuracy evaluation gradients

Return type:

pandas.DataFrame if point is a pandas type; otherwise ndarray, 3D or 2D

Raise:

FeatureNotAvailableError if the model does not provide accuracy evaluation

Changed in version 1.9.0: the smoothness argument is no longer supported (see Version Compatibility Issues).

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.

Check has_ae before using this method. It is available only if the model was trained with GTApprox/AccuracyEvaluation on.

Evaluates gradients of the accuracy evaluation function for a data sample or a single point. In general form, point is a 2D array (a data sample). Several simplified argument forms are also supported, similar to calc().

The returned array is 3D if point is a sample, and 2D if point is a single point.

When using pandas data samples (point is a pandas.DataFrame), a 3D array in return value is represented by a pandas.DataFrame with multi-indexing (pandas.MultiIndex). In this case, the first element of the multi-index is the point index from the input sample. The second element of the multi-index is:

  • the index or name of a model’s output, if order is F_MAJOR (default)
  • the index or name of a model’s input, if order is X_MAJOR

When point is a pandas.Series, its index becomes the row index of the returned pandas.DataFrame.

has_ae

Accuracy evaluation support.

Type:bool

Check this attribute before using calc_ae() or grad_ae(). If True, the model supports accuracy evaluation. If False, then accuracy evaluation is not available, and the methods above raise an exception.

has_ironing

Deprecated since version 1.9.0: in older versions this attribute was used to check if the model has already been smoothed using the ironing() method. It was replaced with is_smoothed following the replacement of ironing() with the advanced smoothing methods (see smooth(), smooth_anisotropic(), and smooth_errbased()).

has_smoothing

Smoothing support.

Type:bool

New in version 1.9.0.

Check this attribute before using smooth(), smooth_anisotropic(), or smooth_errbased(). If True, the model supports smoothing. If False, then smoothing is not available, and smoothing methods raise an exception.

has_smoothness

Deprecated since version 1.9.0: in older versions this attribute was used to check if the model supports dynamic smoothing (see section Version Compatibility Issues for details). It was replaced with has_smoothing following the replacement of the ironing() method with the advanced smoothing methods (see smooth(), smooth_anisotropic(), and smooth_errbased()).

info

Model description.

Type:dict

Contains all technical information which can be gathered from the model.

ironing(smoothness)

Deprecated since version 1.9.0: this method had been replaced by the advanced smoothing methods smooth(), smooth_anisotropic(), and smooth_errbased(). See section Version Compatibility Issues for details.

is_smoothed

Smoothed model.

Type:bool

New in version 1.9.0.

Check this attribute to see if the model is already smoothed. It is True for models returned by smooth(), smooth_errbased(), and smooth_anisotropic() methods, and False for other models.

iv_info

Internal validation results.

Type:dict

New in version 2.0 Release Candidate 1.

Changed in version 2.0 Release Candidate 2: also stores raw validation data.

A dictionary containing error values calculated during internal validation. Has the same structure as the details["Training Dataset"]["Accuracy"] dictionary in details — see section Accuracy in Model Details for a full description.

Additionally, if the model was trained with GTApprox/IVSavePredictions on, iv_info also contains raw validation data: model values calculated during internal validation, reference inputs, and reference outputs. This data is stored under the "Dataset" key.

If internal validation was not required when training the model (see GTApprox/InternalValidation), iv_info is an empty dictionary.
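
For example, a sketch of reading internal validation results. The key names here are illustrative, following the Accuracy structure described in Model Details:

    model = gtapprox.Builder().build(x, y,
                                     options={"GTApprox/InternalValidation": True})
    if model.iv_info:
        print(model.iv_info["Componentwise"]["RRMS"])  # illustrative key path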

license

Model license.

Type:License

General license information interface. See section License Usage for details.

load(file, sections='all')

Load a model from file.

Parameters:
  • file (file or str) – file object or path
  • sections (list or str) – model sections to load
Returns:

None

Changed in version 6.6: added the sections argument.

Deprecated since version 6.29: use Model constructor instead.

A model can be loaded partially, omitting certain sections to reduce memory usage and load time. Note that availability of Model methods and attributes depends on which sections are loaded. This dependency is described in more detail in section Approximation Model Structure.

The sections argument can be a string or a list of strings specifying which sections to load:

  • "all": all sections (default).
  • "none": minimum model information, does not load any other section (the minimum load).
  • "model": main model section, required for model evaluation and smoothing methods.
  • "info": model information, required for info.
  • "comment": comment section, required for comment.
  • "annotations": annotations section, required for annotations.
  • "training_sample": a copy of training sample data, required for training_sample.
  • "iv_info": internal validation data, required for iv_info.
  • "build_log": model training log, required for build_log.

To get a list of sections available for load, use available_sections().

modify()

Create a copy of the model with modified features or metainformation.

Parameters:
  • comment (str) – new comment
  • annotations (dict) – new annotations
  • x_meta (list) – descriptions of inputs
  • y_meta (list) – descriptions of outputs
  • strip (list or str) – optional list of features to strip from the model
Returns:

copy of this model with modifications

Return type:

Model

New in version 6.6.

Changed in version 6.14: can edit descriptions of inputs and outputs.

Changed in version 6.14.3: can remove the accuracy evaluation and smoothing features.

Changed in version 6.17: can disable the model output thresholds.

This method is intended to edit model annotations, comment, metainformation, and can be used to reduce model size by removing certain features. If a parameter is None, corresponding information in the modified model remains unchanged. If you specify a parameter, corresponding information in the modified model is fully replaced.

The x_meta and y_meta parameters that edit metainformation are similar to build() and are described in section Model Metainformation — however note that specifying any new input constraints or output thresholds in x_meta or y_meta does not change the effective (current) model constraints: changes in x_meta and y_meta only apply to model information stored in details. For example, if you set a new, more restrictive input constraint in x_meta in modify(), the model will still evaluate outputs for any input that is within the range previously set by x_meta in build(). Generally, it is not recommended to edit the model constraints information with modify() to avoid confusion.

The strip argument can be used to remove accuracy evaluation (AE) and smoothing features from the model. It can be a string or a list of strings specifying which features to remove:

  • "ae" — remove accuracy evaluation.
  • "smoothing" — remove smoothing.
  • "output_bounds" — disable the output bounds (thresholds), which were previously set with the y_meta parameter when training the model or using modify().

Removing AE may be useful for models trained with the GP, HDAGP, SGP, or TGP techniques (other techniques do not support AE). It reduces the size of the main model section (see Approximation Model Structure), thus decreasing the model size in memory. It also significantly reduces the volume of the C code generated by export_to(). The has_ae property of the modified model will be False.

Removing the smoothing feature reduces the size of the main model section only. It decreases the model size, but not the volume of exported code. The size reduction is most noticeable for models trained with the RSM and HDA techniques (up to 10 times for HDA). If the model was smoothed before modify(), the modified model remains smoothed. However, smoothing methods will no longer be available from the modified model (has_smoothing will be False).

Note that modify() returns a new modified model, which is identical to the original except for your modifications.

See also Model Metainformation.
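
For example, a minimal sketch:

    # Strip accuracy evaluation and smoothing data to reduce model size.
    small_model = model.modify(strip=["ae", "smoothing"])

    # Replace the comment; everything else is copied unchanged.
    renamed = model.modify(comment="validated on the May dataset")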

save(file, sections='all')

Save the model to file.

Parameters:
  • file (file or str) – file object or path
  • sections (list or str) – model sections to save
Returns:

None

Changed in version 6.6: sections argument added.

When saving, certain sections of the model can be skipped to reduce the model file size (see Approximation Model Structure for details). The sections argument can be a string or a list of strings specifying which sections to save:

  • "all": all sections (default).
  • "model": main model section, required for model evaluation. This section is always saved even if not specified. For some models, the size of this section can be additionally reduced by removing the accuracy evaluation or smoothing information with modify().
  • "info": model information, info.
  • "comment": comment section, comment.
  • "annotations": annotations section, annotations.
  • "training_sample": a copy of training sample data, training_sample.
  • "iv_info": internal validation data, iv_info.
  • "build_log": model training log, build_log.

Note that the main model section is always saved, so sections="model" and sections=[] are equivalent.

save_to_octave(function, file)

Deprecated since version 1.8.0: use export_to() instead.

Since version 3.0 Release Candidate 1, this method no longer exists and is completely replaced by export_to().

shap_value(point, data=None, interactions=False, approximate=False, shap_compatible=True)

Compute SHAP (SHapley Additive exPlanations) values.

Parameters:
  • point (float or array-like, 1D or 2D) – a point or sample to evaluate
  • data (float or array-like, 1D or 2D) – optional background data sample
  • interactions (bool) – if True, evaluate pairwise interactions (supported by GBRT models only)
  • approximate (bool) – if True, compute approximate SHAP values (fast but less accurate)
  • shap_compatible (bool) – if True, return shap.Explanation (requires shap)
Returns:

explanations

Return type:

shap.Explanation or tuple (elements depend on the point type)

New in version 6.20.

Evaluates SHAP, using an optimized internal implementation when possible. The following models support the internal method and do not require the shap module if you set shap_compatible to False:

  • All models trained with the GBRT technique.
  • All differentiable models — that is, all models without categorical variables.

Other models use shap.PermutationExplainer and require shap.

The point syntax is the same as in calc(): general form is a 2D array, and several simplified forms are supported. When shap_compatible is False, the return value is a pair (tuple) where elements depend on the point type:

  • If point is a single point, the return pair is a scalar base value and an ndarray — 1D or 2D, depending on interactions.
  • If point is a sample, the return pair is a list of base values for each output and an ndarray — 2D or 3D, also depending on interactions. In this case, a base value for an output is the average of this output over the training dataset.

Array structure in results is:

  • If interactions is False (default), resulting SHAP values form an \(n \times m\) matrix, where \(n\) is the number of points in point, and \(m\) is the model’s input dimension. Each matrix row contains the contributions of model inputs that push the model output away from the base value.
  • If interactions is True, contributions for each input point form an \(m \times m\) matrix, where main effects are on the diagonal and interaction effects are off-diagonal. Resulting SHAP values form an \(n \times m \times m\) array. Note that only GBRT models support pairwise interactions.

For convenience, if you have shap installed, set shap_compatible to True to return a shap.Explanation object.

GBRT models estimate SHAP values by a fast and exact method for tree models and ensembles of trees. Differentiable models (without categorical variables) approximate SHAP values using expected gradients (Sundararajan et al. 2017) — an extension of integrated gradients, a feature attribution method designed for differentiable models based on an extension of Shapley values to infinite player games (Aumann-Shapley values).
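
For example, a sketch of the tuple return form (x_sample is a placeholder 2D sample):

    base_values, shap_values = model.shap_value(x_sample, shap_compatible=False)
    # With interactions=False, shap_values has one row per point
    # and one column per model input.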

size_f

Model output dimension.

Type:long
size_x

Model input dimension.

Type:long
smooth(f_smoothness)

Apply smoothing to model.

Parameters:f_smoothness (float or array-like, 1D) – output smoothing factors
Returns:smoothed model
Return type:Model
Raise:GTException if the model does not support smoothing

New in version 1.9.0.

Check has_smoothing before using this method.

This method creates and returns a new smoothed model. The amount of smoothing is specified by the f_smoothness argument. Details on model smoothing can be found in section Model Smoothing.
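
For example, a minimal sketch:

    if model.has_smoothing:
        smoothed = model.smooth(0.5)  # one smoothing factor for all outputs
        assert smoothed.is_smoothed   # the original model stays unchanged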

smooth_anisotropic(f_smoothness, x_weights)

Apply anisotropic smoothing to model.

Parameters:
  • f_smoothness (float or array-like, 1D) – output smoothing factors
  • x_weights (array-like, 1D or 2D) – the amount of smoothing by different input components
Returns:

smoothed model

Return type:

Model

Raise:

GTException if the model does not support smoothing

New in version 1.9.0.

Check has_smoothing before using this method.

This method extends the simple smoothing functionality (see smooth()) by allowing anisotropic smoothing: x_weights specify relative smoothing by different components of the input.

Details on anisotropic smoothing can be found in section Anisotropic Smoothing.

smooth_errbased(x_sample, f_sample, error_type, error_thresholds, x_weights=None)

Apply error-based smoothing to the model, controlling model errors over a reference inputs-responses array.

Parameters:
  • x_sample (float or array-like, 1D or 2D) – reference inputs
  • f_sample (float or array-like, 1D or 2D) – reference responses
  • error_type (str or list[str]) – error types to calculate
  • error_thresholds (float or array-like, 1D) – error thresholds
  • x_weights (array-like, 1D or 2D) – the amount of smoothing for different input components
Returns:

smoothed model

Return type:

Model

Raise:

GTException if the model does not support smoothing

New in version 1.9.0.

Check has_smoothing before using this method.

This method creates and returns a model which has maximum smoothness while keeping the approximation errors of the model below the specified thresholds.

Details on error-based smoothing can be found in section Error-Based Smoothing.

tostring(sections='all')

Serialize the model.

Parameters:sections (list or str) – model sections to save
Returns:serialized model
Return type:str

Changed in version 6.6: sections argument added.

When serializing, certain sections of the model can be skipped to reduce the model size (see Approximation Model Structure for details). The sections argument can be a string or a list of strings specifying which sections to include:

  • "all": all sections (default).
  • "model": main model section, required for model evaluation. This section is always saved even if not specified. For some models, the size of this section can be additionally reduced by removing the accuracy evaluation or smoothing information with modify().
  • "info": model information, info.
  • "comment": comment section, comment.
  • "annotations": annotations section, annotations.
  • "training_sample": a copy of training sample data, training_sample.
  • "iv_info": internal validation data, iv_info.
  • "build_log": model training log, build_log.

Note that the main model section is always included, so sections="model" and sections=[] are equivalent.

training_sample

Model training sample optionally stored with the model.

Type:list

New in version 6.6.

If GTApprox/StoreTrainingSample was enabled when training the model, this attribute contains a copy of training data. Otherwise it will be an empty list.

Training data is a single dict stored as the only element of the list. This dictionary has the following keys:

  • "x" — the input part of the training sample (values of variables).
  • "f" — the response part of the training sample (function values).
  • "tol" — response noise variance. This key is present only if output noise variance was specified when training.
  • "weights" — sample point weights. This key is present only if point weights were specified when training.
  • "x_test" — the input part of the test sample (added in 6.8). This key is present only if a test sample was used when training.
  • "f_test" — the response part of the test sample (added in 6.8). This key is present only if a test sample was used when training.

Note that in the case of GBRT incremental training (see Incremental Training), only the last (most recent) training sample can be saved.

Note

Training sample data is stored in lightweight NumPy arrays that have a limited lifetime, which cannot exceed the lifetime of the model object. This means you should avoid assigning these arrays to new variables. Either use them directly, or, if you want to read this data without keeping the model object, create copies of the arrays: train_x = my_model.training_sample[0]["x"].copy().

validate(pointsX, pointsY, weights=None)

Validate the model using a reference inputs-responses array.

Parameters:
  • pointsX (float or array-like, 1D or 2D) – reference inputs
  • pointsY (float or array-like, 1D or 2D) – reference responses
  • weights (array-like, 1D) – optional weights of the reference points
Returns:

accuracy data

Return type:

dict

Validates the model against the reference array, evaluating model responses to pointsX and comparing them to pointsY.

Generally, pointsX and pointsY should be 2D arrays. Several simplified argument forms are also supported, similar to calc().

Returns a dictionary containing lists of error values calculated componentwise, with names of errors as keys. The returned dictionary has the same structure as the details["Training Dataset"]["Accuracy"]["Componentwise"] dictionary in details — see section Accuracy in Model Details for a full description.
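
For example, a sketch (the error-name key is illustrative; see Model Details for actual keys):

    errors = model.validate(x_test, y_test)
    print(errors["RRMS"])  # a list of componentwise error values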

11.4.5. Utilities — auxiliary functions

class da.p7core.gtapprox.Utilities

Utility functions.

static checkTensorStructure(trainPoints, userDefinedFactors=())

Check if the source data has proper structure so the Tensor Approximation technique may be used.

Parameters:
  • trainPoints (array-like, 2D) – training sample, input part (values of variables)
  • userDefinedFactors (list) – optional list of proposed tensor factors
Returns:

check result and (if no user-defined factors are given) calculated tensor factors, as a tuple

Return type:

tuple(bool, list[list])

The Tensor Approximation technique requires a specific design of experiment type (so-called gridded data). This function may be used to check whether the sample data structure allows TA usage. The user supplies the training sample and, optionally, a list of proposed tensor factors. The return value is a tuple of a Boolean check result (True means the sample is TA-compatible) and a list of tensor factors, which are either user-defined or calculated automatically if userDefinedFactors is an empty list.
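
For example, a sketch (the training arrays are placeholders):

    is_gridded, factors = gtapprox.Utilities.checkTensorStructure(x_train)
    if is_gridded:
        model = gtapprox.Builder().build(x_train, y_train,
                                         options={"GTApprox/Technique": "TA"})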

11.4.6. Functions

da.p7core.gtapprox.export_fmi_20(model, file, id=None, der_outputs=False, meta=None, inputs_meta=None, outputs_meta=None, compilers=None, single_file=None)

Export the model to a Functional Mock-up Unit for Model Exchange and Co-Simulation 2.0.

Parameters:
  • model (Model) – exported model
  • file (file or str) – file object or path where to export
  • id (str) – a string used in model and function names
  • der_outputs (bool) – if True, include partial derivatives of model outputs in the list of FMI model outputs
  • meta (dict) – model information
  • inputs_meta (list) – input variable information
  • outputs_meta (list) – output variable information
  • compilers (dict) – compiler settings to export an FMU with binary
  • single_file (bool) – pass sources to compilers as a single file (default) or multiple files (False)
Returns:

description of model variables

Return type:

list

New in version 6.31.

According to the FMI standard, in an FMU with source code the same string (id) is used as the model name and as a prefix in function names in the model code. If file is a path, omit id as it will be generated from the filename in the path. In this case, the filename must be a valid C identifier. If file is a file object, id is required and must be a valid C identifier. If you specify both file and id, the filename and id must be identical. The file extension should be .fmu by standard.

For the general model description, use meta. This argument is a dictionary that may contain the following keys:

  • "name": a string with the name of the model that will be shown in the modeling environment.
  • "description": a string with a brief model description; if omitted, the model’s comment is used.
  • "naming_convention": name convention used for names of variables. For details, see the FMI documentation. Currently included in the standard are:
    • "flat": a list of strings (default).
    • "structured": hierarchical names using dot separator, with array elements and derivative characterization.
  • "author": an string containing author’s name and organization.
  • "version": model version string.
  • "copyright": optional information on the intellectual property copyright for this FMU.
  • "license": optional information on the intellectual property licensing for this FMU.

For variable description, use inputs_meta and outputs_meta. If specified, variable description must be a list with length size_x (or size_f respectively). List element is a dictionary with the following keys (all keys are optional):

  • "name": name of the variable (string), optional. Default is "x[i]" for inputs, "f[i]" for outputs, where i is the index of this input or output in the training sample.
  • "description": a string containing brief variable description.
  • "quantity": physical quantity of the variable, for example "Angle" or "Energy".
  • "unit": measurement units used for this variable in model equations, for example "deg" or "J".
  • "min": the minimum value of the variable (float) or "training" to use the minimum value from the model’s training sample. If omitted, the minimum is the largest negative number that can be represented on the machine.
  • "max": the maximum value of the variable (float) or "training" to use the training sample value. If omitted, the maximum is the largest positive number that can be represented on the machine.

If some or all details for a variable are not specified, GTApprox also tries to get them from the model’s details. If some details are specified both in details and as parameters to export_fmi_20(), information from parameters takes priority.

By default, an FMU is exported with source code only. To export an FMU with binaries for one or more of the standard FMI platforms, specify compilers.

  • A key in compilers is a string identifying the target platform. Recognized platform names are: "win32", "win64", "linux32", "linux64". You can add compilers for different platforms to export an FMU with cross-platform support.
  • A value in compilers is a Python callable object that implements an FMU compiler for the platform specified by key.

Each callable in compilers should support three input parameters:

  • source_code is the source code to compile.
    • If single_file is True or not specified, source_code is a string.
    • If single_file is False, source_code is a list of string pairs (file_name, source_code). The file_name and source_code strings are the name and source code of a C translation unit. Together, these translation units form the FMU source code.
  • model_id is the model identifier and the name of the shared library (a .dll or .so file).
  • platform is the platform identifier, one of the following strings: "win32", "win64", "linux32", "linux64".

Each callable in compilers must return a string containing the binary code of the compiled shared library. Note that any exception raised by the callable is re-raised by export_fmi_20().
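
For example, a sketch of a single-file compiler callable for the "linux64" platform using gcc (paths and compiler flags are illustrative):

    import os
    import subprocess
    import tempfile

    def compile_linux64(source_code, model_id, platform):
        # single_file defaults to True, so source_code is a single string here
        with tempfile.TemporaryDirectory() as tmpdir:
            src = os.path.join(tmpdir, model_id + ".c")
            lib = os.path.join(tmpdir, model_id + ".so")
            with open(src, "w") as f:
                f.write(source_code)
            subprocess.check_call(["gcc", "-shared", "-fPIC", "-O2", "-o", lib, src])
            with open(lib, "rb") as f:
                return f.read()  # binary code of the compiled shared library

    variables = gtapprox.export_fmi_20(model, "my_model.fmu",
                                       compilers={"linux64": compile_linux64})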

On successful export, export_fmi_20() returns a description of the exported model variables in terms of FMI standard. The description is a list of dictionaries with the following keys:

  • "name": the name of the variable.
  • "causality": "input" or "output"; indicates how the variable is visible from the outside of the model.
  • "variability": "constant" or "parameter"; indicates when the value of the variable changes.
  • "type": "real" or "enum"; indicates type of the variable.
  • "value": "real" or "constant"; omitted for other types of variables.
  • "enumerators": list of enumerators if variable type is "enum"; omitted for other types of variables.
  • "origin": a tuple of two integers that are 0-based indices of the original model input and output components related to this FMU parameter:
    • (j, -1) is the j-th component of the original model input.
    • (-1, i) is the i-th component of the original model output.
    • (j, i) is the partial derivative of the i-th model output with respect to the j-th input.
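For example, the "origin" tuples can be mapped back to the original model components as in this sketch (description stands for the list returned by export_fmi_20()):

for var in description:
    j, i = var["origin"]
    if i == -1:
        print("%s maps to model input x[%d]" % (var["name"], j))
    elif j == -1:
        print("%s maps to model output f[%d]" % (var["name"], i))
    else:
        print("%s is the derivative of f[%d] with respect to x[%d]" % (var["name"], i, j))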
da.p7core.gtapprox.export_fmi_cs(model, file, id=None, der_outputs=False, meta=None, inputs_meta=None, outputs_meta=None, compilers=None, single_file=None)

Export the model to a Functional Mock-up Unit for Co-Simulation 1.0.

Parameters:
  • model (Model) – exported model
  • file (file or str) – file object or path where to export
  • id (str) – a string used in model and function names
  • der_outputs (bool) – if True, include partial derivatives of model outputs in the list of FMI model outputs
  • meta (dict) – model information
  • inputs_meta (list) – input variable information
  • outputs_meta (list) – output variable information
  • compilers (dict) – compiler settings to export an FMU with binary
  • single_file (bool) – pass sources to compilers as a single file (default) or multiple files (False)
Returns:

description of model variables

Return type:

list

New in version 6.9.

Changed in version 6.24: added the single_file parameter.

According to the FMI standard, in an FMU with source code the same string (id) is used as the model name and as a prefix in function names in the model code. If file is a path, omit id as it will be generated from the filename in the path. In this case, the filename must be a valid C identifier. If file is a file object, id is required and must be a valid C identifier. If you specify both file and id, the filename and id must be identical. Per the FMI standard, the file extension should be .fmu.
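For example, a minimal export to a path (my_model stands for a hypothetical trained Model; the filename stem cs_model is a valid C identifier, so id can be omitted):

from da.p7core import gtapprox

# id is generated from the filename stem, which must be a valid C identifier
variables = gtapprox.export_fmi_cs(my_model, "cs_model.fmu")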

For the general model description, use meta. This argument is a dictionary that may contain the following keys:

  • "name": a string with the name of the model that will be shown in the modeling environment.
  • "description": a string with a brief model description; if omitted, the model’s comment is used.
  • "naming_convention": name convention used for names of variables. For details, see the FMI documentation. Currently included in the standard are:
    • "flat": a list of strings (default).
    • "structured": hierarchical names using dot separator, with array elements and derivative characterization.
  • "author": an string containing author’s name and organization.
  • "version": model version string.

For variable description, use inputs_meta and outputs_meta. If specified, the variable description must be a list of length size_x (or size_f, respectively). Each list element is a dictionary with the following keys (all keys are optional):

  • "name": name of the variable (string), optional. Default is "x[i]" for inputs, "f[i]" for outputs, where i is the index of this input or output in the training sample.
  • "description": a string containing brief variable description.
  • "quantity": physical quantity of the variable, for example "Angle" or "Energy".
  • "unit": measurement units used for this variable in model equations, for example "deg" or "J".
  • "min": the minimum value of the variable (float) or "training" to use the minimum value from the model’s training sample. If omitted, the minimum is the largest negative number that can be represented on the machine.
  • "max": the maximum value of the variable (float) or "training" to use the training sample value. If omitted, the maximum is the largest positive number that can be represented on the machine.

If some or all details for a variable are not specified, GTApprox also tries to get them from the model’s details. If a detail is specified both in details and in a parameter to export_fmi_cs(), the parameter takes priority.

By default, an FMU is exported with source code only. To export an FMU with binaries for one or more of the standard FMI platforms, specify compilers.

  • A key in compilers is a string identifying the target platform. Recognized platform names are: "win32", "win64", "linux32", "linux64". You can add compilers for different platforms to export an FMU with cross-platform support.
  • A value in compilers is a Python callable object that implements an FMU compiler for the platform specified by key.

Each callable in compilers should support three input parameters:

  • source_code - the source code to compile.
    • If single_file is True or not specified, source_code is a string.
    • If single_file is False, source_code is a list of string pairs (file_name, source_code). The file_name and source_code strings are the name and source code of a C translation unit. Together, these translation units form the FMU source code.
  • model_id is the model identifier and the name of the shared library (a .dll or .so file).
  • platform is the platform identifier, one of the following strings: "win32", "win64", "linux32", "linux64".

Each callable in compilers must return a string containing the binary code of the compiled shared library. Note that any exception raised by the callable is re-raised by export_fmi_cs().

On successful export, export_fmi_cs() returns a description of the exported model variables in terms of the FMI standard. The description is a list of dictionaries with the following keys:

  • "name": the name of the variable.
  • "causality": "input" or "output"; indicates how the variable is visible from the outside of the model.
  • "variability": "constant" or "parameter"; indicates when the value of the variable changes.
  • "type": "real" or "enum"; indicates type of the variable.
  • "value": "real" or "constant"; omitted for other types of variables.
  • "enumerators": list of enumerators if variable type is "enum"; omitted for other types of variables.
  • "origin": a tuple of two integers that are 0-based indices of the original model input and output components related to this FMU parameter:
    • (j, -1) is the j-th component of the original model input.
    • (-1, i) is the i-th component of the original model output.
    • (j, i) is the partial derivative of the i-th model output with respect to the j-th input.
da.p7core.gtapprox.export_fmi_me(model, file, id=None, der_outputs=False, meta=None, inputs_meta=None, outputs_meta=None, compilers=None, single_file=None)

Export the model to a Functional Mock-up Unit for Model Exchange 1.0.

Parameters:
  • model (Model) – exported model
  • file (file or str) – file object or path where to export
  • id (str) – a string used in model and function names
  • der_outputs (bool) – if True, include partial derivatives of model outputs in the list of FMI model outputs
  • meta (dict) – model information
  • inputs_meta (list) – input variable information
  • outputs_meta (list) – output variable information
  • compilers (dict) – compiler settings to export an FMU with binary
  • single_file (bool) – pass sources to compilers as a single file (default) or multiple files (False)
Returns:

description of model variables

Return type:

list

New in version 6.14.3.

Changed in version 6.24: added the single_file parameter.

According to the FMI standard, in an FMU with source code the same string (id) is used as the model name and as a prefix in function names in the model code. If file is a path, omit id as it will be generated from the filename in the path. In this case, the filename must be a valid C identifier. If file is a file object, id is required and must be a valid C identifier. If you specify both file and id, the filename and id must be identical. Per the FMI standard, the file extension should be .fmu.
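For example, a sketch of exporting to an open file object (my_model stands for a hypothetical trained Model; binary mode is used because an FMU is a zip archive):

from da.p7core import gtapprox

# with a file object, id is required and must be a valid C identifier
with open("me_model.fmu", "wb") as fmu:
    variables = gtapprox.export_fmi_me(my_model, fmu, id="me_model")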

For the general model description, use meta. This argument is a dictionary that may contain the following keys:

  • "name": a string with the name of the model that will be shown in the modeling environment.
  • "description": a string with a brief model description; if omitted, the model’s comment is used.
  • "naming_convention": name convention used for names of variables. For details, see the FMI documentation. Currently included in the standard are:
    • "flat": a list of strings (default).
    • "structured": hierarchical names using dot separator, with array elements and derivative characterization.
  • "author": an string containing author’s name and organization.
  • "version": model version string.

For variable description, use inputs_meta and outputs_meta. If specified, the variable description must be a list of length size_x (or size_f, respectively). Each list element is a dictionary with the following keys (all keys are optional):

  • "name": name of the variable (string), optional. Default is "x[i]" for inputs, "f[i]" for outputs, where i is the index of this input or output in the training sample.
  • "description": a string containing brief variable description.
  • "quantity": physical quantity of the variable, for example "Angle" or "Energy".
  • "unit": measurement units used for this variable in model equations, for example "deg" or "J".
  • "min": the minimum value of the variable (float) or "training" to use the minimum value from the model’s training sample. If omitted, the minimum is the largest negative number that can be represented on the machine.
  • "max": the maximum value of the variable (float) or "training" to use the training sample value. If omitted, the maximum is the largest positive number that can be represented on the machine.

If some or all details for a variable are not specified, GTApprox also tries to get them from the model’s details. If a detail is specified both in details and in a parameter to export_fmi_me(), the parameter takes priority.

By default, an FMU is exported with source code only. To export an FMU with binaries for one or more of the standard FMI platforms, specify compilers.

  • A key in compilers is a string identifying the target platform. Recognized platform names are: "win32", "win64", "linux32", "linux64". You can add compilers for different platforms to export an FMU with cross-platform support.
  • A value in compilers is a Python callable object that implements an FMU compiler for the platform specified by key.

Each callable in compilers should support three input parameters:

  • source_code - the source code to compile.
    • If single_file is True or not specified, source_code is a string.
    • If single_file is False, source_code is a list of string pairs (file_name, source_code). The file_name and source_code strings are the name and source code of a C translation unit. Together, these translation units form the FMU source code.
  • model_id is the model identifier and the name of the shared library (a .dll or .so file).
  • platform is the platform identifier, one of the following strings: "win32", "win64", "linux32", "linux64".

Each callable in compilers must return a string containing the binary code of the compiled shared library. Note that any exception raised by the callable is re-raised by export_fmi_me().

On successful export, export_fmi_me() returns a description of the exported model variables in terms of the FMI standard. The description is a list of dictionaries with the following keys:

  • "name": the name of the variable.
  • "causality": "input" or "output"; indicates how the variable is visible from the outside of the model.
  • "variability": "constant" or "parameter"; indicates when the value of the variable changes.
  • "type": "real" or "enum"; indicates type of the variable.
  • "value": "real" or "constant"; omitted for other types of variables.
  • "enumerators": list of enumerators if variable type is "enum"; omitted for other types of variables.
  • "origin": a tuple of two integers that are 0-based indices of the original model input and output components related to this FMU parameter:
    • (j, -1) is the j-th component of the original model input.
    • (-1, i) is the i-th component of the original model output.
    • (j, i) is the partial derivative of the i-th model output with respect to the j-th input.
da.p7core.gtapprox.set_remote_build(builder, options={}, config_file=None)

Configure a GTApprox builder to run on a remote host over SSH or to use an HPC cluster.

Parameters:
  • builder (Builder) – model builder
  • options (dict) – configuration options
  • config_file (str) – optional path to a configuration file

New in version 4.3: initial support for remote model training and distributed training of MoA models on a cluster.

New in version 5.3: distributed training now supported for all componentwise models.

Changed in version 6.3: GTApprox now enables componentwise training by default, hence distributed training also becomes default when using a cluster.

New in version 6.6: for models with categorical variables, distributed training now supports parallelization over all unique combinations of their values found in the training sample.

Deprecated since version 6.35: this function is no longer updated and may lag behind build() and build_smart() with regard to certain features or training techniques; using it is not recommended, as it may be removed in future versions.

Configures a model builder to run remotely or to perform distributed model training on a cluster. Distributed training on a cluster means that a model is divided into several sub-models, which become separate cluster jobs, allowing a high degree of parallelization.

Note

The same version of pSeven Core has to be installed on the local and remote hosts or, in case of distributed training, on the local host and all cluster nodes.

Note

Remote training requires the paramiko module and its dependencies (pycrypto and ecdsa). These modules are not required for pSeven Core in general and hence are not listed in section System Requirements.

Distributed training is effective in the following cases:

  1. When using the Mixture of Approximators (MoA) technique (set GTApprox/Technique to "MoA"). This technique automatically partitions the training sample and trains several sub-models, which are then combined into the final model, so it naturally supports distributed training of its sub-models (see the sketch after the notes below).
  2. When a model has multidimensional output and componentwise training is enabled. The componentwise mode has been the default since version 6.3 (see GTApprox/DependentOutputs). Componentwise models can be trained in parallel, since each model component is trained independently.
  3. When you define one or more categorical variables (see GTApprox/CategoricalVariables) and the training sample contains two or more unique combinations of their values. In this case, an independent model can be trained for each such combination.

Note that a combination of the above cases is also supported — that is, GTApprox tries to achieve as high a parallelization ratio as possible. For example, if you train a componentwise model with categorical variables, the ratio can be higher than the number of model outputs.

If none of the above cases apply, cluster training is still available but will simply submit a single job to the cluster.
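For example, a sketch combining the MoA technique with cluster training (the host name, key file path, and training sample x_train, y_train are placeholders; the options used here are described in the reference below):

from da.p7core import gtapprox

builder = gtapprox.Builder()
gtapprox.set_remote_build(builder, {"ssh-hostname": "submit-node",
                                    "ssh-username": "user",
                                    "ssh-keyfile": "/home/user/.ssh/id_rsa",
                                    "cluster": "lsf"})
# each MoA sub-model becomes a separate cluster job
model = builder.build(x_train, y_train, options={"GTApprox/Technique": "MoA"})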

The options argument is a dictionary with the following recognized keys (all keys are str, value types are noted below):

  • "ssh-hostname" (str) — remote SSH host name.
  • "ssh-username" (str) — SSH username.
  • "ssh-password" (str) — SSH password (warning: unsafe).
  • "ssh-keyfile" (str) — path to an SSH private key file.
  • "environment" (dict) — dictionary of environment variables.
  • "workdir" (str) — path to the working directory (local or remote, depending on SSH configuration).
  • "cluster" (str) — cluster type. Currently the only supported type is LSF ("lsf"). If cluster type is None the model is trained on a remote host without using a HPC cluster.
  • "cluster-queue" (str) — name of the destination cluster queue.
  • "cluster-job-name" (str) — cluster job name.
  • "cluster-exclusive" (bool) — if True, cluster nodes are used exclusively by jobs (the destination queue must support exclusive jobs). Note that if exclusive jobs are disabled (False), it is recommended to set GTApprox/MaxParallel to 1 or 2 (in builder options) to avoid performance degradation in case of two or more jobs being allocated to the same node by a cluster manager. See section Multi-core Scalability for details.
  • "cluster-slot-limit" (int) — maximum number of jobs that can run simultaneously.

To train a model remotely over SSH, you have to specify "ssh-hostname" and either:

  • "ssh-username" and "ssh-password", or
  • "ssh-keyfile" ("ssh-username" may also be required when using a key file).

Using a key file is recommended, since storing an SSH password in your script is unsafe. If you have no key file, you can use the standard getpass module as a workaround. For example:

import getpass
from da.p7core import gtapprox

builder = gtapprox.Builder()
# getpass() prompts for the password and requires interactive input
gtapprox.set_remote_build(builder, {"ssh-hostname": "theserver",
                                    "ssh-username": "user",
                                    "ssh-password": getpass.getpass()})

To use a cluster, you have to specify "cluster"; "cluster-queue" and "cluster-job-name" may also be required, depending on your cluster manager configuration. If you connect to the cluster submit node over SSH, also specify "ssh-username" and "ssh-password" or "ssh-keyfile". For example:

# imports as in the previous example
builder = gtapprox.Builder()
# getpass() prompts for the password and requires interactive input
gtapprox.set_remote_build(builder, {"ssh-hostname": "submit-node",
                                    "ssh-username": "user",
                                    "ssh-password": getpass.getpass(),
                                    "cluster": "lsf"})

Instead of options, you can specify the path to a configuration file in config_file. You can also combine both: option values are read from the file first, then from options. If a conflict occurs, values set in options override those specified in the configuration file.

The configuration file should contain options and values in JSON format, for example:

{
    "ssh-hostname": "submit-node",
    "ssh-username": "user",
    "ssh-password": "password",

    "environment": {"OMP_NUM_THREADS": 8, "SHELL": "/bin/bash -i"},

    "cluster": "lsf",
    "cluster-queue": "normal",
    "cluster-exclusive": True
}
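For instance, a sketch combining a configuration file with an inline override (remote_config.json is a placeholder path):

# the inline "cluster-queue" value overrides the one read from the file
gtapprox.set_remote_build(builder, {"cluster-queue": "priority"},
                          config_file="remote_config.json")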
da.p7core.gtapprox.disable_remote_build(builder)

Reset builder configuration to run on the local host only.

Parameters: builder (Builder) – model builder

Used to cancel the set_remote_build() configuration.

da.p7core.gtapprox.train_test_split(x, y, train_size=None, test_size=None, options=None)

Split a data sample into train and test subsets optimized for model training.

Parameters:
  • x (array-like, 1D or 2D) – sample inputs (values of variables)
  • y (array-like, 1D or 2D) – sample responses (function values)
  • train_size (int or float) – optional number of training points (int) or portion of the sample to include in the train subset (float)
  • test_size (int or float) – optional number of test points (int) or portion of the sample to include in the test subset (float)
  • options (dict) – option settings
Returns:

tuple of train inputs, test inputs, train outputs and test outputs

Return type:

tuple

Performs an optimized split of the given data sample into two subsets to be used as model training and validation (test) data. The distribution of points between the train and test subsets is optimized so that both provide a good representation of input and response variance, avoiding the skew that a random split may introduce.
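For example, a minimal sketch on toy data (the 80/20 proportion is arbitrary):

import numpy as np
from da.p7core import gtapprox

x = np.random.rand(100, 3)            # toy inputs
y = np.sum(x, axis=1, keepdims=True)  # toy responses
# 80% of points go to the train subset; the rest form the test subset
x_train, x_test, y_train, y_test = gtapprox.train_test_split(x, y, train_size=0.8)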