11.4. da.p7core.gtapprox
Generic Tool for Approximation (GTApprox) module.
>>> from da.p7core import gtapprox
Classes
da.p7core.gtapprox.Builder () |
Approximation model builder. |
da.p7core.gtapprox.ExportedFormat |
Enumerates available export formats. |
da.p7core.gtapprox.GradMatrixOrder |
Enumerates available gradient output modes. |
da.p7core.gtapprox.Model ([file]) |
Approximation model. |
da.p7core.gtapprox.Utilities |
Utility functions. |
Functions
da.p7core.gtapprox.export_fmi_20 (model, file) |
Export the model to a Functional Mock-up Unit for Model Exchange and Co-Simulation 2.0. |
da.p7core.gtapprox.export_fmi_cs (model, file) |
Export the model to a Functional Mock-up Unit for Co-Simulation 1.0. |
da.p7core.gtapprox.export_fmi_me (model, file) |
Export the model to a Functional Mock-up Unit for Model Exchange 1.0. |
da.p7core.gtapprox.set_remote_build (builder) |
Configure a GTApprox builder to run on a remote host over SSH or to use an HPC cluster. |
da.p7core.gtapprox.disable_remote_build (builder) |
Reset builder configuration to run on the local host only. |
da.p7core.gtapprox.train_test_split (x, y[, …]) |
Split a data sample into train and test subsets optimized for model training. |
11.4.1. Builder — model builder¶
-
class
da.p7core.gtapprox.
Builder
¶ Approximation model builder.
-
build
(x, y, options=None, outputNoiseVariance=None, comment=None, weights=None, initial_model=None, annotations=None, x_meta=None, y_meta=None)¶ Train an approximation model.
Parameters: - x (array-like, 1D or 2D) – training sample, input part (values of variables)
- y (array-like, 1D or 2D) – training sample, response part (function values)
- options (
dict
) – option settings - outputNoiseVariance (array-like, 1D or 2D) – optional y noise variance, supported by the GP, SGP, HDA, and HDAGP techniques
- comment (
str
) – optional comment added to modelinfo
- weights (array-like, 1D) – optional weights of the training sample points, supported by the RSM, HDA, GP, SGP, HDAGP, iTA, and MoA techniques
- initial_model (
Model
) – optional initial model, supported by the GBRT, HDAGP, MoA, and TBL techniques only - annotations (
dict
) – optional extended comment and notes - x_meta (
list
) – optional input variables information - y_meta (
list
) – optional output variables information
Returns: trained model
Return type: Model
Train a model using x and y as the training sample. 1D samples are supported as a simplified form for the case of 1D input and/or response.
Changed in version 6.25: pandas.DataFrame and pandas.Series are supported as the x, y training samples.
If information on the noise level in the response sample y is available, GTApprox accepts it as the outputNoiseVariance argument to build(). This array should specify a noise variance value for each element of the y array (that is, for each response component of every single point). Thus outputNoiseVariance has the same shape as y (a usage sketch is given after the notes below).
Changed in version v2024.04: added the output noise variance support for the HDA technique.
Output noise variance feature is supported by the following techniques:
- Gaussian Processes (GP),
- Sparse Gaussian Processes (SGP),
- High Dimensional Approximation (HDA), and
- High Dimensional Approximation combined with Gaussian Processes (HDAGP).
That is, to use output noise variance meaningfully, one of the techniques above has to be selected using GTApprox/Technique in addition to specifying outputNoiseVariance. If any other technique is selected, either manually or automatically, the outputNoiseVariance argument is ignored (but see the next note).
Note
Output noise variance is not compatible with point weighting. If both outputNoiseVariance and weights are specified, build() raises an InvalidProblemError exception. This holds even if you select a technique that does not support output noise variance or point weighting and would normally ignore these arguments.
Note
Output noise variance is not compatible with GTApprox/ExactFitRequired. If outputNoiseVariance is not None and GTApprox/ExactFitRequired is on, build() raises an InvalidOptionsError exception.
Changed in version 3.0 Release Candidate 1: elements in outputNoiseVariance can have NaN values in special cases.
Since 3.0 Release Candidate 1, NaN values can be used in outputNoiseVariance to specify that noise variance data is not available. Valid uses are:
- If noise variance data is not available for some point (a row in y), all elements of the corresponding row in outputNoiseVariance should be NaN. Note that the row cannot contain any numeric elements in this case.
- Likewise, if noise variance data is not available for some output component (a column in y), the corresponding column in outputNoiseVariance should be filled with NaN values and cannot contain any numeric elements.
- If some element in y is NaN (this is valid when GTApprox/OutputNanMode is set to "ignore" or "predict"), the corresponding element in outputNoiseVariance should be NaN. A numeric noise value in this case is not an error, but it will be ignored by GTApprox.
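A minimal sketch of passing output noise variance to build(); the synthetic 1D data and the choice of the GP technique are illustrative assumptions, and the noise array has the same shape as y:
import numpy as np
from da.p7core import gtapprox

x = np.linspace(0.0, 1.0, 20).reshape(-1, 1)                  # 20 points, 1 input
y = np.sin(2.0 * np.pi * x) + 0.05 * np.random.randn(20, 1)   # noisy response
noise_variance = np.full_like(y, 0.05 ** 2)                   # same shape as y

builder = gtapprox.Builder()
model = builder.build(x, y,
                      options={"GTApprox/Technique": "GP"},   # a technique that supports noise variance
                      outputNoiseVariance=noise_variance)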
Changed in version 1.9.5: added the weights parameter.
Changed in version 5.0: added weights support to the LR, RSM, HDA, GP, SGP, HDAGP, and MoA techniques (previously was available in the iTA technique only).
Changed in version 5.0: point weight is no longer limited to range \([0, 1]\) and can be an arbitrary non-negative floating point value or infinity.
Changed in version 5.2: infinite weights are no longer allowed for numerical stability.
A number of GTApprox techniques support sample point weighting. Roughly, point weight is a relative confidence characteristic for this point which affects the model fit to the training sample. The model will try to fit the points with greater weights better, possibly at the cost of decreasing accuracy for the points with lesser weights. The points with zero weight may be completely ignored when fitting the model.
Point weighting is supported in the following techniques:
- Response Surface Model (RSM).
- High Dimensional Approximation (HDA).
- Gaussian Processes (GP).
- Sparse Gaussian Processes (SGP).
- High Dimensional Approximation + Gaussian Processes (HDAGP).
- incomplete Tensor Approximation (iTA).
- Mixture of Approximators (MoA).
That is, to use point weights meaningfully, one of the techniques above has to be selected using GTApprox/Technique in addition to specifying weights. If any other technique is selected, either manually or automatically, weights are ignored (but see the next note).
Note
Point weighting is not compatible with GTApprox/ExactFitRequired. If weights is not None and GTApprox/ExactFitRequired is on, build() raises an InvalidOptionsError exception.
Point weight is an arbitrary non-negative float value. This value has no specific meaning; it simply notes the relative “importance” of a point compared to other points in the training sample. The weights argument should be a 1D array of point weights, and its length has to be equal to the number of training sample points.
Note
At least one weight has to be non-zero. If weights contains only zero values, build() raises an InvalidProblemError exception.
Note
Point weighting is not compatible with output noise variance. If both outputNoiseVariance and weights are specified, build() raises an InvalidProblemError exception. This holds even if you select a technique that does not support output noise variance or point weighting and would normally ignore these arguments.
Changed in version 5.3: added the incremental training (model update) support for GBRT models.
Changed in version 6.14: added the initial HDA model support for the HDAGP technique.
Changed in version 6.15.1: added the initial model support for the MoA technique.
Changed in version 6.25: added the incremental training (model update) support for TBL models.
Changed in version 6.47: added the incremental training (model update) support for GP models.
Changed in version 6.47: if you specify initial_model and manually select a technique that does not support model update, build() raises an InapplicableTechniqueException; previous versions could ignore the initial model in such cases.
A GP model can be updated with new data by specifying the existing GP model as initial_model and either selecting the GP technique manually or enabling the automatic technique selection (GTApprox/Technique set to "Auto", default). In this case, the resulting model is also a GP model. A GBRT model, similarly, can be updated with new data by specifying it as initial_model and selecting the GBRT technique or enabling the automatic technique selection.
When you use the HDAGP technique, you can add an existing HDA model as initial_model to use it as a trend, which provides noticeable savings in training time. Note that the training sample must be the same as the one used to train the HDA model; otherwise, the new HDAGP model will be inaccurate. The intent in this case is to speed up HDAGP model training by skipping the initial step of training a trend model internally.
The MoA technique can use a model trained by any technique as the initial one. MoA can improve model accuracy, update the model with new data, or do both. See section Initial Model for more information.
The TBL technique can use an existing TBL model as the initial one. This technique simply updates the model’s internal table with new input-output pairs from the training sample.
Other techniques do not support initial models and raise an exception if explicitly selected — for example, if you set GTApprox/Technique to "RSM" and specify initial_model, build() raises an InapplicableTechniqueException.
The MoA technique does not impose any specific limitations on initial models. For GBRT, GP, HDAGP, and TBL, if the initial_model does not match the selected technique, build() raises an exception — for example, if you specify the HDAGP technique but initial_model is not an HDA model. Also note the following limitations:
- If you have trained a GBRT or HDA model with output transformation enabled, and you are using that model as an initial one, you must set the GTApprox/OutputTransformation option when updating the model, as explained in that option description.
- When updating a GP model, you must get the GTApprox/GPType and GTApprox/GPPower option values from the initial model details and set those options to the same values in build(). Additionally, GTApprox/GPInteractionCardinality must be set to [] or to the value from the initial model.
- Model update is not supported for GP models with the following features:
  - models trained with heteroscedastic noise processing (GTApprox/Heteroscedastic set to True),
  - models with categorical inputs.
- GP model update is not compatible with point weighting: if initial_model is a GP model and you specify weights, build() raises an exception.
Changed in version 6.14: added the annotations, x_meta, and y_meta parameters.
Changed in version 6.16: the x_meta parameter can specify input constraints.
Changed in version 6.17: the y_meta parameter can specify output thresholds.
Changed in version 6.17: training reuses the metainformation from an initial model.
The annotations dictionary adds optional notes or extended comments to the model. It can contain any number of notes; all keys and values must be strings. The x_meta and y_meta parameters provide additional details on model inputs and outputs (constraints, names, descriptions, and so on) — see Model Metainformation for details. Note that if you use an initial model that already contains metainformation, this metainformation is copied to the trained model. In this case, x_meta and y_meta can be used to edit metainformation: information specified in x_meta, y_meta overwrites the initial metainformation, while information not specified in the arguments is copied from the initial metainformation.
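A minimal sketch of a build() call combining several of the arguments above; the synthetic data, technique choice, and file name are illustrative assumptions:
import numpy as np
from da.p7core import gtapprox

x = np.random.rand(50, 2)                      # 50 points, 2 inputs
y = x[:, [0]] ** 2 + x[:, [1]]                 # single response column
weights = np.ones(50)
weights[:10] = 4.0                             # trust the first 10 points more

builder = gtapprox.Builder()
model = builder.build(
    x, y,
    options={"GTApprox/Technique": "HDA"},     # a technique that supports point weights
    weights=weights,
    comment="demo model",
    annotations={"project": "docs example"},   # arbitrary string notes
)
model.save("weighted_demo.gtapprox")           # hypothetical file name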
-
build_smart
(x, y, options=None, outputNoiseVariance=None, comment=None, weights=None, initial_model=None, hints=None, x_test=None, y_test=None, annotations=None, x_meta=None, y_meta=None)¶ Train an approximation model using smart training.
Parameters: - x (array-like, 1D or 2D) – training sample, input part (values of variables)
- y (array-like, 1D or 2D) – training sample, response part (function values)
- options (
dict
) – option settings which will be set fixed during parameter search - outputNoiseVariance (array-like, 1D or 2D) – optional y noise variance
- comment (
str
) – text comment - weights (array-like, 1D) – training sample point weights
- initial_model (
Model
) – initial model for incremental training - hints (
dict
) – user-provided hints on the data behaviour and desirable model properties - x_test (array-like, 1D or 2D) – testing sample, input part (values of variables)
- y_test (array-like, 1D or 2D) – testing sample, response part (function values)
- annotations (
dict
) – extended comment and notes - x_meta (
list
) – descriptions of inputs - y_meta (
list
) – descriptions of outputs
Returns: trained model
Return type: Model
New in version 6.6.
Changed in version 6.14: added the annotations, x_meta, and y_meta parameters.
Changed in version 6.25: pandas.DataFrame and pandas.Series are supported as the x, y training samples.
Train a model with x and y as the training sample using the smart training procedure. Arguments are the same as build(), with 3 additional arguments: hints, x_test and y_test.
- hints: additional information about the data set or requirements to the model, and optional smart training settings. See section Hint Reference for details.
- x_test and y_test: test samples which can be used to control model quality during training.
See section Smart Training for details on smart training.
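A minimal sketch of smart training with a held-out test sample; the synthetic data and the simple split are illustrative, and hint keys, when needed, are taken from the Hint Reference:
import numpy as np
from da.p7core import gtapprox

x = np.random.rand(200, 3)
y = np.sin(x[:, [0]]) * x[:, [1]] + x[:, [2]] ** 2

# hold out the last 50 points as a test sample controlling model quality
x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

builder = gtapprox.Builder()
model = builder.build_smart(x_train, y_train, x_test=x_test, y_test=y_test)
print(model.build_log[-500:])                  # tail of the training log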
-
license
¶ Builder license.
Type: License
General license information interface. See section License Usage for details.
-
options
¶ Builder options.
Type: Options
General options interface for the builder. See section Options Interface for usage and the GTApprox option reference.
-
static
postprocess
(model, train_x, train_y, hints={}, test_x=None, test_y=None)¶ Deprecated since version 6.6: it is recommended to use smart model training instead, see
build_smart()
.This method is deprecated since version 6.6 which added smart model training functionality to GTApprox. Pre- and post-processing features in GTApprox were completely removed in version 6.8, since smart model training is more convenient and provides better results. See
build_smart()
and section Smart Training for details.
-
static
preprocess
(train_x, train_y, hints={})¶ Deprecated since version 6.6: it is recommended to use smart model training instead, see
build_smart()
.This method is deprecated since version 6.6 which added smart model training functionality to GTApprox. Pre- and post-processing features in GTApprox were completely removed in version 6.8, since smart model training is more convenient and provides better results. See
build_smart()
and section Smart Training for details.
-
11.4.2. ExportedFormat — model export formats¶
-
class
da.p7core.gtapprox.
ExportedFormat
¶ Enumerates available export formats.
New in version 6.10: added str aliases for export formats.
Changed in version 6.16: added the C# source format, see CSHARP_SOURCE.
Changed in version 6.16.1: C# source export is supported for all GTApprox models but is not yet supported for GTDF models loaded to gtapprox.Model.
In export_to() you can specify format in two ways:
- Using enumeration, for example:
my_model.export_to(gtapprox.ExportedFormat.C99_PROGRAM, "func_name", "comment", "my_model.c")
. - Using
str
alias (added in 6.10), for example:my_model.export_to("c_program", "func_name", "comment", "my_model.c")
.
-
OCTAVE_MEX
¶ C source for a MEX file.
Aliases:
"octave_mex"
,"mex"
.
-
C99_PROGRAM
¶ C source with the
main()
function for a complete command-line based C program.Aliases:
"c99_program"
,"c_program"
,"program"
.
-
C99_HEADER
¶ C header of the model.
Aliases:
"c99_header"
,"c_header"
,"header"
.
-
C99_SOURCE
¶ C header and implementation of the model.
Aliases:
"c99_source"
,"c_source"
,"c"
.
-
EXCEL_DLL
¶ C implementation of the model intended for creating a DLL compatible with Microsoft Excel.
Aliases:
"excel_dll"
,"excel"
.
-
CSHARP_SOURCE
¶ New in version 6.16.
C# implementation of the model.
Alias:
"c#"
.Note
The C# source export is not yet supported for GTDF models loaded to
gtapprox.Model
.Note
The C# source export requires an up to date license valid for pSeven Core 6.16 and above.
11.4.3. GradMatrixOrder — model gradients order¶
11.4.4. Model — approximation model¶
-
class
da.p7core.gtapprox.
Model
(file=None, **kwargs)¶ Approximation model.
Can be created by Builder or loaded from a file via the Model constructor.
Changed in version 6.16: the file to load may also be a GTDF model saved with gtdf.Model.save(). Note that loading a GTDF model converts it into a GTApprox model, but the backward conversion is not supported.
Model objects are immutable. All methods which are meant to change the model return a new Model instance.
-
annotations
¶ Extended comment or supplementary information.
Type: dict
New in version 6.6.
The annotations dictionary can optionally contain any number of notes. All dictionary keys and values are strings. You can add annotations when training a model and edit them using
modify()
.See also Model Metainformation.
-
static
available_sections
(**kwargs)¶ Get a list of available model sections.
Parameters: - file (
file
orstr
) – file object or path to load model from - string (
str
) – serialized model - model (
Model
) – model object
Returns: available model sections
Return type: list
New in version 6.11.
Returns a list of strings specifying which sections can be loaded from the model:
- "model": main model section, required for model evaluation and smoothing methods.
- "info": model information, required for info.
- "comment": comment section, required for comment.
- "annotations": annotations section, required for annotations.
- "training_sample": a copy of training sample data, required for training_sample.
- "iv_info": internal validation data, required for iv_info.
- "build_log": model training log, required for build_log.
See Approximation Model Structure for details.
- file (
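A short sketch of checking which sections a saved model provides before loading it; the file name is a hypothetical path to a model previously saved with save():
from da.p7core import gtapprox

print(gtapprox.Model.available_sections(file="big_model.gtapprox"))
# e.g. ['model', 'info', 'comment', 'annotations', 'iv_info', 'build_log']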
-
build_log
¶ Model building log.
Type: str
-
calc
(point)¶ Evaluate the model.
Parameters: point ( float
or array-like, 2D or 1D) – the sample or point to evaluateReturns: model values Return type: pandas.DataFrame
orpandas.Series
if point is a pandas type; otherwisendarray
, 2D or 1DChanged in version 1.9.0: smoothness parameter is no longer supported (see Version Compatibility Issues).
Changed in version 3.0 Release Candidate 1: returns
ndarray
.Changed in version 6.22: returns
ndarray
withdtype=object
if the model has string categorical outputs.Changed in version 6.25: supports
pandas.DataFrame
andpandas.Series
as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.Evaluates a data sample or a single point. In general form, point is a 2D array-like (a data sample). Several simplified argument forms are also supported.
The returned array is 2D if point is a sample, and 1D if point is a single point. When point is a
pandas.DataFrame
orpandas.Series
, the returned array keeps indexing of the point array.In the case of 1D model input, a single
float
value is interpreted as a single point. A 1D array-like with a single element is also one point; other 1D array-likes are interpreted as a sample. A 2D array-like is always interpreted as a sample, even if it actually contains a single point. For example:
model_1d.calc(0.0)             # a 1D point
model_1d.calc([0.0])           # a 1D point
model_1d.calc([[0.0]])         # a sample, one 1D point
model_1d.calc([0.0, 1.0])      # a sample, two 1D points
model_1d.calc([[0.0], [1.0]])  # a sample, two 1D points
model_1d.calc([[0.0, 1.0]])    # incorrect: a sample with a single 2D point (model input is 1D)
If model input is multidimensional, a 1D array-like is interpreted as a single point, and 2D array-likes are interpreted as data samples. For example, if model input is 2D:
model_2d.calc(0.0)                       # incorrect: point is 1D
model_2d.calc([0.0])                     # incorrect: point is 1D
model_2d.calc([[0.0]])                   # incorrect: sample contains one 1D point
model_2d.calc([0.0, 0.0])                # a 2D point
model_2d.calc([[0.0, 0.0]])              # a sample, one 2D point
model_2d.calc([[0.0, 0.0], [1.0, 1.0]])  # a sample, two 2D points
-
calc_ae
(point)¶ Calculate the accuracy evaluation estimate.
Parameters: point ( float
or array-like, 2D or 1D) – the sample or point to evaluateReturns: estimates Return type: pandas.DataFrame
orpandas.Series
if point is a pandas type; otherwisendarray
, 2D or 1DRaise: FeatureNotAvailableError
if the model does not provide accuracy evaluationChanged in version 1.9.0: smoothness parameter is no longer supported (see Version Compatibility Issues).
Changed in version 3.0 Release Candidate 1: returns
ndarray
.Changed in version 6.25: supports
pandas.DataFrame
andpandas.Series
as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.Check
has_ae
before using this method. It is available only if the model was trained with GTApprox/AccuracyEvaluation on.Performs accuracy evaluation for a data sample or a single point. In general form, point is a 2D array-like (a data sample). Several simplified argument forms are also supported, similar to
calc()
.The returned array is 2D if point is a sample, and 1D if point is a single point. When point is a
pandas.DataFrame
orpandas.Series
, the returned array keeps indexing of the point array.
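A minimal sketch of requesting accuracy estimates; the synthetic data is illustrative, and the boolean option value is an assumption (see the option reference for the accepted values of GTApprox/AccuracyEvaluation):
import numpy as np
from da.p7core import gtapprox

x = np.random.rand(30, 2)
y = x[:, [0]] * x[:, [1]]
model = gtapprox.Builder().build(
    x, y, options={"GTApprox/Technique": "GP",
                   "GTApprox/AccuracyEvaluation": True})

if model.has_ae:                      # always check before calling calc_ae()
    estimates = model.calc_ae(x[:5])  # accuracy estimates for the first 5 points
    print(estimates.shape)            # (5, 1)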
-
comment
¶ Text comment to the model.
Type: str
New in version 6.6.
Optional plain text comment to the model. You can add the comment when training a model and edit it using
modify()
.See also Model Metainformation.
-
details
¶ Detailed model information.
Type: dict
New in version 5.2.
A detailed description of the model. Includes model metainformation, accuracy data, training sample statistics, regression coefficients for RSM models, and other data.
See sections Model Details and Model Metainformation.
-
export_to
(format, function, description, file, single_file=None)¶ Export the model to a source file in specified format.
Parameters: - format (
ExportedFormat
orstr
) – source code format - function (
str
) – exported function name - description (
str
) – additional comment - file (file-like,
str
,zipfile.ZipFile
,tarfile.TarFile
) – export file or path - single_file (
bool
) – export sources as a single file (default) or multiple files (False
)
Returns: None
Raise: GTException
if function is empty and format is notC99_PROGRAM
New in version 6.10: added
str
aliases for export formats.Changed in version 6.24: added the support for exporting sources to archives; added the single_file parameter.
Generates the model source code in the specified format and saves it to a file or a set of source files. Supports packing the source code to various archive file formats.
The source code format can be specified using an enumeration or a string alias — see ExportedFormat for details.
By default, all source code is exported to a single file. This mode is not recommended for large models, since large source files can cause problems during compilation. To split the source code into multiple files, set single_file to False. In this case, the filename from file serves as a basename, and additional source files have names with an added suffix. In the multi-file mode, all exported files are required to compile the model.
To pack source files into an archive, you can pass a zipfile.ZipFile or tarfile.TarFile object as file, or specify a path to a file with an archive type extension. Recognized extensions are: .zip, .tar, .tgz, .tar.gz, .taz, .tbz, .tbz2, .tar.bz2.
The function argument is optional if format is C99_PROGRAM. For other source code formats, an empty function name raises an exception.
For the C# source format (
CSHARP_SOURCE
), the function argument sets the name of the model class and its namespace. There are two ways to use it:
If you specify a name without dots (.), it becomes the namespace, and the class name remains default (Model). For example, if function is “myGTAmodel”:
namespace myGTAmodel {
    public sealed class Model {
        // attributes and methods
    }
}
If you specify a name with dots (.), it is split by dots and the last part becomes the class name, while the remaining parts become a namespace hierarchy. For example, if function is “ns1.ns2.MyExportedModel”:
namespace ns1 {
    namespace ns2 {
        public sealed class MyExportedModel {
            // attributes and methods
        }
    }
}
See also the Model Export example.
- format (
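A minimal sketch of both export modes; the model and output file names are illustrative:
from da.p7core import gtapprox

model = gtapprox.Model("my_model.gtapprox")    # hypothetical saved model

# single C source file, using the enumeration
model.export_to(gtapprox.ExportedFormat.C99_SOURCE,
                "my_model_func", "exported by a docs example", "my_model.c")

# multi-file export packed into a zip archive, using a string alias
model.export_to("c_source", "my_model_func", "exported by a docs example",
                "my_model_src.zip", single_file=False)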
-
fromstring
(modelString, sections='all')¶ Deserialize a model from string.
Parameters: - modelString (
str
) – serialized model - sections (
list
orstr
) – model sections to load
Returns: None
Changed in version 6.6: added the sections argument.
A model can be loaded (deserialized) partially, omitting certain sections to reduce memory usage. Note that availability of
Model
methods and attributes depends on which sections are loaded. This dependency is described in more detail in section Approximation Model Structure.The sections argument can be a string or a list of strings specifying which sections to load:
"all"
: all sections (default)."none"
: minimum model information, does not load any other section (the minimum load)."model"
: main model section, required for model evaluation and smoothing methods."info"
: model information, required forinfo
."comment"
: comment section, required forcomment
."annotations"
: annotations section, required forannotations
."training_sample"
: a copy of training sample data, required fortraining_sample
."iv_info"
: internal validation data, required foriv_info
."build_log"
: model training log, required forbuild_log
.
To get a list of sections available for load, use
available_sections()
.- modelString (
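A minimal sketch of a serialization round trip with partial loading; the model file name and the 2D input assumed in the final call are illustrative:
from da.p7core import gtapprox

model = gtapprox.Model("my_model.gtapprox")          # hypothetical saved model

# serialize without the training sample copy to keep the string small
blob = model.tostring(sections=["model", "info", "comment"])

# deserialize into a new object, loading only the evaluation section
restored = gtapprox.Model()
restored.fromstring(blob, sections=["model"])
print(restored.calc([0.0, 0.0]))                     # assumes a 2D model input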
-
grad
(point, order=0)¶ Evaluate model gradient.
Parameters: - point (
float
or array-like, 2D or 1D) – the sample or point to evaluate - order (
GradMatrixOrder
) – gradient matrix order
Returns: model gradients
Return type: pandas.DataFrame
if point is a pandas type; otherwisendarray
, 3D or 2DChanged in version 1.9.0: smoothness parameter is no longer supported (see Version Compatibility Issues).
Changed in version 3.0 Release Candidate 1: returns
ndarray
.Changed in version 6.25: supports
pandas.DataFrame
andpandas.Series
as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.Evaluates model gradients for a data sample or a single point. In general form, point is a 2D array-like (a data sample). Several simplified argument forms are also supported, similar to
calc()
.The returned array is 3D if point is a sample, and 2D if point is a single point.
When using pandas data samples (point is a
pandas.DataFrame
), a 3D array in return value is represented by apandas.DataFrame
with multi-indexing (pandas.MultiIndex
). In this case, the first element of the multi-index is the point index from the input sample. The second element of the multi-index is:- the index or name of a model’s output,
if order is
F_MAJOR
(default) - the index or name of a model’s input,
if order is
X_MAJOR
When point is a
pandas.Series
, its index becomes the row index of the returnedpandas.DataFrame
.- point (
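A minimal sketch of evaluating gradients in both matrix orders; the model file and its 2D input are illustrative assumptions:
from da.p7core import gtapprox

model = gtapprox.Model("my_model.gtapprox")   # hypothetical saved model with 2D input

point = [0.5, 0.5]
g_f = model.grad(point, order=gtapprox.GradMatrixOrder.F_MAJOR)  # output-major order (default)
g_x = model.grad(point, order=gtapprox.GradMatrixOrder.X_MAJOR)  # input-major order
print(g_f.shape, g_x.shape)                   # 2D matrices for a single point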
-
grad_ae
(point, order=0)¶ Calculate gradients of the accuracy evaluation function.
Parameters: - point (
float
or array-like, 2D or 1D) – the sample or point to evaluate - order (
GradMatrixOrder
) – gradient matrix order
Returns: accuracy evaluation gradients
Return type: pandas.DataFrame
if point is a pandas type; otherwisendarray
, 3D or 2DRaise: FeatureNotAvailableError
if the model does not provide accuracy evaluationChanged in version 1.9.0: the smoothness argument is no longer supported (see Version Compatibility Issues).
Changed in version 3.0 Release Candidate 1: returns
ndarray
.Changed in version 6.25: supports
pandas.DataFrame
andpandas.Series
as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.Check
has_ae
before using this method. It is available only if the model was trained with GTApprox/AccuracyEvaluation on.Evaluates gradients of the accuracy evaluation function for a data sample or a single point. In general form, point is a 2D array (a data sample). Several simplified argument forms are also supported, similar to
calc()
.The returned array is 3D if point is a sample, and 2D if point is a single point.
When using pandas data samples (point is a
pandas.DataFrame
), a 3D array in return value is represented by apandas.DataFrame
with multi-indexing (pandas.MultiIndex
). In this case, the first element of the multi-index is the point index from the input sample. The second element of the multi-index is:- the index or name of a model’s output,
if order is
F_MAJOR
(default) - the index or name of a model’s input,
if order is
X_MAJOR
When point is a
pandas.Series
, its index becomes the row index of the returnedpandas.DataFrame
.- point (
-
has_ae
¶ Accuracy evaluation support.
Type: bool
Check this attribute before using
calc_ae()
orgrad_ae()
. IfTrue
, the model supports accuracy evaluation. IfFalse
, then accuracy evaluation is not available, and the methods above raise an exception.
-
has_ironing
¶ Deprecated since version 1.9.0: in older versions this attribute was used to check if the model has already been smoothed using the
ironing()
method. It was replaced withis_smoothed
following the replacement ofironing()
with the advanced smoothing methods (seesmooth()
,smooth_anisotropic()
, andsmooth_errbased()
).
-
has_smoothing
¶ Smoothing support.
Type: bool
New in version 1.9.0.
Check this attribute before using
smooth()
,smooth_anisotropic()
, orsmooth_errbased()
. IfTrue
, the model supports smoothing. IfFalse
, then smoothing is not available, and smoothing methods raise an exception.
-
has_smoothness
¶ Deprecated since version 1.9.0: in older versions this attribute was used to check if the model supports dynamic smoothing (see section Version Compatibility Issues for details). It was replaced with
has_smoothing
following the replacement of theironing()
method with the advanced smoothing methods (seesmooth()
,smooth_anisotropic()
, andsmooth_errbased()
).
-
info
¶ Model description.
Type: dict
Contains all technical information which can be gathered from the model.
-
ironing
(smoothness)¶ Deprecated since version 1.9.0: this method had been replaced by the advanced smoothing methods
smooth()
,smooth_anisotropic()
, andsmooth_errbased()
. See section Version Compatibility Issues for details.
-
is_smoothed
¶ Smoothed model.
Type: bool
New in version 1.9.0.
Check this attribute to see if the model is already smoothed. It is
True
for models returned bysmooth()
,smooth_errbased()
, andsmooth_anisotropic()
methods, andFalse
for other models.
-
iv_info
¶ Internal validation results.
Type: dict
New in version 2.0 Release Candidate 1.
Changed in version 2.0 Release Candidate 2: also stores raw validation data.
A dictionary containing error values calculated during internal validation. Has the same structure as the
details["Training Dataset"]["Accuracy"]
dictionary indetails
— see section Accuracy in Model Details for a full description.Additionally, if the model was trained with GTApprox/IVSavePredictions on,
iv_info
also contains raw validation data: model values calculated during internal validation, reference inputs, and reference outputs. This data is stored under the"Dataset"
key.If internal validation was not required when training the model (see GTApprox/InternalValidation),
iv_info
is an empty dictionary.
-
license
¶ Model license.
Type: License
General license information interface. See section License Usage for details.
-
load
(file, sections='all')¶ Load a model from file.
Parameters: - file (
file
orstr
) – file object or path - sections (
list
orstr
) – model sections to load
Returns: None
Changed in version 6.6: added the sections argument.
Deprecated since version 6.29: use
Model
constructor instead.A model can be loaded partially, omitting certain sections to reduce memory usage and load time. Note that availability of
Model
methods and attributes depends on which sections are loaded. This dependency is described in more detail in section Approximation Model Structure.The sections argument can be a string or a list of strings specifying which sections to load:
"all"
: all sections (default)."none"
: minimum model information, does not load any other section (the minimum load)."model"
: main model section, required for model evaluation and smoothing methods."info"
: model information, required forinfo
."comment"
: comment section, required forcomment
."annotations"
: annotations section, required forannotations
."training_sample"
: a copy of training sample data, required fortraining_sample
."iv_info"
: internal validation data, required foriv_info
."build_log"
: model training log, required forbuild_log
.
To get a list of sections available for load, use
available_sections()
.- file (
-
modify
()¶ Create a copy of the model with modified features or metainformation.
Parameters: - comment (
str
) – new comment - annotations (
dict
) – new annotations - x_meta (
list
) – descriptions of inputs - y_meta (
list
) – descriptions of outputs - strip (
list
orstr
) – optional list of features to strip from the model
Returns: copy of this model with modifications
Return type: New in version 6.6.
Changed in version 6.14: can edit descriptions of inputs and outputs.
Changed in version 6.14.3: can remove the accuracy evaluation and smoothing features.
Changed in version 6.17: can disable the model output thresholds.
This method is intended to edit model
annotations
,comment
, metainformation, and can be used to reduce model size by removing certain features. If a parameter isNone
, corresponding information in the modified model remains unchanged. If you specify a parameter, corresponding information in the modified model is fully replaced.The x_meta and y_meta parameters that edit metainformation are similar to
build()
and are described in section Model Metainformation — however note that specifying any new input constraints or output thresholds in x_meta or y_meta does not change the effective (current) model constraints: changes in x_meta and y_meta only apply to model information stored indetails
. For example, if you set a new, more restrictive input constraint in x_meta inmodify()
, the model will still evaluate outputs for any input that is within the range previously set by x_meta inbuild()
. Generally, it is not recommended to edit the model constraints information withmodify()
to avoid confusion.The strip argument can be used to remove accuracy evaluation (AE) and smoothing features from the model. It can be a string or a list of strings specifying which features to remove:
"ae"
— remove accuracy evaluation."smoothing"
— remove smoothing."output_bounds"
— disable the output bounds (thresholds), which were previously set with the y_meta parameter when training the model or usingmodify()
.
Removing AE may be useful for models trained with the GP, HDAGP, SGP, or TGP techniques (other techniques do not support AE). It reduces the size of the main model section (see Approximation Model Structure), thus decreasing the model size in memory. It also significantly reduces the volume of the C code generated by
export_to()
. Thehas_ae
property of the modified model will beFalse
Removing the smoothing feature reduces the size of the main model section only. It decreases the model size, but not the volume of exported code. The size reduction is most noticeable for models trained with the RSM and HDA techniques (up to 10 times for HDA). If the model was smoothed before
modify()
, the modified model remains smoothed. However, smoothing methods will no longer be available from the modified model (has_smoothing
will beFalse
).Note that
modify()
returns a new modified model, which is identical to the original except your modifications.See also Model Metainformation.
- comment (
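A minimal sketch of producing a smaller evaluation-only copy of a model; the file names are illustrative:
from da.p7core import gtapprox

model = gtapprox.Model("my_model.gtapprox")            # hypothetical saved model

slim = model.modify(comment="evaluation-only copy",
                    strip=["ae", "smoothing"])         # drop AE and smoothing data
print(slim.has_ae, slim.has_smoothing)                 # False False
slim.save("my_model_slim.gtapprox")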
-
save
(file, sections='all')¶ Save the model to file.
Parameters: - file (
file
orstr
) – file object or path - sections (
list
orstr
) – model sections to save
Returns: None
Changed in version 6.6: sections argument added.
When saving, certain sections of the model can be skipped to reduce the model file size (see Approximation Model Structure for details). The sections argument can be a string or a list of strings specifying which sections to save:
"all"
: all sections (default)."model"
: main model section, required for model evaluation. This section is always saved even if not specified. For some models, the size of this section can be additionally reduced by removing the accuracy evaluation or smoothing information withmodify()
."info"
: model information,info
."comment"
: comment section,comment
."annotations"
: annotations section,annotations
."training_sample"
: a copy of training sample data,training_sample
."iv_info"
: internal validation data,iv_info
."build_log"
: model training log,build_log
.
Note that the main model section is always saved, so
sections="model"
andsections=[]
are equivalent.- file (
-
save_to_octave
(function, file)¶ Deprecated since version 1.8.0: use
export_to()
instead.Since version 3.0 Release Candidate 1, this method no longer exists and is completely replaced by
export_to()
.
-
shap_value
(point, data=None, interactions=False, approximate=False, shap_compatible=True)¶ Compute SHAP (SHapley Additive exPlanations) values.
Parameters: - point (
float
or array-like, 1D or 2D) – a point or sample to evaluate - data (
float
or array-like, 1D or 2D) – optional background data sample - interactions (
bool
) – ifTrue
, evaluate pairwise interactions (supported by GBRT models only) - approximate (
bool
) – ifTrue
, compute approximate SHAP values (fast but less accurate) - shap_compatible (
bool
) – ifTrue
, returnshap.Explanation
(requiresshap
)
Returns: explanations
Return type: shap.Explanation
ortuple
(elements depend on the point type)New in version 6.20.
Evaluates SHAP, using an optimized internal implementation when possible. The following models support the internal method and do not require the
shap
module, if you set shap_compatible toFalse
:- All models trained with the GBRT technique.
- All differentiable models — that is, all models without categorical variables.
Other models use
shap.PermutationExplainer
and requireshap
.The point syntax is the same as in
calc()
: general form is a 2D array, and several simplified forms are supported. When shap_compatible isFalse
, the return value is a pair (tuple) where elements depend on the point type:- If point is a single point, the return pair is
a scalar base value and
an
ndarray
— 1D or 2D, depending on interactions. - If point is a sample, the return pair is
a list of base values for each output and
an
ndarray
— 2D or 3D, also depending on interactions. In this case, a base value for an output is the average of this output over the training dataset.
Array structure in results is:
- If interactions is
False
(default), resulting SHAP values form an \(n \times m\) matrix, where \(n\) is the number of points in point, and \(m\) is the model’s input dimension. Each matrix row contains contributions of model inputs to push the model output from the base value. - If interactions is
True
, contributions for each input point form an \(m \times m\) matrix, where main effects are on the diagonal and interaction effects are off-diagonal. Resulting SHAP values form an \(n \times m \times m\) array. Note that only GBRT models support pairwise interactions.
For more convenience, if you have
shap
installed, set shap_compatible toTrue
to return ashap.Explanation
object.GBRT models estimate SHAP values by a fast and exact method for tree models and ensembles of trees. Differentiable models (without categorical variables) approximate SHAP values using expected gradients (Sundararajan et al. 2017) — an extension of integrated gradients, a feature attribution method designed for differentiable models based on an extension of Shapley values to infinite player games (Aumann-Shapley values).
- point (
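A minimal sketch of SHAP evaluation without the shap package; the GBRT technique and the synthetic data are illustrative assumptions:
import numpy as np
from da.p7core import gtapprox

x = np.random.rand(100, 3)
y = x[:, [0]] + 2.0 * x[:, [1]] * x[:, [2]]
model = gtapprox.Builder().build(x, y, options={"GTApprox/Technique": "GBRT"})

# per-point input contributions; shap_compatible=False avoids the shap dependency
base, values = model.shap_value(x[:10], shap_compatible=False)
print(values.shape)                            # (10, 3): n points by m inputs

# pairwise interactions (GBRT models only)
base, inter = model.shap_value(x[:10], interactions=True, shap_compatible=False)
print(inter.shape)                             # (10, 3, 3)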
-
size_f
¶ Model output dimension.
Type: long
-
size_x
¶ Model input dimension.
Type: long
-
smooth
(f_smoothness)¶ Apply smoothing to model.
Parameters: f_smoothness ( float
or array-like, 1D) – output smoothing factorsReturns: smoothed model Return type: Model
Raise: GTException
if the model does not support smoothingNew in version 1.9.0.
Check
has_smoothing
before using this method.This method creates and returns a new smoothed model. The amount of smoothing is specified by the
f_smoothness
argument. Details on model smoothing can be found in section Model Smoothing.
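A minimal sketch of applying smoothing; the model file and the factor value are illustrative:
from da.p7core import gtapprox

model = gtapprox.Model("my_model.gtapprox")    # hypothetical saved model

if model.has_smoothing:                        # always check before smoothing
    smoothed = model.smooth(0.5)               # same factor for all outputs
    print(smoothed.is_smoothed)                # True; the original model is unchanged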
-
smooth_anisotropic
(f_smoothness, x_weights)¶ Apply anisotropic smoothing to model.
Parameters: - f_smoothness (
float
or array-like, 1D) – output smoothing factors - x_weights (array-like, 1D or 2D) – the amount of smoothing by different input components
Returns: smoothed model
Return type: Raise: GTException
if the model does not support smoothingNew in version 1.9.0.
Check
has_smoothing
before using this method.This method extends the simple smoothing functionality (see
smooth()
) by allowing anisotropic smoothing:x_weights
specify relative smoothing by different components of the input.Details on anisotropic smoothing can be found in section Anisotropic Smoothing.
- f_smoothness (
-
smooth_errbased
(x_sample, f_sample, error_type, error_thresholds, x_weights=None)¶ Apply error based smoothing to model, controlling model errors over a reference inputs-responses array.
Parameters: - x_sample (
float
or array-like, 1D or 2D) – reference inputs - f_sample (
float
or array-like, 1D or 2D) – reference responses - error_type (
str
orlist[str]
) – error types to calculate - error_thresholds (
float
or array-like, 1D) – error thresholds - x_weights (array-like, 1D or 2D) – the amount of smoothing for different input components
Returns: smoothed model
Return type: Raise: GTException
if the model does not support smoothingNew in version 1.9.0.
Check
has_smoothing
before using this method.This method creates and returns a model which has maximum smoothness while preserving approximation errors of the model below specified threshold.
Details on error-based smoothing can be found in section Error-Based Smoothing.
- x_sample (
-
tostring
(sections='all')¶ Serialize the model.
Parameters: sections ( list
orstr
) – model sections to saveReturns: serialized model Return type: str
Changed in version 6.6: sections argument added.
When serializing, certain sections of the model can be skipped to reduce the model size (see Approximation Model Structure for details). The sections argument can be a string or a list of strings specifying which sections to include:
"all"
: all sections (default)."model"
: main model section, required for model evaluation. This section is always saved even if not specified. For some models, the size of this section can be additionally reduced by removing the accuracy evaluation or smoothing information withmodify()
."info"
: model information,info
."comment"
: comment section,comment
."annotations"
: annotations section,annotations
."training_sample"
: a copy of training sample data,training_sample
."iv_info"
: internal validation data,iv_info
."build_log"
: model training log,build_log
.
Note that the main model section is always included, so
sections="model"
andsections=[]
are equivalent.
-
training_sample
¶ Model training sample optionally stored with the model.
Type: list
New in version 6.6.
If GTApprox/StoreTrainingSample was enabled when training the model, this attribute contains a copy of training data. Otherwise it will be an empty list.
Training data is a single dict element contained in the list. This dictionary has the following keys:
- "x" — the input part of the training sample (values of variables).
- "f" — the response part of the training sample (function values).
- "tol" — response noise variance. This key is present only if output noise variance was specified when training.
- "weights" — sample point weights. This key is present only if point weights were specified when training.
- "x_test" — the input part of the test sample (added in 6.8). This key is present only if a test sample was used when training.
- "f_test" — the response part of the test sample (added in 6.8). This key is present only if a test sample was used when training.
Note that in case of GBRT incremental training (see Incremental Training) only the last (most recent) training sample can be saved.
Note
Training sample data is stored in lightweight NumPy arrays that have limited lifetime, which cannot exceed the lifetime of the model object. It means that you should avoid assigning these arrays to new variables. Either use them directly, or if you want to read this data without keeping the model object, create copies of arrays:
train_x = my_model.training_sample[0]["x"].copy()
.
-
validate
(pointsX, pointsY, weights=None)¶ Validate the model using a reference inputs-responses array.
Parameters: - pointsX (
float
or array-like, 1D or 2D) – reference inputs - pointsY (
float
or array-like, 1D or 2D) – reference responses - weights (array-like, 1D) – optional weights of the reference points
Returns: accuracy data
Return type: dict
Validates the model against the reference array, evaluating model responses to pointsX and comparing them to pointsY.
Generally, pointsX and pointsY should be 2D arrays. Several simplified argument forms are also supported, similar to
calc()
.Returns a dictionary containing lists of error values calculated componentwise, with names of errors as keys. The returned dictionary has the same structure as the
details["Training Dataset"]["Accuracy"]["Componentwise"]
dictionary indetails
— see section Accuracy in Model Details for a full description.- pointsX (
-
11.4.5. Utilities — auxiliary functions¶
-
class
da.p7core.gtapprox.
Utilities
¶ Utility functions.
-
static
checkTensorStructure
(trainPoints, userDefinedFactors=())¶ Check if the source data has proper structure so the Tensor Approximation technique may be used.
Parameters: - trainPoints (array-like) – training sample (variables only)
- userDefinedFactors (array-like) – optional user-defined tensor factors, as in GTApprox/TensorFactors
Returns: check result and (if no user-defined factors are given) calculated tensor factors, as a tuple
Return type: tuple(bool, list[list])
The Tensor Approximation technique requires a specific design of experiment (the so-called gridded data). This function may be used to check whether the sample data structure allows TA usage. Supply the training sample and, optionally, a list of proposed tensor factors. The return value is a tuple of a Boolean check result (True means the sample is TA-compatible) and a list of tensor factors, which are either user-defined or calculated automatically if userDefinedFactors is an empty list.
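A minimal sketch of the check on a small full factorial (gridded) sample:
import itertools
import numpy as np
from da.p7core import gtapprox

# every combination of the factor levels forms a gridded sample
levels_a = [0.0, 0.5, 1.0]
levels_b = [10.0, 20.0]
x_grid = np.array(list(itertools.product(levels_a, levels_b)))

is_gridded, factors = gtapprox.Utilities.checkTensorStructure(x_grid)
print(is_gridded)   # True: the sample is TA-compatible
print(factors)      # tensor factors detected automatically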
-
11.4.6. Functions¶
-
da.p7core.gtapprox.
export_fmi_20
(model, file, id=None, der_outputs=False, meta=None, inputs_meta=None, outputs_meta=None, compilers=None, single_file=None)¶ Export the model to a Functional Mock-up Unit for Model Exchange and Co-Simulation 2.0.
Parameters: - model (
Model
) – exported model - file (
file
orstr
) – file object or path where to export - id (
str
) – a string used in model and function names - der_outputs (
bool
) – ifTrue
, include partial derivatives of model outputs in the list of FMI model outputs - meta (
dict
) – model information - inputs_meta (
list
) – input variable information - outputs_meta (
list
) – output variable information - compilers (
dict
) – compiler settings to export an FMU with binary - single_file (
bool
) – pass sources to compilers as a single file (default) or multiple files (False
)
Returns: description of model variables
Return type: list
New in version 6.31.
According to the FMI standard, in an FMU with source code the same string (id) is used as the model name and as a prefix in function names in the model code. If file is a path, omit id as it will be generated from the filename in the path. In this case, the filename must be a valid C identifier. If file is a file object, id is required and must be a valid C identifier. If you specify both file and id, the filename and id must be identical. The file extension should be
.fmu
by standard.For the general model description, use meta. This argument is a dictionary that may contain the following keys:
"name"
: a string with the name of the model that will be shown in the modeling environment."description"
: a string with a brief model description; if omitted, the model’scomment
is used."naming_convention"
: name convention used for names of variables. For details, see the FMI documentation. Currently included in the standard are:"flat"
: a list of strings (default)."structured"
: hierarchical names using dot separator, with array elements and derivative characterization.
"author"
: a string containing the author’s name and organization."version"
: model version string."copyright"
: optional information on the intellectual property copyright for this FMU."license"
: optional information on the intellectual property licensing for this FMU.
For variable description, use inputs_meta and outputs_meta. If specified, variable description must be a list with length
size_x
(orsize_f
respectively). List element is a dictionary with the following keys (all keys are optional):"name"
: name of the variable (string), optional. Default is"x[i]"
for inputs,"f[i]"
for outputs, wherei
is the index of this input or output in the training sample."description"
: a string containing brief variable description."quantity"
: physical quantity of the variable, for example"Angle"
or"Energy"
."unit"
: measurement units used for this variable in model equations, for example"deg"
or"J"
."min"
: the minimum value of the variable (float
) or"training"
to use the minimum value from the model’s training sample. If omitted, the minimum is the largest negative number that can be represented on the machine."max"
: the maximum value of the variable (float
) or"training"
to use the training sample value. If omitted, the maximum is the largest positive number that can be represented on the machine.
If some or all details for a variable are not specified, GTApprox also tries to get them from model’s
details
. If some details are specified both indetails
and as parameters toexport_fmi_20()
, information from parameters takes priority.By default, an FMU is exported with source code only. To export an FMU with binaries for one or more of the standard FMI platforms, specify compilers.
- A key in compilers is a string identifying the target platform.
Recognized platform names are:
"win32"
,"win64"
,"linux32"
,"linux64"
. You can add compilers for different platforms to export an FMU with cross-platform support. - A value in compilers is a Python callable object that implements an FMU compiler for the platform specified by key.
Each callable in compilers should support three input parameters:
- source_code - the source code to compile.
- If single_file is
True
or not specified, source_code is a string. - If single_file is
False
, source_code is a list of string pairs(file_name, source_code)
. Thefile_name
andsource_code
strings are the name and source code of a C translation unit. Together, these translation units form the FMU source code.
- If single_file is
- model_id is the model identifier and the name of the shared library (a
.dll
or.so
file). - platform is the platform identifier, one of the following strings:
"win32"
,"win64"
,"linux32"
,"linux64"
.
Each callable in compilers must return a string containing the binary code of the compiled shared library. Note that any exception raised by the callable is re-raised by
export_fmi_20()
.On successful export,
export_fmi_20()
returns a description of the exported model variables in terms of FMI standard. The description is a list of dictionaries with the following keys:"name"
: the name of the variable."causality"
:"input"
or"output"
; indicates how the variable is visible from the outside of the model."variability"
:"constant"
or"parameter"
; indicates when the value of the variable changes."type"
:"real"
or"enum"
; indicates type of the variable."value"
:"real"
or"constant"
; omitted for other types of variables."enumerators"
: list of enumerators if variable type is"enum"
; omitted for other types of variables."origin"
: a tuple of two integers that are 0-based indices of the original model input and output components related to this FMU parameter:(j, -1)
is the j-th component of the original model input.(-1, i)
is the i-th component of the original model output.(j, i)
is the partial derivative of the i-th model output with respect to j-th input.
- model (
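A minimal sketch of exporting an FMU with a Linux binary; the model file is hypothetical, and the gcc-based compiler callable is only an illustration of the interface expected by the compilers argument:
import os
import subprocess
import tempfile
from da.p7core import gtapprox

model = gtapprox.Model("my_model.gtapprox")            # hypothetical saved model

def compile_linux64(source_code, model_id, platform):
    # illustrative single-file compiler: builds a shared library with gcc
    # and returns its contents, as export_fmi_20 expects from the callable
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, model_id + ".c")
        lib = os.path.join(tmp, model_id + ".so")
        with open(src, "w") as f:
            f.write(source_code)
        subprocess.check_call(["gcc", "-shared", "-fPIC", "-O2", src, "-o", lib])
        with open(lib, "rb") as f:
            return f.read()

variables = gtapprox.export_fmi_20(
    model, "my_model.fmu",                             # id is generated from the file name
    meta={"name": "My model", "author": "Docs example"},
    compilers={"linux64": compile_linux64},
)
print([v["name"] for v in variables])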
-
da.p7core.gtapprox.
export_fmi_cs
(model, file, id=None, der_outputs=False, meta=None, inputs_meta=None, outputs_meta=None, compilers=None, single_file=None)¶ Export the model to a Functional Mock-up Unit for Co-Simulation 1.0.
Parameters: - model (
Model
) – exported model - file (
file
orstr
) – file object or path where to export - id (
str
) – a string used in model and function names - der_outputs (
bool
) – ifTrue
, include partial derivatives of model outputs in the list of FMI model outputs - meta (
dict
) – model information - inputs_meta (
list
) – input variable information - outputs_meta (
list
) – output variable information - compilers (
dict
) – compiler settings to export an FMU with binary - single_file (
bool
) – pass sources to compilers as a single file (default) or multiple files (False
)
Returns: description of model variables
Return type: list
New in version 6.9.
Changed in version 6.24: added the single_file parameter.
According to the FMI standard, in an FMU with source code the same string (id) is used as the model name and as a prefix in function names in the model code. If file is a path, omit id as it will be generated from the filename in the path. In this case, the filename must be a valid C identifier. If file is a file object, id is required and must be a valid C identifier. If you specify both file and id, the filename and id must be identical. The file extension should be
.fmu
by standard.For the general model description, use meta. This argument is a dictionary that may contain the following keys:
"name"
: a string with the name of the model that will be shown in the modeling environment."description"
: a string with a brief model description; if omitted, the model’scomment
is used."naming_convention"
: name convention used for names of variables. For details, see the FMI documentation. Currently included in the standard are:"flat"
: a list of strings (default)."structured"
: hierarchical names using dot separator, with array elements and derivative characterization.
"author"
: an string containing author’s name and organization."version"
: model version string.
For variable description, use inputs_meta and outputs_meta. If specified, variable description must be a list with length
size_x
(orsize_f
respectively). List element is a dictionary with the following keys (all keys are optional):"name"
: name of the variable (string), optional. Default is"x[i]"
for inputs,"f[i]"
for outputs, wherei
is the index of this input or output in the training sample."description"
: a string containing brief variable description."quantity"
: physical quantity of the variable, for example"Angle"
or"Energy"
."unit"
: measurement units used for this variable in model equations, for example"deg"
or"J"
."min"
: the minimum value of the variable (float
) or"training"
to use the minimum value from the model’s training sample. If omitted, the minimum is the largest negative number that can be represented on the machine."max"
: the maximum value of the variable (float
) or"training"
to use the training sample value. If omitted, the maximum is the largest positive number that can be represented on the machine.
If some or all details for a variable are not specified, GTApprox also tries to get them from model’s
details
. If some details are specified both indetails
and as parameters toexport_fmi_cs()
, information from parameters takes priority.By default, an FMU is exported with source code only. To export an FMU with binaries for one or more of the standard FMI platforms, specify compilers.
- A key in compilers is a string identifying the target platform.
Recognized platform names are:
"win32"
,"win64"
,"linux32"
,"linux64"
. You can add compilers for different platforms to export an FMU with cross-platform support. - A value in compilers is a Python callable object that implements an FMU compiler for the platform specified by key.
Each callable in compilers should support three input parameters:
- source_code - the source code to compile.
- If single_file is
True
or not specified, source_code is a string. - If single_file is
False
, source_code is a list of string pairs(file_name, source_code)
. Thefile_name
andsource_code
strings are the name and source code of a C translation unit. Together, these translation units form the FMU source code.
- If single_file is
- model_id is the model identifier and the name of the shared library (a
.dll
or.so
file). - platform is the platform identifier, one of the following strings:
"win32"
,"win64"
,"linux32"
,"linux64"
.
Each callable in compilers must return a string containing the binary code of the compiled shared library. Note that any exception raised by the callable is re-raised by
export_fmi_cs()
.On successful export,
export_fmi_cs()
returns a description of the exported model variables in terms of FMI standard. The description is a list of dictionaries with the following keys:"name"
: the name of the variable."causality"
:"input"
or"output"
; indicates how the variable is visible from the outside of the model."variability"
:"constant"
or"parameter"
; indicates when the value of the variable changes."type"
:"real"
or"enum"
; indicates type of the variable."value"
:"real"
or"constant"
; omitted for other types of variables."enumerators"
: list of enumerators if variable type is"enum"
; omitted for other types of variables."origin"
: a tuple of two integers that are 0-based indices of the original model input and output components related to this FMU parameter:(j, -1)
is the j-th component of the original model input.(-1, i)
is the i-th component of the original model output.(j, i)
is the partial derivative of the i-th model output with respect to j-th input.
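For illustration, here is a minimal sketch of exporting a Co-Simulation FMU with model and variable metadata. The toy training data, variable names, physical quantities, and file name are assumptions made for the example only.

import numpy as np
from da.p7core import gtapprox

# Train a small two-input model just to have something to export.
x = np.random.rand(50, 2)
y = np.sin(x[:, 0]) + 2.0 * x[:, 1]
model = gtapprox.Builder().build(x, y)

meta = {"name": "ToySurrogate",
        "description": "GTApprox surrogate of a toy response",
        "author": "Jane Doe, ACME",
        "version": "1.0"}
inputs_meta = [{"name": "alpha", "quantity": "Angle", "unit": "deg",
                "min": "training", "max": "training"},
               {"name": "load", "min": 0.0, "max": 2.0}]
outputs_meta = [{"name": "response", "description": "toy model response"}]

# The filename stem (ToySurrogate) is a valid C identifier, so id can be omitted.
variables = gtapprox.export_fmi_cs(model, "ToySurrogate.fmu", meta=meta,
                                   inputs_meta=inputs_meta, outputs_meta=outputs_meta)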
da.p7core.gtapprox.export_fmi_me(model, file, id=None, der_outputs=False, meta=None, inputs_meta=None, outputs_meta=None, compilers=None, single_file=None)¶
Export the model to a Functional Mock-up Unit for Model Exchange 1.0.

Parameters:
- model (Model) – exported model
- file (file or str) – file object or path where to export
- id (str) – a string used in model and function names
- der_outputs (bool) – if True, include partial derivatives of model outputs in the list of FMI model outputs
- meta (dict) – model information
- inputs_meta (list) – input variable information
- outputs_meta (list) – output variable information
- compilers (dict) – compiler settings to export an FMU with binaries
- single_file (bool) – pass sources to compilers as a single file (default) or multiple files (False)

Returns: description of model variables

Return type: list

New in version 6.14.3.

Changed in version 6.24: added the single_file parameter.

According to the FMI standard, in an FMU with source code the same string (id) is used as the model name and as a prefix in function names in the model code. If file is a path, omit id as it will be generated from the filename in the path. In this case, the filename must be a valid C identifier. If file is a file object, id is required and must be a valid C identifier. If you specify both file and id, the filename and id must be identical. The file extension should be .fmu by standard.

For the general model description, use meta. This argument is a dictionary that may contain the following keys:

- "name": a string with the name of the model that will be shown in the modeling environment.
- "description": a string with a brief model description; if omitted, the model's comment is used.
- "naming_convention": name convention used for names of variables. For details, see the FMI documentation. Currently included in the standard are:
  - "flat": a list of strings (default).
  - "structured": hierarchical names using dot separator, with array elements and derivative characterization.
- "author": a string containing the author's name and organization.
- "version": model version string.

For variable description, use inputs_meta and outputs_meta. If specified, the variable description must be a list with length size_x (or size_f respectively). Each list element is a dictionary with the following keys (all keys are optional):

- "name": name of the variable (string), optional. Default is "x[i]" for inputs, "f[i]" for outputs, where i is the index of this input or output in the training sample.
- "description": a string containing a brief variable description.
- "quantity": physical quantity of the variable, for example "Angle" or "Energy".
- "unit": measurement units used for this variable in model equations, for example "deg" or "J".
- "min": the minimum value of the variable (float) or "training" to use the minimum value from the model's training sample. If omitted, the minimum is the largest negative number that can be represented on the machine.
- "max": the maximum value of the variable (float) or "training" to use the training sample value. If omitted, the maximum is the largest positive number that can be represented on the machine.

If some or all details for a variable are not specified, GTApprox also tries to get them from model's details. If some details are specified both in details and as parameters to export_fmi_me(), information from parameters takes priority.

By default, an FMU is exported with source code only. To export an FMU with binaries for one or more of the standard FMI platforms, specify compilers.

- A key in compilers is a string identifying the target platform. Recognized platform names are: "win32", "win64", "linux32", "linux64". You can add compilers for different platforms to export an FMU with cross-platform support.
- A value in compilers is a Python callable object that implements an FMU compiler for the platform specified by key.

Each callable in compilers should support three input parameters:

- source_code - the source code to compile.
  - If single_file is True or not specified, source_code is a string.
  - If single_file is False, source_code is a list of string pairs (file_name, source_code). The file_name and source_code strings are the name and source code of a C translation unit. Together, these translation units form the FMU source code.
- model_id - the model identifier and the name of the shared library (a .dll or .so file).
- platform - the platform identifier, one of the following strings: "win32", "win64", "linux32", "linux64".

Each callable in compilers must return a string containing the binary code of the compiled shared library. Note that any exception raised by the callable is re-raised by export_fmi_me().

On successful export, export_fmi_me() returns a description of the exported model variables in terms of FMI standard. The description is a list of dictionaries with the following keys:

- "name": the name of the variable.
- "causality": "input" or "output"; indicates how the variable is visible from the outside of the model.
- "variability": "constant" or "parameter"; indicates when the value of the variable changes.
- "type": "real" or "enum"; indicates type of the variable.
- "value": "real" or "constant"; omitted for other types of variables.
- "enumerators": list of enumerators if variable type is "enum"; omitted for other types of variables.
- "origin": a tuple of two integers that are 0-based indices of the original model input and output components related to this FMU parameter:
  - (j, -1) is the j-th component of the original model input.
  - (-1, i) is the i-th component of the original model output.
  - (j, i) is the partial derivative of the i-th model output with respect to j-th input.
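As an example of working with the returned description, the sketch below exports a Model Exchange FMU with derivative outputs and prints where each FMI variable comes from. It assumes model is an already trained Model, for example the one from the previous sketch; the file name is a placeholder.

variables = gtapprox.export_fmi_me(model, "ToySurrogateME.fmu", der_outputs=True)
for var in variables:
    j, i = var["origin"]
    if var["causality"] == "input":
        # (j, -1): FMI input mapped to the j-th model input.
        print("%s is model input #%d" % (var["name"], j))
    elif j >= 0:
        # (j, i): partial derivative of the i-th output with respect to the j-th input.
        print("%s is the derivative of output #%d with respect to input #%d" % (var["name"], i, j))
    else:
        # (-1, i): FMI output mapped to the i-th model output.
        print("%s is model output #%d" % (var["name"], i))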
da.p7core.gtapprox.set_remote_build(builder, options={}, config_file=None)¶
Configure a GTApprox builder to run on a remote host over SSH or to use an HPC cluster.

Parameters:
- builder (Builder) – model builder
- options (dict) – configuration options
- config_file (str) – optional path to a configuration file
New in version 4.3: initial support for remote model training and distributed training of MoA models on a cluster.

New in version 5.3: distributed training now supported for all componentwise models.

Changed in version 6.3: GTApprox now enables componentwise training by default, hence distributed training also becomes the default when using a cluster.

New in version 6.6: for models with categorical variables, distributed training now supports parallelization over all unique combinations of their values found in the training sample.

Deprecated since version 6.35: this function is no longer updated and may be behind build() and build_smart() with regard to certain features or training techniques; using it is not recommended as it may be removed in future versions.

Allows you to configure a model builder to run remotely or to perform distributed model training on a cluster. Distributed training on a cluster means that a model is divided into several sub-models which become separate cluster jobs, allowing a high degree of parallelization.
Note

The same version of pSeven Core has to be installed on the local and remote hosts or, in case of distributed training, on the local host and all cluster nodes.

Note

Remote training requires the paramiko module and its dependencies (pycrypto and ecdsa). These modules are not required for pSeven Core in general and hence are not listed in section System Requirements.

Distributed training is effective in the following cases:
- When using the Mixture of Approximators (MoA) technique (set GTApprox/Technique to "MoA"). This technique automatically partitions the training sample and trains several sub-models which are then combined in the final model. Naturally it can support distributed training for its sub-models.
- When a model has multidimensional output and componentwise training is enabled. The componentwise mode is the default since 6.3 (see GTApprox/DependentOutputs). Componentwise models can be trained in parallel since each model component is trained independently.
- When you define one or more categorical variables (see GTApprox/CategoricalVariables) and the training sample contains two or more unique combinations of their values. In this case, an independent model can be trained for each such combination.

Note that a combination of the above cases is also supported; that is, GTApprox tries to achieve as high a parallelization ratio as possible. For example, if you train a componentwise model with categorical variables, the ratio can be higher than the number of model outputs.

If none of the above cases apply, cluster training is still available but will simply submit a single job to the cluster.
The options argument is a dictionary with the following recognized keys (all keys are str, value types are noted below):

- "ssh-hostname" (str) — remote SSH host name.
- "ssh-username" (str) — SSH username.
- "ssh-password" (str) — SSH password (warning: unsafe).
- "ssh-keyfile" (str) — path to an SSH private key file.
- "environment" (dict) — dictionary of environment variables.
- "workdir" (str) — path to the working directory (local or remote, depending on SSH configuration).
- "cluster" (str) — cluster type. Currently the only supported type is LSF ("lsf"). If the cluster type is None, the model is trained on a remote host without using an HPC cluster.
- "cluster-queue" (str) — name of the destination cluster queue.
- "cluster-job-name" (str) — cluster job name.
- "cluster-exclusive" (bool) — if True, cluster nodes are used exclusively by jobs (the destination queue must support exclusive jobs). Note that if exclusive jobs are disabled (False), it is recommended to set GTApprox/MaxParallel to 1 or 2 (in builder options) to avoid performance degradation in case two or more jobs are allocated to the same node by the cluster manager. See section Multi-core Scalability for details.
- "cluster-slot-limit" (int) — maximum number of jobs that can run simultaneously.
To train a model remotely over SSH, you have to specify "ssh-hostname" and either:

- "ssh-username" and "ssh-password", or
- "ssh-keyfile" ("ssh-username" may also be required when using a key file).

Using a key file is recommended since storing an SSH password in your script is unsafe. If you have no key file, you can use the standard getpass module as a workaround. For example:

import getpass
from da.p7core import gtapprox

builder = gtapprox.Builder()
# Prompts for the password; getpass() requires interactive input.
gtapprox.set_remote_build(builder, {"ssh-hostname": "theserver",
                                    "ssh-username": "user",
                                    "ssh-password": getpass.getpass()})
To use a cluster, you have to specify "cluster"; "cluster-queue" and "cluster-job-name" may also be required, depending on your cluster manager configuration. If you connect to the cluster submit node over SSH, also specify "ssh-username" and "ssh-password" or "ssh-keyfile". For example:

builder = gtapprox.Builder()
# Prompts for the password; getpass() requires interactive input.
gtapprox.set_remote_build(builder, {"ssh-hostname": "submit-node",
                                    "ssh-username": "user",
                                    "ssh-password": getpass.getpass(),
                                    "cluster": "lsf"})
Instead of options you can specify the path to a configuration file in config_file. You can also combine both: in this case option values are read from the file first, then from options. If a conflict occurs, values set by options override those specified in the configuration file.

The configuration file should contain options and values in JSON format, for example:

{
  "ssh-hostname": "submit-node",
  "ssh-username": "user",
  "ssh-password": "password",
  "environment": {"OMP_NUM_THREADS": 8, "SHELL": "/bin/bash -i"},
  "cluster": "lsf",
  "cluster-queue": "normal",
  "cluster-exclusive": true
}
da.p7core.gtapprox.disable_remote_build(builder)¶
Reset builder configuration to run on the local host only.

Parameters: builder (Builder) – model builder

Used to cancel the set_remote_build() configuration.
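A possible workflow sketch (host name, key file path, and toy data are placeholders): train one model remotely, then reset the builder and continue training locally.

import numpy as np
from da.p7core import gtapprox

x = np.random.rand(100, 3)
y = x.sum(axis=1)

builder = gtapprox.Builder()
gtapprox.set_remote_build(builder, {"ssh-hostname": "theserver",
                                    "ssh-username": "user",
                                    "ssh-keyfile": "/home/user/.ssh/id_rsa"})
remote_model = builder.build(x, y)

gtapprox.disable_remote_build(builder)   # back to local-only training
local_model = builder.build(x, y)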
da.p7core.gtapprox.train_test_split(x, y, train_size=None, test_size=None, options=None)¶
Split a data sample into train and test subsets optimized for model training.

Parameters:
- x (array-like, 1D or 2D) – sample inputs (values of variables)
- y (array-like, 1D or 2D) – sample responses (function values)
- train_size (int or float) – optional number of training points (int) or portion of the sample to include in the train subset (float)
- test_size (int or float) – optional number of test points (int) or portion of the sample to include in the test subset (float)
- options (dict) – option settings

Returns: tuple of train inputs, test inputs, train outputs and test outputs

Return type: tuple

Performs an optimized split of the given data sample into two subsets to be used as model training and validation (test) data. The distribution of points between the train and test subsets is optimized so that both provide a good representation of input and response variance, aiming to avoid the skew that a random split may introduce.
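A minimal usage sketch with toy data (the data, the option-free call, and the error metric are illustrative only):

import numpy as np
from da.p7core import gtapprox

x = np.random.rand(200, 2)
y = np.sin(x[:, 0]) + x[:, 1] ** 2

# Hold out 20% of the sample as an optimized test subset;
# the return order is train inputs, test inputs, train outputs, test outputs.
x_train, x_test, y_train, y_test = gtapprox.train_test_split(x, y, test_size=0.2)

model = gtapprox.Builder().build(x_train, y_train)
residuals = model.calc(x_test).ravel() - np.asarray(y_test).ravel()
print("test RMSE: %g" % np.sqrt(np.mean(residuals ** 2)))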