11.5. da.p7core.gtdf

Generic Tool for Data Fusion (GTDF) module.

>>> from da.p7core import gtdf

Classes

da.p7core.gtdf.Builder([backend]) Data fusion model builder.
da.p7core.gtdf.GradMatrixOrder Enumerates available gradient output modes.
da.p7core.gtdf.Model([file]) Data fusion model.

11.5.1. Builder — model builder

class da.p7core.gtdf.Builder(backend=None)

Data fusion model builder.

build(x_hf, f_hf, x_lf, f_lf, options=None, weights_hf=None, weights_lf=None, comment=None, annotations=None, x_meta=None, f_meta=None)

Train a sample-based data fusion model.

Parameters:
  • x_hf (array-like, 1D or 2D) – high fidelity training sample, input part (values of variables)
  • f_hf (array-like, 1D or 2D) – high fidelity training sample, response part (function values)
  • x_lf (array-like, 1D or 2D) – low fidelity training sample, input part (values of variables)
  • f_lf (array-like, 1D or 2D) – low fidelity training sample, response part (function values)
  • options (dict) – option settings
  • weights_hf (array-like, 1D) – optional weights of the high fidelity training sample points
  • weights_lf (array-like, 1D) – optional weights of the low fidelity training sample points
  • comment (str) – text comment
  • annotations (dict) – extended comment and notes
  • x_meta (list) – descriptions of inputs
  • f_meta (list) – descriptions of outputs
Returns:

trained model

Return type:

Model

Train a data fusion model using x_hf, f_hf and x_lf, f_lf as the high and low fidelity training samples, respectively. 1D samples are supported as a simplified form for the case of 1D input and/or response.

New in version 5.0: sample point weighting support.

Changed in version 6.25: pandas.DataFrame and pandas.Series are supported as training samples.

Some of the sample-based GTDF techniques support sample point weighting (weights_hf, weights_lf). Roughly, a point's weight is a relative confidence characteristic that affects the model's fit to the training sample: the model tries to fit points with greater weights better, possibly at the cost of decreased accuracy at points with lesser weights. Points with zero weight may be ignored completely when fitting the model.

Point weighting is supported in the following techniques:

  • Difference Approximation (DA).
  • High Fidelity Approximation (HFA). Note that HFA ignores weights_lf because it does not use the low fidelity sample.
  • Multiple Fidelity Gaussian Processes (MFGP), both in build() and build_MF().

That is, to use point weights meaningfully, one of the techniques above has to be selected using GTDF/Technique. If any other technique is selected, either manually or automatically, all weights are ignored.

Point weight is an arbitrary non-negative float value. This value has no specific meaning; it simply notes the relative “importance” of a point compared to other points in the training sample.

The weights_hf and weights_lf arguments are independent, so it is possible to specify only one of them. If specified, it should be a 1D array of point weights, and its length has to be equal to the number of points in the respective training sample.

Note

At least one weight has to be non-zero. If weights_hf or weights_lf is specified but contains only zero values, build() raises an InvalidProblemError exception.
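
For example, a minimal sketch of weighted training (the sample arrays and the choice of the DA technique are illustrative assumptions, not part of this reference):

import numpy as np
from da.p7core import gtdf

# Hypothetical 1D samples: the low fidelity function is a biased version of the high fidelity one.
x_lf = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
f_lf = np.sin(10.0 * x_lf) + 0.3
x_hf = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
f_hf = np.sin(10.0 * x_hf)

# Down-weight the last high fidelity point; all other points keep equal weights.
weights = np.ones(len(x_hf))
weights[-1] = 0.1

builder = gtdf.Builder()
model = builder.build(x_hf, f_hf, x_lf, f_lf,
                      options={"GTDF/Technique": "DA"},
                      weights_hf=weights)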

Changed in version 6.14: added the comment, annotations, x_meta, and f_meta parameters.

The comment and annotations parameters add optional notes to the model. The comment string is stored to the model’s comment. The annotations dictionary can contain more notes or other supplementary information; all keys and values in annotations must be strings. Annotations are stored to the model’s annotations. After training a model, you can also edit its comment and annotations using modify().

The x_meta and f_meta parameters add names and descriptions of model inputs and outputs. These parameters are lists of length equal to the number of inputs and outputs, respectively (the number of columns in the input and response parts of the training sample). Each list element can be a string (Unicode) or a dictionary. A string specifies a name for the respective input or output; it must be a valid identifier according to the FMI standard, so there are certain restrictions for names (see below). A dictionary describes a single input or output and can have the following keys (all keys are optional, all values must be str or unicode):

  • "name": contains the name for this input or output. If this key is omitted, default names will be saved to the model: "x[i]" for inputs, "f[i]" for outputs, where i is the index of the respective column in the training samples.
  • "description": contains a brief description, any text.
  • "quantity": physical quantity, for example "Angle" or "Energy".
  • "unit": measurement units used for this input or output, for example "deg" or "J".

Names of inputs and outputs must satisfy the following rules:

  • Name must not be empty.
  • All names must be unique. The same name for an input and an output is also prohibited.
  • The only whitespace character allowed in names is the ASCII space, so \t, \n, \r, and various Unicode whitespace characters are prohibited.
  • Name cannot contain leading or trailing spaces, and cannot contain two or more consecutive spaces.
  • Name cannot contain leading or trailing dots, and cannot contain two or more consecutive dots, since dots are commonly used as name separators.
  • Parts of the name separated by dots must not begin or end with a space, so the name cannot contain '. ' or ' .'.
  • Name cannot contain control characters and Unicode separators. Prohibited Unicode character categories are: Cc, Cf, Cn, Co, Cs, Zl, Zp, Zs.
  • Name cannot contain characters from this set: :"/\|?*.

Input and output descriptions are stored to model details (the "Input Variables" and "Output Variables" keys). If you do not specify a name or description for some input or output, its information in details contains only the default name ("x[i]" for inputs, "f[i]" for outputs).
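
For example, a sketch of input and output descriptions for a hypothetical model with two inputs and one output (the names, quantities, and units are illustrative; builder and the sample arrays are assumed to be defined as above):

x_meta = [
    {"name": "speed", "quantity": "Velocity", "unit": "m/s"},
    "altitude",  # a plain string specifies the name only
]
f_meta = [{"name": "lift", "description": "Lift coefficient"}]
model = builder.build(x_hf, f_hf, x_lf, f_lf, x_meta=x_meta, f_meta=f_meta)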

build_BB(x_hf, f_hf, blackbox, options=None, comment=None, annotations=None, x_meta=None, f_meta=None)

Train a blackbox-based data fusion model.

Parameters:
  • x_hf (array-like, 1D or 2D) – high fidelity training sample, input part (values of variables)
  • f_hf (array-like, 1D or 2D) – high fidelity training sample, response part (function values)
  • blackbox (Blackbox, gtapprox.Model, or gtdf.Model) – low fidelity blackbox
  • options (dict) – option settings
  • comment (str) – text comment
  • annotations (dict) – extended comment and notes
  • x_meta (list) – descriptions of inputs
  • f_meta (list) – descriptions of outputs
Returns:

trained model

Return type:

Model

Train a data fusion model using x_hf, f_hf as the high fidelity training sample and obtaining low-fidelity training points from the blackbox. 1D samples are supported as a simplified form for the case of 1D input and/or response.

Changed in version 6.14: added the comment, annotations, x_meta, and f_meta parameters.

Changed in version 6.20: blackbox may be a gtapprox.Model or a gtdf.Model.

Changed in version 6.25: pandas.DataFrame and pandas.Series are supported as training samples.

The comment and annotations parameters add optional notes to the model. The x_meta and f_meta parameters add names and descriptions to model inputs and outputs. See the full descriptions of these parameters in build().
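
For example, a minimal sketch of blackbox-based training. It assumes the da.p7core.blackbox.Blackbox subclassing interface with prepare_blackbox(), add_variable(), add_response(), and a batch evaluate() method; the functions and bounds are illustrative:

import numpy as np
from da.p7core import blackbox, gtdf

class LowFidelity(blackbox.Blackbox):
    def prepare_blackbox(self):
        # Variable bounds also limit the domain of blackbox-based evaluations.
        self.add_variable((0.0, 1.0))
        self.add_response()

    def evaluate(self, queryx):
        # Cheap, biased approximation of the true function.
        return [[float(np.sin(10.0 * x)) + 0.3] for x in np.atleast_2d(queryx)[:, 0]]

x_hf = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
f_hf = np.sin(10.0 * x_hf)
model = gtdf.Builder().build_BB(x_hf, f_hf, LowFidelity())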

build_MF(samples, options=None, comment=None, annotations=None, x_meta=None, f_meta=None)

Train a data fusion model using multiple training samples of different fidelity.

Parameters:
  • samples (list[dict]) – training samples, in order of increasing fidelity
  • options (dict) – option settings
  • comment (str) – text comment
  • annotations (dict) – extended comment and notes
  • x_meta (list) – descriptions of inputs
  • f_meta (list) – descriptions of outputs
Returns:

trained model

Return type:

Model

New in version 4.0.

Changed in version 6.25: pandas.DataFrame and pandas.Series are supported as training samples in samples.

This is a dedicated method for the Multiple Fidelity Gaussian Processes (MFGP) technique. It allows using more than two samples of different fidelity to train the model; internally this technique is a version of the Variable Fidelity Gaussian Processes technique (VFGP) updated to support more than one low-fidelity data set.

The samples argument is a list of training samples sorted in order of increasing fidelity, so the sample with maximum fidelity is the last. Each element is a dictionary with the following keys:

  • "x" — the input part of the training sample (values of variables).
  • "f" — the response part of the training sample (function values).
  • "tol" — response noise variance. Optional: the key may be omitted, or its value may be an explicit None. Incompatible with sample point weights.
  • "weights" — sample point weights. Optional: may be omitted or set to None. Incompatible with response noise variance.

For example:

s_low = {"x": x_low, "f": f_low, "tol": None}
s_higher = {"x": x_higher, "f": f_higher, "weights": pt_weights}
s_highest = {"x": x_highest, "f": f_highest, "tol": f_var}
samples = [s_low, s_higher, s_highest]

All dictionary values are array-like. Arrays in "x", "f", "tol" can be 1D or 2D, with 1D samples supported as a simplified form for the case of 1D input and/or response. The array in "weights" is always 1D.

If information on the noise level in the response sample (key "f") is available, it can be added to the sample dictionary under the "tol" key. The "tol" array should specify a noise variance value for each element of the "f" array (that is, for each response component of every single point). Thus the "tol" and "f" arrays are of the same shape. If noise variance data is not available for some points or output components, corresponding values in "tol" should be replaced with NaN.
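
For example, a hypothetical sample with two response components where noise variance is known only for the first component:

import numpy as np

x = np.array([[0.0], [0.5], [1.0]])
f = np.array([[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]])
# Same shape as f; NaN marks unknown noise variance.
tol = np.array([[0.01, np.nan], [0.02, np.nan], [0.01, np.nan]])
sample = {"x": x, "f": f, "tol": tol}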

Note

The response noise variance in build_MF() is similar to the outputNoiseVariance argument in da.p7core.gtapprox.Builder.build().

New in version 5.0: sample point weighting support.

MFGP supports sample point weighting. Roughly, a point's weight is a relative confidence characteristic that affects the model's fit to the training sample: the model tries to fit points with greater weights better, possibly at the cost of decreased accuracy at points with lesser weights. Points with zero weight may be ignored completely when fitting the model.

Point weight is an arbitrary non-negative float value. This value has no specific meaning; it simply notes the relative “importance” of a point compared to other points in the training sample.

If weights for a sample are available, they can be added to this sample dictionary under the "weights" key. The value should be a 1D array of point weights, and its length has to be equal to the number of points in this sample.

Note

At least one weight has to be non-zero. If there is a sample with all weights set to zero, build_MF() raises an InvalidProblemError exception.

Note

Point weighting is not compatible with output noise variance. If there is a sample with both "tol" and "weights" specified, build_MF() raises an InvalidProblemError exception.

Changed in version 6.14: added the comment, annotations, x_meta, and f_meta parameters.

The comment and annotations parameters add optional notes to the model. The x_meta and f_meta parameters add names and descriptions to model inputs and outputs. See the full descriptions of these parameters in build().

license

Builder license.

Type:License

General license information interface. See section License Usage for details.

options

Builder options.

Type:Options

General options interface for the builder. See section Options Interface for usage and the GTDF option reference.
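
For example, a brief sketch assuming the set() and get() methods described in the Options Interface section (the option values shown are illustrative):

from da.p7core import gtdf

builder = gtdf.Builder()
builder.options.set("GTDF/Technique", "MFGP")
builder.options.set("GTDF/AccuracyEvaluation", "on")
print(builder.options.get("GTDF/Technique"))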

set_logger(logger)

Set logger.

Parameters:logger – logger object
Returns:None

Used to set up a logger for the build process. See section Loggers for details.
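
For example, a sketch assuming the StdOutLogger class from the Loggers section:

from da.p7core import gtdf, loggers

builder = gtdf.Builder()
# StdOutLogger prints build progress to standard output.
builder.set_logger(loggers.StdOutLogger())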

set_watcher(watcher)

Set watcher.

Parameters:watcher – watcher object
Returns:None

Used to set up a watcher for the build process. See section Watchers for details.

11.5.2. GradMatrixOrder — model gradients order

class da.p7core.gtdf.GradMatrixOrder

Enumerates available gradient output modes.

F_MAJOR

Indexed in function-major order (\(grad_{ij} = \frac{df_i}{dx_j}\)).

X_MAJOR

Indexed in variable-major order (\(grad_{ij} = \frac{df_j}{dx_i}\)).
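
For example, assuming model is a trained gtdf.Model with two inputs:

from da.p7core import gtdf

# Variable-major gradients for a single point: g[i][j] = df_j/dx_i.
g = model.grad([0.5, 0.5], order=gtdf.GradMatrixOrder.X_MAJOR)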

11.5.3. Model — data fusion model

class da.p7core.gtdf.Model(file=None, **kwargs)

Data fusion model.

Can be created by Builder or loaded from a file via the Model constructor.

Model objects are immutable. All methods which are meant to change the model return a new Model instance.

annotations

Extended comment or supplementary information.

Type:dict

New in version 6.6.

The annotations dictionary can optionally contain any number of notes. All dictionary keys and values are strings. You can add annotations when training a model and edit them using modify().

static available_sections(**kwargs)

Get a list of available model sections.

Parameters:
  • file (file or str) – file object or path to load model from
  • string (str) – serialized model
  • model (Model) – model object
Returns:

available model sections

Return type:

list

New in version 6.11.

Returns a list of strings specifying which sections can be loaded from the model:

  • "model": main model section, required for model evaluation.
  • "info": model information, required for info.
  • "comment": comment section, required for comment.
  • "annotations": annotations section, required for annotations.
  • "training_sample": a copy of training sample data, required for training_sample.
  • "iv_info": internal validation data, required for iv_info.
  • "build_log": model training log, required for build_log.

See Data Fusion Model Structure for details.
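
For example, assuming a hypothetical model file model.gtdf:

from da.p7core import gtdf

sections = gtdf.Model.available_sections(file="model.gtdf")
if "iv_info" in sections:
    model = gtdf.Model("model.gtdf")
    print(model.iv_info)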

build_log

Model building log.

Type:str

calc(point, blackbox=None)

Evaluate the model.

Parameters:
  • point (float or array-like, 1D or 2D) – point or sample to evaluate
  • blackbox (Blackbox, gtapprox.Model, or gtdf.Model) – optional low fidelity blackbox
Returns:

model values

Return type:

pandas.DataFrame or pandas.Series if point is a pandas type; otherwise ndarray, 2D or 1D

Raise:

FeatureNotAvailableError if the model does not support blackbox-based calculations but blackbox is not None

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 5.1: blackbox support added.

Changed in version 6.20: blackbox may be a gtapprox.Model or a gtdf.Model.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.

Evaluates a data sample or a single point, optionally requesting low fidelity data from the blackbox (check has_bb before performing blackbox-based calculations). In general form, point is a 2D array-like (a data sample). Several simplified argument forms are also supported.

The returned array is 2D if point is a sample, and 1D if point is a single point. When point is a pandas.DataFrame or pandas.Series, the returned array keeps indexing of the point array.

  • In the case of 1D model input, a single float value is interpreted as a single point. A 1D array-like with a single element is also one point; other 1D array-likes are interpreted as a sample. A 2D array-like is always interpreted as a sample, even if it contains a single point actually. For example:

    model_1d.calc(0.0)             # a 1D point
    model_1d.calc([0.0])           # a 1D point
    model_1d.calc([[0.0]])         # a sample, one 1D point
    model_1d.calc([0.0, 1.0])      # a sample, two 1D points
    model_1d.calc([[0.0], [1.0]])  # a sample, two 1D points
    model_1d.calc([[0.0, 1.0]])    # incorrect: a sample with a single 2D point (model input is 1D)
    
  • If model input is multidimensional, a 1D array-like is interpreted as a single point, and 2D array-likes are interpreted as data samples. For example, if model input is 2D:

    model_2d.calc(0.0)                       # incorrect: point is 1D
    model_2d.calc([0.0])                     # incorrect: point is 1D
    model_2d.calc([[0.0]])                   # incorrect: sample contains one 1D point
    model_2d.calc([0.0, 0.0])                # a 2D point
    model_2d.calc([[0.0, 0.0]])              # a sample, one 2D point
    model_2d.calc([[0.0, 0.0], [1.0, 1.0]])  # a sample, two 2D points
    

Using a low-fidelity blackbox in model evaluations increases accuracy, but at the same time effectively limits the model domain to the blackbox domain: if the point to evaluate is outside the blackbox variable bounds (see bounds in da.p7core.blackbox.Blackbox.add_variable()), blackbox-based calc() returns NaN values of responses since it receives NaN responses from the blackbox.

calc_ae(point, blackbox=None)

Calculate the accuracy evaluation estimate.

Parameters:
  • point (float or array-like, 1D or 2D) – point or sample to evaluate
  • blackbox (Blackbox, gtapprox.Model, or gtdf.Model) – optional low fidelity blackbox
Returns:

estimates

Return type:

pandas.DataFrame or pandas.Series if point is a pandas type; otherwise ndarray, 2D or 1D

Raise:

FeatureNotAvailableError if the model does not provide accuracy evaluation

Raise:

FeatureNotAvailableError if the model does not support blackbox-based accuracy evaluation but blackbox is not None

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 5.1: blackbox support added.

Changed in version 6.20: blackbox may be a gtapprox.Model or a gtdf.Model.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.

Check has_ae before using this method. It is available only if the model was trained with GTDF/AccuracyEvaluation on.

Performs accuracy evaluation for a data sample or a single point, optionally requesting low fidelity data from the blackbox (check has_ae_bb before performing blackbox-based accuracy evaluation). In general form, point is a 2D array-like (a data sample). Several simplified argument forms are also supported, similar to calc().

The returned array is 2D if point is a sample, and 1D if point is a single point. When point is a pandas.DataFrame or pandas.Series, the returned array keeps indexing of the point array.

Using a low-fidelity blackbox in accuracy evaluation improves its quality, but at the same time effectively limits the model domain to the blackbox domain: if the point to evaluate is outside the blackbox variable bounds (see bounds in da.p7core.blackbox.Blackbox.add_variable()), blackbox-based calc_ae() returns NaN values of estimates since it receives NaN responses from the blackbox.

calc_ae_bb(blackbox, point)

Calculate the accuracy evaluation estimate, requesting low fidelity data from the blackbox.

Deprecated since version 5.1: use calc_ae() instead.

This method is deprecated since the blackbox support was added to calc_ae(), and is kept for compatibility only. It is recommended to use calc_ae() with the blackbox argument to perform blackbox-based accuracy evaluation.

calc_bb(blackbox, point)

Evaluate the model, requesting low fidelity data from the blackbox.

Deprecated since version 5.1: use calc() instead.

This method is deprecated since the blackbox support was added to calc(), and is kept for compatibility only. It is recommended to use calc() with the blackbox argument to perform blackbox-based model evaluations.

comment

Text comment to the model.

Type:str

New in version 6.6.

Optional plain text comment to the model. You can add the comment when training a model and edit it using modify().

details

Detailed model information.

Type:dict

New in version 6.14.

A detailed description of the model. Includes model metainformation, accuracy data, training sample statistics, and other data.

The gtdf.Model.details dictionary structure is generally the same as the gtapprox.Model.details structure (described in section Model Details), with the following exceptions:

  • Training dataset information (the structure under details["Training Dataset"], see Training Dataset Information) for GTDF models is a list of dictionaries. Each of these dictionaries describes one of the training samples, in the order of increasing fidelity — so the highest fidelity sample is the last (details["Training Dataset"][-1]).
  • GTDF model accuracy data (see Accuracy) is available only for the highest fidelity sample.
  • Regression model information and model decomposition are not applicable to GTDF models, so the details["Regression Model"] and details["Model Decomposition"] keys never exist in gtdf.Model.details.

Also, in GTDF models trained with the DA or DA_BB technique by outdated pSeven Core versions, the sample statistics dictionaries may omit the "Output" key, as this information is not available from the model.

fromstring(modelString, sections='all')

Deserialize a model from string.

Parameters:
  • modelString (str) – serialized model
  • sections (list or str) – model sections to load
Returns:

None

Changed in version 6.6: added the sections argument.

A model can be loaded (deserialized) partially, omitting certain sections to reduce memory usage. Note that availability of Model methods and attributes depends on which sections are loaded. This dependency is described in more detail in section Data Fusion Model Structure.

The sections argument can be a string or a list of strings specifying which sections to load:

  • "all": all sections (default).
  • "none": minimum model information, does not load any other section (the minimum load).
  • "model": main model section, required for model evaluation and smoothing methods.
  • "info": model information, required for info.
  • "comment": comment section, required for comment.
  • "annotations": annotations section, required for annotations.
  • "training_sample": a copy of training sample data, required for training_sample.
  • "iv_info": internal validation data, required for iv_info.
  • "build_log": model training log, required for build_log.

To get a list of sections available for load, use available_sections().
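
For example, a round-trip sketch that serializes only the evaluation and info sections (model is a previously trained gtdf.Model):

from da.p7core import gtdf

s = model.tostring(sections=["model", "info"])
restored = gtdf.Model()
restored.fromstring(s)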

grad(point, order=0, blackbox=None)

Evaluate model gradient.

Parameters:
  • point (float or array-like, 1D or 2D) – point or sample to evaluate
  • order (GradMatrixOrder) – gradient matrix ordering
  • blackbox (Blackbox, gtapprox.Model, or gtdf.Model) – optional low fidelity blackbox
Returns:

model gradients

Return type:

pandas.DataFrame if point is a pandas type; otherwise ndarray, 3D or 2D

Raise:

FeatureNotAvailableError if the model does not support blackbox-based gradient calculation but blackbox is not None

Changed in version 3.0 Release Candidate 1: returns ndarray.

Changed in version 5.1: blackbox support added.

Changed in version 6.20: blackbox may be a gtapprox.Model or a gtdf.Model.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.

Evaluates model gradients for a data sample or a single point, optionally requesting low fidelity data from the blackbox (check has_bb before performing blackbox-based gradient calculation). In general form, point is a 2D array-like (a data sample). Several simplified argument forms are also supported, similar to calc().

The returned array is 3D if point is a sample, and 2D if point is a single point.

When using pandas data samples (point is a pandas.DataFrame), the 3D return value is represented by a pandas.DataFrame with multi-indexing (pandas.MultiIndex). In this case, the first element of the multi-index is the point index from the input sample, and the second element of the multi-index is:

  • the index or name of a model’s output, if order is F_MAJOR (default)
  • the index or name of a model’s input, if order is X_MAJOR

When point is a pandas.Series, its index becomes the row index of the returned pandas.DataFrame.

Using a low-fidelity blackbox in model gradient evaluations increases accuracy, but at the same time effectively limits the model domain to the blackbox domain: if the point to evaluate is outside the blackbox variable bounds (see bounds in da.p7core.blackbox.Blackbox.add_variable()), blackbox-based grad() returns NaN values of gradients since it receives NaN responses from the blackbox.

grad_ae(point, order=0, blackbox=None)

Calculate gradients of the accuracy evaluation function.

Parameters:
  • point (float or array-like, 1D or 2D) – point or sample to evaluate
  • order (GradMatrixOrder) – gradient matrix ordering
  • blackbox (Blackbox, gtapprox.Model, or gtdf.Model) – optional low fidelity blackbox
Returns:

accuracy evaluation gradients

Return type:

pandas.DataFrame if point is a pandas type; otherwise ndarray, 3D or 2D

Raise:

FeatureNotAvailableError if the model does not provide accuracy evaluation

New in version 6.18.

Changed in version 6.20: blackbox may be a gtapprox.Model or a gtdf.Model.

Changed in version 6.25: supports pandas.DataFrame and pandas.Series as 2D and 1D array-likes, respectively; returns a pandas data type if point is a pandas data type.

Check has_ae before using this method. It is available only if the model was trained with GTDF/AccuracyEvaluation on.

Evaluates gradients of the accuracy evaluation function for a data sample or a single point, optionally requesting low fidelity data from the blackbox (check has_ae_bb before performing blackbox-based calculation of accuracy evaluation gradients). In general form, point is a 2D array-like (a data sample). Several simplified argument forms are also supported, similar to calc().

The returned array is 3D if point is a sample, and 2D if point is a single point.

When using pandas data samples (point is a pandas.DataFrame), the 3D return value is represented by a pandas.DataFrame with multi-indexing (pandas.MultiIndex). In this case, the first element of the multi-index is the point index from the input sample, and the second element of the multi-index is:

  • the index or name of a model’s output, if order is F_MAJOR (default)
  • the index or name of a model’s input, if order is X_MAJOR

When point is a pandas.Series, its index becomes the row index of the returned pandas.DataFrame.

grad_bb(blackbox, point, order=0)

Evaluate model gradient, requesting low fidelity data from the blackbox.

Deprecated since version 5.1: use grad() instead.

This method is deprecated since the blackbox support was added to grad(), and is kept for compatibility only. It is recommended to use grad() with the blackbox argument to perform blackbox-based model gradient evaluation.

has_ae

Accuracy evaluation support.

Type:bool

Check this attribute before using calc_ae(). If True, the model supports accuracy evaluation. If False, then accuracy evaluation is not available, and calc_ae() raises an exception.

has_ae_bb

Blackbox-based accuracy evaluation support.

Type:bool

New in version 5.1.

Check this attribute before using calc_ae() with the blackbox argument. If True, the model supports blackbox-based accuracy evaluation. If False, then blackbox-based accuracy evaluation is not available, and calc_ae() raises an exception if blackbox is not None.

has_bb

Blackbox-based model evaluation support.

Type:bool

New in version 5.1.

Check this attribute before using calc() or grad() with the blackbox argument. If True, the model supports blackbox-based evaluation and gradient calculation. If False, then blackbox-based calculations are not available. In this case calc() and grad() raise an exception if blackbox is not None.
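
For example, a guard sketch (model, lf_blackbox, and x are assumed to be defined elsewhere):

# Fall back to plain evaluation if blackbox-based calculations are unsupported.
f = model.calc(x, blackbox=lf_blackbox) if model.has_bb else model.calc(x)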

info

Model description.

Type:dict

Contains all technical information which can be gathered from the model, including error evaluation.

iv_info

Internal validation results.

Type:dict

New in version 3.0 Beta 1.

A dictionary containing error values calculated during internal validation. Has the same structure as the details["Training Dataset"]["Accuracy"] dictionary in GTApprox model details — see section Accuracy in Model Details for a full description.

Additionally, if the model was trained with GTDF/IVSavePredictions on, iv_info also contains raw validation data: model values calculated during internal validation, reference inputs, and reference outputs. This data is stored under the "Dataset" key.

If internal validation was not required when training the model (see GTDF/InternalValidation), iv_info is an empty dictionary.

license

Model license.

Type:License

General license information interface. See section License Usage for details.

load(file, sections='all')

Load a model from file.

Parameters:
  • file (file or str) – file object or path
  • sections (list or str) – model sections to load
Returns:

None

Changed in version 6.6: added the sections argument.

Deprecated since version 6.29: use Model constructor instead.

A model can be loaded partially, omitting certain sections to reduce memory usage and load time. Note that availability of Model methods and attributes depends on which sections are loaded. This dependency is described in more detail in section Data Fusion Model Structure.

The sections argument can be a string or a list of strings specifying which sections to load:

  • "all": all sections (default).
  • "none": minimum model information, does not load any other section (the minimum load).
  • "model": main model section, required for model evaluation and smoothing methods.
  • "info": model information, required for info.
  • "comment": comment section, required for comment.
  • "annotations": annotations section, required for annotations.
  • "training_sample": a copy of training sample data, required for training_sample.
  • "iv_info": internal validation data, required for iv_info.
  • "build_log": model training log, required for build_log.

To get a list of sections available for load, use available_sections().
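
For example, a sketch contrasting the deprecated method with the preferred constructor form (the file path is hypothetical):

from da.p7core import gtdf

# Deprecated since 6.29:
model = gtdf.Model()
model.load("model.gtdf", sections=["model", "info"])

# Preferred:
model = gtdf.Model("model.gtdf")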

modify(comment=None, annotations=None, x_meta=None, f_meta=None)

Create a copy of the model with modified metainformation.

Parameters:
  • comment (str) – new comment
  • annotations (dict) – new annotations
  • x_meta (list) – descriptions of inputs
  • f_meta (list) – descriptions of outputs
Returns:

copy of this model with modifications

Return type:

Model

New in version 6.6.

Changed in version 6.14: can edit descriptions of inputs and outputs.

This method is intended to edit model annotations, comment, and input and output descriptions found in details. Parameters are similar to build() — see the full description there. If a parameter is None, corresponding information in the modified model remains unchanged. If you specify a parameter, corresponding information in the modified model is fully replaced.

Note that modify() returns a new modified model, which is identical to the original except for your edits to the model metainformation.
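
For example, a sketch with hypothetical metainformation (model is a previously trained gtdf.Model):

tagged = model.modify(comment="Wing lift surrogate",
                      annotations={"author": "jdoe", "revision": "2"})
# model itself is unchanged; tagged carries the edited metainformation.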

save(file, sections='all')

Save the model to file.

Parameters:
  • file (file or str) – file object or path
  • sections (list or str) – model sections to save
Returns:

None

Changed in version 6.6: sections argument added.

When saving, certain sections of the model can be skipped to reduce the model file size (see Data Fusion Model Structure for details). The sections argument can be a string or a list of strings specifying which sections to save:

  • "all": all sections (default).
  • "model": main model section, required for model evaluation. This section is always saved even if not specified.
  • "info": model information, info.
  • "comment": comment section, comment.
  • "annotations": annotations section, annotations.
  • "training_sample": a copy of training sample data, training_sample.
  • "iv_info": internal validation data, iv_info.
  • "build_log": model training log, build_log.

Note that the main model section is always saved, so sections="model" and sections=[] are equivalent.

size_f

Model output dimension.

Type:long

size_x

Model input dimension.

Type:long

tostring(sections='all')

Serialize the model.

Parameters:sections (list or str) – model sections to save
Returns:serialized model
Return type:str

Changed in version 6.6: sections argument added.

When serializing, certain sections of the model can be skipped to reduce the model size (see Data Fusion Model Structure for details). The sections argument can be a string or a list of strings specifying which sections to include:

  • "all": all sections (default).
  • "model": main model section, required for model evaluation. This section is always included even if not specified.
  • "info": model information, info.
  • "comment": comment section, comment.
  • "annotations": annotations section, annotations.
  • "training_sample": a copy of training sample data, training_sample.
  • "iv_info": internal validation data, iv_info.
  • "build_log": model training log, build_log.

Note that the main model section is always included, so sections="model" and sections=[] are equivalent.

training_sample

Model training samples, in order of increasing fidelity, optionally stored with the model.

Type:list

New in version 6.6.

If GTDF/StoreTrainingSample was on when training the model, this attribute contains a copy of training data. Otherwise it will be an empty list.

Training data (the list contents) is one or more dictionaries sorted in order of increasing fidelity. Each dictionary has the following keys:

  • "x" — the input part of the training sample (values of variables).
  • "f" — the response part of the training sample (function values).
  • "tol" — response noise variance. This key is optional and may be absent.
  • "weights" — sample point weights. This key is optional and may be absent.

Note

Training sample data is stored in lightweight NumPy arrays whose lifetime cannot exceed the lifetime of the model object. This means you should avoid assigning these arrays to new variables. Either use them directly, or, if you want to keep this data without keeping the model object, create copies of the arrays: train_x = my_model.training_sample[0]["x"].copy().

validate(pointsX, pointsY, blackbox=None, weights=None)

Validate the model using a reference inputs-responses array.

Parameters:
  • pointsX (float or array-like, 1D or 2D) – reference inputs
  • pointsY (float or array-like, 1D or 2D) – reference responses
  • blackbox (Blackbox) – optional low fidelity blackbox
  • weights (array-like, 1D) – optional weights of the reference points
Returns:

accuracy data

Return type:

dict

Raise:

FeatureNotAvailableError if the model does not support blackbox-based calculations but blackbox is not None

Changed in version 5.1: blackbox support added.

Changed in version 6.17: added the weights argument.

Validates the model against the reference array, evaluating model responses to pointsX and comparing them to pointsY. Optionally can request low fidelity data from the blackbox (check has_bb before running blackbox-based validation).

Generally, pointsX and pointsY should be 2D arrays. Several simplified argument forms are also supported, similar to calc().

Returns a dictionary containing lists of error values calculated componentwise, with names of errors as keys. The returned dictionary has the same structure as the details["Training Dataset"]["Accuracy"]["Componentwise"] dictionary in GTApprox model details — see section Accuracy in Model Details for a full description.

Using a blackbox in validation increases accuracy, but at the same time effectively limits the model domain to the blackbox domain. Due to this, points in the reference array have to satisfy the blackbox variable bounds (see bounds in da.p7core.blackbox.Blackbox.add_variable()). Otherwise validate() returns NaN error values since it receives NaN responses from the blackbox.
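
For example, a sketch assuming model is a trained gtdf.Model with 1D input and a known reference function:

import numpy as np

x_test = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
f_test = np.sin(10.0 * x_test)
errors = model.validate(x_test, f_test)
for name, values in errors.items():
    print(name, values)  # one error value per output component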

validate_bb(blackbox, pointsX, pointsY)

Validate the model using a reference inputs-responses array and requesting low fidelity data from the blackbox.

Deprecated since version 5.1: use validate() instead.

This method is deprecated since the blackbox support was added to validate(), and is kept for compatibility only. It is recommended to use validate() with the blackbox argument to perform blackbox-based model validation.