Results Analysis

This page explains how to collect data from a workflow and how to use the tools available in Analyze to process this data and create reports.

General Information

Data analysis documents in pSeven, called reports, are created and edited in Analyze. The relation of reports to workflows is not strict: a report can include data from different workflows and combine results of multiple workflow runs. To make this possible, each pSeven project has its own database which stores all data collected from workflows. Reports use the project database as a data source, but also have their own (simpler) databases that store linked copies of data. The advantages of this storage scheme are the following:

  • Workflow files (.p7wf) do not store results, so even if a workflow gets deleted or corrupted, its results are still available from the project database.
  • Reports are linked to the project database, so if new results appear there, a report can be easily updated.
  • Reports also store copies of source data, so if the project database is corrupt, reports are not broken (only the update function becomes unavailable).

In general, obtaining and analyzing results in pSeven consists of the following steps:

  1. Create a workflow solving your task, configure and connect blocks.
  2. Set up port monitoring: decide what data you want to gather from the workflow and specify corresponding block ports.
  3. Run the workflow. While it runs, monitored data is saved to the project database.
  4. In Analyze, create a new report, select needed data from the project database and add it to the report database.
  5. Process the data from the report database using various analysis and visualization tools.

This is the most general work sequence, and some steps can be simplified. For example, blocks have default monitoring settings which are often sufficient, and editing the report database is not required if you just want to check results quickly.

Also, if you are working with a preconfigured project (such as the pSeven example projects, see section Examples), the sequence of actions becomes much simpler:

  1. Open a workflow and the related report (in examples, project descriptions explain which one to use).
  2. Run the workflow.
  3. Refresh the report to see the latest results.

See also

Simple Workflow tutorial, section Results
Provides a basic example of report editing.
Results and Reports tutorial, in particular section Report Database
Provides an example of working with a report database and advanced report configuration.

Monitoring

pSeven can monitor any port of any block in a workflow, meaning that all data sent or received through this port is captured and stored in the project database. Monitoring is the main method of collecting data for analysis. Its advantage is that you can gather not only the final results but any data that appears during workflow execution, which lets you trace the process.

Port monitoring can be set up in two ways:

  1. When configuring a block, you can enable port monitors 1 on the Ports tab 2 in the configuration dialog.

    _images/page_results_01_blconfports.png

    Note that blocks enable monitoring for their important ports by default, and in most cases these settings are enough. For example, the Optimizer block shown above automatically enables monitoring for the following ports (the monitoring settings shown are the defaults):

    • The ports that output solution data — optimal_x, optimal_f, optimal_c.
    • The ports that output values of variables (x1, x2, x3) and accept values of objective and constraint functions (f, c) while Optimizer iteratively solves an optimization problem. Monitoring these ports yields the optimization history.
    • Information ports, info and status.
  2. You can use the workflow configuration tool b_runconf available both in Edit and Run. The Monitoring tab lists all ports with enabled monitoring, and lets you add or remove 1 monitored ports.

    _images/page_results_02_wfconf.png

    Here you can also assign aliases 2 to ports and change their descriptions 3. If assigned, an alias becomes the name of the record in the project database that will store values collected from this port. Otherwise pSeven will use a default name in the “Blockname.portname” format. Default port descriptions are defined by blocks; you can edit them, for example, to be more relevant to the current task. Aliases and descriptions are also shown in Run on the Configuration tab.

    Note that if the list of ports is long, you can use the name filter 4 to quickly find a port by its name or alias. The filter is case-insensitive, matches partial names, and can include a dot separator. For example:

    • Par and par will find the optimal_f port (alias match).
    • x will find ports optimal_x, x1, x2, and x3 (port name match).
    • zer.x will find x1, x2, and x3 only (block.port name match).
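The matching behavior described above can be sketched in Python (an illustration only, not pSeven code; the `match_port` function is hypothetical, and the port list and the Pareto alias are taken from the Optimizer example on this page):

```python
def match_port(pattern, block, port, alias=None):
    """Sketch of the name filter: case-insensitive substring match.
    A pattern without a dot is matched against the port name and its
    alias; a pattern with a dot is matched against "block.port"."""
    pattern = pattern.lower()
    if "." in pattern:
        return pattern in f"{block}.{port}".lower()
    return pattern in port.lower() or (alias is not None and pattern in alias.lower())

ports = [("Optimizer", "optimal_x", None),
         ("Optimizer", "optimal_f", "Pareto"),
         ("Optimizer", "x1", None),
         ("Optimizer", "x2", None),
         ("Optimizer", "x3", None)]

print([p for b, p, a in ports if match_port("par", b, p, a)])    # ['optimal_f'] (alias match)
print([p for b, p, a in ports if match_port("x", b, p, a)])      # ['optimal_x', 'x1', 'x2', 'x3']
print([p for b, p, a in ports if match_port("zer.x", b, p, a)])  # ['x1', 'x2', 'x3']
```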

Ports with enabled monitoring appear in Run on the right pane of the Configuration tab.

_images/page_results_03_runmonitorpane.png

Before running your workflow, you can adjust monitoring settings by switching monitors on and off 1. Switching a monitor off in Run means that its data will not be captured; this does not affect the monitoring settings in the block or workflow configuration, so it can be used to disable monitoring temporarily when needed.

Note that if you have assigned a port alias or changed its description, they are now shown here 2 instead of defaults.

On this pane you can also set the name of the next workflow run 3; this will be the name of the run record in the project database. Run names support placeholders: %d is the date and %t is the time of workflow start, so the default name (%d %t) is a simple timestamp. Note that if you set a fixed run name (for example, TestRun) and run your workflow multiple times, pSeven appends newer data to older data instead of overwriting it, so the same database record accumulates values collected from all runs. If you do not want this behavior, include the above placeholders in your run name, for example TestRun %d, started at %t.
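The placeholder expansion can be illustrated as follows (not pSeven code; the exact date and time formats are assumptions inferred from the run record names shown on this page):

```python
from datetime import datetime

def expand_run_name(template, start):
    """Sketch of %d / %t placeholder expansion in run names.
    The strftime formats below are assumptions, chosen to match record
    names like "TestRun 2015-09-10, started at 15-54-09"."""
    return (template.replace("%d", start.strftime("%Y-%m-%d"))
                    .replace("%t", start.strftime("%H-%M-%S")))

start = datetime(2015, 9, 10, 15, 54, 9)
print(expand_run_name("%d %t", start))                     # 2015-09-10 15-54-09
print(expand_run_name("TestRun %d, started at %t", start)) # TestRun 2015-09-10, started at 15-54-09
```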

When you start a workflow, pSeven begins writing data to the project database. In Analyze you can add some or all of this data to a report to analyze results, create plots and so on. Note that it is possible to create a new report and start configuring it even while the workflow is still running. Once a report is configured, it can be quickly updated with new data using the refresh function.

Project Database

A project database is a common data storage for all workflows in your current project. The database is a single file named project.p7pdb, located in the project directory. This file stores monitored data, values of workflow inputs, outputs, parameters, and additional information for every workflow run. Workflows themselves (.p7wf files) do not save any results data, so if you want to send a workflow with results to someone else, you will have to pack the entire project.

You can manage data in the project database using the Project Database pane in Analyze.

_images/page_results_04_pdbpane.png

The project database structure is the following:

  • On the top level there are workflow records 1. Each workflow record has the same name as the corresponding workflow.

    • Each workflow record contains a number of run records 2. The name of a run record is the name you set on the Configuration tab in Run. The special “<last run>” record is a shortcut to the most recently active run record: the one that was created last (“TestRun 2015-09-10, started at 15-54-09” in the example above) or updated last, which happens when you set a run name that already exists in the project database (for example, the run name is fixed and you run the workflow multiple times). The “<last run>” record can be used to create reports that are easily updated after re-running a workflow (using the refresh function).

      A run record contains several “folders”:

      • “Inputs”: if your workflow has inputs (shown on the Inputs pane in Run), their values are saved here. Each input is stored in its own record, like “Initial guesses” in the example above. The name of this record is equal to the input alias defined in workflow configuration (General tab), or the original port name if it has no alias. Workflow inputs are always saved automatically (there is no need to explicitly enable monitoring for the root ports that work as workflow inputs).

      • “Monitoring”: stores data collected from monitored ports. Each monitored port creates its own record here; this record contains a sequence of values collected from the port. If you have assigned an alias to the port (set in workflow configuration, Monitoring tab), it will be the name of the port’s record. In the example above, “Pareto” is an alias assigned to the optimal_f port of the optimization block. For a port with no alias, the record will have a default name in the “Blockname.portname” format.

        Note that if you disable monitoring for some port in Run using the monitoring settings on the Configuration tab, data from this port is not recorded to the database.

      • “Outputs”: like “Inputs”, saves values of workflow outputs if there are any (the Outputs pane in Run). Each output creates its own record with the name equal to its alias (set in workflow configuration, General tab) or the original port name. Workflow outputs are also always saved automatically.

      • “Parameters”: saves values of workflow parameters, if any. The record’s name (“Solver log level” in the example above) is equal to the parameter alias defined in workflow configuration (Parameters tab), or the original port or option name if no alias is set. All parameters are saved automatically.

      • “Run info”: additional information including:

        • “duration”: run duration.
        • “preset”: the name of a parameter preset applied to this run. Parameter presets are created and selected in Run on the left pane of the Configuration tab.
        • “status”: workflow finish status.
        • “steps_<blockname>”: how many times the respective block started in the workflow.
        • “timestamp_start”, “timestamp_finish”: workflow start and finish timestamps.

Contents of the records described above can also be viewed in the Project Database pane (expand a record to see values).

The context menu b_context of the Project Database pane provides the following commands:

  • Add to New Table, Add to Data Series, Add to Data Series (Transposed): add data from the project database to a report. These commands are available only when a report is open (see section Reports for details).
  • Collapse All: collapses all open (expanded) records to clean up the Project Database pane view.
  • Rename...: changes the name of the selected database record, opens the rename dialog.
  • Remove: deletes selected records from the database (hold Ctrl or Shift to select multiple records). Note that for safety reasons the data is not actually removed from the project database, only marked for deletion; due to this, the size of the project database file (project.p7pdb) is not reduced.
  • Reduce Database Size: performs a database cleanup, completely removing the data contained in deleted records and rebuilding the database structure to reduce the project.p7pdb file size. This action is irreversible. The command is available only when no workflow is running, and the cleanup can take significant time depending on the amount of data in the database.

To use data in analysis, add it from the project database to the database of the report you are editing. Report data can then be processed using various analysis and visualization tools (see Viewers).

Reports

Data analysis documents in pSeven are called reports. Each report can contain a number of data viewers and has its own database. Unlike the project database, a report database is available to its report only. A report is actually a file pair: a .p7rep file (the report document) and a .p7rdb file with the same name (the report database). When you manage (copy, move) reports in pSeven, it automatically handles the report and its database as a pair, and never shows .p7rdb files in Workspace. If you use a system file manager to transfer reports, remember to copy both files.

_images/page_results_05_repdbpane.png

The report database contains data series: 1D arrays formed from multidimensional project database records. This data is shown on the Data Series pane 1 (click 2 to toggle). Note that data series are parts of the report, so the contents of the Data Series pane change when you switch to another report.

Every window in the report (sample, plot) is just another viewer for data series. Viewers do not connect to the project database but use the report database as a source. However, the report database is synchronized with the project database. Each data series remembers its source 3 — the original project database record. When you click b_refresh on the view toolbar, the report updates its data series (re-reads data from project), then refreshes all viewers (which re-read data series).

As a safety measure, the report database does not synchronize deletions. That is, if the original data is deleted from the project database, or the project database gets corrupt, the report still keeps a copy of the data. In such a case the copy is no longer synchronized (there is nowhere to sync from) and the refresh function stops working. However, this does not break the report: even if you delete the project database and refresh the report, data is not deleted from the report.

Note

Data series can be removed only manually, using the context menu b_context on the Report Database pane.

Data series are not deleted even if no viewer uses them. For example, if you add a sample viewer by dragging records from the project database and then remove the viewer, the data series that were created in the background remain in the report database.

Adding Data to a Report

Data from the project database can be added to a report in several ways. The simplest is to select one or more records in the project database and drag them to the report editing area (use Ctrl+Click or Shift+Click to select multiple records). This automatically creates a sample viewer and adds data series to the report database in the background. The created data series can then be used in other viewers too. The Add to New Table command from the Project Database pane context menu does the same: creates a new sample viewer and adds data series in the background.

Another option is to create data series manually, and then use them in various viewers. To do this, select data in the project database, then either drag selected records to the Data Series pane, or use commands from the Project Database pane context menu. Note that since data series in a report database are 1-dimensional, data is usually reshaped when you add it from the project database. In general, all values in a project database record are stacked and then processed as a single matrix, with each column becoming a separate data series. For example, if a record contains multiple matrices, they are stacked vertically, and the obtained matrix is split into columns. Vectors are stacked in such a way that each vector becomes a row, so generated data series will hold values of vector elements with the same index. A sequence of scalar values is interpreted as a single column matrix, and so on.

In some cases you need to generate data series from matrix rows instead of columns. For example, if a record contains a single vector value of length \(k\), by default it will generate \(k\) data series of 1 element each. To avoid this, the record data can be transposed before converting it to data series, in one of the following ways:

  1. The Add to Data Series (Transposed) command from the Project Database pane context menu will first transpose the matrix obtained by stacking the values contained in selected records, then generate data series.
  2. The same processing is done if you select not the record itself, but the values it contains, and drag this data to the Data Series pane or to the report editing area. For example, when you want to convert a single vector to a single data series, you can select only this value to apply the transpose.

The following list summarizes the rules applied when converting the data selected in the Project Database pane to data series. Note that the Add to Data Series command is the same as dragging a record from the project database, and the Add to Data Series (Transposed) command is the same as dragging only the values contained in a record.

  • Selection: single scalar value.
    • Add to Data Series: single data series of length 1.
    • Add to Data Series (Transposed): single data series of length 1.
  • Selection: record containing a single scalar value.
    • Add to Data Series: single data series of length 1.
    • Add to Data Series (Transposed): single data series of length 1.
  • Selection: record containing \(n\) scalar values.
    • Add to Data Series: single data series of length \(n\).
    • Add to Data Series (Transposed): \(n\) data series of length 1.
  • Selection: single vector value of length \(k\).
    • Add to Data Series: single data series of length \(k\).
    • Add to Data Series (Transposed): \(k\) data series of length 1.
  • Selection: record containing a single vector value of length \(k\).
    • Add to Data Series: \(k\) data series of length 1.
    • Add to Data Series (Transposed): 1 data series of length \(k\).
  • Selection: record containing \(n\) vector values of length \(k\) each.
    • Add to Data Series: \(k\) data series of length \(n\). Vectors are stacked vertically, so each vector becomes a row in a matrix; then each matrix column becomes a new data series.
    • Add to Data Series (Transposed): \(n\) data series of length \(k\). Each vector becomes a new data series.
  • Selection: single matrix value, \(i\) rows and \(j\) columns.
    • Add to Data Series: \(j\) data series of length \(i\). Each column becomes a new data series.
    • Add to Data Series (Transposed): \(i\) data series of length \(j\). Each row becomes a new data series.
  • Selection: record containing a single matrix with \(i\) rows and \(j\) columns.
    • Add to Data Series: same as a single matrix value.
    • Add to Data Series (Transposed): same as a single matrix value.
  • Selection: record containing \(n\) matrices with \(i\) rows and \(j\) columns each.
    • Add to Data Series: \(j\) data series of length \(n \cdot i\). Matrices are stacked vertically and then processed as a single matrix with \(n \cdot i\) rows and \(j\) columns.
    • Add to Data Series (Transposed): \(n \cdot i\) data series of length \(j\). Each row of every matrix becomes a new data series.
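The conversion rules above can be sketched in Python (an illustration only, not pSeven code; the `to_data_series` function is hypothetical):

```python
def to_data_series(record_values, transposed=False):
    """Sketch of the record-to-data-series conversion described above.
    The record's values are stacked into one matrix: a scalar becomes a
    1-element row, a vector becomes a row, and a matrix contributes all
    of its rows (vertical stacking). Each column of the (optionally
    transposed) stacked matrix becomes a separate data series."""
    rows = []
    for value in record_values:
        if isinstance(value, (int, float)):
            rows.append([value])                      # scalar value
        elif value and isinstance(value[0], list):
            rows.extend(list(r) for r in value)       # matrix value
        else:
            rows.append(list(value))                  # vector value
    if transposed:
        rows = [list(col) for col in zip(*rows)]
    return [list(col) for col in zip(*rows)]

# A record with n = 2 vector values of length k = 3:
record = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(to_data_series(record))        # k = 3 series of length n = 2
print(to_data_series(record, True))  # n = 2 series of length k = 3
# A record with n = 3 scalar values (a single-column matrix):
print(to_data_series([1.0, 2.0, 3.0]))   # 1 series of length 3
```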

Finally, note that you can begin adding data to a report as soon as records appear in the project database. This means you can configure a report while a workflow is still running, then use the refresh function to automatically update it with new data.

Split Data

The Split data command can be used to split a data set (already added to a report) into two new data sets, for example to create training and test samples for the Model builder or Model validator.

_images/page_results_split_data_dialog.png

Settings in the Split data dialog are:

  • Input data, Output data: the data series to split. From each of these data series, two new data series will be generated.
  • Detect tensor structure: if enabled, pSeven tests the data set for tensor structure and tries to generate new data sets in such a way that they have tensor structure too. This structure is a specific type of DoE similar to full factorial and is required by tensor approximation techniques. It is recommended to enable this feature if your input data set is similar to a full factorial. However, note that the tensor structure test can take significant time for high-dimensional data sets (tens of data series).
  • Training subset ratio: the percentage of points that will be included into the training subset; remaining points will be added to the test subset.
  • Training subset prefix, Test subset prefix: the prefixes that will be added to names of new data series.
  • Splitting method: selects the method to use when distributing points to the training and test subsets. Using CART is recommended unless you want a reproducible split, in which case you can use the random splitting method with a fixed seed.
    • CART (Classification And Regression Tree): an algorithm that clusters points before generating subsets; selecting points on a cluster basis produces better distributed point subsets.
    • Random with seed: simple random distribution. Specified seed makes the result (the distribution of points between subsets) always reproducible.
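A minimal sketch of the “Random with seed” method (an illustration, not pSeven’s implementation; the `split_sample` function and its signature are hypothetical):

```python
import random

def split_sample(points, train_ratio=0.8, seed=0):
    """Distribute points between training and test subsets at random.
    A fixed seed makes the split reproducible."""
    rng = random.Random(seed)
    indices = list(range(len(points)))
    rng.shuffle(indices)
    n_train = round(len(points) * train_ratio)
    train = [points[i] for i in sorted(indices[:n_train])]
    test = [points[i] for i in sorted(indices[n_train:])]
    return train, test

points = [[x, x * x] for x in range(10)]
train, test = split_sample(points, train_ratio=0.7, seed=1)
print(len(train), len(test))   # 7 3
# The same seed always yields the same distribution of points:
assert split_sample(points, train_ratio=0.7, seed=1) == (train, test)
```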

Viewers

Each window in a report is a data viewer — a specific data analysis or visualization tool that uses data from the report database as a source. New viewers can be created in a few different ways:

  • If you select some data series in the Data Series pane and click a button on the report toolbar, it creates a viewer and automatically adds the selected data to its sources. Such viewers are created with a default configuration which can then be adjusted to your needs.
  • If you click a button on the report toolbar without selecting any data series, it adds an empty viewer, and you will have to edit its configuration to add data sources.
  • Certain viewers (the sample viewer, for example) provide their own methods of creating additional viewers with the same data sources (or a subset of them).
  • A generic method to create a new viewer of a different type with the same data sources is also available. If you open the Data Series pane and select an existing viewer, the data series it uses are automatically selected (highlighted) in the Data Series pane. Clicking a button on the report toolbar then creates a new viewer that takes the selected data series.

Datasets

Viewers operate on datasets: collections of data series which are processed separately in a way specific to the current viewer. For example, in the 2D plot viewer each dataset can be rendered as a line on the plot; in the 3D plot viewer a dataset can be rendered as a surface or a point cloud, and so on (most viewers support multiple datasets). In viewer configuration dialogs, each dataset is configured on its own tab. Such a tab contains various visualization settings (specific to the current viewer, described further) and a common Dimensions pane where you can edit the dataset’s contents.

_images/page_results_06_dimpane.png

  • Axis: in plot viewers, specifies the coordinate axis. If the viewer has a fixed number of axes, you can switch them in this column.
  • Data Source: specifies the source data series. To change, hover the table cell and click the edit icon, or double-click the cell (opens a drop-down list showing data series currently found in the report database).
  • Error: in viewers that support displaying uncertainties in values (such as error bars on a 2D plot), this column lets you select 1 or 2 additional data series that contain the error values. Error type can be set to:
    • Symmetric — select 1 additional data series containing variations of values.
    • Relative offset — select 2 additional data series that contain upper and lower variations. These values are added to (upper) or subtracted from (lower) the values contained in the data series selected in the Data Source column, for example to obtain the coordinates of the upper and lower points of an error bar.
    • Absolute range — select 2 additional data series that contain upper and lower values “as is”, for example the coordinates of the upper and lower points of error bars.
  • Filter: applies a value or a string filter to the dataset. When filtering numeric values, you can specify an inclusive range by setting its lower and upper bounds. Select ‘Exclude value range’ to make the range exclusive, and ‘Include NaNs and empty values’ to control whether NaN and empty values pass the filter.
_images/page_results_07_filterdialog.png

To enable a filter, select the corresponding checkbox. ‘Include NaNs and empty values’ is enabled by default in the Sample viewer and Page viewer. In other viewers, it is enabled by default for newly created data series that are not mapped to any particular axis.

A string filter finds occurrences of a given match pattern in a string. It supports two wildcard characters: an asterisk (*) and a question mark (?). The asterisk is a placeholder for zero or more characters; the question mark is a placeholder for exactly one character. If multiple filters are added, the viewer shows only those points that pass all filters.

  • Format: useful for tables only, sets the number format (general or scientific notation, the number of decimal digits).

Note that a dataset can contain more dimensions (data series) than the viewer itself displays. Additional data series can be used for point filtering, or you can use axis selection to switch between different sources on the same plot.
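The filter behavior described above can be illustrated in Python (a sketch, not pSeven code; both function names are hypothetical, and Python’s fnmatch is used to approximate the wildcard matching, with the pattern wrapped in asterisks to emulate matching an occurrence anywhere in the string):

```python
import fnmatch
import math

def passes_numeric(value, lower, upper, exclude=False, include_nan=False):
    """Value filter sketch: an inclusive [lower, upper] range that can
    be inverted ('Exclude value range'), with separate NaN handling."""
    if isinstance(value, float) and math.isnan(value):
        return include_nan
    inside = lower <= value <= upper
    return not inside if exclude else inside

def passes_string(value, pattern):
    """String filter sketch: '*' matches zero or more characters and
    '?' matches exactly one character."""
    return fnmatch.fnmatch(value, "*" + pattern + "*")

print([v for v in [0.5, 1.5, float("nan"), 3.0]
       if passes_numeric(v, 1.0, 3.0)])                    # [1.5, 3.0]
print([s for s in ["run_01", "run_12", "test_1"]
       if passes_string(s, "run_?1")])                     # ['run_01']
```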

The context menu of the Dimensions pane provides the following commands:

  • Add Dimension: adds a new data source to the dataset. After adding a dimension, you can select the source data series from the report database using a drop-down list in the Data Source column.
  • Remove Dimension: removes the selected data source from the dataset. This command does not remove data from the report database, it affects only the current viewer configuration.
  • Move Dimension Up, Move Dimension Down: reorder data sources in the list. These commands can be used instead of switching axes in the Axis column.
  • Rename Data Source...: renames the source data series in the report database without closing the viewer configuration dialog. Note that this command changes the report database, not just the current viewer. For example, if the selected data series is also used by another viewer, you will receive a warning asking whether you really want to rename it and whether that viewer should continue using the renamed data source.
  • Number Format...: applies number formatting in tables.

Sample Viewer

The sample viewer is a tool for initial data analysis which can also serve as a starting point in new reports. Other viewers can be created quickly by selecting data in a sample viewer and clicking a button on the report toolbar.

The sample viewer window contains four tabs:

  • Data — shows sample data. Other viewers can be created from this tab.
  • Statistics — provides descriptive statistics.
  • Correlations — calculates correlations, shows scatter plots for pairs of sample dimensions and value distribution in each dimension. From this tab you can also create additional viewers for the plots and histograms it shows.
  • Dependency — allows to test how well the sample can be approximated by a linear or quadratic model.

A new sample viewer can be created in several ways:

  • When you select one or more records on the Project Database pane and drag them to the report, pSeven automatically creates a new sample viewer and opens its Data tab (see Adding Data to a Report).
  • Like any other viewer, a sample viewer can be created manually by selecting data series on the Report Database pane and clicking the viewer button on the report toolbar. When you add a sample viewer manually, you can select the tab to open from the menu that appears when you click the sample viewer button.

Data

The Data tab shows raw sample values. Each column (sample dimension) corresponds to a data series in the report database. If additional data series with error values are selected in viewer configuration, the table also shows error values. Cell colors indicate high (red) and low (blue) values.

_images/page_results_viewers_sample_viewer_01_data.png

You can use the Data tab to create other viewers in the report: select the columns to use as data sources, then click any viewer button on the report toolbar. The data you selected in sample viewer will be automatically added to the new viewer’s configuration. To select or deselect all columns, you can use the buttons on the viewer toolbar.

You can also export selected columns to a CSV file using the Export data to file... command from the viewer’s menu.

Note that if a value filter is specified in the viewer’s dataset configuration (see Datasets), the table shows only those values that pass the filter. In this case, other tabs also work with the filtered data — that is, descriptive statistics and correlations are always calculated for the data you see on the Data tab; approximation models on the Dependency tab are also trained on a filtered sample. However the value filter is not applied when you export data or create new viewers from the sample viewer — these functions always work with the full sample data.

Statistics

The Statistics tab shows descriptive statistics. As noted above, all statistics are calculated only for those values that pass the dataset filter.

_images/page_results_viewers_sample_viewer_02_statistics.png

The Summary groupbox shows general sample statistics, while the table shows statistics for each sample dimension. Most statistics are self-explanatory; if you need more information, brief descriptions are provided in tooltips that appear when you hover over the names in the left column.

Summary statistics may also contain notes on specific sample properties such as:

  • Tensor structure — the sample has a specific structure suitable for the Tensor Approximation technique; see section Tensor Products of Approximations for details.
  • Full factorial — the sample can be used for a full factorial experiment.
  • Orthogonal design — the sample is an orthogonal array; and others.

The detailed statistics table can be exported to a CSV file using the Export statistics to file... command from the viewer’s menu.

Correlations

The Correlations tab is used to spot correlations in the sample. As noted above, correlation analysis is performed only for those values that pass the dataset filter. Plots and histograms on this tab also use the filtered data.

_images/page_results_viewers_sample_viewer_03_correlations.png

The main pane on the Correlations tab shows:

  • scatter plots for pairs of sample dimensions,
  • correlation measures and p-values for these pairs, and
  • value distribution (histogram) for each dimension (sample column).

The dimensions to analyze can be specified using the Dimensions selector. The Correlation measure selector specifies the method used to compute the correlation coefficients shown on the main pane and in the table on the right.

Note that correlation coefficients cannot be calculated when both dimensions (sample columns) in the pair are constant. For such pairs the viewer shows N/A instead of a numeric value.

The Correlation threshold and p-value threshold sliders specify which correlations should be considered significant. Correlation is significant only if:

  • the correlation coefficient value is greater than the correlation threshold, and
  • the p-value is less than the p-value threshold.

The p-value is a confidence measure for the calculated correlation coefficient. By definition, the p-value is the estimated probability of obtaining a correlation coefficient equal to or larger than the one shown when there is actually no correlation (that is, the true correlation coefficient is 0 and the shown value is a random result). Thus, smaller p-values mean that correlation analysis results are more reliable. Usually a p-value of 0.05 (5% probability) is considered small enough to ensure that the correlation is significant.

Correlations can be quickly recalculated for a subset of the sample data by selecting an area on any of the scatter plots. When a selection is active, plot points outside the selection are grayed, and values on the main pane and in the correlation table are automatically updated. To reset the selection, click any plot again.

Any plot or histogram from the main pane can be shown in a separate viewer (2D plot or Histogram) by selecting it and clicking the zoom button on the sample viewer’s toolbar. Note that in this case the dataset filters do apply to the new viewer (in contrast with the viewers created from the Data tab).

The plots from the main pane on the Correlations tab can also be exported to a file using the Export scatter matrix image... command from the viewer’s menu.

Dependency

On the Dependency tab you can try to fit the sample with a linear or quadratic approximation model. After you select the input and output columns and the approximation technique to use, pSeven will train a model in the background and validate it on the sample data.

_images/page_results_viewers_sample_viewer_04_dependency.png

The Model statistics group box shows training sample details and model validation results.

  • Effective sample size — the number of points in the filtered sample. The filtered sample is obtained from the original one by removing all duplicates and all lines that contain non-numeric (missing, infinity, or NaN) values.
  • Duplicate points — the number of exact duplicates.
  • Ambiguous points — the number of points with ambiguous output values — that is, points with the same input values but different outputs.
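
As a rough illustration of these statistics (a hypothetical helper, not the actual pSeven implementation), the sketch below counts them for a small made-up (input, output) sample:

```python
# Count effective sample size, duplicates, and ambiguous points
# for a tiny (x, y) sample.  Illustration only.
import math

sample = [
    (1.0, 2.0),
    (1.0, 2.0),           # exact duplicate
    (1.0, 3.0),           # ambiguous: same input, different output
    (2.0, float("nan")),  # non-numeric output, removed by the filter
    (3.0, 4.0),
]

# Remove lines with missing, infinity, or NaN values.
numeric = [p for p in sample if all(math.isfinite(v) for v in p)]
# Remove exact duplicates (dict.fromkeys keeps order).
unique = list(dict.fromkeys(numeric))
duplicates = len(numeric) - len(unique)

# Ambiguous points: same input value, different outputs.
inputs = [p[0] for p in unique]
ambiguous = sum(1 for p in unique if inputs.count(p[0]) > 1)

print(len(unique), duplicates, ambiguous)  # -> 3 1 2
```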

Validation results are shown in the model accuracy table and on the scatter plot which compares sample values with the values predicted by the model. The predictions can be obtained either by calculating model outputs for the sampled inputs (default) or from the cross-validation that is automatically performed when training the model. Use the predictions selector to switch between these modes on the plot. The model accuracy table always shows both kinds of errors:

  • Train accuracy — errors calculated using the training sample. Note that this validation method always overestimates model accuracy.
  • Internal Validation — errors estimated during the cross-validation procedure.
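
The difference between the two error kinds can be illustrated with a small NumPy sketch: a linear fit is scored once on its own training data and once with 5-fold cross-validation, where each point is predicted by a model that never saw it. The data and fold layout here are made up; pSeven's actual cross-validation procedure may differ.

```python
# Train accuracy vs. cross-validation error for a simple linear fit.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(60, 1))
y = 2.0 * x[:, 0] + 0.3 * rng.normal(size=60)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

X = np.hstack([x, np.ones_like(x)])  # linear model with an intercept

# Train accuracy: fit on all points, evaluate on the same points.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
train_err = rmse(X @ coef, y)

# 5-fold cross-validation: each point is predicted by a model that
# was trained without it.
folds = np.array_split(np.arange(60), 5)
cv_pred = np.empty(60)
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(60), test_idx)
    c, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    cv_pred[test_idx] = X[test_idx] @ c

cv_err = rmse(cv_pred, y)
print(f"train RMSE = {train_err:.3f}, CV RMSE = {cv_err:.3f}")
```

The cross-validation error is typically slightly higher, which is exactly why train accuracy overestimates model quality.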

The feature importance chart shows estimates of each input's influence on the model output:

  • “sole effect” is the fraction of model variance that this input would explain if all other inputs were fixed.
  • “interactions” is the remaining fraction of model variance that this input explains (that is, the part that exists due to interactions with other inputs).

Note that the accuracy of feature importance estimates depends on model accuracy: more accurate models give more accurate estimates.
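
To illustrate the idea (this is not pSeven's actual algorithm), the “sole effect” of an input resembles a first-order Sobol index, \(Var(E[f|x_i]) / Var(f)\). The sketch below estimates it by Monte Carlo for a toy model with a known interaction term:

```python
# Estimate Var(E[f | x1]) / Var(f) for a toy model.  Illustration only.
import numpy as np

rng = np.random.default_rng(2)

def f(x1, x2):
    return x1 + x2 + x1 * x2  # toy model with an interaction term

n = 100_000
x1, x2 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
total_var = np.var(f(x1, x2))

# E[f | x1]: average f over x2 for a grid of fixed x1 values,
# then take the variance of those conditional means.
grid = np.linspace(0, 1, 400)
cond_mean = np.array([np.mean(f(g, x2)) for g in grid])
sole_x1 = np.var(cond_mean) / total_var

print(f"sole effect of x1 ~= {sole_x1:.2f}")
```

For this symmetric model the analytic sole effect of each input is about 0.49, leaving roughly 2% of the variance to the interaction term, which is what the “interactions” bar would report.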

The scatter plot and the feature importance chart can be exported to images using commands from the viewer’s menu.

Configuration

_images/page_results_viewers_sample_viewer_05_conf.png

General settings for the sample viewer are the following:

  • Title: sets the viewer’s title.
  • Correlations and scatterplot matrix — settings related to the Correlations tab:
    • Selected dimensions: analyzed sample columns.
    • Correlation measure: the correlation coefficient to calculate.
    • Correlation threshold: the significance threshold.
    • p-value threshold: the confidence threshold (see section Correlations for details).
    • Histogram bins count: specifies the number of histogram bins. Available options are:
      • Adaptive: adapts the number of bins to the AMISE-optimal bandwidth \(h = \sigma (\frac{24 \sqrt{\pi}}{n})^{1/3}\), where \(n\) is the number of points in the dataset and \(\sigma\) is the non-biased sample standard deviation. The number of bins is \(\lceil \frac{x_{max} - x_{min}}{h} \rceil\), where \(x_{min}\) and \(x_{max}\) are the value axis bounds.
      • Sturges’ rule: the number of bins is \(\lceil \log_2 n + 1 \rceil\), where \(n\) is the number of points in the dataset. This rule is derived from a binomial distribution (assumes an approximately normal distribution) and implicitly bases the bin sizes on the range of the data.
      • Manual: lets you specify the number of bins directly.
    • Display limit of scatter matrix: the maximum number of points to show on plots.
    • Marker color: sets solid or gradient colors for plot points. When a gradient color is selected, the gradient is applied according to the color axis setting.
    • Color axis: specifies which sample dimension to use to assign gradient colors to plot points. Points are sorted by this dimension.
    • Marker size: point size on scatter plots.
    • Marker opacity: point opacity on scatter plots.
    • Marker stroke: enables point marker outlines.
  • Dependency — settings related to the Dependency tab:
    • Inputs: sample columns containing input values.
    • Outputs: sample column with output values.
    • Technique: selects linear or quadratic approximation.
    • Display limit of scatterplot: the maximum number of points to show on the validation plot.
    • Draw errors in scatterplot: specifies how to calculate model predictions (see section Dependency for details).

2D Plot

The 2D plot tool creates a plot that can contain multiple curves and point sets, with optional error bars or bands.

_images/page_results_12_2dplot1.png

The plot can be zoomed in by selecting the area of interest. To reset zoom, double-click the main area (plot grid) or click b_plot_zoom1 in the title bar.

_images/page_results_13_2dplot2.png

Hovering a point on the plot shows a tooltip with coordinate values. If error data is available, this tooltip also shows the upper and lower error values (for example, the lower and upper coordinates of the error bar).

General Settings

_images/page_results_14_2dplot_gen_conf.png

General settings for the 2D plot viewer are the following:

  • Title: sets the plot title.
  • Legend location: sets the legend location and its placement — over the plot or outside the plot grid.
  • Display limit: sets the maximum number of points displayed on the plot. If this number is less than the size of a dataset, points are sieved so the number of shown points does not exceed the limit. This is purely a display setting that does not apply any data filters to the dataset itself.
  • Sort plot points by X axis value: controls the order in which points are connected by the line. If enabled (default), points are connected in order of increasing X coordinate. If disabled, points are connected in the order they appear in the dataset.

The Axes pane lets you set axis labels, plot ranges, and axis scales. Note that if an axis range is not specified, the viewer sets it automatically so that it includes all points from all datasets.

Dataset Configuration

The 2D plot viewer supports multiple datasets, which can be added using the Add dataset button on the left tab bar. Generally, each dataset is rendered as another line on the plot (or you can hide the line and show point markers only). Note also that you can re-order the plot “layers” by dragging the dataset tabs on the left tab bar.

_images/page_results_15_2dplot_ds_conf.png

Basic settings are:

  • Name: specifies the name displayed in the plot legend.
  • Visible: lets you temporarily hide this dataset from the plot.
  • Extrapolate: if enabled, and the dataset is rendered as a line, the line is linearly extrapolated up to the plot’s bounds. Another use is to draw a horizontal line on the plot: if the dataset contains only one point, the extrapolation is constant.

Dataset contents are edited in the Dimensions pane where you can select source data series, add error data, and apply value filters (see section Datasets for details).

Point marker settings:

  • Draw markers: specifies whether to show point markers for this dataset on the plot.
  • Color: marker color. Clicking the box opens a color selector.
  • Style: sets the marker style.
  • Size: sets the size of markers.

Line settings:

  • Draw lines: specifies whether to draw the line connecting points on the plot.
  • Color: line color. Clicking the box opens a color selector.
  • Style: sets the line style — solid, dashed, or dotted.
  • Thickness: sets the line thickness.
  • Fill under: if enabled, the area under the line is filled with the line color. This setting works even if drawing the line is disabled.
  • Smooth line: applies minor smoothing to the drawn line.

Error settings:

  • Draw errors: specifies whether to draw error bars (or the error band, depending on the selected style).
  • Color: sets the color of error bars or the band.
  • Style: lets you switch between drawing an individual error bar for each point and drawing a solid error band around the line.
  • Thickness: applies to error bars only; sets their line thickness.

Chart

A chart presents data as vertical bars whose lengths are proportional to the values they represent.

_images/page_results_16_chart1.png

The chart viewer supports multiple datasets; each dataset is rendered as another set of bars. The bars can be drawn side by side or stacked; in the stacked mode, values in the same category (with the same point index in the dataset) are drawn on top of each other.

_images/page_results_17_chart2.png

Hovering a bar on the chart shows its numeric value. If the bars are stacked, the tooltip shows the value represented by the hovered section of the bar (not the total bar value).

General Settings

_images/page_results_18_chart_gen_conf.png

General settings for the chart viewer are the following:

  • Title: sets the plot title.
  • Display limit: sets the maximum number of displayed bars.
  • Legend location: sets the legend location and its placement — over the chart or outside the plot grid.
  • Stacked bars: enables or disables stacked bars.

The Axes pane allows to set axis labels and the scale of the value axis.

Font settings specify font sizes in the plot title, axis labels, legend labels, and bar labels (value tooltips).

Dataset Configuration

The chart viewer supports multiple datasets, which can be added using the Add dataset button on the left tab bar. Each dataset is rendered as another set of bars. Note also that you can change the order of bars (in the stacked mode, for example) by dragging the dataset tabs on the left tab bar.

_images/page_results_19_chart_ds_conf.png

Available settings are:

  • Name: specifies the name displayed in the legend.
  • Visible: lets you temporarily hide this dataset from the chart.
  • Color: sets the bar color for this dataset. Clicking the box opens a color selector.
  • Bar labels: specifies when to show value tooltips: always, never, or only when a bar is hovered with the mouse cursor.
  • Label location: specifies the location of value tooltips.
  • Label background: adds a background color to value tooltips when enabled, or makes it transparent if disabled.

Dataset contents are edited in the Dimensions pane (see section Datasets for details). Note that a dataset can contain multiple data series, but only one of them will be rendered. Additional data series can be used to filter points in the dataset or to switch between different data sources on the same chart.

Histogram

A histogram plot provides a graphical representation of the distribution of numerical data.

_images/page_results_20_hist.png

Hovering a histogram bar shows a tooltip with the number of points in this bin and the bin’s bounds.

General Settings

_images/page_results_21_hist_gen_conf.png

General settings for the histogram viewer are the following:

  • Title: sets the plot title.
  • Display limit: sets the maximum number of bins (bars) displayed by the histogram.
  • Legend location: sets the legend location and its placement — over the plot or outside the plot grid.
  • Bins count: specifies the number of histogram bins. Available options are:
    • Adaptive: adapts the number of bins to the AMISE-optimal bandwidth \(h = \sigma (\frac{24 \sqrt{\pi}}{n})^{1/3}\), where \(n\) is the number of points in the dataset and \(\sigma\) is the non-biased sample standard deviation. The number of bins is \(\lceil \frac{x_{max} - x_{min}}{h} \rceil\), where \(x_{min}\) and \(x_{max}\) are the value axis bounds.
    • Sturges’ rule: the number of bins is \(\lceil \log_2 n + 1 \rceil\), where \(n\) is the number of points in the dataset. This rule is derived from a binomial distribution (assumes an approximately normal distribution) and implicitly bases the bin sizes on the range of the data.
    • Manual: lets you specify the number of bins directly.
  • Overlayed histograms: changes the drawing style of histograms that display multiple datasets. If disabled, all bars are drawn separately; if enabled, the histograms are overlaid, which gives a more compact view.

The range of the value axis for histograms (the Values Axis group box) is set separately because there are several methods to determine it.

  • Robust (default): assumes an approximately normal distribution, sets the range to \([\bar{x} - 3 \sigma, \bar{x} + 3 \sigma]\) where \(\bar{x}\) is the sample mean and \(\sigma\) is the non-biased sample standard deviation.
  • Include all values from the data source: sets the range to \([x_{min}, x_{max}]\) — bounds are the minimum and maximum values found in the dataset.
  • Manual: lets you specify the range directly.

Note that if the robust or manual range is selected, there may be values outside the axis bounds. In such a case the histogram automatically adds one or two extra bins (trailing bins) representing the out-of-bounds intervals:

  • If the dataset contains at least one value \(x \lt r_{min}\) (less than the lower axis bound), the left trailing bin is added, representing the interval \((-\infty, r_{min})\).
  • If the dataset contains at least one value \(x \gt r_{max}\) (greater than the upper axis bound), the right trailing bin is added, representing the interval \((r_{max}, \infty)\).
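
The two automatic bin-count rules can be reproduced with a short sketch (illustrative only; pSeven derives the value axis bounds as described above, while this sketch simply uses the sample minimum and maximum):

```python
# Reproduce the Adaptive (AMISE) and Sturges bin-count rules.
import math
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=1000)
n = len(data)

x_min, x_max = data.min(), data.max()
sigma = data.std(ddof=1)  # non-biased sample standard deviation

# Adaptive rule: AMISE-optimal bandwidth h, then bins = ceil(range / h).
h = sigma * (24.0 * math.sqrt(math.pi) / n) ** (1.0 / 3.0)
adaptive_bins = math.ceil((x_max - x_min) / h)

# Sturges' rule: bins = ceil(log2(n) + 1) = 11 for n = 1000.
sturges_bins = math.ceil(math.log2(n) + 1)

print(adaptive_bins, sturges_bins)
```

For a normal sample of 1000 points, the adaptive rule typically yields around twice as many bins as Sturges' rule, giving a more detailed histogram.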

Dataset Configuration

The histogram plot viewer supports multiple datasets, which can be added using the Add dataset button on the left tab bar. Each dataset is rendered as a separate histogram; general settings, like the number of bins and axis ranges, apply to all these histograms.

_images/page_results_22_hist_ds_conf.png

Available settings are:

  • Name: specifies the name displayed in the legend.
  • Visible: lets you temporarily hide this dataset from the plot.
  • Color: sets the bar color for this dataset. Clicking the box opens a color selector.
  • Bar labels: specifies when to show tooltips with bin information: always, never, or only when a bar is hovered with the mouse cursor.
  • Label location: specifies the location of bin tooltips.
  • Label background: adds a background color to bin tooltips when enabled, or makes it transparent if disabled.

Dataset contents are edited in the Dimensions pane (see section Datasets for details). Note that a dataset can contain multiple data series, but only one of them will be rendered. Additional data series can be used to filter points in the dataset or to switch between different data sources on the same histogram plot.

3D Plot

The 3D plot tool creates an interactive plot that can contain multiple surfaces and point clouds.

_images/page_results_23_3dplot.png

When viewing the plot, you can rotate it by dragging with the mouse and zoom with the mouse wheel. You can also use the buttons in the plot title bar to quickly change the angle of view.

General Settings

_images/page_results_24_3dplot_gen_conf.png

General settings for the 3D plot viewer are the following:

  • Title: sets the plot title.
  • Legend location: sets the legend location.
  • Camera perspective: changes the 3D perspective mode.
  • Display limit: sets the maximum number of points displayed on the plot. If this number is less than the size of a dataset, points are sieved so the number of shown points does not exceed the limit. This is purely a display setting that does not apply any data filters to the dataset itself.
  • Show bounding box: shows or hides the 3D box around the plot.
  • Enable light: enables or disables rendering the lighting effect on 3D surfaces.
  • LaTeX syntax: enables math syntax in plot titles and labels. For example, you can use it to show subscripts (like x_1 for \(x_1\)) or special symbols (like \hat{f} for \(\hat{f}\)).

The Axes pane lets you set axis labels, plot ranges, and axis scales. Note that if an axis range is not specified, the viewer sets it automatically so that it includes all points from all datasets.

Dataset Configuration

The 3D plot viewer supports multiple datasets, which can be added using the Add dataset button on the left tab bar. Each dataset is rendered as another surface or point cloud on the plot. Note also that you can re-order the plot “layers” by dragging the dataset tabs on the left tab bar.

_images/page_results_25_3dplot_ds_conf.png

Basic settings are:

  • Name: specifies the name displayed in the legend.
  • Visible: lets you temporarily hide this dataset from the plot.

Dataset contents are edited in the Dimensions pane (see section Datasets for details). Note that a dataset can contain more than 3 data series, but additional data series are not rendered. They can be used to filter points in the dataset or to switch between different data sources on the same plot.

Point marker settings:

  • Draw markers: specifies whether to show point markers for this dataset.
  • Draw trajectory: if enabled, connects points with a trajectory line.
  • Color: marker color. Clicking the box opens a color selector. The color can be solid (same color for all markers) or gradient (affected by the color axis).
  • Colorbar: if enabled, a color bar is added to the plot. The bar shows the color map applied to point markers.
  • Marker size: sets the size of markers.
  • Marker style: sets the marker style.
  • Line thickness: sets the thickness of the trajectory line, if it is drawn.
  • Value labels: adds value tooltips to the point markers.
  • Draw stems: adds vertical stems to the plotted points.

Surface settings:

  • Draw surface: specifies whether to render a 3D surface for this dataset.
  • Color: sets the surface color. Clicking the box opens a color selector. The color can be solid or gradient; if a solid color is selected, it is recommended to enable light in the plot’s general settings.
  • Colorbar: if enabled, a surface color bar is added to the plot. Note that this is a separate colorbar — that is, there can be two different color bars for point markers and the surface.
  • Reconstruction method: selects the surface reconstruction method. In general, the default (“grid XY”) is the fastest, but other methods (triangulation) may be more precise in certain cases.
  • Grid mesh density: controls the accuracy of surface reconstruction. Higher density makes the surface smoother but requires more rendering time.
  • Wireframe mesh: draws a mesh over the surface.
  • Transparent: adds a certain degree of transparency to the surface, can be useful when the plot contains multiple surfaces.

Parallel Coordinates Plot

The parallel coordinates plot is a useful tool for visualizing multidimensional data. It shows each point as a polyline whose vertices represent the point’s coordinates along each axis.

_images/page_results_26_pc1.png

When viewing the plot, you can re-order axes by dragging them and make selections to examine certain subsets of data. When you add a selection to an axis, all points outside this selection are grayed out. To reset selections, double-click the plot.

_images/page_results_27_pc2.png

Hovering a line vertex highlights the point (or all nearby points if the lines are closely packed) and shows a tooltip with coordinate values.

General Settings

_images/page_results_28_pc_gen_conf.png

General settings for the parallel coordinates plot viewer are the following:

  • Title: sets the plot title.
  • Display limit: sets the maximum number of points displayed on the plot (the number of lines drawn). If this number is less than the size of a dataset, points are sieved so the number of shown points does not exceed the limit. This is purely a display setting that does not apply any data filters to the dataset itself.
  • Hide points outside selection: if enabled, lines representing points outside selection are hidden from the plot instead of graying them out.

The Axes pane lets you set axis labels, ranges, and scales. You can also use it to add precise selections to axes (the Selection column). Note that if an axis range is not specified, the viewer sets it automatically so that it includes all points from the dataset.

Font settings specify font sizes in the plot title and axis labels.

Dataset Configuration

Parallel coordinates plots support only one dataset. The name of this dataset is used on the left tab bar only and is never displayed on the plot.

_images/page_results_29_pc_ds_conf.png

Dataset contents are edited in the Dimensions pane (see section Datasets for details). Note that in parallel coordinates plots you can add as many dimensions to the dataset as you wish, and each dimension will add another axis on the plot (so the plot always shows all dataset contents).

Lines settings let you change the line style and apply different coloring. Note that you can select an additional data series as the color axis. The default color axis applies the gradient according to point index in the dataset; if you select a data series, the color is applied according to the values in that series.

Predictive Modeling Toolkit

The Predictive Modeling Toolkit is a set of tools for working with approximation models. These tools are powered by the same pSeven Core component that is used in the ApproxBuilder and ApproxPlayer blocks — the Generic Tool for Approximation (see section GTApprox). Compared to the ApproxBuilder block, which is intended to automate model training, the Predictive Modeling Toolkit is designed for interactive work and does not require you to create workflows.

Using the Predictive Modeling Toolkit, you can:

  • build approximation models from training data,
  • smooth trained models and update them with new data,
  • evaluate models to make predictions, and
  • validate models and compare their accuracy.

All components of the Predictive Modeling Toolkit are available in Analyze and can work both with data gathered from a workflow and with data imported into pSeven from CSV or Excel files. A model created with the Predictive Modeling Toolkit can be evaluated in Analyze, or you can export it and integrate it into a workflow using the ApproxPlayer block.

Note that in order to work with the Predictive Modeling Toolkit, you have to create a report first and import training data or models to analyze into the report database. For more details on adding data to reports, see sections General Information and Reports on this page. Most functions of the Predictive Modeling Toolkit are available from the Report database pane; the Model validator and Model explorer tools are found on the main report toolbar.

Model Builder

To begin building a model, open the Report database pane and select the data series that contain training data. After that, select Build model... from the Data series pane menu, or click the model builder button on the toolbar.

_images/page_results_pmt_01_build.png

The data series you selected are automatically added as training data sources in the Model builder configuration. Note that you can also open the configuration dialog without selecting any data, and then add data sources manually.

_images/page_results_pmt_02_builder_conf.png

General model settings are:

  • Model name: the name under which the model will be saved to the report database.
  • Comment: a short description that will be saved to model details.
  • Initial model: an advanced setting that lets you specify a model for incremental training (see section Incremental Training for details).

In the Model builder dialog you can add training and test data (see Data Settings) and configure the building mode (see Modes). When finished with the configuration, click Build or Queue to start the builder:

  • Build closes the Model builder dialog and starts building the model.
  • Queue adds the model to the build queue (and starts building if the queue is empty). This button does not close the Model builder dialog, so you can change builder settings and queue multiple models.

Queued models and ready models are shown on the Models pane in the report database. Using the Models pane menu, you can evaluate models, export and import them and perform other tasks. Note also that you can stop building a model using the Stop training command from this menu.

Data Settings

Data sources and additional data properties are specified in the Data settings table in the Model builder dialog.

  • Training data: the data series that contain training inputs and outputs.
  • Test data: the data series that contain reference data for model quality estimation. Test data is optional.
  • Type: specifies inputs and outputs.
  • Categorical: your data can contain parameters that are not continuous but take only predefined values. Such inputs should be marked in this column. This setting does not apply to outputs.
  • Output noise variance: if output noise data is available, here you can select the data series that contain noise values. This setting does not apply to inputs.
  • Data filters: let you select data subsets from the training and test samples. If a filter is applied, only the points that pass the filter are used when building the model and calculating model errors.

In the Data settings table you can also change the order of inputs and outputs and add or remove data using the toolbar buttons.

The Point weights selector below the table is used to apply sample weighting. Here you can select a data series that contains weight values; for more details on this feature, see section Sample Weighting.

Modes

Two modes are available in Model builder: the default SmartSelection mode and the advanced manual mode.

  • SmartSelection is an intelligent algorithm that is designed to automatically train high-quality models.
  • Manual mode lets you set training options directly, similarly to the ApproxBuilder block.

SmartSelection does not require additional configuration but can use various hints to speed up training or build a more accurate model. Hints are added from the menu that appears when you click inside the SmartSelection pane. A detailed description of SmartSelection features is available in the Smart Training section of the GTApprox guide.

Manual mode is very similar to the ApproxBuilder block configuration and uses the same options as this block. A guide to manual builder configuration is available in section Manual Training.

Incremental Training

Incremental training lets you update an existing model with new training data. A common use case is when not all data is available initially; another is a huge sample that is too expensive to process at once.

Incremental training can be started either by specifying an initial model in the Model builder dialog or by selecting the Update... command from the Models pane menu. Note that incremental training is currently supported only by the GBRT technique and is not available for models built using other techniques.

For more details on incremental training, see the GBRT technique description in section Gradient Boosted Regression Trees, in particular section Incremental Training.

Model Smoothing

Trained models can be additionally smoothed using the Smooth model... command from the Models pane menu (opens the Smooth model dialog).

_images/page_results_pmt_03_smoothing.png

Note that the smoothed model is saved under a new name, so the original model is not changed. After smoothing, both the original and smoothed models will be found on the Models pane.

The amount of smoothing is controlled by the smoothing factor which can be a single value for simple smoothing or a matrix for anisotropic smoothing.

The simple method applies the same smoothing to all outputs. In this case, the smoothing factor is a value in the range \([0.0, 1.0]\), where 0.0 means no smoothing and 1.0 is extreme (almost linear) smoothing.

Anisotropic smoothing is an advanced method that lets you control the smoothing of each output component individually and apply different smoothing per model input (direction in the input space). It is configured by a matrix of individual smoothing factors: each matrix row corresponds to a model output, and each column corresponds to an input. Each element of the matrix sets the smoothness of a model output in the direction of the corresponding input.

Note that some model training techniques do not support additional smoothing, and if you select such a model in the Smooth model... dialog, the smoothing cannot be started.

Model Details

Double-clicking a model on the Models pane shows detailed information about the model. You can also open this dialog using the Show details command from the pane menu.

_images/page_results_pmt_04_model_details.png

  • The Parameters tab shows a quick summary, options that were used when building the model, and the model structure (for RSM models only).
  • The Training sample tab shows sample statistics.
  • The Accuracy tab contains error values calculated on the training data set and obtained during internal validation (if it was enabled when building the model).
  • The Training log tab contains a full model training log. Note that you can also open this tab while building the model to view the log in real time.
  • The Annotations tab contains model comments.

Making Predictions

To evaluate a model, select it on the Models pane and use the Make predictions... command from the pane’s menu. This command is also available on the quick toolbar.

_images/page_results_pmt_05_make_predictions.png

In the Make predictions dialog you can select the model to evaluate and configure inputs and outputs.

Available input options are:

  • Latin hypercube sampling: generate a new LHS sample and use it as input. The generated sample will be saved to the report database.
  • Full factorial sampling: generate a full factorial sample and use it as input. The generated sample will be saved to the report database.
  • Inputs from data series: get input data from the selected data series in the report database.
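
For reference, the key property of a Latin hypercube sample (one point in each of n equal strata per dimension) can be sketched with a minimal generator. This is an illustration only, not the generator pSeven actually uses:

```python
# Minimal Latin hypercube sampler: each column is a random permutation
# of n strata, with one jittered point per stratum.
import numpy as np

def latin_hypercube(n, d, rng=None):
    rng = rng or np.random.default_rng()
    # Shuffle stratum indices 0..n-1 independently per dimension,
    # then jitter each point uniformly inside its stratum.
    strata = rng.permuted(np.tile(np.arange(n), (d, 1)), axis=1).T
    return (strata + rng.uniform(size=(n, d))) / n

sample = latin_hypercube(8, 2, np.random.default_rng(4))
print(sample.shape)  # (8, 2)

# Every stratum [k/8, (k+1)/8) contains exactly one point per column:
strata = np.sort((sample * 8).astype(int), axis=0)
print(strata[:, 0])  # [0 1 2 3 4 5 6 7]
```

This stratification is what makes LHS inputs cover each dimension more evenly than plain random sampling of the same size.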

On the Outputs pane you can set names of the new data series that will store calculated model outputs.

  • Predict and show evaluates model outputs and automatically opens a new sample viewer showing the input and output data. The data is also stored to the report database.
  • Predict evaluates model outputs and stores the data to the report database without showing it.

Model Validator

Model validator is a tool for estimating model quality and comparing models. It lets you test models against reference data and find the most accurate model using error plots and statistics.

To validate models, select them on the Models pane and click the Model validator button on the report toolbar. If you also select data on the Data series pane, it is added to validation as a test sample for every model.

_images/page_results_pmt__model_validator_scatter.png

In Model validator, you can use the Models pane to add, remove, or reorder models, select the outputs to validate, change plot colors, hide or show models on plots. You can also open the Make predictions dialog from here (see Making Predictions).

The selectors above the plot switch plot type, calculated error type, and type of data used in validation (see Comparing Models). The table at the bottom shows error metrics.

Comparing Models

Model validator can show two kinds of plots, switched using the Plot selector at the top:

  • Scatter plot directly compares reference sample outputs with model predictions.
  • Quantile plot (default) is useful to analyze error distribution.

On the quantile plot, each point shows the fraction of sample points for which the error is lower than the value on the horizontal axis. A steeper curve is better: it means that the error is lower for a larger fraction of points, with perhaps a few outliers forming a long “tail” at the top.

_images/page_results_pmt__model_validator_quantile.png
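The quantile curve described above can be sketched in a few lines of plain Python. For each threshold on the horizontal axis, the vertical value is the fraction of points whose absolute error falls below that threshold (a simplified illustration, not pSeven's implementation):

```python
def error_quantile_curve(errors, thresholds):
    """For each threshold on the horizontal axis, return the fraction of
    sample points whose absolute error is below that threshold."""
    n = len(errors)
    return [sum(1 for e in errors if abs(e) < t) / n for t in thresholds]

errors = [0.1, 0.2, 0.2, 0.3, 1.5]   # one outlier forms the "tail"
curve = error_quantile_curve(errors, [0.25, 0.5, 2.0])
```

Here the curve rises steeply to 0.8 by a threshold of 0.5, then flattens: the single outlier keeps it from reaching 1.0 until the threshold exceeds 1.5.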

By default, the quantile plot and error metrics are based on absolute error values. Using the Errors selector you can switch them to normalized error which is the absolute error divided by the standard deviation of the output from the reference sample. Normalized error is useful for estimating error significance considering the output value range.
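The normalization described above is straightforward to express in code. This sketch assumes the population standard deviation of the reference outputs is used as the divisor (the exact estimator pSeven uses is not specified here):

```python
from statistics import pstdev

def normalized_errors(reference, predicted):
    """Absolute errors divided by the standard deviation of the
    reference sample outputs, as described for the Errors selector.
    Assumes population standard deviation (pstdev)."""
    sigma = pstdev(reference)
    return [abs(r - p) / sigma for r, p in zip(reference, predicted)]

# A constant absolute error of 1.0 becomes ~0.447 after normalization
# by the reference spread (pstdev of [0, 2, 4, 6] is sqrt(5)).
vals = normalized_errors([0.0, 2.0, 4.0, 6.0], [1.0, 3.0, 5.0, 7.0])
```

The same absolute error is thus judged relative to how much the output varies: a normalized error near or above 1.0 means the model is barely better than predicting the sample mean.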

The Sample selector changes the source of reference data used for model validation:

  • If “training” is selected, reference data is the model’s training sample. This sample is available in validation only if it was saved with the model during training (samples are saved by default).
  • If “test” is selected, reference data comes from the test sample. This data is selected from the report database — either when adding a Model validator, or later in its configuration.
  • If “internal validation” is selected, both reference and prediction data are read from the model’s internal validation results. This data is available only if internal validation and saving validation data were enabled when training the model (these options are also enabled by default).

It is recommended to use a test data sample when possible: test sample validation shows the model’s ability to predict outputs for new input values that were not available during training. Training sample validation tends to overestimate model accuracy. Low errors on the training sample (steeper error quantile curves) can actually be a sign of overfitting, especially if the same model shows significantly higher errors on a test sample. If holdout test data is not available, it is recommended to switch to internal validation: this data is obtained from cross-validation tests that run when building the model (see section Internal Validation for more details).
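For context, cross-validation of the kind behind internal validation repeatedly holds out part of the training data as a temporary test set. The sketch below shows a simplified k-fold index split in plain Python; the actual GTApprox procedure may partition the data differently:

```python
def kfold_splits(n_points, k):
    """Index splits for k-fold cross-validation (a simplified sketch):
    each point serves as held-out test data exactly once."""
    folds = [list(range(i, n_points, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Train on every fold except the i-th, test on the i-th.
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((sorted(train), test))
    return splits

splits = kfold_splits(6, 3)   # 3 folds over a 6-point sample
```

Because every point is predicted by a model that never saw it during training, the pooled errors give a less optimistic accuracy estimate than training-sample validation.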

The table at the bottom contains prediction error metrics. Best metric values are highlighted. The metrics are:

  • \(R^2\): coefficient of determination. Indicates the proportion of output variation that can be explained by the model.
  • RMS: the root-mean-squared error.
  • Maximum: the maximum prediction error over the cross-validation or test sample.
  • Q99: the 99th percentile. For 99% of reference points, the prediction error is lower than this value.
  • Q95: the 95th percentile. For 95% of reference points, the prediction error is lower than this value.
  • Median: the median of prediction error values.
  • Mean: the arithmetic mean of prediction error values.

\(R^2\) (the coefficient of determination) is the most robust metric; values closer to 1.0 are better. For other metrics, lower values are better.
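The metrics above follow standard definitions and can be reproduced in plain Python. This is an illustrative sketch (nearest-rank percentiles are assumed; pSeven's exact percentile method is not specified here):

```python
from math import sqrt

def error_metrics(reference, predicted):
    """Compute the validator's error metrics from a reference sample
    and model predictions (a plain-Python sketch, not pSeven code)."""
    n = len(reference)
    errors = sorted(abs(r - p) for r, p in zip(reference, predicted))
    mean_ref = sum(reference) / n
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)

    def quantile(q):          # nearest-rank percentile of the errors
        return errors[min(n - 1, int(q * n))]

    return {
        "R^2": 1.0 - ss_res / ss_tot,
        "RMS": sqrt(ss_res / n),
        "Maximum": errors[-1],
        "Q99": quantile(0.99),
        "Q95": quantile(0.95),
        "Median": quantile(0.50),
        "Mean": sum(errors) / n,
    }

m = error_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

In this example a single point is off by 1.0, so the maximum error is 1.0 while the mean error is only 0.25 and \(R^2\) is 0.8, showing why it helps to read several metrics together.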

Model Import and Export

Models can be imported and exported on the Models pane using respective commands from the pane menu or buttons on the quick toolbar.

Import from file... imports a new model from a binary file in the GTApprox format (.gtapprox). Imported models appear on the Models pane.

Export to file... can export a model from the report to a number of formats. Available formats are:

  • GTApprox model: binary file in the GTApprox format (.gtapprox). Select this format if you want to load the model with an ApproxPlayer block.
  • Octave script: model code compatible with MATLAB.
  • C source for MEX: source code for a MATLAB MEX file.
  • C source for standalone program: C source code with the main() function for a complete command-line program.
  • C header for library: header for a model DLL.
  • C source for library: C header and implementation for compiling a model DLL.
  • C source for Microsoft Excel VBA: C implementation of the model intended for creating a DLL compatible with Microsoft Excel.

_images/page_results_pmt_06_export.png

For C source code formats you will also have to specify the name of the model function. The function description is optional; this text is added to the source code as a comment.

Model Explorer

Model explorer is a tool designed to help analyze multidimensional models. It allows you to study input-output dependencies by plotting a series of two-dimensional slices, each showing an input-output pair.

_images/page_results_pmt_07_model_explorer.png

The models to analyze are selected on the Models pane. Here you can add and remove models, reorder them, set model colors, and show or hide models on the plots. If a model provides accuracy estimation information, an AE curve can also be displayed on slices by enabling it in the “AE” column.

The sliders below this pane set coordinates of the origin point — the point where all slices intersect. This point is also marked on the slice plots.

By default, the origin point is the same for all models. To set different origin points (as shown above), deselect the “Same slice settings” option in the Models pane menu.

Slices can be rotated around the origin point using the Change slice orientation slider. When this option is disabled (default), slicing planes are parallel to the coordinate axes in the input space. If enabled, moving the slider rotates slicing planes around F-axes in a roughly uniform fashion, allowing you to “scan” the model space around the origin point. When you change slice orientation, the main directional vector is shown in the “Slice direction” column in the table at the bottom. You can also edit this column to set the direction manually.
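In the default, non-rotated orientation, each slice is produced by varying one input while the others stay fixed at the origin point. The sketch below illustrates this with a hypothetical `axis_slice` helper and a toy three-input model (not pSeven's implementation):

```python
def axis_slice(model, origin, axis, values):
    """Evaluate an axis-parallel slice of a multidimensional model:
    vary one input along `values` while the other inputs stay fixed
    at the origin point."""
    points = []
    for v in values:
        x = list(origin)
        x[axis] = v          # only the sliced input changes
        points.append((v, model(x)))
    return points

# Toy 3-input model; slice along input 1 through origin (1.0, 0.0, 2.0).
model = lambda x: x[0] ** 2 + 3 * x[1] - x[2]
curve = axis_slice(model, [1.0, 0.0, 2.0], 1, [0.0, 1.0, 2.0])
```

Plotting one such curve per input-output pair yields the grid of slices described above; moving an origin slider re-evaluates every curve through the new point.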

Slice plots, in addition to model curves, display bars that show how much an output is influenced by each input.