Results Analysis

This page explains how to collect data from a workflow and how to use the tools available in Analyze to process this data and create reports.

General Information

Data analysis documents in pSeven, called reports, are created and edited in Analyze. Reports are not strictly bound to workflows: a report can include data from different workflows and combine results of multiple workflow runs. To make this possible, each pSeven project contains a database which stores all project data. Workflows write to this project database, and reports use the project database as their data source. Each report, in turn, has its own simpler database that stores linked copies of data (see Reports for details). This storage scheme has some additional advantages:

  • Workflow files (.p7wf) do not store results, so even if a workflow gets deleted or corrupted, its results are still available from the project database.
  • Reports are linked to the project database, so when new results appear in the project, a report can be easily updated.
  • Reports also store copies of source data, so if the project database becomes corrupt, reports are not broken. Updating the report becomes temporarily unavailable, but once the project database is repaired or recreated, updates work again.
  • You can import external data to a report database and combine it with workflow results (see Data Import).

In general, obtaining and analyzing results in pSeven consists of the following steps:

  1. Create a workflow solving your task, configure and connect blocks.
  2. Set up port monitoring: decide what data you want to gather from the workflow and select corresponding block ports.
  3. Run the workflow. While it runs, monitored data is saved to the project database.
  4. In Analyze, create a new report, select data from the project database and add it to the report database.
  5. Process the data from the report database using various analysis and visualization tools.

Most of these steps can be simplified — for example, blocks have default monitoring settings which are often enough; you can add data directly from the project database to a report (pSeven will update the report database in the background), and so on.

If you are working with a completed run-ready project, you will usually use a preconfigured report which updates with a single click:

  1. Open a workflow and the related report.
  2. Run the workflow.
  3. Refresh the report.

The pSeven example projects (see Examples) include many reports configured this way. Project descriptions in examples explain which reports to use when you want to see the latest results.

See also

Simple Workflow tutorial, section Results
Provides a basic example of report editing.
Results and Reports tutorial, in particular section Report Database
Provides an example of working with a report database and advanced report configuration.

Monitoring

pSeven can capture the activity of any input or output port in a workflow. This approach is called monitoring: all data going through a monitored port is collected and stored in the project database. The main advantage of port monitoring is that you can gather not only final results but also any data that appeared during a workflow run, which lets you trace the process.

Port monitoring can be set up in two ways:

  1. When you configure a block. For most blocks, the ports to be monitored are specified on the Ports tab in the configuration dialog. In some blocks, such as Design space exploration and Uncertainty quantification, port monitors 1 are enabled by ticking the respective checkbox in the Ports and parameters dialog 2.

    _images/page_results_01_blconfports.png

    Note that most blocks enable monitoring for their important ports by default, and these settings are usually enough. For example, Design space exploration automatically enables monitoring for the All designs, Feasible designs, and Optimal designs ports. In the figure above, the optimal designs ports are monitored.

  2. You can use the workflow configuration tool b_runconf available both in Edit and Run. The Monitoring tab lists all ports with enabled monitoring and lets you add or remove 1 monitored ports.

    _images/page_results_02_wfconf.png

    Here you can also assign aliases 2 to ports and change their descriptions 3. If assigned, an alias becomes the name of the record in the project database that will store values collected from this port. Otherwise pSeven will use a default name in the “Blockname.portname” format. Default port descriptions are defined by blocks; you can edit them, for example, to be more relevant to the current task. Aliases and descriptions are shown in the Monitoring pane in Run.

    Note that if the list of ports is long, you can use the name filter 4 to quickly find a port. A filtered list shows only those ports whose names or aliases contain the filter string. You can also filter names in the block.port format using the dot separator, or show ports of a specific block by typing the block name with a trailing dot (block.) in the filter. Filtering is case-insensitive; the sketch below illustrates this matching logic.
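
The filter behavior can be pictured in a few lines of Python (an illustration only, not pSeven code; the port list and the helper function are hypothetical):

    # Illustrative sketch of the port name filter described above (not pSeven code).
    def matches(filter_text, block, port, alias=""):
        """Case-insensitive substring match against port names and aliases."""
        f = filter_text.lower()
        full_name = f"{block}.{port}".lower()
        if f.endswith("."):                       # "block." lists all ports of that block
            return full_name.startswith(f)
        return f in full_name or f in alias.lower()

    ports = [("Optimizer", "x", "design point"), ("Solver", "f", ""), ("Solver", "x", "")]
    print([p for p in ports if matches("solver.", *p)])  # all ports of the Solver block
    print([p for p in ports if matches("design", *p)])   # matched by alias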

Ports with enabled monitoring appear on the run settings pane in Run.

_images/page_results_03_runmonitorpane.png

Before running a workflow, you can adjust monitoring settings by switching monitors on and off 1. Switching off a monitor in Run means that its data will not be captured, but it does not affect the monitoring settings in block configuration or workflow configuration (so it can be used to disable monitoring temporarily when needed). Note that if you have assigned a port alias or changed its description, they are shown here 2 instead of the defaults.

On this pane, you can also set the name of the next workflow run 3 — this will be the name of the run record in the project database. Run names support placeholders: %d is the date and %t is the time of the workflow start, so the default name (%d %t) is a simple timestamp. Note that if you set a fixed run name (for example, TestRun) and run your workflow multiple times, pSeven appends new data to the old data without overwriting it, so the same database record accumulates values collected from all runs. If you do not want to store data from previous runs, use the Overwrite run history command from the b_context menu on the run settings pane. To store workflow outputs in a different record each time, you can also include the above placeholders in the run name — for example, TestRun %d, started at %t.
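
The placeholder expansion can be pictured with a short Python sketch (an illustration only, not pSeven code; the date and time formats below are assumptions chosen to match the example run record name shown in the Project Database section):

    # Illustrative sketch of run name placeholder expansion (not pSeven code).
    # Assumed formats: %d -> YYYY-MM-DD, %t -> HH-MM-SS.
    from datetime import datetime

    def expand_run_name(template, start=None):
        start = start or datetime.now()
        return (template
                .replace("%d", start.strftime("%Y-%m-%d"))
                .replace("%t", start.strftime("%H-%M-%S")))

    print(expand_run_name("%d %t"))                      # default name: a simple timestamp
    print(expand_run_name("TestRun %d, started at %t"))  # a new record for every run
    print(expand_run_name("TestRun"))                    # fixed name: one record accumulates all runs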

When you start a workflow, pSeven begins writing data to the project database. In Analyze you can add some or all of this data to a report to analyze results, create plots and so on. Note that it is possible to create a new report and start configuring it even while the workflow is still running. Once a report is configured, it can be quickly updated with new data using the b_refresh button on the report toolbar.

Project Database

A project database is a common data storage for all workflows in your current project. The database is a single file named project.p7pdb, located in the project directory. This file stores monitored data, values of workflow inputs, outputs, parameters, and additional information for every workflow run. Workflows themselves (.p7wf files) do not save any results data, so if you want to send a workflow with results to someone else, you will have to pack the entire project.

You can manage data in the project database using the Project database pane in Analyze.

_images/page_results_04_pdbpane.png

The project database structure is the following:

  • On the top level there are workflow records 1. A workflow record has the same name as the corresponding workflow.

    • Each workflow record contains a number of run records 2. The name of a run record is the run name you specified on the run settings pane in Run. The special “<last run>” record is a shortcut to the most recently active run record: the one that was created last (“TestRun 2019-01-17, started at 12-51-48” in the example above) or updated last — this is the case when you set a run name that already exists in the project database (for example, the run name is fixed and you run the workflow multiple times). The “<last run>” record can be used to create reports that are easily updated after re-running a workflow using the refresh function (see Report Update).

      A run record contains several “folders”:

      • “Inputs”: if your workflow has inputs (shown on the Inputs pane in Run), their values are saved here. Each input is stored in its own record, like “1st flow velocity”, “2nd flow velocity”, “Nozzle angle”, and “Nozzle diameter” in the example above. The name of this record is the input alias specified in workflow configuration (General tab), or the original port name if there is no alias. Workflow inputs are always saved automatically (there is no need to explicitly enable monitoring for the root ports that work as workflow inputs).
      • “Monitoring”: stores data collected from monitored ports. Each monitored port creates its own record here; this record contains a sequence of values collected from the port. If you have assigned an alias to the port (set in workflow configuration, Monitoring tab), it serves as the name of the port’s record. If a port has no alias, the record gets a default name in the “Blockname.portname” format. Note that if you disable monitoring for some port in the Monitoring pane in Run, its data is not recorded to the database.
      • “Outputs”: like “Inputs”, saves values of workflow outputs if there are any (the Outputs pane in Run). Each output is stored in its own record bearing the same name as its alias (set in workflow configuration, General tab) or the original port name. Workflow outputs are also always saved automatically.
      • “Parameters”: saves values of workflow parameters, if any. The record name (like “Fluent path” in the example above) is the parameter alias specified in workflow configuration (Parameters tab), or the original port or option name if no alias is set. All parameters are saved automatically.
      • “Run info”: additional information including:
        • “duration”: run duration.
        • “errors”: contains information on errors that occurred during workflow execution.
        • “preset”: the name of a parameter preset applied to this run. Parameter presets are created and selected on the parameters pane in Run.
        • “status”: workflow finish status.
        • “steps_<blockname>”: how many times the respective block was started during the run.
        • “time_<blockname>”: execution time of the respective block (in seconds).
        • “timestamp_start”, “timestamp_finish”: workflow start and finish timestamps.

Contents of the records described above can also be viewed in the Project database pane (expand a record to see values).
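
The hierarchy described above can be pictured as a nested mapping (a simplified illustration only; this is not the actual on-disk format of project.p7pdb, and all values and the workflow name are made up):

    # Simplified illustration of the project database hierarchy (not the real .p7pdb format).
    project_db = {
        "MyWorkflow": {                                   # workflow record (named after the workflow)
            "TestRun 2019-01-17, started at 12-51-48": {  # run record
                "Inputs":     {"Nozzle angle": 15.0, "Nozzle diameter": 0.02},
                "Monitoring": {"Optimizer.x": [[0.1, 0.2], [0.3, 0.4]]},
                "Outputs":    {"Thrust": 412.7},
                "Parameters": {"Fluent path": "/opt/fluent"},
                "Run info":   {"duration": 128.4, "status": "FINISHED"},
            },
            # "<last run>" is a shortcut to the most recently active run record
        },
    }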

The Project database pane’s b_context menu provides the following commands:

  • Add to new table, Add to data series, Add to data series (transposed): add data from the project database to a report. These commands are available only when you have a report open. See section Adding Workflow Data for details.
  • Collapse all: collapses all open (expanded) records to clean up the Project database pane view.
  • Rename…: brings up a dialog to rename the selected database record.
  • Remove: deletes selected records from the database. Hold Ctrl or Shift to select multiple records. Note that for safety reasons the data is actually not removed from the project database but only marked for deletion. Due to this, removing records does not reduce the size of the project database file (project.p7pdb).
  • Reduce database size: performs a database cleanup, completely removing the records marked for deletion and all their data. After the cleanup, it rebuilds the database structure to reduce the project.p7pdb file size. Note that the cleanup and rebuild can take significant time, depending on the amount of data in the database. This action is irreversible, and the command is available only if no workflow is running.

To use data in analysis, add it from the project database to the database of the report you are editing. You can also use the data import tool to add data from CSV, Excel, and Composite cache files to a report database, and then work with this data in the same way as with the results obtained from a workflow (see Data Import for details). Report data can be processed using various analysis and visualization tools (see Viewers).

Reports

Data analysis documents in pSeven are called reports. Each report can contain a number of analysis tiles (tool windows) and has its own database. Unlike the project database, a report database is available to its report only. A report is actually a file pair: a .p7rep file (the report document) and a .p7rdb file with the same name (the report database). When you manage (copy, move) reports in pSeven, it automatically handles the report and its database as a pair and never shows .p7rdb files in Workspace. If you use a system file manager to transfer reports, remember to copy both files.

_images/page_results_05_repdbpane.png

To access the report data, click 1 on the report toolbar to toggle the Report database pane 2 which hosts the Data series pane. The data series 3 are plain 1D arrays of numbers or strings contained in the report database. Each report has its own database, so contents of the Data series pane change when you switch reports. Every tile in the report (table, plot) is just another viewer for data series. Viewer tiles can get data only from data series — they cannot load data directly from the project database or external files.

Data series can be created using different data sources:

  • The data which is collected from monitored ports during a workflow run and stored in the project database (see Adding Workflow Data). These data series contain linked copies of data: they remember their sources and are synchronized with the project database so you can easily update them (see Report Update). You can also manually change the project database record, which these data series use as the source (see Change Source).
  • The data imported from files stored on disk: Excel spreadsheets, CSV files, and Composite block cache files. These data series simply store imported data, they do not remember sources. As a result, they cannot be updated automatically — to update their data, you will have to re-import the file manually. See section Data Import for more details.

Data series are never deleted automatically, even if they are unused. For example, when you add a Sample viewer by dragging records from the project database to a report, pSeven automatically creates required data series in the background (see Adding Workflow Data for details). If you later remove that Sample viewer, the data series remain intact in the report database. Even if you remove the source record from the project database, the data series remain functional since they are linked copies. Removing the source breaks the link (the data series can no longer be updated automatically), but recently synchronized copies of data are still stored in the data series, so you can continue to use them in the report, export the data and so on.

The Data series pane’s b_context menu provides a few additional commands to manage existing data series. These commands are described in section Working with Data Series.

Adding Workflow Data

Data generated by a workflow can be added to a report in several ways. The simplest one is to select one or more records in the project database and drag them to the report editing area (use Ctrl+Click or Shift+Click to select multiple records). This action automatically creates a Sample viewer and adds data series to the report database in the background. The created data series can then be used in other viewers, too. Note that you can synchronize selections between some viewers, if such viewers share at least one common data source and also have linked selection enabled. More details are given in the Linked Selection section.

The Add to new table command from the Project database pane context menu does the same: creates a new Sample viewer and adds data series in the background.

Another option is to create data series manually, and then use them in various viewers. To do this, select data in the project database, then either drag selected records to the Data series pane, or use commands from the Project database pane context menu. Note that since data series in a report database are 1-dimensional, data is usually reshaped when you add it from the project database. In general, all values in a project database record are stacked and then processed as a single matrix, with each column becoming a separate data series. For example, if a record contains multiple matrices, they are stacked vertically, and the obtained matrix is split into columns. Vectors are stacked in such a way that each vector becomes a row, so generated data series will hold values of vector elements with the same index. A sequence of scalar values is interpreted as a single column matrix, and so on.

In some cases you need to generate data series from matrix rows instead of columns. For example, if a record contains a single vector value of length \(k\), by default it will generate \(k\) data series containing 1 element each. To avoid this, the record data can be transposed before converting it to data series, in one of the following ways:

  1. The Add to data series (transposed) command from the Project database pane context menu will first transpose the matrix obtained by stacking the values contained in selected records, then generate data series.
  2. The same processing is done if you select not the record itself, but the values it contains, and drag this data to the Data series pane or to the report editing area. For example, when you want to convert a single vector to a single data series, you can select only this value to apply the transpose.

The following list summarizes the rules applied when converting the data selected in the Project database pane to data series; the sketch after the list illustrates the record cases.

  • Selection: single scalar value.
    • Add to data series: single data series of length 1.
    • Add to data series (transposed): single data series of length 1.
  • Selection: record containing a single scalar value.
    • Add to data series: single data series of length 1.
    • Add to data series (transposed): single data series of length 1.
  • Selection: record containing \(n\) scalar values.
    • Add to data series: single data series of length \(n\).
    • Add to data series (transposed): \(n\) data series of length 1.
  • Selection: single vector value of length \(k\).
    • Add to data series: single data series of length \(k\).
    • Add to data series (transposed): \(k\) data series of length 1.
  • Selection: record containing a single vector value of length \(k\).
    • Add to data series: \(k\) data series of length 1.
    • Add to data series (transposed): 1 data series of length \(k\).
  • Selection: record containing \(n\) vector values of length \(k\) each.
    • Add to data series: \(k\) data series of length \(n\). Vectors are stacked vertically, so each vector becomes a row in a matrix; then each matrix column becomes a new data series.
    • Add to data series (transposed): \(n\) data series of length \(k\). Each vector becomes a new data series.
  • Selection: single matrix value, \(i\) rows and \(j\) columns.
    • Add to data series: \(j\) data series of length \(i\). Each column becomes a new data series.
    • Add to data series (transposed): \(i\) data series of length \(j\). Each row becomes a new data series.
  • Selection: record containing a single matrix with \(i\) rows and \(j\) columns.
    • Add to data series: same as a single matrix value.
    • Add to data series (transposed): same as a single matrix value.
  • Selection: record containing \(n\) matrices with \(i\) rows and \(j\) columns each.
    • Add to data series: \(j\) data series of length \(n \cdot i\). Matrices are stacked vertically and then processed as a single matrix with \(n \cdot i\) rows and \(j\) columns.
    • Add to data series (transposed): \(n \cdot i\) data series of length \(j\). Each row of every matrix becomes a new data series.
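
For the record cases above, the stacking and transposition logic can be sketched with numpy (an illustration only, not pSeven code; selecting a single value instead of a record applies the transpose, as explained earlier):

    # Illustrative numpy sketch of the record-to-data-series conversion rules (not pSeven code).
    import numpy as np

    def to_data_series(record_values, transposed=False):
        """record_values: a list of scalars, vectors, or matrices stored in one record."""
        stacked = np.vstack([np.atleast_2d(v) for v in record_values])  # each vector becomes a row
        if transposed:
            stacked = stacked.T
        return [stacked[:, j] for j in range(stacked.shape[1])]         # one data series per column

    record = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])]     # n = 2 vectors of length k = 3
    print(len(to_data_series(record)))                    # 3 data series of length 2
    print(len(to_data_series(record, transposed=True)))   # 2 data series of length 3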

Finally, note that you can begin adding data to a report as soon as records with some initial contents appear in the project database. This means that you can configure a report while a workflow is still running, and then use the b_refresh button to automatically update the report with new data (see Report Update).

Data Import

You can add data from a file on disk to a report database and then use resulting data series in the report. Supported file formats are CSV, Excel and block cache — a special file format to save the data processed by the Composite block (see section Cache for details). Note that data series created by import do not remember their sources. Updating a report with the b_refresh button has no effect on imported data: the report re-reads data from the project database, but does not re-import data from files.

To import data, use the b_analyze_import_command button on the Data series pane toolbar or select one of the import commands from the pane’s b_context menu.

Import Data from CSV

To import data from a CSV file, click the b_analyze_import_command button on the Data series pane toolbar and select the Import data from CSV… command. This command opens the import dialog where you can select the source CSV file and adjust the CSV parser settings.

Click b_browse and navigate to the file containing the data you want to import.

_images/page_results_import_csv.png

When the file is loaded, configure the CSV parser to read your file correctly. The available settings are the following:

  • Data columns separated by delimiter: select this option if the values in your CSV file are separated by a delimiter. The available delimiter options are “Tab”, “Comma”, “Semicolon”, “Space”, and “Other”. The default is “Comma”. “Other” means any other character that can be used as a field delimiter. You can select more than one delimiter.
  • Merge field delimiters: if enabled, delimiter characters following each other are interpreted as a single delimiter separating two fields. For example, “3,,,,8” is read as “3.0, 8.0”. If disabled (default), the parser assumes that such delimiters appear because the file contains empty fields; in this case, “3,,,,8” is read as “3.0, None, None, None, 8.0” (see the short sketch after this list).
  • Fixed width columns: select this option if data items in a column have the same length. Enter a comma-separated list specifying the number of characters in each column.
  • First line is header: if enabled, the first uncommented line in the CSV file is read as the table header. If disabled, that line is read as a data row. “Auto” (default) tries to detect the header automatically.
  • Start from line: the line number from which to start reading data; preceding lines are ignored.
  • Decimal separator: the character which the file uses as the decimal separator in field values: point or comma.
  • Digit group separator: the character which the file uses to separate digit groups in values: space, dot, comma, or no group separator.
  • Comment character: lines starting with this character are skipped by the parser.
  • Quotation character: the character which the file uses to quote field values. For example, in CSV files using comma as the field delimiter, those fields that contain strings with commas in them are often enclosed in double quotes ".
  • Escape character: instead of quoting fields that contain the field delimiter character, the file can escape the delimiter inside a field by preceding it with a specific character, usually a backslash \. You may need to change this setting if your file uses another escape character or does not use the backslash as escape.
  • Error cell value: if pSeven fails to parse a field value as a number, it can either parse it as a string or replace it with the value specified here. Note that when a field is parsed as a string, the data series parsed from the column containing this field will have the string type; consequently, all other values from this column become strings even if they were numeric.
  • Text encoding: specifies the text encoding used in the CSV file.
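
The difference between the two delimiter modes mentioned above can be reproduced with plain Python (an illustration only, not the pSeven parser):

    # Illustration of the "Merge field delimiters" option (not pSeven code).
    import re

    line = "3,,,,8"
    print(line.split(","))        # merging disabled: ['3', '', '', '', '8']  ->  3.0, None, None, None, 8.0
    print(re.split(r",+", line))  # merging enabled:  ['3', '8']              ->  3.0, 8.0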

The Preview tab shows the data as it will appear when split into columns. You can adjust the column width manually by dragging the slider right or left, if required. Cells which failed to parse are highlighted red in the preview.

Note

For large CSV files, previews may take a long time to load. Therefore, if you modify parser settings, the preview will not display changes immediately. Use the b_refresh button on the Preview tab to update it.

You can select or deselect columns to include them in the result. Use b_selectall_w and b_deselectall_w buttons for bulk operations with data.

The Result settings pane lets you edit the names of columns imported from a CSV file by adding name prefixes. This is done to avoid possible conflicts when the data is loaded to the report database. The name prefix field is empty if the CSV file has a header that is parsed correctly. If the file has no header or the header is parsed incorrectly, the prefix is added automatically. If needed, you can also specify the prefix manually. To remove the prefix, use the b_common_sweep button.

By default, the result will contain only those columns which are selected on the Preview tab. If you want to add and remove result columns manually, disable this mode using the Sync with preview command from the context menu or the b_sync button. Note that your manual result settings will be reset if you enable sync again.

Switch to the Result tab to view the data to be imported to the report database. Note that the column name is the name of the data series shown in the Data series pane.

If the names of the data you import from a CSV file match names already existing in the report database, you can either overwrite the existing data series or append the imported data to them. In case of overwriting, the data from your CSV file permanently replaces the data already stored in the report database. In case of appending, the new data is simply added to the existing data. The append mode is enabled by ticking the respective checkbox on the Result settings pane and may be useful when aggregating data from multiple CSV files.

Import Data from Excel

To import data from an Excel file, click the b_analyze_import_command button on the Data series pane toolbar and select the Import data from Excel… command. This command opens the import dialog where you can select the source Excel file and adjust the Excel parser settings.

Click b_browse and navigate to the file containing the data you want to import.

_images/page_results_import_excel.png

Configure the Excel parser so that it reads your file correctly. The available settings are the following:

  • Sheet: selects the working sheet. The Source tab at the bottom of the dialog displays the contents of the selected sheet.
  • Parse header: specifies the method of detecting column names in the loaded document.
    • Auto: if cells in the first row contain text, they are used as column names. Otherwise, default column names are used.
    • Manual: specifies the row which contains column names. If there is more than one header row above your data, use advanced settings to indicate how many rows are read as headers 1 and which row 2 contains column names. In our example, cells in the first row contain column names; the first and the second row should be parsed as the header to remove an empty line from results.
  • Import data from the cell range: specifies a cell range of the selected sheet to be used as the data source for import. For example, specify "B3:D10" to exclude the “Index” column from results (a generic illustration of this range notation follows the list).
    • Rows: specify the number of the first and the last row in a range.
    • Columns: specify the first and the last column in a range.
  • Error cell value: if pSeven fails to parse a cell value as a number, it can either parse it as a string or replace it with the value specified here. Note that when a cell is parsed as a string, the data series parsed from the column containing this cell will have the string type; consequently, all other values from this column become strings even if they were numeric.
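
The cell range uses the usual A1-style Excel notation. A generic openpyxl snippet (an illustration only, not the pSeven importer; the file and sheet names are hypothetical) shows what the "B3:D10" range selects:

    # Generic illustration of the A1-style cell range notation (not pSeven code).
    from openpyxl import load_workbook

    wb = load_workbook("results.xlsx", read_only=True)    # hypothetical file
    ws = wb["Sheet1"]                                     # hypothetical sheet name
    cells = ws["B3":"D10"]                                # skips column A ("Index") and the header rows
    data = [[cell.value for cell in row] for row in cells]
    print(len(data), "rows x", len(data[0]), "columns")   # 8 rows x 3 columns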
_images/page_results_import_excel_preview.png

The Preview tab shows the data as it will appear when split into columns. You can adjust the column width manually by dragging the slider right or left, if required. Cells which failed to parse are highlighted red in the preview.

Note

For large Excel files, previews may take a long time to load. Therefore, if you modify parser settings, the preview will not display changes immediately. Use the b_refresh button on the Preview tab to update it.

Use b_selectall_w and b_deselectall_w buttons for bulk operations with data.

The Result settings pane lets you edit the names of columns imported from an Excel file by adding name prefixes. This is done to avoid possible conflicts when the data is loaded to the report database. The name prefix field is empty if the Excel file has a header that is parsed correctly. If the file has no header, the header is parsed incorrectly, or you switched the header parsing mode from “Auto” to “Manual”, the prefix is added automatically. If needed, you can also specify the prefix manually. To remove the prefix, use the b_common_sweep button.

By default, the result will contain only those columns which are selected on the Preview tab. If you want to add and remove result columns manually, disable this mode using the Sync with preview command from the context menu or the b_sync button. Note that your manual result settings will be reset if you enable sync again. To add columns to the result in manual mode, use the b_add_to_result button.

Switch to the Result tab to view the data to be imported to the report database. Note that the column name is the name of the data series shown in the Data series pane.

If the names of the data you import from an Excel file match names already existing in the report database, you can either overwrite the existing data series or append the imported data to them. In case of overwriting, the data from your Excel file permanently replaces the data already stored in the report database. In case of appending, the new data is simply added to the existing data. The append mode is enabled by ticking the respective checkbox on the Result settings pane and may be useful when aggregating data from multiple Excel files.

Import Data from Composite Block Cache

A Composite cache file is a special file format that saves the data processed by the Composite block during a workflow run. Cached data is saved to a file on disk and can be imported to a report database and used as data series in your report.

To import data from a Composite cache file, click the b_analyze_import_command button on the Data series pane toolbar and select the Import data from Composite cache… command. In the import dialog, click b_browse and navigate to the source file with the data to import.

_images/page_results_import_composite_cache.png

The Preview tab shows the data as it will appear when split into columns. You can adjust the column width manually by dragging the slider right or left, if required. Use b_selectall_w and b_deselectall_w buttons for bulk operations with data.

Note that since the Composite cache is a special file format, there are no parser settings to adjust. You can switch to the Source tab to see the contents of the original file.

_images/page_results_import_composite_cache_source.png

The Result settings pane lets you edit the names of columns imported from the Composite cache file by adding name prefixes. This is done to avoid possible conflicts when the data is loaded to the report database. The name prefix field is empty if the Composite cache file has a header that is parsed correctly. If the file has no header or the header is parsed incorrectly, the prefix is added automatically. If needed, you can also specify the prefix manually. To remove the prefix, use the b_common_sweep button.

By default, the result will contain only those columns which are selected on the Preview tab. If you want to add and remove result columns manually, disable this mode using the Sync with preview command from the context menu or the b_sync button. Note that your manual result settings will be reset if you enable sync again.

_images/page_results_import_composite_cache_result.png

Result settings. 1, 2 The deselected do and done columns that are not included in the result and thus not shown in the Result settings pane. 3 The b_add_to_result button used to add columns to the result in manual mode. 4 Disabling Sync with preview to rename columns.

Switch to the Result tab to view the data to be imported to the report database. Note that the column name is the name of the data series shown in the Data series pane.

If the names of the data you import from a Composite cache file match names already existing in the report database, you can either overwrite the existing data series or append the imported data to them. In case of overwriting, the data from your cache file permanently replaces the data already stored in the report database. In case of appending, the new data is simply added to the existing data. The append mode is enabled by ticking the respective checkbox on the Result settings pane and may be useful when aggregating data from multiple Composite cache files.

Report Update

Reports which use data from the project database as a source can be easily updated with new data (synchronized with the project database). The update button b_refresh is located at the bottom of the report toolbar.

Typical uses of this function are:

  • If you repeatedly run the same workflow (for example, varying its input parameters), create data series using the special “<last run>” project database record as a source (see the database structure description in section Project Database). Then you can update the report to view the most recent run data.
  • If you want to monitor workflow execution (for example, using a convergence plot), set up port monitoring, run the workflow and immediately switch to Analyze. Data in the project database is available as soon as it is output by the monitored ports, so you can start creating a report and adding data to it while the workflow is running. After configuring the viewers you need, update the report at any time to see the newest results.

The update function works because data series in the report database are linked copies. They hold a copy of data which was recently synchronized from the project database, and also stay connected to the project database records (remember the data sources). When you click b_refresh on the report toolbar, pSeven synchronizes the report database with the project database (updates data series), and then refreshes all report tiles, which re-read the updated data series.

Note that update does not work with data series which were created by importing data from files: these data series do not remember their data sources, so they are skipped during synchronization.

As a safety measure, update does not synchronize deletions. That is, if you delete the source record from the project database, or it gets corrupted, data series in the report still keep a recent copy of the data. In this case the copy becomes unlinked from the project database — since the source is no longer available, update is effectively disabled. However, it does not break the report, because you can continue to work with the last synchronized copy. Even if you delete the entire project database and then update the report, no data is removed from the report.

Working with Data Series

This section describes additional functions that work with report data (existing data series). Related commands are available from the Data series pane’s b_context menu. For details on creating new data series, see sections Adding Workflow Data and Data Import.

Remove

pSeven never deletes any data from the report or project databases automatically. For example, if you add a report tile which uses some data series, and then delete this tile, the data series are kept intact. If you create a data series which uses a project database record as its source, and then delete the record, the data series keeps a copy of data. If you run a report update after this, the data series is not updated (since the source is no longer available), but it still keeps a copy of data.

That is to say, data series are rather persistent. The only way to delete a data series permanently is to use the Remove command from the Data series pane’s b_context menu. Note that it does not remove any data from the project database — deleting a data series only removes a copy of data stored in the report database.

Also, deleting data series does not affect the configuration of viewers which used them. The viewer becomes empty (shows a placeholder), but its configuration keeps the names of data series. If you later create new data series with the same names, the viewer will recover automatically.

In many cases it is possible to avoid manual deletion and re-creation of data series:

  • If you create data series with the special “<last run>” record as a source, use the update function to load new data (see Report Update).
  • If your report uses results of a specific workflow run, and you want to switch it to another run, use the Change source… command (see Change Source).
  • If you import data to a report from a file or a set of files, use matching names for imported columns. When importing, you can select to overwrite old data or to merge new data into existing data series (see Data Import for details).
  • If you need to partition some dataset, consider using the Split data… command (see Split Data).
  • Finally, if you want to clean up the report database or temporarily remove some data, consider using the Discard values command (see Discard Values).

Rename

You can rename any data series for convenience. For example, various viewers use names of data series as plot labels, column titles, and such. So, if you give proper names to your data series, you will not need to edit these names in viewer configuration manually.

To rename a single data series, you can double-click its name, or hover it with your mouse cursor and click the b_editval icon, or select the data series and hit F2.

If you select multiple data series (hold Ctrl or Shift for multiselection), the Rename… command and the F2 hotkey open the Rename data series dialog.

_images/page_results_data_series_multi_rename_dialog.png

This dialog is intended to edit the common part at the beginning of the names you have selected (the common prefix). It finds the prefix automatically and allows you to change or remove it.

When you rename data series, pSeven automatically updates configuration of viewers which use these data series (changes names of data sources in viewer configuration).

Discard Values

Instead of deleting data series, you can use the Discard values command to remove their contents while keeping the data series in the report database. These data series will have zero size (contain no data), but will remain functional — for example, if their source is a project database, you can later use the report update function (see Report Update) to fill the data series.

_images/page_results_data_series_discard_values.png

The Discard values command is mostly intended for creating clean preconfigured reports, which should be updated after running a workflow. In particular, many reports in the example projects (see Examples) are prepared with the help of this function.

Split Data

The Split data… command can be used to split some dataset (several data series) in the report into two new data sets — for example, to create training and test samples which would be used for model training or in other predictive modeling tools in Analyze.

_images/page_results_split_data_dialog.png

Settings in the Split data dialog are:

  • Input data, Output data: the data series to split. From each of these data series, two new data series will be generated.

  • Detect tensor structure: if enabled, pSeven will test the data set for tensor structure and try to generate new data sets in such a way that they have tensor structure too. This structure is a specific type of DoE similar to full factorial and required by tensor approximation techniques. It is recommended to enable this feature if your input data set is similar to a full factorial. However, note that the tensor structure test can take a long time for high-dimensional data sets (tens of data series).

  • Training subset ratio: the percentage of points that will be included into the training subset; remaining points will be added to the test subset.

  • Training subset prefix, Test subset prefix: the prefixes that will be added to names of new data series.

  • Splitting method: selects the method used to distribute points between the training and test subsets. Using CART or DUPLEX is recommended. These two methods are deterministic: given the same inputs, outputs, and training subset ratio, the resulting distribution of points is always the same. If you want to create multiple different splits, use the random split method with different seeds (a minimal sketch of a seeded split follows this list).

    • CART (Classification And Regression Tree): uses a variance-based algorithm to assign points to the train and test subsets. This method aims to create subsets which both provide a good representation of input and output variance. CART is a deterministic method.

    • DUPLEX: uses a distance-based algorithm which iteratively assigns points to the train and test subsets. On the first step, it selects two pairs of distant points — a pair for each of the subsets. Next steps add points one by one, selecting those which are farthest from the points already selected. DUPLEX is a deterministic method.

    • Random with seed: assigns points randomly. Note that a random split can sometimes generate highly non-uniform samples which do not cover some design space areas properly. For example, it is possible that some areas will be covered by the training sample but not by the test sample, which is bad for model validation. However, with this method you can change the random seed and retry splitting to “remix” the points. This is a pseudorandom method: results differ from split to split, but a split can be reproduced by using the same random seed value.

      Note that the Split data dialog does not keep the last used seed: a new seed is automatically selected each time you open the dialog.
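
A minimal sketch of the “Random with seed” idea, assuming a numpy-style reproducible shuffle (it does not reproduce pSeven's CART or DUPLEX algorithms):

    # Minimal sketch of a seeded random train/test split (not pSeven's implementation).
    import numpy as np

    def random_split(inputs, outputs, train_ratio=0.8, seed=42):
        rng = np.random.default_rng(seed)        # the same seed always yields the same split
        idx = rng.permutation(len(inputs))
        n_train = int(round(train_ratio * len(inputs)))
        train, test = idx[:n_train], idx[n_train:]
        return (inputs[train], outputs[train]), (inputs[test], outputs[test])

    x, y = np.random.rand(100, 3), np.random.rand(100, 1)
    (train_x, train_y), (test_x, test_y) = random_split(x, y, train_ratio=0.8, seed=42)
    print(train_x.shape, test_x.shape)           # (80, 3) (20, 3)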

Change Source

The Change source… command can be used to switch a data series to another source record in the project database. It may be helpful if you want to switch between results of different workflow runs to compare them, or update the report to show data obtained from the latest workflow run.

_images/page_results_data_series_change_source_dialog.png

Note that the database record which you select as the new source should hold the same type of data as the old source (for example, data from the same port but in different workflow runs). Otherwise you can get unexpected results.

This command is disabled for data series created by importing data from a file, since these ones do not remember their sources.

Note also that in many cases using the report update function is more convenient than manually changing sources of data series (see Report Update for details).

Data Export

You can export data from a report to a CSV or Excel file.

_images/page_results_export_menu.png
  • Open the Report database pane and select data series to export.
  • Click b_analyze_export_command on the Data series pane toolbar and select the export format. You can also use the Export data to CSV… and Export data to Excel… commands from the pane’s b_context menu.

Both commands open a dialog where you can adjust export settings and preview results.

Export Data to CSV

To export data to a CSV file:

  • On the Data series pane, select the data series to export (hold Ctrl or Shift for multiselection).
  • Click b_analyze_export_command on the Data series pane toolbar and select the Export data to CSV… command.
  • In the export dialog, click b_browse and select the export file location.
  • In the export dialog you can also select or deselect data series to export using the checkboxes on the Source tab.
  • Adjust settings as needed and click Export to save the file.
_images/page_results_export_csv.png

CSV export dialog. 1 The Source tab shows contents of the report database. Highlighted columns are selected for export. 2 Quick buttons to select or deselect all columns.

Available CSV export settings are:

  • Data columns separated by delimiter: sets the character which is used to separate values in the CSV file (field delimiter). You can select one of the commonly used delimiters or specify any other character.
  • Write header: if enabled (default), the first line in the CSV will contain column names.
  • Quotation character: sets the character which is used to quote field values. A data value in CSV is quoted when it contains a character which is used as a delimiter — for example, when the delimiter is a comma and the value is 3,142.
  • Escape character: sets the character which is used to escape the quotation character when the quotation character is part of a data value. For example, if the value is Case "A", and " is the quotation character, the quotes need to be escaped (both behaviors are illustrated in the sketch after this list).
  • Missing value: missing values (shown as None on the Source tab) are replaced with the number or string specified here. Default is empty, so None values are exported as empty CSV fields.
  • Line endings: the line endings to use in a generated CSV file — Windows (CRLF) or Linux (LF).
  • Text encoding: selects the text encoding for the CSV file.
  • Result settings pane: enables reordering and renaming columns in the file.
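
The interplay of the quotation and escape characters can be illustrated with Python's csv module (a generic example only, not the pSeven exporter):

    # Generic illustration of quoting vs. escaping in CSV output (not pSeven code).
    import csv, io

    rows = [["label", "value"], ['Case "A"', "3,142"]]

    quoted = io.StringIO()
    csv.writer(quoted, delimiter=",", quotechar='"').writerows(rows)
    print(quoted.getvalue())   # fields containing commas or quotes are enclosed in quotes

    escaped = io.StringIO()
    csv.writer(escaped, delimiter=",", quoting=csv.QUOTE_NONE, escapechar="\\").writerows(rows)
    print(escaped.getvalue())  # delimiters and quotes inside fields are escaped with a backslash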
_images/page_results_export_csv_result.png

By default, column names saved to the CSV file are the same as names of the data series you have selected to export. On the Result settings pane, you can change the names 1 which are added to the CSV header line 2 shown on the Result tab. You can also change the order of columns in the file using the b_blconf_up and b_blconf_down buttons on the pane’s toolbar. All these changes are applied to the exported file only — data series in the report keep their names and order.

Export Data to Excel

To export data to an Excel file:

  • On the Data series pane, select the data series to export (hold Ctrl or Shift for multiselection).
  • Click b_analyze_export_command on the Data series pane toolbar and select the Export data to Excel… command.
  • In the export dialog, click b_browse and select the export file location and format (.xlsx or .xls).
  • In the export dialog you can also select or deselect data series to export using the checkboxes on the Source tab.
  • Adjust settings as needed and click Export to save the file.
_images/page_results_export_excel.png

Excel export dialog. 1 The Source tab shows contents of the report database. Highlighted columns are selected for export. 2 Quick buttons to select or deselect all columns.

Available Excel export settings are:

  • Sheet: specifies the name of the working sheet.
  • First cell: specifies the cell in the working sheet where to start writing data (the top left cell of the exported table).
  • Write header: if enabled (default), the first row in the exported table will contain column names.
  • Missing value: missing values (shown as None on the Source tab) are replaced with the number or string specified here. Default is empty, so None values are exported as empty cells.
  • Result settings pane: enables reordering and renaming columns in the file.
_images/page_results_export_excel_result.png

By default, column names saved to the Excel file are the same as names of the data series you have selected to export. On the Result settings pane, you can change the names 1 which are added to the first table row 2 shown on the Result tab. You can also change the order of columns in the table using the b_blconf_up and b_blconf_down buttons on the pane’s toolbar. All these changes are applied to the exported file only — data series in the report keep their names and order.

Viewers

Each window in a report is a data viewer — a specific data analysis or visualization tool that uses data from the report database as a source. New viewers can be created in a few different ways:

  • If you select some data series in the Data series pane and click a button on the report toolbar, it creates a viewer and automatically adds the selected data to its sources. Such viewers are created with default configuration which can then be adjusted to your needs.
  • If you click a button on the report toolbar without selecting any data series, it adds an empty viewer, and you will have to edit its configuration to add data sources.
  • Certain viewers (the Sample viewer, for example) provide their own methods of creating additional viewers with the same data sources (or a subset of them).
  • A generic method to create a new viewer of a different type with the same data sources is also available. If you open the Data series pane and select an existing viewer, you will notice that the data series it uses are automatically selected (highlighted in the Data series pane). Clicking a button on the report toolbar after this creates a new viewer that takes the selected data series. Note that you can synchronize selections between some viewers, if such viewers share at least one common data source and also have linked selection enabled. More details are given in the Linked Selection section.

To create a copy of the current viewer, use the Duplicate command (Ctrl + D) from the context menu. The Remove command (Ctrl + F4) deletes a viewer from the report. Note that the data series it uses are not deleted and will remain in the report database.

Each viewer window can be minimized and maximized for convenience.

Datasets

Viewers operate on datasets — collections of data series which are processed separately in a way specific to the current viewer. For example, in the 2D plot viewer each dataset can be rendered as a line on the plot; in the 3D plot viewer a dataset can be rendered as a surface or a point cloud, and so on (most viewers support multiple datasets). In viewer configuration dialogs, each dataset is configured on its own tab. Such a tab contains various visualization settings (specific to the current viewer, described further) and a common Dimensions pane where you can edit the dataset’s contents.

_images/page_results_06_dimpane.png
  • Column: in the Sample viewer, a column stands for a sample dimension. Each column corresponds to a data series in the report database.
  • Axis: in plot viewers, specifies the coordinate axis. If the viewer has a fixed number of axes, you can switch them in this column.
  • Data source: specifies the source data series. To change, hover the table cell and click the edit icon, or double-click the cell (opens a drop-down list showing data series currently found in the report database).
  • Error: in viewers that support displaying uncertainties in values (such as error bars on a 2D plot), this column lets you select one or two additional data series that contain the error values. Error type can be set to:
    • Symmetric — select one additional data series containing variations of values.
    • Relative offset — select two additional data series that contain upper and lower variations. These values are added to (upper) or subtracted from (lower) the values contained in the data series selected in the Data source column — for example, to obtain the coordinates of the upper and lower points of an error bar.
    • Absolute range — select two additional data series that contain upper and lower values “as is” — for example, the coordinates of the upper and lower points of error bars.
  • Filter: applies a value or a string filter to the dataset. When filtering numeric values, you specify an inclusive range by setting its upper and lower bounds; by default the filter keeps only the values within this range. Enable “Exclude value range” to invert the filter, so values inside the range are excluded. Use “Include NaNs and empty values” to include or exclude NaNs and empty values.
_images/page_results_07_filterdialog.png

To enable the filter, select the appropriate checkbox. “Include NaNs and empty values” is a default setting for the Sample viewer and Page viewer. In other viewers, it is already enabled for newly created data series not mapped to any particular axis. A string filter is used to find all occurrences of a given match pattern in a string. It supports two wildcard characters: an asterisk (*) and a question mark (?). The asterisk is a placeholder for “zero or more characters”; the question mark is a placeholder for “exactly one character”. If multiple filters are added, the viewer shows only those points that pass all filters. The sketch after this list illustrates both filter types.

  • Format: useful for tables only, sets the number format (general or scientific notation, the number of decimal digits).
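
Both filter types can be sketched in Python (an illustration only, not pSeven code; the exact handling of NaNs and pattern matching in pSeven may differ):

    # Illustrative sketch of the value and string filters described above (not pSeven code).
    import re
    import numpy as np

    def value_filter(values, lower, upper, exclude_range=False, include_nan=True):
        values = np.asarray(values, dtype=float)
        inside = (values >= lower) & (values <= upper)     # inclusive range
        mask = ~inside if exclude_range else inside
        return np.where(np.isnan(values), include_nan, mask)

    def string_filter(values, pattern):
        # '*' matches zero or more characters, '?' matches exactly one character
        regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
        return [re.search(regex, v) is not None for v in values]

    print(value_filter([0.5, 2.0, float("nan")], lower=0.0, upper=1.0))  # [True, False, True]
    print(string_filter(["caseA1", "caseB2", "test"], "case?1"))         # [True, False, False]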

Note that a dataset can contain more dimensions (data series) than the viewer actually displays. Additional data series can be used for point filtering and custom visualizations in tooltips, or you can use axis selection to switch between different sources on the same plot.

The Dimensions pane’s b_blconf_context menu provides the following commands:

  • Copy from…: selects the same data sources that are used in another dataset.
  • Copy to…: applies the selection of data sources used in the current dataset to other datasets. The command brings up the Copy to dialog where you can specify one or several datasets to use the same selection of data sources.
  • Set filters…: brings up the Filter dialog where you can apply a value or a string filter to the dataset.
  • Set number formats…: changes the number format for selected data sources — for example, in Sample viewer this changes the number format in corresponding columns on the Data tab.
  • Move dimension up, Move dimension down: let you reorder data sources in the list. These commands can be used instead of switching axes in the Axis column.
  • Add dimension: adds a new data source to the dataset. After adding a dimension, you can select the source data series from the report database using a drop-down list in the Data source column.
  • Remove dimensions: removes the selected data sources from the dataset. This command does not remove data from the report database; it affects only the current viewer configuration.

Sample Viewer

The Sample viewer is a tool for initial data analysis which can also serve as a starting point in new reports. Other viewers can be created quickly by selecting data in a Sample viewer and clicking any viewer button on the report toolbar. The data you selected in the Sample viewer is automatically added to the new viewer’s configuration.

The Sample viewer window contains four tabs:

  • Data — shows sample data. Other viewers can be created from this tab.
  • Statistics — provides descriptive statistics.
  • Correlations — calculates correlations, shows scatter plots for pairs of sample dimensions and value distribution in each dimension. From this tab you can also create additional viewers for the plots and histograms it shows.
  • Dependency — lets you test how well the sample can be approximated by a linear or quadratic model.

A new Sample viewer can be created in several ways:

  • When you select one or more records in the Project database pane and drag them to the report, pSeven automatically creates a new Sample viewer and opens its Data tab (see Adding Workflow Data).
  • Like any other viewer, a Sample viewer can also be created manually by selecting data series in the Report database pane and clicking the viewer button on the report toolbar. When you add a Sample viewer manually, you can select the tab to open from the menu that appears when you click the Sample viewer button.

Data

The Data tab shows raw sample values. Each column (sample dimension) corresponds to a data series in the report database. If additional data series with error values are selected in the viewer’s configuration, the table also shows error values. Cell colors indicate high (red) and low (blue) values. To disable background cell coloring, use the Cell colors command from the viewer’s b_blconf_context menu. To navigate to a specific row, use the Ctrl G hotkey or the Go to row command from the menu.

_images/page_results_viewers_sample_viewer_01_data.png

The buttons in the viewer’s title bar 1 provide quick access to frequently used commands:

  • b_sample_viewer_copy_value (Ctrl C) copies the selected cell value or values from selected rows to the clipboard. To select a row, click its index 2 (hold Ctrl or Shift for multiselection). Note that when you copy rows, pSeven skips the cells from deselected columns 3. Copied values can be inserted into a spreadsheet or workflow inputs and parameters (see below).
  • b_sample_viewer_set_format changes the number format in the selected columns. Deselected columns 3 keep their current format.
  • b_sample_selectall and b_sample_deselectall select and deselect all columns.

The above commands are also available from the viewer’s b_blconf_context menu.

Data is copied from the Sample viewer as tabular-formatted text, so you can paste it directly into Excel or another spreadsheet application that supports the plain text tabular format. You can also paste it to workflow inputs or parameters in Run, provided that the names of inputs or parameters match the names of Sample viewer columns:

  • Set up a Sample viewer which has a column corresponding to each workflow input or parameter — that is, the column name is the same as the input or parameter name in Run. If the names of columns do not match the names in Run, you will have to either rename data series in the report database (to change column names) or adjust the aliases in workflow configuration. You can have extra columns — since their names do not match any names in Run, values from these columns will be ignored.
  • Optionally, select required columns using the checkboxes in column titles.
  • Select the row to copy by clicking its index, then use the Ctrl C hotkey.
  • Switch to Run, select the pane (inputs or parameters) to insert the values into, and hit Ctrl V.

The table on the Data tab can be filtered by index or by column values by specifying filters in the viewer’s dataset configuration (see Datasets). When filtered, the table shows only those values that pass the filter. For example, setting the index range from 1 to 25 limits the table view to the first 25 rows. You can export this filtered data to a new set of data series using the Create new data series… command from the viewer’s menu. Note that this command works with selected columns only; if you deselect all columns, the command is disabled.

Other tabs in the Sample viewer also respect value filters — that is, descriptive statistics and correlations are always calculated for the data you see on the Data tab; approximation models on the Dependency tab are trained on the filtered sample as well. However, the value filter is not applied when you export data or create new viewers from the Sample viewer — these functions always work with the full sample data.

When creating a new 2D plot or a Parallel coordinates plot based on selections made in the Sample viewer, you can synchronize the data between these three viewers by linking them. This means that selections you make on the Data tab also select the corresponding points in other viewers linked to it (see Linked Selection for details).

The linked selection option is enabled by default. To disable data sharing with other viewers, toggle b_linked_selection on the viewer toolbar or clear the respective checkbox in the viewer’s configuration (see Configuration for details).

Statistics

The Statistics tab shows descriptive statistics. As noted above, all statistics are calculated only for those values that pass the dataset filter.

_images/page_results_viewers_sample_viewer_02_statistics.png

The Summary groupbox shows general sample statistics, while the table shows statistics for each sample dimension. Most statistics are self-explanatory; in case you need more information, a brief description is provided in tooltips that appear when you hover over the names in the left column.

Summary statistics may also contain notes on specific sample properties such as:

  • Tensor structure — the sample has a specific structure suitable for the Tensor Approximation technique; see section Tensor Products of Approximations for details.
  • Full factorial — the sample can be used for a full factorial experiment.
  • Orthogonal design — the sample is an orthogonal array.

The detailed statistics table can be exported to a CSV or text file using the Export statistics to file… command from the viewer’s menu.

Correlations

The Correlations tab is used to spot correlations in the sample. As noted above, correlation analysis is performed only for those values that pass the dataset filter. Plots and histograms on this tab also use the filtered data.

_images/page_results_viewers_sample_viewer_03_correlations.png

You can switch between two plot modes using the selector 1: the X vs X mode shows all pairwise combinations, while the X vs Y mode shows inputs against outputs and allows you to select the input and output data series.

In the X vs X mode, the Correlations tab shows:

  • pairwise scatter plots for all sample dimensions (lower triangle),
  • correlation measures and p-values for these pairs (upper triangle), and
  • value distribution (histogram) for each dimension (diagonal).

The Correlation selector 2 specifies which method is used to compute correlation coefficients shown on scatter plots and in the table to the right. The dimensions to analyze are set using the Dimensions selector 3.

Correlations can be quickly recalculated for a subset of the sample data by selecting an area on any of the scatter plots. When a selection is active, plot points outside the selection are grayed, and values on the main pane and in the correlation table are automatically updated. To reset the selection, click any plot again.
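The kind of computation behind the X vs X view can be illustrated with a short script. The sketch below is an illustration only, not pSeven’s internal code: it uses SciPy’s Pearson correlation (one of the methods offered by the Correlation selector) to compute a coefficient and p-value for every pair of dimensions, and applies a significance check similar to the one described in section Configuration below (here using the absolute value of the coefficient). The function name pairwise_correlations, the threshold defaults, and the sample data are made up for the example.

    # Pairwise correlation coefficients, p-values and a significance check
    # (conceptual sketch, not pSeven's internal code).
    import numpy as np
    from scipy import stats

    def pairwise_correlations(sample, corr_threshold=0.5, p_threshold=0.05):
        """sample: 2D array, one column per dimension; thresholds are example values."""
        n_dims = sample.shape[1]
        result = {}
        for i in range(n_dims):
            for j in range(i + 1, n_dims):
                r, p = stats.pearsonr(sample[:, i], sample[:, j])
                significant = abs(r) > corr_threshold and p < p_threshold
                result[(i, j)] = (r, p, significant)
        return result

    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 1))
    data = np.hstack([x,
                      2.0 * x + rng.normal(scale=0.5, size=(100, 1)),
                      rng.normal(size=(100, 1))])
    for pair, (r, p, sig) in pairwise_correlations(data).items():
        print(pair, f"r={r:.3f}", f"p={p:.3g}", "significant" if sig else "-")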

In the X vs Y mode, the Correlations tab displays a scatter plot for each of the pairwise combinations of inputs and outputs you select:

_images/page_results_viewers_sample_viewer_03_1_correlations.png

If you select the Correlations only checkbox 1, the main area shows correlation coefficients and p-values instead of scatter plots.

_images/page_results_viewers_sample_viewer_03_2_correlations.png

Note that correlation coefficients cannot be calculated when both dimensions in the pair are constant. For such pairs, the viewer shows N/A instead of a numeric value. Cells that do not satisfy correlation or p-value thresholds are shown in gray (see section Configuration for details).

You can show any plot or histogram from the main pane in a separate viewer: select it and click the b_sample_viewer_zoomplus button on the viewer’s toolbar, or use the Zoom selected dimensions slice to 2D scatter plot… command from the b_blconf_context menu. Note that in this case the dataset filters do apply to the new viewer (in contrast with the viewers created from the Data tab).

To copy correlation coefficients and p-values to the clipboard, click the b_sample_viewer_copy_value button on the toolbar, or use the Copy correlation coefficients command from its b_blconf_context menu. This copies the contents of the right pane as tabular-formatted text, so you can paste them directly into Excel or another spreadsheet application that supports the plain text tabular format.

Plots from the main pane on the Correlations tab can also be exported to a file using the Export scatter matrix image… command from the viewer’s b_blconf_context menu.

Dependency

On the Dependency tab, you can try to fit the sample with a linear or quadratic approximation model. After you select the input and output columns and the approximation technique to use, pSeven will train a model in the background and validate it on the sample data.

_images/page_results_viewers_sample_viewer_04_dependency.png

The Model statistics group box shows training sample details and model validation results.

  • Effective sample size — the number of points in the filtered sample. The filtered sample is obtained from the original one by removing all duplicates and all rows that contain non-numeric values (missing, infinity, or NaN).
  • Duplicate points — the number of exact duplicates.
  • Ambiguous points — the number of points with ambiguous output values — that is, points with the same input values but different outputs.

Validation results are shown in the model accuracy table and on the scatter plot which compares sample values with the values predicted by the model. The predictions can be obtained either by calculating model outputs for the sampled inputs (default) or from cross-validation, which is performed automatically when training the model. To switch between these modes on the plot, use the predictions selector. The model accuracy table always shows both kinds of errors:

  • Train accuracy — errors calculated using the training sample. Note that this validation method always overestimates model accuracy.
  • Internal validation — errors estimated during the cross-validation procedure.
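To make the distinction between the two error kinds concrete, here is a minimal sketch (not pSeven’s implementation) that fits a linear least-squares model with NumPy and compares the error computed on the training points with the error estimated by k-fold cross-validation. The helper names rrms and fit_linear, and the choice of the RRMS metric, are assumptions made for the example.

    # Train error vs. cross-validation error for a simple linear fit
    # (conceptual sketch, not pSeven's internal code).
    import numpy as np

    def rrms(y_true, y_pred):
        # Relative RMS error: RMS error divided by the std of the reference data.
        return np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.std(y_true)

    def fit_linear(x, y):
        # Least-squares fit of y = [1, x1, ..., xk] @ coef
        A = np.hstack([np.ones((len(x), 1)), x])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return lambda x_new: np.hstack([np.ones((len(x_new), 1)), x_new]) @ coef

    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, size=(40, 2))
    y = 3 * x[:, 0] - x[:, 1] ** 2 + rng.normal(scale=0.1, size=40)

    # Train accuracy: the model is evaluated on the same points it was trained on.
    model = fit_linear(x, y)
    print("train RRMS:", rrms(y, model(x)))

    # Internal validation: each point is predicted by a model trained without it.
    k = 5
    folds = np.array_split(rng.permutation(len(x)), k)
    cv_pred = np.empty(len(x))
    for fold in folds:
        train = np.setdiff1d(np.arange(len(x)), fold)
        cv_pred[fold] = fit_linear(x[train], y[train])(x[fold])
    print("cross-validation RRMS:", rrms(y, cv_pred))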

The feature importance chart shows estimates of the inputs’ influence on the model output:

  • The sole effect estimate for an input can be understood as the fraction of model output variance that is “explained” by this input, assuming all other inputs are fixed.
  • The interactions estimate for an input can be understood as the remaining fraction of model variance that is “explained” by this input when all inputs can vary — that is, the part of the output variance that exists due to interactions between this input and other inputs.

Note that the accuracy of feature importance estimates depends on model accuracy: more accurate models give more accurate estimates.

On the Dependency tab, you can also copy the errors and feature importance data to the clipboard, and export the scatter plot and feature importance chart as images:

  • To copy data, click the b_sample_viewer_copy_value button on the toolbar or select the Copy dependency analysis data command from the b_blconf_context menu. The data is copied as tabular-formatted text, so you can paste it directly into Excel or another spreadsheet application that supports the plain text tabular format.
  • To export images, use the Export feature importance chart… and Export approximation errors image… commands from the b_blconf_context menu. These commands open the Export image dialog where you can adjust the image size and a few other settings, and select the export location.

Configuration

_images/page_results_viewers_sample_viewer_05_conf.png

General settings for the Sample viewer are the following:

  • Title: sets the viewer’s title.
  • Cell colors: if enabled, columns on the Data tab are colored in red and blue to indicate high and low values, respectively.
  • Linked selection: when enabled, selections made on the Data tab also select corresponding points in other viewers linked to this one. See Linked Selection for details.
  • Correlations and scatterplot matrix — settings related to the Correlations tab:
    • Plot: the type of plot to show (X vs X or X vs Y).
    • Correlation: the correlation coefficient to calculate.
    • Selected dimensions: analyzed sample columns.
    • Selected rows: selected x-dimensions.
    • Selected columns: selected y-dimensions.
    • Correlation threshold: the significance threshold. Note that a correlation is considered significant only if the correlation coefficient value is greater than the correlation threshold and the p-value is less than the p-value threshold.
    • p-value threshold: a confidence measure for the calculated correlation coefficient. By definition, the p-value is the estimated probability of obtaining a correlation coefficient equal to or larger than the one shown when there is actually no correlation (that is, the true correlation coefficient is 0, and the shown value is a random result). Smaller p-values therefore mean that the correlation analysis results are more reliable. A p-value of 0.05 (5% probability) is usually considered small enough to regard the correlation as significant.
    • Display limit of scatter matrix: the maximum number of points to show on plots.
    • Marker color: sets solid or gradient colors for plot points. When a gradient color is selected, the gradient is applied according to the color axis setting.
    • Color axis: specifies which sample dimension to use to assign gradient colors to plot points. Points are sorted by this dimension.
    • Marker size: point size on scatter plots.
    • Marker opacity: point opacity on scatter plots.
    • Marker stroke: enables point marker outlines.
    • Histogram bins count: specifies the number of histogram bins. Available options are:
      • Adaptive: adapts the number of bins to the AMISE-optimal bandwidth \(h = \sigma (\frac{24 \sqrt{\pi}}{n})^{1/3}\), where \(n\) is the number of points in the dataset and \(\sigma\) is the non-biased sample standard deviation. The number of bins is \(\lceil \frac{x_{max} - x_{min}}{h} \rceil\), where \(x_{min}\) and \(x_{max}\) are the value axis bounds.
      • Sturges’ rule: the number of bins is \(\lceil \log_2 n + 1 \rceil\), where \(n\) is the number of points in the dataset. This rule is derived from a binomial distribution (assumes an approximately normal distribution) and implicitly bases the bin sizes on the range of the data.
      • Manual: lets you specify the number of bins directly.
  • Dependency — settings related to the Dependency tab:
    • Inputs: sample columns containing input values.
    • Outputs: sample column with output values.
    • Technique: selects linear or quadratic approximation.
    • Display limit of scatterplot: the maximum number of points to show on the validation plot.
    • Draw errors in scatterplot: specifies how to calculate model predictions (see section Dependency for details).

2D Plot

The 2D plot tool creates a plot that can contain multiple curves and point sets, with optional error bars or bands. The viewer supports multiple datasets; each dataset is rendered as a layer on the plot.

_images/page_results_12_2dplot1.png

Hovering a point shows its coordinate values in a tooltip. If error data is available, the tooltip also shows the upper and lower error values (for example, the lower and upper coordinates of the error bar). Note that you can customize the tooltip’s content to include more details using additional data series (see Dataset Configuration).

You can select points on the plot, or define an area of interest and zoom to it, using the respective tools on the viewer toolbar. The two available modes are detailed below.

Selection Mode

Use the b_2dplot_select tool to activate the point selection mode. Click a point to select it, or click and drag to select all points in a plot area. Hold Ctrl to modify existing selection (add or deselect points). Double-click anywhere on the plot to deselect all points.

Note that if the plot has linked selection enabled (default setting), your point selections will be synchronized with the ones shown on other viewers linked to this plot (see Linked Selection for details). Toggle the b_linked_selection button on the viewer toolbar or clear the respective checkbox in the viewer’s configuration to disable data sharing with other viewers.

Zoom Mode

The plot can be zoomed in by selecting the area of interest using the b_2dplot_zoom tool. Double-click anywhere on the plot or click b_plot_zoom1 to reset zoom and show the full plot.

You can export a 2D plot to an image file using the Export image… command from the viewer’s menu. It is also possible to save copies of the data currently shown on the plot to the report database using the Create new data series… command. This command brings up a dialog that creates new data series, taking into account all selections and filters applied in the viewer’s configuration. Prefixes are automatically added to the names of new data series to avoid conflicts. Note that in the zoom mode the command saves all points shown on the plot, not just those in the current zoomed-in view.

General Settings

_images/page_results_14_2dplot_gen_conf.png

General settings for the 2D plot viewer are the following:

  • Title: sets the plot title.
  • Legend location: sets the legend location on the plot.
  • Display limit: sets the maximum number of points displayed on the plot. If this number is less than the size of a dataset, points are sieved so the number of shown points does not exceed the limit. This is purely a display setting that does not apply any data filters to the dataset itself.
  • Sort plot points by X axis value: controls the order of connecting points with a line. If enabled (default), points are connected in order of increasing X coordinate value. If disabled, points are connected in the same order as they follow in the dataset.
  • Linked selection: When enabled (default), selections you make on this plot also select corresponding points in other viewers linked to this plot. See Linked Selection for details.

The Axes pane lets you set axis labels, plot ranges, and axis scales. Note that if an axis range is not specified, the viewer sets it automatically in such a way that it includes all points from all datasets.

Dataset Configuration

The 2D plot viewer supports multiple datasets, which can be added using the Add dataset button on the left tab bar. Generally, each dataset is rendered as another line on the plot (or you can hide the line and show point markers only). Note also that you can re-order the plot “layers” by dragging the dataset tabs on the left tab bar.

_images/page_results_15_2dplot_ds_conf.png

Basic settings are:

  • Name: specifies the name displayed in the plot legend.
  • Visible: lets you temporarily hide this dataset from the plot.
  • Extrapolate: if enabled and the dataset is rendered as a line, the line is linearly extrapolated to the plot’s bounds. This setting can also be used to draw a horizontal line on the plot: if the dataset contains only one point, the extrapolation is constant.

Dataset contents are edited in the Dimensions pane where you can select source data series, add error data, and apply value filters (see section Datasets for details). Note that a dataset can contain more than two dimensions (for example, “optimal dP” and “pressure” in the example above), but such additional data series are not rendered. They can be used to:

  • Filter points in the dataset shown on the plot. For example, filter settings applied for “optimal dP” data series will also be respected for other dimensions.
  • Include details on data values not represented on the plot in multiline tooltips.
  • Add images generated by a workflow to point tooltips by including a data series containing the image data.
  • Switch between different data sources on the same plot.

Point marker settings:

  • Markers: tick the box to show point markers for this dataset on the plot.
  • Color: marker color. Clicking the box opens a color selector.
  • Style: sets the marker style.
  • Size: sets the size of markers.

Line settings:

  • Lines: tick the box to draw the line connecting points on the plot.
  • Color: line color. Clicking the box opens a color selector.
  • Style: sets the line style — solid, dashed, or dotted.
  • Thickness: sets the line thickness.
  • Fill under: if enabled, the area under the line is filled with the line color. This setting works even if drawing the line is disabled.

Error settings:

  • Errors: tick the box to draw error bars (or the error band, depending on the selected style).
  • Color: sets the color of error bars or the band.
  • Style: switches between drawing individual error bars for each point and drawing a solid error band around the line.
  • Thickness: applies to error bars only; sets their line thickness.

Column Chart

A column chart displays data as vertical bars whose lengths are proportional to the values they represent.

_images/page_results_16_chart1.png

The chart viewer supports multiple datasets; each dataset is rendered as another set of bars. The bars can be drawn side-by-side or stacked; in the stacked mode, values in the same category (with the same point index in the dataset) are drawn on top of each other.

_images/page_results_17_chart2.png

Hovering a bar on the chart shows its numeric value. If the bars are stacked, the tooltip shows the value represented by the hovered section of the bar (not the total bar value).

General Settings

_images/page_results_18_chart_gen_conf.png

General settings for the column chart viewer are the following:

  • Title: sets the plot title.
  • Display limit: sets the maximum number of displayed bars.
  • Legend location: sets the legend location on the plot.
  • Stacked bars: enables or disables stacked bars.
  • Enable scroll: allows the chart to scroll horizontally (default setting).

The Axes pane lets you set axis labels and the scale of the value axis.

The Fonts pane is used to specify font sizes for axis labels, legend labels, and bar labels (value tooltips).

Dataset Configuration

The column chart viewer supports multiple datasets, which can be added using the Add chart button on the left tab bar. Each dataset is rendered as another set of bars. Note also that you can re-order the bars (in the stacked mode, for example) by dragging the dataset tabs on the left tab bar.

_images/page_results_19_chart_ds_conf.png

Available settings are:

  • Name: specifies the name displayed in the legend.
  • Visible: disable to temporarily hide the dataset from the chart.
  • Color: sets the bar color for this dataset. Clicking the box opens a color selector.
  • Bar labels: specifies when to show value tooltips: always, never, or only when you hover over a bar.
  • Label location: specifies the location of value tooltips.
  • Label background: if the plot is configured to always show bar labels, this setting enables or disables label background. Bar labels which are shown on hover always have a background regardless of this setting.

Dataset contents are edited in the Dimensions pane (see section Datasets for details). Note that a dataset can contain multiple data series, but only one of them will be rendered. Additional data series can be used to filter points in the dataset or to switch between different data sources on the same chart.

Histogram

A histogram plot provides a graphical representation of the distribution of numerical data.

_images/page_results_20_hist.png

Hovering a histogram bar shows a tooltip with the number of points in this bin and the bin’s bounds.

General Settings

_images/page_results_21_hist_gen_conf.png

General settings for the histogram viewer are the following:

  • Title: sets the plot title.
  • Display limit: sets the maximum number of bins (bars) displayed by the histogram, from 1 to 1000.
  • Legend location: sets the legend location.
  • Bins count: specifies the number of histogram bins. Available options are:
    • Adaptive: assumes an approximately normal distribution and adapts the number of bins to the AMISE-optimal bandwidth. First, it calculates the bandwidth \(h = \sigma (\frac{24 \sqrt{\pi}}{n})^{1/3}\), where \(n\) is the number of points in the dataset and \(\sigma\) is the non-biased sample standard deviation. Then, the suggested number of bins is determined as \(N = \lceil \frac{r_{max} - r_{min}}{h} \rceil\), where \(r_{min}\) and \(r_{max}\) are the bounds set for the value axis. Finally, the display limit \(d\) is checked and the minimum of these two values \(\min(N, d)\) is selected as the number of bins — so you can use the display limit setting to set an upper limit for the number of bins calculated adaptively.
    • Sturges’ rule: assumes an approximately normal distribution and implicitly bases the bin sizes on the range of the data. The suggested number of bins is \(N = \lceil \log_2 n + 1 \rceil\), where \(n\) is the number of points in the dataset. Similarly to the adaptive method, this suggested number is checked against the display limit \(d\) and the minimum \(\min(N, d)\) is selected as the number of bins.
    • Manual: lets you specify the number of bins \(N\) from 1 to 1000. The number you specify is also checked against the display limit \(d\) and the minimum \(\min(N, d)\) is selected, so it makes sense to set the display limit to its maximum if you want to control the number of bins manually. Also note that the histogram may automatically add trailing bins depending on the settings for the values axis range (see below), so the final number of bins may be 1 or 2 more than the number you have specified.
  • Overlayed histograms: changes the drawing style of histograms that display multiple datasets. If disabled, all bars are drawn separately; if enabled, the setting gives a more compact view.

The range of the values axis is set in the Values axis groupbox. When drawing the histogram, this range is divided into a number of bins, which is determined by the bins count settings described above.

  • Robust: sets the range to \([\bar{x} - 3 \sigma, \bar{x} + 3 \sigma]\), where \(\bar{x}\) is the sample mean and \(\sigma\) is the non-biased sample standard deviation. This method assumes an approximately normal distribution, and the range it determines may be less than the range of values in the dataset (for example, if the sample contains outliers). In this case, the histogram automatically adds trailing bins to include the outlying values (see below).
  • Include all values from data source: sets the range to \([x_{min}, x_{max}]\) — bounds are the minimum and maximum values found in the dataset.
  • Manual: lets you specify the range directly. If you set a range which is less than the range of values in the dataset, trailing bins are also added. If you select this method but do not specify the range, it will be automatically set to \([x_{min}, x_{max}]\).

When the robust or manual axis range is selected, it is possible that the source data contains values which are outside the values axis bounds. For example, the range of values \([x_{min}, x_{max}]\) may be greater than the \(3 \sigma\) range assumed by the robust method, or you can manually specify axis range which does not fully include the \([x_{min}, x_{max}]\) range. In such cases, 1 or 2 more bins (the trailing bins) are automatically added to the histogram:

  • If the dataset contains at least one value \(x \lt r_{min}\) (less than the lower axis bound), the left trailing bin is added, representing the interval \((-\infty, r_{min})\).
  • If the dataset contains at least one value \(x \gt r_{max}\) (greater than the upper axis bound), the right trailing bin is added, representing the interval \((r_{max}, \infty)\).

When trailing bins are present, the range of the values axis is automatically extended in order to display them. Also note that the trailing bins are not included in the count when checking the display limit and when determining the number of bins. For example, you can set both the display limit and the bins count to 15, and set values axis range \([r_{min}, r_{max}]\) so that \(x_{min} \lt r_{min}\) and \(r_{max} \lt x_{max}\). In this case, the histogram will show 17 bins: 15 as specified in the settings and 2 more trailing bins.
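The bin-count rules and the trailing-bin behavior described above can be reproduced in a few lines of NumPy. The sketch below is an illustration of the formulas only, not pSeven’s code; the function names and the example data are made up.

    # Bin-count rules and trailing-bin logic (illustration of the formulas above,
    # not pSeven's internal code).
    import numpy as np

    def adaptive_bins(values, r_min, r_max, display_limit=1000):
        n = len(values)
        sigma = np.std(values, ddof=1)                            # non-biased sample std
        h = sigma * (24.0 * np.sqrt(np.pi) / n) ** (1.0 / 3.0)    # AMISE-optimal bandwidth
        return min(int(np.ceil((r_max - r_min) / h)), display_limit)

    def sturges_bins(values, display_limit=1000):
        n = len(values)
        return min(int(np.ceil(np.log2(n) + 1)), display_limit)

    def trailing_bins(values, r_min, r_max):
        left = 1 if np.any(values < r_min) else 0     # bin for (-inf, r_min)
        right = 1 if np.any(values > r_max) else 0    # bin for (r_max, +inf)
        return left + right

    rng = np.random.default_rng(2)
    x = rng.normal(size=500)
    # Robust axis range: [mean - 3*sigma, mean + 3*sigma]
    r_min = x.mean() - 3 * x.std(ddof=1)
    r_max = x.mean() + 3 * x.std(ddof=1)
    print("adaptive bins:", adaptive_bins(x, r_min, r_max),
          "+", trailing_bins(x, r_min, r_max), "trailing")
    print("Sturges bins:", sturges_bins(x))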

Dataset Configuration

The histogram viewer supports multiple datasets, which can be added using the Add histogram button on the left tab bar. Each dataset is rendered as a separate histogram; general settings, like the number of bins and axis ranges, apply to all these histograms.

_images/page_results_22_hist_ds_conf.png

Available settings are:

  • Name: specifies the name displayed in the legend.
  • Visible: disable to temporarily hide the dataset from the plot.
  • Color: sets the bar color for this dataset. Clicking the box opens a color selector.
  • Bar labels: specifies when to show tooltips with bin information: always, never, or only when you hover over a bar.
  • Label location: specifies the location of bin tooltips.
  • Label background: adds a background color to bin tooltips when enabled, or makes it transparent if disabled.

Dataset contents are edited in the Dimensions pane (see section Datasets for details). Note that a dataset can contain multiple data series, but only one of them will be rendered. Additional data series can be used to filter points in the dataset or to switch between different data sources on the same histogram plot.

3D Plot

The 3D plot tool creates an interactive plot that can contain multiple surfaces and point clouds.

_images/page_results_23_3dplot.png

When viewing the plot, you can rotate it by dragging with the mouse and zoom with the mouse wheel. You can also use buttons in the plot title bar to quickly change the angle of view.

General Settings

_images/page_results_24_3dplot_gen_conf.png

General settings for the 3D plot viewer are the following:

  • Title: sets the plot title.
  • Font size: specifies font size for axis, tick, and legend labels.
  • Legend location: sets the legend location.
  • Camera perspective: changes the 3D perspective mode.
  • Display limit: sets the maximum number of points displayed on the plot. If this number is less than the size of a dataset, points are sieved so the number of shown points does not exceed the limit. This is purely a display setting that does not apply any data filters to the dataset itself.
  • Show bounding box: shows or hides the 3D box around the plot.
  • Enable light: enables or disables rendering the lighting effect on 3D surfaces.
  • LaTeX syntax: enables math syntax in plot titles and labels. For example, you can use it to show subscripts (like x_1 for \(x_1\)) or special symbols (like \hat{f} for \(\hat{f}\)).

The Axes pane lets you set axis labels, plot ranges, and axis scales. Note that if an axis range is not specified, the viewer sets it automatically in such a way that it includes all points from all datasets.

Dataset Configuration

The 3D plot viewer supports multiple datasets, which can be added using the Add dataset button on the left tab bar. Each dataset is rendered as another surface or point cloud on the plot. Note also that you can re-order the plot “layers” by dragging the dataset tabs on the left tab bar.

_images/page_results_25_3dplot_ds_conf.png

Basic settings are:

  • Name: specifies the name displayed in the legend.
  • Visible: disable to temporarily hide this dataset from the plot.

Dataset contents are edited in the Dimensions pane (see section Datasets for details). Note that a dataset can contain more than 3 data series, but additional data series are not rendered. They can be used to filter points in the dataset or to switch between different data sources on the same plot.

Point marker settings:

  • Draw markers: specifies whether to show point markers for this dataset.
  • Draw trajectory: if enabled, connects points with a trajectory line.
  • Color: marker color. Clicking the box opens a color selector. The color can be solid (same color for all markers) or gradient (affected by the color axis).
  • Colorbar: if enabled, a color bar is added to the plot. The bar shows the color map applied to point markers.
  • Marker size: sets the size of markers.
  • Marker style: sets the marker style.
  • Line thickness: sets the thickness of the trajectory line, if it is drawn.
  • Value labels: adds value tooltips to the point markers.
  • Draw stems: adds vertical stems to the plotted points.

Surface settings:

  • Draw surface: specifies whether to render a 3D surface for this dataset.
  • Color: sets the surface color. Clicking the box opens a color selector. The color can be solid or gradient; if a solid color is selected, it is recommended to enable light in the plot’s general settings.
  • Colorbar: if enabled, a surface color bar is added to the plot. Note that this is a separate colorbar — that is, there can be two different color bars for point markers and the surface.
  • Reconstruction method: selects the surface reconstruction method. In general, the default (“grid XY”) is the fastest, but other methods (triangulation) may be more precise in certain cases.
  • Grid mesh density: controls the accuracy of surface reconstruction. Higher density makes the surface smoother but requires more rendering time.
  • Wireframe mesh: draws a mesh over the surface.
  • Transparent: adds a certain degree of transparency to the surface; useful when the plot contains multiple surfaces.

Parallel Coordinates Plot

Parallel coordinates plot is a useful tool for visualizing multidimensional data. It shows each point as a polyline with vertices representing its coordinates along each axis.

_images/page_results_26_pc1.png

Hovering over a line vertex highlights the point (or all nearby points if the lines are closely packed) and shows a tooltip with coordinate values.

_images/page_results_27_pc2.png

When viewing the plot, you can re-order axes by dragging them and make selections to examine certain subsets of the data (see section General Settings for details on axis ranges and selections). To select the lines within a desired range, click on the respective axis and drag the mouse up or down. To move the selection rectangle, click inside it and drag upward or downward. Making a selection with Ctrl pressed adds new lines to the current selection one by one. When you add a selection to an axis, all points outside this selection are either grayed out or hidden (this is set in the viewer’s general configuration). Double-click an axis to reset its selections. To clear all selections, double-click anywhere on the plot.

You can export the parallel coordinates plot to an image file using the Export image… command from the viewer’s menu. It is also possible to save copies of the data currently shown on the plot to the report database using the Create new data series… command. This command brings up a dialog that creates new data series, taking into account all selections and filters applied in the viewer’s configuration. Prefixes are automatically added to the names of new data series to avoid conflicts.

Note that if the parallel coordinates plot has linked selection enabled (default setting), selections you make on this plot also select corresponding points in other viewers linked to this plot (see Linked Selection for details). Toggle the b_linked_selection button on the viewer toolbar or clear the respective checkbox in the viewer’s configuration to disable data sharing with other viewers.

General Settings

_images/page_results_28_pc_gen_conf.png

General settings for the Parallel coordinates plot viewer are the following:

  • Title: sets the plot title.
  • Display limit: sets the maximum number of points displayed on the plot (the number of lines drawn). If this number is less than the size of a dataset, points are sieved so that the number of shown points does not exceed the limit. This is purely a display setting that does not apply any data filters to the dataset itself.
  • Hide points outside selection: if enabled, lines representing points outside selection are hidden from the plot instead of graying them out.
  • Linked selection: When enabled (default), selections you make on this plot also select corresponding points in other viewers linked to this plot. See Linked Selection for details.
  • Distance between axes: sets the distance between axes. The default setting is "Auto"; you can also adjust the spacing manually by moving the slider to the desired position.

The Axes pane lets you set axis labels, ranges, and scales (directly in the table or using the respective buttons on the toolbar). You can also use it to show or hide axes and add precise selections (the Selection column).

The configurable options are as follows:

  • Show: if disabled, the respective axis is hidden from the plot.
  • Label: specifies axis labels (the same as the names of data series from the report database). To change, hover the table cell and click the edit icon, or double-click the cell and specify your label.
  • Range: specifies the lower and upper value limits for the current axis. If an axis range is not specified, the viewer sets it automatically in such a way that it includes all points from the dataset. If an axis range is specified, all outlying points are removed from the plot. Note that range settings are ignored for hidden axes.
  • Selection: specifies the subset of the data to be displayed on an axis. It does not affect the data itself or its rendering on the plot but aims to highlight the points of interest. Note that if you make selections on an axis that is marked as not shown in the viewer’s configuration, they will be ignored.

Font settings specify font sizes for axis and tick labels.

Dataset Configuration

Parallel coordinates plots support multiple datasets, which can be added using the Add dataset button on the left tab bar. Each dataset is rendered as a layer on the plot. You can re-order the plot layers by dragging the dataset tabs on the left tab bar.

_images/page_results_29_pc_ds_conf.png

Dataset contents are edited in the Dimensions pane (see section Datasets for details on data filtering). In parallel coordinates plots you can add as many dimensions to the dataset as you wish, and each dimension adds another axis to the plot. Note that if you are using multiple datasets, they must have the same dimension (the same number of axes). However, the data source for an axis can be empty, so if you need to display two datasets of different dimensions on one parallel coordinates plot, you can simply add axes without data sources to the dataset with the lower dimension.

Lines settings let you change the line style and apply solid or gradient coloring. Note that you can select an additional data series as the color axis. By default, the gradient is applied according to the point index in the dataset; if you select a data series, colors are applied according to the values in that series.

Page Viewer

Page viewer is an efficient tool for custom data visualization. It accepts a list of strings and renders them as pages which you can scroll in the viewer window. The strings can contain HTML, CommonMark, or plain text.

_images/page_results_page_viewer_intro.png

This viewer is also used to display JPEG or PNG image files generated by the workflow and saved to the project database. The image files can be organized into multi-page reports, enabling you to explore the evolution of the computational model during a simulation or design optimization.

_images/page_results_page_viewer_model_evolution.png

It is possible to export one or all pages contained in the viewer using the respective commands from the context menu.

To start using the tool, select the data series to be displayed on the Data series pane 1 and click the b_page_viewer button 2 on the report toolbar. It creates a viewer and automatically adds the selected data to its sources.

_images/page_results_page_viewer_add_data.png

If you click the button on the report toolbar without selecting any data series, it adds an empty viewer, and you will have to edit its configuration to add data sources.

Note that if the Page viewer has linked selection enabled (default setting), scrolling through the pages also highlights corresponding points in other viewers linked to it (see Linked Selection for details). However, if you make a selection including multiple points in another plot or a Sample viewer linked to a Page viewer, the Page viewer will show only the first selected point. It is also possible that the point you select is filtered out in the Page viewer — in this case, it will keep showing the last viewed page.

Configuration

You can specify the viewer’s title on the General settings tab. The viewer supports multiple datasets, which can be added using the Add dataset button on the left tab bar. You can re-order datasets by dragging the dataset tabs on the left tab bar.

_images/page_results_page_viewer_config.png

Basic settings are:

  • Name: specifies the name displayed in the viewer legend.
  • Content type: selects the desired format for data presentation. Available formats are: Plain text, HTML and CommonMark.

Dataset contents are edited in the Dimensions pane where you can select source data series, apply value filters and set the number format (see section Datasets for details).

A dataset can contain multiple data series. The data source that goes first in the list is displayed in the main pane of the viewer; other data series, if they are not strings, are rendered as elements of a table. Otherwise, each string value is shown on a separate page. The index column in the table contains the page number, and the other columns show parameter values. Adding data sources in the viewer configuration adds more columns to the table. You can adjust the column width manually by dragging the slider left or right, and browse through the document using the page selector.

Text Viewer

Text viewer is a simple tool to add comments or additional information to pSeven reports. It supports plain text, HTML, CommonMark, and math formulas.

_images/page_results_text_viewer_intro.png

To add a Text viewer, click the b_text_viewer button on the report toolbar. A new Text viewer is always empty. To edit its contents, click b_text_viewer_edit_mode on the viewer’s toolbar, or select the viewer and hit F2. To accept changes and finish editing, hit Ctrl Enter or click anywhere outside of the viewer. Esc works similarly to closing a file without saving: it exits the editing mode and discards recent changes, bringing the viewer back to its previous saved state. While editing, you can also use the common undo and redo hotkeys (Ctrl Z and Ctrl Y).

Note that to save changes done in a Text viewer, you have to save the report (File ‣ Save report or Ctrl S). When you exit the editing mode, changes are only shown in the viewer, but the report file is not saved automatically.

Configuration

Text viewer does not use any data from the report database. All contents are saved directly to the viewer.

_images/page_results_text_viewer_config.png

Text viewer provides a few basic settings:

  • Title: sets the viewer’s title.
  • Text color, Background color: change colors.
  • Mode: switches between the normal mode, which shows the viewer’s title and contents in the report, and the header mode, which shows the title only. The header mode can be useful to add section titles to the report. If you add text contents to a viewer with the header mode selected, these contents are also saved to the viewer, although they are hidden from the report.
  • Text: raw viewer contents, the same which you see when working with the viewer in the edit mode.

Text viewer supports CommonMark, which allows writing in a simple markup or plain text and adding HTML snippets. You can also add formulas using the \(\TeX\) syntax: enclose your formula in $$ ... $$. For example:

$$\int_a^b {\! f(x) \, \mathrm{d}x}$$

Math rendering is handled by KaTeX, which supports a subset of \(\TeX\) functions — see Supported Functions for details.

Linked Selection

Linked selection is a feature available for some viewers, which provides a fast and intuitive way to explore your data from different points of view. When there is a group of viewers which have common data sources, selections you make in one viewer are automatically reproduced by all other viewers. For example, you can have a Sample viewer linked with a Parallel coordinates plot. When you select ranges on the plot axis, rows containing data for the selected points are highlighted in the Sample viewer.

_images/page_results_linked_selection_sv_pc.png

Linked selection is supported by:

  • Sample viewer,
  • 2D plot,
  • Parallel coordinates plot, and
  • Page viewer.

When you create a new viewer, it has linked selection enabled by default. You can toggle the linked selection mode for each viewer individually, using the b_linked_selection button in the viewer’s title bar.

Note that pSeven finds data relations between viewers automatically, so you do not need to specify which data series belong to the shared dataset. Viewers become linked if they have at least one common data source. This can even be an additional data series that is not shown by a viewer, such as the third dimension in a 2D plot configuration. Viewers can also form a relationship chain — for example, if plot A is linked to B, and B is linked to C, then A and C also become linked even if they do not have any common data source (B mediates the connection).
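Conceptually, this behavior is equivalent to building a graph in which viewers are nodes, connecting viewers that share at least one data series, and then taking connected components. The sketch below illustrates the idea only (it is not pSeven’s implementation); the viewer names and data series names are hypothetical.

    # Link chains: viewers sharing at least one data series end up in one group,
    # and the relation is transitive (A-B, B-C => A-C). Conceptual sketch only.
    from itertools import combinations

    viewers = {
        "A": {"x", "y"},            # data series used by each viewer (hypothetical)
        "B": {"y", "z"},
        "C": {"z", "image"},
        "D": {"unrelated"},
    }

    # Union-find over viewer names.
    parent = {name: name for name in viewers}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in combinations(viewers, 2):
        if viewers[a] & viewers[b]:          # at least one common data source
            parent[find(a)] = find(b)

    groups = {}
    for name in viewers:
        groups.setdefault(find(name), []).append(name)
    print(list(groups.values()))             # [['A', 'B', 'C'], ['D']]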

Predictive Modeling Tools

In addition to the results post-processing tools, Analyze provides a number of tools to work with approximation models. The modeling tools are powered by the same pSeven Core component which is used in the ApproxBuilder and Approximation model blocks — the Generic Tool for Approximation (see section GTApprox). While the ApproxBuilder block is intended to automate model training, the tools described in this section are designed for interactive usage and do not require you to create workflows.

You can use the modeling tools to train approximation models, validate and explore them, and evaluate them to make predictions. All modeling tools work both with data gathered from a workflow and with data imported to pSeven from CSV, Excel and Composite cache files. For example, a model created in Analyze can be evaluated using the Make predictions… command (see Making Predictions). You can also export this model and integrate it into a workflow using the Approximation model block.

Note that in order to use the predictive modeling tools, you first have to create a report and import training data or models into the report database. For more details on adding data to reports, see sections General Information, Adding Workflow Data and Data Import on this page. Most of the predictive modeling tools are available from the b_context menu in the Models pane. The Model Validator and Model Explorer tools are found on the main report toolbar.

Model Training

To train an approximation model in Analyze, begin with a report which contains the training data.

  • Open the Report database pane and select data series which contain the training data.
  • Use the Build model… command from the Data series pane’s b_context menu or click the b_build_model button on the toolbar.
_images/page_results_pmt_01_build.png
  • The Build model… command brings up the Build model dialog where you can set up training, adjust data settings and supply additional data. The data series you have selected are automatically added as training data sources listed in the Data settings table. Note that you can also open this dialog without selecting any data, and then add data sources manually.
_images/page_results_pmt_02_builder_conf.png

General model settings are:

  • Model name: the name which is used when saving the model to the report database.
  • Comment: a short description which will be saved to the model.

In the Build model dialog you can add training and test data (see Data Settings) and configure the training mode (see Training Modes). When you have finished with the configuration, click Build or Build and continue to start model training:

  • Build closes the Build model dialog and starts training the model.
  • Build and continue adds the model to the build queue (and starts training, if the queue is empty). This button does not close the Build model dialog, so you can change configuration settings and queue multiple models.

Trained, queued and ready models are shown on the Models pane in the report database. Using the Models pane’s b_context menu, you can evaluate existing models, retrain and update them, export and import models, and perform other tasks.

If you need to stop training or remove a model from the training queue, you can select a model and use the Stop training command from the pane’s b_context menu. When you interrupt training which is already in progress, pSeven saves an intermediate model when possible. Note that such models may be inaccurate or miss certain features — for example, internal validation or accuracy evaluation run only after the final model is available, so an intermediate model, which is saved when you interrupt training, usually does not contain this data even if you had added such requirements to your training settings. When you cancel a queued model, it is removed from the queue but remains in the list on the Models pane, with the INTERRUPTED status. Later you can continue training such a model — either select it and use the Retrain… command from the b_context menu (see Retraining a Model), or just double-click the model to open the retrain dialog.

For a model which is currently in training, you can also double-click the model to open the model details dialog (see Model Details), then click the Stop training button in this dialog.

Data Settings

Data sources and additional data properties are specified in the Data settings table in the Build model dialog.

  • Training data: the data series which contain training inputs and outputs.

  • Test data: the data series which contain reference data for model quality estimation. Test data is optional.

  • Type: specifies model inputs and outputs.

  • Categorical: your data can contain some parameters which are not continuous variables but take only predefined values. To process such inputs correctly, select which inputs are categorical using the checkboxes in this column. This setting is not valid for outputs.

  • Output transformation: applies log transformation to the training sample outputs before training the model. This preprocessing step often improves model accuracy in cases when the training sample contains output values which are exponentially distributed. The following options are available:

    • auto — pSeven will decide whether to apply the transformation to this output. In the manual configuration mode, this decision is based on a quick statistical test. In the SmartSelection mode, behavior depends on the Try output transformations hint — see section Training Modes for details.
    • lnp1 — apply log transformation of the form \(y^* = \text{sgn}(y) \cdot \ln(|y| + 1)\), where \(\text{sgn}\) is the sign function (see the sketch at the end of this section).
    • none (default) — do not apply transformation to this output.

    The output transformation setting is not valid for inputs. Note that to apply the same setting to multiple outputs, you can select them all (hold Ctrl or Shift for multiselection) and use the Output transformation… command from the Data settings table’s b_blconf_context menu.

  • Output noise variance: if output noise data is available, here you can select data series which contain noise values. This setting is not valid for inputs.

  • Training data filter, Test data filter: apply value filters to the respective data samples. Only those points which pass the filter will be used when training the model and calculating model errors.

In the Data settings table you can also set the type of selected dimensions, change the order of inputs and outputs, and add or remove data using the toolbar buttons.

The Point weights selector below the table is used to apply sample weighting. Here you can select a data series that contains weight values; for more details on this feature, see section Sample Weighting.
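The lnp1 transformation mentioned in the Output transformation setting above is simple to reproduce. The following sketch is an illustration of the formula only, not pSeven’s code; it shows the forward transformation and its inverse, and the helper names are made up for the example.

    # The lnp1 output transformation y* = sgn(y) * ln(|y| + 1) and its inverse
    # (illustration of the formula above, not pSeven's internal code).
    import numpy as np

    def lnp1(y):
        return np.sign(y) * np.log1p(np.abs(y))

    def lnp1_inverse(y_star):
        return np.sign(y_star) * np.expm1(np.abs(y_star))

    y = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
    print(lnp1(y))                    # compresses the large values
    print(lnp1_inverse(lnp1(y)))      # recovers the original outputs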

Training Modes

Two modes are available when training an approximation model: SmartSelection (default) and manual (advanced). The main difference between them is as follows: the manual mode suits cases where you know the problem to be solved, the appropriate approximation techniques to use, and the result to expect; SmartSelection is useful when some of this crucial information is unknown and has to be determined through intelligent selection of settings and heuristics. A detailed description of each mode is given below.

SmartSelection is a training mode that automatically selects training settings using an adaptive algorithm in order to build the best approximation model. This algorithm does not require any configuration, but it can use various hints to speed up training or build a more accurate model. Hints can provide more details about the training data, set certain requirements for the model, or specify some of the training settings. They are divided into three groups accordingly: data features, model requirements, and training features. Finally, you can add custom settings using the Advanced option hint.

To add a hint, click anywhere inside the SmartSelection pane 1 or use the b_blconf_add button 2 to open the list 3 with available hints.

_images/page_results_pmt_03_smartselectionhints.png

For hints which require additional settings, a dialog appears when you select the hint from the list. When you finish adding a hint, it appears on the SmartSelection pane and becomes disabled (grayed out) in the list.

Data features hints provide additional information about the training data:

  • Linear dependency — the dependency specified by the training sample is supposed to be linear.
  • Quadratic dependency — the dependency specified by the training sample is supposed to be quadratic.
  • Discontinuous dependency — the dependency specified by the training sample is supposed to be discontinuous.
  • Dependent outputs… — specifies the type of dependency between different outputs. The following options are available for this hint:
    • All dependent: different components of the output are treated as possibly dependent.
    • Partial linear: before training, pSeven will search for linear dependencies between outputs in the training data. If such dependencies are found, pSeven will train a model which keeps these dependencies.
  • Tensor structure — input points in the training sample are placed in a grid-like pattern. This is usually the case when inputs are generated by some factorial technique for design of experiments.

Model requirements hints add specific requirements for the trained model:

  • Acceptable quality… — the metric and acceptable level of prediction error used to validate the model. With this hint, model training is stopped once the acceptable value of the metric is reached.
  • Smooth model — the built model should be smooth (reduces noise).
  • Accuracy evaluation — the built model should support accuracy evaluation.
  • Exact fit — the built model should fit training data points exactly.
  • Do not store training sample — training sample should not be stored inside the model (by default, the training sample is stored inside the model). Enabling the hint reduces the size of the model stored on disk, in particular, when tensor techniques are used. You can also use this hint, for example, if you want to transfer the model but not its training data.
  • Enable NaN prediction — the built model should predict NaN output values in areas near those points of the training sample that contain NaN output values.
  • Do not store internal validation data — the final model should not contain cross-validation data samples (by default, SmartSelection runs cross-validation for the final model and saves model outputs obtained in all cross-validation sessions, so you can review this data later). Adding this hint can reduce the model size. Note that a model trained with this hint can still contain internal validation statistics, if cross-validation is selected as the method to estimate quality of intermediate models (see the Validation type… hint).

Training features hints are used to tune the training process:

  • Validation type… — specifies the method to estimate quality of intermediate models which SmartSelection creates during training:
    • Auto (default): automatically selects one of the following methods, based on data properties and other settings. It prefers validation on a test sample when one is available; otherwise it can automatically split the sample into train and test subsets, or use internal validation as the least preferred method. This is the default SmartSelection behavior, which is also used if you do not add the Validation type… hint. Note that if SmartSelection automatically switches to internal validation, the model will contain internal validation statistics even if you add the Do not store internal validation data hint (this hint removes only the cross-validation data samples).
    • Internal validation: uses cross-validation. With this setting, you can also use the Advanced option… hint to specify the number of data subsets and training sessions in cross-validation.
    • Test set: validates models on the test sample data. Test data is required for this method.
    • Split training sample to train/test subsets: automatically splits the sample into two subsets, one of which is used to train models and the other to validate them. You can change the size of the training subset using the Training subset ratio slider. This method is similar to using the Split data… command to create the train and test samples (see Split Data), and then performing validation on the test set.
  • Randomized training — enable randomization in certain internal training algorithms. Randomized training can produce models that are slightly different.
  • Fixed random seed… — use a fixed seed in those training algorithms that support randomized training. This hint makes the behavior of randomized algorithms fully deterministic (controlled by the seed value).
  • Training time limit… — specifies a recommended time limit for the model training. Note that if such a limit is specified, it may be generally detrimental to model quality.
  • Try output transformations — decides whether to apply log transformation to the outputs data in the training sample. Using the hint means that two models will be trained and compared: one with log transformation applied to training sample outputs and the other without the transformation. This comparison is done only for outputs which have output transformation set to "auto".
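
For illustration, the train/test split behind the last validation type can be reproduced outside pSeven with a few lines of Python. This is a minimal sketch using numpy only; the 0.8 ratio stands in for the Training subset ratio slider:

    import numpy as np

    rng = np.random.default_rng(0)            # fixed seed for a reproducible split
    sample = rng.random((100, 4))             # 100 points: e.g. 3 inputs + 1 output
    ratio = 0.8                               # share of points used for training

    indices = rng.permutation(len(sample))    # shuffle point indices
    split = int(ratio * len(sample))
    train, test = sample[indices[:split]], sample[indices[split:]]

    print(train.shape, test.shape)            # (80, 4) (20, 4)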

You can also use the Advanced option… hint to specify some training options manually. Selecting this hint brings up the Advanced option dialog with option settings. Since this hint provides access to several options, it can be added multiple times with different settings.

_images/page_results_pmt_advanced_option.png

Available options are:

  • Technique: specifies the approximation technique to use.
  • SubmodelTraining: selects whether to train submodels in parallel or sequentially.
  • MaxParallel: sets the maximum number of parallel threads to use for training. The value can be any positive integer. Note that it is not recommended to set it higher than the number of physical CPU cores, otherwise you may experience performance degradation.
  • IVSubsetCount: specifies the number of cross-validation subsets.
  • IVTrainingCount: sets the number of training sessions in cross-validation.
  • InputDomainType: specifies the input domain for the model. Unbound (default) is an unlimited domain. Box limits the domain to the training sample’s bounding box. Auto limits the domain to the intersection of the sample’s bounding box and the region bound by an ellipsoid which envelops the sample.
  • Code: arbitrary option code.
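
If you train models outside the GUI with the pSeven Core Python API, the same options are passed to the model builder under their full GTApprox/... names (GTApprox/InputDomainType is one example mentioned later on this page). The following sketch is an assumption-based illustration rather than a verbatim recipe: the da.p7core.gtapprox import and the Builder.build() call follow the pSeven Core conventions and should be checked against the documentation of your pSeven version.

    import numpy as np
    from da.p7core import gtapprox   # pSeven Core (assumed to be installed)

    x = np.random.rand(50, 2)                        # training inputs
    y = x[:, [0]] ** 2 + x[:, [1]]                   # training outputs

    builder = gtapprox.Builder()
    model = builder.build(x, y, options={
        "GTApprox/Technique": "GP",          # Technique
        "GTApprox/MaxParallel": 2,           # MaxParallel
        "GTApprox/IVSubsetCount": 5,         # IVSubsetCount
        "GTApprox/IVTrainingCount": 5,       # IVTrainingCount
        "GTApprox/InputDomainType": "Box",   # InputDomainType (bounding box)
    })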

You can select more than one hint. If you add hints which are in conflict 1, a warning appears on the Issues pane 2.

_images/page_results_pmt_05_incompatible_hints.png

In this example, the Advanced option hint selects a specific training technique (GBRT) which does not support exact fit — the requirement added by the Exact fit hint.

A detailed description of SmartSelection features is available in the Smart Training section of the GTApprox guide.

Manual mode lets you set training options directly. It is very similar to the ApproxBuilder block configuration and uses the same options as that block. A guide to configuring manual training is available in section Manual Training.

Retraining a Model

Model training is often an iterative process: you train the first version of a model, analyze its behavior and estimate its quality (for example, using Model Explorer and Model Validator), then decide to train the next version with adjusted training settings or another approximation technique, and so on. To reduce the amount of repetitive configuration, pSeven offers the model retrain feature. It is a convenience function which re-opens the model training dialog with the settings that were used to train the existing version of the model, and lets you change these settings and restart training to obtain the next version. It is also useful when you need to train a number of models with the same training settings but different training data samples.

Model retraining begins with an existing model, but it does not actually use this model in any way except reading the previous training settings from it. This is the key difference between retraining a model and updating a model:

  • Retraining a model simply restarts the training with adjusted settings or new data. The training is purely data-based, so it always needs a full data sample.
  • Model update is intended to add information (new data) to an existing model without a full retrain, so it is often faster and more practical. This feature is detailed in section Updating a Model.

General steps to retrain a model are:

  • In the Models pane, select the model to retrain.
  • Use the Retrain… command from the pane’s context menu.
_images/page_results_model_retrain_select.png

The Retrain… command brings up the Retrain model dialog where you can make changes in training settings.

_images/page_results_model_retrain_config.png

The Retrain model dialog is very similar to the Build model dialog. The difference is that the Retrain model dialog already contains the settings which were previously used when training this model. You can change any of these settings, for example:

  • Select other training data sources or adjust data settings.
  • Switch from SmartSelection to the manual training mode and back (see Training Modes).
  • Change SmartSelection hints or manual training options.

As noted above, changing the training data sources is intended for those tasks where you need to train a number of models with the same settings and different data. If you are looking to improve an existing model by adding new data, you may prefer to update the model instead of retraining (see Updating a Model).

Note

If the model you are going to retrain was previously updated, the Retrain… command brings up the Update dialog. In this case, Retrain is intended as an option to re-do the last update step with adjusted settings.

If you have decided to retrain the model with an updated data sample rather than update it, it is convenient to keep the names of the data series which contain the training sample unchanged. If the training data comes from workflow results, this usually happens automatically when you update the report used to train models (see Report Update). If the training data is stored in a file on disk, you can re-import this file and overwrite the existing data series (see Data Import). If the file contains only the new data, you can keep the data series with the old data and import the new data in the append mode: it matches the names of data columns in the file against the names of existing data series and adds the new data to the old data instead of overwriting it. Even if the column names do not match automatically, you can manually specify the data series to update using the Result settings pane in the data import dialog.

Updating a Model

Model update is useful in various scenarios where a model is trained incrementally — that is, an initial model is trained using some subset of data, and then this model is gradually improved using new data subsets. For example, suppose you have an initial model, trained with some initial data sample, and another sample with new data (update data), which was obtained only after training the initial model. In this case you have the following options:

  • Compile a new full data sample, joining the initial and update data, and use this compiled sample to train a new model (retrain the model).
  • Evaluate differences between the initial model and the update data (residuals) and train a new “mixed” model, which recognizes these differences and fits both the initial and update data.

Retraining the model may seem simple and straightforward, but the downside of this method is that it involves a full retrain, which can be time-consuming. Also, the new model’s behavior may differ significantly from the old one. You can retrain the model in the manual mode (see Retraining a Model), applying the same training settings as initially selected, but this does not guarantee high similarity between the models, and by training with fixed settings you miss the chance to get a better new model by adjusting the settings or using SmartSelection. You can retrain the model with new settings, too, but in general the result is an entirely different model.

In contrast, updating a model means that pSeven uses the initial model as a “starting point” and trains another model which “patches” the initial one to fit the update data. These models are then combined to obtain the final updated model. This method relies on a special technique called Mixture of Approximators (MoA), so the updated model is a MoA model, regardless of the technique which was used to train the initial one.

General steps to update a model are:

  • In the Data series pane, select the data series which contain the update data.
  • In the Models pane, select the model to update.
  • Use the Update… command from the Models pane’s context menu.
_images/page_results_model_update_select.png

The Update… command brings up the Update model dialog where you can configure the training sample and change training options. Note that the approximation technique is fixed to MoA, and all options apply to this technique only.

_images/page_results_model_update_config.png

The Initial model field in the dialog shows the model you are updating. You can use the model details button 1 to view detailed model information (see Model Details for a reference).

The training sample 2 is pre-configured with the data series you have selected before using the Update command. If you have not selected anything, the update automatically selects those data series which were used to train the initial model. You can also change other data settings, add a test sample and so on (see Data Settings for details).

Update always works in the manual training mode — that is, it allows you to change every training option 3 except the training technique, which is fixed to MoA. The options which were set when training the initial model are ignored by the update and do not affect it in any way — the update process evaluates the initial model internally but does not read any settings from it and does not directly apply them to the new model. The options you set manually work in the context of the MoA technique:

  • Common options and internal validation options work as usual.
  • Technique options which are not directly related to MoA apply to the MoA’s internal submodels. For example, if you set some options for the GP technique, they will apply to any GP submodel created internally by MoA (see Mixture of Approximators for more details on how the MoA technique works).

Note

If you update a GBRT model, the technique is fixed to GBRT, and MoA cannot be used. The GBRT technique has built-in support for incremental training (see Incremental Training in Gradient Boosted Regression Trees).

Also, the output transformation settings in this case must be the same as for the initial GBRT model. pSeven automatically selects the correct output transformation when you update a GBRT model.

Finally, note that you can run model update using exactly the same data that was used to train the initial model — that is, without any update data. Such an update can improve model accuracy since it results in a better fit to the initial training sample. However, you should be careful to avoid overtraining the model — that is, creating a model which is accurate on the training dataset but has low prediction quality (high errors in areas not covered by the training data). As usual, it is advised to validate models on a separate test dataset to prevent overtraining.
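
The usual check for overtraining is to compare errors on the training data with errors on a held-out test set: a test error far above the training error is a warning sign. A minimal numpy sketch of this comparison, using arbitrary toy numbers in place of real model predictions:

    import numpy as np

    def rms(y_true, y_pred):
        """Root-mean-squared error."""
        return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

    # Toy data: predictions of some model on the training and test samples.
    y_train      = np.array([1.0, 2.0, 3.0, 4.0])
    y_train_pred = np.array([1.0, 2.0, 3.0, 4.0])   # exact fit on training data
    y_test       = np.array([1.5, 2.5, 3.5])
    y_test_pred  = np.array([1.1, 2.9, 4.2])        # much larger test errors

    print("training RMS:", rms(y_train, y_train_pred))
    print("test RMS:", rms(y_test, y_test_pred))
    # A test error far above the training error is a typical sign of overtraining.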

Model Smoothing

Trained models can be additionally smoothed using the Smooth model… command from the Models pane menu (opens the Smooth model dialog).

_images/page_results_pmt_03_smoothing.png

Note that the smoothed model is saved under a new name, so the original model is not changed. After smoothing, both the original and smoothed models will be found on the Models pane.

The amount of smoothing is controlled by the smoothing factor which can be a single value for simple smoothing or a matrix for anisotropic smoothing.

The simple method applies the same smoothing to all outputs. In this case, the smoothing factor is a value in the range \([0.0, 1.0]\), where 0.0 means no smoothing and 1.0 is extreme (almost linear) smoothing.

Anisotropic smoothing is an advanced method that lets you control the smoothing of each output component individually and apply different smoothing per model input (direction in the input space). It is configured by a matrix of individual smoothing factors: each matrix row corresponds to a model output, and each column corresponds to an input. Each element of the matrix sets the smoothness of a model output in the direction of the corresponding input.
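
For example, for a model with three inputs and two outputs the anisotropic smoothing factor is a 2×3 matrix. A minimal numpy sketch of this layout (the values are arbitrary, and the per-element \([0.0, 1.0]\) scale is assumed to mirror the simple smoothing factor described above):

    import numpy as np

    # Rows correspond to model outputs, columns to model inputs.
    # Each value sets smoothing for that output along the direction of that input.
    smoothing = np.array([
        [0.2, 0.0, 0.8],   # output 1: heavy smoothing along input 3 only
        [0.5, 0.5, 0.5],   # output 2: moderate smoothing along every input
    ])

    assert smoothing.shape == (2, 3)                          # 2 outputs, 3 inputs
    assert np.all((0.0 <= smoothing) & (smoothing <= 1.0))    # assumed per-element range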

Note that some model training techniques do not support additional smoothing, and if you select such a model in the Smooth model… dialog, the smoothing cannot be started.

Model Details

The model details button on the Models pane’s toolbar brings up the details dialog for the model selected on the pane. You can also open this dialog using the Model details… command from the pane’s context menu, or simply double-click a model in the list.

If the model is in training, the details dialog shows only a real-time training log and allows you to interrupt model training with the Stop training button.

If the model is ready, the dialog shows full model information including its training options, errors, sample statistics, and other details. From this full dialog, you can also export model information as plain text using the Export model details to file and Copy model details to clipboard buttons.

Note

For large models (with many variables or submodels), the model details formatted as plain text can be too big to copy to the clipboard. In this case, only the export to file is available, and the Copy model details to clipboard button is disabled.

Parameters

The Parameters tab shows a model summary, options that were used when training the model, and model structure (for models trained with RSM and TBL approximation techniques).

_images/page_results_pmt_04_md_parameters.png

The Summary groupbox 1 provides useful details about the model, including its training time statistics and relevant information about the available features.

The Training options table 2 lists all options that were set to non-default values when training the model. This includes both the options that were set manually and those that were optimized by smart training (if you trained the model in the SmartSelection mode).

The Model structure table 3 is shown only for RSM and TBL models. It lists model terms and their weights for each output. From here you can also copy the model formula or save it to plain text using the copy and export buttons or the context menu commands.

Training Sample

The Training sample tab shows general sample properties 1 and descriptive statistics 2 for the input and output parts of the training sample. If the model stores its training sample, you can use the extract button 3 to extract this sample to the report database. After this you can re-use the model’s training data in the report, or export it to CSV or Excel. This button works the same as the Extract training sample… command from the Models pane — see Extracting the Training Sample for details.

_images/page_results_pmt_04_md_training_sample.png

Constraints

The Constraints tab shows information about the input 1 and output 2 constraints of the model (if any) in the analytical form.

_images/page_results_pmt_04_md_constraints.png

By default, approximation models have no input constraints; these constraints appear only when the model is trained with some option which limits the input domain — for example, when GTApprox/InputDomainType is used, or when GTApprox/OutputNanMode is set to predict and the training sample contains NaN values in outputs.

Output constraints appear when the model is trained with the GTApprox/DependentOutputs option set to PartialLinear. pSeven searches for linear dependencies between outputs in the training sample and trains a model which preserves these dependencies.
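
As an illustration of such a dependency, the sketch below builds a toy training sample in which the third output is an exact linear combination of the first two; this is the kind of relation the PartialLinear mode detects and preserves. The sketch uses numpy only and is not pSeven code:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.random((200, 2))                      # training inputs
    y1 = np.sin(x[:, 0])                          # independent output 1
    y2 = x[:, 1] ** 2                             # independent output 2
    y3 = 2.0 * y1 - 3.0 * y2 + 1.0                # linearly dependent output 3
    y = np.column_stack([y1, y2, y3])

    # The dependency can be confirmed numerically: appending a constant column
    # and checking the matrix rank shows that one output column is redundant.
    rank = np.linalg.matrix_rank(np.column_stack([y, np.ones(len(y))]))
    print(rank)   # 3, i.e. less than the 4 columns, so a linear dependency exists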

Accuracy

The Accuracy tab shows model errors for each output, calculated on the training dataset 1 and obtained during internal validation 2 (if it was enabled when training the model).

_images/page_results_pmt_04_md_accuracy.png

Training Log

On the Training log tab you can view the saved training log 1 and export it to a plain text file using the export button 2.

_images/page_results_pmt_04_md_log.png

Making Predictions

To evaluate a model, select it on the Models pane and use the Make predictions… command from the pane’s menu. This command is also available on the quick toolbar.

_images/page_results_pmt_05_make_predictions.png

In the Make predictions dialog you can also select the model to evaluate and configure inputs and outputs.

Available input options are:

  • Latin hypercube sampling: generate a new LHS sample and use it as input. The generated sample will be saved to the report database.
  • Full factorial design: generate a full factorial sample and use it as input. The generated sample will be saved to the report database.
  • Use existing data series: get input data from the selected data series in the report database.
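
For reference, a full factorial sample like the one produced by the Full factorial design option is simply the Cartesian product of per-input level grids. A minimal numpy sketch (the bounds and numbers of levels are arbitrary):

    import numpy as np
    from itertools import product

    # Bounds and number of levels for two model inputs (arbitrary values).
    x1_levels = np.linspace(0.0, 1.0, 3)      # 3 levels for input x1
    x2_levels = np.linspace(-5.0, 5.0, 4)     # 4 levels for input x2

    # The full factorial design contains every combination of levels.
    sample = np.array(list(product(x1_levels, x2_levels)))
    print(sample.shape)   # (12, 2): 3 * 4 points, 2 inputs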

In the New data series table you can see the names of the new data series that will store the calculated model outputs and, if you select the Latin hypercube sampling or Full factorial design option, the generated inputs.

  • The Predict and show button evaluates model outputs and automatically opens a new sample viewer showing the input and output data. The data is also stored to the report database.
  • The Predict button evaluates model outputs and stores data to the report database without showing it.

Model Validator

Model validator is a tool to estimate model quality and compare models. It allows you to test models against reference data and find the most accurate model using error plots and statistics.

To validate models, select them in the Models pane and click the Model validator button on the report toolbar. If you also select data in the Data series pane, it is added to validation as a test sample for every model.

_images/page_results_pmt__model_validator_scatter.png

In Model validator, you can use the Models pane to add, remove, or reorder models, select the outputs to validate, change plot colors, and hide or show models on plots.

The selectors above the plot switch plot type, calculated error type, and type of data used in validation (see Comparing Models). The table at the bottom shows error metrics and model training time.

Comparing Models

Model validator can show two kinds of plots, switched using the Plot selector at the top:

  • Scatter plot directly compares reference sample outputs with model predictions.
  • Quantile plot (default) is useful to analyze error distribution.

On the quantile plot, each point shows the fraction of sample points for which errors are lower than the value on the horizontal axis. A steeper curve is better: it means that the error is lower for a larger fraction of points, possibly with a few outliers forming a long “tail” at the top.

_images/page_results_pmt__model_validator_quantile.png

By default, the quantile plot and error metrics are based on absolute error values. Using the Errors selector you can switch them to normalized error, which is the absolute error divided by the standard deviation of the output in the reference sample. Normalized error is useful for estimating error significance considering the output value range.
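
Both quantities are easy to reproduce: the quantile curve is the empirical distribution of absolute errors, and the normalized error divides them by the standard deviation of the reference output. A minimal numpy sketch with arbitrary toy values:

    import numpy as np

    y_ref  = np.array([10.0, 12.0, 9.0, 11.0, 15.0])   # reference sample outputs
    y_pred = np.array([10.5, 11.0, 9.2, 12.5, 14.0])   # model predictions

    abs_err = np.abs(y_ref - y_pred)
    norm_err = abs_err / np.std(y_ref)                  # normalized error

    # Quantile curve: fraction of points whose error is below each threshold.
    thresholds = np.sort(abs_err)
    fractions = np.arange(1, len(abs_err) + 1) / len(abs_err)
    for t, f in zip(thresholds, fractions):
        print(f"error <= {t:.2f} for {f:.0%} of points")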

The Sample selector changes the source of reference data used for model validation:

  • If “training” is selected, reference data is the model’s training sample. This sample is available in validation only if it was saved with the model when training (samples are saved by default).
  • If “test” is selected, reference data comes from the test sample. This data is selected from the report database — either when adding a Model validator, or later in its configuration.
  • If “internal validation” is selected, both reference and prediction data are read from the model’s internal validation results. This data is available only if internal validation and saving validation data were enabled when training the model (these options are also enabled by default).

It is recommended to use a test data sample when possible: test sample validation shows the model’s ability to predict outputs for new input values that were not available in training. Training sample validation tends to overestimate model accuracy. Low errors on the training sample (steeper error quantile curves) can actually be a sign of overfitting, especially if the same model shows significantly higher errors on a test sample. If holdout test data is not available, it is recommended to switch to internal validation: this data is obtained from cross-validation tests that run when building the model (see section Internal Validation for more details).

The table at the bottom contains prediction error metrics. Best metric values are highlighted. The metrics are:

  • \(R^2\): coefficient of determination. Indicates the proportion of output variation that can be explained by the model.
  • RMS: the root-mean-squared error.
  • Maximum: the maximum prediction error over the cross-validation or test sample.
  • Q99: the 99-th percentile. For 99% of reference points, prediction error is lower than this value.
  • Q95: the 95-th percentile. For 95% of reference points, prediction error is lower than this value.
  • Median: the median of prediction error values.
  • Mean: the arithmetic mean of prediction error values.

\(R^2\) (the coefficient of determination) is the most robust metric; values closer to 1.0 are better. For other metrics, lower values are better.
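
For reference, all of these metrics can be computed from the reference outputs and model predictions in a few lines of numpy. This is a sketch with arbitrary toy values; pSeven's exact implementation may differ in details such as how multi-dimensional outputs are handled:

    import numpy as np

    y_ref  = np.array([10.0, 12.0, 9.0, 11.0, 15.0, 13.0])   # reference outputs
    y_pred = np.array([10.5, 11.0, 9.2, 12.5, 14.0, 13.3])   # model predictions

    err = np.abs(y_ref - y_pred)
    metrics = {
        "R^2":     1.0 - np.sum((y_ref - y_pred) ** 2) / np.sum((y_ref - y_ref.mean()) ** 2),
        "RMS":     np.sqrt(np.mean((y_ref - y_pred) ** 2)),
        "Maximum": err.max(),
        "Q99":     np.percentile(err, 99),
        "Q95":     np.percentile(err, 95),
        "Median":  np.median(err),
        "Mean":    err.mean(),
    }
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")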

The table also shows model training time for reference.

Model Import and Export

Models can be imported and exported in the Models pane using the Import from file… and Export to file… commands from the pane’s context menu or the import and export buttons on the pane’s toolbar.

  • Import from file… imports a new model from a binary file in the GTApprox format (.gtapprox). Imported models appear in the Models pane.
  • Export to file… can export a model from the report to a number of formats.

The default export format is a binary GTApprox model (.gtapprox), which is compatible only with pSeven — for example, you can load a model in the GTApprox format into an Approximation model block, or import it to another report.

_images/page_results_pmt_06_export.png

Other supported export formats are:

  • Executable: command-line executable for the platform, on which pSeven currently runs (.exe for Windows and .bin for Linux). Note that it is not possible to export an executable file for another platform — for example, you cannot export a Windows executable under Linux.

  • Excel document with a linked DLL: an Excel document with macros (.xlsm), which evaluates the model stored in a complementary DLL. In addition to the Excel document and two model DLLs (for the 32-bit and 64-bit Excel editions), this format also provides a file containing the code of a VBA wrapper (.bas) for the model DLL, and C source (.c) of the DLL. Export to this format is supported only in the Windows version of pSeven.

    For convenience, the DLL names are based on the name of the Excel document. However, DLL names (hence, the Excel document name) are also used in the VBA macro code. Due to this, the document name must contain only characters that can be represented in the system locale’s encoding (see Language for non-Unicode programs in Windows’ language settings). For compatibility across different localized versions of Windows, it is recommended to use English characters only.

  • FMU for Co-Simulation 1.0: FMI model (Functional Mock-up Unit, .fmu) in the Co-Simulation format, with source and binary.

  • FMU for Model Exchange 1.0: FMI model (Functional Mock-up Unit, .fmu) in the Model Exchange format, with source and binary.

  • FMU for Co-Simulation 1.0 (source only): FMI model in the Co-Simulation format, with source only.

  • FMU for Model Exchange 1.0 (source only): FMI model in the Model Exchange format, with source only.

  • C# source (experimental): source code (.cs) to compile the model with C# compiler.

  • C# library (experimental): a compiled .NET DLL (.dll). Note that using this export format requires a C# compiler installed (pSeven does not include a C# compiler).

    • In Windows: requires .NET Framework or another package which provides the C# compiler (csc.exe). pSeven finds the compiler automatically; if there are several versions installed, the latest is used. If you want to select a specific version, you can set the CSHARP_COMPILER_ROOT environment variable. Its value should be the full path to the directory which contains csc.exe.
    • In Linux: requires the dotnet command line tool which is a part of .NET Core SDK. The following environment variables are also required: CSHARP_COMPILER_ROOT must contain the path to the compiler executable (csc.dll), and CSHARP_LIBRARIES_ROOT must contain the full path to the directory where the System.dll and System.Private.CoreLib.dll libraries are located. Finally, the dotnet executable should be added to PATH.
  • C source for standalone program: C source code with the main() function, which you can compile to a complete command-line program.

  • C header for library: the header for a model compiled to a shared library (DLL or .so).

  • C source for library: C header and model implementation, which you can compile to a shared library (DLL or .so).

  • C source for MEX: source code for a MATLAB MEX file.

  • Octave script: model code compatible with MATLAB.

Note

An FMI model in either format (Co-Simulation or Model Exchange) can be exported as an FMU with a binary, or as a source-only FMU.

The binary FMU is ready to use, but it is platform-dependent: you can use it only on the platform where it was exported. For example, it is not possible to export an FMU with a Windows binary if you are running pSeven on Linux. However, this FMU also contains source code, so you can recompile it for any platform.

The source-only FMU does not contain any binaries, so you will always have to compile it in order to obtain a working FMU.

For the C and C# source code formats you will also have to specify the name of the model function. For C#, this name becomes the namespace containing the model class. If the name contains dots (.), it is used to generate a hierarchy of namespaces, using the dots as separators. Each part of the name must be a valid C# namespace identifier: it must start with @, _, or a letter, and the rest can contain only alphanumeric characters and underscores (_).
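
A name check equivalent to this rule can be written as a short regular expression. The Python sketch below encodes only the rule stated above, not the full C# identifier grammar:

    import re

    # Each dot-separated part must start with @, _ or a letter and continue
    # with letters, digits or underscores only (per the rule above).
    PART = re.compile(r"^[@_A-Za-z][A-Za-z0-9_]*$")

    def is_valid_model_name(name):
        return all(PART.match(part) for part in name.split("."))

    print(is_valid_model_name("My.Models.Airfoil"))    # True
    print(is_valid_model_name("2fast.model"))          # False: part starts with a digit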

Function description is optional: this text will be added to the source code as a comment.

You can export the whole model, or select only specific sections containing different parts of model information. Removing certain model sections can noticeably reduce size and memory consumption of the exported model.

Select the sections to include in the exported model as export features. Note that when you export a model to the Octave or FMU formats, the model is always exported “as is”. In other words, you cannot intentionally remove any of its sections. For a model exported to the executable, Excel, or C source formats, you can remove only the accuracy evaluation data, if any. When saving a model in the pSeven binary model format (.gtapprox), you can also remove the smoothing data, training sample, internal validation data, annotations, training log, and comment sections. See section Approximation Model Structure for details on the sections contained in a model file.

Extracting the Training Sample

By default, a model stores the data used in its training. When a model is loaded in Analyze, you can copy the training sample data from the model to the current report’s database. This will create new data series in the database.

To extract the training sample data, select a model on the Models pane and use the Extract training sample… command from the pane’s context menu. This command opens the Extract training sample… dialog.

_images/page_results_pmt_extracting_a_training_sample.png

The names of new data series are generated based on the names of model inputs and outputs. You can change prefixes added to the input and output names (default prefix is the model name). The New data series table shows a preview of the names which will be assigned to the new data series.

Note

Storing the sample data to a model can be disabled during training. For such models, the Extract training sample… command is disabled in the menu.

Model Explorer

Model explorer is a tool designed to help analyze multidimensional models. It lets you study input-output dependencies by plotting a series of two-dimensional slices, each showing an input-output pair.

_images/page_results_pmt_07_model_explorer.png

The models to analyze are selected on the Models pane. Here you can add and remove models, reorder them, set model colors, and show or hide models on the plots. If a model provides accuracy estimation information, an AE curve can also be displayed on the slices by enabling it in the “AE” column.

The sliders below this pane set coordinates of the origin point — the point where all slices intersect. This point is also marked on the slice plots.

By default, the origin point is the same for all models. To set different origin points (as shown above), deselect the “Same slice settings” option in the Models pane menu.

Slices can be rotated around the origin point using the Change slice orientation slider. When this option is disabled (default), slicing planes are parallel to the coordinate axes in the input space. If enabled, moving the slider rotates the slicing planes around F-axes in a roughly uniform fashion, letting you “scan” the model space around the origin point. When you change slice orientation, the main directional vector is shown in the “Slice direction” column in the table at the bottom. You can also edit this column to set the direction manually.
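
Conceptually, a default (non-rotated) slice is the model evaluated along one input direction while the other inputs stay fixed at the origin point. A toy numpy sketch of this idea, where the model is an arbitrary stand-in function rather than a pSeven model:

    import numpy as np

    def model(x):
        """Toy stand-in for a trained model with 3 inputs and 1 output."""
        return np.sin(x[:, 0]) + x[:, 1] ** 2 + 0.5 * x[:, 2]

    origin = np.array([0.5, 0.5, 0.5])           # origin point set by the sliders
    t = np.linspace(0.0, 1.0, 50)                # values of the sliced input

    # Slice along input 1: vary x1, keep the other inputs at the origin point.
    points = np.tile(origin, (len(t), 1))
    points[:, 0] = t
    slice_values = model(points)                 # output values along the slice
    print(slice_values.shape)                    # (50,)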

Slice plots, in addition to model curves, display bars that show how much each output is influenced by each input.