Sample-Based Approximation

This tutorial explains how to train an approximation model with ApproxBuilder using training data from a file, and how to save the model to disk.

Before You Begin

This tutorial requires an existing prjTutorials project. If you have not created this project yet, see Tutorial Project first.

  • Open the prjTutorials project.
  • Create a new empty workflow. If you have never created a workflow before, see the Simple Workflow tutorial.
  • Save the workflow as wfApproxTrainSample.
  • Switch to Workspace and verify that the workflow file (wfApproxTrainSample.p7wf) has been added to the project. You can see it in the Project pane.
  • Switch to Edit and select the workflow you have just created to continue with the tutorial.

Task

The task in this tutorial is to obtain an approximation model using a sample of data containing inputs (values of variables) and responses (function values) of some unknown dependency.

The training sample is a file named approx_train.csv located in the samples subdirectory of the prjTutorials project. You can open this file in Workspace to view its contents (in the Project pane, expand samples and double-click approx_train.csv).

../_images/page_tutorials_approx_sample_01_csv.png

The file contains 100 points sampled from a function that we assume to be unknown (in fact, it is the well-known Branin function). Note that both inputs and responses are in the same file: the first two columns are values of variables, and the third column contains function values. The first three lines, beginning with #, are comments, which CSVParser skips when reading the file. The next line (“x1 x2 f”) is a header containing column names.
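The tutorial treats the dependency as unknown, but since it names the Branin function, a reference definition may help when you later check model predictions. Below is a minimal Python sketch using the standard constants from the optimization literature (a = 1, b = 5.1/(4π²), c = 5/π, r = 6, s = 10, t = 1/(8π)). The function name `branin` is our own, and the tutorial does not state which input ranges were used to generate approx_train.csv.

```python
import math


def branin(x1: float, x2: float) -> float:
    """Branin function with its standard constants."""
    a = 1.0
    b = 5.1 / (4.0 * math.pi ** 2)
    c = 5.0 / math.pi
    r = 6.0
    s = 10.0
    t = 1.0 / (8.0 * math.pi)
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1.0 - t) * math.cos(x1) + s


# The global minimum (about 0.397887) is reached at three points;
# the third x1 coordinate is a rounded value of 3*pi.
for point in [(-math.pi, 12.275), (math.pi, 2.275), (9.42478, 2.475)]:
    print(point, round(branin(*point), 5))
```

At each of these points the squared term vanishes and cos(x1) = -1, so the value reduces to s·t = 10/(8π).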

pSeven does not require a specific sample format. For example, the name approx_train.csv implies the CSV (comma-separated values) format, but the file actually uses tabs to delimit fields. This is not a problem, since the block that reads sample files (CSVParser) provides additional settings that allow it to parse any text file with a tabular format.
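To illustrate what "parsing a tabular text file" involves here (skipping comment lines, reading a header, splitting fields on a delimiter), the following is a minimal Python sketch built on the standard `csv` module. It is not how CSVParser is implemented; `parse_sample` is a hypothetical helper, and the numbers in the example are made up rather than taken from approx_train.csv.

```python
import csv
import io
from typing import List, TextIO, Tuple


def parse_sample(stream: TextIO, delimiter: str = "\t",
                 comment: str = "#") -> Tuple[List[str], List[List[float]]]:
    """Read a delimited text table, skipping comment and blank lines.

    The first non-comment line is treated as the header (column names);
    all following lines are converted to rows of floats.
    """
    lines = (line for line in stream
             if line.strip() and not line.lstrip().startswith(comment))
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    rows = [[float(field) for field in row] for row in reader]
    return header, rows


# Same layout as the tutorial file: three comment lines, a header, tab-separated data.
example = io.StringIO(
    "# comment line 1\n"
    "# comment line 2\n"
    "# comment line 3\n"
    "x1\tx2\tf\n"
    "1.0\t2.0\t3.5\n"
    "-0.5\t4.0\t7.25\n"
)
header, matrix = parse_sample(example)
print(header)  # ['x1', 'x2', 'f']
print(matrix)  # [[1.0, 2.0, 3.5], [-0.5, 4.0, 7.25]]
```

Passing a different `delimiter` (for example `","`) is the analogue of changing the delimiter setting in CSVParser for a true comma-separated file.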

Solution

Training a sample-based approximation model in pSeven can be divided into the following general steps:

  1. Configure a CSVParser block to read the sample data from the file correctly. Usually you have to specify the characters used to delimit data fields and to mark comment lines in the file. CSVParser reads the sample and outputs all data as a single matrix.
  2. Prepare the sample for ApproxBuilder. This block has separate ports for the input and response parts of the training sample, so when all data is contained in the same file, the matrix output by CSVParser has to be split vertically to get two separate samples.
  3. Send the training data to ApproxBuilder. By default, this block trains the model and sends it to an output port. In this tutorial, ApproxBuilder is reconfigured to save the model to a file on disk. You can also do both, for example, if the model is used in the same workflow but you also want to save a copy of it after training.

Loading Sample

Begin by configuring a CSVParser block to load and parse the training sample, approx_train.csv.

  • Add a CSVParser block. Name it Sample.

Open the Sample block configuration. When you load a file, the block displays a results preview (parsed sample) in the Preview pane, so you can check whether the file is parsed correctly and adjust settings when needed.

../_images/page_tutorials_approx_sample_02_parser.png
  • In the Input File pane, click b_browse and browse to the sample file in your project (samples/approx_train.csv).
  • Verify that the file is parsed correctly (see the Preview pane). This file uses the delimiter and comment characters that are the defaults in the CSVParser configuration, so no changes are needed.
  • Leave other settings default and click b_ok.

When the workflow starts, Sample will read the file, convert the data into a RealMatrix and output this matrix to the matrix port.

Preparing Sample

The matrix output by Sample cannot be sent directly to an ApproxBuilder block because it contains both variable and response (function) values. ApproxBuilder receives variable and response values on separate ports, so the matrix must first be split into two samples.

Note

ApproxBuilder has separate inputs for variables and responses because in many cases these samples come from different sources.

To separate the input and response values, you can use a Submatrix block.

  • Add a Submatrix block. Name it Splitter.

Open the Splitter block configuration. This block lets you define submatrices that include specified rows and columns of the input matrix. To add a submatrix, click b_blconf_add in the toolbar (the default configuration is empty).

Select the input part of the training sample.

../_images/page_tutorials_approx_sample_03_submatrix.png
  • Specify the name: x_sample.
  • Leave rows default (: selects all rows).
  • Specify columns: 0-1 (the values of variables are in the first two columns, indexed 0 and 1).

Similarly, select the response part.

  • Name: f_sample.
  • Rows: : (default, all rows).
  • Columns: 2 (the third column in the file, indexed 2, contains the response values).
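The row and column settings above use three forms: : for all rows, an inclusive range such as 0-1, and a single index such as 2. As a rough picture of what Splitter does with them, here is a minimal Python sketch that handles only these three forms. The functions `parse_index_spec` and `submatrix` are hypothetical, and the real Submatrix notation may support more than this.

```python
from typing import List


def parse_index_spec(spec: str, size: int) -> List[int]:
    """Turn ':', 'a-b' (inclusive) or 'n' into a list of 0-based indices."""
    spec = spec.strip()
    if spec == ":":
        return list(range(size))
    if "-" in spec:
        start, stop = (int(part) for part in spec.split("-"))
        return list(range(start, stop + 1))
    return [int(spec)]


def submatrix(matrix: List[List[float]], rows: str, columns: str) -> List[List[float]]:
    """Select the given rows and columns from a row-major matrix."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    row_idx = parse_index_spec(rows, n_rows)
    col_idx = parse_index_spec(columns, n_cols)
    return [[matrix[i][j] for j in col_idx] for i in row_idx]


# Made-up 2x3 matrix in the same column layout as approx_train.csv (x1, x2, f).
matrix = [[1.0, 2.0, 3.5], [-0.5, 4.0, 7.25]]
x_sample = submatrix(matrix, ":", "0-1")  # [[1.0, 2.0], [-0.5, 4.0]]
f_sample = submatrix(matrix, ":", "2")    # [[3.5], [7.25]]
```

Note that f_sample stays a one-column matrix rather than a flat list, which matches the idea of a separate response sample with one column per response.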

Verify Splitter configuration.

../_images/page_tutorials_approx_sample_04_splitter.png

Adding a submatrix automatically adds its corresponding output port to the block, so Splitter now has two outputs: x_sample and f_sample, and one (default) input: matrix (see the Ports tab).

  • Finally, link Sample.matrix to Splitter.matrix.

This completes the sample preparation. Splitter will receive a matrix from Sample, split it according to your submatrix settings, and output the input and response parts to the Splitter.x_sample and Splitter.f_sample ports. Next, these ports have to be connected to ApproxBuilder inputs.

Model Training

Approximation models are trained by ApproxBuilder. At a minimum, you only need to send the training data to ApproxBuilder; all other settings are optional.

  • Add an ApproxBuilder block. Name it Builder.
  • Link Splitter.x_sample to Builder.x_sample, Splitter.f_sample to Builder.f_sample.

By default, the trained model is only output to Builder.model. To also save it to disk, specify the output file in the Builder configuration.

  • Open Builder configuration and click b_browse in the Output model pane to bring up the file selection dialog.
../_images/page_tutorials_approx_sample_05_modelfile.png
  • In the File Origin pane choose the Project origin.
  • Specify the file name, for example: model.gtapprox.
  • Leave other settings default and click b_ok to close the dialog.

As a result, after the workflow finishes, the saved model (model.gtapprox) will appear in the project directory. For more details on this file configuration, see FAQ: How do the files of project origin work with absolute and relative paths?

The model is saved in a binary format native to pSeven. To evaluate the model, you can later load it into an ApproxPlayer block and send it an input sample to get predicted function values. To evaluate the model outside pSeven, you can also export the model code after training, using an ApproxPlayer block.
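The train, save, load and evaluate cycle described above can be sketched in plain Python. This is only an analogy: the sketch swaps in a simple inverse distance weighting model in place of whatever technique ApproxBuilder selects, and stores it as JSON instead of the binary pSeven format. `IDWModel`, `build` and `load` are hypothetical names, and the training data is made up.

```python
import json
import math
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class IDWModel:
    """Stand-in model: inverse distance weighting (NOT the GTApprox technique)."""
    x: List[List[float]]  # input part of the training sample
    f: List[List[float]]  # response part of the training sample
    power: float = 2.0

    def predict(self, point: List[float]) -> List[float]:
        weights = []
        for xi, fi in zip(self.x, self.f):
            d = math.dist(point, xi)
            if d == 0.0:
                return list(fi)  # exact hit on a training point
            weights.append(d ** -self.power)
        total = sum(weights)
        n_out = len(self.f[0])
        return [sum(w * fi[k] for w, fi in zip(weights, self.f)) / total
                for k in range(n_out)]


def build(x_sample, f_sample, model_path: Optional[str] = None) -> IDWModel:
    """Like Builder: return the model (the 'port') and optionally save a copy."""
    model = IDWModel(x_sample, f_sample)
    if model_path is not None:
        with open(model_path, "w") as fh:
            json.dump(asdict(model), fh)
    return model


def load(model_path: str) -> IDWModel:
    """Like loading a saved model file into an evaluation block."""
    with open(model_path) as fh:
        return IDWModel(**json.load(fh))


x_sample = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
f_sample = [[0.0], [1.0], [1.0]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    build(x_sample, f_sample, path)
    model = load(path)
print(model.predict([1.0, 0.0]))  # [1.0] (a training point)
print(model.predict([0.5, 0.5]))  # weighted average of the three responses
```

Keeping model training and model evaluation behind separate functions mirrors the split between ApproxBuilder and ApproxPlayer: the file on disk is the only thing the two halves share.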

Model Export

Note

This part is optional. Using exported models is out of the scope of this tutorial; the instructions below only explain how to get the model code from pSeven.

pSeven supports approximation model export to an M-file (.m), MEX-file (.mex), or C source code (.c). Model export is an additional feature of ApproxPlayer (the main purpose of this block is model evaluation in pSeven, see the Model Evaluation tutorial).

  • Add an ApproxPlayer block. Name it Export.

By default, ApproxPlayer receives the model on its model input port. It can also load a model from a file in pSeven format (like the model.gtapprox file configured in Builder). Note that although you have configured Builder to save the model to disk, the Builder.model output is still enabled. In this workflow it is therefore easier to pass the model directly from Builder to Export.

  • Link Builder.model to Export.model.

Note

Loading the model from a file would require explicit synchronization between Builder and Export. That is, Export should start only after Builder finishes and writes the model to disk; otherwise, Export will not be able to find the model file. Block synchronization is in fact easy: if you link Builder.done to Export.do, Export will wait until Builder finishes. You can test this version of the workflow later.
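The done-to-do link works like a one-shot signal between two concurrent tasks. The following minimal Python sketch uses `threading.Event` as an analogy: the "exporter" thread starts first but blocks until the "builder" has written the file and set the event. The functions are hypothetical stand-ins for the blocks, and the file content is a text placeholder, not a real model.

```python
import os
import tempfile
import threading


def builder(model_path: str, done: threading.Event) -> None:
    """Stand-in for Builder: write the model file, then signal (like Builder.done)."""
    with open(model_path, "w") as fh:
        fh.write("trained model placeholder\n")
    done.set()


def exporter(model_path: str, do: threading.Event, result: list) -> None:
    """Stand-in for Export: wait for the signal (like Export.do), then read the file."""
    do.wait()
    with open(model_path) as fh:
        result.append(fh.read())


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.gtapprox")
    signal = threading.Event()
    loaded: list = []
    consumer = threading.Thread(target=exporter, args=(path, signal, loaded))
    consumer.start()        # starts first, but blocks until the signal is set
    builder(path, signal)   # writes the file, then releases the consumer
    consumer.join()
print(loaded)  # ['trained model placeholder\n']
```

Without the `do.wait()` call, the exporter could try to open the file before it exists, which is the failure the note above warns about.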

Select the export file in the Export block configuration.

  • Open Export configuration and click b_browse in the Export File pane to bring up the file selection dialog.
../_images/page_tutorials_approx_sample_06_exportfile.png
  • In the File Origin pane choose the Project origin.
  • Specify the file name, for example: gtapprox_model.m. Note that the export format is determined by the file extension (here, .m). Alternatively, you can click b_browse and select the export file location, name and format in the file dialog (this dialog explicitly lists the supported formats).
  • Leave other settings default and click b_ok to close the dialog.
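Because the export format is chosen by the file extension, the selection rule can be pictured as a simple lookup. Below is a minimal Python sketch using only the three formats listed in this tutorial. `export_format` is a hypothetical helper, and treating other extensions as an error is our assumption about how unsupported names are handled.

```python
import os

# Extensions mentioned in this tutorial and the export formats they select.
EXPORT_FORMATS = {
    ".m": "M-file",
    ".mex": "MEX-file",
    ".c": "C source code",
}


def export_format(filename: str) -> str:
    """Pick the export format from the file extension."""
    ext = os.path.splitext(filename)[1]
    try:
        return EXPORT_FORMATS[ext]
    except KeyError:
        raise ValueError(f"unsupported export extension: {ext!r}") from None


print(export_format("gtapprox_model.m"))  # M-file
```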

Click b_ok in the Export configuration to save settings.

Note

If you want to load a model from a file, you will also have to change settings in the Model File pane: select the Project origin and enter model.gtapprox in the File path field in the file configuration dialog.

Workflow

The finished workflow is a typical representation of a sample-based approximation task.

../_images/page_tutorials_approx_sample_07_wf.png

The Sample and Splitter blocks prepare the training sample for Builder. When Builder receives the prepared input and response samples, it starts model training. When training completes, Builder outputs the model to its model output port and saves a copy of this model to model.gtapprox in the project directory (a binary model format specific to pSeven). Optionally, the model from this port is also sent to the Export block, which saves the model code.
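The links in this workflow form a simple dependency chain. As an illustration only, the following Python sketch records which block receives data from which, and uses `graphlib.TopologicalSorter` from the standard library to list an order in which the blocks can run. pSeven starts blocks as their inputs arrive; it is not claimed to use a topological sort internally.

```python
from graphlib import TopologicalSorter

# Each block maps to the set of blocks it receives data from.
links = {
    "Splitter": {"Sample"},   # Sample.matrix -> Splitter.matrix
    "Builder": {"Splitter"},  # Splitter.x_sample, Splitter.f_sample -> Builder
    "Export": {"Builder"},    # Builder.model -> Export.model (optional part)
}

order = list(TopologicalSorter(links).static_order())
print(order)  # ['Sample', 'Splitter', 'Builder', 'Export']
```

Dropping the "Export" entry gives the workflow without the optional export part; the remaining order is unchanged.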

  • Save the workflow and run it to see the results.

Results

The main result of this tutorial is the approximation model model.gtapprox saved to the prjTutorials project directory in a binary pSeven format. After the workflow finishes, you can switch to Workspace to see this file in the Project pane.

  • Verify that model.gtapprox exists after running the workflow. Note that this model is required in the Model Evaluation tutorial.

If you have added the optional model export part, you will also find gtapprox_model.m in the project directory. Since it is a text file, you can view its contents in Workspace.

../_images/page_tutorials_approx_sample_08_mfile.png

Conclusion

This tutorial shows only the basics of training approximation models in pSeven. It does not discuss advanced ApproxBuilder options, which control the quality of approximation and can have a noticeable effect on model performance and behavior. For more details, see the ApproxBuilder block page, in particular the Options section, which describes the various settings available in ApproxBuilder.

The model trained by ApproxBuilder can be used in pSeven to predict function values at points not found in the training sample. Depending on the task, approximation models are either saved to disk or sent directly to an ApproxPlayer block for evaluation. This tutorial suggests saving the model to a file (model.gtapprox) so it can be used later; model evaluation is then explained in the Model Evaluation tutorial. Note that you can also merge the training and evaluation workflows, sending the model from ApproxBuilder to ApproxPlayer as was done in the Model Export section.