DoE

Tag: Exploration

The DoE block generates data samples with specific properties. It can use a variety of generation techniques and has several working modes.

See also

GTDoE guide
The guide to the Generic Tool for Design of Experiments (GTDoE) — the pSeven Core component for data sample generation used by the DoE block.

Introduction

The DoE block is mainly used to generate input samples for other blocks — for example, to test approximation models or to obtain a response sample from a block evaluating some dependency, as shown in the Model Evaluation and Sampling Loop tutorials. In other working modes it can be used as an infinite point generator or as a driver of an adaptive DoE cycle. There are four modes:

  • Batch — default, generates a sample.
  • Sequential — infinite point generator.
  • Adaptive blackbox-based — generates a sample designed to gain the maximum information about some dependency (the blackbox).
  • Adaptive sample-based — can update an existing DoE sample or analyse a pair of input and response samples to provide a new input sample.

The DoE block supports a wide selection of generation techniques, but not all of them are available in all modes. Most techniques can be configured further with various options. See GTDoE/Technique for the list of available techniques and section Options for configuration details.

See also

Model Evaluation
Shows an example of using the DoE block in section Test Sample Generation.

Configuration Dialog

The DoE block is configured mainly through options; certain settings are also accepted at input ports. The Configuration tab contains the mode selector (1) and the list of options (2).

../_images/page_blocks_DoE_config.png

Options can also be set on the common Options tab — see section Block Configuration for details.

The generation algorithm is specified by the GTDoE/Technique option. Note that the set of available techniques depends on the selected working mode. The descriptions of the working modes below list the techniques each mode supports.

Batch Mode

This is the default working mode. In batch mode, DoE has two required inputs: bounds and count. The values received at these ports describe the design space (dimension and bounds) and set the number of points to generate, respectively. The generated sample is output to points.

../_images/page_blocks_DoE_batchbounds.png

Bounds must be a RealMatrix containing exactly two rows: the first row sets lower bounds, the second sets upper bounds. The number of columns in this matrix becomes the space dimension (the number of design variables).

Usually it is convenient to assign the bounds and count values to the ports, as shown above, or to add them to parameters so they can be set in Run.

The sample output to points is a RealMatrix where each row is a design point. That is, the number of rows is set by count, and the number of columns in points is equal to the number of columns in bounds.

In the batch mode, all techniques except the “Adaptive” (adaptive sample generation) are available. Selecting “Auto” defaults to “LHS” (Latin hypercube sampling).
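The relation between bounds, count, and points can be sketched in NumPy. The sampler below is a basic Latin hypercube written only for illustration; it is not the GTDoE implementation, and the function name lhs_sample and its seed argument are hypothetical.

```python
import numpy as np

def lhs_sample(bounds, count, seed=0):
    """Basic Latin hypercube sample (illustrative stand-in for "LHS")."""
    bounds = np.asarray(bounds, dtype=float)
    if bounds.ndim != 2 or bounds.shape[0] != 2:
        raise ValueError("bounds must have exactly two rows: lower and upper")
    lower, upper = bounds
    dim = bounds.shape[1]          # number of design variables
    rng = np.random.default_rng(seed)
    # One random position inside each of `count` equal strata, per variable.
    unit = (np.arange(count)[:, None] + rng.random((count, dim))) / count
    # Shuffle the strata independently in every column.
    for j in range(dim):
        unit[:, j] = rng.permutation(unit[:, j])
    # Scale from the unit cube to the requested bounds.
    return lower + unit * (upper - lower)

bounds = [[0.0, -1.0, 10.0],   # first row: lower bounds
          [1.0,  1.0, 20.0]]   # second row: upper bounds
points = lhs_sample(bounds, count=5)
print(points.shape)  # (5, 3): count rows, one column per design variable
```

Each row of the result is one design point, and every column contains exactly one value in each of the count strata of its variable.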

Sequential Mode

In this mode DoE works as an infinite point generator. Design space bounds are set by the bounds option (there is no bounds port in this mode). Option value is a RealMatrix with the same properties as described in the Batch Mode section.

In sequential mode, DoE waits for a signal to the input port named next. Every time a signal is received, the block outputs a single point to the point output. A point is a RealVector, and its dimension is equal to the number of columns in the bounds matrix.

This mode supports only those techniques that can continue generation indefinitely — that is, “RandomSeq” (random uniform), “SobolSeq” (Sobol sequence), “HaltonSeq” (Halton sequence), and “FaureSeq” (Faure sequence). Selecting “Auto” defaults to “SobolSeq” (Sobol sequence).
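The idea of an infinite point generator can be illustrated with a plain Python generator. This is a minimal sketch of a Halton sequence scaled to bounds, assuming the textbook radical inverse construction; the names radical_inverse and halton_points are hypothetical and the block's actual implementation may differ.

```python
from itertools import islice

def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in a given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton_points(bounds, primes=(2, 3, 5, 7, 11, 13)):
    """Yield Halton points scaled to bounds, one per request, without end."""
    lower, upper = bounds
    if len(lower) > len(primes):
        raise ValueError("supply more primes for this dimension")
    index = 1
    while True:
        # zip stops at the shortest input, so only len(lower) primes are used.
        yield [lo + radical_inverse(index, p) * (hi - lo)
               for lo, hi, p in zip(lower, upper, primes)]
        index += 1

generator = halton_points([[0.0, 0.0], [1.0, 10.0]])
first = next(generator)            # one request yields one point
more = list(islice(generator, 3))  # three further requests
```

Calling next on the generator plays the role of a signal sent to the next port: each call returns a single point whose length equals the number of columns in bounds.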

Blackbox-Based Adaptive Mode

In this mode DoE is connected to another block that should evaluate responses for the points generated iteratively. The DoE block analyses these responses and adds new points trying to gain maximum information for the specified budget (sample size).

../_images/page_blocks_DoE_adoebbwf.png

To start, it requires the bounds and budget inputs; these are similar to the bounds and point count settings in the batch mode (the budget is the maximum number of evaluations DoE is allowed to request from the blackbox). During the generation cycle, DoE outputs subsamples to x and waits for responses at f. Both input and response samples have RealMatrix type; the number of columns in the x matrix is equal to design space dimension — that is, the number of columns in bounds. When the budget limit is reached, the block outputs the final DoE sample to points and the response sample (all results received from the blackbox) to responses.

Optionally you can also send initial input and response samples to init_x and init_y. Two variants are possible:

  • If only init_x is received, it is used as an initial DoE sample and will be sent to the blackbox for evaluation.
  • If both init_x and init_y are received, this data is used to train an initial approximation model.
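The data flow of this mode can be mimicked with a short loop. The sketch below reproduces only the port traffic (subsamples out, responses in, final samples at the end); the adaptive criterion is replaced by plain uniform sampling, and it assumes that an evaluated initial sample does not consume budget, which the text above does not state. All names are hypothetical.

```python
import numpy as np

def run_adaptive_cycle(blackbox, bounds, budget, init_x=None, init_y=None,
                       step=1, seed=0):
    """Mimic the port traffic of the blackbox-based mode, not its algorithm."""
    lower, upper = np.asarray(bounds, dtype=float)
    rng = np.random.default_rng(seed)
    dim = lower.size
    points = np.empty((0, dim))
    responses = None
    evaluations = 0

    def evaluate(x):
        # Send a subsample to "x" and receive the responses at "f".
        nonlocal points, responses, evaluations
        f = np.asarray(blackbox(x), dtype=float).reshape(len(x), -1)
        points = np.vstack([points, x])
        responses = f if responses is None else np.vstack([responses, f])
        evaluations += len(x)

    if init_x is not None and init_y is not None:
        # Already evaluated data (assumption: it costs no budget here).
        points = np.atleast_2d(np.asarray(init_x, dtype=float))
        responses = np.asarray(init_y, dtype=float).reshape(len(points), -1)
    elif init_x is not None:
        # Only inputs given: they are sent to the blackbox first.
        evaluate(np.atleast_2d(np.asarray(init_x, dtype=float))[:budget])

    while evaluations < budget:
        n = min(step, budget - evaluations)
        # Placeholder for the adaptive criterion: uniform random points.
        evaluate(lower + rng.random((n, dim)) * (upper - lower))
    return points, responses

def blackbox(x):
    # Example dependency: one response per point.
    return (x ** 2).sum(axis=1, keepdims=True)

points, responses = run_adaptive_cycle(blackbox, [[-1, -1], [1, 1]],
                                       budget=10, step=3)
```

The loop stops as soon as the number of requested evaluations reaches the budget, so the last subsample may be smaller than step.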

Note that in the adaptive mode the DoE block trains and evaluates internal approximation models to study the blackbox behavior. Because of this, adaptive generation takes significantly more time and computational resources than batch generation. See GTDoE/Adaptive/Accelerator and other adaptive options for ways to control the complexity of the adaptive generation algorithm.

In the blackbox-based adaptive mode, only the “Adaptive” technique can be selected.

Sample-Based Adaptive Mode

This mode is similar to the batch mode: it requires bounds and count and outputs the generated sample to points. The sample-based adaptive mode does not require a blackbox. Instead, it works with the initial sample received at init_x and init_y:

  • If only init_x is received, DoE uses it as an initial sample and adds new points uniformly. If this sample is an (optimized) Latin hypercube, DoE can preserve sample properties (see further).
  • If both init_x and init_y are received, this data is used to train an approximation model, and the result is a sample that can improve the model (that is, the input part of such a sample).

In the sample-based adaptive mode, the “Adaptive”, “LHS”, and “OLHS” techniques can be selected (“Auto” defaults to “Adaptive”). Selecting “LHS” or “OLHS” informs the block that the initial sample is an (optimized) Latin hypercube. In this case, DoE changes the adaptive generation method: it generates a new (optimized) Latin hypercube that extends the initial one, so the space-filling properties of the initial sample are preserved.

Options


GTDoE/Adaptive/Accelerator

Five-position switch to control the trade-off between speed and accuracy of approximations used by adaptive DoE.

Value:integer in range \([1, 5]\)
Default:1

This option is identical to GTApprox/Accelerator and controls the training time of approximator used by adaptive DoE.

GTDoE/Adaptive/AnnealingCount

The number of criterion evaluations in simulated annealing procedure.

Value:integer in range \([1, 2^{31}-2]\)
Default:0 (auto)

Each sample point added by adaptive DoE process is the result of optimization by selected adaptive DoE criterion. This optimization is iterative, and the more iterations it can make, the better the new point will probably be. This option adjusts the number of optimizer iterations. Note that it also directly affects the working time of adaptive DoE algorithm.

Default (0) is an automatic estimate based on the design space dimensionality (\(d_{in}\)), which sets the number of iterations to \(\min(500 + 100 \cdot d_{in}, 3000)\).
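The automatic estimate is simple enough to compute by hand. A minimal sketch of the formula (the function name auto_annealing_count is hypothetical):

```python
def auto_annealing_count(d_in):
    """Automatic number of annealing iterations for a d_in-dimensional space."""
    return min(500 + 100 * d_in, 3000)

for d_in in (2, 10, 25, 40):
    print(d_in, auto_annealing_count(d_in))
```

The estimate grows linearly with the dimension and is capped at 3000, which is reached at \(d_{in} = 25\).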

GTDoE/Adaptive/Criterion

Control the behavior of adaptive generation algorithm.

Value:"IntegratedMseGainMaxVar", "MaximumVariance", "Uniform", or "Auto"
Default:"Auto"

Sets the criterion for placing new DoE points.

  • "IntegratedMseGainMaxVar": most accurate and time-consuming method. Estimates the error of approximation with new candidate point added to the sample and selects the point which minimizes the expected error.
  • "MaximumVariance": samples points in the region with highest uncertainty, relying on model accuracy evaluation. Faster but less accurate.
  • "Uniform": does not aim to increase model quality. Generates next sample point in such a way that overall sample is as uniform as possible. Note that in fact this is the only valid criterion for the sample-based adaptive DoE when only the input part of the initial sample is available.
  • "Auto": defaults to "MaximumVariance" if both input and response parts of the initial sample are available, and to "Uniform" if only the input part is available.

GTDoE/Adaptive/ExactFitRequired

Require all approximations to fit the training data exactly.

Value:Boolean
Default:off

If this option is on (True), all approximations constructed in the adaptive DoE process fit the sample points exactly. If off (False) then no fitting condition is imposed.

This option sets GTApprox/ExactFitRequired on for the internal approximator used in the adaptive DoE.

GTDoE/Adaptive/InitialCount

The size of initial sample.

Value:0 (auto), or an integer in range \([2 \cdot d_{in} + 3,\) budget \(]\) (except when GTDoE/Adaptive/InitialDoeTechnique is "FullFactorial"), where \(d_{in}\) is the design space dimensionality, budget is blackbox budget
Default:0 (auto)

If the user does not provide an initial training set, the tool generates a sample using the technique specified by the GTDoE/Adaptive/InitialDoeTechnique option (Latin hypercube sampling by default). This sample is then evaluated with the blackbox, and the generated inputs and obtained blackbox outputs are used as the initial training set.

GTDoE/Adaptive/InitialCount sets the size of this sample. If left default, the size will be automatically set equal to \(2 \cdot d_{in} + 3\), where \(d_{in}\) is the design space dimensionality.

Note that if GTDoE/Adaptive/InitialDoeTechnique is set to "FullFactorial", the GTDoE/Adaptive/InitialCount value has to be greater than or equal to \(2^{d_{in}}\), so the valid range becomes \([\max(2 \cdot d_{in} + 3, 2^{d_{in}}),\) budget \(]\). It also means that for \(d_{in} \geq 4\) the user must change the default value in order to generate the initial full factorial sample properly.
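The valid range described above can be checked with a few lines of Python. This is a sketch of the arithmetic only; the function name initial_count_range is hypothetical.

```python
def initial_count_range(d_in, budget, initial_technique="LHS"):
    """Return (lowest, highest) valid non-zero GTDoE/Adaptive/InitialCount."""
    lowest = 2 * d_in + 3
    if initial_technique == "FullFactorial":
        # A 2-level full factorial design needs at least 2**d_in points.
        lowest = max(lowest, 2 ** d_in)
    if lowest > budget:
        raise ValueError("budget is too small for an initial sample")
    return lowest, budget

print(initial_count_range(3, 50, "FullFactorial"))  # default size 9 is enough
print(initial_count_range(4, 50, "FullFactorial"))  # default size 11 is too small
```

For \(d_{in} = 3\) the default size \(2 \cdot 3 + 3 = 9\) already exceeds \(2^3 = 8\), while for \(d_{in} = 4\) the default 11 is less than \(2^4 = 16\), which is why the value must be raised manually from dimension 4 onward.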

This option is used only by the blackbox-based adaptive DoE. In the sample-based adaptive DoE it is ignored.

GTDoE/Adaptive/InitialDoeTechnique

DoE technique used to generate an initial sample in the adaptive mode.

Value:"RandomSeq", "FaureSeq", "HaltonSeq", "SobolSeq", "BoxBehnken", "FullFactorial", "LHS", "OLHS", "OptimalDesign", or "ParametricStudy"
Default:"LHS"

New in version 1.10.0: the Parametric Study technique can be selected.

New in version 1.10.1: the Box-Behnken design generation technique can be selected.

New in version 3.5: the Optimal Design technique can be selected (requires input dimension 4 or greater).

The DoE technique used to generate the initial training set when the user does not provide a training sample. Note that the Optimal Design technique can be used only if the blackbox input dimension is 4 or greater.

This option works with the blackbox-based adaptive mode only; in the sample-based adaptive mode it is ignored.

GTDoE/Adaptive/InternalValidation

Enable or disable internal validation for approximations used by adaptive DoE.

Value:Boolean
Default:off

Enables internal validation of the approximation models built by adaptive DoE process. Note that internal validation scores are computed for every model built, including intermediate models built between DoE iterations, so switching this on may significantly increase DoE generation time.

Note that in the sample-based adaptive DoE mode, internal validation requires the response part of the initial sample. With only the input part of the initial sample, sample-based adaptive DoE is just random uniform generation, which does not involve approximation models, so the option is meaningless in that case.

GTDoE/Adaptive/OneStepCount

The number of points added to DoE on each iteration.

Value:integer in range \([1, 2^{31}-2]\)
Default:1

Each adaptive DoE step may generate more than one point. This option sets the number of points requested on each step. Note that it is not always possible to generate the requested number of points; in that case, the maximum possible number of points is generated.

This option takes effect in the blackbox-based adaptive DoE mode only. Sample-based adaptive DoE disregards this option.

GTDoE/Adaptive/TrainIterations

The number of adaptive DoE iterations between rebuilds of approximation model.

Value:integer in range \([1, 2^{31}-2]\)
Default:1

Usually the approximation model used by adaptive DoE process is expected to only change slightly when a few points are added to the training set, so there is no need to rebuild the model at every step. This assumption, however, is not always true, especially when training set is not big enough to ensure that approximation has reasonable quality. This option sets the number of steps between rebuilds.

This option takes effect in the blackbox-based adaptive DoE mode only. Sample-based adaptive DoE disregards this option.

GTDOE/BoxBehnken/IsFull

Always generate a full Box-Behnken design regardless of the requested number of points.

Value:Boolean
Default:off

New in version 1.10.1.

If this option is on (True), GTDoE always generates a full Box-Behnken design including \(2d(d - 1) + 1\) points where \(d\) is the number of design variables. In this case, the setting for the number of points is silently ignored.

By default (when GTDOE/BoxBehnken/IsFull is False), Box-Behnken generation respects the requested number of points: if it is less than the size of the full design (\(2d(d - 1) + 1\)), some points are randomly excluded from the full design to return the requested number. Note that the full design is the largest sample the Box-Behnken technique can generate, so if the requested number of points exceeds \(2d(d - 1) + 1\) while GTDOE/BoxBehnken/IsFull is False and GTDoE/Technique is set to "BoxBehnken", point generation will not start.
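The point count \(2d(d - 1) + 1\) corresponds to a design built from all pairs of variables plus one center point. The following sketch constructs such a design in coded units (-1, 0, +1); it assumes this pairwise construction, which matches the count given above, and is not taken from the GTDoE implementation. The function name box_behnken_unit is hypothetical.

```python
from itertools import combinations, product

def box_behnken_unit(d):
    """Pairwise Box-Behnken design in coded units, with one center point."""
    if d < 3:
        raise ValueError("Box-Behnken design needs at least three variables")
    design = []
    for i, j in combinations(range(d), 2):
        # A 2-level factorial in variables i and j; all others stay at center.
        for a, b in product((-1, 1), repeat=2):
            point = [0] * d
            point[i], point[j] = a, b
            design.append(point)
    design.append([0] * d)  # the single center point
    return design

for d in (3, 4, 5):
    print(d, len(box_behnken_unit(d)), 2 * d * (d - 1) + 1)
```

There are \(d(d - 1)/2\) pairs with four points each, which gives \(2d(d - 1)\) points before the center point is added.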

GTDoE/CategoricalVariables

Declares categorical variables.

Value:a list in JSON format
Default:[] (empty list)

New in version 1.9.1: makes the GTDoE/OptimalDesign/CategoricalVariables option obsolete.

New in version 1.10.1: also supported by the Box-Behnken design generation technique.

New in version 3.0: also supported by the Fractional Factorial technique.

New in version 6.0: also supported by the Orthogonal Array technique.

New in version 6.3: also supported when using adaptive DoE.

Categorical variables are supported by several space-filling GTDoE techniques (see GTDoE/Technique): Full Factorial, LHS, OLHS, and Optimal Design since 1.9.1; Box-Behnken design since 1.10.1; Fractional Factorial since 3.0; Orthogonal Array since 6.0. This option specifies the indices of categorical variables and their categories for the GTDoE generator. The option value is a list in the following format: [idx, [ctg, ctg, ...], ...], where idx is the zero-based index of a variable in the list of blackbox variables or in the lists contained in the bounds tuple, and each ctg is a category number (only int and float values are accepted as category numbers). Also note that when this option is used with the Fractional Factorial technique, each categorical variable must have exactly two categories.

For example, [0, [2., 3.], 4, [0.1, 0.2, 0.3]] specifies that two of the design variables (indexed 0 and 4) are categorical, and defines the categories for each.
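The flat [idx, [ctg, ...], ...] layout is easy to get wrong by hand, so it may help to build and check it programmatically. The helper below is a hypothetical sketch that only validates the layout described above and converts it to a dictionary; it does not call any GTDoE function.

```python
import json

def parse_categorical(flat):
    """Convert [idx, [ctg, ...], idx, [ctg, ...], ...] to {idx: [ctg, ...]}."""
    if len(flat) % 2 != 0:
        raise ValueError("expected alternating index and category list")
    result = {}
    for idx, categories in zip(flat[0::2], flat[1::2]):
        if not isinstance(idx, int) or idx < 0:
            raise ValueError("variable index must be a non-negative integer")
        if not all(isinstance(c, (int, float)) for c in categories):
            raise ValueError("categories must be int or float numbers")
        result[idx] = list(categories)
    return result

value = [0, [2.0, 3.0], 4, [0.1, 0.2, 0.3]]
categories = parse_categorical(value)
option_string = json.dumps(value)  # the same list as a JSON string
```

Here variables 0 and 4 are categorical, with two and three categories respectively. Note that strict JSON requires digits after the decimal point, so a JSON string should contain 2.0 rather than 2.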

Note

Categorical variables are not affected by bounds — that is, corresponding elements in the bounds tuple are ignored when generating a value of a categorical variable. Still, some placeholder numeric values should be present in bounds just to keep the order of variables.

Note

For techniques other than Full Factorial, Fractional Factorial, LHS, OLHS, Optimal Design, Orthogonal Array, Box-Behnken and adaptive (sample- or blackbox-based) the GTDoE/CategoricalVariables option is ignored. Also it is ignored completely in the sequential space-filling mode.

Note

Adaptive DoE requires that either none or all variables are categorical. It means that if DoE is used in the sample-based or blackbox-based adaptive generation mode, GTDoE/CategoricalVariables must specify categories for each variable. Note also that in this case the only supported initial DoE techniques (see GTDoE/Adaptive/InitialDoeTechnique) are "LHS" and "OLHS", and the GTDoE/Adaptive/AnnealingCount and GTDoE/Adaptive/OneStepCount options will be ignored.

Note

Box-Behnken design requires at least three categories defined for every categorical variable.

Note

The Fractional Factorial technique implements 2-level fractional design only, so when used with this technique, GTDoE/CategoricalVariables must define exactly two categories (levels) for each categorical variable.

GTDoE/Deterministic

Require generation to be reproducible.

Value:Boolean
Default:off

If this option is on (True), then a fixed seed (the one set by the GTDoE/Seed option) is used in all randomized GTDoE algorithms.

GTDoE/FractionalFactorial/GeneratingString

Specifies alias structure for a fractional factorial design, optional.

Value:a string containing generator expressions, separated by whitespace
Default:"" (empty)

This option uses the conventional fractional factorial notation to create an alias structure that determines which effects are confounded with each other. Each generator expression contains one or more letters. A single-letter expression specifies a main factor; letter combinations define confounded factors as interactions of main factors. For example, "a b c ab bcd d" means that variables indexed 0, 1, 2, and 5 are main factors (a, b, c, and d) and a full factorial design is generated for them; design values for variables 3 and 4 are generated from main factor values: for each point, the value of variable 3 is the product of variables 0 and 1 (“ab”), and the value of variable 4 is the product of variables 1, 2, and 5 (“bcd”).

Note

In 2-level fractional factorial design implemented by the Fractional Factorial technique, factor levels (possible values of variables) are conventionally denoted \(-\) (low level) and \(+\) (high level). Above, “product” actually means the rule for selecting a high or low value for a dependent factor, and may be understood as a product of values mapped to \(\pm 1\) or logical equality (XNOR).
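The product rule can be made concrete with a small generator for coded \(\pm 1\) designs. This is a sketch under one assumption: each letter denotes a main factor, identified by its single-letter expression, so in "a b c ab bcd d" the letter d refers to the variable at index 5. The function name fractional_factorial is hypothetical.

```python
from itertools import product

def fractional_factorial(generating_string):
    """Build a 2-level design in coded units (-1, +1) from generators."""
    expressions = generating_string.split()
    # Single-letter expressions are the main (independent) factors.
    main_letters = [e for e in expressions if len(e) == 1]
    design = []
    # Full factorial over the main factors.
    for levels in product((-1, 1), repeat=len(main_letters)):
        value = dict(zip(main_letters, levels))
        row = []
        for expression in expressions:
            level = 1
            for letter in expression:
                # Raises KeyError if a letter is not declared as a main factor.
                level *= value[letter]
            row.append(level)
        design.append(row)
    return design

design = fractional_factorial("a b c ab bcd d")
print(len(design), len(design[0]))  # 16 points, 6 variables
```

Four main factors give \(2^4 = 16\) points, and the columns for variables 3 and 4 are fully determined by the main factor columns.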

Note that the number of generator expressions is equal to the total number of variables, counting both categorical variables, specified by GTDoE/CategoricalVariables, and continuous variables.

If GTDoE/FractionalFactorial/GeneratingString is empty, main factors are selected by GTDoE/FractionalFactorial/MainFactors, and remaining generator expressions are created automatically.

If both these options are left default (empty) when selecting the Fractional Factorial technique (see GTDoE/Technique), GTDoE first selects a number of main factors so it is enough to generate the requested number of points, then adds generator expressions for the remaining factors.

If both GTDoE/FractionalFactorial/GeneratingString and GTDoE/FractionalFactorial/MainFactors are specified, they must be consistent (that is, select the same main factors).

GTDoE/FractionalFactorial/MainFactors

Specifies main (independent) design factors, optional.

Value:list of indices of main factors (variables)
Default:[] (empty list)

This option provides a simplified way to create an alias structure for a fractional factorial design (compared with GTDoE/FractionalFactorial/GeneratingString). The list contains only the indices of variables to be selected as main factors, and interactions for the confounded factors are then created automatically (unless you also set GTDoE/FractionalFactorial/GeneratingString).

If both GTDoE/FractionalFactorial/GeneratingString and GTDoE/FractionalFactorial/MainFactors are specified, they must be consistent (that is, select the same main factors).

If both these options are left default (empty) when selecting the Fractional Factorial technique (see GTDoE/Technique), GTDoE first selects a number of main factors so it is enough to generate the requested number of points, then adds generator expressions for the remaining factors.

GTDoE/LogLevel

Set minimum log level.

Value:"Debug", "Info", "Warn", "Error", "Fatal"
Default:"Info"

If this option is set, only messages with a log level greater than or equal to the threshold are written to the log.

GTDoE/MaxParallel

Set the maximum number of parallel threads to use when generating DoE.

Value:positive integer or 0 (auto)
Default:0 (auto)

New in version 5.0rc1.

GTDoE can run in parallel to speed up design generation. This option sets the maximum number of threads it is allowed to create. Default setting (0) uses the value given by the OMP_NUM_THREADS environment variable, which by default is equal to the number of virtual processors, including hyperthreading CPUs. Other values override OMP_NUM_THREADS.
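The precedence described above can be sketched as follows. This is an illustration of the rule only, not GTDoE code: the function name resolve_thread_limit is hypothetical, and falling back to os.cpu_count() when OMP_NUM_THREADS is unset or is not a single positive integer is an assumption.

```python
import os

def resolve_thread_limit(max_parallel=0, environ=None):
    """Pick the thread limit: explicit option first, then OMP_NUM_THREADS."""
    environ = os.environ if environ is None else environ
    if max_parallel < 0:
        raise ValueError("GTDoE/MaxParallel must be 0 (auto) or positive")
    if max_parallel > 0:
        # A non-default option value overrides the environment variable.
        return max_parallel
    value = environ.get("OMP_NUM_THREADS", "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    # Assumed fallback: the number of logical processors.
    return os.cpu_count() or 1

print(resolve_thread_limit(4, {"OMP_NUM_THREADS": "8"}))  # option wins
print(resolve_thread_limit(0, {"OMP_NUM_THREADS": "8"}))  # environment value
```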

GTDoE/OLHS/Iterations

Maximum number of optimization iterations in OLHS generation.

Value:integer in range \([2, 65535]\)
Default:300

This option sets the maximum number of optimization iterations for OLHS. OLHS optimization may take a long time, so decreasing the number of iterations can be useful.

GTDoE/OptimalDesign/CategoricalVariables

Deprecated since version 1.9.1: kept for compatibility purposes, use GTDoE/CategoricalVariables instead.

This option is an older version of GTDoE/CategoricalVariables and has the same behavior and valid values, except that it affects only the Optimal Design technique (GTDoE/CategoricalVariables is supported in more techniques).

GTDoE/OptimalDesign/Model

The type of the regression model to optimize for.

Value:"linear", "interaction", "quadratic", or "purequadratic"
Default:"linear"

This option controls the order of the regression model.

  • "linear" — model includes constant and linear terms.
  • "interaction" — model includes constant, linear, and cross product terms.
  • "quadratic" — model includes constant, linear, cross product and squared terms.
  • "purequadratic" — model includes constant, linear and squared terms.
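The four model types differ only in which terms enter the regression matrix. The sketch below builds that matrix with NumPy for a given set of points; the function name model_matrix and the column ordering are illustrative assumptions.

```python
from itertools import combinations
import numpy as np

def model_matrix(points, model="linear"):
    """Regression matrix with one column per model term."""
    if model not in ("linear", "interaction", "quadratic", "purequadratic"):
        raise ValueError("unknown model type: " + model)
    x = np.asarray(points, dtype=float)
    n, d = x.shape
    # Constant and linear terms are present in every model type.
    columns = [np.ones(n)] + [x[:, i] for i in range(d)]
    if model in ("interaction", "quadratic"):
        # Cross product terms x_i * x_j for i < j.
        columns += [x[:, i] * x[:, j] for i, j in combinations(range(d), 2)]
    if model in ("quadratic", "purequadratic"):
        # Squared terms x_i ** 2.
        columns += [x[:, i] ** 2 for i in range(d)]
    return np.column_stack(columns)

points = np.random.default_rng(0).random((12, 3))
for model in ("linear", "interaction", "quadratic", "purequadratic"):
    print(model, model_matrix(points, model).shape[1])
```

For three design variables the models have 4, 7, 10, and 7 terms respectively, which is also a lower bound on the number of points needed to estimate all coefficients.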

GTDoE/OptimalDesign/Tries

The number of optimal design generation tries.

Value:integer in range \([1, 2^{32}-1]\)
Default:1

Sets the maximum number of attempts to generate a design. Each attempt starts from a new, randomly chosen set of starting points.

GTDoE/OptimalDesign/Type

Sets optimality criterion.

Value:"D" (D-optimal) or "I" (I-optimal)
Default:"D"

Specifies the type of objective function to evaluate experimental design.

  • D-optimality (determinant): seeks to minimize \(|(X'X)^{-1}|\), or equivalently maximize the determinant of the information matrix \(X'X\) of the design. This criterion results in maximizing the differential Shannon information content of the parameter estimates.
  • I-optimality (integrated): seeks to minimize the average prediction variance over the design space.
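Both criteria can be evaluated directly for small designs. The sketch below uses a linear model in two variables and compares a 2-level factorial design with a clustered one; it illustrates the textbook definitions of the criteria (with the average prediction variance approximated on a grid), not the optimizer used by GTDoE, and all names are hypothetical.

```python
from itertools import product
import numpy as np

def linear_terms(points):
    """Regression matrix of a linear model: constant and linear terms."""
    x = np.asarray(points, dtype=float)
    return np.column_stack([np.ones(len(x)), x])

def d_criterion(X):
    """Determinant of the information matrix X'X (larger is better)."""
    return np.linalg.det(X.T @ X)

def average_prediction_variance(X, grid):
    """Mean of f(x)' (X'X)^-1 f(x) over grid rows (smaller is better)."""
    inverse = np.linalg.inv(X.T @ X)
    return float(np.mean(np.einsum("ij,jk,ik->i", grid, inverse, grid)))

factorial = linear_terms([[-1, -1], [-1, 1], [1, -1], [1, 1]])
clustered = linear_terms([[-1, -1], [-0.5, -1], [-1, -0.5], [-0.5, -0.5]])
# 3 x 3 grid over the design space [-1, 1]^2 to approximate the integral.
grid = linear_terms(list(product((-1.0, 0.0, 1.0), repeat=2)))

print(d_criterion(factorial), d_criterion(clustered))
print(average_prediction_variance(factorial, grid),
      average_prediction_variance(clustered, grid))
```

The factorial design spreads points to the corners of the design space and scores better on both criteria than the design clustered in one corner.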

GTDoE/OrthogonalArray/LevelsNumber

Specifies levels for each factor, required.

Value:list of integers, the number of levels for each factor
Default:no default value

An array with the number of levels for each factor of the orthogonal array. It should contain as many elements as there are factors. Each element should be an integer greater than or equal to two.

GTDoE/OrthogonalArray/MaxIterations

Maximum number of iterations per dimension for greedy search, optional.

Value:integer in range \([1, 1000]\)
Default:\(10\)

Maximum number of iterations per dimension for the orthogonal array search.

GTDoE/OrthogonalArray/MultistartIterations

Number of algorithm multistart iterations, optional.

Value:integer in range \([1, 1000]\)
Default:\(10\)

Maximum number of algorithm trials to find orthogonal array.

GTDoE/Seed

Fixed random seed.

Value:integer in range \([1, 2^{31}-1]\)
Default:100

This option sets fixed seed value, which is used in all randomized algorithms if GTDoE/Deterministic option is on. If GTDoE/Deterministic is off, the GTDoE/Seed value is ignored.

GTDoE/Sequential/Leap

Sequence leap size.

Value:integer in range \([0, 65535]\)
Default:0

This option allows sequential techniques to leap over elements of sequence. Its value is the leap size (number of elements). Default is no leaping.

Combined with GTDoE/Sequential/Skip, it results in the following: let \(x_i\) be the i-th element of the original sequence, \(l\) be the leap size (GTDoE/Sequential/Leap value), and \(s\) be the skip size (GTDoE/Sequential/Skip value); then the resulting sequence is \(X = \{x_s, x_{s + (l+1)}, x_{s + (l+1) \cdot 2}, ..., x_{s + (l+1) \cdot n}\}\).
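The combined effect of the two options matches a strided slice of the original sequence. A minimal sketch, assuming zero-based element indices so that the first kept element is \(x_s\) (the function name skip_and_leap is hypothetical):

```python
from itertools import count, islice

def skip_and_leap(sequence, skip=0, leap=0):
    """Drop `skip` leading elements, then keep every (leap + 1)-th element."""
    return islice(sequence, skip, None, leap + 1)

# count() stands in for the original sequence; its values are the indices i.
kept = list(islice(skip_and_leap(count(), skip=2, leap=1), 4))
print(kept)  # indices of the kept elements: [2, 4, 6, 8]
```

With the default values (skip 0, leap 0) the slice returns the original sequence unchanged.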

GTDoE/Sequential/Skip

Skip elements at the beginning of sequence.

Value:integer in range \([0, 65535]\)
Default:0

This option specifies the number of elements to skip from sequence start. May be combined with GTDoE/Sequential/Leap.

GTDoE/Technique

Specify the generation algorithm to use.

Value:"RandomSeq", "SobolSeq", "HaltonSeq", "FaureSeq", "FullFactorial", "FractionalFactorial", "LHS", "OLHS", "OptimalDesign", "OrthogonalArray", "ParametricStudy", "BoxBehnken", "Adaptive" or "Auto"
Default:"Auto"

New in version 1.10.0: Parametric Study technique.

New in version 1.10.1: Box-Behnken design.

New in version 2.0: sample-based adaptive DoE technique.

New in version 3.0: Fractional Factorial technique.

New in version 6.0: Orthogonal Array technique.

This option specifies the DoE generation algorithm explicitly. Note that certain techniques are not compatible with the sequential space-filling mode. The default value, "Auto", selects a compatible technique automatically.

  • "RandomSeq" — random uniform generation.
  • "SobolSeq" — Sobol sequence.
  • "HaltonSeq" — Halton sequence.
  • "FaureSeq" — Faure sequence.
  • "FullFactorial" — uniform mesh. Not compatible with the sequential space-filling mode.
  • "FractionalFactorial" — a design consisting of a subset (fraction) of a full factorial design. Not compatible with the sequential space-filling mode.
  • "LHS" — Latin Hypercube Sampling. Not compatible with the sequential space-filling mode.
  • "OLHS" — Optimized Latin Hypercube Sampling. Not compatible with the sequential space-filling mode.
  • "OptimalDesign" — optimal design for response surface models. Not compatible with the sequential space-filling mode.
  • "OrthogonalArray" — a design with multilevel discrete design variables. Not compatible with the sequential space-filling mode.
  • "ParametricStudy" — parametric study process (select a central point and generate points from center by changing one component). Not compatible with the sequential space-filling mode.
  • "BoxBehnken" — Box-Behnken design, a classic design for response surface methodology. Requires at least three design variables. Not compatible with the sequential space-filling mode.
  • "Adaptive" — sample-based adaptive DoE (added in 2.0). Acts as a mode selector.

Note that the blackbox-based adaptive mode completely ignores this option. See options GTDoE/Adaptive/InitialDoeTechnique and GTDoE/Adaptive/Criterion if you are using the adaptive mode.