7.1. Introduction

Generic Tool for Design of Experiments (GTDoE) implements the procedure for selecting an efficient experiment design with the general goal of maximizing the amount of information gained from a limited number of design points and responses.

Sections

Definitions
Generation Modes
Random and Deterministic Generation
Types of Variables
Uniformity

7.1.1. Definitions

This manual uses the following definitions:

Design variables

Parameters or quantities to be varied during the experiment. A synonym in the statistical literature is factors.

A design variable is represented as an element \(x_k\) of a \(d\)-dimensional point \(x\), where \(k = 1,\dots,d\), and \(d\) is the total number of design variables.
Design space

The \(d\)-dimensional box \(B\) defined by the lower and upper bounds of each design variable: \(B = \prod_{k=1}^d [a_k, b_k] = \left\{ x \in \mathbb{R}^d \mid a \le x \le b \right\}\), where the inequalities hold component-wise.
Design vector or design point

A specific instance of \({x}\), where all values in the vector \({x}\) fall within the bounds of the design space.
Response

A dependent quantity that is measured or evaluated for a specific design point.
Design of experiments or DoE

A finite subset of the design space. It is denoted as \(X = \{{x}^i\}_{i=1}^{N}\), where \(N\) is the number of design points in the DoE.
DoE technique

A DoE generation method.
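
To make the notation above concrete, the following minimal sketch represents a design space as two bound vectors and a set of candidate points as an array with one point per row. It uses NumPy only; the names `lower`, `upper`, `candidates` and `in_design_space` are illustrative assumptions and are not part of the GTDoE API.

```python
import numpy as np

# Hypothetical 2-dimensional design space B = [0, 1] x [-5, 5].
lower = np.array([0.0, -5.0])   # a_k, lower bounds
upper = np.array([1.0, 5.0])    # b_k, upper bounds

def in_design_space(points, lower, upper):
    """Return a boolean mask: True where a <= x <= b holds component-wise."""
    points = np.atleast_2d(points)
    return np.all((points >= lower) & (points <= upper), axis=1)

# Three candidate points, one per row (columns are x_1 and x_2).
candidates = np.array([[0.2, -1.0],
                       [0.9,  4.5],
                       [1.5,  0.0]])   # the last row violates the upper bound of x_1

mask = in_design_space(candidates, lower, upper)
X = candidates[mask]   # a DoE: the finite subset of B that remains
```

Only the rows that pass the check are valid design points, so `X` here holds two points.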

7.1.2. Generation Modes

GTDoE has four generation modes, each pursuing a different goal when generating a DoE:

Sequential
Batch
Adaptive Blackbox-Based
Adaptive Sample-Based

7.1.2.1. Sequential

The sequential space-filling mode requires the user to specify only the bounds in which to generate points.

In this mode, no data sample is created up front. Instead, GTDoE provides a generator that can yield an unlimited number of points. This mode is compatible with only a limited number of generation techniques.

Supported techniques: Random, Halton sequence, Sobol sequence, Faure sequence
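
The idea of an unbounded point generator can be sketched in plain Python. The example below is an independent illustration of a Halton sequence scaled to user bounds, not the GTDoE implementation; the function names and the fixed list of prime bases (which limits this sketch to six dimensions) are assumptions.

```python
from itertools import count, islice

def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton_points(lower, upper, bases=(2, 3, 5, 7, 11, 13)):
    """Yield Halton points scaled to the box [lower, upper], without end."""
    dim = len(lower)
    if dim > len(bases):
        raise ValueError("not enough prime bases for this dimension")
    for i in count(1):
        unit = [radical_inverse(i, bases[k]) for k in range(dim)]
        yield [lo + u * (hi - lo) for lo, hi, u in zip(lower, upper, unit)]

# Only the bounds are given; points are requested on demand.
generator = halton_points([0.0, 10.0], [1.0, 20.0])
first_three = list(islice(generator, 3))
next_two = list(islice(generator, 2))   # continues where the previous call stopped
```

Because the generator keeps its position, later requests extend the design without regenerating earlier points.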

7.1.2.2. Batch

For the batch space-filling mode, the user needs to specify the bounds in which to generate points and the number of points to generate.

As a result, the user gets a data sample that cannot be updated further.

In this mode, all techniques except Adaptive DoE may be used.

Supported techniques: Random, LHS, OLHS, Full Factorial, Fractional Factorial, Parametric Study, Optimal RSM designs
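
As an illustration of batch generation, the sketch below builds a basic Latin hypercube sample from bounds and a point count using NumPy. It is a textbook construction written for this example, not the GTDoE LHS technique; `latin_hypercube` and its parameters are hypothetical names.

```python
import numpy as np

def latin_hypercube(lower, upper, count, seed=None):
    """Generate `count` points in the box [lower, upper] with the LHS property."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    # Place one random point inside each of the `count` equal-width bins.
    unit = (np.arange(count)[:, None] + rng.random((count, dim))) / count
    # Shuffle the bin order independently along every dimension.
    for k in range(dim):
        unit[:, k] = rng.permutation(unit[:, k])
    return lower + unit * (upper - lower)

# The whole sample is produced in one call and has a fixed size.
sample = latin_hypercube([0.0, 0.0], [1.0, 4.0], count=8, seed=1)
```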

7.1.2.3. Adaptive Blackbox-Based

For the blackbox-based adaptive DoE, the user needs to specify a blackbox, the bounds of the input variables, and the number of points to generate. Optionally, the user can also specify an initial sample.

The GTDoE/Technique option is ignored in the blackbox-based mode, but you can use GTDoE/Adaptive/InitialDoeTechnique to control initial DoE generation, if there is no initial sample.

In this mode, the generator bounds are intended to narrow the point generation area. For this reason, the generator does not automatically apply the bounds set for the blackbox variables. You have to ensure that the generator bounds at least intersect with the blackbox variable bounds, because DoE points are generated only in the area of intersection. If the generator bounds and the blackbox bounds do not intersect, an exception is raised.
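
The intersection rule can be pictured with a small helper. This is a sketch of the geometry only: the function name is hypothetical, and `ValueError` stands in for whatever exception type GTDoE actually raises.

```python
def intersect_bounds(generator_bounds, blackbox_bounds):
    """Intersect two boxes, each given as a (lower, upper) pair of sequences."""
    gen_lower, gen_upper = generator_bounds
    box_lower, box_upper = blackbox_bounds
    lower = [max(g, b) for g, b in zip(gen_lower, box_lower)]
    upper = [min(g, b) for g, b in zip(gen_upper, box_upper)]
    # An empty intersection in any dimension leaves no area to generate points in.
    if any(lo > hi for lo, hi in zip(lower, upper)):
        raise ValueError("generator and blackbox bounds do not intersect")
    return lower, upper

# Generator box [0, 2] x [0, 2] and blackbox box [1, 3] x [-1, 1].
area = intersect_bounds(([0.0, 0.0], [2.0, 2.0]), ([1.0, -1.0], [3.0, 1.0]))
```

In this example points would be generated only in the box [1, 2] x [0, 1].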

Supported techniques: Adaptive DoE

7.1.2.4. Adaptive Sample-Based

The sample-based adaptive DoE has two modes:

Filling Mode: aims to add new points to the design according to the selected criterion.
Property Preservation Mode: aims to preserve a certain space-filling property of the given initial design.

7.1.2.4.1. Filling Mode

The filling mode aims to add new points to the design according to the selected criterion.

To run the sample-based adaptive DoE in the filling mode, set GTDoE/Technique to "Adaptive" and specify bounds, count and an initial sample.

The initial sample can include either values of variables, or both variable and response values. The former effectively limits adaptive DoE to random uniform generation. Other generation methods are based on approximation models and require a response sample for model training.

Supported techniques: Adaptive DoE

For more technical details, please refer to the API documentation.

7.1.2.4.2. Property Preservation Mode

The property preservation mode aims to preserve a certain space-filling property of the given initial design.

To run the sample-based adaptive DoE in the property preservation mode, set GTDoE/Technique to "LHS" or "OLHS", and specify bounds, count and an initial design which includes values of variables.

In this mode, the generator preserves the LHS space-filling property, if possible. That is, if the size of the initial design is a factor of count (the number of additional points to generate) and all variables are continuous, then the union of the initial design and the generated design has the LHS space-filling property. Otherwise, the generator tries to construct a design so that the union of the initial design and the generated design is as close to an LHS design as possible, but the LHS property is not guaranteed.
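
The LHS property of a union can be checked directly: with N points, each of the N equal-width bins along every axis must contain exactly one point. The sketch below is an illustrative checker written with NumPy, not a GTDoE function; the two-point initial design and the added points are made-up data.

```python
import numpy as np

def is_lhs(points, lower, upper):
    """Check that every one of the N equal-width bins along each axis holds one point."""
    points = np.asarray(points, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n, dim = points.shape
    unit = (points - lower) / (upper - lower)             # scale to the unit box
    bins = np.minimum((unit * n).astype(int), n - 1)      # a point on the upper bound goes to the last bin
    return all(len(set(bins[:, k].tolist())) == n for k in range(dim))

lower, upper = [0.0, 0.0], [1.0, 1.0]
initial = np.array([[0.25, 0.75], [0.75, 0.25]])          # LHS for N = 2
added_good = np.array([[0.1, 0.6], [0.6, 0.1]])           # fills the empty bins for N = 4
added_bad = np.array([[0.3, 0.6], [0.6, 0.1]])            # 0.3 shares a bin with 0.25

union_good = np.vstack([initial, added_good])
union_bad = np.vstack([initial, added_bad])
```

Here the initial size (2) is a factor of count (2), so a union with the LHS property exists, as `union_good` shows; `union_bad` fails the check.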

Supported techniques: LHS, OLHS

For more technical details, please refer to the API documentation.

7.1.3. Random and Deterministic Generation

Some DoE techniques involve pseudo-randomly generated numbers. This approach has the advantage of diversifying the results, but also has the drawback of making the results less reproducible and harder to track. The techniques implemented in GTDoE can be divided into two groups as follows:

Involving pseudo-randomness: Random, LHS, OLHS, Adaptive DoE, Optimal RSM designs.
Fully deterministic: Full Factorial, Fractional Factorial, Halton sequence, Sobol sequence, Faure sequence, Parametric Study.

GTDoE pseudo-random techniques have a seed parameter. Fixing the seed makes the results of the random process reproducible. By default, the seed is set by the internal clock.
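
The effect of fixing a seed can be demonstrated with NumPy's random generator. This sketch illustrates the general principle only; `random_design` is a hypothetical helper, and its `seed` argument is not the GTDoE seed option.

```python
import numpy as np

def random_design(lower, upper, count, seed=None):
    """Draw `count` uniform random points in the box [lower, upper]."""
    rng = np.random.default_rng(seed)   # seed=None uses fresh OS entropy
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return lower + rng.random((count, lower.size)) * (upper - lower)

first = random_design([0, 0], [1, 10], count=5, seed=42)
second = random_design([0, 0], [1, 10], count=5, seed=42)   # same seed, same design
other = random_design([0, 0], [1, 10], count=5, seed=43)    # different seed, different design
```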

7.1.4. Types of Variables

New in version 1.7.2: initial support for categorical variables (Optimal Design technique only).

Changed in version 1.9.1: added categorical variables support for the Full Factorial, Latin Hypercube Sampling, and Optimal Latin Hypercube Sampling techniques.

Changed in version 1.10.1: added categorical variables support for the Box-Behnken technique.

Changed in version 3.0: added categorical variables support for the Fractional Factorial technique.

Changed in version 6.0: added categorical variables support for the Orthogonal Array technique.

Changed in version 6.3: added categorical variables support for the Adaptive Design of Experiments.

Changed in version 6.16: allowed categorical variables with 1 level (effectively constants).

New in version 6.14: initial support for discrete variables (Adaptive Design technique only).

Changed in version 6.26: discrete and categorical variables are now supported by more space-filling techniques; both types are supported in Adaptive Design of Experiments with some limitations; Adaptive Design supports discrete variables but does not support categorical.

Changed in version 6.29: added stepped variables support.

Changed in version 6.33: added categorical variables support in the Adaptive Design technique.

Changed in version 6.36: for compatibility with Adaptive Design, most techniques now handle categorical variables in a similar manner — run an independent DoE for each possible combination of categories — so generation result may be used as an initial sample in Adaptive Design without issues; only the Full Factorial, Fractional Factorial and Orthogonal Array techniques do not change their behavior from 6.35.

Changed in version 6.45: added the support for mixed designs (include continuous and discrete variables) to the Adaptive Design of Experiments technique.

GTDoE recognizes the following types of variables:

Continuous — a generic continuous variable that can take any value within its lower and upper bounds.
Discrete — a discontinuous numeric variable that can take any value from the predefined set of allowed values (levels).
Stepped — a special kind of a continuous variable that is bound to a certain grid. This type is intended for a rather common case of a variable that represents a continuous quantity, but can be changed only in certain increments due to various practical reasons.
Categorical — a special type of variable that has a finite number of possible values (categories) and does not assume any numerical meaning, order or metrics. If your design includes categorical variables, most techniques generate all possible combinations of categories and run a DoE for each of those combinations. Notable exceptions are the Fractional Factorial and Orthogonal Array techniques, which generally handle categorical variables in the same way as discrete ones.

The type of a variable is specified when adding a variable in the GTDoE problem definition, using the @GT/VariableType hint in add_variable(). Categorical variables may also be specified using the GTDoE/CategoricalVariables option.

The continuous type is the default and is supported by all GTDoE techniques.

Discrete variables have a finite number of possible values (levels). Levels are numerical, so they assume natural numerical order, and numerical metrics are applicable. For example, a discrete variable may be defined with levels [0.0, 0.1, 0.25, 0.6, 1.0].

Discrete variables are supported by all space-filling DoE techniques except the low-discrepancy sequence techniques (Sobol, Halton, Faure). The Box-Behnken technique has an additional limitation — it requires at least 3 levels for each discrete variable, except frozen ones (constants), which are defined with 1 level.

The stepped variable type is intended for variables that represent a continuous quantity, but for some reason can be changed only in certain increments. Such variables often take values distributed evenly within bounds, from lower to upper, at regular intervals (steps). For example, a stepped variable may be defined with steps [0.0, 0.1, 0.2, ..., 0.8, 0.9, 1.0]. Irregular steps are also supported, but in this case the step intervals should at least be of the same order of magnitude. Setting the variable type to stepped implies that the underlying variable-response dependency is continuous, even though the variable has a limited set of allowed values. So, for example, first-order methods are still applicable, unlike with the discrete type, which implies discontinuity. This distinction between the discrete and stepped types becomes important in adaptive DoE and optimization techniques. Space-filling DoE techniques support stepped variables primarily for compatibility, so that a GTDoE problem definition including stepped variables may be used in a combined DoE and optimization study that employs several techniques.
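
One way to picture a stepped variable is as a continuous value projected onto its grid of allowed steps. The sketch below uses nearest-step rounding, which is an assumption made for illustration; this section does not state how GTDoE maps values to steps, and `snap_to_steps` is a hypothetical helper.

```python
import numpy as np

# Regular steps 0.0, 0.1, ..., 1.0, as in the example above.
steps = np.linspace(0.0, 1.0, 11)

def snap_to_steps(values, steps):
    """Replace each value with the nearest allowed step."""
    values = np.asarray(values, dtype=float)
    steps = np.asarray(steps, dtype=float)
    nearest = np.abs(values[:, None] - steps[None, :]).argmin(axis=1)
    return steps[nearest]

snapped = snap_to_steps([0.04, 0.26, 0.97, 0.5], steps)
```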

Stepped variables are supported by all DoE techniques except low-discrepancy sequence techniques (Sobol, Halton, Faure). Also, the Box-Behnken technique supports stepped variables but requires each stepped variable to have 3 or more allowed values, except frozen variables (constants), which are defined with 1 level.

Categorical variables are of a special type that does not assume any numerical meaning, natural order of values, or metrics. They have a finite number of possible values (categories). For example, color (“black”, “green”, “red”), gender (“male”, “female”), blood type of a person (“A”, “B”, “AB”, “0”) can be represented by categorical variables.

If your design includes categorical variables, GTDoE generates all possible combinations of categories, then for each of those combinations generates a DoE using the technique you have specified — except when you use the following techniques, which do not generate all possible combinations:

Orthogonal Array — produces the minimum number of combinations required to generate an array of the requested type and size (see Orthogonal Array for details).
Fractional Factorial — combinations are generated as per the generating string (see Fractional Factorial for details).
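
The default handling of categorical variables, an independent DoE for each combination of categories, can be sketched as follows. Uniform random sampling stands in for the selected technique, and the variable names and categories are made up for the example; none of this is GTDoE code.

```python
import itertools
import numpy as np

def design_per_combination(categories, lower, upper, count, seed=0):
    """Run an independent random design for every combination of categories."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    design = []
    for combo in itertools.product(*categories.values()):
        # Stand-in technique: uniform random points for the continuous variables.
        block = lower + rng.random((count, lower.size)) * (upper - lower)
        design.extend(tuple(row) + combo for row in block.tolist())
    return design

# Hypothetical problem: one continuous variable and two categorical variables.
categories = {"color": ["black", "green", "red"], "finish": ["matte", "gloss"]}
design = design_per_combination(categories, [0.0], [1.0], count=4)
```

With 3 x 2 = 6 combinations and 4 points per combination, the resulting design has 24 rows, which shows how quickly the total size grows with the number of categories.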

Note

The above behavior regarding categorical variables is new in version 6.36. For compatibility with previous versions, GTDoE also supports switching back to behavior from 6.35 and below — see section Version Compatibility Issues for details.

7.1.5. Uniformity

One of the most important properties of a DoE is its uniformity in the design space. Uniformity may refer to somewhat different aspects of a DoE, and in general there are many ways to characterize it. Several numeric quantities introduced in the literature to measure uniformity are described below.

Let \(X = \{x^i,\ i=1,\dots,N\}\) be an \(N\)-point DoE in the design space \(B = \prod_{k=1}^d [a_k, b_k]\). Uniformity is an intuitively obvious quality measure of how the DoE points are distributed in the design space, but it admits quite different rigorous formulations. Some common formulations are given below.

Discrepancy, \(D\)

\[D(X) = \sup _{a \le u < v \le b} \left| \frac{\#\{x \in X | u \le x \le v \}}{N} - \prod_{k = 1}^{d} \frac{(v_k - u_k)}{(b_k - a_k)}\right|,\]

where \(\#(\cdot)\) is the number of points.

Minimax Interpoint Distance, \(\rho\)

\[\rho(X) = \max_i\min_{j: j\ne i} \|x^i - x^j\|,\]

where \(\|\cdot\|\) is the Euclidean norm in \(\mathbb{R}^d\).

The \(\phi\)-metric

\[\phi_p(X) = \left(\sum _{i < j}^{N} \|x^i-x^j\|^{-p}\right)^{1/p}.\]

The special case \(p = 2\) is sometimes referred to as potential energy.

In the limit \(p\to\infty\) we get \(\phi_\infty(X) = \frac{1}{\min_{i < j}\|x^i-x^j\|}.\)

Note that a more uniform DoE corresponds to lower values of the above metrics.
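
The distance-based formulas above translate directly into NumPy. The sketch below follows the definitions of \(\rho\) and \(\phi_p\) as written in this section; the function names are illustrative and are not the GTDoE Measures API.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix between all rows of `points`."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def minimax_interpoint_distance(points):
    """rho(X) = max_i min_{j != i} ||x^i - x^j||."""
    dist = pairwise_distances(points)
    np.fill_diagonal(dist, np.inf)        # exclude j == i from the inner minimum
    return float(dist.min(axis=1).max())

def phi_metric(points, p=2.0):
    """phi_p(X) = (sum_{i<j} ||x^i - x^j||^(-p))^(1/p)."""
    dist = pairwise_distances(points)
    pair_dist = dist[np.triu_indices(len(dist), k=1)]   # each pair i < j once
    return float((pair_dist ** (-p)).sum() ** (1.0 / p))

# Corners of the unit square: four sides of length 1 and two diagonals of length sqrt(2).
corners = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
rho = minimax_interpoint_distance(corners)
phi_2 = phi_metric(corners, p=2.0)
phi_50 = phi_metric(corners, p=50.0)
```

For this design \(\rho = 1\) and \(\phi_2 = \sqrt{4 + 2 \cdot \tfrac{1}{2}} = \sqrt{5}\), while \(\phi_{50}\) is already close to the limit value \(1 / \min_{i<j}\|x^i - x^j\| = 1\).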

Some of the techniques implemented in GTDoE are based on these quality measures. In particular:

The above metrics are used for the internal optimization in the OLHS technique (see Optimized LHS).

The Halton and Sobol sequences are known as low-discrepancy sequences with respect to the above definition of discrepancy. However, this metric is not used directly when constructing a DoE with these techniques.

GTDoE enables the user to calculate the above metrics for any DoE (see Measures).
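
The discrepancy defined above involves a supremum over all sub-boxes, which is expensive to compute exactly. The sketch below gives a simple Monte Carlo lower bound by trying random sub-boxes; it is an illustration of the definition, not the method GTDoE uses, and the function name, trial count and test designs are assumptions.

```python
import numpy as np

def estimate_discrepancy(points, lower, upper, trials=2000, seed=0):
    """Monte Carlo lower bound on D(X): largest deviation over random sub-boxes [u, v]."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    unit = (points - lower) / (upper - lower)    # in unit coordinates the volume ratio is prod(v - u)
    n, dim = unit.shape
    worst = 0.0
    for _ in range(trials):
        u, v = np.sort(rng.random((2, dim)), axis=0)          # u <= v component-wise
        inside = np.all((unit >= u) & (unit <= v), axis=1).sum()
        worst = max(worst, abs(inside / n - np.prod(v - u)))
    return float(worst)

axis = (np.arange(4) + 0.5) / 4
grid = np.array([[x, y] for x in axis for y in axis])   # 4 x 4 grid of cell centers
cluster = np.full((16, 2), 0.5)                          # all 16 points coincide

d_grid = estimate_discrepancy(grid, [0, 0], [1, 1])
d_cluster = estimate_discrepancy(cluster, [0, 0], [1, 1])
```

The design with all points at one location should score noticeably higher than the regular grid, in line with the rule that lower values indicate a more uniform DoE.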