Distribution

Tag: [Beta]

Warning

This is a beta block: its interface and functionality are still subject to change. A workflow that includes a Distribution block may be incompatible with future pSeven releases.

The Distribution block lets you specify a probabilistic model for an uncertainty quantification study. This model simulates the uncertainties in the inputs of the analyzed model.

Introduction

Consider a model whose input parameters are affected by some kind of uncertainty. Uncertainty quantification determines how these input uncertainties affect the output of the analyzed model (see also the UQ block).
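The idea can be illustrated with a minimal Monte Carlo sketch in Python. This is not how the UQ block is implemented: the function `analyzed_model` and the input distributions are hypothetical, and only the standard library is used. Inputs are drawn from their assumed distributions, the model is evaluated for each draw, and the spread of the outputs shows the effect of the input uncertainty.

```python
import random
import statistics


def analyzed_model(x1: float, x2: float) -> float:
    # Hypothetical analyzed model: any deterministic function of its inputs.
    return x1 ** 2 + 3.0 * x2


def propagate(n: int, seed: int = 0) -> list[float]:
    """Draw n input samples from assumed distributions and evaluate the model."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n):
        x1 = rng.gauss(1.0, 0.1)     # assumed: x1 ~ Normal(mu=1.0, sigma=0.1)
        x2 = rng.uniform(-1.0, 1.0)  # assumed: x2 ~ Uniform(a=-1.0, b=1.0)
        outputs.append(analyzed_model(x1, x2))
    return outputs


ys = propagate(10_000)
print(f"output mean = {statistics.mean(ys):.3f}, std = {statistics.stdev(ys):.3f}")
```

For these inputs the exact output mean is 1.01 and the standard deviation is about 1.74, so the estimates should be close to those values; almost all of the output spread comes from x2.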

The Distribution block lets you specify a probabilistic model that simulates the input uncertainties. The uncertainty of each parameter can be specified either from a sample of observed values of this parameter (see Sample-Based Variables) or from expert knowledge (see User-Defined Variables).

Note

If you are familiar with OpenTURNS, configuring the Distribution block corresponds to Step B of the OpenTURNS methodology (quantification of the uncertainty sources).

Variables

To configure the Distribution block, you need to add variables. Each variable corresponds to an input parameter of the analyzed model. No variables are defined by default.

When adding a variable, you can choose its type and distribution and, for a User defined variable, the parameters of that distribution.

Types of Variables

Depending on the source of information used to construct the probabilistic model, there are two types of variables: Sample based and User defined. The type is selected in the ‘From sample’ field of the variables table.

For Sample based variables, probabilistic modeling relies on a sample of the uncertain parameters.

For User defined variables, probabilistic modeling relies on expert knowledge about the uncertainties of these variables. The User defined type lets you define the probabilistic model explicitly, that is, select the types and parameters of the distributions and dependencies that describe the uncertainties in the inputs of the analyzed model.

User-Defined Variables

Sometimes the only available information about the uncertainty of a variable is expert or engineering judgement. It may be based on an analysis of the underlying physics, past experience, dedicated literature, and so on.

In this case, use the User defined variable type. It lets you select a parametric model and set its parameters. The available parametric models describe various types of uncertainty using a small number of parameters (see Parametric Distributions).
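As an illustration of what a User defined model contains, here is a minimal Python sketch. The dictionary layout, the variable names, and the `draw` and `sample_model` helpers are hypothetical and are not pSeven's internal format; the distribution names and parameter names follow the Parametric Distributions list, and Python's `random` functions stand in for the actual distributions. Variables are sampled independently, which corresponds to the Independent copula.

```python
import random

# Hypothetical description of User defined variables: for each variable, a
# parametric distribution from the Distribution List and its parameters.
USER_DEFINED = {
    "thickness": ("Normal", {"mu": 2.0, "sigma": 0.05}),
    "load": ("Triangular", {"a": 900.0, "b": 1200.0, "m": 1000.0}),
    "gap": ("Uniform", {"a": 0.1, "b": 0.3}),
}


def draw(dist: str, p: dict, rng: random.Random) -> float:
    """Draw one value using standard-library stand-ins for each distribution."""
    if dist == "Normal":
        return rng.gauss(p["mu"], p["sigma"])
    if dist == "Triangular":
        return rng.triangular(p["a"], p["b"], p["m"])  # low, high, mode
    if dist == "Uniform":
        return rng.uniform(p["a"], p["b"])
    raise ValueError(f"unsupported distribution: {dist}")


def sample_model(model: dict, n: int, seed: int = 1) -> list[dict]:
    """Draw n independent examples (Independent copula) from the model."""
    rng = random.Random(seed)
    return [{name: draw(d, p, rng) for name, (d, p) in model.items()} for _ in range(n)]


for row in sample_model(USER_DEFINED, 3):
    print(row)
```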

Sample-Based Variables

The Sample based variable type determines the parameters of the probabilistic model automatically from a sample provided by the user.

With this type, you select a parametric or non-parametric model (see Distribution List), and its parameters are calculated automatically. You can also use Auto-selection, which chooses the distribution type based on the Bayesian information criterion (BIC).
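The principle behind BIC-based selection can be sketched in Python. This is an assumption-laden illustration, not the block's implementation: only three candidates are fitted by maximum likelihood (Normal, Exponential with the shift fixed at 0, and Uniform), and BIC is used in its textbook form k·ln(n) − 2·ln(L), where k is the number of fitted parameters. Some libraries divide this value by n, which does not change the ranking. The candidate with the smallest BIC is selected.

```python
import math
import random
import statistics


def fit_normal(xs):
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs, mu)  # maximum-likelihood (1/n) estimate
    n = len(xs)
    loglik = -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * n
    return loglik, 2  # log-likelihood, number of fitted parameters


def fit_exponential(xs):
    # Exponential with the shift fixed at 0, so only lambda is fitted.
    if min(xs) < 0.0:
        return -math.inf, 1
    lam = 1.0 / statistics.fmean(xs)
    return len(xs) * math.log(lam) - len(xs), 1


def fit_uniform(xs):
    a, b = min(xs), max(xs)  # maximum-likelihood bounds
    return -len(xs) * math.log(b - a), 2


CANDIDATES = {"Normal": fit_normal, "Exponential": fit_exponential, "Uniform": fit_uniform}


def select_by_bic(xs):
    """Return the candidate with the smallest BIC = k*ln(n) - 2*ln(L), and all scores."""
    n = len(xs)
    scores = {}
    for name, fit in CANDIDATES.items():
        loglik, k = fit(xs)
        scores[name] = k * math.log(n) - 2.0 * loglik
    return min(scores, key=scores.get), scores


rng = random.Random(7)
best, scores = select_by_bic([rng.expovariate(2.0) for _ in range(2000)])
print(best, scores)
```

For a large sample drawn from an exponential distribution, the Exponential candidate should win by a wide margin, because its likelihood is much higher and it uses fewer parameters.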

To check how well the probabilistic model fits the sample, the Distribution block provides the Kolmogorov-Smirnov goodness-of-fit test. The Anderson-Darling and Cramér-von Mises tests for normality are also available.
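The Kolmogorov-Smirnov statistic measures the largest vertical distance between the empirical CDF of the sample and the CDF of the fitted model. The Python sketch below computes it with the standard library and compares it with the asymptotic 5% critical value 1.36/√n. That threshold assumes a fully specified distribution; when the parameters are estimated from the same sample, as here, the comparison is conservative. The helper names are hypothetical, and the block's own test may use a different critical value or report a p-value instead.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic(xs, cdf):
    """Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_check(xs, cdf):
    """Compare D with the asymptotic 5% critical value 1.36/sqrt(n)."""
    d = ks_statistic(xs, cdf)
    critical = 1.36 / math.sqrt(len(xs))
    return d, critical, d > critical  # True in the last slot means "reject the fit"


rng = random.Random(42)
sample = [rng.gauss(10.0, 2.0) for _ in range(500)]
fitted = NormalDist(fmean(sample), stdev(sample))
print(ks_check(sample, fitted.cdf))
```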

The sample can be loaded with standard pSeven blocks such as CSVParser (see also Ports). The sample matrix must have size (number of examples) × (number of variables), that is, one row per example and one column per variable (see also Variables).
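The shape requirement can be illustrated with a short Python sketch that reads a CSV text and checks that every row has one value per variable. Python's `csv` module stands in for the CSVParser block here; the CSV content, the column names, and the `load_sample` and `check_sample` helpers are hypothetical, and the sketch assumes the columns are in the same order as the variables in the block.

```python
import csv
import io

# Hypothetical CSV content: a header with variable names, then one example per row.
CSV_TEXT = """x1,x2
0.51,1.20
0.47,1.31
0.55,1.18
"""


def load_sample(text: str) -> tuple[list[str], list[list[float]]]:
    """Parse the header and the numeric rows of a CSV text."""
    reader = csv.reader(io.StringIO(text))
    names = next(reader)
    rows = [[float(v) for v in row] for row in reader if row]
    return names, rows


def check_sample(rows: list[list[float]], variable_names: list[str]) -> tuple[int, int]:
    """Return (number of examples, number of variables) or raise ValueError."""
    if not rows:
        raise ValueError("the sample is empty")
    n_vars = len(variable_names)
    for i, row in enumerate(rows):
        if len(row) != n_vars:
            raise ValueError(f"row {i} has {len(row)} values, expected {n_vars}")
    return len(rows), n_vars


names, rows = load_sample(CSV_TEXT)
print(check_sample(rows, names))  # (3, 2)
```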

Dependencies

You can specify dependencies between the uncertainties of variables. Two options are available: the Independent copula, and the Normal copula based on Kendall or Spearman correlations (see the ‘Copula’ and ‘Correlation Matrix’ options).
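A Normal copula is usually parameterized by the linear correlation r of an underlying Gaussian vector, so a Spearman or Kendall correlation has to be converted first. For a Normal copula the standard relations are r = 2·sin(π·ρS/6) for Spearman's ρS and r = sin(π·τ/2) for Kendall's τ. The Python sketch below applies them to sample a two-variable Normal copula and then maps the uniforms to marginals through inverse CDFs. The function names and the chosen marginals are hypothetical, and whether pSeven performs the same conversion internally is an assumption.

```python
import math
import random
from statistics import NormalDist


def pearson_from_spearman(rho_s: float) -> float:
    # For a Normal copula, rho_S = (6 / pi) * asin(r / 2), hence r = 2 * sin(pi * rho_S / 6).
    return 2.0 * math.sin(math.pi * rho_s / 6.0)


def pearson_from_kendall(tau: float) -> float:
    # For a Normal copula, tau = (2 / pi) * asin(r), hence r = sin(pi * tau / 2).
    return math.sin(math.pi * tau / 2.0)


def sample_normal_copula(corr: float, kind: str, n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Draw n pairs (u1, u2) in (0, 1)^2 from a bivariate Normal copula."""
    if kind == "Independent":
        r = 0.0
    elif kind == "Spearman":
        r = pearson_from_spearman(corr)
    elif kind == "Kendall":
        r = pearson_from_kendall(corr)
    else:
        raise ValueError(f"unknown copula: {kind}")
    rng = random.Random(seed)
    phi = NormalDist()  # standard normal CDF turns correlated normals into uniforms
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = r * z1 + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        pairs.append((phi.cdf(z1), phi.cdf(z2)))
    return pairs


# Give the copula sample its marginals via inverse CDFs: here Normal(5, 2) and Uniform(-1, 1).
x1_dist = NormalDist(5.0, 2.0)
pairs = sample_normal_copula(0.5, "Spearman", 1000, seed=3)
xs = [(x1_dist.inv_cdf(u1), -1.0 + 2.0 * u2) for u1, u2 in pairs]
```

Separating the dependence structure (the copula) from the marginals is what allows each variable to keep its own distribution while still being correlated with the others.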

Results

The block’s results include the constructed probabilistic model, a text report, and data for plots (see also Ports).

The constructed probabilistic model is intended for use with a UQ block in the uncertainty quantification study. The text report describes how the probabilistic model was constructed and contains its text representation and the statistics of the sample (see Sample-Based Variables). The plot data include the PDF and CDF of each marginal distribution of the probabilistic model.
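The PDF and CDF plot data of a marginal distribution are essentially the density and cumulative distribution evaluated on a grid of points. The Python sketch below produces such curves for a Normal marginal as two-column [x, value] matrices. That layout and the `curve_data` helper are hypothetical; the exact format of the block's outputs is not specified here.

```python
from statistics import NormalDist


def curve_data(dist: NormalDist, lo: float, hi: float, points: int = 101):
    """Return hypothetical [x, pdf(x)] and [x, cdf(x)] matrices on an even grid."""
    step = (hi - lo) / (points - 1)
    grid = [lo + i * step for i in range(points)]
    pdf_rows = [[x, dist.pdf(x)] for x in grid]
    cdf_rows = [[x, dist.cdf(x)] for x in grid]
    return pdf_rows, cdf_rows


pdf_rows, cdf_rows = curve_data(NormalDist(0.0, 1.0), -4.0, 4.0)
print(pdf_rows[50], cdf_rows[50])  # middle of the grid (x = 0)
```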

Distribution List

All distributions (models) fall into two classes: parametric and non-parametric. Parametric Distributions are defined by a small number of parameters, which are either set by the user or estimated from a dataset. Non-parametric Distributions are built directly from a dataset.

Both parametric and non-parametric distributions are available for the Sample based type; the User defined type supports only parametric distributions.

Parametric models require much smaller datasets than non-parametric models, especially if the uncertainty study focuses on rare events. However, there is a risk of choosing an unsuitable parametric model, which may make the uncertainty study inaccurate. You can avoid this risk by choosing a non-parametric model: its results are purely data-driven, which makes them robust to a wrong choice of model.

Parametric Distributions

‘Arcsine’

  • Continuity: continuous
  • Parameters: ‘a’, ‘b’
  • Description: ‘lower bound’, ‘upper bound’
  • Default: -1.0, 1.0
  • Conditions: b > a

‘Beta’

  • Continuity: continuous
  • Parameters: ‘r’, ‘t’, ‘a’, ‘b’
  • Description: ‘first shape parameter’, ‘second shape parameter’, ‘lower bound’, ‘upper bound’
  • Default: 2.0, 4.0, -1.0, 1.0
  • Conditions: r > 0, t > r, b > a

‘Burr’

  • Continuity: continuous
  • Parameters: ‘c’, ‘k’
  • Description: ‘first shape parameter’, ‘second shape parameter’
  • Default: 1.0, 1.0
  • Conditions: c > 0, k > 0

‘Chi’

  • Continuity: continuous
  • Parameters: ‘nu’
  • Description: ‘degrees of freedom’
  • Default: 1.0
  • Conditions: nu > 0

‘ChiSquare’

  • Continuity: continuous
  • Parameters: ‘nu’
  • Description: ‘degrees of freedom’
  • Default: 1.0
  • Conditions: nu > 0

‘Const’

  • Continuity: discrete
  • Parameters: value
  • Description: constant value
  • Default: 0
  • Conditions: None

‘Dirichlet’

  • Continuity: continuous
  • Parameters: ‘theta1’, ‘theta2’
  • Description: ‘theta1’, ‘theta2’
  • Default: 1.0, 1.0
  • Conditions: theta1 > 0, theta2 > 0

‘Exponential’

  • Continuity: continuous
  • Parameters: ‘lambda’, ‘gamma’
  • Description: ‘rate parameter (inverse of the scale)’, ‘shift parameter’
  • Default: 1.0, 0.0
  • Conditions: lambda > 0

‘FisherSnedecor’

  • Continuity: continuous
  • Parameters: ‘d1’, ‘d2’
  • Description: ‘first degrees of freedom’, ‘second degrees of freedom’
  • Default: 1.0, 1.0
  • Conditions: d1 > 0, d2 > 0

‘Gamma’

  • Continuity: continuous
  • Parameters: ‘k’, ‘lambda’, ‘gamma’
  • Description: ‘shape parameter’, ‘rate parameter (inverse of the scale)’, ‘location parameter’
  • Default: 1.0, 1.0, 0.0
  • Conditions: k is integer, k > 0, lambda > 0

‘GeneralizedPareto’

  • Continuity: continuous
  • Parameters: ‘sigma’, ‘xi’
  • Description: ‘scale parameter’, ‘shape parameter (extreme value index)’
  • Default: 1.0, 0.0
  • Conditions: sigma > 0

‘Gumbel’

  • Continuity: continuous
  • Parameters: ‘alpha’, ‘beta’
  • Description: ‘inverse of the scale parameter’, ‘location parameter’
  • Default: 1.0, 0.0
  • Conditions: alpha > 0

‘InverseNormal’

  • Continuity: continuous
  • Parameters: ‘lambda’, ‘mu’
  • Description: ‘shape parameter’, ‘mean’
  • Default: 1.0, 1.0
  • Conditions: lambda > 0, mu > 0

‘Laplace’

  • Continuity: continuous
  • Parameters: ‘lambda’, ‘mu’
  • Description: ‘rate parameter (inverse of the scale)’, ‘mean value’
  • Default: 1.0, 0.0
  • Conditions: lambda > 0

‘Logistic’

  • Continuity: continuous
  • Parameters: ‘alpha’, ‘beta’
  • Description: ‘mean value’, ‘scale parameter’
  • Default: 0.0, 1.0
  • Conditions: beta > 0

‘LogNormal’

  • Continuity: continuous
  • Parameters: ‘muLog’, ‘sigmaLog’, ‘gamma’
  • Description: ‘mean of log(variable − gamma)’, ‘standard deviation of log(variable − gamma)’, ‘location parameter’
  • Default: 0.0, 1.0, 0.0
  • Conditions: sigmaLog > 0

‘LogUniform’

  • Continuity: continuous
  • Parameters: ‘a’, ‘b’
  • Description: ‘lower bound of log(variable)’, ‘upper bound of log(variable)’
  • Default: -1.0, 1.0
  • Conditions: a < b

‘Normal’

  • Continuity: continuous
  • Parameters: ‘mu’, ‘sigma’
  • Description: ‘mean’, ‘standard deviation’
  • Default: 0.0, 1.0
  • Conditions: sigma > 0

‘Rayleigh’

  • Continuity: continuous
  • Parameters: ‘sigma’, ‘gamma’
  • Description: ‘scale parameter’, ‘location parameter’
  • Default: 1.0, 0.0
  • Conditions: sigma > 0

‘Rice’

  • Continuity: continuous
  • Parameters: ‘sigma’, ‘nu’
  • Description: ‘sigma’, ‘nu’
  • Default: 1.0, 0.0
  • Conditions: sigma > 0, nu >= 0

‘Student’

  • Continuity: continuous
  • Parameters: ‘nu’
  • Description: ‘generalised number of degrees of freedom’
  • Default: 3.0
  • Conditions: nu > 1

‘Triangular’

  • Continuity: continuous
  • Parameters: ‘a’, ‘b’, ‘m’
  • Description: ‘lower bound’, ‘upper bound’, ‘mode’
  • Default: -1.0, 1.0, 0.0
  • Conditions: a <= m <= b

‘Uniform’

  • Continuity: continuous
  • Parameters: ‘a’, ‘b’
  • Description: ‘lower bound’, ‘upper bound’
  • Default: -1.0, 1.0
  • Conditions: a < b

‘Weibull’

  • Continuity: continuous
  • Parameters: ‘alpha’, ‘beta’, ‘gamma’
  • Description: ‘scale parameter’, ‘shape parameter’, ‘location parameter’
  • Default: 1.0, 1.0, 0.0
  • Conditions: alpha > 0, beta > 0
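The Conditions entries in this list can be checked mechanically before a model is built. The Python sketch below encodes the conditions of a few distributions exactly as listed (keyed by the listed parameter names) and reports whether a given parameter set is valid. The `CONDITIONS` table and the `check_parameters` helper are hypothetical and are not part of pSeven.

```python
# Conditions copied from the Parametric Distributions list (a subset).
CONDITIONS = {
    "Arcsine": (("a", "b"), lambda a, b: b > a),
    "Beta": (("r", "t", "a", "b"), lambda r, t, a, b: r > 0 and t > r and b > a),
    "LogNormal": (("muLog", "sigmaLog", "gamma"), lambda mu_log, sigma_log, gamma: sigma_log > 0),
    "Triangular": (("a", "b", "m"), lambda a, b, m: a <= m <= b),
    "Uniform": (("a", "b"), lambda a, b: a < b),
    "Weibull": (("alpha", "beta", "gamma"), lambda alpha, beta, gamma: alpha > 0 and beta > 0),
}


def check_parameters(dist: str, **params: float) -> bool:
    """Return True if the parameters satisfy the listed conditions."""
    names, condition = CONDITIONS[dist]
    missing = [n for n in names if n not in params]
    if missing:
        raise ValueError(f"{dist}: missing parameters {missing}")
    return condition(*(params[n] for n in names))


print(check_parameters("Beta", r=2.0, t=4.0, a=-1.0, b=1.0))
print(check_parameters("Triangular", a=-1.0, b=0.0, m=1.0))
```

The second call fails the Triangular condition a <= m <= b because the mode lies outside the bounds.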

Non-parametric Distributions

‘Histogram’

  • Continuity: continuous
  • Description: Histogram approximates the distribution of the provided sample. The bin width is AMISE-optimal, that is, it minimizes the asymptotic mean integrated squared error.
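The AMISE-optimal bin width depends on the unknown density, so in practice it is estimated. The Python sketch below uses the common normal-reference estimate (Scott's rule), h = (24√π)^(1/3)·σ·n^(−1/3) ≈ 3.49·σ·n^(−1/3), and builds a histogram normalized to unit area. Whether pSeven uses this particular reference density is an assumption, and the helper names are hypothetical.

```python
import math
import random
import statistics


def amise_bin_width(xs: list[float]) -> float:
    """AMISE-optimal bin width for a normal reference density (Scott's rule)."""
    c = (24.0 * math.sqrt(math.pi)) ** (1.0 / 3.0)  # about 3.49
    return c * statistics.stdev(xs) * len(xs) ** (-1.0 / 3.0)


def histogram_density(xs: list[float], h: float) -> tuple[list[float], list[float]]:
    """Return bin edges and bar heights normalised so the total area is 1."""
    lo = min(xs)
    nbins = max(1, math.ceil((max(xs) - lo) / h))
    counts = [0] * nbins
    for x in xs:
        k = min(int((x - lo) / h), nbins - 1)  # the maximum goes into the last bin
        counts[k] += 1
    edges = [lo + k * h for k in range(nbins + 1)]
    heights = [count / (len(xs) * h) for count in counts]
    return edges, heights


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
h = amise_bin_width(data)
edges, heights = histogram_density(data, h)
print(f"bin width {h:.3f}, {len(heights)} bins")
```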

‘KernelSmoothing’

  • Continuity: continuous
  • Description: Kernel smoothing with a Gaussian kernel approximates the distribution of the provided sample. The bandwidth is selected automatically, and the selection method depends on the sample size.
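Kernel smoothing places a Gaussian bump of width h (the bandwidth) on every sample point and averages them. The Python sketch below implements this estimator directly and, as a stand-in for the block's automatic selection (whose rule is not documented here), uses Silverman's rule of thumb h = 0.9·min(σ, IQR/1.34)·n^(−1/5). The helper names are hypothetical.

```python
import math
import random
import statistics


def silverman_bandwidth(xs: list[float]) -> float:
    """Silverman's rule of thumb: 0.9 * min(std, IQR / 1.34) * n ** (-1/5)."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    spread = min(statistics.stdev(xs), (q3 - q1) / 1.34)
    return 0.9 * spread * len(xs) ** (-0.2)


def gaussian_kde(xs: list[float], h: float):
    """Return f(x) = 1/(n*h) * sum K((x - x_i)/h) with a standard normal kernel K."""
    norm = 1.0 / (len(xs) * h * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs)

    return density


rng = random.Random(5)
data = [rng.gauss(0.0, 1.0) for _ in range(300)]
f = gaussian_kde(data, silverman_bandwidth(data))
print(f"estimated density at 0: {f(0.0):.3f}")  # true N(0, 1) value is about 0.399
```

A smaller bandwidth follows the sample more closely but is noisier; the rule of thumb balances the two for roughly normal data.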

Ports

The Distribution block has the following ports:

  • sample - input of type RealMatrix (see Sample-Based Variables for details): the sample of uncertain parameters. The sample matrix must have size (number of examples) × (number of variables) (see also Variables).
  • report - output of type StringScalar (see Results for details): a text report describing how the probabilistic model was constructed.
  • report_data - output of type Dict (see Results for details): plot data related to the construction of the probabilistic model.
  • variable_name_pdf and variable_name_cdf - outputs of type RealMatrix used to plot the PDF and CDF of the distribution of the corresponding variable (variable_name stands for the name of the variable).

Options

‘Copula’

  • Description: specifies the copula that defines the type of dependencies between variables in the probabilistic model. The ‘Spearman’ and ‘Kendall’ values select a Normal copula based on Spearman and Kendall correlations, respectively.
  • Parameters: ‘Independent’, ‘Spearman’, ‘Kendall’
  • Default: ‘Independent’

‘Correlation Matrix’

  • Description: specifies the correlation matrix for the copula.
  • Parameters: symmetric matrix with ones on the diagonal and values in range [-1, 1]
  • Default: identity matrix
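Symmetry and the [-1, 1] range are necessary but not sufficient for a valid correlation matrix: it also needs a unit diagonal and, for use in a Normal copula, it is assumed here that it must be positive definite. The Python sketch below checks all of these conditions, using a Cholesky factorization for the last one because that factorization exists exactly when the matrix is positive definite. The function name and tolerance are hypothetical, and the block may apply its own checks.

```python
import math


def validate_correlation_matrix(m: list[list[float]], tol: float = 1e-12) -> list[list[float]]:
    """Check a correlation matrix and return its Cholesky factor L (m = L L^T)."""
    d = len(m)
    if any(len(row) != d for row in m):
        raise ValueError("the matrix must be square")
    for i in range(d):
        if abs(m[i][i] - 1.0) > tol:
            raise ValueError(f"diagonal entry {i} must be 1")
        for j in range(d):
            if abs(m[i][j] - m[j][i]) > tol:
                raise ValueError(f"entries ({i}, {j}) and ({j}, {i}) differ")
            if not -1.0 <= m[i][j] <= 1.0:
                raise ValueError(f"entry ({i}, {j}) is outside [-1, 1]")
    # The Cholesky factorization exists exactly when the matrix is positive definite.
    chol = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = m[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    raise ValueError("the matrix is not positive definite")
                chol[i][i] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return chol
```

For example, [[1, 0.9, -0.9], [0.9, 1, 0.9], [-0.9, 0.9, 1]] passes the symmetry and range checks but is rejected, because no set of three variables can have these pairwise correlations.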