11.13. da.p7core.stat

Statistical utilities module.

New in version 1.10.0.

>>> from da.p7core import stat

Classes

da.p7core.stat.Analyzer() Implements various statistical analysis methods.
da.p7core.stat.DistributionCheckResult(tests) Distribution tests result.
da.p7core.stat.ElementaryStatistics(statistics) Elementary statistics computation results.
da.p7core.stat.OutlierDetectionResult(…) Outlier detection result.

11.13.1. Analyzer — analysis interface

class da.p7core.stat.Analyzer

Implements various statistical analysis methods.

calculate_statistics(sample, confidence=0.95, covariance='auto', rank_components=[])

Calculates elementary statistics.

Parameters:
  • sample (array-like) – data sample
  • confidence (float) – confidence level for the lower and upper quantiles, in the range [0.5, 1]
  • covariance ('empirical'|'robust'|'auto') – type of covariance calculation for non-rank variables
  • rank_components (list[int]) – indices of rank components. For rank variables, calculate_statistics() computes the Kendall rank correlation coefficient, ignoring the calculation type set by covariance.
Returns:

sample statistics

Return type:

ElementaryStatistics

Calculates various statistics for the given data sample(s). See the Elementary Statistics section for details.
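The Kendall rank correlation coefficient mentioned for rank_components can be illustrated with a short pure-Python sketch. This is not the library's implementation: the function name kendall_tau_a is hypothetical, and the sketch uses the tau-a variant (no tie correction), which is an assumption because this section does not say which variant calculate_statistics() uses.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs.

    Illustrative only; the variant used by the library is not stated here.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both components order the pair the same way
        elif sign < 0:
            discordant += 1   # the components order the pair oppositely
        # sign == 0 is a tie; tau-a counts it in neither group
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


ranks_a = [1, 2, 3, 4, 5]
ranks_b = [1, 3, 2, 4, 5]   # one swapped pair
tau = kendall_tau_a(ranks_a, ranks_b)
print(tau)  # 0.8: 9 concordant pairs, 1 discordant pair, 10 pairs in total
```

Unlike a covariance-based coefficient, this value depends only on the ordering of the values, which is why it suits rank variables.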

check_distribution(sample, tests='all', confidence=0.99, budget=1000000000)

Checks the sample for uniformity and normality.

Parameters:
  • sample (array-like) – data sample
  • tests ('all' or 'uniform'|'normal_skewness'|'normal_kurtosis' or list of those) – tests to be performed
  • confidence (float) – confidence level for the distribution check. A higher confidence level imposes stricter requirements for the sample to be considered as generated by the specific probability distribution.
  • budget (int) – maximum number of points to be processed
Returns:

boolean results of tests

Return type:

DistributionCheckResult

Checks data sample for specific types of distribution. See the Distribution Tests section for details.
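The two normality tests are named after skewness and kurtosis; for a normal distribution both the skewness and the excess kurtosis are zero. The sketch below only computes these two quantities from a one-dimensional sample using the plain (biased) moment estimators. The function name sample_shape is hypothetical, and the actual test statistics and thresholds used by check_distribution() are not described in this section, so this is an illustration of the underlying quantities rather than of the tests themselves.

```python
def sample_shape(values):
    """Return (skewness, excess kurtosis) from central moments of a 1-D sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 values")
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (biased estimators, divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("constant sample: skewness and kurtosis are undefined")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


skew, kurt = sample_shape([-2.0, -1.0, 0.0, 1.0, 2.0])
print(skew, kurt)  # symmetric sample: skewness is 0, excess kurtosis is negative
```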

detect_outliers(sample, covariance='auto', score_type='auto', confidence=0.95)

Finds outliers in data.

Parameters:
  • sample (array-like) – data sample
  • covariance ('empirical'|'robust'|'auto') – type of covariance calculation
  • score_type ('probability'|'distance'|'auto') – determines which type of scores should be computed
  • confidence (float) – a real value in [0, 1]: the fraction of objects in the sample that will be considered inliers (as opposed to outliers)
Returns:

outlier scores for the objects and the corresponding decisions

Return type:

OutlierDetectionResult

Detects outliers in the given data sample. See section Outlier Detection for details.
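One way to read the confidence parameter is as the share of objects kept as inliers once the objects are ordered by some score. The sketch below shows that reading on a one-dimensional sample. It is a hypothetical interpretation, not the library's algorithm: the function name flag_by_inlier_fraction, the distance-to-median score, and the rounding rule are all assumptions, and the real scoring depends on the covariance and score_type settings.

```python
import statistics


def flag_by_inlier_fraction(distances, confidence=0.95):
    """Flag as outliers everything outside the `confidence` share of smallest distances."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    n = len(distances)
    n_inliers = int(round(confidence * n))  # assumed rounding rule
    # Indices ordered from the smallest distance to the largest.
    order = sorted(range(n), key=lambda i: distances[i])
    flags = [True] * n
    for i in order[:n_inliers]:
        flags[i] = False  # the closest points are treated as inliers
    return flags


sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0, 10.1]
center = statistics.median(sample)
distances = [abs(v - center) for v in sample]  # assumed score: distance to the median
flags = flag_by_inlier_fraction(distances, confidence=0.9)
print(flags)  # only the value 25.0 is flagged
```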

set_logger(logger)

Set logger.

Parameters: logger – logger object
Returns: none

set_watcher(watcher)

Set watcher.

Parameters: watcher – watcher object
Returns: none

11.13.2. DistributionCheckResult — sample distribution

class da.p7core.stat.DistributionCheckResult(tests)

Distribution tests result.

A DistributionCheckResult object is only returned by the check_distribution() method and must not be instantiated by the user.

uniform

Boolean result of checking the sample for a uniform distribution. If the test was not performed, the attribute is set to None.

normal_kurtosis

Boolean result of checking the sample for a normal distribution using the kurtosis test. If the test was not performed, the attribute is set to None.

normal_skewness

Boolean result of checking the sample for a normal distribution using the skewness test. If the test was not performed, the attribute is set to None.
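Because each attribute can be True, False, or None, a plain truthiness check such as `if not result.uniform` cannot distinguish a negative result from a test that was not run. The sketch below reads the three attributes with an explicit None check. The helper summarize_checks is hypothetical, the stand-in object only mimics the attributes documented above, and the "passed" label assumes that True means the sample is consistent with the distribution.

```python
from types import SimpleNamespace


def describe(flag):
    """Map a tri-state test result to a readable label."""
    if flag is None:
        return "not performed"
    return "passed" if flag else "failed"


def summarize_checks(result):
    """Read the documented attributes of a distribution check result."""
    names = ("uniform", "normal_skewness", "normal_kurtosis")
    return {name: describe(getattr(result, name)) for name in names}


# Stand-in for an object returned by check_distribution(); not a real result.
fake_result = SimpleNamespace(uniform=False, normal_skewness=True, normal_kurtosis=None)
summary = summarize_checks(fake_result)
print(summary)
```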

11.13.3. ElementaryStatistics — sample statistics

class da.p7core.stat.ElementaryStatistics(statistics)

Elementary statistics computation results.

An ElementaryStatistics object is only returned by the calculate_statistics() method and must not be instantiated by the user.

min

A list of minimal values for each dimension of the data sample.

max

A list of maximal values for each dimension of the data sample.

mean

A list of mean values for each dimension of the data sample.

median

A list of median values for each dimension of the data sample.

range

New in version 1.10.2.

A list of value ranges for each dimension of the data sample.

quantile_lower

A list of lower quantiles for the specified confidence level for each dimension of the data sample.

quantile_upper

A list of upper quantiles for the specified confidence level for each dimension of the data sample.

std

A list of standard deviations for each dimension of the data sample.

correlation

A matrix of correlation coefficients between different dimensions in the data sample.
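The per-dimension attributes above each hold one value per column of the sample. A pure-Python sketch of that layout follows. It is not the library's code: the function name column_statistics is hypothetical, range is assumed to be max minus min, and std uses the n - 1 normalization of statistics.stdev, which may differ from what the library computes.

```python
import statistics


def column_statistics(sample):
    """Compute one value per dimension (column) of a row-oriented sample."""
    columns = list(zip(*sample))  # transpose rows into columns
    return {
        "min": [min(c) for c in columns],
        "max": [max(c) for c in columns],
        "mean": [statistics.mean(c) for c in columns],
        "median": [statistics.median(c) for c in columns],
        "range": [max(c) - min(c) for c in columns],    # assumed definition
        "std": [statistics.stdev(c) for c in columns],  # assumed n - 1 normalization
    }


sample = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]  # 3 points, 2 dimensions
stats = column_statistics(sample)
print(stats["mean"])    # [2.0, 30.0]
print(stats["median"])  # [2.0, 20.0]
```

Each list has as many entries as the sample has dimensions, so stats["mean"][1] refers to the second column.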

11.13.4. OutlierDetectionResult — outliers

class da.p7core.stat.OutlierDetectionResult(scores, outliers)

Outlier detection result.

An OutlierDetectionResult object is only returned by the detect_outliers() method and must not be instantiated by the user.

scores

A list of scores in the range [0, 1]. Each value can be interpreted as the probability that the corresponding object is an outlier.

outliers

A list of boolean values indicating whether the corresponding objects are outliers.
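Since scores and outliers correspond element by element to the objects of the sample, they can be zipped with the sample rows to separate inliers from flagged points. The sketch below assumes both lists follow the row order of the sample. The helper split_by_outlier_flag is hypothetical, and the stand-in object only mimics the two documented attributes.

```python
from types import SimpleNamespace


def split_by_outlier_flag(sample, result):
    """Return (inlier rows, [(outlier row, score), ...]) using the result flags."""
    if len(sample) != len(result.outliers):
        raise ValueError("sample and result.outliers must have the same length")
    inliers = [row for row, is_outlier in zip(sample, result.outliers) if not is_outlier]
    flagged = [
        (row, score)
        for row, score, is_outlier in zip(sample, result.scores, result.outliers)
        if is_outlier
    ]
    return inliers, flagged


sample = [[0.1], [0.2], [9.0]]
# Stand-in for an object returned by detect_outliers(); the values are made up.
fake_result = SimpleNamespace(scores=[0.05, 0.1, 0.99], outliers=[False, False, True])
inliers, flagged = split_by_outlier_flag(sample, fake_result)
print(inliers)  # [[0.1], [0.2]]
print(flagged)  # [([9.0], 0.99)]
```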