11.13. da.p7core.stat
Statistical utilities module.
New in version 1.10.0.
>>> from da.p7core import stat
Classes
da.p7core.stat.Analyzer() | Implements various statistical analysis methods.
da.p7core.stat.DistributionCheckResult(tests) | Distribution tests result.
da.p7core.stat.ElementaryStatistics(statistics) | Elementary statistics computation results.
da.p7core.stat.OutlierDetectionResult(…) | Outlier detection result.
11.13.1. Analyzer - analysis interface
class da.p7core.stat.Analyzer
Implements various statistical analysis methods.
calculate_statistics(sample, confidence=0.95, covariance='auto', rank_components=[])
Calculates elementary statistics.
Parameters:
- sample (array-like) – data sample
- confidence (float) – confidence level for the lower and upper quantiles, in the range [0.5, 1]
- covariance ('empirical'|'robust'|'auto') – type of covariance calculation for non-rank variables
- rank_components (list[int]) – indices of rank components. For rank variables, calculate_statistics() computes the Kendall rank correlation coefficient, ignoring the calculation type set by covariance.
Returns: sample statistics
Return type: ElementaryStatistics
Calculates various statistics for the given data sample(s). See the Elementary Statistics section for details.
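A minimal usage sketch, based only on the signature above and the attributes documented for ElementaryStatistics. The helper name summarize, the sample values, and the row-per-point sample layout are illustrative assumptions. The import is guarded so that the snippet also loads where the library is not installed.

```python
def summarize(analyzer, sample, confidence=0.95):
    """Collect a few per-dimension statistics as plain lists.

    `analyzer` is expected to behave like da.p7core.stat.Analyzer.
    `summarize` is a hypothetical helper, not part of the library.
    """
    result = analyzer.calculate_statistics(sample, confidence=confidence)
    return {
        "min": list(result.min),
        "max": list(result.max),
        "mean": list(result.mean),
        "std": list(result.std),
    }


if __name__ == "__main__":
    try:
        from da.p7core import stat
    except ImportError:
        stat = None  # library not installed; nothing to demonstrate

    if stat is not None:
        # Assumed layout: one row per point, one column per dimension.
        sample = [[0.1, 10.0], [0.4, 12.5], [0.3, 11.0], [0.9, 15.5]]
        print(summarize(stat.Analyzer(), sample, confidence=0.9))
```

Passing the analyzer in as an argument keeps the helper independent of how the Analyzer instance was configured.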
check_distribution(sample, tests='all', confidence=0.99, budget=1000000000)
Checks sample points for uniformity and normality.
Parameters:
- sample (array-like) – data sample
- tests ('all', or 'uniform'|'normal_skewness'|'normal_kurtosis', or a list of those) – tests to be performed
- confidence (float) – confidence level for the distribution check. A higher confidence level imposes stricter requirements for the sample to be considered as generated by the specific probability distribution.
- budget (int) – maximum number of points to be processed
Returns: boolean results of tests
Return type: DistributionCheckResult
Checks the data sample for specific types of distribution. See the Distribution Tests section for details.
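A short sketch of running only a subset of tests and reading back the results, assuming the attribute names documented for DistributionCheckResult. The helper performed_tests and the generated sample are illustrative and not part of the library. The one-column row layout of the sample is also an assumption.

```python
TEST_NAMES = ("uniform", "normal_skewness", "normal_kurtosis")


def performed_tests(result):
    """Return {test name: bool} for the tests that were actually run.

    A value of None on the result object means the test was not performed,
    so such entries are left out. Hypothetical helper, not a library function.
    """
    outcome = {}
    for name in TEST_NAMES:
        value = getattr(result, name)
        if value is not None:
            outcome[name] = bool(value)
    return outcome


if __name__ == "__main__":
    try:
        from da.p7core import stat
    except ImportError:
        stat = None  # library not installed; nothing to demonstrate

    if stat is not None:
        import random

        random.seed(0)
        # One-dimensional sample drawn from a normal distribution.
        sample = [[random.gauss(0.0, 1.0)] for _ in range(500)]
        result = stat.Analyzer().check_distribution(
            sample, tests=["normal_skewness", "normal_kurtosis"], confidence=0.99
        )
        # The uniform test was not requested, so it should not appear here.
        print(performed_tests(result))
```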
detect_outliers(sample, covariance='auto', score_type='auto', confidence=0.95)
Finds outliers in data.
Parameters:
- sample (array-like) – data sample
- covariance ('empirical'|'robust'|'auto') – type of covariance calculation
- score_type ('probability'|'distance'|'auto') – determines which type of scores is computed
- confidence (float) – real value in [0, 1]: the fraction of objects in the sample that will be considered inliers (as opposed to outliers)
Returns: outlier scores for the objects and the corresponding decisions
Return type: OutlierDetectionResult
Detects outliers in the given data sample. See the Outlier Detection section for details.
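The following sketch shows one way to separate a sample into inliers and outliers using the boolean decisions returned by detect_outliers(). The helper split_by_flags and the generated data are assumptions made for illustration. Only the call signature and the outliers attribute come from this reference.

```python
def split_by_flags(sample, flags):
    """Split points into (inliers, outliers) using one boolean flag per point.

    Hypothetical helper: `flags[i]` is True when `sample[i]` is an outlier.
    """
    if len(sample) != len(flags):
        raise ValueError("sample and flags must have the same length")
    inliers = [point for point, is_outlier in zip(sample, flags) if not is_outlier]
    outliers = [point for point, is_outlier in zip(sample, flags) if is_outlier]
    return inliers, outliers


if __name__ == "__main__":
    try:
        from da.p7core import stat
    except ImportError:
        stat = None  # library not installed; nothing to demonstrate

    if stat is not None:
        import random

        random.seed(1)
        # 200 points around the origin plus one point placed far away.
        sample = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
        sample.append([25.0, -25.0])
        result = stat.Analyzer().detect_outliers(
            sample, score_type="probability", confidence=0.95
        )
        inliers, outliers = split_by_flags(sample, list(result.outliers))
        print(len(inliers), "inliers,", len(outliers), "outliers")
```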
set_logger(logger)
Set logger.
Parameters: logger – logger object
Returns: none
set_watcher(watcher)
Set watcher.
Parameters: watcher – watcher object
Returns: none
11.13.2. DistributionCheckResult - sample distribution
class da.p7core.stat.DistributionCheckResult(tests)
Distribution tests result.
A DistributionCheckResult object is only returned by the check_distribution() function and must not be instantiated by the user.
uniform
Boolean result of checking the sample for uniform distribution. If the test was not performed, the attribute value is None.
normal_kurtosis
Boolean result of checking the sample for normal distribution using the kurtosis test. If the test was not performed, the attribute value is None.
normal_skewness
Boolean result of checking the sample for normal distribution using the skewness test. If the test was not performed, the attribute value is None.
11.13.3. ElementaryStatistics - sample statistics
class da.p7core.stat.ElementaryStatistics(statistics)
Elementary statistics computation results.
An ElementaryStatistics object is only returned by the calculate_statistics() function and must not be instantiated by the user.
min
A list of minimal values for each dimension of the data sample.
max
A list of maximal values for each dimension of the data sample.
mean
A list of mean values for each dimension of the data sample.
median
A list of median values for each dimension of the data sample.
range
New in version 1.10.2.
A list of value ranges for each dimension of the data sample.
quantile_lower
A list of lower quantiles for the specified confidence level for each dimension of the data sample.
quantile_upper
A list of upper quantiles for the specified confidence level for each dimension of the data sample.
std
A list of standard deviations for each dimension of the data sample.
correlation
A matrix of correlation coefficients between different dimensions in the data sample.
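To make "for each dimension" concrete, here is a plain-Python sketch that computes a few of these quantities column by column with the standard library. It is not the library's implementation and does not call it. The function name columnwise and the assumption that rows are points and columns are dimensions are illustrative only.

```python
import statistics


def columnwise(sample):
    """Compute per-dimension min, max, mean and median of a row-per-point sample."""
    columns = list(zip(*sample))  # transpose: one tuple per dimension
    return {
        "min": [min(col) for col in columns],
        "max": [max(col) for col in columns],
        "mean": [statistics.mean(col) for col in columns],
        "median": [statistics.median(col) for col in columns],
    }


if __name__ == "__main__":
    # Three points in two dimensions.
    sample = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
    for name, values in columnwise(sample).items():
        print(name, values)
```

Each resulting list has one entry per dimension, which matches the shape described for the min, max, mean and median attributes.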
11.13.4. OutlierDetectionResult - outliers
class da.p7core.stat.OutlierDetectionResult(scores, outliers)
Outlier detection result.
An OutlierDetectionResult object is only returned by the detect_outliers() function and must not be instantiated by the user.
scores
A list of scores in the range [0, 1]. Each value can be interpreted as the probability that the corresponding object is an outlier.
outliers
A list of boolean values indicating whether the corresponding objects are outliers.
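Because the scores list is aligned with the sample, it can be used to rank points by how suspicious they are. The sketch below works on any list of scores. The helper name most_suspicious and the example values are hypothetical.

```python
def most_suspicious(scores, top=3):
    """Return the indices of the `top` highest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top]


if __name__ == "__main__":
    # Made-up scores for four objects; higher means more likely an outlier.
    scores = [0.10, 0.90, 0.40, 0.95]
    for index in most_suspicious(scores, top=2):
        print("object", index, "score", scores[index])
```

The returned indices refer to positions in the original sample, so they can be used to look up the corresponding points or their entries in the outliers list.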