Utils Functions

combo.utils.data module

Utility functions for manipulating data

combo.utils.data.evaluate_print(clf_name, y, y_pred)[source]

Utility function for evaluating and printing the results for examples. Default metrics include accuracy, ROC AUC, and F1 score.

Parameters
  • clf_name (str) – The name of the estimator.

  • y (list or numpy array of shape (n_samples,)) – The ground truth.

  • y_pred (list or numpy array of shape (n_samples,)) – The raw scores as returned by a fitted model.
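A minimal reimplementation sketch of what evaluate_print does, assuming y_pred contains predicted binary labels usable by all three metrics (the real function's formatting may differ; returning the string is an addition here for convenience):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def evaluate_print_sketch(clf_name, y, y_pred):
    # Hedged sketch of combo.utils.data.evaluate_print: compute and
    # print accuracy, ROC AUC, and F1 for a named estimator.
    y, y_pred = np.asarray(y), np.asarray(y_pred)
    summary = (f"{clf_name} Accuracy:{accuracy_score(y, y_pred):.4f}, "
               f"ROC:{roc_auc_score(y, y_pred):.4f}, "
               f"F1:{f1_score(y, y_pred):.4f}")
    print(summary)
    return summary  # returned here for testing; the real function only prints
```

For example, `evaluate_print_sketch('DecisionTree', [0, 1, 1, 0], [0, 1, 0, 0])` prints one line with all three metrics.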

combo.utils.utility module

A set of utility functions to support model combination.

combo.utils.utility.argmaxn(value_list, n, order='desc')[source]

Return the indices of the top n elements in the list if order is set to ‘desc’; otherwise return the indices of the n smallest elements.

Parameters
  • value_list (list, array, numpy array of shape (n_samples,)) – A list containing all values.

  • n (int) – The number of elements to select.

  • order (str, optional (default='desc')) –

    The order to sort {‘desc’, ‘asc’}:

    • ’desc’: descending

    • ’asc’: ascending

Returns

index_list – The indices of the selected n elements.

Return type

numpy array of shape (n,)
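The behavior can be sketched with numpy's argsort (a hedged equivalent, not the library's actual implementation):

```python
import numpy as np

def argmaxn_sketch(value_list, n, order='desc'):
    # Hedged sketch of combo.utils.utility.argmaxn: sort indices by
    # value, flip for descending order, and keep the first n.
    order_idx = np.argsort(np.asarray(value_list))  # ascending indices
    if order == 'desc':
        order_idx = order_idx[::-1]
    return order_idx[:n]
```

For example, `argmaxn_sketch([1, 5, 3, 4], 2)` returns the indices of 5 and 4, i.e. `[1, 3]`.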

combo.utils.utility.check_detector(detector)[source]

Checks whether the fit and decision_function methods exist for the given detector.

Parameters

detector (combo.models) – Detector instance for which the check is performed.
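Such a check amounts to verifying the duck-typed interface the combination models rely on. A hedged sketch (the real function's error message may differ):

```python
def check_detector_sketch(detector):
    # Hedged sketch: raise if either required method is missing
    # from the given detector instance.
    for attr in ('fit', 'decision_function'):
        if not callable(getattr(detector, attr, None)):
            raise AttributeError(
                f"{attr} is not found in the given detector {detector!r}")
```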

combo.utils.utility.generate_bagging_indices(random_state, bootstrap_features, n_features, min_features, max_features)[source]

Randomly draw feature indices. Internal use only.

Modified from sklearn/ensemble/bagging.py

Parameters
  • random_state (RandomState) – A random number generator instance to define the state of the random permutations generator.

  • bootstrap_features (bool) – Specifies whether feature indices are drawn with replacement.

  • n_features (int) – Specifies the population size when generating indices

  • min_features (int) – Lower limit for number of features to randomly sample

  • max_features (int) – Upper limit for number of features to randomly sample

Returns

feature_indices – Indices for features to bag

Return type

numpy array of shape (n_features_drawn,), where the number drawn lies between min_features and max_features
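A hedged sketch of the two-step draw, assuming the feature count is sampled uniformly between the two limits (the exact bounds convention in the real implementation may differ):

```python
import numpy as np

def generate_bagging_indices_sketch(random_state, bootstrap_features,
                                    n_features, min_features, max_features):
    # Hedged sketch: first draw how many features to use, then draw
    # that many feature indices, with or without replacement.
    n_draw = random_state.randint(min_features, max_features)
    if bootstrap_features:
        # with replacement: the same feature index may repeat
        return random_state.randint(0, n_features, n_draw)
    # without replacement: a random subset of the feature indices
    return random_state.permutation(n_features)[:n_draw]
```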

combo.utils.utility.generate_indices(random_state, bootstrap, n_population, n_samples)[source]

Draw randomly sampled indices. Internal use only.

See sklearn/ensemble/bagging.py

Parameters
  • random_state (RandomState) – A random number generator instance to define the state of the random permutations generator.

  • bootstrap (bool) – Specifies whether indices are drawn with replacement.

  • n_population (int) – Specifies the population size when generating indices

  • n_samples (int) – Specifies number of samples to draw

Returns

indices – randomly drawn indices

Return type

numpy array, shape (n_samples,)
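The sampling logic can be sketched as follows (a hedged equivalent of the sklearn-derived helper, not the actual source):

```python
import numpy as np

def generate_indices_sketch(random_state, bootstrap, n_population, n_samples):
    # Hedged sketch: draw n_samples indices from range(n_population).
    if bootstrap:
        # with replacement: duplicates are allowed
        return random_state.randint(0, n_population, n_samples)
    # without replacement: a random subset of the population
    return random_state.permutation(n_population)[:n_samples]
```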

combo.utils.utility.get_label_n(y, y_pred, n=None)[source]

Function to turn raw outlier scores into binary labels by assigning 1 to the top n outlier scores.

Parameters
  • y (list or numpy array of shape (n_samples,)) – The ground truth. Binary (0: inliers, 1: outliers).

  • y_pred (list or numpy array of shape (n_samples,)) – The raw outlier scores as returned by a fitted model.

  • n (int, optional (default=None)) – The number of outliers. If not defined, it is inferred from the ground truth.

Returns

labels – Binary labels (0: normal points, 1: outliers).

Return type

numpy array of shape (n_samples,)

Examples

>>> from combo.utils.utility import get_label_n
>>> y = [0, 1, 1, 0, 0]
>>> y_pred = [0.1, 0.5, 0.3, 0.2, 0.7]
>>> get_label_n(y, y_pred)
array([0, 1, 0, 0, 1])

combo.utils.utility.invert_order(scores, method='multiplication')[source]

Invert the order of a list of values. The smallest value becomes the largest in the inverted list. This is useful while combining multiple detectors since their score order could be different.

Parameters
  • scores (list, array or numpy array with shape (n_samples,)) – The list of values to be inverted

  • method (str, optional (default='multiplication')) –

    Methods used for order inversion. Valid methods are:

    • ’multiplication’: multiply by -1

    • ’subtraction’: max(scores) - scores

Returns

inverted_scores – The inverted list

Return type

numpy array of shape (n_samples,)

Examples

>>> from combo.utils.utility import invert_order
>>> scores1 = [0.1, 0.3, 0.5, 0.7, 0.2, 0.1]
>>> invert_order(scores1)
array([-0.1, -0.3, -0.5, -0.7, -0.2, -0.1])
>>> invert_order(scores1, method='subtraction')
array([0.6, 0.4, 0.2, 0. , 0.5, 0.6])

combo.utils.utility.list_diff(first_list, second_list)[source]

Utility function to calculate the list difference (first_list - second_list).

Parameters
  • first_list (list) – First list.

  • second_list (list) – Second list.

Returns

diff – Elements of first_list that are not in second_list.

Return type

list
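A hedged one-line sketch of the set-difference semantics, assuming order is preserved from first_list:

```python
def list_diff_sketch(first_list, second_list):
    # Hedged sketch: keep the elements of first_list that do not
    # appear in second_list, preserving their original order.
    return [x for x in first_list if x not in second_list]
```

For example, `list_diff_sketch([1, 2, 3, 4], [2, 4])` returns `[1, 3]`.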

combo.utils.utility.precision_n_scores(y, y_pred, n=None)[source]

Utility function to calculate precision @ rank n.

Parameters
  • y (list or numpy array of shape (n_samples,)) – The ground truth. Binary (0: inliers, 1: outliers).

  • y_pred (list or numpy array of shape (n_samples,)) – The raw outlier scores as returned by a fitted model.

  • n (int, optional (default=None)) – The number of outliers. If not defined, it is inferred from the ground truth.

Returns

precision_at_rank_n – Precision at rank n score.

Return type

float
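Precision @ rank n labels the n highest scores as outliers and measures precision against the ground truth. A hedged numpy sketch (tie handling in the real implementation may differ):

```python
import numpy as np

def precision_n_scores_sketch(y, y_pred, n=None):
    # Hedged sketch of precision @ rank n.
    y, y_pred = np.asarray(y), np.asarray(y_pred)
    if n is None:
        n = int(y.sum())  # infer the outlier count from the ground truth
    labels = np.zeros_like(y)
    labels[np.argsort(y_pred)[::-1][:n]] = 1  # flag the top-n scores
    # fraction of the flagged points that are true outliers
    return float(np.sum(labels * y) / n)
```

With `y = [0, 1, 1, 0, 0]` and `y_pred = [0.1, 0.5, 0.3, 0.2, 0.7]`, the top two scores sit at indices 4 and 1, only one of which is a true outlier, giving 0.5.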

combo.utils.utility.score_to_label(pred_scores, outliers_fraction=0.1)[source]

Turn raw outlier scores into binary labels (0 or 1).

Parameters
  • pred_scores (list or numpy array of shape (n_samples,)) – Raw outlier scores. Outliers are assumed to have larger values.

  • outliers_fraction (float in (0,1)) – Percentage of outliers.

Returns

outlier_labels – For each observation, indicates whether it should be considered an outlier (1) or not (0) according to the fitted model.

Return type

numpy array of shape (n_samples,)
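A hedged sketch of the thresholding: cut at the (1 - outliers_fraction) percentile of the scores and flag everything above it (the real implementation's percentile convention may differ slightly):

```python
import numpy as np

def score_to_label_sketch(pred_scores, outliers_fraction=0.1):
    # Hedged sketch: threshold at the (1 - fraction) percentile,
    # then flag scores above the threshold as outliers.
    pred_scores = np.asarray(pred_scores)
    threshold = np.percentile(pred_scores, 100 * (1 - outliers_fraction))
    return (pred_scores > threshold).astype(int)
```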

combo.utils.utility.score_to_proba(scores)[source]

Internal function to convert a raw score matrix into a probability matrix.

Parameters

scores (numpy array of shape (n_samples, n_classes)) – Raw score matrix.

Returns

proba – Scaled probability matrix.

Return type

numpy array of shape (n_samples, n_classes)
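One plausible scaling, sketched here under the assumption that the raw scores are non-negative, is to normalize each row so it sums to 1; the library's actual normalization may differ:

```python
import numpy as np

def score_to_proba_sketch(scores):
    # Hedged sketch: row-normalize a non-negative score matrix so
    # each row forms a probability distribution over classes.
    scores = np.asarray(scores, dtype=float)
    row_sums = scores.sum(axis=1, keepdims=True)
    return scores / row_sums
```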

combo.utils.utility.standardizer(X, X_t=None, keep_scalar=False)[source]

Conduct Z-normalization on data so that the input samples become zero-mean and unit-variance.

Parameters
  • X (numpy array of shape (n_samples, n_features)) – The training samples

  • X_t (numpy array of shape (n_samples_new, n_features), optional (default=None)) – The data to be converted

  • keep_scalar (bool, optional (default=False)) – The flag to indicate whether to also return the fitted scaler.

Returns

  • X_norm (numpy array of shape (n_samples, n_features)) – X after the Z-score normalization

  • X_t_norm (numpy array of shape (n_samples_new, n_features)) – X_t after the Z-score normalization

  • scalar (sklearn scaler object) – The scaler used in the conversion
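The fit-on-X, apply-to-X_t behavior can be sketched in plain numpy (the library itself uses a sklearn scaler; this hedged version omits the keep_scalar option):

```python
import numpy as np

def standardizer_sketch(X, X_t=None):
    # Hedged sketch of Z-score normalization: fit the mean and std on
    # X, then apply the same transform to X_t when it is given.
    X = np.asarray(X, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0)
    X_norm = (X - mean) / std
    if X_t is None:
        return X_norm
    return X_norm, (np.asarray(X_t, dtype=float) - mean) / std
```

Note that X_t is scaled with the statistics of X, not its own, so a model fitted on X_norm sees test data in the same coordinates.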

Module contents