Utils Functions

combo.utils.data module

Utility functions for manipulating data

combo.utils.data.evaluate_print(clf_name, y, y_pred)[source]

Utility function for evaluating and printing the results for examples. Default metrics include accuracy, ROC AUC, and F1 score.

Parameters
  • clf_name (str) – The name of the estimator.

  • y (list or numpy array of shape (n_samples,)) – The ground truth.

  • y_pred (list or numpy array of shape (n_samples,)) – The raw scores as returned by a fitted model.
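A minimal reimplementation sketch of what evaluate_print does, assuming y_pred contains predicted binary labels usable by all three metrics (the real function's formatting may differ; returning the string is an addition here for convenience):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def evaluate_print_sketch(clf_name, y, y_pred):
    # Hedged sketch of combo.utils.data.evaluate_print: compute and
    # print accuracy, ROC AUC, and F1 for a named estimator.
    y, y_pred = np.asarray(y), np.asarray(y_pred)
    summary = (f"{clf_name} Accuracy:{accuracy_score(y, y_pred):.4f}, "
               f"ROC:{roc_auc_score(y, y_pred):.4f}, "
               f"F1:{f1_score(y, y_pred):.4f}")
    print(summary)
    return summary  # returned here for testing; the real function only prints
```

For example, `evaluate_print_sketch('DecisionTree', [0, 1, 1, 0], [0, 1, 0, 0])` prints one line with all three metrics.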

combo.utils.utility module

A set of utility functions to support model combination.

combo.utils.utility.argmaxn(value_list, n, order='desc')[source]

Return the indices of the top n elements in the list if order is set to ‘desc’; otherwise return the indices of the n smallest elements.

Parameters
  • value_list (list, array, numpy array of shape (n_samples,)) – A list containing all values.

  • n (int) – The number of elements to select.

  • order (str, optional (default='desc')) –

    The order to sort {‘desc’, ‘asc’}:

    • ’desc’: descending

    • ’asc’: ascending

Returns

index_list – The indices of the selected n elements.

Return type

numpy array of shape (n,)
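The behavior can be sketched with numpy's argsort (a hedged equivalent, not the library's actual implementation):

```python
import numpy as np

def argmaxn_sketch(value_list, n, order='desc'):
    # Hedged sketch of combo.utils.utility.argmaxn: sort indices by
    # value, flip for descending order, and keep the first n.
    order_idx = np.argsort(np.asarray(value_list))  # ascending indices
    if order == 'desc':
        order_idx = order_idx[::-1]
    return order_idx[:n]
```

For example, `argmaxn_sketch([1, 5, 3, 4], 2)` returns the indices of 5 and 4, i.e. `[1, 3]`.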

combo.utils.utility.check_detector(detector)[source]

Checks whether the fit and decision_function methods exist for the given detector.

Parameters

detector (combo.models) – Detector instance for which the check is performed.
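Such a check amounts to verifying the duck-typed interface the combination models rely on. A hedged sketch (the real function's error message may differ):

```python
def check_detector_sketch(detector):
    # Hedged sketch: raise if either required method is missing
    # from the given detector instance.
    for attr in ('fit', 'decision_function'):
        if not callable(getattr(detector, attr, None)):
            raise AttributeError(
                f"{attr} is not found in the given detector {detector!r}")
```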

combo.utils.utility.generate_bagging_indices(random_state, bootstrap_features, n_features, min_features, max_features)[source]

Randomly draw feature indices. Internal use only.

Modified from sklearn/ensemble/bagging.py

Parameters
  • random_state (RandomState) – A random number generator instance to define the state of the random permutations generator.

  • bootstrap_features (bool) – Specifies whether feature indices are drawn with replacement.

  • n_features (int) – Specifies the population size when generating indices

  • min_features (int) – Lower limit for number of features to randomly sample

  • max_features (int) – Upper limit for number of features to randomly sample

Returns

feature_indices – Indices for features to bag

Return type

numpy array of shape (n_features_drawn,), where the number drawn lies between min_features and max_features
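A hedged sketch of the two-step draw, assuming the feature count is sampled uniformly between the two limits (the exact bounds convention in the real implementation may differ):

```python
import numpy as np

def generate_bagging_indices_sketch(random_state, bootstrap_features,
                                    n_features, min_features, max_features):
    # Hedged sketch: first draw how many features to use, then draw
    # that many feature indices, with or without replacement.
    n_draw = random_state.randint(min_features, max_features)
    if bootstrap_features:
        # with replacement: the same feature index may repeat
        return random_state.randint(0, n_features, n_draw)
    # without replacement: a random subset of the feature indices
    return random_state.permutation(n_features)[:n_draw]
```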

combo.utils.utility.generate_indices(random_state, bootstrap, n_population, n_samples)[source]

Draw randomly sampled indices. Internal use only.

See sklearn/ensemble/bagging.py

Parameters
  • random_state (RandomState) – A random number generator instance to define the state of the random permutations generator.

  • bootstrap (bool) – Specifies whether indices are drawn with replacement.

  • n_population (int) – Specifies the population size when generating indices

  • n_samples (int) – Specifies number of samples to draw

Returns

indices – randomly drawn indices

Return type

numpy array, shape (n_samples,)
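The sampling logic can be sketched as follows (a hedged equivalent of the sklearn-derived helper, not the actual source):

```python
import numpy as np

def generate_indices_sketch(random_state, bootstrap, n_population, n_samples):
    # Hedged sketch: draw n_samples indices from range(n_population).
    if bootstrap:
        # with replacement: duplicates are allowed
        return random_state.randint(0, n_population, n_samples)
    # without replacement: a random subset of the population
    return random_state.permutation(n_population)[:n_samples]
```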

combo.utils.utility.get_label_n(y, y_pred, n=None)[source]

Function to turn raw outlier scores into binary labels by assigning 1 to the top n outlier scores.

Parameters
  • y (list or numpy array of shape (n_samples,)) – The ground truth. Binary (0: inliers, 1: outliers).

  • y_pred (list or numpy array of shape (n_samples,)) – The raw outlier scores as returned by a fitted model.

  • n (int, optional (default=None)) – The number of outliers. If not defined, it is inferred from the ground truth.

Returns

labels – Binary labels (0: normal points, 1: outliers).

Return type

numpy array of shape (n_samples,)

Examples

>>> from combo.utils.utility import get_label_n
>>> y = [0, 1, 1, 0, 0]
>>> y_pred = [0.1, 0.5, 0.3, 0.2, 0.7]
>>> get_label_n(y, y_pred)
array([0, 1, 0, 0, 1])

combo.utils.utility.invert_order(scores, method='multiplication')[source]

Invert the order of a list of values. The smallest value becomes the largest in the inverted list. This is useful while combining multiple detectors since their score order could be different.

Parameters
  • scores (list, array or numpy array with shape (n_samples,)) – The list of values to be inverted

  • method (str, optional (default='multiplication')) –

    Methods used for order inversion. Valid methods are:

    • ’multiplication’: multiply by -1

    • ’subtraction’: max(scores) - scores

Returns

inverted_scores – The inverted list

Return type

numpy array of shape (n_samples,)

Examples

>>> from combo.utils.utility import invert_order
>>> scores1 = [0.1, 0.3, 0.5, 0.7, 0.2, 0.1]
>>> invert_order(scores1)
array([-0.1, -0.3, -0.5, -0.7, -0.2, -0.1])
>>> invert_order(scores1, method='subtraction')
array([0.6, 0.4, 0.2, 0. , 0.5, 0.6])

combo.utils.utility.list_diff(first_list, second_list)[source]

Utility function to calculate the list difference (first_list - second_list).

Parameters
  • first_list (list) – First list.

  • second_list (list) – Second list.

Returns

diff – Elements of first_list that are not in second_list.

Return type

list
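A hedged one-line sketch of the set-difference semantics, assuming order is preserved from first_list:

```python
def list_diff_sketch(first_list, second_list):
    # Hedged sketch: keep the elements of first_list that do not
    # appear in second_list, preserving their original order.
    return [x for x in first_list if x not in second_list]
```

For example, `list_diff_sketch([1, 2, 3, 4], [2, 4])` returns `[1, 3]`.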

combo.utils.utility.precision_n_scores(y, y_pred, n=None)[source]

Utility function to calculate precision @ rank n.

Parameters
  • y (list or numpy array of shape (n_samples,)) – The ground truth. Binary (0: inliers, 1: outliers).

  • y_pred (list or numpy array of shape (n_samples,)) – The raw outlier scores as returned by a fitted model.

  • n (int, optional (default=None)) – The number of outliers. If not defined, it is inferred from the ground truth.

Returns

precision_at_rank_n – Precision at rank n score.

Return type

float
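Precision @ rank n labels the n highest scores as outliers and measures precision against the ground truth. A hedged numpy sketch (tie handling in the real implementation may differ):

```python
import numpy as np

def precision_n_scores_sketch(y, y_pred, n=None):
    # Hedged sketch of precision @ rank n.
    y, y_pred = np.asarray(y), np.asarray(y_pred)
    if n is None:
        n = int(y.sum())  # infer the outlier count from the ground truth
    labels = np.zeros_like(y)
    labels[np.argsort(y_pred)[::-1][:n]] = 1  # flag the top-n scores
    # fraction of the flagged points that are true outliers
    return float(np.sum(labels * y) / n)
```

With `y = [0, 1, 1, 0, 0]` and `y_pred = [0.1, 0.5, 0.3, 0.2, 0.7]`, the top two scores sit at indices 4 and 1, only one of which is a true outlier, giving 0.5.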

combo.utils.utility.score_to_label(pred_scores, outliers_fraction=0.1)[source]

Turn raw outlier scores into binary labels (0 or 1).

Parameters
  • pred_scores (list or numpy array of shape (n_samples,)) – Raw outlier scores. Outliers are assumed to have larger values.

  • outliers_fraction (float in (0,1)) – Percentage of outliers.

Returns

outlier_labels – For each observation, indicates whether it should be considered an outlier (1) or not (0) according to the fitted model.

Return type

numpy array of shape (n_samples,)
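A hedged sketch of the thresholding: cut at the (1 - outliers_fraction) percentile of the scores and flag everything above it (the real implementation's percentile convention may differ slightly):

```python
import numpy as np

def score_to_label_sketch(pred_scores, outliers_fraction=0.1):
    # Hedged sketch: threshold at the (1 - fraction) percentile,
    # then flag scores above the threshold as outliers.
    pred_scores = np.asarray(pred_scores)
    threshold = np.percentile(pred_scores, 100 * (1 - outliers_fraction))
    return (pred_scores > threshold).astype(int)
```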

combo.utils.utility.score_to_proba(scores)[source]

Internal function to convert a raw score matrix into a probability matrix.

Parameters

scores (numpy array of shape (n_samples, n_classes)) – Raw score matrix.

Returns

proba – Scaled probability matrix.

Return type

numpy array of shape (n_samples, n_classes)
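One plausible scaling, sketched here under the assumption that the raw scores are non-negative, is to normalize each row so it sums to 1; the library's actual normalization may differ:

```python
import numpy as np

def score_to_proba_sketch(scores):
    # Hedged sketch: row-normalize a non-negative score matrix so
    # each row forms a probability distribution over classes.
    scores = np.asarray(scores, dtype=float)
    row_sums = scores.sum(axis=1, keepdims=True)
    return scores / row_sums
```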

combo.utils.utility.standardizer(X, X_t=None, keep_scalar=False)[source]

Conduct Z-normalization on data so that the input samples become zero-mean and unit-variance.

Parameters
  • X (numpy array of shape (n_samples, n_features)) – The training samples

  • X_t (numpy array of shape (n_samples_new, n_features), optional (default=None)) – The data to be converted

  • keep_scalar (bool, optional (default=False)) – The flag to indicate whether to also return the fitted scaler.

Returns

  • X_norm (numpy array of shape (n_samples, n_features)) – X after the Z-score normalization

  • X_t_norm (numpy array of shape (n_samples_new, n_features)) – X_t after the Z-score normalization

  • scalar (sklearn scaler object) – The scaler used in the conversion
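The fit-on-X, apply-to-X_t behavior can be sketched in plain numpy (the library itself uses a sklearn scaler; this hedged version omits the keep_scalar option):

```python
import numpy as np

def standardizer_sketch(X, X_t=None):
    # Hedged sketch of Z-score normalization: fit the mean and std on
    # X, then apply the same transform to X_t when it is given.
    X = np.asarray(X, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0)
    X_norm = (X - mean) / std
    if X_t is None:
        return X_norm
    return X_norm, (np.asarray(X_t, dtype=float) - mean) / std
```

Note that X_t is scaled with the statistics of X, not its own, so a model fitted on X_norm sees test data in the same coordinates.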

Module contents