Utils Functions
combo.utils.data module
Utility functions for manipulating data.
combo.utils.utility module
A set of utility functions to support model combination.
-
combo.utils.utility.argmaxn(value_list, n, order='desc')
Return the indices of the top n elements in the list if order is set to 'desc'; otherwise return the indices of the n smallest elements.
- Parameters
- Returns
index_list – The indices of the top n elements.
- Return type
numpy array of shape (n,)
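The selection described above can be sketched with plain NumPy. This illustrates the documented behavior and is not combo's implementation; `argmaxn_sketch` is a hypothetical name, and the order of the returned indices (best match first) is an assumption that may differ from the library.

```python
import numpy as np


def argmaxn_sketch(value_list, n, order="desc"):
    """Return indices of the n largest (order='desc') or n smallest values."""
    values = np.asarray(value_list, dtype=float)
    if not 1 <= n <= len(values):
        raise ValueError("n must be between 1 and len(value_list)")
    # argsort sorts ascending, so negate the values to rank largest first.
    keys = -values if order == "desc" else values
    return np.argsort(keys, kind="stable")[:n]


top2 = argmaxn_sketch([0.1, 0.9, 0.4, 0.7], 2)
bottom2 = argmaxn_sketch([0.1, 0.9, 0.4, 0.7], 2, order="asc")
```

Here `top2` holds the positions of 0.9 and 0.7, and `bottom2` the positions of 0.1 and 0.4.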
-
combo.utils.utility.check_detector(detector)
Check whether the fit and decision_function methods exist for the given detector.
- Parameters
detector (combo.models) – Detector instance for which the check is performed.
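A check of this kind can be sketched with `getattr`. The helper and both classes below are hypothetical names for illustration. The sketch returns a boolean, whereas how the library itself reports a failed check is not stated here.

```python
def has_detector_interface(detector):
    """Return True if the object exposes callable fit and decision_function."""
    required = ("fit", "decision_function")
    return all(callable(getattr(detector, name, None)) for name in required)


class DummyDetector:
    """Hypothetical detector exposing both required methods."""

    def fit(self, X):
        return self

    def decision_function(self, X):
        return [0.0 for _ in X]


class NotADetector:
    """Hypothetical class that lacks decision_function."""

    def fit(self, X):
        return self


ok = has_detector_interface(DummyDetector())
missing = has_detector_interface(NotADetector())
```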
-
combo.utils.utility.generate_bagging_indices(random_state, bootstrap_features, n_features, min_features, max_features)
Randomly draw feature indices. Internal use only.
Modified from sklearn/ensemble/bagging.py
- Parameters
random_state (RandomState) – A random number generator instance to define the state of the random permutations generator.
bootstrap_features (bool) – Specifies whether to bootstrap (sample with replacement) when generating indices
n_features (int) – Specifies the population size when generating indices
min_features (int) – Lower limit for number of features to randomly sample
max_features (int) – Upper limit for number of features to randomly sample
- Returns
feature_indices – Indices for features to bag
- Return type
numpy array, shape (n_samples,)
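As a rough sketch of feature bagging under the parameters above: first pick how many features to use, then draw that many column indices. `draw_feature_indices` is a hypothetical name, and treating both limits as inclusive is an assumption of this sketch, not something the description confirms.

```python
import numpy as np


def draw_feature_indices(random_state, bootstrap_features, n_features,
                         min_features, max_features):
    """Draw a random number of feature indices between the two limits."""
    # Assumption: both min_features and max_features are inclusive here.
    n_selected = random_state.randint(min_features, max_features + 1)
    if bootstrap_features:
        # Sampling with replacement: the same feature may appear twice.
        return random_state.randint(0, n_features, n_selected)
    # Sampling without replacement: take a prefix of a random permutation.
    return random_state.permutation(n_features)[:n_selected]


rng = np.random.RandomState(0)
X = np.arange(40).reshape(5, 8)  # 5 samples, 8 features
feature_indices = draw_feature_indices(rng, False, X.shape[1], 3, 6)
X_subset = X[:, feature_indices]  # keep only the drawn columns
```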
-
combo.utils.utility.generate_indices(random_state, bootstrap, n_population, n_samples)
Draw randomly sampled indices. Internal use only.
See sklearn/ensemble/bagging.py
- Parameters
random_state (RandomState) – A random number generator instance to define the state of the random permutations generator.
bootstrap (bool) – Specifies whether to bootstrap (sample with replacement) when generating indices
n_population (int) – Specifies the population size when generating indices
n_samples (int) – Specifies number of samples to draw
- Returns
indices – randomly drawn indices
- Return type
numpy array, shape (n_samples,)
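The difference between the two sampling modes can be sketched as follows. `draw_indices` is a hypothetical stand-in, and using a permutation prefix for the non-bootstrap case is an assumption rather than the library's exact routine.

```python
import numpy as np


def draw_indices(random_state, bootstrap, n_population, n_samples):
    """Draw n_samples indices from range(n_population)."""
    if bootstrap:
        # With replacement: indices may repeat.
        return random_state.randint(0, n_population, n_samples)
    # Without replacement: every index appears at most once.
    return random_state.permutation(n_population)[:n_samples]


rng = np.random.RandomState(42)
with_replacement = draw_indices(rng, True, 10, 5)
without_replacement = draw_indices(rng, False, 10, 5)
```

Passing the same seeded `RandomState` makes the draws reproducible across runs.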
-
combo.utils.utility.get_label_n(y, y_pred, n=None)
Turn raw outlier scores into binary labels by assigning 1 to the top n outlier scores.
- Parameters
y (list or numpy array of shape (n_samples,)) – The ground truth. Binary (0: inliers, 1: outliers).
y_pred (list or numpy array of shape (n_samples,)) – The raw outlier scores as returned by a fitted model.
n (int, optional (default=None)) – The number of outliers. If not defined, it is inferred from the ground truth.
- Returns
labels – Binary labels (0: normal points, 1: outliers).
- Return type
numpy array of shape (n_samples,)
Examples
>>> from combo.utils.utility import get_label_n
>>> y = [0, 1, 1, 0, 0]
>>> y_pred = [0.1, 0.5, 0.3, 0.2, 0.7]
>>> get_label_n(y, y_pred)
array([0, 1, 0, 0, 1])
-
combo.utils.utility.invert_order(scores, method='multiplication')
Invert the order of a list of values. The smallest value becomes the largest in the inverted list. This is useful when combining multiple detectors, since their score orderings may differ.
- Parameters
- Returns
inverted_scores – The inverted list
- Return type
numpy array of shape (n_samples,)
Examples
>>> from combo.utils.utility import invert_order
>>> scores1 = [0.1, 0.3, 0.5, 0.7, 0.2, 0.1]
>>> invert_order(scores1)
array([-0.1, -0.3, -0.5, -0.7, -0.2, -0.1])
>>> invert_order(scores1, method='subtraction')
array([0.6, 0.4, 0.2, 0. , 0.5, 0.6])
-
combo.utils.utility.list_diff(first_list, second_list)
Utility function to calculate the list difference (first_list - second_list).
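One way to read "list difference" is every element of the first list that does not occur in the second. The sketch below follows that reading. `list_difference` is a hypothetical name, and preserving the order and duplicates of `first_list` is an assumption.

```python
def list_difference(first_list, second_list):
    """Return the items of first_list that do not occur in second_list."""
    excluded = set(second_list)  # set lookup keeps the filter linear-time
    return [item for item in first_list if item not in excluded]


remaining = list_difference([1, 2, 3, 4, 2], [2, 4])
```

Here `remaining` is `[1, 3]`. The items must be hashable for the set lookup to work.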
-
combo.utils.utility.precision_n_scores(y, y_pred, n=None)
Utility function to calculate precision @ rank n.
- Parameters
y (list or numpy array of shape (n_samples,)) – The ground truth. Binary (0: inliers, 1: outliers).
y_pred (list or numpy array of shape (n_samples,)) – The raw outlier scores as returned by a fitted model.
n (int, optional (default=None)) – The number of outliers. If not defined, it is inferred from the ground truth.
- Returns
precision_at_rank_n – Precision at rank n score.
- Return type
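Precision at rank n is the fraction of true outliers among the n highest-scored points. The sketch below works under that definition. `precision_at_n` is a hypothetical name, and taking exactly n points when scores tie is an assumption that may differ from the library.

```python
import numpy as np


def precision_at_n(y, y_pred, n=None):
    """Fraction of true outliers among the n highest-scored samples."""
    y = np.asarray(y)
    y_pred = np.asarray(y_pred, dtype=float)
    if n is None:
        # Infer n from the ground truth: the number of labelled outliers.
        n = int(y.sum())
    top_n = np.argsort(-y_pred, kind="stable")[:n]
    return float(y[top_n].mean())


y = [0, 1, 1, 0, 0]
y_pred = [0.1, 0.5, 0.3, 0.2, 0.7]
p_default = precision_at_n(y, y_pred)   # n inferred as 2
p_top1 = precision_at_n(y, y_pred, n=1)
```

With the data above, the two highest scores (0.7 and 0.5) include one true outlier, so `p_default` is 0.5. The single highest score belongs to an inlier, so `p_top1` is 0.0.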
-
combo.utils.utility.score_to_label(pred_scores, outliers_fraction=0.1)
Turn raw outlier scores into binary labels (0 or 1).
- Parameters
pred_scores (list or numpy array of shape (n_samples,)) – Raw outlier scores. Outliers are assumed to have larger values.
outliers_fraction (float in (0,1)) – Fraction of outliers.
- Returns
outlier_labels – For each observation, indicates whether it should be considered an outlier (1) or not (0).
- Return type
numpy array of shape (n_samples,)
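A simple way to realize this conversion is to threshold the scores at the (1 - outliers_fraction) quantile. The sketch below does that with `np.percentile`. `scores_to_labels` is a hypothetical name, and the percentile-based threshold is an assumption, not a confirmed description of the library's internals.

```python
import numpy as np


def scores_to_labels(pred_scores, outliers_fraction=0.1):
    """Label the highest-scoring fraction of samples as outliers (1)."""
    scores = np.asarray(pred_scores, dtype=float)
    if not 0 < outliers_fraction < 1:
        raise ValueError("outliers_fraction must lie in (0, 1)")
    # Scores strictly above the quantile threshold are marked as outliers.
    threshold = np.percentile(scores, 100 * (1 - outliers_fraction))
    return (scores > threshold).astype(int)


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 5.0]
labels = scores_to_labels(scores, outliers_fraction=0.1)
```

With ten scores and a fraction of 0.1, only the clearly largest score (5.0) ends up labelled 1.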
-
combo.utils.utility.score_to_proba(scores)
Internal function to convert a raw score matrix into probabilities.
- Parameters
scores (numpy array of shape (n_samples, n_classes)) – Raw score matrix.
- Returns
proba – Scaled probability matrix.
- Return type
numpy array of shape (n_samples, n_classes)
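The description does not say which scaling is applied. One common choice, shown here purely as an assumption, is to divide each row by its sum so that the class scores of every sample add up to 1. `rows_to_proba` is a hypothetical name, and the sketch presumes non-negative scores with a positive sum in each row.

```python
import numpy as np


def rows_to_proba(scores):
    """Scale each row of a score matrix so that it sums to 1."""
    scores = np.asarray(scores, dtype=float)
    # keepdims=True keeps shape (n_samples, 1) so the division broadcasts by row.
    row_sums = scores.sum(axis=1, keepdims=True)
    return scores / row_sums


proba = rows_to_proba([[1.0, 3.0], [2.0, 2.0]])
```

For this input, the first row becomes `[0.25, 0.75]` and the second `[0.5, 0.5]`.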
-
combo.utils.utility.standardizer(X, X_t=None, keep_scalar=False)
Conduct Z-normalization on data so that the input samples have zero mean and unit variance.
- Parameters
X (numpy array of shape (n_samples, n_features)) – The training samples
X_t (numpy array of shape (n_samples_new, n_features), optional (default=None)) – The data to be converted
keep_scalar (bool, optional (default=False)) – The flag to indicate whether to return the scaler
- Returns
X_norm (numpy array of shape (n_samples, n_features)) – X after the Z-score normalization
X_t_norm (numpy array of shape (n_samples_new, n_features)) – X_t after the Z-score normalization
scalar (sklearn scaler object) – The scaler used in the conversion
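The key point of this kind of normalization is that the mean and standard deviation are estimated on the training samples only and then reused for the new data. The NumPy sketch below illustrates that idea. `z_normalize` is a hypothetical name, and the sketch does not return a scaler object as the documented function can.

```python
import numpy as np


def z_normalize(X, X_t=None):
    """Z-score X, and optionally X_t, using statistics computed on X only."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)   # population standard deviation (ddof=0)
    std[std == 0] = 1.0   # avoid division by zero for constant features
    X_norm = (X - mean) / std
    if X_t is None:
        return X_norm
    # New data is scaled with the training mean and std, not its own.
    X_t_norm = (np.asarray(X_t, dtype=float) - mean) / std
    return X_norm, X_t_norm


X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])
X_train_norm, X_test_norm = z_normalize(X_train, X_test)
```

After the call, each column of `X_train_norm` has zero mean and unit variance, while `X_test_norm` is expressed in the same units and need not be centered.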