trendminer_experimental.anomaly_detection.model module

This module contains functions for fitting, predicting, scoring and deploying models based on a Self-Organising Map (SOM).

class trendminer_experimental.anomaly_detection.model.TMAnomalyDetectionModelParameters(mean_of_train_data, sd_of_train_data, anomaly_detection_container, filtered_codebook, normalized_train_data)

Bases: tuple

The set of parameters defining a TMAnomalyModel.

anomaly_detection_container: Contains the BMU (best matching unit) coordinates, the point-to-BMU distances, the set of anomalous points (as indexes), and the BMU index container.

filtered_codebook: Codebook (i.e., the coordinates of the BMUs) in input space, after filtering out BMUs with zero hits from the training data.

mean_of_train_data: Mean of each variable in the training data, used to normalize test data.

normalized_train_data: Training data after normalization.

sd_of_train_data: Standard deviation of each variable in the training data, used to normalize test data.
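
The mean and standard deviation are kept so that test data can be put on the same scale as the training data. The following is a minimal sketch of that idea, assuming plain z-score normalization with the population standard deviation; the library may use a different estimator, and the helper names here are illustrative rather than part of trendminer_experimental.

```python
from statistics import mean, pstdev

def column_stats(rows):
    """Return per-column means and standard deviations of a list of rows."""
    columns = list(zip(*rows))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]

def normalize(rows, means, sds):
    """Z-score every value using statistics taken from the training data."""
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test = [[2.0, 40.0]]

# These two lists play the role of mean_of_train_data and sd_of_train_data.
means, sds = column_stats(train)

normalized_train = normalize(train, means, sds)
# Test data reuses the training statistics; it is never normalized on its own.
normalized_test = normalize(test, means, sds)
```

Reusing the training statistics matters: a test value that is unusually far from the training mean stays far from zero after normalization, which is what lets a distance-based model notice it.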

class trendminer_experimental.anomaly_detection.model.TMAnomalyModel

Bases: object

A model for detecting anomalies in (multivariate) time series data, based on a Self-Organising Map.

fit(train_data: Union[pandas.core.frame.DataFrame, List[pandas.core.frame.DataFrame]], number_of_iterations: int, verbose: bool = True, random_seed: Optional[int] = None) → Tuple[List[float], List[float]]

Fits the SOM to a DataFrame of normal (non-anomalous) data for a given number of iterations.

Example:

import matplotlib.pyplot as plt
# Jupyter/IPython magic; remove this line when running as a plain Python script.
%matplotlib inline

from trendminer.trendminer_client import TrendMinerClient
from trendminer.views import Views
from trendminer_experimental.anomaly_detection.model import TMAnomalyModel
from trendminer.ml.models import ZementisModels

client = TrendMinerClient('<token>', '<TrendMiner url>')
views = Views(client)
models = ZementisModels(client)

# Loading a view returns a list of DataFrames.
data_frames = views.load_view('<view_id>')

# Fit on the first DataFrame for 200 iterations and plot both measures of fit.
model = TMAnomalyModel()
quantization_error, topological_error = model.fit(data_frames[0], 200)
plt.plot(quantization_error)
plt.show()
plt.plot(topological_error)
plt.show()

# Export the model to PMML with a threshold of 0.95 and deploy it.
pmml = model.to_pmml('Anomalous Model', 0.95)
model_id = models.deploy_model(pmml)
model_id

Parameters

train_data – A Pandas DataFrame containing the training data, or a list of Pandas DataFrames (e.g., when loading a view with the TrendMiner Data Science SDK). If a list is passed, all DataFrames must have the same set of columns, because they are concatenated into a single DataFrame.
number_of_iterations – Number of iterations in the training loop to fit the model to the provided data.
verbose – Set to False to suppress output of the training algorithm.
random_seed – Set a random seed to make the training deterministic.

Returns

Two measures of fit: an ordered list of quantization errors (the lower the value, the better the model fits the data) and an ordered list of topological errors (a measure of how twisted the map is).
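
To make the two measures concrete, here is a small sketch using the textbook SOM definitions: quantization error as the mean distance from each sample to its best matching unit, and topological error as the fraction of samples whose two closest units are not neighbours on the map grid. Whether fit() computes its values in exactly this way is an assumption; the function names, the 8-neighbourhood rule and the toy map are illustrative only.

```python
import math

def two_best_units(sample, codebook):
    """Return the indexes of the closest and second-closest codebook vectors."""
    order = sorted(range(len(codebook)), key=lambda i: math.dist(sample, codebook[i]))
    return order[0], order[1]

def quantization_error(samples, codebook):
    """Mean distance between each sample and its best matching unit (BMU)."""
    total = 0.0
    for s in samples:
        bmu, _ = two_best_units(s, codebook)
        total += math.dist(s, codebook[bmu])
    return total / len(samples)

def topological_error(samples, codebook, grid):
    """Fraction of samples whose two closest units are not adjacent on the grid."""
    errors = 0
    for s in samples:
        first, second = two_best_units(s, codebook)
        (r1, c1), (r2, c2) = grid[first], grid[second]
        if max(abs(r1 - r2), abs(c1 - c2)) > 1:  # not in the 8-neighbourhood
            errors += 1
    return errors / len(samples)

# A 1 x 3 map: the grid position of each unit and its weight vector in input space.
grid = [(0, 0), (0, 1), (0, 2)]
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.2]]  # unit 2 is folded back next to unit 0
samples = [[0.0, 0.05], [1.0, 0.0]]

qe = quantization_error(samples, codebook)
te = topological_error(samples, codebook, grid)
```

In this toy map the first sample sits between units 0 and 2, which are two cells apart on the grid, so it counts as a topological error; the second sample does not.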

get_params() → trendminer_experimental.anomaly_detection.model.TMAnomalyDetectionModelParameters

Returns the parameters of the Anomaly Model.

Returns: The set of parameters of the model, see TMAnomalyDetectionModelParameters.

predict(test_data: Union[pandas.core.frame.DataFrame, List[pandas.core.frame.DataFrame]], threshold_percentage: float) → pandas.core.frame.DataFrame

Classifies test data as normal or anomalous using the previously trained SOM, given a percentage threshold (type-1 error).

Parameters

test_data – A DataFrame, or a list of DataFrames, with the same variables as the training data.
threshold_percentage – Threshold for classifying a point as anomalous, expressed as a fraction in the range [0, 1].

Returns

The test data frame with an additional column classifying each point as normal or anomalous.
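
One plausible reading of threshold_percentage, offered here as an assumption rather than as the library's documented behaviour, is that it selects a quantile of the anomaly scores seen on the training data, and that test points scoring above that cutoff are labelled anomalous. The sketch below illustrates that reading with hypothetical helper names and made-up scores.

```python
import math

def empirical_quantile(values, q):
    """Smallest value with at least a fraction q of the data at or below it."""
    ordered = sorted(values)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]

def classify(test_scores, train_scores, threshold_percentage):
    """Label each test score by comparing it with a cutoff derived from training scores."""
    cutoff = empirical_quantile(train_scores, threshold_percentage)
    return ["anomalous" if s > cutoff else "normal" for s in test_scores]

# Made-up anomaly scores (for example, distances to the best matching unit).
train_scores = [0.10, 0.12, 0.15, 0.20, 0.22, 0.25, 0.30, 0.31, 0.35, 0.90]
test_scores = [0.18, 0.40, 1.50]

labels = classify(test_scores, train_scores, 0.9)
```

Under this reading, a threshold of 0.9 means that roughly 10% of the (normal) training points would themselves be flagged, which is how the threshold relates to a type-1 error rate.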

score_samples(test_data: Union[pandas.core.frame.DataFrame, List[pandas.core.frame.DataFrame]]) → pandas.core.frame.DataFrame

Computes an anomaly score for each row of the test data using the previously trained SOM.

Parameters

test_data – A DataFrame, or a list of DataFrames, with the same variables as the training data.

Returns

The test DataFrame with the score for each row as an additional column.
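
As an illustration of what a per-row score can look like, the sketch below uses the distance from each row to its nearest codebook vector and attaches it as an extra field. That choice of score is an assumption suggested by the point-to-BMU distances stored in the model parameters; rows are represented as plain dictionaries instead of a DataFrame, and all names are hypothetical.

```python
import math

def score_rows(rows, codebook, columns):
    """Return copies of the rows with an added 'score': distance to the nearest unit."""
    scored = []
    for row in rows:
        point = [row[c] for c in columns]
        score = min(math.dist(point, unit) for unit in codebook)
        scored.append({**row, "score": score})
    return scored

# Two codebook vectors in a two-variable input space (already normalized).
codebook = [[0.0, 0.0], [1.0, 1.0]]
rows = [
    {"temp": 0.0, "pressure": 0.0},  # coincides with a unit
    {"temp": 4.0, "pressure": 5.0},  # far from both units
]

scored = score_rows(rows, codebook, ["temp", "pressure"])
```

The original fields are preserved and only a score is added, mirroring the documented return value of a test DataFrame with one additional column.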

to_pmml(model_name: str, threshold_percentage: float, variable_names: Optional[List[str]] = None) → str

Converts this model to its PMML representation.

Parameters

model_name – Name of the model.
threshold_percentage – Threshold for classifying a point as anomalous, expressed as a fraction in the range [0, 1].
variable_names – Optional list of variable names to use in the PMML representation of the model.

Returns

PMML representation of this model

Raises

WrongNumberOfVariableNamesException – If the provided list of variable names does not match the number of variables in the model.
DuplicateVariableNamesException – If the list of variable names contains duplicates.
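
The two error conditions can be pictured as a simple pre-check on variable_names. The sketch below is not the library's code: the exception classes are local stand-ins that only share their names with the documented ones, and the helper function and the order of the checks are assumptions.

```python
class WrongNumberOfVariableNamesException(Exception):
    """Local stand-in for the exception of the same name documented above."""

class DuplicateVariableNamesException(Exception):
    """Local stand-in for the exception of the same name documented above."""

def check_variable_names(variable_names, number_of_variables):
    """Validate a list of variable names against the number of model variables."""
    if len(variable_names) != number_of_variables:
        raise WrongNumberOfVariableNamesException(
            f"expected {number_of_variables} names, got {len(variable_names)}"
        )
    if len(set(variable_names)) != len(variable_names):
        raise DuplicateVariableNamesException("variable names must be unique")
    return list(variable_names)

# A model trained on two variables accepts exactly two distinct names.
names = check_variable_names(["temperature", "pressure"], 2)
```

Passing ["temperature"] for a two-variable model would raise the first exception, and ["temperature", "temperature"] would raise the second.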