trendminer_experimental.anomaly_detection.model module

This module contains functions for fitting, predicting with, scoring, and deploying models based on a Self Organising Map (SOM).

class trendminer_experimental.anomaly_detection.model.TMAnomalyDetectionModelParameters(mean_of_train_data, sd_of_train_data, anomaly_detection_container, filtered_codebook, normalized_train_data)

Bases: tuple

The set of parameters defining a TMAnomalyModel.

anomaly_detection_container

Contains the best matching unit (BMU) coordinates, the point-to-BMU distances, the set of anomalous points (as indexes), and the BMU index container.

filtered_codebook

Codebook (i.e. the coordinates of the BMUs) in input space, after filtering out BMUs with zero hits from the training data.

mean_of_train_data

The mean of each variable in the training data, used to normalize test data.

normalized_train_data

The training data after normalization.

sd_of_train_data

The standard deviation of each variable in the training data, used to normalize test data.
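The roles of mean_of_train_data, sd_of_train_data and filtered_codebook can be illustrated with a small sketch that uses only NumPy and pandas. It is not the library's implementation: the toy data, the hypothetical four-node codebook, and the use of the sample standard deviation are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy training and test data with the same two variables (illustrative values).
train = pd.DataFrame({"temp": [10.0, 12.0, 14.0], "flow": [1.0, 2.0, 3.0]})
test = pd.DataFrame({"temp": [12.0, 20.0], "flow": [2.0, 2.0]})

# Per-variable statistics are taken from the training data only.
mean_of_train_data = train.mean()
sd_of_train_data = train.std()  # sample standard deviation (ddof=1); an assumption

# Both sets are normalized with the *training* statistics.
normalized_train = (train - mean_of_train_data) / sd_of_train_data
normalized_test = (test - mean_of_train_data) / sd_of_train_data

# Hypothetical codebook of four SOM nodes in the normalized input space.
codebook = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])

# Best matching unit (BMU) of each training point: the nearest codebook vector.
distances = np.linalg.norm(
    normalized_train.to_numpy()[:, None, :] - codebook[None, :, :], axis=2
)
bmu_index = distances.argmin(axis=1)

# Count hits per node and keep only the nodes hit at least once.
hits = np.bincount(bmu_index, minlength=len(codebook))
filtered_codebook = codebook[hits > 0]
```

In this toy case the fourth node is never the BMU of any training point, so it is dropped from the filtered codebook.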

class trendminer_experimental.anomaly_detection.model.TMAnomalyModel

Bases: object

A model for detecting anomalies in (multivariate) time series data, based on a Self Organising Map (SOM).

fit(train_data: Union[pandas.core.frame.DataFrame, List[pandas.core.frame.DataFrame]], number_of_iterations: int, verbose: bool = True, random_seed: Optional[int] = None) → Tuple[List[float], List[float]]

Fits the SOM to a data frame (or list of data frames) of normal data for the given number of iterations.

Example:

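# Notebook example: "%matplotlib inline" is an IPython magic; omit it in a plain script.
import matplotlib.pyplot as plt
%matplotlib inline

from trendminer.trendminer_client import TrendMinerClient
from trendminer.views import Views
from trendminer_experimental.anomaly_detection.model import TMAnomalyModel
from trendminer.ml.models import ZementisModels

# Connect to TrendMiner and set up the view and model helpers.
client = TrendMinerClient('<token>', '<TrendMiner url>')
views = Views(client)
models = ZementisModels(client)

# Load the view; the result is a list of data frames.
data_frames = views.load_view('<view_id>')

# Fit the SOM on the first data frame for 200 iterations and plot both error curves.
model = TMAnomalyModel()
quantization_error, topological_error = model.fit(data_frames[0], 200)
plt.plot(quantization_error)
plt.show()
plt.plot(topological_error)
plt.show()

# Export the model to PMML with a 0.95 threshold and deploy it.
pmml = model.to_pmml('Anomalous Model', 0.95)
model_id = models.deploy_model(pmml)
model_id

Parameters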
  • train_data – A Pandas DataFrame containing the training data, or a list of Pandas DataFrames (e.g., when loading a view with the TrendMiner Data Science SDK). If a list is passed, all data frames must have the same set of columns, because they are concatenated into a single data frame.

  • number_of_iterations – Number of iterations in the training loop to fit the model to the provided data.

  • verbose – Set to False to suppress output of the training algorithm.

  • random_seed – Set a random seed to make the training deterministic.

Returns

Two measures of fit: an ordered list of quantization errors (the lower the value, the better the model fits the data) and an ordered list of topological errors (a measure of how twisted the net is).
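For intuition, the two measures can be sketched with the definitions commonly used for Self Organising Maps: the quantization error as the mean distance from each sample to its BMU, and the topological (often called topographic) error as the fraction of samples whose two closest nodes are not neighbours on the map grid. Whether fit() uses exactly these definitions is an assumption; the function names, the 1 x 3 grid, and the data below are hypothetical.

```python
import numpy as np


def quantization_error(data, codebook):
    """Mean Euclidean distance from each sample to its best matching unit."""
    distances = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return float(distances.min(axis=1).mean())


def topological_error(data, codebook, grid_positions):
    """Fraction of samples whose two closest nodes are not adjacent on the grid."""
    distances = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    nearest_two = np.argsort(distances, axis=1)[:, :2]
    first = grid_positions[nearest_two[:, 0]]
    second = grid_positions[nearest_two[:, 1]]
    # A Chebyshev distance of 1 on the grid counts as adjacent (an assumption).
    grid_gap = np.abs(first - second).max(axis=1)
    return float((grid_gap > 1).mean())


# Hypothetical 1 x 3 map: grid positions and the matching codebook vectors.
grid_positions = np.array([[0, 0], [0, 1], [0, 2]])
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
data = np.array([[0.1, 0.0], [1.1, 0.0], [1.9, 0.0]])

qe = quantization_error(data, codebook)
te = topological_error(data, codebook, grid_positions)

# Swapping two codebook vectors "twists" the map: grid neighbours are no
# longer neighbours in input space, so the topological error rises.
twisted_codebook = codebook[[0, 2, 1]]
te_twisted = topological_error(data, twisted_codebook, grid_positions)
```

The untwisted map has no topological error here, while the twisted one misplaces one of the three samples.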

get_params() → trendminer_experimental.anomaly_detection.model.TMAnomalyDetectionModelParameters

Returns the parameters of the Anomaly Model.

Returns

The set of parameters of the model, see TMAnomalyDetectionModelParameters.

predict(test_data: Union[pandas.core.frame.DataFrame, List[pandas.core.frame.DataFrame]], threshold_percentage: float) → pandas.core.frame.DataFrame

Classifies test data as normal or anomalous using the previously trained SOM, given a percentage threshold (type-1 error).

Parameters
  • test_data – A (list of) dataframe(s) with the same variables as the training data.

  • threshold_percentage – Threshold for classifying a point as anomalous, given as a fraction in the range [0, 1].

Returns

The test data frame with an additional column classifying each point as normal or anomalous.
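One plausible reading of threshold_percentage is as a quantile of the point-to-BMU distances seen during training: test points farther from their BMU than that quantile are flagged. The sketch below illustrates that reading only; it is an assumption, not the documented behaviour of predict(), and the column names bmu_distance and anomalous are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical point-to-BMU distances observed on the training data.
train_distances = np.array([0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 2.0])

# Hypothetical test data with a precomputed distance to the nearest BMU.
test = pd.DataFrame({"temp": [12.0, 35.0], "bmu_distance": [0.3, 3.1]})

# Assumed interpretation: the cutoff is the given quantile of training distances.
threshold_percentage = 0.9
cutoff = np.quantile(train_distances, threshold_percentage)

# Return a copy of the test data with an extra classification column.
result = test.copy()
result["anomalous"] = result["bmu_distance"] > cutoff
```

Under this reading, a higher threshold_percentage raises the cutoff and flags fewer points.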

score_samples(test_data: Union[pandas.core.frame.DataFrame, List[pandas.core.frame.DataFrame]]) → pandas.core.frame.DataFrame

Scores test data for anomaly using the previously trained SOM.

Parameters

test_data – A (list of) dataframe(s) with the same variables as the training data.

Returns

Test data frame with the score for each row as an additional column.
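As a rough sketch of what such a score could look like, the function below accepts a data frame or a list of data frames, normalizes with the training statistics, and uses the distance to the nearest codebook vector as the score. The function name score_against_codebook, the score column name, and the choice of distance as the score are assumptions for illustration, not the library's implementation.

```python
import numpy as np
import pandas as pd


def score_against_codebook(test_data, codebook, mean, sd):
    """Append a hypothetical 'score' column: distance to the nearest codebook vector."""
    # Accept a single data frame or a list of data frames with identical columns.
    frames = test_data if isinstance(test_data, list) else [test_data]
    combined = pd.concat(frames)
    # Normalize with the statistics of the training data.
    normalized = ((combined - mean) / sd).to_numpy()
    distances = np.linalg.norm(normalized[:, None, :] - codebook[None, :, :], axis=2)
    scored = combined.copy()
    scored["score"] = distances.min(axis=1)
    return scored


# Illustrative training statistics and a two-node codebook in normalized space.
mean = pd.Series({"temp": 10.0, "flow": 2.0})
sd = pd.Series({"temp": 2.0, "flow": 1.0})
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])

first = pd.DataFrame({"temp": [10.0, 12.0], "flow": [2.0, 3.0]})
second = pd.DataFrame({"temp": [16.0], "flow": [2.0]})

scored = score_against_codebook([first, second], codebook, mean, sd)
```

The first two rows coincide with codebook vectors and score zero; the third lies away from both nodes and receives a larger score.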

to_pmml(model_name: str, threshold_percentage: float, variable_names: Optional[List[str]] = None) → str

Converts this model to PMML representation.

Parameters
  • model_name – Name of the model.

  • threshold_percentage – Threshold for classifying a point as anomalous, in the range [0, 1].

  • variable_names – Optional list of variable names to use in the PMML representation of the model.

Returns

The PMML representation of this model.

Raises