trendminer_experimental.anomaly_detection.model module
This module provides functionality for fitting, predicting, scoring and deploying anomaly detection models based on Self-Organising Maps (SOMs).

class trendminer_experimental.anomaly_detection.model.TMAnomalyDetectionModelParameters(mean_of_train_data, sd_of_train_data, anomaly_detection_container, filtered_codebook, normalized_train_data)
Bases: tuple
The set of parameters defining a TMAnomalyModel.
anomaly_detection_container – Contains the best matching unit (BMU) coordinates, the point-to-BMU distances, the set of anomalous points (as indexes) and the BMU index container.
filtered_codebook – The codebook (i.e., the coordinates of the map units) in input space, after removing units that are not the BMU of any training point (zero hits).
mean_of_train_data – The mean of each variable in the training data, used to normalize test data.
normalized_train_data – The training data after normalization.
sd_of_train_data – The standard deviation of each variable in the training data, used to normalize test data.
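Because TMAnomalyDetectionModelParameters derives from tuple, its fields can presumably be read by name or by position, as with any named tuple. The sketch below mimics it with a hypothetical stand-in, ParamsSketch, and shows how mean_of_train_data and sd_of_train_data could be used to normalize test data. The z-score scheme (subtract the mean, divide by the standard deviation) is an assumption based on the attribute descriptions, as is the use of pandas' default sample standard deviation (ddof=1).

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in with the same field order as TMAnomalyDetectionModelParameters
ParamsSketch = namedtuple(
    "ParamsSketch",
    ["mean_of_train_data", "sd_of_train_data", "anomaly_detection_container",
     "filtered_codebook", "normalized_train_data"],
)


def normalize_like_training(test_df: pd.DataFrame, params: ParamsSketch) -> pd.DataFrame:
    # Assumed z-score normalization with the training statistics (aligned by column name)
    return (test_df - params.mean_of_train_data) / params.sd_of_train_data


train = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
params = ParamsSketch(
    mean_of_train_data=train.mean(),
    sd_of_train_data=train.std(),  # pandas default: sample standard deviation (ddof=1)
    anomaly_detection_container=None,  # placeholders: not needed for normalization
    filtered_codebook=None,
    normalized_train_data=None,
)
test = pd.DataFrame({"a": [2.0, 4.0], "b": [20.0, 40.0]})
print(normalize_like_training(test, params))
# Named tuples also allow positional access: params[0] is mean_of_train_data
```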

class trendminer_experimental.anomaly_detection.model.TMAnomalyModel
Bases: object
A model for detecting anomalies in (multivariate) time series data, based on a Self-Organising Map.

fit(train_data: Union[pandas.core.frame.DataFrame, List[pandas.core.frame.DataFrame]], number_of_iterations: int, verbose: bool = True, random_seed: Optional[int] = None) → Tuple[List[float], List[float]]
Fit the SOM to a data frame of normal (anomaly-free) data, using the given number of training iterations.
Example:
    # IPython magic: only valid inside a Jupyter notebook
    %matplotlib inline
    import matplotlib.pyplot as plt
    from trendminer.trendminer_client import TrendMinerClient
    from trendminer.views import Views
    from trendminer_experimental.anomaly_detection.model import TMAnomalyModel
    from trendminer.ml.models import ZementisModels

    client = TrendMinerClient('<token>', '<TrendMiner url>')
    views = Views(client)
    models = ZementisModels(client)

    # Loading a view yields a list of data frames; train on the first one
    data_frames = views.load_view('<view_id>')

    model = TMAnomalyModel()
    quantization_error, topological_error = model.fit(data_frames[0], 200)

    # Plot both measures of fit
    plt.plot(quantization_error)
    plt.show()
    plt.plot(topological_error)
    plt.show()

    # Export to PMML with a 0.95 threshold and deploy
    pmml = model.to_pmml('Anomalous Model', 0.95)
    model_id = models.deploy_model(pmml)
    model_id
Parameters
train_data – A pandas DataFrame containing the training data, or a list of pandas DataFrames (e.g., when loading a view with the TrendMiner Data Science SDK). If a list is passed, all data frames must have the same set of columns, because they are concatenated into a single data frame.
number_of_iterations – Number of iterations in the training loop to fit the model to the provided data.
verbose – Set to False to suppress output of the training algorithm.
random_seed – Set a random seed to make the training deterministic.
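The list handling described for train_data can be sketched as follows. combine_frames is a hypothetical helper, not part of the library; it only illustrates why matching columns matter: pandas.concat aligns frames by column name, so a column missing from one frame would be filled with NaN rather than raising an error.

```python
from typing import List, Union

import pandas as pd


def combine_frames(train_data: Union[pd.DataFrame, List[pd.DataFrame]]) -> pd.DataFrame:
    # Hypothetical helper: a single frame is used as-is
    if isinstance(train_data, pd.DataFrame):
        return train_data
    # All frames must share the same set of columns before concatenation
    expected = set(train_data[0].columns)
    for frame in train_data[1:]:
        if set(frame.columns) != expected:
            raise ValueError("all data frames must have the same set of columns")
    # Stack the frames row-wise into one big data frame
    return pd.concat(train_data)


part1 = pd.DataFrame({"temp": [20.1, 20.3], "flow": [5.0, 5.2]})
part2 = pd.DataFrame({"flow": [5.1], "temp": [20.2]})  # same columns, different order
combined = combine_frames([part1, part2])
print(combined.shape)  # (3, 2)
```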
Returns
Two measures of fit: an ordered list of quantization errors (the lower the value, the better the model fits the data) and an ordered list of topological errors, which measure how twisted the map is.
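The two fit measures can be illustrated with their common textbook definitions; whether TMAnomalyModel computes them exactly this way is an assumption. Here the quantization error is the mean distance from each sample to its BMU, and the topological error is the fraction of samples whose closest and second-closest units are not adjacent on the map grid (adjacency is assumed to include diagonals on a rectangular grid). som_errors and its arguments are hypothetical.

```python
import numpy as np


def som_errors(data: np.ndarray, codebook: np.ndarray, grid_positions: np.ndarray):
    """Return (quantization_error, topological_error) for one SOM state.

    data: (n_samples, n_features); codebook: (n_units, n_features);
    grid_positions: (n_units, 2) row/column of each unit on the map grid.
    """
    # Euclidean distance from every sample to every unit in input space
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    bmu, second = order[:, 0], order[:, 1]
    # Quantization error: mean distance between each sample and its BMU
    qe = dists[np.arange(len(data)), bmu].mean()
    # Topological error: share of samples whose two nearest units are not grid neighbours
    grid_gap = np.abs(grid_positions[bmu] - grid_positions[second]).max(axis=1)
    te = np.mean(grid_gap > 1)
    return qe, te


# A 1 x 3 map whose middle and right units are swapped ("twisted") in input space
codebook = np.array([[0.0], [2.0], [1.0]])
grid = np.array([[0, 0], [0, 1], [0, 2]])
samples = np.array([[0.1], [1.9]])
print(som_errors(samples, codebook, grid))  # QE about 0.1, TE 0.5
```

In the example, the sample at 0.1 has its BMU at grid column 0 but its second-closest unit at column 2, so it counts as a topological error; the sample at 1.9 does not.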

get_params() → trendminer_experimental.anomaly_detection.model.TMAnomalyDetectionModelParameters
Return the parameters of the anomaly model.
Returns
The set of parameters of the model; see TMAnomalyDetectionModelParameters.

predict(test_data: Union[pandas.core.frame.DataFrame, List[pandas.core.frame.DataFrame]], threshold_percentage: float) → pandas.core.frame.DataFrame
Classify each point of the test data as normal or anomalous using the previously trained SOM. The threshold percentage controls the type I error, i.e., the rate at which normal points are flagged as anomalous.
Parameters
test_data – A data frame, or a list of data frames, with the same variables as the training data.
threshold_percentage – Threshold for classifying a point as anomalous, as a fraction in [0, 1].
Returns
The test data frame with an additional column classifying each point as normal or anomalous.
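One common way to turn a threshold percentage into a decision, sketched here as an assumption rather than a description of TMAnomalyModel's internals, is to use that quantile of the anomaly scores observed on the training data as the cut-off. Under this scheme roughly a fraction 1 − threshold_percentage of normal points scores above the cut-off, which is the type I error the threshold controls. classify_by_quantile is a hypothetical helper.

```python
import numpy as np


def classify_by_quantile(train_scores, test_scores, threshold_percentage):
    # Assumed scheme: the cut-off is the given quantile of the training scores
    cutoff = np.quantile(train_scores, threshold_percentage)
    # True marks a point as anomalous (its score exceeds the cut-off)
    return np.asarray(test_scores) > cutoff


train_scores = np.arange(1, 101)  # scores 1..100 observed on normal training data
print(classify_by_quantile(train_scores, [50, 96, 100], 0.95))  # [False  True  True]
# On the training data itself, 5 of the 100 points (5%) are flagged
print(classify_by_quantile(train_scores, train_scores, 0.95).mean())  # 0.05
```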

score_samples(test_data: Union[pandas.core.frame.DataFrame, List[pandas.core.frame.DataFrame]]) → pandas.core.frame.DataFrame
Compute an anomaly score for each point of the test data using the previously trained SOM.
Parameters
test_data – A data frame, or a list of data frames, with the same variables as the training data.
Returns
Test data frame with the score for each row as an additional column.
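As a rough illustration of the kind of output score_samples returns, the sketch below scores each row by its Euclidean distance to the nearest codebook vector after normalizing with the training mean and standard deviation, then appends the result as a new column. The scoring rule, the helper score_rows and the column name "score" are all assumptions.

```python
import numpy as np
import pandas as pd


def score_rows(test_df: pd.DataFrame, mean: pd.Series, sd: pd.Series,
               codebook: np.ndarray, column: str = "score") -> pd.DataFrame:
    # Normalize with training statistics (assumed z-score scheme)
    x = ((test_df - mean) / sd).to_numpy()
    # Distance from each normalized row to every codebook vector
    # (codebook columns must follow the same variable order as test_df)
    dists = np.linalg.norm(x[:, None, :] - codebook[None, :, :], axis=2)
    # Assumed score: distance to the best matching unit, added as a new column
    scored = test_df.copy()
    scored[column] = dists.min(axis=1)
    return scored


test = pd.DataFrame({"a": [1.0, 4.0]})
mean, sd = pd.Series({"a": 1.0}), pd.Series({"a": 1.0})
codebook = np.array([[0.0], [1.0]])  # two map units in normalized space
scored = score_rows(test, mean, sd, codebook)
print(scored)  # scores 0.0 and 2.0
```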

to_pmml(model_name: str, threshold_percentage: float, variable_names: Optional[List[str]] = None) → str
Convert this model to its PMML representation.
Parameters
model_name – Name of the model.
threshold_percentage – Threshold for classifying a point as anomalous, as a fraction in [0, 1].
variable_names – Optional list of variable names to use in the PMML representation of the model.
Returns
The PMML representation of this model, as a string.
Raises
WrongNumberOfVariableNamesException – If the provided list of variable names does not match the number of variables in the model.
DuplicateVariableNamesException – If the list of variable names contains duplicates.
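The two documented error cases can be pictured with a small validation routine. This is a hypothetical sketch: the exception classes below are local stand-ins that share the documented names but are not the library's own classes, check_variable_names is invented for illustration, and the order in which TMAnomalyModel performs the two checks is not documented.

```python
from typing import List, Optional


class WrongNumberOfVariableNamesException(Exception):
    """Local stand-in for the library exception of the same name."""


class DuplicateVariableNamesException(Exception):
    """Local stand-in for the library exception of the same name."""


def check_variable_names(variable_names: Optional[List[str]],
                         n_variables: int) -> Optional[List[str]]:
    # No names given: nothing to validate
    if variable_names is None:
        return None
    # One name is required per model variable
    if len(variable_names) != n_variables:
        raise WrongNumberOfVariableNamesException(
            f"expected {n_variables} names, got {len(variable_names)}")
    # Names must be unique
    if len(set(variable_names)) != len(variable_names):
        raise DuplicateVariableNamesException("variable names must be unique")
    return list(variable_names)


print(check_variable_names(["TI-101", "FI-202"], 2))  # ['TI-101', 'FI-202']
```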