Code Examples
A repository of 155 code examples for BeepBeep
mining.extraction.KmeansSymbolDistribution Class Reference

Create clusters over the distribution of symbols in a set of input streams. More...

Static Public Member Functions

static void main (String[] args) throws FunctionException
 

Detailed Description

Create clusters over the distribution of symbols in a set of input streams.

In this example, input traces are made of symbols a and b. A set of seven such traces is located in the file strings-1.csv. For each of these traces, the pattern processor β computes a feature vector made of two numbers, corresponding to the fraction of a's and b's in the trace. This is done by

  • slicing the input trace according to the current symbol
  • counting the events in each slice
  • taking the values of the resulting map into a list, and normalizing them (so that their sum is equal to 1)

For example, on the input sequence

a, b, a, a, b, b

the resulting feature vector would be (0.4, 0.6).

We then use the K-means clustering algorithm to find the centroids of two clusters based on those feature vectors.

The processor mining function is therefore parameterized as follows:

ParameterValue
Processor graph
Processor graph
Processor graph
Processor graph
(K-means with K=2)

The traces in the input CSV file either have an approximate 30%-70% distribution of a's and b's, or the reverse. The feature vectors can be plotted as follows, with each dot representing the a-b distribution of a single trace.

Clustering graph

Applying the k-means algorithm, with k=2, will compute two cluster centers, represented by crosses in the above plot.

Author
Sylvain Hallé

Definition at line 96 of file KmeansSymbolDistribution.java.


The documentation for this class was generated from the following file: