TICC: Time Series Analysis

TICC

Toeplitz Inverse covariance-based Clustering (TICC).
TICC is a Python solver for efficient segmentation and clustering of multivariate time series.
TICC takes as input a T-by-n data matrix, the regularization parameter "lambda", the smoothness parameter "beta", the window size "w", and the number of clusters "K".
TICC breaks the T timestamps into segments, each assigned to one of the K clusters.
It is implemented as an EM-style algorithm in which TICC alternates between assigning points to clusters with a dynamic programming (DP) algorithm and updating the cluster parameters by solving a Toeplitz inverse covariance estimation problem. Details can be found in the paper.
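
To make that alternation concrete, here is a heavily simplified, self-contained sketch in Python (window size fixed at 1, a ridge-regularized inverse of the empirical covariance in place of the Toeplitz graphical lasso, and a Viterbi-style DP for the assignments). It only illustrates the alternating structure and is not the package's implementation or API:

    import numpy as np

    def ticc_sketch(X, K, beta, max_iters=20, seed=0):
        """Simplified sketch of TICC's alternating optimization (illustration only)."""
        rng = np.random.default_rng(seed)
        T, n = X.shape
        labels = rng.integers(0, K, size=T)            # random initial cluster assignment

        def neg_loglik(mean, inv_cov):
            # negative log-likelihood (up to constants) of every timestamp under one cluster
            d = X - mean
            _, logdet = np.linalg.slogdet(inv_cov)
            return 0.5 * (np.einsum("ti,ij,tj->t", d, inv_cov, d) - logdet)

        for _ in range(max_iters):
            # "M-step": re-estimate each cluster's mean and inverse covariance
            # (the real solver fits a sparse block-Toeplitz inverse covariance here)
            means, inv_covs = [], []
            for k in range(K):
                pts = X[labels == k] if np.sum(labels == k) > 1 else X
                means.append(pts.mean(axis=0))
                cov = np.cov(pts, rowvar=False) + 1e-3 * np.eye(n)   # ridge for stability
                inv_covs.append(np.linalg.inv(cov))

            # "E-step": dynamic programming over timestamps; switching clusters costs beta
            cost = np.column_stack([neg_loglik(means[k], inv_covs[k]) for k in range(K)])
            dp = cost.copy()
            back = np.zeros((T, K), dtype=int)
            for t in range(1, T):
                prev = dp[t - 1][None, :] + beta * (1 - np.eye(K))   # staying is free
                back[t] = prev.argmin(axis=1)
                dp[t] = cost[t] + prev.min(axis=1)
            new_labels = np.empty(T, dtype=int)
            new_labels[-1] = int(dp[-1].argmin())
            for t in range(T - 2, -1, -1):
                new_labels[t] = back[t + 1, new_labels[t + 1]]

            if np.array_equal(new_labels, labels):                   # converged
                break
            labels = new_labels
        return labels, inv_covs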

Download & Setup

Download the source code by running the following in a terminal:

            git clone https://github.com/davidhallac/TICC.git
Files

The TICC package has the following important files:

    ticc.py

Runs an instance of the TICC algorithm.

Parameters

lambda_parameter: the regularization parameter "lambda", as described in the paper

beta: the smoothness parameter "beta", which controls the smoothness of the output, as described in the paper

number_of_clusters: the number of clusters "K" into which the timestamps are grouped

window_size: the size of the sliding window

prefix_string: the location of the output files

threshold: used to generate the cross-time plots; not part of the TICC algorithm

input_file: the location of the T-by-n data file

maxIters: the maximum number of iterations of the TICC algorithm

Returns

Saves a .csv file with the inverse covariance of each cluster
Saves a .csv file with the assignment of each of the timestamps to one of the "K" clusters
Prints the binary accuracy, if the correct method for computing the confusion matrix is specified
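
For illustration, the saved .csv outputs can then be inspected with numpy; the file names below are hypothetical placeholders (the real names depend on prefix_string and on how ticc.py writes its output), so substitute the names produced by your run:

    import numpy as np

    # Hypothetical output file names -- replace with the files written by your run
    assignments = np.loadtxt("output_folder/cluster_assignments.csv", delimiter=",")
    inv_cov_0 = np.loadtxt("output_folder/inverse_covariance_cluster_0.csv", delimiter=",")

    print("number of timestamps:", assignments.shape[0])
    print("cluster 0 inverse covariance shape:", inv_cov_0.shape)   # (n*w) x (n*w) block matrix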

    car.py

Runs an instance of the TICC algorithm on the car example (case study) described in the paper. The parameters are the same as in the TICC example. Note: this file is specific to the car example and includes input-data handling particular to the automotive dataset. To run the TICC algorithm in general, use ticc.py or ticc_solver.py instead.

    network_accuracy.py

Runs an instance of the TICC algorithm on the T-by-n data matrix, as described in the paper. Used to generate the network accuracy table shown in the paper. The parameters are the same as in the TICC example.
Returns
Saves a .csv file with the inverse covariance of each cluster
Saves a .csv file with the assignment of each of the timestamps to one of the "K" clusters
Prints the network F1 score for each cluster, assuming the "true" networks are stored in the way specified in the file
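
For intuition, an edge-recovery F1 score of this kind is commonly computed by comparing the off-diagonal sparsity pattern of the estimated inverse covariance with that of the true one; the sketch below shows one standard way to do so and is not necessarily the exact scoring used by network_accuracy.py:

    import numpy as np

    def network_f1(true_inv_cov, est_inv_cov, tol=1e-6):
        """F1 score on the recovered edge set (off-diagonal nonzeros) of a cluster's MRF."""
        true_edges = np.abs(true_inv_cov) > tol
        est_edges = np.abs(est_inv_cov) > tol
        np.fill_diagonal(true_edges, False)          # ignore self-edges
        np.fill_diagonal(est_edges, False)
        tp = np.sum(true_edges & est_edges)
        precision = tp / max(np.sum(est_edges), 1)
        recall = tp / max(np.sum(true_edges), 1)
        return 2 * precision * recall / max(precision + recall, 1e-12)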

    generate_synthetic_data.py

Generates data using the method described in the paper.
The data are generated from "K" clusters. The "T" timestamps are divided into segments, whose end points and generating clusters are given by the break_points array and the seg_ids list, respectively. The length of segment i is therefore break_points[i + 1] - break_points[i].
Parameters
window_size: the size of the sliding window
number_of_sensors: the dimension "n" of the output T-by-n data matrix
sparsity_inv_matrix: the sparsity of each cluster's MRF, i.e. the sparsity of the inverse covariance matrix of each cluster
rand_seed: the seed used for random number generation
number_of_clusters: the number of clusters "K" from which the timestamps are generated
cluster_ids: the cluster ID corresponding to each generated segment
break_points: the end points of the segments, so the length of segment i is break_points[i + 1] - break_points[i]
save_inverse_covariances: boolean indicating whether the computed inverse covariance of each cluster should be saved as "inverse covariance cluster = cluster#.csv"
out_file_name: the name of the .csv file in which the generated data matrix should be stored
Returns
Saves the T-by-n data matrix as a .csv file
If the save_inverse_covariances flag is True, saves a .csv file with the inverse covariance of each cluster
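
As a concrete illustration of the break_points / seg_ids (cluster_ids) convention and the segment-length formula above (the values below are made up, not the script's defaults, and a leading 0 entry is assumed):

    # Illustrative values only -- not the defaults in generate_synthetic_data.py
    break_points = [0, 200, 500, 800]   # segment i spans [break_points[i], break_points[i+1])
    seg_ids = [0, 1, 0]                 # cluster generating each segment

    for i, cluster in enumerate(seg_ids):
        length = break_points[i + 1] - break_points[i]
        print(f"segment {i}: {length} timestamps from cluster {cluster}")
    # segment 0: 200 timestamps from cluster 0
    # segment 1: 300 timestamps from cluster 1
    # segment 2: 300 timestamps from cluster 0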

    scalability_test.py

Runs an instance of the scalability test. Prints the time taken by each step: the E-step (the DP algorithm) and the M-step (optimization using the Toeplitz graphical lasso).

Parameters
number_of_clusters: the number of clusters "K" into which the timestamps are grouped
window_size: the size of the sliding window
input_file: the location of the T-by-n data file
maxIters: the maximum number of iterations of the TICC algorithm
Output
Prints the time spent in each step of the TICC algorithm. This function is used to generate the scalability plot in the paper.

    ticc_solver.py

The solver for the TICC algorithm. Contains all the important functions. The solve function in this file runs an instance of the TICC algorithm. The details of the solve function are as follows:

Parameters
window_size: the size of the sliding window
maxIters: the maximum number of iterations before the TICC algorithm terminates. The default value is 100.
lambda_parameter: the sparsity of each cluster's MRF, i.e. the sparsity of the inverse covariance matrix of each cluster
beta: the switching penalty used in the TICC algorithm; the same as the "beta" parameter described in the paper
number_of_clusters: the number of clusters "K" into which the timestamps are grouped
threshold: a threshold parameter used for visualization; not part of the TICC algorithm
input_file: the location of the T-by-n data matrix
prefix_string: the location of the folder in which to save the output
write_out_file: boolean indicating whether the computed inverse covariance of each cluster should be saved as "inverse covariance cluster = cluster#.csv"
Returns
Returns the cluster assignment of each timestamp.
Returns a dictionary whose keys are the cluster IDs (from 0 to K-1) and whose values are the corresponding cluster MRFs.

Example Usage

Generate Data. If you already have a data matrix, skip this step.

To generate the data described in the paper, use generate_synthetic_data.py.

Change the break_points and seg_ids parameters to define the temporal pattern of the time series to be generated.

Use sparsity_inv_matrix to define the sparsity of each cluster's MRF. window_size and number_of_sensors can also be set as appropriate for your application. Then run the following command:

python generate_synthetic_data.py

Next, use the ticc.py file to run an instance of the TICC algorithm on the data matrix.

ticc.py should be initialized with the following parameters: the smoothness parameter "beta", the sparsity regularization "lambda", the window size, the maximum number of iterations before convergence, the number of clusters, and the input and output file locations. After updating these in the ticc.py file, run the following command:

python ticc.py
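
The parameter block edited inside ticc.py typically looks something like the following; the variable names and values are illustrative only and should be matched to the variables actually defined in your copy of the file:

    # Illustrative settings (adjust names/values to match your copy of ticc.py)
    lambda_parameter = 11e-2        # sparsity regularization "lambda"
    beta = 600                      # smoothness / switching penalty "beta"
    window_size = 10
    number_of_clusters = 5
    maxIters = 100                  # maximum iterations before convergence
    input_file = "data.csv"
    prefix_string = "output_folder/"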

To generate the network accuracy table, use the network_accuracy.py file. Set the same parameters as above in the network_accuracy.py file, and save the true inverse covariances as "inverse covariance cluster = cluster#.csv" in the same directory as the network_accuracy.py file.

Then run:

python network_accuracy.py

To run the scalability experiments, use the scalability_test.py file. Set the parameters in the file to the same values as in ticc.py and run the following command:

python scalability_test.py

To use the solver on your own data, the usage is shown below. Pass in the parameters as described in the paper, and use the cluster_assignment and cluster_MRFs outputs as needed for your application.

    import ticc_solver as TICC

    # beta and maxIters values shown here are illustrative; set them for your application
    (cluster_assignment, cluster_MRFs) = TICC.solve(
        window_size=10, number_of_clusters=5, lambda_parameter=11e-2,
        beta=600, maxIters=100, threshold=2e-5,
        write_out_file=False, input_file="data.csv", prefix_string="output_folder/")
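
The two return values can then be used directly; for example (a brief sketch continuing from the call above):

    import numpy as np

    # cluster_assignment: one cluster label per timestamp
    print("first 20 assignments:", np.asarray(cluster_assignment)[:20])

    # cluster_MRFs: dictionary mapping cluster id (0 .. K-1) to that cluster's MRF
    for cluster_id, mrf in cluster_MRFs.items():
        density = np.mean(np.abs(mrf) > 1e-6)
        print(f"cluster {cluster_id}: MRF shape {mrf.shape}, edge density {density:.2f}")
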
References

TICC paper: http://stanford.edu/~hallac/ticc.pdf
Code and solver: http://snap.stanford.edu/ticc/
