Time Series Clustering

Time series clustering partitions time series data into groups based on similarity or distance, so that time series in the same cluster are similar. For time series clustering with R, the first step is to work out an appropriate distance/similarity metric; the second step is to use existing clustering techniques, such as k-means, hierarchical clustering, density-based clustering or subspace clustering, to find clustering structures.

Dynamic Time Warping (DTW) finds the optimal alignment between two time series, and the DTW distance is used as the distance metric in the example below.
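Before turning to the data, it may help to see what a DTW distance actually computes. The following is a minimal base-R sketch of the textbook dynamic-programming recurrence, assuming an absolute-difference local cost and the three basic steps (match, advance one series, advance the other). The function name `dtw_distance` is hypothetical and is not part of the dtw package, which offers its own step patterns and may therefore return different values.

```r
# Minimal DTW distance between two numeric vectors (illustrative sketch only).
# Assumptions: local cost = absolute difference; no windowing or normalisation.
dtw_distance <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # cost[i + 1, j + 1] holds the best cumulative cost of aligning
  # x[1..i] with y[1..j]; row/column 1 is the empty-prefix boundary.
  cost <- matrix(Inf, nrow = n + 1, ncol = m + 1)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      d <- abs(x[i] - y[j])
      cost[i + 1, j + 1] <- d + min(cost[i, j],      # both series advance
                                    cost[i, j + 1],  # x advances, y[j] is reused
                                    cost[i + 1, j])  # y advances, x[i] is reused
    }
  }
  cost[n + 1, m + 1]
}

a <- c(0, 1, 2, 1, 0)
b <- c(0, 0, 1, 2, 1, 0)   # same shape as a, delayed by one step
dtw_distance(a, b)         # the delay is absorbed by the warping
dtw_distance(c(0, 0), c(1, 1))
```

Because the warping path may repeat points, two series with the same shape but a small time shift come out as close, which is what makes DTW attractive for clustering control charts.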
A data set of Synthetic Control Chart Time Series is used here, which contains 600 examples of control charts. Each control chart is a time series with 60 values. There are six classes: 1) 1-100 Normal, 2) 101-200 Cyclic, 3) 201-300 Increasing trend, 4) 301-400 Decreasing trend, 5) 401-500 Upward shift, and 6) 501-600 Downward shift. The dataset is downloadable at UCI KDD Archive.

> sc <- read.table("E:/rtmp/synthetic_control.data", header=FALSE, sep="")
# randomly sample n cases from each class, to make it easy for plotting
> n <- 10
> s <- sample(1:100, n)
> idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s)
> sample2 <- sc[idx,]
> observedLabels <- c(rep(1,n), rep(2,n), rep(3,n), rep(4,n), rep(5,n), rep(6,n))
# compute DTW distances
> library(dtw)
> distMatrix <- dist(sample2, method="DTW")
# hierarchical clustering
> hc <- hclust(distMatrix, method="average")
> plot(hc, labels=observedLabels, main="")
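The dendrogram gives a visual impression only. One way to quantify how well the clusters match the known classes is to cut the tree and cross-tabulate the result against the labels. The sketch below does this on hypothetical toy data (three well-separated groups generated on the spot), and uses Euclidean distance in place of DTW so that it runs with base R alone; all variable names are illustrative.

```r
set.seed(1)

# Hypothetical toy data: 3 groups of 5 series, 20 points each
n_per_group <- 5
len <- 20
time_idx <- seq_len(len)
flat <- t(replicate(n_per_group, rnorm(len, mean = 0, sd = 0.1)))
up   <- t(replicate(n_per_group,  0.5 * time_idx + rnorm(len, sd = 0.1)))
down <- t(replicate(n_per_group, -0.5 * time_idx + rnorm(len, sd = 0.1)))
toy <- rbind(flat, up, down)            # one series per row
labels <- rep(1:3, each = n_per_group)  # known group of each row

# Assumption: Euclidean distance stands in for DTW to avoid extra packages
d <- dist(toy)
hc_toy <- hclust(d, method = "average")

# Cut the tree into as many clusters as there are known groups
clusters <- cutree(hc_toy, k = 3)

# Rows: known labels; columns: cluster assigned by cutree()
confusion <- table(labels, clusters)
print(confusion)
```

With the objects from the example above, the same idea would presumably read `table(observedLabels, cutree(hc, k=6))`.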
Time Series Classification

Time series classification builds a classification model based on labelled time series and then uses the model to predict the labels of unlabelled time series. The approach to time series classification with R is to extract and build features from the time series data first, and then apply existing classification techniques, such as SVM, k-NN, neural networks, regression and decision trees, to the feature set.

Discrete Wavelet Transform (DWT) provides a multi-resolution representation using wavelets and is used in the example below. Another popular feature extraction technique is Discrete Fourier Transform (DFT).

# extracting DWT coefficients (with Haar filter)
> library(wavelets)
> wtData <- NULL
> for (i in 1:nrow(sc)) {
+   a <- t(sc[i,])
+   wt <- dwt(a, filter="haar", boundary="periodic")
+   wtData <- rbind(wtData, unlist(c(wt@W, wt@V[[wt@level]])))
+ }
> wtData <- as.data.frame(wtData)

# set class labels into categorical values
# (factor() keeps the labels categorical regardless of the stringsAsFactors default)
> classId <- factor(c(rep("1",100), rep("2",100), rep("3",100),
+                     rep("4",100), rep("5",100), rep("6",100)))
> wtSc <- data.frame(cbind(classId, wtData))

# build a decision tree with ctree()
> library(party)
> ct <- ctree(classId ~ ., data=wtSc,
+             controls = ctree_control(minsplit=30, minbucket=10, maxdepth=5))
> pClassId <- predict(ct)

# check predicted classes against original class labels
> table(classId, pClassId)
# accuracy
> (sum(classId==pClassId)) / nrow(wtSc)
[1] 0.8716667

> plot(ct, ip_args=list(pval=FALSE), ep_args=list(digits=0))

More examples on time series analysis and mining with R and other data mining techniques can be found in my book "R and Data Mining: Examples and Case Studies", which is downloadable as a PDF file at the link.
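As a footnote to the classification example: the DFT, mentioned above as another popular feature extraction technique, can be sketched with base R's `fft()`. This is only an illustration under stated assumptions. The function name `dft_features`, the parameter `k`, and the choice of keeping the first `k` magnitudes after removing the mean are hypothetical, not taken from the example above.

```r
# Illustrative DFT feature extraction (hypothetical helper, base R only).
# Returns the magnitudes of the k lowest non-zero frequencies of a series.
dft_features <- function(x, k = 5) {
  x <- as.numeric(x)
  # Remove the mean so the zero-frequency term carries no information,
  # then scale the magnitude spectrum by the series length.
  spec <- Mod(fft(x - mean(x))) / length(x)
  spec[2:(k + 1)]   # element 1 is the zero-frequency term, so skip it
}

# A toy cyclic series: exactly three full cycles over 60 points
n <- 60
time_idx <- 0:(n - 1)
cyclic <- sin(2 * pi * 3 * time_idx / n)

feat <- dft_features(cyclic, k = 5)
which.max(feat)   # position of the dominant frequency among the k features
```

Applied row by row, for instance `t(apply(sc, 1, dft_features))`, this would give a feature matrix that could take the place of `wtData` in the decision tree example, although how well it classifies the control charts is not something this sketch establishes.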