Introduction to the FCM clustering algorithm

Source: Internet
Author: User

The FCM algorithm is a partition-based clustering algorithm. Its idea is to make the similarity between objects assigned to the same cluster as large as possible and the similarity between different clusters as small as possible. The fuzzy C-means algorithm is an improvement on the ordinary C-means algorithm: ordinary C-means partitions the data hard, whereas FCM performs a flexible, fuzzy partition. Before introducing the FCM algorithm itself, we review some basic knowledge of fuzzy sets.

1 Basic knowledge of fuzzy sets

First, the concept of the membership function. A membership function expresses the degree to which an object $x$ belongs to a set $A$ and is usually written $\mu_A(x)$. Its domain is every object that may belong to $A$ (that is, all points of the space in which $A$ lies), and its range is $[0,1]$, i.e. $0 \le \mu_A(x) \le 1$. $\mu_A(x) = 1$ means that $x$ belongs fully to $A$, which is equivalent to $x \in A$ in the traditional set concept. A membership function defined on the space $X = \{x\}$ defines a fuzzy set $A$. For a finite number of objects $x_1, x_2, \dots, x_n$, the fuzzy set can be represented as:

$$A = \{(\mu_A(x_i), x_i) \mid x_i \in X\} \qquad (6.1)$$

With fuzzy sets, an element's membership in a set is no longer hard. In a clustering problem, each cluster can be regarded as a fuzzy set, so the membership degree of each sample point in each cluster is a value in the interval $[0,1]$.
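As a small illustration of a membership function, the sketch below builds a fuzzy set as a list of (membership degree, element) pairs. The triangular shape, the function name `triangular_membership` and the "warm temperature" example are assumptions made for illustration only; they do not come from the text above.

```python
def triangular_membership(x, left, peak, right):
    """Degree in [0, 1] to which x belongs to a triangle-shaped fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy set "warm" over temperatures in degrees Celsius.
temperatures = [10, 18, 22, 26, 35]
warm = [(triangular_membership(t, 15, 22, 30), t) for t in temperatures]

for degree, t in warm:
    print(f"mu_warm({t}) = {degree:.3f}")
```

A degree of 1 corresponds to ordinary (crisp) membership, a degree of 0 to non-membership, and anything in between to partial membership.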

2 Introduction to the K-means clustering algorithm (HCM, K-means)

K-means clustering, also known as C-means clustering, has been applied in many fields. Its core idea is as follows: the algorithm partitions $n$ vectors $x_j$ ($j = 1, 2, \dots, n$) into $c$ groups $G_i$ ($i = 1, 2, \dots, c$) and finds the cluster center of each group such that a value function (or objective function) of a dissimilarity (or distance) measure is minimized. When the Euclidean distance is chosen as the dissimilarity measure between a vector $x_k$ in group $i$ and the corresponding cluster center $c_i$, the value function can be defined as:

$$J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \Bigl( \sum_{k,\, x_k \in G_i} \lVert x_k - c_i \rVert^2 \Bigr) \qquad (6.2)$$

Here $J_i = \sum_{k,\, x_k \in G_i} \lVert x_k - c_i \rVert^2$ is the value function within group $i$. The value of $J_i$ depends on the geometrical characteristics of $G_i$ and the position of $c_i$.

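More generally, a generic distance function $d(x_k, c_i)$ can be used for a vector $x_k$ in group $i$, and the corresponding total value function can be expressed as:

$$J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \Bigl( \sum_{k,\, x_k \in G_i} d(x_k, c_i) \Bigr) \qquad (6.3)$$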

For simplicity, the Euclidean distance is used here as the dissimilarity measure between vectors, so the total value function is the one given in (6.2).
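The value function with squared Euclidean distance can be evaluated directly. The following is a minimal sketch, assuming each group is given as a list of points and each center as a tuple; the names `squared_distance` and `hard_cost` and the toy data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def hard_cost(groups, centers):
    """Sum, over all groups, of squared distances from members to their center."""
    return sum(
        squared_distance(x, center)
        for group, center in zip(groups, centers)
        for x in group
    )


# Toy data: two groups of two points each, with their centers.
groups = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (12.0, 0.0)]]
centers = [(0.0, 1.0), (11.0, 0.0)]
J = hard_cost(groups, centers)
print(J)
```

Each of the four points lies at distance 1 from its center, so every term contributes 1 to the total.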

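The partition into groups is generally defined by a $c \times n$ two-dimensional membership matrix $U$. If the $j$-th data point $x_j$ belongs to group $i$, the element $u_{ij}$ of $U$ is 1; otherwise it is 0. Once the cluster centers $c_i$ are fixed, the $u_{ij}$ that minimize (6.2) can be derived as:

$$u_{ij} = \begin{cases} 1 & \text{if } \lVert x_j - c_i \rVert^2 \le \lVert x_j - c_k \rVert^2 \text{ for every } k \ne i, \\ 0 & \text{otherwise.} \end{cases} \qquad (6.4)$$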

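In other words, $x_j$ belongs to group $i$ if $c_i$ is the cluster center nearest to $x_j$. Since a given data point can belong to only one group, the membership matrix $U$ has the following properties:

$$\sum_{i=1}^{c} u_{ij} = 1, \quad \forall j = 1, \dots, n \qquad (6.5)$$

and

$$\sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij} = n \qquad (6.6)$$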
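The nearest-center rule and the two properties of the hard membership matrix can be checked on a toy example. This is a sketch under the assumption that ties are broken in favor of the lowest group index; `hard_membership` and the sample data are hypothetical names and values.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def hard_membership(points, centers):
    """Return a c x n matrix of 0/1 with a single 1 per column (nearest center)."""
    c, n = len(centers), len(points)
    U = [[0] * n for _ in range(c)]
    for j, x in enumerate(points):
        nearest = min(range(c), key=lambda i: squared_distance(x, centers[i]))
        U[nearest][j] = 1
    return U


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (12.0, 0.0)]
centers = [(0.0, 1.0), (11.0, 0.0)]
U = hard_membership(points, centers)

# Every column sums to 1, and all entries together sum to n.
column_sums = [sum(U[i][j] for i in range(len(centers))) for j in range(len(points))]
total = sum(sum(row) for row in U)
print(U, column_sums, total)
```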

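On the other hand, if the $u_{ij}$ are fixed, the optimal cluster center that minimizes (6.2) is the mean of all vectors in group $i$:

$$c_i = \frac{1}{|G_i|} \sum_{k,\, x_k \in G_i} x_k \qquad (6.7)$$

Here $|G_i|$ is the size of $G_i$, or $|G_i| = \sum_{j=1}^{n} u_{ij}$.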
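The center update can be written directly in terms of the membership matrix, since the row sum of $U$ gives the group size. A minimal sketch follows; the function name `update_centers` and the choice to raise an error for an empty group are assumptions, not something the text prescribes.

```python
def update_centers(points, U):
    """Return, for each row of U, the mean of the points whose entry is 1."""
    dim = len(points[0])
    centers = []
    for row in U:
        size = sum(row)  # group size |G_i| as the row sum of U
        if size == 0:
            raise ValueError("empty group: its mean is undefined")
        centers.append(
            tuple(sum(u * x[d] for u, x in zip(row, points)) / size for d in range(dim))
        )
    return centers


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (12.0, 0.0)]
U = [[1, 1, 0, 0], [0, 0, 1, 1]]
centers = update_centers(points, U)
print(centers)
```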

For batch-mode operation, given a data set $x_i$ ($i = 1, 2, \dots, n$), the K-means algorithm determines the cluster centers $c_i$ and the membership matrix $U$ with the following steps:

Step 1: Initialize the cluster centers $c_i$, $i = 1, \dots, c$. A typical practice is to pick $c$ points arbitrarily from all the data points.

Step 2: Use formula (6.4) to determine the membership matrix $U$.

Step 3: Calculate the value function according to formula (6.2). If it is below a certain threshold, or if its change relative to the previous value is below a certain threshold, the algorithm stops.

Step 4: Update the cluster centers according to formula (6.7). Return to step 2.

The algorithm is iterative and is not guaranteed to converge to the optimal solution. The performance of K-means depends on the initial positions of the cluster centers. To obtain a usable result, either use some front-end method to find good initial cluster centers, or run the algorithm several times, each time with different initial cluster centers. The algorithm above is also only one representative form; one can instead initialize an arbitrary membership matrix and then carry out the iterative process.
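The four steps can be sketched as a short batch loop. This is an illustrative sketch rather than a reference implementation: the function name `k_means`, the parameters `tol`, `max_iter` and `seed`, the use of group labels instead of an explicit matrix $U$, and the rule of keeping the old center when a group becomes empty are all assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def k_means(points, c, tol=1e-9, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, c)  # Step 1: pick c data points as centers
    previous_cost = float("inf")
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        labels = [
            min(range(c), key=lambda i: squared_distance(x, centers[i]))
            for x in points
        ]
        # Step 3: value function; stop when it no longer improves.
        cost = sum(squared_distance(x, centers[k]) for x, k in zip(points, labels))
        if previous_cost - cost < tol:
            break
        previous_cost = cost
        # Step 4: move each center to the mean of its group.
        for i in range(c):
            members = [x for x, k in zip(points, labels) if k == i]
            if members:  # assumption: an empty group keeps its old center
                centers[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels, cost


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (12.0, 0.0)]
centers, labels, cost = k_means(points, c=2)
print(centers, labels, cost)
```

Running the function with several different `seed` values and keeping the result with the lowest cost is one simple way to follow the advice about multiple restarts.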

The K-means algorithm can also be run online. In that case the cluster centers and the corresponding groups are derived through time averaging. For a given data point $x$, the algorithm finds the nearest cluster center $c_i$ and corrects it with the following formula:

$$\Delta c_i = \eta\,(x - c_i) \qquad (6.8)$$

Here $\eta$ is a step-size (learning-rate) parameter. This online formula is essentially embedded in the learning rules of many unsupervised neural networks.
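A single online step might look like the sketch below. It assumes the commonly used update in which the nearest center moves a fraction `eta` of the way toward the new point; the learning rate `eta`, its value and the function name `online_update` are assumptions for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def online_update(centers, x, eta=0.1):
    """Move only the center nearest to x toward x, in place; return its index."""
    i = min(range(len(centers)), key=lambda k: squared_distance(x, centers[k]))
    centers[i] = [c + eta * (xd - c) for c, xd in zip(centers[i], x)]
    return i


centers = [[0.0, 0.0], [10.0, 0.0]]
winner = online_update(centers, (2.0, 0.0), eta=0.5)
print(winner, centers)
```

Only the winning center changes; all other centers are left untouched, which is what makes the rule resemble competitive learning.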

3 Fuzzy C-means clustering

Fuzzy C-means clustering (FCM), also known as fuzzy ISODATA, is a clustering algorithm that uses membership degrees to determine how strongly each data point belongs to a given cluster. In 1973, Bezdek proposed this algorithm as an improvement on the earlier hard C-means clustering (HCM) method.

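FCM partitions $n$ vectors $x_i$ ($i = 1, 2, \dots, n$) into $c$ fuzzy groups and finds the cluster center of each group such that the value function of the dissimilarity measure is minimized. The main difference between FCM and HCM is that FCM uses fuzzy partitioning, so that each data point belongs to every group to a degree given by a membership value between 0 and 1. To accommodate fuzzy partitioning, the membership matrix $U$ is allowed to have elements with values between 0 and 1. With the normalization condition, however, the membership degrees of each data point always sum to 1:

$$\sum_{i=1}^{c} u_{ij} = 1, \quad \forall j = 1, \dots, n \qquad (6.9)$$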

The value function (or objective function) of FCM is then a generalization of formula (6.2):

$$J(U, c_1, \dots, c_c) = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} \qquad (6.10)$$

Here $u_{ij}$ is between 0 and 1; $c_i$ is the cluster center of fuzzy group $i$; $d_{ij} = \lVert c_i - x_j \rVert$ is the Euclidean distance between the $i$-th cluster center and the $j$-th data point; and $m > 1$ is a weighting exponent.
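To make the role of the memberships and the weighting exponent concrete, the sketch below evaluates the fuzzy value function as the sum of $u_{ij}^m d_{ij}^2$ over all centers and points. The function name `fcm_cost`, the default `m=2.0` and the one-dimensional toy data are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fcm_cost(points, centers, U, m=2.0):
    """Sum over centers i and points j of U[i][j]**m times squared distance."""
    return sum(
        U[i][j] ** m * squared_distance(points[j], centers[i])
        for i in range(len(centers))
        for j in range(len(points))
    )


points = [(0.0,), (4.0,)]
centers = [(0.0,), (4.0,)]
# Each column sums to 1: the second point is shared equally by both groups.
U = [[1.0, 0.5], [0.0, 0.5]]
cost = fcm_cost(points, centers, U)
print(cost)
```

The only non-zero term is the second point's half membership in the first group: 0.5 squared times a squared distance of 16.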

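To obtain the necessary conditions for (6.10) to reach its minimum, the following new objective function is constructed:

$$\bar{J}(U, c_1, \dots, c_c, \lambda_1, \dots, \lambda_n) = J(U, c_1, \dots, c_c) + \sum_{j=1}^{n} \lambda_j \Bigl( \sum_{i=1}^{c} u_{ij} - 1 \Bigr) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} + \sum_{j=1}^{n} \lambda_j \Bigl( \sum_{i=1}^{c} u_{ij} - 1 \Bigr) \qquad (6.11)$$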

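Here $\lambda_j$, $j = 1, \dots, n$, are the Lagrange multipliers of the $n$ constraints in (6.9). Differentiating with respect to all input arguments, the necessary conditions for (6.10) to reach its minimum are:

$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}} \qquad (6.12)$$

and

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \dfrac{d_{ij}}{d_{kj}} \right)^{2/(m-1)}} \qquad (6.13)$$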
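The intermediate steps behind these two conditions can be sketched as follows. The derivation assumes the standard FCM setting with $m > 1$ and $d_{ij} > 0$, and a Lagrangian formed by adding $\lambda_j \bigl( \sum_i u_{ij} - 1 \bigr)$ for each data point to the fuzzy value function.

```latex
% Assumed setup: \bar{J} = \sum_i \sum_j u_{ij}^m \lVert c_i - x_j \rVert^2
%                        + \sum_j \lambda_j ( \sum_i u_{ij} - 1 ),  m > 1, d_{ij} > 0
\begin{align*}
\frac{\partial \bar{J}}{\partial c_i}
  &= 2 \sum_{j=1}^{n} u_{ij}^{m} (c_i - x_j) = 0
  &&\Rightarrow\quad c_i = \frac{\sum_{j} u_{ij}^{m} x_j}{\sum_{j} u_{ij}^{m}}, \\
\frac{\partial \bar{J}}{\partial u_{ij}}
  &= m\, u_{ij}^{m-1} d_{ij}^{2} + \lambda_j = 0
  &&\Rightarrow\quad u_{ij} = \left( \frac{-\lambda_j}{m\, d_{ij}^{2}} \right)^{1/(m-1)}, \\
\sum_{k=1}^{c} u_{kj} &= 1
  &&\Rightarrow\quad \left( \frac{-\lambda_j}{m} \right)^{1/(m-1)}
     = \frac{1}{\sum_{k} d_{kj}^{-2/(m-1)}}, \\
u_{ij} &= \frac{d_{ij}^{-2/(m-1)}}{\sum_{k} d_{kj}^{-2/(m-1)}}
  = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}}.
\end{align*}
```

The third line substitutes the expression for $u_{ij}$ into the normalization constraint to eliminate $\lambda_j$.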

With these two necessary conditions, the fuzzy C-means clustering algorithm becomes a simple iterative process. When run in batch mode, FCM determines the cluster centers $c_i$ and the membership matrix $U$ with the following steps [1]:

Step 1: Initialize the membership matrix $U$ with random numbers between 0 and 1 so that it satisfies the constraint in formula (6.9).

Step 2: Calculate the $c$ cluster centers $c_i$, $i = 1, \dots, c$, using formula (6.12).

Step 3: Calculate the value function according to formula (6.10). If it is below a certain threshold, or if its change relative to the previous value is below a certain threshold, the algorithm stops.

Step 4: Calculate the new matrix $U$ with formula (6.13). Return to step 2.

The algorithm can alternatively initialize the cluster centers first and then carry out the iterative process. FCM is not guaranteed to converge to an optimal solution, and its performance depends on the initial cluster centers. Therefore, either use another fast algorithm to determine the initial cluster centers, or run FCM several times, each time starting from different initial cluster centers.
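The batch procedure can be sketched in plain Python as below. It is a minimal sketch, assuming the standard FCM updates (centers as means weighted by $u_{ij}^m$, memberships proportional to $d_{ij}^{-2/(m-1)}$). The function name `fuzzy_c_means`, the parameters `tol`, `max_iter` and `seed`, and the handling of a point that coincides with a center are assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fuzzy_c_means(points, c, m=2.0, tol=1e-9, max_iter=200, seed=0):
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 1: random memberships, each column normalised to sum to 1.
    U = [[rng.random() + 1e-12 for _ in range(n)] for _ in range(c)]
    for j in range(n):
        s = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= s
    previous_cost = float("inf")
    for _ in range(max_iter):
        # Step 2: centers as means weighted by U[i][j] ** m.
        centers = []
        for i in range(c):
            w = [U[i][j] ** m for j in range(n)]
            total = sum(w)
            centers.append(
                tuple(sum(w[j] * points[j][d] for j in range(n)) / total
                      for d in range(dim))
            )
        # Step 3: value function and stopping test on its change.
        d2 = [[squared_distance(points[j], centers[i]) for j in range(n)]
              for i in range(c)]
        cost = sum(U[i][j] ** m * d2[i][j] for i in range(c) for j in range(n))
        if abs(previous_cost - cost) < tol:
            break
        previous_cost = cost
        # Step 4: new memberships from the relative distances.
        for j in range(n):
            zero = [i for i in range(c) if d2[i][j] == 0.0]
            if zero:  # assumption: a point on a center belongs fully to it
                for i in range(c):
                    U[i][j] = 1.0 / len(zero) if i in zero else 0.0
                continue
            inv = [d2[i][j] ** (-1.0 / (m - 1.0)) for i in range(c)]
            s = sum(inv)
            for i in range(c):
                U[i][j] = inv[i] / s
    return centers, U, cost


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (12.0, 0.0)]
centers, U, cost = fuzzy_c_means(points, c=2)
print(centers)
```

Because the squared distances are stored, the exponent $-2/(m-1)$ on $d_{ij}$ becomes $-1/(m-1)$ on `d2`. As with K-means, trying several `seed` values is a simple way to reduce the dependence on initialization.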

4 Application of the FCM algorithm

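From the discussion above, it is easy to see that the FCM algorithm needs two parameters: the number of clusters $c$ and the parameter $m$. In general, $c$ should be far smaller than the total number of samples to be clustered, and $c > 1$ must hold. The parameter $m$ controls the flexibility of the algorithm: if $m$ is too large, the clustering result will be very poor, and if $m$ is too small, the algorithm approaches the HCM clustering algorithm.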
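The effect of $m$ can be seen on a single data point. The sketch below assumes the standard membership update, in which memberships are proportional to $d^{-2/(m-1)}$, and compares three values of $m$ for a point at distances 1 and 3 from two centers; the function name `memberships` and the chosen distances are hypothetical.

```python
def memberships(distances, m):
    """Memberships of one point, given its distances to every cluster center."""
    inv = [d ** (-2.0 / (m - 1.0)) for d in distances]
    total = sum(inv)
    return [v / total for v in inv]


distances = [1.0, 3.0]
for m in (1.1, 2.0, 10.0):
    u = memberships(distances, m)
    print(f"m = {m}: {[round(v, 4) for v in u]}")
```

With $m$ close to 1 the point is assigned almost entirely to the nearer center, as in hard clustering, while a large $m$ pushes the memberships toward equal shares and blurs the partition.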

The output of the algorithm is $c$ cluster center vectors and a $c \times n$ fuzzy partition matrix, which gives the membership degree of each sample point in each class. From this partition matrix, the maximum-membership principle of fuzzy sets determines which class each sample point belongs to. A cluster center represents the average characteristics of its class and can be regarded as the representative point of that class.
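Applying the maximum-membership principle to a partition matrix is a one-line operation. In this sketch the function name `harden` and the example matrix are hypothetical, and ties are assumed to go to the lowest class index.

```python
def harden(U):
    """For each column j of U, return the class index with the largest membership."""
    c, n = len(U), len(U[0])
    return [max(range(c), key=lambda i: U[i][j]) for j in range(n)]


# Rows are classes, columns are sample points; every column sums to 1.
U = [[0.9, 0.6, 0.2],
     [0.1, 0.4, 0.8]]
labels = harden(U)
print(labels)
```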

It is not difficult to see from the derivation that the algorithm clusters well when the data are normally distributed, and that it is sensitive to outliers.
