Zheng Jie, "Machine Learning Algorithm Principles and Programming Practices" study notes (Chapter 6: Neural Networks), 6.3 Self-Organizing Feature Map Neural Networks (SOM)


A detailed explanation of the principle: http://wenku.baidu.com/link?url=zSDn1fRKXlfafc_ Tbofxw1mtay0lgth4gwhqs5rl8w2l5i4gf35pmio43cnz3yefrrkgsxgnfmqokggacrylnbgx4czc3vymiryvc4d3df3

The self-organizing feature map neural network (Self-Organizing Feature Map, also called Kohonen mapping), abbreviated as the SOM network, is mainly used to solve pattern-recognition problems. Like the earlier K-means algorithm, the SOM network is an unsupervised learning algorithm. The difference is that the SOM network does not need the number of clusters to be provided in advance; the number of categories is recognized automatically by the network.

Basic idea: group samples separated by small distances into the same category, and samples separated by large distances into different categories.
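As a minimal sketch of this idea (the prototype vectors and sample points below are made up for illustration), each sample is assigned to whichever prototype it is closest to:

```python
import numpy as np

# Two hypothetical prototype (weight) vectors, one per category
prototypes = np.array([[0.0, 0.0],
                       [5.0, 5.0]])

# Hypothetical samples: the first two lie near prototype 0, the last near prototype 1
samples = np.array([[0.2, 0.1],
                    [0.5, 0.4],
                    [4.8, 5.1]])

# Assign each sample to the prototype with the smallest Euclidean distance
labels = [int(np.argmin([np.linalg.norm(s - p) for p in prototypes]))
          for s in samples]
print(labels)  # [0, 0, 1]
```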

6.4.1 SOM Network Framework

The SOM network structure is relatively simple: it has only an input layer and an output layer.

The output layer of the SOM network is distinctive. Unlike other neural networks, it establishes lateral connections between neurons in the same layer, and a specific pattern can be formed through the learning of the weights. The neurons of the output layer can be arranged in many forms, and different forms map different patterns, such as a one-dimensional array, a two-dimensional planar array, or a three-dimensional grid. For two-dimensional training data, a two-dimensional planar array (a checkerboard arrangement) is generally used.

Implementation steps:
1. Input Layer Network

The input layer has the same number of nodes as the data set has columns (one node per feature), and the data set must be normalized first.

2. Output Network

The output network is typically designed according to the dimensions of the data set. For example, in the two-dimensional case, if we want to divide the data into 4 classes, the output layer can be designed as a 4*2 matrix.
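A minimal sketch of laying out such an output grid (assuming a 2x2 checkerboard of M*N = 4 nodes; each row of the resulting 4*2 matrix holds one node's grid coordinates):

```python
import numpy as np

# Hypothetical sketch: an output layer of M*N = 4 nodes for two-dimensional data.
# Each row stores one node's (row, col) position in the checkerboard.
M, N = 2, 2
grid = np.zeros((M * N, 2))
k = 0
for i in range(M):
    for j in range(N):
        grid[k] = [i, j]
        k += 1
print(grid)
# [[0. 0.]
#  [0. 1.]
#  [1. 0.]
#  [1. 1.]]
```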

3. Weight node

The dimensions of the weight matrix are determined by the dimension of the input data set and the estimated number of output nodes. For example, if the data set is two-dimensional, the weight matrix has 2 rows; if the data is to be divided into 4 categories, it has 4 columns. The weights are generally initialized to random values between 0 and 1.
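A minimal sketch of this initialization (the seed is arbitrary, used only to make the example reproducible):

```python
import numpy as np

# Hypothetical sketch: weights for 2-D data clustered into 4 categories.
# Following the text, the weight matrix has 2 rows (one per feature) and
# 4 columns (one per output node), initialized uniformly in [0, 1).
rng = np.random.default_rng(0)
w = rng.random((2, 4))
print(w.shape)                          # (2, 4)
print(((w >= 0.0) & (w < 1.0)).all())   # True
```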

4. Define the learning rate

The learning rate affects the speed of convergence, and you can define a dynamic learning-rate function that decays as the number of iterations increases. The learning-rate function used in this example is:

    lrate(i) = maxlrate - (i + 1) * (maxlrate - minlrate) / maxiteration

where maxlrate is the maximum learning rate, minlrate is the minimum learning rate, maxiteration is the maximum number of iterations, and i is the current iteration number.

5. Define the cluster radius function

The learning radius affects the clustering result. You can define a dynamic radius function that shrinks as the number of iterations increases. The radius function defined in this example is:

    r(i) = maxr - (i + 1) * (maxr - minr) / maxiteration

where maxr is the maximum cluster radius, minr is the minimum cluster radius, maxiteration is the maximum number of iterations, and i is the current iteration number.
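Both decay schedules can be sketched as one helper function (rate_calc is a hypothetical name mirroring the class's ratecalc method; the default values follow the class attributes lratemax = 0.8, lratemin = 0.05, rmax = 5.0, rmin = 0.5):

```python
def rate_calc(i, steps, lrate_max=0.8, lrate_min=0.05, r_max=5.0, r_min=0.5):
    """Linearly decay the learning rate and the cluster radius with iteration i."""
    lrate = lrate_max - (i + 1.0) * (lrate_max - lrate_min) / steps
    r = r_max - (i + 1.0) * (r_max - r_min) / steps
    return lrate, r

# Both values shrink linearly with the iteration count:
print(rate_calc(0, 1000))    # starts near (maxlrate, maxr)
print(rate_calc(999, 1000))  # ends at (minlrate, minr)
```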

6. The clustering process:

    • Accept input: compute the learning rate and learning radius for this iteration, then randomly select a sample from the training set.
    • Find the winning node: compute the distance from the selected sample to every weight vector and take the node with the smallest distance.
    • Compute the winning neighborhood: from the winning node's position in the output grid and the current radius, find all nodes inside the neighborhood.
    • Adjust the weights: update the weights of the neighborhood nodes according to the learning rate and the sample.
    • Assign a category label to each sample in the data set based on the trained weights.
    • Evaluate the result: the SOM network performs unsupervised clustering, so the output is a set of cluster labels. If the training set already carries class labels, comparing the old and new labels reflects the accuracy of the clustering.
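One iteration of the loop above can be sketched as follows (the grid size, learning rate, radius, and data are illustrative values, not the book's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setup: a 2x2 output grid (4 nodes) and 2-D data in [0, 1)
grid = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
w = rng.random((4, 2))        # one weight vector per output node
data = rng.random((10, 2))
lrate, r = 0.5, 1.2           # illustrative learning rate and radius

# 1) accept input: randomly select a sample
sample = data[rng.integers(0, len(data))]

# 2) find the winning node: smallest sample-to-weight distance
winner = int(np.argmin(np.linalg.norm(w - sample, axis=1)))

# 3) winning neighborhood: grid nodes within radius r of the winner
neighborhood = np.where(np.linalg.norm(grid - grid[winner], axis=1) < r)[0]

# 4) adjust weights: move every neighborhood node toward the sample
for j in neighborhood:
    w[j] += lrate * (sample - w[j])

print(winner in neighborhood)  # True: the winner is always inside its own neighborhood
```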

6.4.2 The SOM class

The constructor:

# coding: utf-8
from numpy import *

class Kohonen(object):
    def __init__(self):
        self.lratemax = 0.8     # maximum learning rate
        self.lratemin = 0.05    # minimum learning rate
        self.rmax = 5.0         # maximum cluster radius, chosen from the data set
        self.rmin = 0.5         # minimum cluster radius, chosen from the data set
        self.steps = 1000       # number of iterations
        self.lratelist = []     # convergence curve of the learning rate
        self.rlist = []         # convergence curve of the learning radius
        self.w = []             # weight vectors
        self.M = 2              # M*N is the total number of clusters
        self.N = 2              # M and N define the output grid
        self.dataMat = []       # externally imported data set
        self.classLabel = []    # category labels after clustering

6.4.3 Utility functions

(1) Normalization of data

    def normlize(self, dataMat):
        # column-wise normalization: zero mean, unit variance
        [m, n] = shape(dataMat)
        for i in xrange(n):
            dataMat[:, i] = (dataMat[:, i] - mean(dataMat[:, i])) / (std(dataMat[:, i]) + 1.0e-10)
        return dataMat

(2) Calculate Euclidean distance:

    def distEclud(self, vecA, vecB):
        # Euclidean distance, plus a small eps so the result is never exactly zero
        eps = 1.0e-6
        return linalg.norm(vecA - vecB) + eps

(3) Loading the data file

    def loadDataSet(self, fileName):
        # load a tab-delimited two-column data set
        fr = open(fileName)
        for line in fr.readlines():
            curLine = line.strip().split('\t')
            self.dataMat.append([float(curLine[0]), float(curLine[1])])
        self.dataMat = mat(self.dataMat)

(4) Initializing the second-layer grid

    def init_grid(self):
        # build the output grid: one (row, col) coordinate per node, k = i*N + j
        grid = mat(zeros((self.M * self.N, 2)))
        k = 0
        for i in xrange(self.M):
            for j in xrange(self.N):
                grid[k, :] = [i, j]
                k += 1
        return grid

(5) Training and running the network

    def ratecalc(self, i):
        # learning rate and cluster radius for iteration i
        lrate = self.lratemax - (i + 1.0) * (self.lratemax - self.lratemin) / self.steps
        r = self.rmax - (i + 1.0) * (self.rmax - self.rmin) / self.steps
        return lrate, r

    def train(self):
        dm, dn = shape(self.dataMat)
        normDataset = self.normlize(self.dataMat)   # 1. normalize the data
        grid = self.init_grid()                     # 2. initialize the output grid
        self.w = random.rand(self.M * self.N, dn)   # 3. randomly initialize the weights
        if self.steps < 5 * dm:                     # 4. enforce a minimum iteration count
            self.steps = 5 * dm
        for i in xrange(self.steps):
            lrate, r = self.ratecalc(i)             # 1) rate and radius for this iteration
            self.lratelist.append(lrate)
            self.rlist.append(r)
            k = random.randint(0, dm)               # 2) randomly select a sample
            mySample = normDataset[k, :]
            # 3) find the winning node: index of the smallest sample-to-weight distance
            dists = [self.distEclud(mySample, self.w[j, :]) for j in xrange(self.M * self.N)]
            minIndx = dists.index(min(dists))
            # 4) compute the neighborhood from the winner's grid position
            d1 = minIndx // self.N
            d2 = mod(minIndx, self.N)
            distMat = [self.distEclud(mat([d1, d2]), grid[j, :]) for j in xrange(self.M * self.N)]
            nodeIndx = (array(distMat) < r).nonzero()[0]
            for j in nodeIndx:                      # 5) move neighborhood weights toward the sample
                self.w[j, :] = self.w[j, :] + lrate * (mySample.A1 - self.w[j, :])
        # assign a category label to every sample after training
        self.classLabel = []
        for i in xrange(dm):
            dists = [self.distEclud(normDataset[i, :], self.w[j, :]) for j in xrange(self.M * self.N)]
            self.classLabel.append(dists.index(min(dists)))
        self.classLabel = mat(self.classLabel)

SOMNet = Kohonen()
SOMNet.loadDataSet('TestSet2.txt')
SOMNet.train()

Source: Zheng Jie, "Machine Learning Algorithm Principles and Programming Practices". For study only.

