LIBSVM code reading: analysis of the svm_group_classes function

Source: Internet
Author: User

At the time of writing, the latest LIBSVM version is 3.17. Its main change is a few added lines in the svm_group_classes function. The official release note reads:

Version 3.17 released on April Fools' Day, 2013. We slightly adjust the way labels are handled internally. By default labels are ordered by their first occurrence in the training set. Hence for a set with -1/+1 labels, if -1 appears first, then internally -1 becomes +1. This has caused confusion. Now for data with -1/+1 labels, we specifically ensure that internally the binary SVM has positive data corresponding to the +1 instances. For developers, see changes in the subroutine svm_group_classes of svm.cpp.

This article analyzes that function.

The purpose of svm_group_classes is to group training samples that belong to the same class.

Key takeaway: if you need to group a set of data by class so that samples of the same class are stored contiguously, this function is a good reference.

The function prototype is:

    void svm_group_classes(const svm_problem *prob, int *nr_class_ret, int **label_ret, int **start_ret, int **count_ret, int *perm)

The only input is prob, which points to the sample set to be processed. All other parameters are pointers through which the function returns its results:

    1. nr_class_ret--receives the number of distinct classes in the sample set
    2. label_ret--receives an array holding the label of each class
    3. start_ret--receives an array holding the starting position of each class in perm
    4. count_ret--receives an array holding the number of samples in each class
    5. perm--an array of indices into the original data, arranged so that samples of the same class are contiguous; the caller must allocate it with length prob->l
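The outputs are meant to be read together: class k occupies positions start[k] through start[k]+count[k]-1 of perm, and each entry of perm is an index into the original data. LIBSVM's svm_train performs a similar reordering of prob->x through perm. A minimal C sketch of consuming the outputs; gather_by_class and original_index are hypothetical helpers, not LIBSVM functions, and plain double labels stand in for svm_problem.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a LIBSVM function): copy the labels y[] into
   y_grouped[] in grouped order, where perm[i] is the original index of
   the sample stored at grouped position i. */
void gather_by_class(const double *y, int l, const int *perm, double *y_grouped)
{
    assert(y != NULL && perm != NULL && y_grouped != NULL);
    for (int i = 0; i < l; i++)
        y_grouped[i] = y[perm[i]];
}

/* Hypothetical helper: original index of the r-th sample (0-based) of
   class k. Class k occupies perm[start[k]] .. perm[start[k] + count[k] - 1]. */
int original_index(const int *start, const int *count, const int *perm, int k, int r)
{
    assert(r >= 0 && r < count[k]);
    return perm[start[k] + r];
}
```

With y = {1, 2, 1, 3, 2, 1}, grouping gives start = {0, 3, 5}, count = {3, 2, 1} and perm = {0, 2, 5, 1, 4, 3}, so gather_by_class produces {1, 1, 1, 2, 2, 3}.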
Now let's look at the code. The for loop below counts the number of distinct classes, stores each newly seen class label y[i] in label, records the class index of every sample in data_label, and counts the samples in each class. As an example, take 6 samples in 4 classes, where y[0]=y[1], y[2]=y[3], and y[4] and y[5] each form a class of their own. The loop then runs as follows:

i=0: label[0]=y[0], data_label[0]=0, count[0]=1
i=1: y[1]=label[0], data_label[1]=0, count[0]=2
i=2: label[1]=y[2], data_label[2]=1, count[1]=1
i=3: y[3]=label[1], data_label[3]=1, count[1]=2
i=4: label[2]=y[4], data_label[4]=2, count[2]=1
i=5: label[3]=y[5], data_label[5]=3, count[3]=1

    // label: label name, start: begin of each class, count: #data of classes, perm: indices to the original data
    // perm, length l, must be allocated before calling this subroutine
    static void svm_group_classes(const svm_problem *prob, int *nr_class_ret, int **label_ret, int **start_ret, int **count_ret, int *perm)
    {
        int l = prob->l;                              // total number of samples
        int max_nr_class = 16;                        // initial capacity; doubled below when it runs out
        int nr_class = 0;
        int *label = Malloc(int,max_nr_class);        // Malloc(type,n) expands to (type *)malloc((n)*sizeof(type))
        int *count = Malloc(int,max_nr_class);
        int *data_label = Malloc(int,l);
        int i;
        for(i=0;i<l;i++)
        {
            int this_label = (int)prob->y[i];         // class label of sample i
            int j;
            for(j=0;j<nr_class;j++)
            {
                if(this_label == label[j])            // label[] holds no values at first, but then nr_class == 0 and this loop does not run
                {
                    ++count[j];
                    break;
                }
            }
            data_label[i] = j;                        // class index of sample i
            if(j == nr_class)                         // label not seen before: add a new class
            {
                if(nr_class == max_nr_class)
                {
                    max_nr_class *= 2;                // grow the capacity for class labels
                    label = (int *)realloc(label,max_nr_class*sizeof(int));
                    count = (int *)realloc(count,max_nr_class*sizeof(int));
                }
                label[nr_class] = this_label;
                count[nr_class] = 1;                  // first sample of this class
                ++nr_class;
            }
        }
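To try the loop in isolation, here is a minimal C sketch of it, assuming the labels arrive as a plain double array instead of an svm_problem and with the Malloc macro expanded to malloc. count_classes is a hypothetical name; unlike the original, it returns the class count and hands back label and count directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the counting loop: labels are recorded in order of first
   occurrence. Returns the number of classes; *label_out and *count_out
   receive heap arrays the caller must free. data_label must hold l ints.
   Allocation failures are not checked, to stay close to the original. */
int count_classes(const double *y, int l, int **label_out, int **count_out, int *data_label)
{
    assert(l >= 0 && data_label != NULL);
    int max_nr_class = 16;                   /* initial capacity, doubled on demand */
    int nr_class = 0;
    int *label = malloc(max_nr_class * sizeof(int));
    int *count = malloc(max_nr_class * sizeof(int));

    for (int i = 0; i < l; i++) {
        int this_label = (int)y[i];
        int j;
        for (j = 0; j < nr_class; j++) {
            if (this_label == label[j]) {    /* known class: one more sample */
                ++count[j];
                break;
            }
        }
        data_label[i] = j;                   /* class index of sample i */
        if (j == nr_class) {                 /* first occurrence of a new class */
            if (nr_class == max_nr_class) {
                max_nr_class *= 2;
                label = realloc(label, max_nr_class * sizeof(int));
                count = realloc(count, max_nr_class * sizeof(int));
            }
            label[nr_class] = this_label;
            count[nr_class] = 1;
            ++nr_class;
        }
    }
    *label_out = label;
    *count_out = count;
    return nr_class;
}
```

For the six-sample example above (take y = {10, 10, 20, 20, 30, 40}) it returns 4, with count = {2, 2, 1, 1} and data_label = {0, 0, 1, 1, 2, 3}. A set with more than 16 distinct labels exercises the doubling branch.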

The following block is the part added in version 3.17. It only affects two-class problems: when -1 appears before +1 in the training data, it swaps the two classes (their labels, their counts, and the class index of every sample) so that +1 ends up as internal class 0.

        //
        // Labels are ordered by their first occurrence in the training set.
        // However, for two-class sets with -1/+1 labels and -1 appears first,
        // we swap labels to ensure that internally the binary SVM has positive data corresponding to the +1 instances.
        //
        if (nr_class == 2 && label[0] == -1 && label[1] == 1)
        {
            swap(label[0],label[1]);
            swap(count[0],count[1]);
            for(i=0;i<l;i++)
            {
                if(data_label[i] == 0)
                    data_label[i] = 1;
                else
                    data_label[i] = 0;
            }
        }
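A stand-alone C sketch of the same swap, for experimenting outside svm.cpp. put_plus_one_first and swap_int are hypothetical names (svm.cpp uses its own swap template instead). The order matters because, when LIBSVM trains the pair of classes 0 and 1, the first class gets the target +1; after the swap, the +1 instances are the positive ones.

```c
#include <assert.h>

/* Hypothetical stand-in for the swap template in svm.cpp. */
static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* If a two-class problem saw -1 before +1, exchange the two classes so
   that internal class 0 is +1. Other label sets are left unchanged. */
void put_plus_one_first(int nr_class, int *label, int *count, int *data_label, int l)
{
    assert(l >= 0);
    if (nr_class == 2 && label[0] == -1 && label[1] == 1) {
        swap_int(&label[0], &label[1]);
        swap_int(&count[0], &count[1]);
        for (int i = 0; i < l; i++)
            data_label[i] = (data_label[i] == 0) ? 1 : 0;  /* flip the class index */
    }
}
```

Note that the check is on the exact values -1 and +1, so a two-class set labeled, say, 0/1 keeps first-occurrence order.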

The following code computes start, the starting position of each class, and fills perm with the original-data index of every sample after grouping: perm[i] = j means that position i in the grouped order (where samples of the same class are contiguous) holds sample j of the original data. Because the placement loop advances start[] as a write cursor, start is recomputed from count once that loop finishes.
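In formula form (a restatement of the code, with k a class index and 0 ≤ r < count[k]), start holds the exclusive prefix sums of count, and each class occupies a contiguous slice of perm:

```latex
\mathrm{start}[0] = 0, \qquad
\mathrm{start}[k] = \mathrm{start}[k-1] + \mathrm{count}[k-1] = \sum_{m=0}^{k-1} \mathrm{count}[m] \quad (k \ge 1)

\mathrm{perm}[\mathrm{start}[k] + r] = \text{original index of the } (r+1)\text{-th sample of class } k
```

For example, count = {3, 2, 1} gives start = {0, 3, 5}.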


        int *start = Malloc(int,nr_class);
        start[0] = 0;                                 // start[k]: first position of class k in perm
        for(i=1;i<nr_class;i++)
            start[i] = start[i-1]+count[i-1];
        for(i=0;i<l;i++)
        {
            perm[start[data_label[i]]] = i;           // put sample i in the next free slot of its class
            ++start[data_label[i]];
        }
        start[0] = 0;                                 // start[] was advanced as a cursor above; rebuild it
        for(i=1;i<nr_class;i++)
            start[i] = start[i-1]+count[i-1];

        *nr_class_ret = nr_class;
        *label_ret = label;
        *start_ret = start;
        *count_ret = count;
        free(data_label);
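The start/perm step is essentially a counting sort of sample indices keyed by class index. A minimal C sketch, assuming data_label and count come from the counting loop; build_start_and_perm is a hypothetical name, and it returns start instead of writing through start_ret.

```c
#include <assert.h>
#include <stdlib.h>

/* Counting sort of sample indices by class index. data_label[i] is the
   class index of sample i and count[k] the size of class k. perm (length l)
   must be allocated by the caller. Returns a heap array start[] that the
   caller must free. */
int *build_start_and_perm(const int *data_label, int l, const int *count, int nr_class, int *perm)
{
    assert(nr_class >= 1 && perm != NULL);
    int *start = malloc(nr_class * sizeof(int));

    start[0] = 0;                            /* exclusive prefix sums of count */
    for (int k = 1; k < nr_class; k++)
        start[k] = start[k - 1] + count[k - 1];

    for (int i = 0; i < l; i++)              /* increasing i keeps original order within a class */
        perm[start[data_label[i]]++] = i;

    start[0] = 0;                            /* start was used as a cursor above; rebuild it */
    for (int k = 1; k < nr_class; k++)
        start[k] = start[k - 1] + count[k - 1];
    return start;
}
```

Because samples are placed in increasing i, the sort is stable. For data_label = {0, 1, 0, 2, 1, 0} and count = {3, 2, 1} it returns start = {0, 3, 5} and fills perm = {0, 2, 5, 1, 4, 3}.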
