Java implementation of KNN classification


When reprinting, please indicate the source: http://blog.csdn.net/xiaojimanman/article/details/51064307

http://www.llwjy.com/blogdetail/f74b497c2ad6261b0ea651454b97a390.html

My personal blog is now online at www.llwjy.com ~ comments and suggestions are welcome ~

-------------------------------------------------------------------------------------------------

Before we start, a small ad: I have created a QQ group (321903218, "Lucene case development"), mainly for discussing how to use Lucene to build a site-search backend; open classes are also held in the group from time to time. Anyone interested is welcome to join.


The KNN algorithm, also known as the k-nearest-neighbor algorithm, is a commonly used classification algorithm in data mining. Its core idea is to find the k known samples nearest to the target; the category that appears most often among those k samples is taken as the target's category. For example, with k = 7 we pick the 7 samples closest to the target (i.e., with the highest similarity); if the categories of those 7 samples are A, B, C, A, A, A, B, then the target's category is A, because A appears most often among the 7 samples.


Algorithm Implementation

I. Definition of the training data format

Here is a brief introduction to implementing KNN classification in Java. First we need to store the training set (attributes and their corresponding categories). Since the attribute type is not known in advance, we use a generic type for it, and we store the category as a String.

/**
 * @Description: storage format of one record in the KNN classification model
 */
package com.lulei.datamining.knn.bean;

public class KnnValueBean<T> {
	private T value;       // record value
	private String typeId; // category ID
	
	public KnnValueBean(T value, String typeId) {
		this.value = value;
		this.typeId = typeId;
	}

	public T getValue() {
		return value;
	}

	public void setValue(T value) {
		this.value = value;
	}

	public String getTypeId() {
		return typeId;
	}

	public void setTypeId(String typeId) {
		this.typeId = typeId;
	}
}
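As a quick illustration of the generic parameter (this snippet is ours; the double[] feature type and the "A" label are made-up examples, not from the original article):

// hypothetical usage: a two-dimensional feature vector labeled "A"
KnnValueBean<double[]> record = new KnnValueBean<double[]>(new double[] {1.0, 2.0}, "A");
System.out.println(record.getTypeId()); // prints: A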

II. Definition of the k-nearest-neighbor category data format

When collecting the k nearest neighbors, we need to record the category and the corresponding similarity of each of the k samples, using the following data format:

/**
 * @Description: category and score of one of the k nearest neighbors
 */
package com.lulei.datamining.knn.bean;

public class KnnValueSort {
	private String typeId; // category ID
	private double score;  // similarity score of this sample
	
	public KnnValueSort(String typeId, double score) {
		this.typeId = typeId;
		this.score = score;
	}

	public String getTypeId() {
		return typeId;
	}

	public void setTypeId(String typeId) {
		this.typeId = typeId;
	}

	public double getScore() {
		return score;
	}

	public void setScore(double score) {
		this.score = score;
	}
}

III. Basic attributes of the KNN algorithm

The most important parameter of the KNN algorithm is the value of k, so the base class needs a field K; we also need a list to store the data whose categories are known.

private List<KnnValueBean> dataArray;
private int K = 3;
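The full base-class source at the end of the article also wraps K in accessors that reject non-positive values; cleaned up, they look like this:

public int getK() {
	return K;
}

public void setK(int K) {
	// a k below 1 would make the top-k array meaningless
	if (K < 1) {
		throw new IllegalArgumentException("K must be greater than 0");
	}
	this.K = K;
}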

IV. Adding data with known categories

Before using the KNN classifier, we must add data whose categories are already known; that data is then used to predict the category of unknown data.

/**
 * @param value
 * @param typeId
 * @Author: lulei
 * @Description: add a record to the model
 */
public void addRecord(T value, String typeId) {
	if (dataArray == null) {
		dataArray = new ArrayList<KnnValueBean>();
	}
	dataArray.add(new KnnValueBean<T>(value, typeId));
}
  

V. Similarity (or distance) between two samples

The most important method in the KNN algorithm is the one that determines the similarity (or distance) between two samples. Because we use generics, there is no universal way to compute the similarity of two arbitrary objects, so we declare it as an abstract method and let subclasses implement it. Here the method is defined as a similarity: the larger the return value, the more similar the two samples (and the shorter the distance between them).

/**
 * @param o1
 * @param o2
 * @return
 * @Author: lulei
 * @Description: similarity between o1 and o2
 */
public abstract double similarScore(T o1, T o2);
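For example, a subclass for numeric feature vectors might use the negative Euclidean distance, so that a smaller distance yields a larger similarity. This is a sketch of ours, not part of the original article:

// hypothetical subclass: KNN over double[] feature vectors
public class VectorKnn extends KnnClassification<double[]> {
	@Override
	public double similarScore(double[] o1, double[] o2) {
		double sum = 0;
		for (int i = 0; i < o1.length; i++) {
			double d = o1[i] - o2[i];
			sum += d * d;
		}
		// negative distance: larger (closer to zero) means more similar
		return -Math.sqrt(sum);
	}
}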

VI. Getting the categories of the nearest k samples

The core of the KNN algorithm is finding the k nearest neighbors, so this step is the core of the whole algorithm. We use an array to hold the categories and similarities of the k most similar samples: while looping over all samples, the array always holds (at most) the k samples most similar to the target among those processed so far. The implementation is as follows:

/**
 * @param value
 * @return
 * @Author: lulei
 * @Description: get the categories of the nearest k samples
 */
private KnnValueSort[] getKType(T value) {
	int k = 0;
	KnnValueSort[] topK = new KnnValueSort[K];
	for (KnnValueBean<T> bean : dataArray) {
		double score = similarScore(bean.getValue(), value);
		if (k == 0) {
			// the array is still empty: insert directly
			topK[k] = new KnnValueSort(bean.getTypeId(), score);
			k++;
		} else if (!(k == K && score < topK[k - 1].getScore())) {
			// the sample belongs in the top k: find its position (descending by score)
			int i = 0;
			for (; i < k && score < topK[i].getScore(); i++);
			int j = k - 1;
			if (k < K) {
				// the array is not yet full: it grows by one
				j = k;
				k++;
			}
			// shift lower-scored entries right, then insert
			for (; j > i; j--) {
				topK[j] = topK[j - 1];
			}
			topK[i] = new KnnValueSort(bean.getTypeId(), score);
		}
	}
	return topK;
}
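The loop above maintains a descending-sorted array by hand. As an alternative sketch (ours, not the author's), the same top-k selection can be done with a min-heap on the score; the returned array is then unordered, which the counting step below does not mind. It assumes java.util.PriorityQueue and java.util.Comparator are imported:

// alternative top-k selection using a min-heap (sketch, assumes K >= 1)
private KnnValueSort[] getKTypeWithHeap(T value) {
	PriorityQueue<KnnValueSort> heap = new PriorityQueue<KnnValueSort>(K,
			new Comparator<KnnValueSort>() {
				public int compare(KnnValueSort a, KnnValueSort b) {
					return Double.compare(a.getScore(), b.getScore());
				}
			});
	for (KnnValueBean<T> bean : dataArray) {
		double score = similarScore(bean.getValue(), value);
		if (heap.size() < K) {
			heap.offer(new KnnValueSort(bean.getTypeId(), score));
		} else if (score > heap.peek().getScore()) {
			heap.poll(); // evict the worst of the current k
			heap.offer(new KnnValueSort(bean.getTypeId(), score));
		}
	}
	return heap.toArray(new KnnValueSort[0]);
}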

VII. Counting the most frequent category among the k samples

This step is a simple count: the category that appears most often among the k samples is the predicted category of the target data.

/**
 * @param value
 * @return
 * @Author: lulei
 * @Description: KNN classification: judge the category of value
 */
public String getTypeId(T value) {
	KnnValueSort[] array = getKType(value);
	HashMap<String, Integer> map = new HashMap<String, Integer>(K);
	for (KnnValueSort bean : array) {
		if (bean != null) {
			if (map.containsKey(bean.getTypeId())) {
				map.put(bean.getTypeId(), map.get(bean.getTypeId()) + 1);
			} else {
				map.put(bean.getTypeId(), 1);
			}
		}
	}
	String maxTypeId = null;
	int maxCount = 0;
	Iterator<Entry<String, Integer>> iter = map.entrySet().iterator();
	while (iter.hasNext()) {
		Entry<String, Integer> entry = iter.next();
		if (maxCount < entry.getValue()) {
			maxCount = entry.getValue();
			maxTypeId = entry.getKey();
		}
	}
	return maxTypeId;
}

At this point the abstract base class for KNN classification is complete. Before testing it, a few more words: KNN predicts the category that appears most often among the k nearest samples, which is not always reasonable. For example, with k = 5, suppose the 5 nearest samples have categories A, A, B, B, B and similarity scores 10, 9, 2, 2, 1. The method above predicts B, but looking at the data, A is clearly the more reasonable prediction. For this situation we propose the following optimization of the KNN algorithm: after obtaining the k most similar samples and their similarities, apply some function of similarity and occurrence count, such as a weighted sum; the category with the largest function value is the predicted category of the target.
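The original write-up leaves this optimization as an idea only, so the following is our own minimal sketch of one possible weighting: summing the similarity scores per category instead of counting occurrences. It assumes the scores are non-negative, as in the 10, 9, 2, 2, 1 example above (with signed scores, such as negative distances, they would need to be transformed first). On that example it sums A to 19 and B to 5, predicting A as desired:

// sketch: weight each neighbor's vote by its similarity score
public String getTypeIdWeighted(T value) {
	KnnValueSort[] array = getKType(value);
	HashMap<String, Double> weights = new HashMap<String, Double>();
	for (KnnValueSort bean : array) {
		if (bean != null) {
			Double old = weights.get(bean.getTypeId());
			weights.put(bean.getTypeId(), (old == null ? 0.0 : old) + bean.getScore());
		}
	}
	String maxTypeId = null;
	double maxWeight = Double.NEGATIVE_INFINITY;
	for (Entry<String, Double> entry : weights.entrySet()) {
		if (entry.getValue() > maxWeight) {
			maxWeight = entry.getValue();
			maxTypeId = entry.getKey();
		}
	}
	return maxTypeId;
}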

VIII. Base class source code

/**
 * @Description: KNN classification
 */
package com.lulei.datamining.knn;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map.Entry;

import com.lulei.datamining.knn.bean.KnnValueBean;
import com.lulei.datamining.knn.bean.KnnValueSort;
import com.lulei.util.JsonUtil;

@SuppressWarnings({"rawtypes"})
public abstract class KnnClassification<T> {
	private List<KnnValueBean> dataArray;
	private int K = 3;

	public int getK() {
		return K;
	}

	public void setK(int K) {
		if (K < 1) {
			throw new IllegalArgumentException("K must be greater than 0");
		}
		this.K = K;
	}

	// add a record to the model
	public void addRecord(T value, String typeId) {
		if (dataArray == null) {
			dataArray = new ArrayList<KnnValueBean>();
		}
		dataArray.add(new KnnValueBean<T>(value, typeId));
	}

	// KNN classification: judge the category of value
	public String getTypeId(T value) {
		KnnValueSort[] array = getKType(value);
		System.out.println(JsonUtil.parseJson(array));
		HashMap<String, Integer> map = new HashMap<String, Integer>(K);
		for (KnnValueSort bean : array) {
			if (bean != null) {
				if (map.containsKey(bean.getTypeId())) {
					map.put(bean.getTypeId(), map.get(bean.getTypeId()) + 1);
				} else {
					map.put(bean.getTypeId(), 1);
				}
			}
		}
		String maxTypeId = null;
		int maxCount = 0;
		Iterator<Entry<String, Integer>> iter = map.entrySet().iterator();
		while (iter.hasNext()) {
			Entry<String, Integer> entry = iter.next();
			if (maxCount < entry.getValue()) {
				maxCount = entry.getValue();
				maxTypeId = entry.getKey();
			}
		}
		return maxTypeId;
	}

	// get the categories of the nearest k samples
	private KnnValueSort[] getKType(T value) {
		int k = 0;
		KnnValueSort[] topK = new KnnValueSort[K];
		for (KnnValueBean<T> bean : dataArray) {
			double score = similarScore(bean.getValue(), value);
			if (k == 0) {
				// the array is still empty: insert directly
				topK[k] = new KnnValueSort(bean.getTypeId(), score);
				k++;
			} else if (!(k == K && score < topK[k - 1].getScore())) {
				int i = 0;
				// locate the insertion point
				for (; i < k && score < topK[i].getScore(); i++);
				int j = k - 1;
				if (k < K) {
					j = k;
					k++;
				}
				for (; j > i; j--) {
					topK[j] = topK[j - 1];
				}
				topK[i] = new KnnValueSort(bean.getTypeId(), score);
			}
		}
		return topK;
	}

	// similarity between o1 and o2, implemented by subclasses
	public abstract double similarScore(T o1, T o2);
}

IX. Concrete subclass implementation

To use the abstract KNN base class described above for a real problem, we inherit from it and implement the abstract similarity method. Here we give a simple implementation.

/**
 * @Description: a simple KNN test
 */
package com.lulei.datamining.knn.test;

import com.lulei.datamining.knn.KnnClassification;
import com.lulei.util.JsonUtil;

public class Test extends KnnClassification<Integer> {
	
	@Override
	public double similarScore(Integer o1, Integer o2) {
		return -1 * Math.abs(o1 - o2);
	}
	
	/**
	 * @param args
	 * @Author: lulei
	 * @Description: test entry point
	 */
	public static void main(String[] args) {
		Test test = new Test();
		for (int i = 1; i < 10; i++) {
			test.addRecord(i, i > 5 ? "0" : "1");
		}
		System.out.println(JsonUtil.parseJson(test.getTypeId(0)));
	}
}

Here we added the nine values 1, 2, 3, 4, 5, 6, 7, 8, 9: the first five have category "1" and the last four category "0". The similarity between two values is the negative of the absolute value of their difference. We then predict the category of 0. K defaults to 3, so the nearest 3 samples are 1, 2, 3, whose categories are "1", "1" and "1"; the final predicted category is therefore "1".
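As an additional check of our own (not in the original article), predicting 9 with the same model should print "0", since its 3 nearest samples are 9, 8 and 7, all of category "0":

// continuing main() above (our addition)
System.out.println(test.getTypeId(9)); // nearest 3 are 9, 8, 7 -> prints "0"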

-------------------------------------------------------------------------------------------------
A small bonus
-------------------------------------------------------------------------------------------------
My course "Lucene Case Development" at Geek College is now online; comments and feedback are welcome ~

Lesson 1: Lucene overview

Lesson 2: Introduction to common Lucene functions

Lesson 3: Web crawler

Lesson 4: Database connection pool

Lesson 5: Crawling the novel website

Lesson 6: Database operations for the novel website

Lesson 7: Implementing a distributed crawler for the novel website

Lesson 8: Lucene real-time search
