Machine learning: Bayes classifiers (II): implementing a Gaussian naive Bayes classifier

Last Update:2018-08-24 Source: Internet

Author: User

Implementing a Gaussian naive Bayes classifier

Very few of the naive Bayes implementations that can be found online avoid calling sklearn, and those that do are mostly multinomial or Bernoulli variants tied to text classification. I therefore wrote a self-contained Gaussian NB classifier. Compared with the real sklearn source it has far fewer attributes and methods; interested readers can add their own. The code is as follows (with detailed comments):

import numpy as np


class NaiveBayes:
    """Gaussian naive Bayes classifier."""

    def __init__(self):
        self._x_train = None
        self._y_train = None
        self._classes = None
        self._priorlist = None
        self._meanmat = None
        self._varmat = None

    def fit(self, x_train, y_train):
        self._x_train = np.asarray(x_train)
        self._y_train = np.asarray(y_train)
        self._classes = np.unique(self._y_train)  # the distinct class labels, sorted
        priorlist = []
        meanlist = []
        varlist = []
        for c in self._classes:
            # Compute the mean, variance and prior probability of each class.
            # "Matrix" made up of the samples that belong to class c
            x_index_c = self._x_train[np.where(self._y_train == c)]
            # Prior probability of class c
            priorlist.append(x_index_c.shape[0] / self._x_train.shape[0])
            # Mean and variance of each feature within class c
            meanlist.append(np.mean(x_index_c, axis=0))
            varlist.append(np.var(x_index_c, axis=0))
        self._priorlist = priorlist
        # One row per class, one column per feature. Building the matrices from
        # lists removes the hard-coded four-column placeholder row.
        self._meanmat = np.array(meanlist)
        self._varmat = np.array(varlist)

    def predict(self, x_test):
        eps = 1e-10  # prevents a zero denominator and log(0)
        classof_x_test = []  # predicted class of each instance in the test set
        for x_sample in x_test:
            # Broadcasting compares the instance with every class row at once.
            # The log density is computed directly, which equals
            # log(exp(-(x - mean)**2 / (2 * var)) / sqrt(2 * pi * var))
            # but cannot underflow to log(0).
            mat_log = (-(x_sample - self._meanmat) ** 2 / (2 * self._varmat + eps)
                       - 0.5 * np.log(2 * np.pi * self._varmat + eps))
            # Sum the log class-conditional probabilities over the features
            list_log = np.sum(mat_log, axis=1)
            # Add the log prior probability of each class
            prior_class_x = list_log + np.log(self._priorlist)
            # Index of the largest log probability
            prior_class_x_index = np.argmax(prior_class_x)
            # Class label that corresponds to that index
            classof_x = self._classes[prior_class_x_index]
            classof_x_test.append(classof_x)
        return classof_x_test

    def score(self, x_test, y_test):
        y_pred = self.predict(x_test)  # predict once, not once per comparison
        j = 0
        for i in range(len(y_pred)):
            if y_pred[i] == y_test[i]:
                j += 1
        return 'accuracy: {:.10%}'.format(j / len(y_test))
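
The quantity that predict maximizes can also be written out as a formula. This is only a sketch: the symbols $\mu_{c,i}$ and $\sigma_{c,i}^2$ are notation introduced here for the entries of _meanmat and _varmat (class $c$, feature $i$), and $d$ is the number of features. The code additionally adds a small eps term to avoid division by zero, which the formula leaves out.

```latex
\hat{c} = \arg\max_{c} \left[ \log P(c) + \sum_{i=1}^{d} \log p(x_i \mid c) \right],
\qquad
\log p(x_i \mid c) = -\frac{1}{2} \log\left(2 \pi \sigma_{c,i}^{2}\right) - \frac{(x_i - \mu_{c,i})^{2}}{2 \sigma_{c,i}^{2}}
```

Working with sums of logarithms rather than products of densities keeps the values in a range that floating-point numbers can represent.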

Tested on the iris data, the hand-written Gaussian NB classifier gives the same results as the sklearn library's classifier, with accuracy hovering around 93-96%. The figure varies because every run makes a fresh random 8:2 split, which amounts to repeating the hold-out method. For a more reliable accuracy estimate, one could use cross-validation and several evaluation metrics; these are not implemented here.
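
Cross-validation as mentioned above might look roughly like the following. This is a minimal sketch, not part of the original code: the helper names gaussian_nb_predict and k_fold_accuracy are made up for illustration, and the data are synthetic Gaussian blobs rather than iris so that the example needs nothing beyond NumPy.

```python
import numpy as np


def gaussian_nb_predict(x_train, y_train, x_test, eps=1e-10):
    """Fit Gaussian NB on the training fold and label the test fold (hypothetical helper)."""
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        x_c = x_train[y_train == c]
        prior = x_c.shape[0] / x_train.shape[0]
        mean = x_c.mean(axis=0)
        var = x_c.var(axis=0)
        # Log Gaussian density of every test sample and feature under class c
        log_density = (-0.5 * np.log(2 * np.pi * var + eps)
                       - (x_test - mean) ** 2 / (2 * var + eps))
        scores.append(np.log(prior) + log_density.sum(axis=1))
    # scores has shape (n_classes, n_test); pick the best class per test sample
    return classes[np.argmax(np.array(scores), axis=0)]


def k_fold_accuracy(x, y, k=5, seed=0):
    """Return the accuracy measured on each of k folds (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        y_pred = gaussian_nb_predict(x[train_idx], y[train_idx], x[test_idx])
        accuracies.append(np.mean(y_pred == y[test_idx]))
    return np.array(accuracies)


# Synthetic stand-in data (NOT iris): two well-separated Gaussian blobs
rng = np.random.default_rng(42)
x_demo = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
                    rng.normal(6.0, 1.0, size=(50, 2))])
y_demo = np.array([0] * 50 + [1] * 50)

fold_acc = k_fold_accuracy(x_demo, y_demo, k=5)
print("fold accuracies:", fold_acc)
print("mean accuracy:", fold_acc.mean())
```

Every sample is used for testing exactly once, so the mean over the folds depends less on one lucky or unlucky split than a single 8:2 hold-out does.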

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the data set and split it 8:2
iris = datasets.load_iris()
X = iris.data
y = iris.target
# print(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
nb = NaiveBayes()
nb.fit(X_train, y_train)
print(nb.predict(X_test))
print(nb.score(X_test, y_test))

# The output is as follows:
[0, 2, 1, 1, 1, 2, 1, 0, 2, 0, 1, 1, 1, 0, 2, 2, 2, 2, 0, 1, 1, 0, 2, 2, 2, 0, 1, 0, 1, 0]
accuracy: 96.6666666667%

Other notes

Naive Bayes rests on the attribute conditional independence assumption, which rarely holds in practice; this motivated "semi-naive Bayes classifiers". The basic idea is to take into account the dependencies among some of the attributes, so that the full joint probability does not have to be computed while strong attribute dependencies are not ignored entirely. The "one-dependent estimator" (ODE) is the most common strategy: it assumes that each attribute depends on at most one other attribute besides the class. Methods of this kind include SPODE, TAN and AODE.
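
The one-dependent assumption can be written as a formula. In the sketch below, $pa_i$ is notation introduced here for the single attribute that $x_i$ depends on (its parent attribute), and $d$ is the number of attributes. The naive Bayes factorization is shown first for comparison.

```latex
% Naive Bayes: each attribute depends on the class only
P(c \mid \mathbf{x}) \propto P(c) \prod_{i=1}^{d} P(x_i \mid c)

% One-dependent estimator: each attribute also depends on one parent attribute pa_i
P(c \mid \mathbf{x}) \propto P(c) \prod_{i=1}^{d} P(x_i \mid c, pa_i)
```

The methods named above differ mainly in how the parent attribute of each $x_i$ is chosen.
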
np.unique(): returns a new array made up of the distinct elements of the input array, sorted in ascending order.

y = np.array([1, 2, 9, 1, 2, 3])
classes = np.unique(y)  # new array of the distinct elements of y, in ascending order
print(classes)          # prints [1 2 3 9]
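
np.unique can also count how often each value occurs, which gives the class priors in one step. The sketch below uses the return_counts parameter of np.unique; the variable names counts and priors are chosen here for illustration and do not appear in the classifier above.

```python
import numpy as np

y = np.array([1, 2, 9, 1, 2, 3])

# return_counts=True also returns how many times each distinct value occurs
classes, counts = np.unique(y, return_counts=True)

# Relative frequency of each class, i.e. its prior probability
priors = counts / counts.sum()

print(classes)  # [1 2 3 9]
print(counts)   # [2 2 1 1]
print(priors)
```

This is an alternative to appending one prior per class inside a loop, as the fit method does.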

np.where(): operations on arrays

1. np.where(condition, x, y): where the condition holds, the value is taken from x; otherwise it is taken from y.

a = np.array([[9, 7, 3], [4, 5, 2], [6, 3, 8]])
b = np.where(a > 5, 1, 0)  # elements of a greater than 5 become 1, all others become 0
print(b)

Output:
[[1 1 0]
 [0 0 0]
 [1 0 1]]

2. np.where(condition): with only a condition and no x or y, the function returns the coordinates of the elements that satisfy the condition (equivalent to numpy.nonzero). The coordinates are returned as a tuple that contains one array per dimension of the input array, each holding the coordinates along that dimension.

c = np.array([[9, 7, 3], [4, 5, 2], [6, 3, 8]])
d = np.where(c > 5)  # condition: elements greater than 5
print(d)

Output (a tuple):
(array([0, 0, 2, 2], dtype=int64), array([0, 1, 0, 2], dtype=int64))

This means that the elements at indices (0, 0), (0, 1), (2, 0) and (2, 2) satisfy the condition.

a = np.array([1, 3, 6, 9, 0])
b = np.where(a > 5)
print(b)

Output:
(array([2, 3], dtype=int64),)

The elements at positions 2 and 3 satisfy the condition. Note the trailing comma: even for a one-dimensional input the result is a tuple, here one that holds a single array. When a has two or more dimensions, the tuple holds as many arrays as a has dimensions. The result can be used directly as an array index:

x = np.array([[1, 5, 8, 1], [2, 4, 6, 8], [3, 6, 7, 9], [6, 8, 3, 1]])
print(x[b])

The result is an array made up of rows 2 and 3 of x:
[[3 6 7 9]
 [6 8 3 1]]

This is equivalent to x[[2, 3]]. By contrast, x[2, 3] returns the single element 9, and x[[2], [3]] returns [9].

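The single-argument form of np.where is what the fit method relies on to pick out the training rows of one class. As a small sketch with made-up feature values, the following shows that indexing with the tuple from np.where selects the same rows as indexing with the boolean mask directly.

```python
import numpy as np

# Made-up training data: four samples, two features, two classes
x_train = np.array([[5.1, 3.5], [6.2, 2.9], [4.9, 3.0], [6.7, 3.1]])
y_train = np.array([0, 1, 0, 1])

# np.where returns a one-element tuple of row indices for a 1-D condition
rows_via_where = x_train[np.where(y_train == 0)]

# A boolean mask selects the same rows without the intermediate index tuple
rows_via_mask = x_train[y_train == 0]

print(rows_via_where)
print(np.array_equal(rows_via_where, rows_via_mask))  # True
```

Both forms return rows 0 and 2 here; the boolean mask is simply the shorter way to write the selection.
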
This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
