This article introduces a PHP implementation of the naive Bayes classification algorithm from machine learning. Through a worked example, it explains the concept and principle of the algorithm and the details of implementing it in PHP. It is shared here for your reference; hopefully you will find it useful.
Machine learning has become commonplace in our lives: from the thermostat that starts working when you get home, to smart cars, to the smartphones in our pockets. Machine learning seems to be everywhere, and it is a very worthwhile field to explore. But what is machine learning? Generally speaking, machine learning lets a system learn continuously and make predictions about new problems, from simple shopping predictions to complex digital-assistant forecasts.
In this article I will introduce a naive Bayes classifier, implemented as a class. It is a simple algorithm that is easy to implement and gives satisfactory results, though it requires a bit of statistics to understand. At the end of the article you will find the full example code, and you can even try a bit of machine learning yourself.
Start
So, what is this classifier for? It determines whether a given statement is positive or negative. For example, "Symfony is the best" is a positive statement, while "No Symfony is bad" is a negative one. Given a new statement, I want the classifier to return its type without my having to supply a new rule.
I named the classifier class Classifier and gave it a guess method. This method takes a statement as input and returns whether that statement is positive or negative. The class looks like this:
class Classifier
{
    public function guess($statement)
    {
    }
}
I prefer to use an enum-like class instead of plain strings for the return value. I named this class Type and gave it two constants: one POSITIVE, one NEGATIVE. These two constants serve as the return values of the guess method.
class Type
{
    const POSITIVE = 'positive';
    const NEGATIVE = 'negative';
}
The initialization work is done, and the next step is to write our algorithm for prediction.
Naive Bayes
The naive Bayes algorithm is based on a training set, which it uses to make predictions. The algorithm applies simple statistics and a little math to compute its result. Consider, for example, a training set consisting of the following four texts:
| Statement             | Type     |
| --------------------- | -------- |
| symfony is the best   | positive |
| phpstorm is great     | positive |
| iltar complains a lot | negative |
| no Symfony is bad     | negative |
Given the statement "Symfony are the best", you can say that this statement is positive. You would normally make that call based on knowledge you acquired earlier, and the naive Bayes algorithm works the same way: it uses its previous training set to decide which type the statement most resembles.
Learn
Before the algorithm can do anything useful, it needs historical information as a training set. It needs to know two things: how often each word occurs per type, and which type each statement belongs to. In our implementation we store this information in two arrays: one holds the word counts per type, the other holds the statement counts per type. All other information can be aggregated from these two arrays. The code looks like this:
public function learn($statement, $type)
{
    $words = $this->getWords($statement);
    foreach ($words as $word) {
        if (!isset($this->words[$type][$word])) {
            $this->words[$type][$word] = 0;
        }
        $this->words[$type][$word]++; // increment the word count for the type
    }
    $this->documents[$type]++; // increment the statement count for the type
}
With this in place, the algorithm can be trained on historical data and then make predictions.
Definitions
To explain how the algorithm works, a few definitions are needed. First, let's define the probability that an input statement is of a given type, written P(Type). It is computed as the number of known statements of that type (the numerator) divided by the number of statements in the entire training set (the denominator), where one statement is one entry in the training set. For now, this method is named pTotal and looks like this:
public function pTotal($type)
{
    return ($this->documents[$type] + 1) / (array_sum($this->documents) + 1);
}
Note that 1 is added to both the numerator and the denominator. This add-one (Laplace) smoothing avoids probabilities of zero, which would otherwise wipe out the whole product later on.
Based on the example training set above, both the positive and negative types have a probability of 0.6: each type has 2 statements out of 4 in total, so the result is (2 + 1) / (4 + 1).
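The arithmetic above can be checked with a small standalone sketch. This is not part of the Classifier class; the `$documents` counts are taken from the example training set, and the helper simply mirrors the pTotal formula:

```php
<?php
// Statement counts per type, taken from the four-statement training set above.
$documents = ['positive' => 2, 'negative' => 2];

// P(Type) with add-one smoothing, mirroring the pTotal() method.
function pTotal($type, $documents)
{
    return ($documents[$type] + 1) / (array_sum($documents) + 1);
}

echo pTotal('positive', $documents), "\n"; // (2 + 1) / (4 + 1) = 0.6
echo pTotal('negative', $documents), "\n"; // (2 + 1) / (4 + 1) = 0.6
```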
Second, we define the probability that a given word belongs to a given type, written P(word | Type). We first count how often the word occurs in the training data for that type, then divide by the total number of words in that type's data. This method is named p:
public function p($word, $type)
{
    $count = isset($this->words[$type][$word]) ? $this->words[$type][$word] : 0;
    return ($count + 1) / (array_sum($this->words[$type]) + 1);
}
In our example training set, the probability that "is" belongs to the positive type is 0.375: it accounts for 2 of the 7 words in the positive data, so the result is (2 + 1) / (7 + 1).
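This worked example can also be verified standalone. The word counts below are tallied from the two positive statements in the training set ("symfony is the best" and "phpstorm is great", 7 words in total); `pWord` is just a free-function copy of the p method, and "docker" is an arbitrary word I chose to illustrate the unseen-word case:

```php
<?php
// Word counts for the positive type, tallied from the training set above.
$words = [
    'positive' => [
        'symfony' => 1, 'is' => 2, 'the' => 1,
        'best' => 1, 'phpstorm' => 1, 'great' => 1,
    ],
];

// P(word | Type) with add-one smoothing, mirroring the p() method.
function pWord($word, $type, $words)
{
    $count = isset($words[$type][$word]) ? $words[$type][$word] : 0;
    return ($count + 1) / (array_sum($words[$type]) + 1);
}

echo pWord('is', 'positive', $words), "\n";     // (2 + 1) / (7 + 1) = 0.375
echo pWord('docker', 'positive', $words), "\n"; // unseen word: (0 + 1) / (7 + 1) = 0.125
```

Note how the smoothing keeps the unseen word's probability above zero, so a single unknown word cannot zero out the whole product.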
Finally, the algorithm should care only about the words themselves and ignore everything else. A simple way to achieve this is to lowercase the string, strip punctuation, and split it on whitespace:
public function getWords($string)
{
    return preg_split('/\s+/', preg_replace('/[^A-Za-z0-9\s]/', '', strtolower($string)));
}
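To see what this tokenizer does, here is a quick standalone run of the same pipeline (a free-function copy of getWords) on a sample string of my own choosing:

```php
<?php
// Standalone copy of the getWords() helper:
// lowercase, strip everything except letters/digits/whitespace, split on whitespace.
function getWords($string)
{
    return preg_split('/\s+/', preg_replace('/[^A-Za-z0-9\s]/', '', strtolower($string)));
}

print_r(getWords('Symfony is great!')); // ['symfony', 'is', 'great']
```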
With the preparation done, we can start implementing the prediction itself!
Prediction
To predict the type of a statement, the algorithm computes the probability of each type for the given statement. As above, we write this as P(Type | sentence). The type with the higher probability is what the guess method of the Classifier class returns.
To compute P(Type | sentence), the algorithm uses Bayes' theorem: P(Type | sentence) = P(Type) × P(sentence | Type) / P(sentence). In words, the probability of a type given a statement equals the probability of the type, times the probability of the statement given that type, divided by the probability of the statement.
Since the algorithm computes P(Type | sentence) for each type of the same statement, P(sentence) is identical in every case. That means it can be left out entirely: we only care about which probability is highest, not about its exact value. The calculation becomes: P(Type | sentence) ∝ P(Type) × P(sentence | Type).
Finally, to compute P(sentence | Type), the naive independence assumption lets us take the product over the words in the statement. For a statement with n words this becomes P(word_1 | Type) × P(word_2 | Type) × … × P(word_n | Type), where each factor is computed with the definition we saw earlier.
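As a concrete sanity check, here is the full product worked out by hand for the statement "symfony are great", using the counts from the example training set (positive has 7 words in total, negative has 8, and P(Type) is 0.6 for both types):

```php
<?php
// Positive type: 'symfony' seen once, 'are' unseen, 'great' seen once;
// 7 positive words, so each denominator is 7 + 1 = 8.
$positive = 0.6 * ((1 + 1) / 8) * ((0 + 1) / 8) * ((1 + 1) / 8); // 0.6 * 4/512 ≈ 0.0047

// Negative type: 'symfony' seen once, 'are' and 'great' unseen;
// 8 negative words, so each denominator is 8 + 1 = 9.
$negative = 0.6 * ((1 + 1) / 9) * ((0 + 1) / 9) * ((0 + 1) / 9); // 0.6 * 2/729 ≈ 0.0016

var_dump($positive > $negative); // bool(true): the classifier would answer "positive"
```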
With all the pieces defined, it is time to put them together in PHP:
public function guess($statement)
{
    $words = $this->getWords($statement); // get the words
    $best_likelihood = 0;
    $best_type = null;

    foreach ($this->types as $type) {
        $likelihood = $this->pTotal($type); // calculate P(Type)

        foreach ($words as $word) {
            $likelihood *= $this->p($word, $type); // calculate P(word | Type)
        }

        if ($likelihood > $best_likelihood) {
            $best_likelihood = $likelihood;
            $best_type = $type;
        }
    }

    return $best_type;
}
That is all the work needed; the algorithm can now predict the type of a statement. All that is left is to make it learn:
$classifier = new Classifier();
$classifier->learn('Symfony is the best', Type::POSITIVE);
$classifier->learn('PhpStorm is great', Type::POSITIVE);
$classifier->learn('Iltar complains a lot', Type::NEGATIVE);
$classifier->learn('No Symfony is bad', Type::NEGATIVE);

var_dump($classifier->guess('Symfony are great'));  // string(8) "positive"
var_dump($classifier->guess('I complain a lot'));   // string(8) "negative"
I have uploaded all the code to GitHub: https://github.com/yannickl88/blog-articles/blob/master/src/machine-learning-naive-bayes/Classifier.php
The full PHP code is as follows:
<?php

class Type
{
    const POSITIVE = 'positive';
    const NEGATIVE = 'negative';
}

class Classifier
{
    private $types = [Type::POSITIVE, Type::NEGATIVE];
    private $words = [Type::POSITIVE => [], Type::NEGATIVE => []];
    private $documents = [Type::POSITIVE => 0, Type::NEGATIVE => 0];

    public function guess($statement)
    {
        $words = $this->getWords($statement); // get the words
        $best_likelihood = 0;
        $best_type = null;

        foreach ($this->types as $type) {
            $likelihood = $this->pTotal($type); // calculate P(Type)

            foreach ($words as $word) {
                $likelihood *= $this->p($word, $type); // calculate P(word | Type)
            }

            if ($likelihood > $best_likelihood) {
                $best_likelihood = $likelihood;
                $best_type = $type;
            }
        }

        return $best_type;
    }

    public function learn($statement, $type)
    {
        $words = $this->getWords($statement);

        foreach ($words as $word) {
            if (!isset($this->words[$type][$word])) {
                $this->words[$type][$word] = 0;
            }
            $this->words[$type][$word]++; // increment the word count for the type
        }
        $this->documents[$type]++; // increment the statement count for the type
    }

    public function p($word, $type)
    {
        $count = isset($this->words[$type][$word]) ? $this->words[$type][$word] : 0;

        return ($count + 1) / (array_sum($this->words[$type]) + 1);
    }

    public function pTotal($type)
    {
        return ($this->documents[$type] + 1) / (array_sum($this->documents) + 1);
    }

    public function getWords($string)
    {
        return preg_split('/\s+/', preg_replace('/[^A-Za-z0-9\s]/', '', strtolower($string)));
    }
}

$classifier = new Classifier();
$classifier->learn('Symfony is the best', Type::POSITIVE);
$classifier->learn('PhpStorm is great', Type::POSITIVE);
$classifier->learn('Iltar complains a lot', Type::NEGATIVE);
$classifier->learn('No Symfony is bad', Type::NEGATIVE);

var_dump($classifier->guess('Symfony are great'));  // string(8) "positive"
var_dump($classifier->guess('I complain a lot'));   // string(8) "negative"
Conclusion
Although we trained on very little data, the algorithm should still give relatively accurate results. In the real world you can let the machine learn from hundreds of thousands of records, which yields far more accurate results. In fact, naive Bayes has been shown to give good results for sentiment analysis.
Moreover, naive Bayes is not limited to text applications. Hopefully this article brings you a little closer to machine learning.
Original article: https://stovepipe.systems/post/machine-learning-naive-bayes