SPAM, Bayesian, and Chinese 4: Integrating a Bayesian algorithm in CakePHP
The previous article mentioned several open-source implementations of Bayesian algorithms. This article describes how to integrate one of them, b8, into CakePHP.
Download and install b8
- Download the latest version from the b8 site and decompress it into the vendors directory, so that the main file ends up at, for example, vendors/b8/b8.php;
- Open vendors/b8/etc/config_b8 in a text editor and change databaseType to mysql;
- Open vendors/b8/etc/config_storage in a text editor, set tableName to the name of the data table that will store the keywords, and change createDB to TRUE. Note that b8 creates this table the first time it runs, after which you need to change createDB back to FALSE;
- Open vendors/b8/lexer/shared_functions.php in a text editor and comment out line 38 (inside echoError). Otherwise, b8 prints its error messages directly into your Cake application. That output is, of course, useful while debugging.
Write a wrapper component for b8
To let your Cake application call b8, you need to write a component. Create a spam_shield.php file in controllers/components/ and add the following code:
class SpamShieldComponent extends Object {
    /**
     * b8 instance
     */
    var $b8;

    /**
     * Standard rating
     *
     * Comments with a rating higher than this value are considered SPAM
     */
    var $standardRating = 0.7;

    /**
     * Text to be classified
     */
    var $text;

    /**
     * Rating of the text
     */
    var $rating;

    /**
     * Component startup callback: create the b8 instance and read the threshold
     *
     * @date 2009-1-20
     */
    function startup(&$controller) {
        // register a Comment model to get the DBO resource link
        $comment = ClassRegistry::init('Comment');

        // import b8 and create an instance
        App::import('Vendor', 'b8/b8');
        $this->b8 = new b8($comment->getDBOResourceLink());

        // set standard rating, falling back to the default when none is configured
        $this->standardRating = Configure::read('LT.bayesRating') ? Configure::read('LT.bayesRating') : $this->standardRating;
    }

    /**
     * Set the text to be classified
     *
     * @param $text String the text to be classified
     * @date 2009-1-20
     */
    function set($text) {
        $this->text = $text;
    }

    /**
     * Get Bayesian rating
     *
     * @date 2009-1-20
     */
    function rate() {
        // get the Bayesian rating, store it and return it
        $this->rating = $this->b8->classify($this->text);
        return $this->rating;
    }

    /**
     * Validate a message based on the rating, return true if it's NOT a SPAM
     *
     * @date 2009-1-20
     */
    function validate() {
        return $this->rate() < $this->standardRating;
    }

    /**
     * Learn a SPAM or a HAM
     *
     * @date 2009-1-20
     */
    function learn($mode) {
        $this->b8->learn($this->text, $mode);
    }

    /**
     * Unlearn a SPAM or a HAM
     *
     * @date 2009-1-20
     */
    function unlearn($mode) {
        $this->b8->unlearn($this->text, $mode);
    }
}
Notes:
- $standardRating is the threshold. If the rating that b8 returns is below this value, the message is treated as ham; otherwise it is treated as spam. I set it to 0.7; you can change it as needed;
- Configure::read('LT.bayesRating') reads the threshold from the application's runtime configuration. This is my own practice and you may not need it, so change or remove it as you see fit;
- Comment refers to the Comment model;
- Because b8 needs a database handle to work with its data table, startup contains $this->b8 = new b8($comment->getDBOResourceLink()). The getDBOResourceLink() method it calls is described in the next section.
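The threshold logic in the notes above can be tried in isolation. The following is a minimal sketch, not code from b8 or CakePHP: resolveThreshold() and isHam() are hypothetical helper names, and unlike the ternary in the component, resolveThreshold() accepts any numeric value (including 0) as a configured threshold.

```php
<?php
// Hypothetical helpers for illustration; not part of b8 or CakePHP.

// Use the configured threshold when one is set, otherwise the default.
function resolveThreshold($configured, $default = 0.7) {
    return is_numeric($configured) ? (float) $configured : (float) $default;
}

// A message counts as ham only when its rating is below the threshold,
// which mirrors the comparison in SpamShieldComponent::validate().
function isHam($rating, $threshold) {
    return $rating < $threshold;
}

$threshold = resolveThreshold(null);   // nothing configured, falls back to 0.7
var_dump(isHam(0.35, $threshold));     // bool(true)
var_dump(isHam(0.92, $threshold));     // bool(false)
```

A rating exactly equal to the threshold is treated as spam here, because the comparison is strict.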
Pass a database handle to b8
Add the following code to models/comment.php:
/**
 * Get the resource link of the MySQL connection
 */
public function getDBOResourceLink() {
    return $this->getDataSource()->connection;
}
Now, after all the preparations are completed, we can use Bayesian algorithms to classify messages.
Classify messages with b8
In controllers/comments_controller.php, first load SpamShieldComponent:
var $components = array('SpamShield');
Then, in the add() method, perform the following operations:
//set data for Bayesian validation
$this->SpamShield->set($this->data['Comment']['body']);

//validate the comment with Bayesian
if (!$this->SpamShield->validate()) {
    //set the status
    $this->data['Comment']['status'] = 'spam';
    //save
    $this->Comment->save($this->data);
    //learn it
    $this->SpamShield->learn("spam");
    //render
    $this->renderView('unmoderated');
    return;
}

//it's a normal post
$this->data['Comment']['status'] = 'published';
//save for publish
$this->Comment->save($this->data);
//learn it
$this->SpamShield->learn("ham");
This way, b8 classifies each message as it arrives and learns from it, which keeps most spam out of your site.
Note: after the first run, do not forget to change the createDB setting mentioned earlier back to FALSE.
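The component also defines unlearn(), which the add() code never calls. One plausible use, sketched here as an assumption rather than as the author's method, is correcting a wrong classification: withdraw the earlier lesson, then teach the right category. reclassify() and RecordingShield are hypothetical names; RecordingShield only stands in for the SpamShield component so the call order can be shown without CakePHP or b8.

```php
<?php
// Hypothetical helper: move a text from one category to the other.
// $shield is any object exposing set(), learn() and unlearn() the way
// the SpamShield component above does.
function reclassify($shield, $text, $from, $to) {
    $shield->set($text);
    $shield->unlearn($from); // withdraw the wrong lesson first
    $shield->learn($to);     // then teach the correct category
}

// Stand-in used only to record the calls; it does no classification.
class RecordingShield {
    public $calls = array();
    public function set($text)     { $this->calls[] = 'set'; }
    public function learn($mode)   { $this->calls[] = "learn:$mode"; }
    public function unlearn($mode) { $this->calls[] = "unlearn:$mode"; }
}

$shield = new RecordingShield();
reclassify($shield, 'A genuine comment that was flagged', 'spam', 'ham');
echo implode(', ', $shield->calls), "\n"; // set, unlearn:spam, learn:ham
```

In a real controller, the same three calls would be made on $this->SpamShield, for example from a moderation action that also updates the comment's status.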
http://dingyu.me/blog/spam-bayesian-chinese-4