Spam, Bayesian, and Chinese 4: Integrating a Bayesian algorithm into CakePHP

Source: Internet
Author: User

The previous article in this series mentioned several open-source implementations of Bayesian algorithms. This article describes how to integrate one of them, b8, into CakePHP.

Download and install b8
  1. Download the latest version from the b8 site and decompress it into the vendors directory, so that you have, for example, vendors/b8/b8.php;
  2. Open vendors/b8/etc/config_b8 in a text editor and change databaseType to mysql;
  3. Open vendors/b8/etc/config_storage in a text editor, set tableName to the name of the data table that will store keywords, and change createDB to TRUE. Note that b8 creates this table the first time it runs, after which you need to change createDB back to FALSE;
  4. Open vendors/b8/lexer/shared_functions.php in a text editor and comment out line 38 (inside echoError); otherwise, b8 will display its error messages directly in your Cake application. Of course, that output is useful while debugging.

Write a wrapper component for b8

To let your Cake application call b8, you need to write a component. Create a spam_shield.php file in controllers/components/ and add the following code:

<?php
class SpamShieldComponent extends Object {

    /** b8 instance */
    var $b8;

    /**
     * Standard rating.
     * Comments with a rating higher than this are considered spam.
     */
    var $standardRating = 0.7;

    /** Text to be classified */
    var $text;

    /** Rating of the text */
    var $rating;

    /**
     * Component startup callback
     *
     * @date 2009-1-20
     */
    function startup(&$controller) {
        // Register the Comment model to get the DBO resource link
        $comment = ClassRegistry::init('Comment');

        // Import b8 and create an instance
        App::import('Vendor', 'b8/b8');
        $this->b8 = new b8($comment->getDBOResourceLink());

        // Set the standard rating, keeping the default if none is configured
        $configured = Configure::read('Lt.bayesRating');
        $this->standardRating = $configured ? $configured : $this->standardRating;
    }

    /**
     * Set the text to be classified
     *
     * @param $text String the text to be classified
     * @date 2009-1-20
     */
    function set($text) {
        $this->text = $text;
    }

    /**
     * Get the Bayesian rating
     *
     * @date 2009-1-20
     */
    function rate() {
        // Store the Bayes rating and return it so validate() can compare it
        $this->rating = $this->b8->classify($this->text);
        return $this->rating;
    }

    /**
     * Validate a message based on the rating; return true if it is NOT spam
     *
     * @date 2009-1-20
     */
    function validate() {
        return $this->rate() < $this->standardRating;
    }

    /**
     * Learn a spam or a ham
     *
     * @date 2009-1-20
     */
    function learn($mode) {
        $this->b8->learn($this->text, $mode);
    }

    /**
     * Unlearn a spam or a ham
     *
     * @date 2009-1-20
     */
    function unlearn($mode) {
        $this->b8->unlearn($this->text, $mode);
    }

}

Notes:

  1. $standardRating is the threshold. If the Bayesian probability of a message is higher than this value, the message is considered spam; otherwise, it is ham. I set it to 0.7; you can adjust it as needed;
  2. Configure::read('Lt.bayesRating') reads the threshold dynamically from the application's runtime configuration. This is my own practice and you may not need it; modify or remove it as needed;
  3. Comment refers to the Comment model;
  4. Because b8 needs a database handle to operate on its data table, startup contains $this->b8 = new b8($comment->getDBOResourceLink());. The getDBOResourceLink() method is covered next.
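
The threshold logic described in notes 1 and 2 can be sketched in isolation, without CakePHP or b8. The helper names resolveThreshold and isHam are hypothetical and exist only for illustration. One deliberate difference from the component: the fallback uses is_numeric rather than a truthiness check, so a configured value of 0 is honoured instead of being treated as unset.

```php
<?php
// Hypothetical helpers for illustration; not part of b8 or CakePHP.

// Return the configured threshold if one is set, otherwise the default.
function resolveThreshold($configured, $default = 0.7) {
    return is_numeric($configured) ? (float) $configured : $default;
}

// A message counts as ham when its rating is below the threshold.
function isHam($rating, $threshold) {
    return $rating < $threshold;
}

$threshold = resolveThreshold(null);   // nothing configured: falls back to 0.7
var_dump(isHam(0.35, $threshold));     // bool(true)
var_dump(isHam(0.92, $threshold));     // bool(false)
```

A lower threshold catches more spam at the cost of flagging more legitimate comments, so it is worth tuning against real traffic.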

Pass a database handle to b8

Add the following code to models/comment.php:

/**
 * Get the resource link of the MySQL connection
 */
public function getDBOResourceLink() {
    return $this->getDataSource()->connection;
}

With the preparation complete, we can now use the Bayesian algorithm to classify messages.

Classify messages with b8

In controllers/comments_controller.php, first load SpamShieldComponent:

var $components = array('SpamShield');

Then, in the add () method, perform the following operations:

// Set the data for Bayesian validation
$this->SpamShield->set($this->data['Comment']['body']);

// Validate the comment with the Bayesian filter
if (!$this->SpamShield->validate()) {
    // Set the status
    $this->data['Comment']['status'] = 'spam';
    // Save
    $this->Comment->save($this->data);
    // Learn it
    $this->SpamShield->learn("spam");
    // Render
    $this->renderView('underated');
    return;
}

// It's a normal post
$this->data['Comment']['status'] = 'published';
// Save for publishing
$this->Comment->save($this->data);
// Learn it
$this->SpamShield->learn("ham");

In this way, b8 automatically classifies each message as it arrives and learns from it, so you are basically insulated from spam!
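
The classify-then-learn flow above can be tried without CakePHP or a database. The sketch below is only an illustration under stated assumptions: StubClassifier is a hypothetical stand-in that returns a fixed rating where the real b8 would compute one from learned tokens, and moderate is a hypothetical function mirroring the logic of the add() method.

```php
<?php
// Hypothetical stand-in for b8 with a fixed rating, so the sketch runs on its own.
class StubClassifier {
    public $learned = array();
    private $rating;

    public function __construct($rating) {
        $this->rating = $rating;
    }

    // The real b8 derives the rating from learned tokens; this stub returns a constant.
    public function classify($text) {
        return $this->rating;
    }

    // Record what was learned instead of writing to a database.
    public function learn($text, $category) {
        $this->learned[] = array($text, $category);
    }
}

// Classify a comment body, teach the result back, and return the status to store.
function moderate($classifier, $body, $threshold = 0.7) {
    $isSpam = $classifier->classify($body) >= $threshold;
    $classifier->learn($body, $isSpam ? 'spam' : 'ham');
    return $isSpam ? 'spam' : 'published';
}

echo moderate(new StubClassifier(0.9), 'Buy cheap pills'), "\n";      // spam
echo moderate(new StubClassifier(0.1), 'Nice article, thanks'), "\n"; // published
```

Because the filter learns from its own verdicts, a misclassified message reinforces the mistake until it is corrected, which is what the component's unlearn() method is for.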

Note: After the first run, do not forget to change createDB back to FALSE, as mentioned above.

http://dingyu.me/blog/spam-bayesian-chinese-4
