Teach You to Build a Keyword Matching Project (Search Engine): Day 20

Source: Internet
Author: User



Cameos: "The Underdog's Deceptive Form Tool", "A Little About Databases"

Object-oriented deep dive: "Understanding Object Orientation: A Newcomer's First Look", "Object Orientation Side Story: Thoughts (1)", "Understanding Object Orientation: How to Find the Classes"

Load balancing: "Load Balancing: Understanding the Concepts", "Load Balancing: Implementation and Configuration (Nginx)"

A word on complaints: some readers have written in to say that the articles get harder to follow the further the series goes, and that they can't keep pace. Others ask how Little Handsome's skills improved so quickly, and wonder whether they themselves are just slow. Some read only the text and skip the code, because the code is too hard to understand.

I have been thinking about this problem for the past few days, which is why I ran a few side lessons on object orientation; I hope they help those who can't keep up. When readers don't send feedback, all I can do is continue the course along Little Handsome's train of thought as I imagine it.

Day 20

Starting point: Teach You to Build a Keyword Matching Project (Search Engine): Day 1

Review: Teach You to Build a Keyword Matching Project (Search Engine): Day 19

Last time, Little Handsome wrote a first version of the word segmentation algorithm. He showed it to the boss and was told to rewrite it.

There are several reasons for this:

1. How can it be tested, and with what test data?

2. Isn't Splitter doing too many things?

3. What about a keyword with a repeated phrase, such as "Dress xxl skirt Dress"?

Little Handsome took these questions away and began to refactor.

First he noticed the Chinese/English detection and the length calculation, and moved them into a class of their own:

    <?php
    class UTF8 {
        /**
         * Detect whether the string contains UTF-8 (Chinese) characters
         * @param $char
         * @return bool
         */
        public static function is($char) {
            // One 3-byte UTF-8 character: lead byte 228-233, then two bytes in 128-191
            $cjk = "[" . chr(228) . "-" . chr(233) . "]{1}[" . chr(128) . "-" . chr(191) . "]{1}[" . chr(128) . "-" . chr(191) . "]{1}";
            return (preg_match("/^(" . $cjk . "){1}/", $char)
                || preg_match("/(" . $cjk . "){1}$/", $char)
                || preg_match("/(" . $cjk . "){2,}/", $char));
        }

        /**
         * Count the number of UTF-8 characters
         * @param $char
         * @return float|int
         */
        public static function length($char) {
            if (self::is($char))
                return ceil(strlen($char) / 3);
            return strlen($char);
        }

        /**
         * Detect whether the word is a phrase (longer than one character)
         * @param $word
         * @return bool
         */
        public static function isPhrase($word) {
            if (self::length($word) <= 1)
                return false;
            return true;
        }
    }

Little Handsome also realized that the dictionary might come from several places, such as the test data I gave him. Wouldn't that solve the boss's complaint that the code could not be tested? So he wrapped each dictionary source in a class, as follows:

    <?php
    class DBSegmentation {
        public $cid;

        /**
         * Get the phrase data for the category
         * @return array
         */
        public function transferDictionary() {
            $ret = array();
            $sql = "select word from category_linklist where cid='$this->cid'";
            $words = DB::makeArray($sql);
            foreach ($words as $strWords) {
                // Each row holds a comma-separated list of words
                $parts = explode(",", $strWords);
                foreach ($parts as $word) {
                    if (UTF8::isPhrase($word)) {
                        $ret[] = $word;
                    }
                }
            }
            return $ret;
        }
    }

    class TestSegmentation {
        public function transferDictionary() {
            $words = array(
                "Dress,jumpsuit",
                "XXL,XXL,enlarged,enlarged yards",
                "X yards,medium",
                "coats,dresses,coats,coats,tops",
                "Women,ladies,girls,females"
            );
            $ret = array();
            foreach ($words as $strWords) {
                $parts = explode(",", $strWords);
                foreach ($parts as $word) {
                    if (UTF8::isPhrase($word)) {
                        $ret[] = $word;
                    }
                }
            }
            return $ret;
        }
    }
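Both classes expose the same `transferDictionary()` method, which is what makes them interchangeable. One way to make that contract explicit is an interface. The sketch below is an assumption, not part of the original code: the names `DictionarySource` and `ArrayDictionarySource` are hypothetical, and the phrase check is a simplified byte-length stand-in for `UTF8::isPhrase`.

```php
<?php
// Hypothetical contract: anything that can supply a flat list of phrases.
interface DictionarySource
{
    /** @return string[] */
    public function transferDictionary(): array;
}

// Hypothetical in-memory source, playing the role of TestSegmentation.
class ArrayDictionarySource implements DictionarySource
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function transferDictionary(): array
    {
        $phrases = array();
        foreach ($this->rows as $row) {
            foreach (explode(",", $row) as $word) {
                $word = trim($word);
                // Simplified stand-in for UTF8::isPhrase: keep entries longer than one byte.
                if (strlen($word) > 1) {
                    $phrases[] = $word;
                }
            }
        }
        return $phrases;
    }
}

$source = new ArrayDictionarySource(array("Dress,jumpsuit", "XXL,X"));
print_r($source->transferDictionary());
```

With a contract like this, a splitter would only need to know that it receives some dictionary source, so a database-backed source and a test source could be swapped without touching the splitting code.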

Splitter can then focus purely on segmentation. The code is as follows:

    class Splitter {
        public $keyword;
        private $dictionary = array();

        public function setDictionary($dictionary = array()) {
            // Sort the dictionary by phrase length, shortest first
            usort($dictionary, function ($a, $b) {
                return (UTF8::length($a) > UTF8::length($b)) ? 1 : -1;
            });
            $this->dictionary = $dictionary;
        }

        public function getDictionary() {
            return $this->dictionary;
        }

        /**
         * Split the keyword into phrases or words
         * @return KeywordEntity $keywordEntity
         */
        public function split() {
            $remainKeyword = $this->keyword;
            $keywordEntity = new KeywordEntity($this->keyword);
            foreach ($this->dictionary as $phrase) {
                // Escape the phrase so regex metacharacters in it are matched literally
                $matchTimes = preg_match_all("/" . preg_quote($phrase, "/") . "/", $remainKeyword, $matches);
                if ($matchTimes > 0) {
                    $keywordEntity->addElement($phrase, $matchTimes);
                    $remainKeyword = str_replace($phrase, "::", $remainKeyword);
                }
            }
            // Whatever the dictionary did not cover is added piece by piece
            $remainKeywords = explode("::", $remainKeyword);
            foreach ($remainKeywords as $splitWord) {
                if (!empty($splitWord)) {
                    $keywordEntity->addElement($splitWord);
                }
            }
            return $keywordEntity;
        }
    }

    class KeywordEntity {
        public $keyword;
        public $elements = array();

        public function __construct($keyword) {
            $this->keyword = $keyword;
        }

        public function addElement($word, $times = 1) {
            if (isset($this->elements[$word])) {
                $this->elements[$word]->times += $times;
            } else {
                // Key by word so the isset() check above and calculateWeight() can find the element
                $this->elements[$word] = new KeywordElement($word, $times);
            }
        }

        /**
         * @desc Calculate the weight of a UTF-8 string
         * @param string $word
         * @return float
         */
        public function calculateWeight($word) {
            $element = $this->elements[$word];
            return round(strlen($element->word) * $element->times / strlen($this->keyword), 3);
        }
    }

    class KeywordElement {
        public $word;
        public $times;

        public function __construct($word, $times) {
            $this->word = $word;
            $this->times = $times;
        }
    }
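The repeated-phrase question from the boss is handled by counting matches before replacing them. Here is a minimal sketch of just that step, run on the article's test keyword. The variable names are illustrative, and the phrase "Dress" is assumed to be in the dictionary.

```php
<?php
$keyword = "Dress xxl skirt Dress";
$phrase  = "Dress"; // assumed dictionary entry

// Count every occurrence of the phrase in the keyword.
$times = preg_match_all("/" . preg_quote($phrase, "/") . "/", $keyword, $matches);

// Replace each occurrence with the "::" separator, then split what is left
// and drop the empty pieces produced at the start and end.
$remaining = str_replace($phrase, "::", $keyword);
$leftovers = array_values(array_filter(explode("::", $remaining), function ($part) {
    return $part !== "";
}));

var_dump($times);     // how many times the phrase occurred
var_dump($leftovers); // text not covered by the phrase
```

Because the count is recorded once per phrase, a keyword that repeats a phrase yields a single element with a higher count rather than duplicate elements.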

He also handed the weight calculation over to a class, KeywordEntity.
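As a worked example of the weight formula, assume the phrase "Dress" was matched twice in the test keyword. The weight is the phrase's byte length times its match count, divided by the keyword's byte length. The function name `phraseWeight` is hypothetical; it restates the arithmetic from `calculateWeight` as a standalone function.

```php
<?php
// Hypothetical standalone version of the formula in calculateWeight:
// strlen(word) * times / strlen(keyword), rounded to 3 decimals.
function phraseWeight(string $word, int $times, string $keyword): float
{
    return round(strlen($word) * $times / strlen($keyword), 3);
}

$keyword = "Dress xxl skirt Dress"; // 21 bytes
$weight  = phraseWeight("Dress", 2, $keyword); // 5 * 2 / 21

echo $weight, "\n";
```

Since the formula uses `strlen`, it measures bytes rather than characters, so Chinese phrases and the keyword are both counted at 3 bytes per character and the ratio stays comparable.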

Once he had finished, Little Handsome also wrote a test example:

 
    <?php
    $segmentation = new TestSegmentation();
    $splitter = new Splitter();
    $splitter->setDictionary($segmentation->transferDictionary());
    $splitter->keyword = "Dress xxl skirt Dress";
    $keywordEntity = $splitter->split();
    var_dump($keywordEntity);

This way, even if the algorithm is swapped out, the rest of the code can take it calmly.

Little Handsome learned from this that when a class seems to be doing too much, it is time to consider the single responsibility principle.

Single responsibility principle: a class should have only one reason to change, that is, only one responsibility. Each responsibility is an axis of change, and if a class has more than one responsibility, those responsibilities become coupled. This leads to fragile designs: when one responsibility changes, the others may be affected. Coupled responsibilities also hurt reusability. A typical application is separating logic from the interface. (From Baidu Encyclopedia)

When the boss asked whether there were other word segmentation algorithms they could use, Little Handsome was delighted, because his code was now clean enough to accommodate one.

How will Little Handsome get on with a third-party word segmentation extension? Stay tuned for: Teach You to Build a Keyword Matching Project (Search Engine): Day 21




