The Discuz! online Chinese word segmentation service returns its segmentation results through an API. In a project, a single function is enough to handle word segmentation and keyword extraction.
The following function is written against the Discuz! online word segmentation API and has been tested to work:
The code is as follows:
/**
 * DZ online Chinese word segmentation
 * @param string $title   title to segment
 * @param string $content content to segment
 * @param string $encode  encoding of the data returned by the API
 * @return array|false    array of keywords, or false if none were obtained
 */
function dz_segment($title = '', $content = '', $encode = 'utf-8') {
    if ($title == '') {
        return false;
    }
    $title = rawurlencode(strip_tags($title));
    $content = strip_tags($content);
    if (strlen($content) > 2400) { // the online segmentation service has a length limit
        $content = mb_substr($content, 0, 800, $encode);
    }
    $content = rawurlencode($content);
    $url = 'http://keyword.discuz.com/related_kw.html?title=' . $title . '&content=' . $content . '&ics=' . $encode . '&ocs=' . $encode;
    $xml_array = simplexml_load_file($url); // read the XML data into an object
    if ($xml_array === false) { // the request or the XML parsing failed
        return false;
    }
    $result = $xml_array->keyword->result;
    $data = array();
    foreach ($result->item as $key => $value) {
        array_push($data, (string) $value->kw);
    }
    if (count($data) > 0) {
        return $data;
    } else {
        return false;
    }
}
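The length check in the function mixes two units: strlen counts bytes, while mb_substr counts characters. The sketch below isolates that step. It assumes UTF-8 input in which a typical Chinese character takes 3 bytes, so 800 characters come to about 2400 bytes. The helper name truncate_for_segment and its default limits are hypothetical, taken from the numbers used in the function above, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: mirrors the truncation step of dz_segment().
function truncate_for_segment($content, $maxBytes = 2400, $maxChars = 800, $encode = 'utf-8')
{
    // strlen() measures bytes, not characters.
    if (strlen($content) > $maxBytes) {
        // mb_substr() measures characters in the given encoding.
        $content = mb_substr($content, 0, $maxChars, $encode);
    }
    return $content;
}

// U+53F2 is a single Chinese character that takes 3 bytes in UTF-8.
$long  = str_repeat("\u{53F2}", 1000);
$short = truncate_for_segment($long);

echo strlen($long), "\n";               // 3000 (bytes before truncation)
echo mb_strlen($short, 'utf-8'), "\n";  // 800  (characters after truncation)
echo strlen($short), "\n";              // 2400 (bytes after truncation)
```

For mostly ASCII content the byte threshold is reached much later than 800 characters, so the two limits only line up for text dominated by 3-byte characters.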
Word segmentation example, accessed via URL:
The code is as follows:
http://keyword.discuz.com/related_kw.html?title=High-grade history review of high-quality tutorials&content=&ics=utf-8&ocs=utf-8
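A URL like this one can also be assembled with PHP's http_build_query instead of manual string concatenation. This is only a sketch, not the article's method: the helper name build_segment_url is hypothetical, and the parameter names title, content, ics, and ocs are taken from the example URL above. Passing PHP_QUERY_RFC3986 makes the encoding match what rawurlencode produces (spaces become %20 rather than +).

```php
<?php
// Hypothetical helper: builds the request URL for the segmentation service.
function build_segment_url($title, $content = '', $encode = 'utf-8')
{
    $params = array(
        'title'   => strip_tags($title),
        'content' => strip_tags($content),
        'ics'     => $encode, // input character set
        'ocs'     => $encode, // output character set
    );
    // PHP_QUERY_RFC3986 encodes values the same way rawurlencode() does.
    return 'http://keyword.discuz.com/related_kw.html?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

echo build_segment_url('history review'), "\n";
// http://keyword.discuz.com/related_kw.html?title=history%20review&content=&ics=utf-8&ocs=utf-8
```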
The XML data returned:
<?xml version="1.0" encoding="utf-8"?>
<total_response>
  <svalid>36000</svalid>
  <keyword>
    <info>
      <count>1</count>
      <errno>0</errno>
      <nextuptime>1291287160</nextuptime>
      <keep>0</keep>
    </info>
    <result>
      <item>
        <kw><![CDATA[History]]></kw>
      </item>
    </result>
  </keyword>
</total_response>
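To see how the keywords are pulled out of a response like this without calling the service, here is a minimal sketch that parses a sample string with simplexml_load_string. The helper name extract_keywords is hypothetical, and the sample assumes lowercase element names, matching the properties the function above accesses (keyword, result, item, kw).

```php
<?php
// Hypothetical helper: extracts the keyword list from a response string.
function extract_keywords($xml_string)
{
    $xml = simplexml_load_string($xml_string);
    if ($xml === false) {
        return false; // not well-formed XML
    }
    $data = array();
    foreach ($xml->keyword->result->item as $item) {
        // Casting to string returns the text content, including CDATA.
        $data[] = (string) $item->kw;
    }
    return count($data) > 0 ? $data : false;
}

// Shortened sample response, assumed to follow the structure shown above.
$sample = '<?xml version="1.0" encoding="utf-8"?>'
    . '<total_response><svalid>36000</svalid><keyword>'
    . '<info><count>1</count><errno>0</errno></info>'
    . '<result><item><kw><![CDATA[History]]></kw></item></result>'
    . '</keyword></total_response>';

print_r(extract_keywords($sample));
```

Keeping the parsing separate from the HTTP request makes it possible to check the extraction logic against a saved response.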
Reference: http://www.jb51.net/article/47952.htm