Instead of running word segmentation on your own server, the Discuz! online Chinese word segmentation service returns segmentation results through an API. In a project, a single function is enough to handle both segmentation and keyword extraction.
The following function is written against the Discuz! online word segmentation API and has been tested to work correctly:
/**
 * DZ online Chinese word segmentation
 * @param string $title   Title to segment
 * @param string $content Content to segment
 * @param string $encode  Encoding of the data returned by the API
 * @return array|false Array of extracted keywords, or false on failure
 */
function dz_segment($title = '', $content = '', $encode = 'utf-8') {
    if ($title == '') {
        return false;
    }
    $title = rawurlencode(strip_tags($title));
    $content = strip_tags($content);
    if (strlen($content) > 2400) { // The online segmentation service has a length limit
        // Assumed cap: 800 characters, roughly 2400 bytes of 3-byte UTF-8 text
        $content = mb_substr($content, 0, 800, $encode);
    }
    $content = rawurlencode($content);
    $url = 'http://keyword.discuz.com/related_kw.html?title=' . $title . '&content=' . $content . '&ics=' . $encode . '&ocs=' . $encode;
    $xml_array = simplexml_load_file($url); // Read the XML data into an object
    if ($xml_array === false) { // The request or the XML parsing failed
        return false;
    }
    $result = $xml_array->keyword->result;
    $data = array();
    foreach ($result->item as $key => $value) {
        array_push($data, (string) $value->kw);
    }
    if (count($data) > 0) {
        return $data;
    } else {
        return false;
    }
}
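The length check in the function compares bytes (strlen) but truncates by characters (mb_substr). As a minimal sketch of an alternative, assuming the cap really is the 2400 bytes the function tests for, the content could be cut by bytes with mb_strcut, which does not split a multi-byte character. The helper name truncate_bytes is hypothetical and not part of the original function.

```php
<?php
// Hypothetical helper (not from the original article): cap a string at
// $maxBytes bytes without cutting a multi-byte character in half.
function truncate_bytes(string $text, int $maxBytes = 2400, string $encoding = 'UTF-8'): string
{
    if (strlen($text) <= $maxBytes) {
        return $text; // already within the assumed limit
    }
    // mb_strcut measures the length in bytes and stops at a character boundary
    return mb_strcut($text, 0, $maxBytes, $encoding);
}

$long  = str_repeat('词', 1000); // 1000 three-byte characters in UTF-8
$short = truncate_bytes($long);
echo strlen($short), "\n";       // byte length after truncation
```

The result can then be passed to rawurlencode in the same way as $content in the function above.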
Word segmentation example, accessed by URL:
http://keyword.discuz.com/related_kw.html?title=High-grade three-course review excellent course&content=&ics=utf-8&ocs=utf-8
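A query string like the one in this URL can also be assembled with http_build_query instead of manual concatenation. The sketch below is only an illustration: the helper name dz_build_url is hypothetical, and it assumes the four parameters seen in the URL (title, content, ics, ocs) are all the service needs. PHP_QUERY_RFC3986 makes the encoding match rawurlencode, so spaces become %20.

```php
<?php
// Hypothetical helper (not from the original article): build the request URL
// for the endpoint shown above.
function dz_build_url(string $title, string $content = '', string $encode = 'utf-8'): string
{
    $query = http_build_query(
        array(
            'title'   => $title,
            'content' => $content,
            'ics'     => $encode, // encoding parameter, as in the URL above
            'ocs'     => $encode, // encoding parameter, as in the URL above
        ),
        '',
        '&',
        PHP_QUERY_RFC3986 // percent-encode like rawurlencode()
    );
    return 'http://keyword.discuz.com/related_kw.html?' . $query;
}

echo dz_build_url('PHP word segmentation'), "\n";
```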
The returned XML data:
<?xml version="1.0" encoding="utf-8"?>
<total_response>
    <svalid>36000</svalid>
    <keyword>
        <info>
            <count>1</count>
            <errno>0</errno>
            <nextuptime>1291287160</nextuptime>
            <keep>0</keep>
        </info>
        <result>
            <item>
                <kw><![CDATA[History]]></kw>
            </item>
        </result>
    </keyword>
</total_response>
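To show how a response of this shape maps to a keyword array, here is a small offline sketch that parses the sample with simplexml_load_string instead of fetching it from the service. The helper name parse_keyword_response is hypothetical, and treating a non-zero errno as a failure is an assumption, since the article does not document that field.

```php
<?php
// Hypothetical helper (not from the original article): extract keywords from
// a response body shaped like the sample above.
function parse_keyword_response(string $xml)
{
    libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $doc = simplexml_load_string($xml);
    if ($doc === false || !isset($doc->keyword->result)) {
        return false;
    }
    // Assumption: an errno other than 0 signals an error
    if (isset($doc->keyword->info->errno) && (int) $doc->keyword->info->errno !== 0) {
        return false;
    }
    $keywords = array();
    foreach ($doc->keyword->result->item as $item) {
        $keywords[] = (string) $item->kw; // the string cast also reads CDATA content
    }
    return count($keywords) > 0 ? $keywords : false;
}

$sample = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<total_response>
<svalid>36000</svalid>
<keyword>
<info><count>1</count><errno>0</errno><nextuptime>1291287160</nextuptime><keep>0</keep></info>
<result><item><kw><![CDATA[History]]></kw></item></result>
</keyword>
</total_response>
XML;

print_r(parse_keyword_response($sample));
```

For the sample response, the function returns an array containing the single keyword History.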