Phpsplit is a Chinese word segmentation library written in PHP.
A PHP word segmenter built on a Unicode-encoded dictionary
Works only with PHP 5 and requires the iconv function
Uses the RMM (reverse maximum matching) algorithm for word segmentation. The dictionary must be specially compiled; the class provides a makedict() method for this
Basic workflow: SetSource, Startanalysis, GetResult
The main dictionary is stored in a special encoded format, so it can be queried without loading the whole dictionary into memory
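To illustrate the reverse maximum matching idea mentioned above, here is a minimal sketch. It is not Phpsplit's actual implementation: the function name rmmSegment, the in-memory word list, and the maximum word length of 4 characters are all assumptions made for this example. The scan starts at the end of the text, tries the longest candidate first, and falls back to a single character when nothing in the word list matches.

```php
<?php
// Hypothetical helper, not part of Phpsplit: reverse maximum matching (RMM)
// against a small in-memory word list whose keys are the known words.
function rmmSegment($text, array $dict, $maxLen = 4)
{
    // Split the UTF-8 string into an array of single characters.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $words = array();
    $end = count($chars);

    while ($end > 0) {
        // Try the longest candidate ending at $end first, then shrink it.
        $len = min($maxLen, $end);
        while ($len > 1) {
            $candidate = implode('', array_slice($chars, $end - $len, $len));
            if (isset($dict[$candidate])) {
                break;
            }
            $len--;
        }
        // If $len reached 1, no dictionary word matched: emit one character.
        array_unshift($words, implode('', array_slice($chars, $end - $len, $len)));
        $end -= $len;
    }

    return $words;
}

// Assumed sample word list; a real dictionary would be far larger.
$dict = array_flip(array('研究', '研究生', '生命', '起源'));
echo implode(' / ', rmmSegment('研究生命的起源', $dict)), "\n";
// prints: 研究 / 生命 / 的 / 起源
```

Scanning from the right is what keeps "研究生" from swallowing the first character of "生命" in this sentence, which a left-to-right maximum match would do.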
Usage
Install with Composer
require __DIR__ . '/vendor/autoload.php';
$split = new Split();
var_dump($split->simple("Hello Phpsplit"));
$this->assertTrue(true);
array(3) {
  [0]=>
  string(0) ""
  [1]=>
  string(6) "Hello"
  [2]=>
  string(8) "Phpsplit"
}
Part-of-speech suffixes in segmentation results
* n: noun
* t: time word
* s: place word
* f: locality (direction) word
* m: numeral
* q: quantifier
* b: distinguishing word
* r: pronoun
* v: verb
* a: adjective
* z: state word
* d: adverb
* p: preposition
* c: conjunction
* u: auxiliary
* y: modal particle
* e: interjection
* o: onomatopoeia
* i: idiom
* l: idiomatic expression
* j: abbreviation
* h: prefix component
* k: suffix component
* g: morpheme
* x: non-morpheme character
* w: punctuation
In addition, the following three categories of tags are added:
* Proper noun tags: person name nr, place name ns, organization name nt, other proper nouns nz.
* Morpheme subclass tags: noun morpheme ng, verb morpheme vg, adjective morpheme ag, time morpheme tg, adverb morpheme dg, etc.
* Verb and adjective subclass tags: nominal verb vn (verb with noun characteristics), nominal adjective an (adjective with noun characteristics), adverbial verb vd (verb with adverb characteristics), adverbial adjective ad (adjective with adverb characteristics).
There are about 40 tags in total.
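As a rough illustration of how the tags above might be consumed, the sketch below maps a tagged token to a readable description. It assumes, without confirmation from the library, that a tagged token looks like "word/tag"; the function name describeToken and the partial tag table are likewise made up for this example.

```php
<?php
// Hypothetical helper, not part of Phpsplit. Assumes a tagged token has the
// form "word/tag"; the library's real output format may differ.
function describeToken($token, array $tagNames)
{
    // Split on the last slash so that words containing "/" stay intact.
    $pos = strrpos($token, '/');
    if ($pos === false) {
        return array('word' => $token, 'tag' => null, 'meaning' => null);
    }

    $word = substr($token, 0, $pos);
    $tag = substr($token, $pos + 1);
    $meaning = isset($tagNames[$tag]) ? $tagNames[$tag] : 'unknown';

    return array('word' => $word, 'tag' => $tag, 'meaning' => $meaning);
}

// Partial table taken from the tag list in this document.
$tagNames = array(
    'n'  => 'noun',
    'v'  => 'verb',
    'a'  => 'adjective',
    'nr' => 'person name',
    'ns' => 'place name',
    'vn' => 'nominal verb',
    'w'  => 'punctuation',
);

$info = describeToken('北京/ns', $tagNames);
echo $info['word'], ' => ', $info['meaning'], "\n";
// prints: 北京 => place name
```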
Project home: http://www.open-open.com/lib/view/home/1448200861473