PHPAnalysis practical tutorial on Chinese word segmentation. PHPAnalysis is a widely used Chinese word segmentation class. It segments text by reverse matching, so it is compatible with a wider range of character encodings. Its important member variables and commonly used functions are described below.
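As background for the reverse matching mentioned above, here is a minimal sketch of reverse (backward) maximum matching. It is not the PHPAnalysis implementation: the function name reverseMaxMatch, the $maxLen limit, and the toy dictionary are assumptions made for illustration only. The idea is to scan from the end of the text, try the longest dictionary word that ends at the current position, and fall back to a single character when nothing matches.

```php
<?php
// Hypothetical helper, not part of PHPAnalysis: reverse maximum matching
// against a toy dictionary whose keys are the known words.
function reverseMaxMatch(string $text, array $dict, int $maxLen = 4): array
{
    // Split into UTF-8 characters so multi-byte Chinese text is handled safely.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $words = [];
    $end = count($chars);

    while ($end > 0) {
        $matched = false;
        // Try the longest candidate ending at $end first, then shrink it.
        for ($len = min($maxLen, $end); $len > 1; $len--) {
            $candidate = implode('', array_slice($chars, $end - $len, $len));
            if (isset($dict[$candidate])) {
                array_unshift($words, $candidate);
                $end -= $len;
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            // No dictionary word ends here: emit a single character.
            array_unshift($words, $chars[$end - 1]);
            $end--;
        }
    }
    return $words;
}

// Toy dictionary (assumption): a real dictionary would be far larger.
$dict = ['研究' => true, '研究生' => true, '生命' => true, '起源' => true];
echo implode(' / ', reverseMaxMatch('研究生命起源', $dict)), PHP_EOL;
```

With this toy dictionary the scan finds 起源 first, then 生命, then 研究. A forward scan with the same dictionary would instead take 研究生 first and leave 命 stranded, which is the usual argument for matching in reverse.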
I. Important member variables
$resultType = 1: type of segmentation result to generate (1: everything; 2: dictionary words plus single CJK characters and English; 3: dictionary words and English words).
This variable is usually set with SetResultType($rstype).
$notSplitLen = 5: minimum length at which a sentence is split
$toLower = false: whether to convert all English words to lowercase
$differMax = false: whether to use maximum-split mode to disambiguate two-character words
$unitWord = true: whether to try merging single characters into words (that is, new word recognition)
$differFreq = false: whether to use the mode that gives priority to popular words
II. Major member functions
1. public function __construct($source_charset = 'utf-8', $target_charset = 'utf-8', $load_all = true, $source = '')
Function description: constructor.
Parameter list:
$source_charset: encoding of the source string
$target_charset: encoding of the target (output) string
$load_all: whether to load the whole dictionary (this parameter is obsolete)
$source: source string
If both input and output are UTF-8, you can create the object without any parameters and set the text to process with the SetSource method.
2. public function SetSource($source, $source_charset = 'utf-8', $target_charset = 'utf-8')
Function description: sets the source string.
Parameter list:
$source: source string
$source_charset: encoding of the source string
$target_charset: encoding of the target (output) string
Return value: bool
3. public function StartAnalysis($optimize = true)
Function description: starts word segmentation.
Parameter list:
$optimize: whether to optimize the result after segmentation
Return value: void
A basic word segmentation process:
//////////////////////////////////////
// Assumes the PhpAnalysis class file has already been included.
$pa = new PhpAnalysis();
$pa->SetSource('string to be segmented');
// Set the word segmentation attributes
$pa->resultType = 2;
$pa->differMax = true;
$pa->StartAnalysis();
// Obtain the desired result (here, the word => count index)
$index = $pa->GetFinallyIndex();
//////////////////////////////////////
4. public function SetResultType($rstype)
Function description: sets the type of the returned result.
This simply assigns the member variable $resultType.
Values of the parameter $rstype:
1: everything; 2: dictionary words plus single CJK characters and English; 3: dictionary words and English words
Return value: void
5. public function GetFinallyKeywords($num = 10)
Function description: gets the specified number of most frequent entries (usually used to extract document keywords).
Parameter list:
$num = 10: number of entries to return
Return value: list of keywords separated by ","
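The idea behind keyword extraction by frequency can be sketched in a few lines. This is not the code inside GetFinallyKeywords; the helper name topKeywords and the sample word list are assumptions for illustration, and the real method may apply further filtering that this sketch does not.

```php
<?php
// Hypothetical helper, not part of PHPAnalysis: return the $num most
// frequent words from a segmented word list as a comma-separated string.
function topKeywords(array $words, int $num = 10): string
{
    // Count how many times each word occurs.
    $freq = array_count_values($words);
    // Sort by count, highest first, keeping the words as keys.
    arsort($freq);
    // Keep the first $num words and join them with commas.
    return implode(',', array_slice(array_keys($freq), 0, $num));
}

// Sample segmented output (assumption): 分词 x3, 词典 x2, 编码 x1.
$words = ['分词', '词典', '分词', '编码', '分词', '词典'];
echo topKeywords($words, 2), PHP_EOL;
```

Called with $num = 2 on the sample list, the helper returns the two most frequent words joined by a comma, which mirrors the return format described above.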
6. public function GetFinallyResult($spword = '')
Function description: gets the final segmentation result.
Parameter list:
$spword: separator placed between entries
Return value: string
7. public function GetSimpleResult()
Function description: returns the rough (first-pass) segmentation result.
Return value: array
8. public function GetSimpleResultAll()
Function description: returns the rough (first-pass) segmentation result together with attribute information.
Attributes: 1 Chinese word, 2 ANSI word (including fullwidth), 3 ANSI punctuation (including fullwidth), 4 number (including fullwidth), 5 Chinese punctuation or unrecognized character
Return value: array
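To show how the attribute codes above might be used, here is a small sketch that keeps only entries with chosen attributes. The array layout (a 'w' key for the word and a 't' key for the attribute) and the helper name keepAttributes are assumptions made for this example; check the array that GetSimpleResultAll actually returns in your version before relying on them.

```php
<?php
// Hypothetical sample data: the layout ('w' = word, 't' = attribute code)
// is an assumption, not a documented PHPAnalysis structure.
$rough = [
    ['w' => '中文', 't' => 1], // 1: Chinese word
    ['w' => 'PHP',  't' => 2], // 2: ANSI word
    ['w' => ',',    't' => 3], // 3: ANSI punctuation
    ['w' => '2024', 't' => 4], // 4: number
    ['w' => '。',   't' => 5], // 5: Chinese punctuation or unrecognized
];

// Hypothetical helper: keep the words whose attribute code is in $wanted.
function keepAttributes(array $rough, array $wanted): array
{
    $kept = [];
    foreach ($rough as $item) {
        if (in_array($item['t'], $wanted, true)) {
            $kept[] = $item['w'];
        }
    }
    return $kept;
}

// Keep only Chinese words and ANSI words, dropping punctuation and numbers.
echo implode(' ', keepAttributes($rough, [1, 2])), PHP_EOL;
```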
9. public function GetFinallyIndex()
Function description: gets the hash index array.
Return value: array('word' => count, ...) sorted by frequency
10. public function MakeDict($source_file, $target_file = '')
Function description: compiles a text-file word list into a dictionary.
Parameter list:
$source_file: source text file
$target_file: target file (if not specified, the current dictionary is used)
Return value: void
11. public function ExportDict($targetfile)
Function description: exports all entries of the current dictionary to a text file.
Parameter list:
$targetfile: target file
Return value: void