PHPAnalysis Chinese Word Segmentation: A Practical Tutorial

Source: Internet
Author: User
PHPAnalysis is a widely used Chinese word segmentation class. It segments text using reverse matching and supports a wide range of character encodings. Its member variables and commonly used functions are described below.
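To make "reverse matching" concrete, here is a minimal, self-contained sketch of reverse (backward) maximum matching against a tiny dictionary. The function name `reverseMaxMatch`, the sample dictionary, and the maximum word length of 3 are assumptions for illustration; this is not PHPAnalysis's own code, which adds further processing on top of dictionary matching.

```php
<?php
/**
 * Minimal reverse (backward) maximum matching segmenter.
 * Illustration only: the dictionary and $maxLen are assumptions,
 * and this is not PHPAnalysis's own algorithm or code.
 *
 * @param string $text   UTF-8 text to segment
 * @param array  $dict   dictionary as a set: word => true
 * @param int    $maxLen longest dictionary word, in characters
 * @return string[]      words in their original order
 */
function reverseMaxMatch(string $text, array $dict, int $maxLen): array
{
    // Split the UTF-8 string into characters (no mbstring needed)
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $words = [];
    $end = count($chars);

    while ($end > 0) {
        // Try the longest window ending at $end, then shrink it from the left
        $len = min($maxLen, $end);
        while ($len > 1) {
            $candidate = implode('', array_slice($chars, $end - $len, $len));
            if (isset($dict[$candidate])) {
                break;
            }
            $len--;
        }
        // Either a dictionary word or a single leftover character
        array_unshift($words, implode('', array_slice($chars, $end - $len, $len)));
        $end -= $len;
    }
    return $words;
}

$dict = array_fill_keys(['研究', '研究生', '生命', '命', '起源'], true);
echo implode(' / ', reverseMaxMatch('研究生命起源', $dict, 3)), PHP_EOL;
// prints: 研究 / 生命 / 起源
```

With the same dictionary, forward maximum matching (longest match from the left) would produce 研究生 / 命 / 起源; scanning from the right is what recovers 生命 in this example.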

I. Important member variables
$resultType = 1: the type of segmentation result to generate (1 = everything; 2 = dictionary words plus single CJK characters, simplified or traditional, and English words; 3 = dictionary words and English words only)
This variable is usually set through SetResultType($rstype).
$notSplitLen = 5: minimum length of a sentence to be segmented
$toLower = false: whether to convert all English words to lowercase
$differMax = false: whether to use maximum segmentation mode to disambiguate two-character words
$unitWord = true: whether to try merging single characters into words (i.e., new-word recognition)
$differFreq = false: whether to give priority to popular (high-frequency) words
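These member variables are assigned directly as object properties, so a misspelled or wrongly capitalized name such as `DifferMax` is typically not rejected: PHP property names are case-sensitive, and assigning to an undeclared one just creates a new dynamic property (with only a deprecation notice since PHP 8.2), leaving the intended option at its default. The sketch below guards against that with a validating helper. `AnalyzerOptionsStub` is a hypothetical stand-in that only copies the property names and defaults listed above, and `configureAnalyzer` is likewise a hypothetical helper, not part of PHPAnalysis.

```php
<?php
// Hypothetical stand-in for PhpAnalysis: it mirrors only the public
// member variables and defaults documented above, not the real class.
class AnalyzerOptionsStub
{
    public $resultType = 1;
    public $notSplitLen = 5;
    public $toLower = false;
    public $differMax = false;
    public $unitWord = true;
    public $differFreq = false;
}

/**
 * Hypothetical helper: assigns options to existing properties only and
 * throws on unknown (e.g. wrongly capitalized) names.
 */
function configureAnalyzer(object $analyzer, array $options): object
{
    foreach ($options as $name => $value) {
        if (!property_exists($analyzer, $name)) {
            throw new InvalidArgumentException("Unknown option: $name");
        }
        $analyzer->$name = $value;
    }
    return $analyzer;
}

$pa = configureAnalyzer(new AnalyzerOptionsStub(), [
    'resultType' => 2,
    'differMax'  => true,
]);
```

Passing a real `PhpAnalysis` object instead of the stub should behave the same way, assuming these options are declared as public properties on it.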
II. Main member functions
1. public function __construct($source_charset = 'utf-8', $target_charset = 'utf-8', $load_all = true, $source = '')
Function description: constructor
Parameter list:
$source_charset: encoding of the source string
$target_charset: encoding of the target (output) string
$load_all: whether to load the entire dictionary (this parameter is obsolete)
$source: the source string
If both input and output are UTF-8, you can construct the object without any arguments and set the text to process with the SetSource method instead.
2. public function SetSource($source, $source_charset = 'utf-8', $target_charset = 'utf-8')
Function description: sets the source string.
Parameter list:
$source: the source string
$source_charset: encoding of the source string
$target_charset: encoding of the target (output) string
Return value: bool
3. public function StartAnalysis($optimize = true)
Function description: starts word segmentation.
Parameter list:
$optimize: whether to optimize the result after segmentation
Return value: void
A basic segmentation workflow:
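//////////////////////////////////////
// Load the PHPAnalysis class file first (file name and path are assumptions; adjust to your copy)
require_once 'phpanalysis.class.php';

$pa = new PhpAnalysis();
$pa->SetSource('string to be segmented');
// Set segmentation options
$pa->resultType = 2;
$pa->differMax = true;
$pa->StartAnalysis();
// Obtain the result: array('word' => count, ...) sorted by frequency
$index = $pa->GetFinallyIndex();
////////////////////////////////////////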
4. public function SetResultType($rstype)
Function description: sets the type of result to return.
It simply sets the member variable $resultType.
Values of $rstype:
1 = everything; 2 = dictionary words plus single CJK characters (simplified or traditional) and English words; 3 = dictionary words and English words only
Return value: void
5. public function GetFinallyKeywords($num = 10)
Function description: gets the specified number of most frequent entries (typically used to extract document keywords).
Parameter list:
$num = 10: number of entries to return
Return value: a comma-separated list of keywords
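Since GetFinallyKeywords() returns one comma-separated string, the sketch below shows on plain arrays what "the N most frequent entries as a comma-separated list" amounts to, and how to turn such a string back into an array. `topKeywords` and `parseKeywords` are hypothetical helpers written for illustration; they are not PHPAnalysis's implementation, and the sample index is made-up data.

```php
<?php
/**
 * Illustration only: builds a "top N" comma-separated keyword string from
 * a word => count index, mirroring the documented shape of
 * GetFinallyKeywords(). Not the library's own implementation.
 */
function topKeywords(array $index, int $num = 10): string
{
    arsort($index);                            // highest count first
    $top = array_slice($index, 0, $num, true); // keep the first $num entries
    return implode(',', array_keys($top));
}

/** Splits a comma-separated keyword list into a clean array of words. */
function parseKeywords(string $list): array
{
    $parts = array_map('trim', explode(',', $list));
    return array_values(array_filter($parts, fn ($w) => $w !== ''));
}

echo topKeywords(['php' => 3, '分词' => 5, 'test' => 1], 2), PHP_EOL;
// prints: 分词,php
```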
6. public function GetFinallyResult($spword = '')
Function description: gets the final segmentation result.
Parameter list:
$spword: separator placed between entries
Return value: string
7. public function GetSimpleResult()
Function description: returns the rough (preliminary) segmentation result.
Return value: array
8. public function GetSimpleResultAll()
Function description: gets the rough segmentation result together with attribute information.
Attributes: 1 = Chinese word, 2 = ANSI word (including full-width), 3 = ANSI punctuation (including full-width), 4 = number (including full-width), 5 = Chinese punctuation or unrecognized character
Return value: array
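To make the five attribute codes concrete, the sketch below assigns a single token one of the codes 1 to 5 using Unicode regular expressions. It is a rough, hypothetical approximation of the categories listed above, not the rules PHPAnalysis actually applies; in particular, mixed tokens such as `PHP8` fall through to code 5 here, while the library may treat them differently. The function name `approxAttribute` is an assumption.

```php
<?php
/**
 * Hypothetical approximation of the GetSimpleResultAll() attribute codes:
 * 1 Chinese word, 2 ANSI word, 3 ANSI punctuation, 4 number,
 * 5 Chinese punctuation or unrecognized. Not the library's actual rules.
 */
function approxAttribute(string $token): int
{
    // 1: one or more Han (Chinese) characters
    if (preg_match('/^\p{Han}+$/u', $token)) {
        return 1;
    }
    // 2: ASCII letters, including full-width A-Z and a-z
    if (preg_match('/^[A-Za-z\x{FF21}-\x{FF3A}\x{FF41}-\x{FF5A}]+$/u', $token)) {
        return 2;
    }
    // 4: digits, including full-width 0-9
    if (preg_match('/^[0-9\x{FF10}-\x{FF19}]+$/u', $token)) {
        return 4;
    }
    // 3: ASCII punctuation and its full-width forms
    $ansiPunct = '/^[\x{21}-\x{2F}\x{3A}-\x{40}\x{5B}-\x{60}\x{7B}-\x{7E}'
               . '\x{FF01}-\x{FF0F}\x{FF1A}-\x{FF20}\x{FF3B}-\x{FF40}\x{FF5B}-\x{FF5E}]+$/u';
    if (preg_match($ansiPunct, $token)) {
        return 3;
    }
    // 5: everything else (e.g. Chinese punctuation such as 。)
    return 5;
}

foreach (['中文', 'PHP', '２０２４', '!', '。'] as $t) {
    echo $t, ' => ', approxAttribute($t), PHP_EOL;
}
```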
9. public function GetFinallyIndex()
Function description: gets the hash index array.
Return value: array('word' => count, ...), sorted by frequency
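The index returned by GetFinallyIndex() maps each word to its count and is ordered by frequency. The following sketch builds the same shape from a flat token list with PHP's built-in `array_count_values()` and `arsort()`; `frequencyIndex` is a hypothetical helper, not a PHPAnalysis method, and the token list is sample data.

```php
<?php
/**
 * Illustration only: builds an array('word' => count, ...) index sorted by
 * descending frequency from a flat token list, i.e. the shape that
 * GetFinallyIndex() is documented to return. Not the library's own code.
 */
function frequencyIndex(array $tokens): array
{
    $index = array_count_values($tokens); // word => number of occurrences
    arsort($index);                       // most frequent first, keys kept
    return $index;
}

print_r(frequencyIndex(['分词', 'php', '分词', 'test', '分词', 'php']));
```

Because the result is an ordinary associative array, `array_slice($index, 0, 10, true)` gives the ten most frequent words together with their counts.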
10. public function MakeDict($source_file, $target_file = '')
Function description: compiles a plain-text dictionary file into a dictionary file.
Parameter list:
$source_file: source text file
$target_file: target file (if omitted, the current dictionary is used)
Return value: void
11. public function ExportDict($targetfile)
Function description: exports every entry in the current dictionary to a text file.
Parameter list:
$targetfile: target file
Return value: void
