Training a Word Segmentation Model

Source: Internet
Author: User

1. Training data file
Segmentor_train.txt

File contents: already-segmented text, with words separated by spaces

Chinese import and Export bank and Bank of China to strengthen cooperation Xinhua News agency, Beijing, December 26 (reporter Zhou Genliang) today, the three major indices are small open, followed by the Shanghai and Shenzhen Index in the weight plate group pulled up slightly, but the gem has continued to fall 。 Afternoon weight diving led to the Shanghai and Shenzhen Index also appeared a wave of killing and falling, gem performance is very different, not a wave pulled up, today once plunged 3%. From the face of the plate, today's weight plate still dominate, banks, brokers, real estate rose sharply, but the insurance sector today performance is poor, insurance stocks rose dull. Today, Guo Xin Securities (002736), Western Securities (002673) both trading, Haitong Securities (600837), Guo Yuan Securities (000728), Citic Securities (600030) also have a decent performance. Bank shares, only has been China Citic Bank (601998) trading. Shanghai Composite Index   Change
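As a rough sketch of how such a file could be produced, the Java snippet below joins the words of each pre-segmented sentence with single spaces and writes the result to Segmentor_train.txt. The class name TrainingFileWriter, the one-sentence-per-line layout, and the UTF-8 encoding are assumptions made for illustration and are not stated in the original post. The sample tokens are taken from the translated excerpt above; a real training file would contain Chinese words.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Stanford NLP.
public class TrainingFileWriter {

    // Joins the words of one pre-segmented sentence with single spaces.
    public static String toTrainingLine(List<String> words) {
        return String.join(" ", words);
    }

    // Builds the whole file content, one sentence per line (an assumption).
    public static String toFileContent(List<List<String>> sentences) {
        StringBuilder sb = new StringBuilder();
        for (List<String> sentence : sentences) {
            sb.append(toTrainingLine(sentence)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Tokens from the translated excerpt; real data would be Chinese words.
        List<List<String>> sentences = Arrays.asList(
            Arrays.asList("Bank", "shares", "only", "China", "Citic", "Bank", "trading"),
            Arrays.asList("Shanghai", "Composite", "Index", "Change"));

        // UTF-8 is assumed here; use whatever encoding your training setup expects.
        Files.write(Paths.get("Segmentor_train.txt"),
                    toFileContent(sentences).getBytes(StandardCharsets.UTF_8));
    }
}
```

Keeping the formatting in toFileContent, separate from the file write, makes it easy to check the output before it is handed to the trainer.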


2. Run the class edu.stanford.nlp.ie.crf.CRFClassifier

Eclipse Run Settings


Program arguments for training the model
-prop chinese_models/edu/stanford/nlp/models/segmenter/chinese/ctb.prop
-serDictionary chinese_models/edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz
-sighanCorporaDict chinese_models/edu/stanford/nlp/models/segmenter/chinese/
-trainFile Segmentor_train.txt
-serializeTo chinese_models/edu/stanford/nlp/models/segmenter/chinese/newmodel.ser.gz

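Parameter description
prop: ctb.prop. CTB stands for the Chinese Penn Treebank (the Penn Chinese Treebank corpus)
serDictionary: not explained by the author; the value passed above is the serialized dictionary file dict-chris6.ser.gz
sighanCorporaDict: not explained by the author; the value passed above is the segmenter data directory
trainFile: your own training corpus file
serializeTo: where the trained model is saved
Requires at least 1 GB of heap memory: set the JVM option -Xmx1g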
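For readers who prefer to launch training outside Eclipse, the arguments above can be assembled into a single command. The sketch below only builds and prints that command; it does not start training. The class name TrainCommand and the classpath value stanford-segmenter.jar are placeholders assumed for illustration, and the flag names are written in the camelCase spelling used in Stanford's own documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that assembles the training command line.
public class TrainCommand {

    // Directory from the article that holds ctb.prop and dict-chris6.ser.gz.
    public static final String MODEL_DIR =
        "chinese_models/edu/stanford/nlp/models/segmenter/chinese/";

    public static List<String> build(String classpath, String trainFile, String modelName) {
        List<String> cmd = new ArrayList<>();
        // JVM part: heap size from the article, then the classpath.
        cmd.addAll(Arrays.asList("java", "-Xmx1g", "-cp", classpath));
        // Main class that performs the training.
        cmd.add("edu.stanford.nlp.ie.crf.CRFClassifier");
        // Program arguments, in the same order as listed in the article.
        cmd.addAll(Arrays.asList("-prop", MODEL_DIR + "ctb.prop"));
        cmd.addAll(Arrays.asList("-serDictionary", MODEL_DIR + "dict-chris6.ser.gz"));
        cmd.addAll(Arrays.asList("-sighanCorporaDict", MODEL_DIR));
        cmd.addAll(Arrays.asList("-trainFile", trainFile));
        cmd.addAll(Arrays.asList("-serializeTo", MODEL_DIR + modelName));
        return cmd;
    }

    public static void main(String[] args) {
        // "stanford-segmenter.jar" is a placeholder; point it at your actual jar(s).
        List<String> cmd = build("stanford-segmenter.jar",
                                 "Segmentor_train.txt", "newmodel.ser.gz");
        System.out.println(String.join(" ", cmd));
        // To actually run it, one option is:
        // new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

Printing the command first makes it easy to compare against the Eclipse run configuration before committing to a long training run.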


3. The generated model file is written to the following path
chinese_models/edu/stanford/nlp/models/segmenter/chinese/newmodel.ser.gz

4. Run the word segmenter test case
edu.stanford.nlp.lxf.segmentor/segdemo.java
