Use Office to perform Chinese Word Segmentation

Source: Internet
Author: User

Word's support for Chinese word segmentation is quite good. The most obvious sign is that while you edit a document, Word continuously checks for spelling errors (in both English and Chinese text). You may also have noticed that when you double-click a character in a document, the whole word containing it is selected. For example, in a sentence meaning "I am Chinese", double-clicking "中" selects the word "中国人" ("Chinese person"); double-clicking "国" or "人" has the same effect. This shows that Word supports Chinese word segmentation, and in my experience with Office 2003 the segmentation quality is quite good.

Calling Word's segmentation from an application is not straightforward, so I ran a simple experiment and wrote a small program. Naturally, I thought of Word's macros and VBA. I clicked "Record Macro" and then pressed a Ctrl key shortcut. Some VB code was generated in Word's macro editor, and it used a Selection object, which evidently represents the selected region. Exploring further, I found that Selection.Words is exactly what I wanted: it is the collection of words obtained by segmenting the selected region. Selection has many other properties and methods; for example, Selection.Sentences is the collection of all sentences in the selection.

Next, I ported the scattered VBA code from Word to VB 6.0. The program's basic function: enter some text in box A, click the button, and the segmentation result appears in box B, with all words separated by spaces. The VB code is as follows:
 
' Requires a project reference to the Microsoft Word Object Library.
Option Explicit

Dim wdapp As Word.Application
Dim doc As Word.Document

Private Sub Command1_Click()
    'On Error Resume Next
    Dim segwords As String
    Dim i As Long   ' Long, because a long article can exceed the Integer range

    Me.Command1.Caption = "Executing ..."
    Me.Command1.Enabled = False

    ' Type the input text into the hidden Word document and select all of it.
    segwords = ""
    wdapp.Selection.HomeKey Unit:=wdStory
    wdapp.Selection.TypeText Me.Text1.Text
    wdapp.Selection.WholeStory

    ' Each item of Selection.Words is one segmented word.
    For i = 1 To wdapp.Selection.Words.Count
        segwords = segwords & wdapp.Selection.Words(i).Text & " "
        DoEvents
    Next i

    ' Clear the document so the next run starts empty.
    wdapp.Selection.Delete Unit:=wdCharacter, Count:=1

    Me.Command1.Caption = "Start test"
    Me.Command1.Enabled = True
    Me.Text2.Text = segwords
End Sub

Private Sub Form_Load()
    Set wdapp = New Word.Application
    Set doc = wdapp.Documents.Add
    wdapp.ActiveDocument.SaveAs "c://~ftemp.doc"
End Sub

Private Sub Form_Terminate()
    doc.Close SaveChanges:=False   ' avoid a save prompt from the hidden Word instance
    Set doc = Nothing
    wdapp.Quit SaveChanges:=False
    Set wdapp = Nothing
End Sub
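
The exploration of the Selection object described earlier can also be repeated directly in Word's VBA editor, without a VB6 project. The following is a minimal sketch, assuming it is pasted into a module of a Word document in which some text is selected. The procedure name ListSelectedWords is made up for illustration and is not part of the original program.

```vb
Option Explicit

' Hypothetical helper: prints the segmented words of the current
' selection to the Immediate window (Ctrl+G in the VBA editor).
Sub ListSelectedWords()
    Dim w As Range
    Dim result As String

    ' Each item of Selection.Words is a Range covering one word.
    For Each w In Selection.Words
        result = result & w.Text & " "
    Next w

    Debug.Print "Words: " & Selection.Words.Count
    Debug.Print "Sentences: " & Selection.Sentences.Count
    Debug.Print result
End Sub
```

Selecting a Chinese sentence and running the macro shows how Word splits it, which is a quick way to check the segmentation before writing any host application code.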

(Screenshot of the running program.)

The segmentation quality seems decent, but when I tested with a long article it was very slow, not even in the same order of magnitude as the ICTCLAS segmenter. At first I thought this was caused by VB's interpreted execution. Rewriting the program in Delphi improved the speed, but it was still unsatisfactory. The Delphi code:

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, ComCtrls, StdCtrls, Buttons, ComObj;

type
  TForm1 = class(TForm)
    StatusBar1: TStatusBar;
    GroupBox1: TGroupBox;
    GroupBox2: TGroupBox;
    BitBtn1: TBitBtn;
    Memo1: TMemo;
    Memo2: TMemo;
    procedure FormCreate(Sender: TObject);
    procedure FormDestroy(Sender: TObject);
    procedure BitBtn1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;
  wdapp: Variant;
  doc: Variant;

implementation

{$R *.dfm}

procedure TForm1.FormCreate(Sender: TObject);
begin
  // Start a hidden Word instance through late-bound OLE automation.
  wdapp := CreateOleObject('Word.Application');
  wdapp.Visible := False;
  doc := wdapp.Documents.Add();
  wdapp.ActiveDocument.SaveAs('c://~ftemp.doc');
end;

procedure TForm1.FormDestroy(Sender: TObject);
begin
  doc.Close;
  wdapp.Quit(SaveChanges := False);
end;

procedure TForm1.BitBtn1Click(Sender: TObject);
var
  segwords: TStringList;
  wdStory, wdCharacter: OleVariant;
  i: Integer;
begin
  // Numeric values of the Word constants, needed with late binding.
  wdStory := 6;
  wdCharacter := 1;
  segwords := TStringList.Create;
  try
    Self.BitBtn1.Caption := 'Executing';
    Self.BitBtn1.Enabled := False;
    wdapp.Selection.HomeKey(wdStory);
    wdapp.Selection.TypeText(Self.Memo1.Lines.Text);
    wdapp.Selection.WholeStory;
    // Collect one segmented word per list entry.
    for i := 1 to wdapp.Selection.Words.Count do
    begin
      segwords.Append(wdapp.Selection.Words.Item(i).Text);
    end;
    wdapp.Selection.Delete(wdCharacter, 1);
    Self.BitBtn1.Caption := 'Start test';
    Self.BitBtn1.Enabled := True;
    Self.Memo2.Text := segwords.Text;
  finally
    segwords.Free;
  end;
end;

end.
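
Part of the slowness may come from the loop itself rather than from the host language. Every Words(i) call is a separate COM call, and indexed access into Word collections is often reported to get slower as the index grows. One possible variant, not benchmarked here, enumerates the collection with For Each and builds the result with Join instead of repeated string concatenation. This is only a sketch: it assumes the wdapp and doc variables from the VB6 program above, and SegmentWithForEach is a hypothetical name.

```vb
' Hypothetical alternative to the indexed loop; assumes doc is the
' Word.Document created in Form_Load of the VB6 program.
Private Function SegmentWithForEach(ByVal inputText As String) As String
    Dim w As Word.Range
    Dim parts() As String
    Dim n As Long

    ' Replace the document body with the input text.
    doc.Content.Text = inputText

    n = doc.Content.Words.Count
    If n = 0 Then Exit Function
    ReDim parts(1 To n)

    ' Enumerate the words once instead of indexing Words(i) repeatedly.
    n = 0
    For Each w In doc.Content.Words
        n = n + 1
        parts(n) = w.Text
    Next w

    SegmentWithForEach = Join(parts, " ")
End Function
```

In the click handler this would be called as Me.Text2.Text = SegmentWithForEach(Me.Text1.Text). Whether it closes the gap to a dedicated segmenter such as ICTCLAS would have to be measured.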
