Editing the code made the editor too laggy, so the complete process continues here (2)
The classifier code has already been posted. The test program is as follows (the example still uses the original text):
Code
using System;
using System.Collections.Generic;
using System.Text;

// Project-specific names are written in PascalCase here; adjust to the actual library.
using Aspxon.Search.FenLei;

namespace Aspxon.Search.ConsoleTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Test text to segment and classify.
            string s = "Beijing time (local time in Europe), the first round of the Champions League 1/8 knockout competition in the Gerland Stadium, the focus of the battle, France's leader Lyon vs. the Spanish leader Barcelona, runio scored with a free kick in 7th minutes. etoo and benzema hit each other's door frame one time, and Henry leaned over and reached the door in 67th minutes to flatten the score for Barcelona, in the end, Lyon and Barcelona won a 1-1 draw at the gallands Stadium, and the next round in two weeks will be transferred to nocamp.";

            // Segment the text, joining the words with "|".
            string result = ChineseSpliter.Split(s, "|");
            Console.WriteLine(result);
            // TrainingDataManager tdm = new TrainingDataManager();
            // Console.WriteLine(tdm.GetFilesPath("C000020")[0]);

            // Print each segmented word on its own line.
            string[] res = result.Split('|');
            for (int k = 0; k < res.Length; k++)
            {
                Console.WriteLine(res[k]);
            }

            // Classify the text and print the number of results.
            BayesClassifier bcf = new BayesClassifier();
            List<ClassifyResult> crs = bcf.Classify(s);
            Console.WriteLine(crs.Count.ToString());

            // Print each result as "category code:probability".
            for (int i = 0; i < crs.Count; i++)
            {
                Console.WriteLine(crs[i].Classification + ":" + crs[i].Probability.ToString());
            }

            // string defaultDir = "d:\\sogouc.mini.20061127\\sogouc.mini\\sample\\c000007";

            // string[] filesPath = System.IO.Directory.GetFiles(defaultDir);

            // Console.WriteLine(filesPath.Length.ToString());

            // Keep the console window open until a key is pressed.
            Console.ReadKey();
        }
    }
}
The result is as follows:
The sample data again comes from the mini version of the Sogou Lab corpus, which contains 10 categories.
In each output line, the part before the colon is the category code and the part after the colon is the probability.
The category codes correspond to the following category names:
C000007 Cars
C000008 Finance
C000010 IT
C000013 Health
C000014 Sports
C000016 Tourism
C000020 Education
C000022 Recruitment
C000023 Culture
C000024 Military
Therefore, the test text belongs to the sports category (C000014).
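Reading the output by eye works for one text, but the same lookup can be sketched in code. The following is a minimal sketch, not part of the original classifier: `CategoryNames`, `Lookup`, and `Best` are hypothetical names, the results are passed as plain code/probability pairs rather than the project's own result type, and it assumes that the highest probability marks the winning category.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original classifier code.
public static class CategoryNames
{
    // Category codes and names taken from the table above.
    private static readonly Dictionary<string, string> Names =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "C000007", "Cars" },
            { "C000008", "Finance" },
            { "C000010", "IT" },
            { "C000013", "Health" },
            { "C000014", "Sports" },
            { "C000016", "Tourism" },
            { "C000020", "Education" },
            { "C000022", "Recruitment" },
            { "C000023", "Culture" },
            { "C000024", "Military" }
        };

    // Returns the readable name, or the raw code if it is not in the table.
    public static string Lookup(string code)
    {
        string name;
        return Names.TryGetValue(code, out name) ? name : code;
    }

    // Returns the code with the highest probability, or null for an empty input.
    // Assumption: a higher probability means a better match.
    public static string Best(IEnumerable<KeyValuePair<string, double>> results)
    {
        string bestCode = null;
        double bestProbability = double.NegativeInfinity;
        foreach (KeyValuePair<string, double> pair in results)
        {
            if (pair.Value > bestProbability)
            {
                bestProbability = pair.Value;
                bestCode = pair.Key;
            }
        }
        return bestCode;
    }
}
```

With the classifier output copied into such pairs, `CategoryNames.Lookup(CategoryNames.Best(pairs))` would give the category name directly instead of a code that has to be looked up by hand.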
The original word segmenter also filtered stop words. Because the ICTCLAS segmenter already has stop-word filtering built in, I adjusted the relevant parts of the original code.
The ICTCLAS segmenter filters stop words as follows (see "ICTCLAS Chinese word segmentation for Lucene.Net interface code (implementing Analyzer)" in part (1) of this article):
Code
result = new StopFilter(result, CHINESE_ENGLISH_STOP_WORDS);
It also works by dictionary-based filtering. For the definition of the dictionary path, see "ICTCLAS Chinese word segmentation for Lucene.Net interface code (implementing Analyzer)" in part (1):
Code
public string NoisePath = Environment.CurrentDirectory + "\\data\\stopwords.txt";
Open stopwords.txt under the \data\ folder, as shown in the following figure:
It contains some common stop words.
I have not yet run a classification test on batch data. The test results and a code download will be posted later.
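To make the dictionary-based principle concrete, here is a minimal sketch of how such filtering could work. It is not the ICTCLAS or Lucene.Net implementation: `StopWordSketch` and its methods are hypothetical names, and the sketch assumes that stopwords.txt holds one stop word per line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper illustrating dictionary-based stop-word filtering.
public static class StopWordSketch
{
    // Builds the dictionary from lines of text.
    // Assumption: one stop word per line; blank lines are ignored.
    public static HashSet<string> Load(IEnumerable<string> lines)
    {
        HashSet<string> words = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            string word = line.Trim();
            if (word.Length > 0)
            {
                words.Add(word);
            }
        }
        return words;
    }

    // Reads the dictionary from a file such as data\stopwords.txt.
    public static HashSet<string> LoadFile(string path)
    {
        return Load(File.ReadAllLines(path));
    }

    // Keeps only the tokens that are non-empty and not in the dictionary.
    public static List<string> Filter(IEnumerable<string> tokens, HashSet<string> stopWords)
    {
        List<string> kept = new List<string>();
        foreach (string token in tokens)
        {
            string trimmed = token.Trim();
            if (trimmed.Length > 0 && !stopWords.Contains(trimmed))
            {
                kept.Add(trimmed);
            }
        }
        return kept;
    }
}
```

A hash set keeps each lookup at roughly constant cost, which matters because every segmented word is checked against the dictionary.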