Text classification algorithm in C# (II): based on a Naive Bayes classifier

Source: Internet
Author: User

There was too much code and the editor became sluggish, so the write-up is completed here in part (2).

The classifier code has already been posted. The test program is as follows (the example still uses the original text):

Code
using System;
using System.Collections.Generic;
using System.Text;

using Aspxon.Search.FenLei;

namespace Aspxon.Search.ConsoleTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string s = "Beijing time (local time in Europe), the first round of the Champions League 1/8 knockout competition in the Gerland Stadium, the focus of the battle, France's leader Lyon vs. the Spanish leader Barcelona, runio scored with a free kick in 7th minutes. etoo and benzema hit each other's door frame one time, and Henry leaned over and reached the door in 67th minutes to flatten the score for Barcelona, in the end, Lyon and Barcelona won a 1-1 draw at the gallands Stadium, and the next round in two weeks will be transferred to nocamp.";

            // Segment the text into words joined by '|'.
            string result = ChineseSpliter.Split(s, "|");
            Console.WriteLine(result);
            // TrainingDataManager tdm = new TrainingDataManager();
            // Console.WriteLine(tdm.GetFilesPath("C000020")[0]);

            // Print each segmented word on its own line.
            string[] res = result.Split('|');
            for (int k = 0; k < res.Length; k++)
            {
                Console.WriteLine(res[k]);
            }

            // Classify the text and print one "code:probability" line per category.
            BayesClassifier bcf = new BayesClassifier();
            List<ClassifyResult> crs = bcf.Classify(s);
            Console.WriteLine(crs.Count.ToString());

            for (int i = 0; i < crs.Count; i++)
            {
                Console.WriteLine(crs[i].Classification + ":" + crs[i].Probability.ToString());
            }

            // string defaultDir = "D:\\SogouC.mini.20061127\\SogouC.mini\\Sample\\C000007";
            // string[] filesPath = System.IO.Directory.GetFiles(defaultDir);
            // Console.WriteLine(filesPath.Length.ToString());

            Console.ReadKey();
        }
    }
}

 

The result is as follows:

 

The sample data is again the mini version of the Sogou Lab corpus, which contains 10 categories.

In each output line, the classification code comes before the colon and the probability comes after it.

The classification codes correspond to the following category names:

C000007 Cars
C000008 Finance
C000010 IT
C000013 Health
C000014 Sports
C000016 Tourism
C000020 Education
C000022 Recruitment
C000023 Culture
C000024 Military
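The mapping above is easy to keep next to the classifier output as a lookup table. The following is only a minimal sketch: the `CategoryNames` class and its `Lookup` method are hypothetical helpers, not part of the original classifier code, and the English names are lightly normalized from the list above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps Sogou category codes to readable names.
static class CategoryNames
{
    static readonly Dictionary<string, string> Names = new Dictionary<string, string>
    {
        { "C000007", "Cars" },
        { "C000008", "Finance" },
        { "C000010", "IT" },
        { "C000013", "Health" },
        { "C000014", "Sports" },
        { "C000016", "Tourism" },
        { "C000020", "Education" },
        { "C000022", "Recruitment" },
        { "C000023", "Culture" },
        { "C000024", "Military" }
    };

    // Returns the readable name, or the code itself when it is unknown.
    public static string Lookup(string code)
    {
        string name;
        return Names.TryGetValue(code, out name) ? name : code;
    }
}

class CategoryNamesDemo
{
    static void Main()
    {
        Console.WriteLine(CategoryNames.Lookup("C000014")); // prints "Sports"
    }
}
```

Falling back to the raw code for unknown keys keeps the output usable if the training data later gains a category that is not in the table.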

 

The test text is therefore assigned to the sports category (C000014).
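Reading the category off the output amounts to taking the result with the highest probability. The sketch below shows that step in isolation. The `ClassifyResult` class here is a hypothetical stand-in that only mirrors the `Classification` and `Probability` members used in the test program (the `double` type is an assumption), and the probabilities in `Main` are made-up illustration values, not the classifier's real output.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the classifier's result type.
class ClassifyResult
{
    public string Classification { get; set; }
    public double Probability { get; set; } // assumed to be a double
}

static class TopResult
{
    // Returns the result with the largest probability, or null for an empty list.
    public static ClassifyResult Best(IList<ClassifyResult> results)
    {
        if (results == null || results.Count == 0) return null;
        ClassifyResult best = results[0];
        for (int i = 1; i < results.Count; i++)
        {
            if (results[i].Probability > best.Probability) best = results[i];
        }
        return best;
    }
}

class TopResultDemo
{
    static void Main()
    {
        // Made-up values for illustration only.
        var results = new List<ClassifyResult>
        {
            new ClassifyResult { Classification = "C000008", Probability = 0.2 },
            new ClassifyResult { Classification = "C000014", Probability = 0.7 },
            new ClassifyResult { Classification = "C000023", Probability = 0.1 }
        };
        Console.WriteLine(TopResult.Best(results).Classification); // prints "C000014"
    }
}
```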

The original word segmenter also filtered stop words. Because the ICTCLAS word segmenter has stop-word filtering built in, I adjusted the corresponding code in the original implementation.

The ICTCLAS word segmenter filters stop words as follows (see "ICTCLAS Chinese word segmentation for Lucene.Net: interface code (implementing an Analyzer)", referenced in part (1) of this article):

Code
result = new StopFilter(result, CHINESE_ENGLISH_STOP_WORDS);

 

It also works by dictionary-based filtering. For where the dictionary directory is defined, see "ICTCLAS Chinese word segmentation for Lucene.Net: interface code (implementing an Analyzer)", referenced in part (1).

Code
public string NoisePath = Environment.CurrentDirectory + "\\data\\stopwords.txt";

 

Open stopwords.txt in the \data\ folder, as shown in the following figure:

It contains some common stop words.
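Dictionary-based stop-word filtering can be sketched independently of Lucene.Net. The example below is not how `StopFilter` is implemented; it is a minimal stand-alone illustration. The `StopWordFilter` class and its methods are hypothetical, the one-word-per-line file layout is an assumption, and the sample words are invented so the sketch runs without the real stopwords.txt.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper illustrating dictionary-based stop-word filtering.
static class StopWordFilter
{
    // Reads one stop word per line (assumed layout), skipping blank lines.
    public static HashSet<string> Load(TextReader reader)
    {
        var words = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length > 0) words.Add(line);
        }
        return words;
    }

    // Keeps only the tokens that are not in the stop-word set.
    public static List<string> Filter(IEnumerable<string> tokens, HashSet<string> stopWords)
    {
        var kept = new List<string>();
        foreach (string token in tokens)
        {
            if (!stopWords.Contains(token)) kept.Add(token);
        }
        return kept;
    }
}

class StopWordDemo
{
    static void Main()
    {
        // In-memory stand-in for stopwords.txt; with the real file,
        // pass new StreamReader(path) instead.
        HashSet<string> stopWords = StopWordFilter.Load(new StringReader("the\nof\nin\n"));
        string[] tokens = "the|focus|of|the|battle".Split('|');
        List<string> kept = StopWordFilter.Filter(tokens, stopWords);
        Console.WriteLine(string.Join("|", kept.ToArray())); // prints "focus|battle"
    }
}
```

A hash set gives constant-time lookups per token, which matters once the filter runs over every word of every training document.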

No classification test has been run on batch data yet. The test results and a code download will be posted later.
