[Share] A class library and auxiliary tools for automatic review of sensitive content

Source: Internet
Author: User

This class library automatically analyzes input content and gives it a score. A program can use the score to decide whether the content has reached a given sensitivity level and then process it accordingly.

Combining the library with manual review works very well: content that the server-side review flags as suspicious is automatically submitted to a waiting list for manual review. This ensures both efficiency and accuracy, and is an ideal form of human-machine collaboration.
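The collaboration described above can be sketched as a simple routing step. This is only an illustration in Python, not part of the library: the names `ReviewQueue` and `route`, and both threshold values, are assumptions made up for the example, and the score is whatever the automatic review produced.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReviewQueue:
    """Holds content that is waiting for a human reviewer."""
    pending: List[str] = field(default_factory=list)


def route(content: str, score: float, queue: ReviewQueue,
          suspicious_at: float = 5.0, reject_at: float = 20.0) -> str:
    """Decide what happens to content, given its automatic review score.

    Both thresholds are hypothetical; a real site would tune them.
    """
    if score >= reject_at:
        return "rejected"              # clearly over the line, no human needed
    if score >= suspicious_at:
        queue.pending.append(content)  # suspicious: wait for manual review
        return "queued"
    return "published"                 # low score: let it through


queue = ReviewQueue()
print(route("harmless text", 1.0, queue))
print(route("borderline text", 6.0, queue))
print(len(queue.pending))
```

Only the middle band reaches a person, which is where the efficiency comes from.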

For the operating principle of this library, see the article "Improved swearing review solution": http://www.cnblogs.com/SkyD/archive/2009/03/16/updateTextVali.html

 

Preparing audit rules

The library cannot review anything on its own. You need to give it review rules before it can work. For this purpose I provide a generator that produces rule configuration files.

For more information about how rules are applied, see the article "Improved swearing review solution".

The scoring method is described as follows:

The score attribute of a rule is the rule's full score, that is, the score awarded when the words in the text match the rule exactly. If the text matches but not exactly, the rule awards a proportional part of the score based on the accuracy of the match.

The accuracy is the ratio of the rule's precise length attribute to the actual length of the matched content. For example, the rule "白[\s]{0,3}?痴" has a precise length of 2, which is the length of the original string "白痴" ("idiot") after the tolerated interference symbols are removed. The rule can also match a longer span, such as the one it finds in the sentence "a little white is staring at her" (in the original Chinese), where the matched string is 4 characters long. Dividing the rule's precise length 2 by the matched length 4 gives the accuracy of this match: 50%. If the rule's score is 6, this match earns only 6 × 0.5 = 3 points.
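To make the arithmetic concrete, here is a minimal Python sketch of the calculation. It is not the library's code: the function name `score_match` is made up, and the interfering characters in the sample text are spaces so that the rule's `[\s]` class matches them.

```python
import re


def score_match(pattern, precise_length, full_score, text):
    """Return (raw_score, precise_score) for the first match in text.

    precise_score = full_score * precise_length / matched_length.
    Returns (0, 0.0) when the rule does not match at all.
    """
    m = re.search(pattern, text)
    if m is None:
        return 0, 0.0
    matched_length = len(m.group(0))
    accuracy = precise_length / matched_length
    return full_score, full_score * accuracy


rule = r"白[\s]{0,3}?痴"   # the word with up to three tolerated characters inside

print(score_match(rule, 2, 6, "白痴"))     # exact match: accuracy 2/2
print(score_match(rule, 2, 6, "白  痴"))   # two spaces inside: accuracy 2/4
```

The second call reproduces the case from the text: a 4-character match against a precise length of 2 keeps half of the 6 points.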

This is how the precise score is calculated. The class library also outputs the score without the accuracy correction, so that either value can be used depending on the situation.

Note that the rules you give the library must be written in Simplified Chinese, but both Simplified and Traditional Chinese text will be matched.
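The article does not say how the library handles Traditional Chinese. One plausible approach, shown here purely as an assumption, is to convert the input to Simplified Chinese before matching. The three-character table below is a hand-written sample for the demo; a real table would cover thousands of characters.

```python
import re

# Tiny sample mapping (Traditional -> Simplified), for illustration only.
TRADITIONAL_TO_SIMPLIFIED = str.maketrans({"壞": "坏", "話": "话", "髒": "脏"})


def to_simplified(text):
    """Replace each known Traditional character with its Simplified form."""
    return text.translate(TRADITIONAL_TO_SIMPLIFIED)


def matches(rule_pattern, text):
    """Match a Simplified Chinese rule against text in either script."""
    return re.search(rule_pattern, to_simplified(text)) is not None


print(matches("坏蛋", "壞蛋"))   # Traditional input
print(matches("坏蛋", "坏蛋"))   # Simplified input
```

Because the conversion is one character to one character, the matched length, and therefore the accuracy calculation, is not affected.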

 

Call Method

Before calling the library, first generate one or more rule configuration files with the rule configuration generator and put them in a directory.

Then assign the path of that directory to the class's static rule-file-directory attribute and call its static "load audit rules" method:

// Identifier names are rendered in English here; check the library source for the actual names.
ContentReview.RuleFileDirectory = Path.Combine(Application.StartupPath, "Content review rules\\");

ContentReview.LoadAuditRules();


This completes the initialization. After that, you only need to create a "content review" object, pass in the string to be analyzed, and call its "Review" method:


ContentReview c = new ContentReview(textBox2.Text);

c.Review();


After the call, you can read the object's "accumulative score", "accumulative precise score", "highest score", and "output details" attributes to get the review results.

The "Review" method also has an overload that skips the detailed matching information and outputs only the scores and statistics, which makes the review faster. In general, use this overload for server-side review, and output the details only during manual review, for the operator's reference.
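The relationship between these outputs can be sketched as follows. This is a toy Python stand-in, not the library's class: the name `ContentReviewSketch`, the `with_details` flag, and the tuple layout of the rules are assumptions, and "highest score" is assumed to mean the largest full score among the rules that matched.

```python
import re


class ContentReviewSketch:
    """Toy stand-in for the review object described in the text."""

    def __init__(self, text, rules):
        self.text = text
        self.rules = rules  # list of (pattern, full_score, precise_length)
        self.total_score = 0.0
        self.total_precise_score = 0.0
        self.highest_score = 0.0
        self.details = []

    def review(self, with_details=True):
        """Accumulate scores; collect per-match details only when asked."""
        for pattern, full_score, precise_length in self.rules:
            for m in re.finditer(pattern, self.text):
                matched = m.group(0)
                if not matched:
                    continue  # ignore empty matches to avoid dividing by zero
                precise = full_score * precise_length / len(matched)
                self.total_score += full_score
                self.total_precise_score += precise
                self.highest_score = max(self.highest_score, full_score)
                if with_details:
                    self.details.append((m.start(), matched, precise))
        return self


rules = [(r"白[\s]{0,3}?痴", 6, 2)]
fast = ContentReviewSketch("白痴 and 白  痴", rules).review(with_details=False)
print(fast.total_score, fast.total_precise_score, fast.details)
```

Skipping the details saves the bookkeeping per match, which is the trade-off the overload offers for server-side use.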

 

Application Testing

I provide a test tool for simple rule testing.

Take a translated article as an example for the review test; its original address is: http://www.yeeyan.com/articles/view/24994/7075

Test results:

The "capture content coverage" shown in the title bar is a statistic produced by the review. It indicates the proportion of the full text that is sensitive content, and automatic follow-up processing should treat it as an important criterion as well. For example, a short piece of content may contain sensitive words yet receive a low total score simply because there is so little text, while its coverage is very high. Checking the coverage ensures that such content is not missed.
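The article does not give the exact formula for coverage. A reasonable reading, assumed here, is the share of characters that fall inside at least one match. A small Python sketch under that assumption (the function name `coverage` is made up):

```python
import re


def coverage(text, patterns):
    """Fraction of characters in text covered by at least one rule match."""
    if not text:
        return 0.0
    covered = [False] * len(text)
    for pattern in patterns:
        for m in re.finditer(pattern, text):
            for i in range(m.start(), m.end()):
                covered[i] = True  # overlapping matches are counted once
    return sum(covered) / len(text)


patterns = [r"白[\s]{0,3}?痴"]
print(coverage("白痴", patterns))          # short text, fully covered
print(coverage("你是白痴吗", patterns))     # 2 of 5 characters covered
```

A two-character post that consists entirely of one sensitive word scores only one rule's points but has full coverage, which is exactly the case the paragraph above warns about.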

"[Religion]" and "[politics]" in the lower-left corner of the window indicate the classification of the rule, that is, the name of the configuration file to which the rule belongs.

 

Conclusion

This library is very useful for reviewing user-submitted content on a website. It is far better than approaches such as replacing common keywords, blocking submissions that contain keywords, or relying on manual review alone, because it balances security and efficiency.

If you have any suggestions for improving functions or performance, you are welcome to discuss them together.

In addition, I will release a visual manual-review auxiliary control for ASP.NET.

 

Download

Class library source code and supporting programs: http://www.uushare.com/user/icesee/file/1932449

(Re-upload, corrected an error)

Rule Configuration package: http://www.uushare.com/user/icesee/file/1925838

The rule configuration package contains the following rule configuration example:

Download the XPS version of this article: http://www.uushare.com/user/icesee/file/1925859

 

 

Entertainment time

Entertainment posts have been very popular these past two days. It seems everyone is too absorbed in work and should relax now and then. Here are two puzzle games that suit programmers well; both are rare gems of recent years:

 

Plants vs. Zombies

This game is not only fun but also funny: Peashooter, Cabbage-pult, Cherry Bomb, Chomper, Torchwood, Winter Melon, Coffee Bean, Cob Cannon, Wall-nut, Potato Mine... Even the names are a pleasure to hear. There are 49 different plants in total. After you finish the game, many new modes unlock, and they are both fun and challenging.

The theme song is also very nice:

Game: http://www.uushare.com/user/icesee/file/1913105

 

World of Goo

It is said to have been developed by two otaku (respect). The game is unconventional but very successful, with polished graphics and a fresh style. Its hints are witty and quirky, and the sound effects are very good. The main gameplay is to combine goo balls with different properties to deliver as many of them as possible to the target pipe. The biggest challenges are gravity and those thorns.

Game: http://www.uushare.com/user/icesee/file/1907621

 

If the download is slow, you can also get these two games here, where there are many sharers and the speed is good:

http://www.verycd.com/topics/2745208/

http://www.verycd.com/topics/2738323/
