Refactoring a General-Purpose Collection Tool: Refactoring the Character Processing Rules

Source: Internet
Author: User

Some time ago, for work, I wrote a general-purpose website data collection tool that collects content from different websites based on an XML configuration.

Recently I have felt that it needs refactoring, which is also a good opportunity to study the design along the way.

 

Refactoring the character processing rules

Character processing is at the core of the collection tool: it turns a raw HTML string into the fields we need. Let's first look at how this was done before:


    string temp2;
    temp2 = GetStr(str, MyConfig.url, lev);                // character truncation rule
    temp2 = ReplaceStr(temp2, MyConfig.urlGL, lev);        // character filtering rule
    temp2 = SetCodeing(temp2, MyConfig.urlBM, lev);        // URL encoding rule
    temp2 = Myreplace(temp2, MyConfig.urlGvContent, lev);  // character replacement rule

Problems with this approach:

1. The client makes too many calls. With four rules, it has to call four separate methods.

2. It does not scale well. If a new collection requirement comes up that the existing rules cannot meet, a new rule has to be added and every caller has to be changed, which violates the open/closed principle.

 

Let's start refactoring:

1. Extract a common rule interface

Text processing rule interface

    /// <summary>
    /// Text processing rule interface
    /// </summary>
    public interface ItextRule
    {
        /// <summary>
        /// Character processing
        /// </summary>
        /// <param name="sourceStr">String to be processed</param>
        /// <param name="key">Configuration keyword</param>
        /// <param name="lev">Current level</param>
        /// <returns>The processed string</returns>
        string TextPro(string sourceStr, string key, int lev);
    }

2. Create an abstract base class for the rule classes and put the shared methods in it.

Character processing rule base class

    /// <summary>
    /// Base class for character processing rules
    /// </summary>
    public abstract class TextRuleBase
    {
        private string myKey = string.Empty;

        public TextRuleBase(string _key)
        {
            myKey = _key;
        }

        /// <summary>
        /// Gets the values from the configuration file
        /// </summary>
        /// <param name="key"></param>
        /// <param name="lev"></param>
        /// <returns></returns>
        protected string[] GetValue(string key, int lev)
        {
            string str = string.Empty;
            string temp = string.Empty;
            string tempKey = key + myKey + lev;
            bool Istrue = true;
            while (Istrue) // keep reading configuration values until an empty one comes back
            {
                temp = SiteConfig.ConfigByKey(tempKey);
                if (temp == "")
                {
                    Istrue = false;
                }
                else
                {
                    str += temp + "|";
                    tempKey += lev; // next key: Name1, Name11, Name111, ...
                }
            }
            return str.Split(new char[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
        }

        /// <summary>
        /// Forces subclasses to implement the actual rule processing
        /// </summary>
        /// <param name="sourceStr"></param>
        /// <param name="Contents"></param>
        /// <returns></returns>
        protected abstract string TextPro(string sourceStr, string[] Contents);
    }
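To make the key lookup in GetValue concrete, here is a minimal sketch of the same loop. It assumes an in-memory dictionary in place of SiteConfig (the real class presumably reads the XML file), and the names KeyChainDemo and CollectValues are made up for illustration. It also collects the values into a list instead of joining and splitting on "|".

```csharp
using System;
using System.Collections.Generic;

public static class KeyChainDemo
{
    // Hypothetical stand-in for the SiteConfig lookup: unknown keys yield "".
    static readonly Dictionary<string, string> Config = new Dictionary<string, string>
    {
        { "NameStaticReplace1", "#," },
        { "NameStaticReplace11", "fuwentao, fwt" },
    };

    static string ConfigByKey(string key)
    {
        string value;
        return Config.TryGetValue(key, out value) ? value : "";
    }

    // Same idea as GetValue: append the level to the key after every hit
    // and stop at the first key that has no value.
    public static List<string> CollectValues(string key, string ruleKey, int lev)
    {
        var values = new List<string>();
        string tempKey = key + ruleKey + lev;   // e.g. "NameStaticReplace1"
        while (true)
        {
            string temp = ConfigByKey(tempKey);
            if (temp == "") break;
            values.Add(temp);
            tempKey += lev;                     // e.g. "NameStaticReplace11"
        }
        return values;
    }

    public static void Main()
    {
        // Reads NameStaticReplace1, then NameStaticReplace11, then stops.
        foreach (string v in CollectValues("Name", "StaticReplace", 1))
            Console.WriteLine(v);
    }
}
```

With this scheme, one rule can carry several configuration entries for the same keyword and level without any extra list syntax in the XML.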

3. Create the character rule classes. Following the logic above, create the four character rule classes, each inheriting the abstract base class and implementing the interface.

Character truncation rule

    /// <summary>
    /// Basic character truncation rule
    /// </summary>
    public class TextIntercept : TextRuleBase, ItextRule
    {
        public TextIntercept() : base("") { }

        #region ItextRule public members

        /// <summary>
        /// Implementation of the character truncation rule (may be applied multiple times)
        /// </summary>
        /// <param name="sourceStr"></param>
        /// <param name="key"></param>
        /// <param name="lev"></param>
        /// <returns></returns>
        public string TextPro(string sourceStr, string key, int lev)
        {
            return TextPro(sourceStr, GetValue(key, lev));
        }

        #endregion

        #region Methods enforced by the base class

        /// <summary>
        /// Implementation of the character truncation rule (may be applied multiple times)
        /// </summary>
        /// <param name="sourceStr"></param>
        /// <param name="Contents"></param>
        /// <returns></returns>
        protected override string TextPro(string sourceStr, string[] Contents)
        {
            string relText = sourceStr;
            foreach (string value in Contents)
            {
                if (value != "")
                {
                    relText = Function.GetStr(relText, value, MyConfig.key);
                }
            }
            Console.WriteLine("character truncation rule result: " + relText);
            Console.WriteLine("");
            return relText;
        }

        #endregion
    }

Only one rule class is shown here. For the others, see the demo code.
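The article does not show Function.GetStr, so the following is only a guess at what a truncation helper of this kind might look like, not the author's implementation. It assumes the rule is a pattern of the form prefix[content]suffix and returns whatever sits in the source where the placeholder sits in the pattern. The names TruncationSketch and Extract are hypothetical.

```csharp
using System;

public static class TruncationSketch
{
    // Hypothetical helper (not the article's Function.GetStr): returns the
    // part of source found between the pattern's prefix and suffix.
    public static string Extract(string source, string pattern, string placeholder)
    {
        int mark = pattern.IndexOf(placeholder, StringComparison.Ordinal);
        if (mark < 0) return source;            // no placeholder: leave the text unchanged

        string prefix = pattern.Substring(0, mark);
        string suffix = pattern.Substring(mark + placeholder.Length);

        int start = 0;
        if (prefix.Length > 0)
        {
            int p = source.IndexOf(prefix, StringComparison.Ordinal);
            if (p < 0) return "";               // prefix not found: nothing to extract
            start = p + prefix.Length;
        }

        int end = source.Length;
        if (suffix.Length > 0)
        {
            int s = source.IndexOf(suffix, start, StringComparison.Ordinal);
            if (s < 0) return "";               // suffix not found: nothing to extract
            end = s;
        }

        return source.Substring(start, end - start);
    }

    public static void Main()
    {
        string html = "<td>Price: 42 USD</td>";
        Console.WriteLine(Extract(html, "Price: [content] USD", "[content]")); // prints "42"
    }
}
```

An empty prefix or suffix means "from the start" or "to the end" of the source, so a rule such as "the email address is [content]" keeps everything after the prefix.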

4. Create a high-level interface that the client calls directly. Internally it assembles the individual rules according to the configuration.

High-level interface for the character processing rules

    /// <summary>
    /// High-level interface for the character processing rules
    /// </summary>
    public class TextRuleAll : ItextRule
    {
        private Dictionary<string, IList<ItextRule>> ruleList = new Dictionary<string, IList<ItextRule>>();

        #region ItextRule members

        public string TextPro(string sourceStr, string key, int lev)
        {
            string dicKey = key + lev;
            string returnStr = string.Empty;
            if (!ruleList.ContainsKey(dicKey))
            {
                IList<ItextRule> list = new List<ItextRule>();

                #region Build the rule list for this keyword
                foreach (string vale in MyConfig.AllTextRules())
                {
                    string[] temp = vale.Split('.');
                    string xmlKey = temp[temp.Length - 1];
                    if (xmlKey == "TextIntercept") // hard-coded: truncation is the basic rule, so its key has no suffix
                        xmlKey = "";
                    if (SiteConfig.ConfigByKey(key + xmlKey + lev) != "") // the XML file has this configuration keyword
                    {
                        list.Add((ItextRule)Assembly.Load("Demo1").CreateInstance(vale));
                    }
                }
                #endregion

                ruleList.Add(dicKey, list);
            }

            IList<ItextRule> mylist = ruleList[dicKey];
            if (mylist != null && mylist.Count > 0) // run each rule in turn
            {
                returnStr = sourceStr;
                foreach (ItextRule irule in mylist)
                    returnStr = irule.TextPro(returnStr, key, lev);
            }
            return returnStr;
        }

        #endregion
    }

5. Configuration file

XML configuration

    <MyConfig>
      <!-- Character processing extended rule list -->
      <!-- TextIntercept is the truncation rule -->
      <!-- StaticReplace is the static replacement rule -->
      <AllTextRules>Collect.TextRule.TextIntercept,Collect.TextRule.StaticReplace,Collect.TextRule.TextUrlEncode,Collect.TextRule.TextFilter</AllTextRules>

      <!-- TextIntercept truncation rule configuration -->
      <Name1>the email address is [content]</Name1>

      <!-- StaticReplace static replacement configuration -->
      <NameStaticReplace1>#,</NameStaticReplace1>
      <NameStaticReplace11>fuwentao, fwt</NameStaticReplace11>

      <!-- Chinese URL encoding rule configuration -->
      <NameTextUrlEncode1>city = [content]</NameTextUrlEncode1>

      <!-- Filter rule configuration -->
      <NameTextFilter1>. com, [content] http</NameTextFilter1>
    </MyConfig>
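The article does not show how the configuration is read. As a rough sketch, the two lookups the rule classes rely on could be built on System.Xml.Linq as follows. The class ConfigSketch, the inline XML string, and both method bodies are assumptions for illustration; the real SiteConfig and MyConfig may work differently.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ConfigSketch
{
    // Inline copy of part of the configuration, so the sketch needs no file.
    const string Xml = @"<MyConfig>
  <AllTextRules>Collect.TextRule.TextIntercept,Collect.TextRule.StaticReplace</AllTextRules>
  <Name1>the email address is [content]</Name1>
</MyConfig>";

    static readonly XElement Root = XDocument.Parse(Xml).Root;

    // Hypothetical stand-in for the SiteConfig lookup:
    // the element's text, or "" when the element does not exist.
    public static string ConfigByKey(string key)
    {
        XElement element = Root.Element(key);
        return element == null ? "" : element.Value;
    }

    // Hypothetical stand-in for the MyConfig rule list:
    // the comma-separated class names as an array.
    public static string[] AllTextRules()
    {
        return ConfigByKey("AllTextRules")
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(name => name.Trim())
            .ToArray();
    }

    public static void Main()
    {
        foreach (string rule in AllTextRules())
            Console.WriteLine(rule);
        Console.WriteLine(ConfigByKey("Name1"));
    }
}
```

Returning an empty string for a missing element is what lets the rule classes treat "no configuration" and "end of the key chain" the same way.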

After the refactoring, the client call looks like this:

    string testStr = "I'm fuwentao, and my email address is fwt1314111 # 163.com, Website: http://www.mywaysoft.net/cityw.shanghai";
    TextRuleAll cmd = new TextRuleAll();
    string rel = cmd.TextPro(testStr, "Name", 1); // result

A single call to cmd.TextPro is all it takes. Much simpler than before, isn't it?

The design is also flexible: to add a new processing rule, you only need to create a rule class and register it in the configuration file.
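As a small illustration of that extension point, the sketch below adds two made-up rules (TextTrim and TextUpper, neither of which is in the article) and runs them from a list of names, the way the configuration's rule list would supply them. For brevity it resolves rules by simple class name within the current assembly rather than by the full name passed to Assembly.Load("Demo1").CreateInstance, and it skips the per-keyword configuration check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public interface ItextRule
{
    string TextPro(string sourceStr, string key, int lev);
}

// Hypothetical new rule: strips leading and trailing whitespace.
public class TextTrim : ItextRule
{
    public string TextPro(string sourceStr, string key, int lev)
    {
        return sourceStr.Trim();
    }
}

// Hypothetical new rule: converts the text to upper case.
public class TextUpper : ItextRule
{
    public string TextPro(string sourceStr, string key, int lev)
    {
        return sourceStr.ToUpperInvariant();
    }
}

public static class RulePipeline
{
    // Creates each rule by reflection from its class name and applies the
    // rules in order. The pipeline itself never changes when a rule is added.
    public static string Run(string source, string key, int lev, IEnumerable<string> ruleNames)
    {
        Type[] candidates = Assembly.GetExecutingAssembly().GetTypes();
        string result = source;
        foreach (string name in ruleNames)
        {
            Type type = candidates.FirstOrDefault(t =>
                t.Name == name && !t.IsAbstract && typeof(ItextRule).IsAssignableFrom(t));
            if (type == null)
                throw new ArgumentException("Unknown rule: " + name);

            var rule = (ItextRule)Activator.CreateInstance(type);
            result = rule.TextPro(result, key, lev);
        }
        return result;
    }

    public static void Main()
    {
        string rel = Run("  hello  ", "Name", 1, new[] { "TextTrim", "TextUpper" });
        Console.WriteLine(rel); // prints "HELLO"
    }
}
```

Adding a third rule means writing one more class and one more name in the list; neither Run nor the existing rules need to be touched, which is the open/closed behaviour the refactoring was after.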

 
