Some time ago, for work, I wrote a general-purpose website data collection tool that gathers content from different websites, driven by an XML configuration.
Recently I felt it needed refactoring, so I took the opportunity to study it along the way.
Refactoring the character processing rules
Character processing is the core of the collection step: it is how a chunk of HTML is reduced to the fields we need. Here is how it was handled before:
string temp2;
temp2 = GetStr(str, MyConfig.Url, lev);               // character truncation rule
temp2 = ReplaceStr(temp2, MyConfig.UrlGL, lev);       // character filtering rule
temp2 = SetCodeing(temp2, MyConfig.UrlBM, lev);       // character URL-encoding rule
temp2 = Myreplace(temp2, MyConfig.UrlGvContent, lev); // character replacement rule
Problems with this approach:
1. The client makes too many calls: with four rules, it has to call four separate methods.
2. It does not scale well. When a new collection requirement comes up that the existing rules cannot handle, a new rule has to be added and the client code changed, which violates the open/closed principle.
Let's start refactoring:
1. Extract a common rule interface
Text processing rule interface
/// <summary>
/// Text processing rule interface
/// </summary>
public interface ItextRule
{
    /// <summary>
    /// Character processing
    /// </summary>
    /// <param name="sourceStr">String to be processed</param>
    /// <param name="key">Configuration keyword</param>
    /// <param name="lev">Current level</param>
    /// <returns>The processed string</returns>
    string TextPro(string sourceStr, string key, int lev);
}
2. Create an abstract base class for the rule classes that holds the shared methods.
Character processing rule base class
using System;

/// <summary>
/// Base class for character processing rules
/// </summary>
public abstract class TextRuleBase
{
    private string myKey = string.Empty;

    public TextRuleBase(string _key)
    {
        myKey = _key;
    }

    /// <summary>
    /// Read this rule's values from the configuration file
    /// </summary>
    /// <param name="key">Configuration keyword</param>
    /// <param name="lev">Current level</param>
    /// <returns>All configured values for this rule</returns>
    protected string[] GetValue(string key, int lev)
    {
        string str = string.Empty;
        string temp = string.Empty;
        string tempKey = key + myKey + lev;
        bool Istrue = true;
        while (Istrue) // keep reading config entries until an empty one is found
        {
            temp = SiteConfig.ConfigByKey(tempKey);
            if (string.IsNullOrEmpty(temp))
            {
                Istrue = false;
            }
            else
            {
                str += temp + "|";
                tempKey += lev; // next key: ...1 -> ...11 -> ...111
            }
        }
        return str.Split(new char[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
    }

    /// <summary>
    /// Forces subclasses to implement the concrete rule processing
    /// </summary>
    /// <param name="sourceStr">String to be processed</param>
    /// <param name="Contents">Configured values for this rule</param>
    /// <returns>The processed string</returns>
    protected abstract string TextPro(string sourceStr, string[] Contents);
}
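The key scheme in GetValue is easy to miss: for level 1 it reads the keyword plus the rule key plus `1`, then keeps appending the level (`...11`, `...111`) until a lookup comes back empty. That is why the XML configuration in step 5 has both `NameStaticReplace1` and `NameStaticReplace11`. The sketch below reproduces only that lookup, assuming an in-memory dictionary in place of `SiteConfig.ConfigByKey`; `KeyLookupSketch` and `GetValues` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the GetValue key scheme. A plain dictionary stands in
// for SiteConfig.ConfigByKey (assumption: one flat key -> value map).
public static class KeyLookupSketch
{
    public static string[] GetValues(IDictionary<string, string> config, string key, string ruleKey, int lev)
    {
        var values = new List<string>();
        string tempKey = key + ruleKey + lev; // e.g. "Name" + "StaticReplace" + 1

        // Stop at the first missing or empty entry, like the original loop.
        while (config.TryGetValue(tempKey, out string value) && value != "")
        {
            values.Add(value);
            tempKey += lev; // "...1" -> "...11" -> "...111"
        }
        return values.ToArray();
    }
}
```

With an empty rule key (the truncation rule passes `""` to the base constructor) the lookup starts at plain `Name1`.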
3. Create the concrete rule classes. Following the logic above, create the four character rule classes, each inheriting the abstract base class and implementing the interface.
Character truncation rule
using System;

/// <summary>
/// Basic character truncation rule
/// </summary>
public class TextIntercept : TextRuleBase, ItextRule
{
    public TextIntercept() : base("") { }

    #region ItextRule members
    /// <summary>
    /// Character truncation rule (may be applied multiple times)
    /// </summary>
    public string TextPro(string sourceStr, string key, int lev)
    {
        return TextPro(sourceStr, GetValue(key, lev));
    }
    #endregion

    #region Method the base class forces subclasses to implement
    /// <summary>
    /// Character truncation rule (may be applied multiple times)
    /// </summary>
    protected override string TextPro(string sourceStr, string[] Contents)
    {
        string relText = sourceStr;
        foreach (string value in Contents)
        {
            if (value != "")
            {
                relText = Function.GetStr(relText, value, MyConfig.Key);
            }
        }
        Console.WriteLine("Character truncation rule result: " + relText);
        Console.WriteLine("");
        return relText;
    }
    #endregion
}
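The post does not show `Function.GetStr`, so exactly how a configured pattern containing `[content]` is applied is not spelled out. One plausible reading, sketched below purely as an assumption, is that the text before `[content]` marks where the field starts and the text after it marks where it ends (an empty end marker meaning "to the end of the string"). `InterceptSketch` is a hypothetical name, not the author's implementation.

```csharp
using System;

// Hypothetical "[content]" truncation. Assumption: prefix = start marker,
// suffix = end marker; returns "" when a marker is not found.
public static class InterceptSketch
{
    public static string Intercept(string source, string pattern, string placeholder = "[content]")
    {
        int p = pattern.IndexOf(placeholder, StringComparison.Ordinal);
        if (p < 0) return "";
        string startMarker = pattern.Substring(0, p);
        string endMarker = pattern.Substring(p + placeholder.Length);

        int start = source.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0) return "";
        start += startMarker.Length; // field begins right after the start marker

        if (endMarker.Length == 0) return source.Substring(start);
        int end = source.IndexOf(endMarker, start, StringComparison.Ordinal);
        return end < 0 ? "" : source.Substring(start, end - start);
    }
}
```

For example, the pattern `email address is [content],` applied to `my email address is fwt#163.com, Website: x` yields `fwt#163.com`.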
Only one rule class is shown here; see the demo code for the rest.
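The other three rule classes are not listed. As an illustration only, the core of a static replacement rule could look roughly like the hypothetical sketch below, assuming each configured value has the form `old,new` (compare the `NameStaticReplace11` entry in the step 5 configuration, which pairs `fuwentao` with `fwt`). In the real class this logic would sit in the `protected override string TextPro(string sourceStr, string[] Contents)` method.

```csharp
using System;

// Hypothetical static replacement logic. Assumption: each value is "old,new";
// entries without a comma or with an empty "old" part are skipped.
public static class StaticReplaceSketch
{
    public static string Apply(string sourceStr, string[] contents)
    {
        string result = sourceStr;
        foreach (string value in contents)
        {
            if (string.IsNullOrEmpty(value)) continue;
            int comma = value.IndexOf(',');
            if (comma < 0) continue; // malformed entry
            string oldText = value.Substring(0, comma);
            string newText = value.Substring(comma + 1); // may be empty: delete oldText
            if (oldText.Length > 0)
                result = result.Replace(oldText, newText);
        }
        return result;
    }
}
```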
4. Create a high-level entry point that the client calls directly. Internally it assembles the individual rules according to the configuration and runs them in sequence.
High-level entry point for the character processing rules
using System.Collections.Generic;
using System.Reflection;

/// <summary>
/// High-level entry point for the character processing rules
/// </summary>
public class TextRuleAll : ItextRule
{
    // Rule lists cached per keyword + level
    private Dictionary<string, IList<ItextRule>> ruleList = new Dictionary<string, IList<ItextRule>>();

    #region ItextRule members
    public string TextPro(string sourceStr, string key, int lev)
    {
        string dicKey = key + lev;
        string returnStr = string.Empty;
        if (!ruleList.ContainsKey(dicKey))
        {
            IList<ItextRule> list = new List<ItextRule>();
            #region Build the rule list for this keyword
            foreach (string vale in MyConfig.AllTextRules())
            {
                string[] temp = vale.Split('.');
                string xmlKey = temp[temp.Length - 1];
                if (xmlKey == "TextIntercept") // hard-coded: the truncation rule is the basic rule, so it has no key suffix
                    xmlKey = "";
                if (SiteConfig.ConfigByKey(key + xmlKey + lev) != "") // the XML file contains this configuration keyword
                {
                    list.Add((ItextRule)Assembly.Load("Demo1").CreateInstance(vale));
                }
            }
            #endregion
            ruleList.Add(dicKey, list);
        }
        IList<ItextRule> mylist = ruleList[dicKey];
        if (mylist != null && mylist.Count > 0) // run each rule in turn
        {
            returnStr = sourceStr;
            foreach (ItextRule irule in mylist)
                returnStr = irule.TextPro(returnStr, key, lev);
        }
        return returnStr;
    }
    #endregion
}
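Stripped of the configuration details, TextRuleAll is a composite: it resolves rule type names by reflection, keeps only the rules whose configuration key exists, caches that list per keyword and level, and chains the rules so each one's output feeds the next. The self-contained sketch below shows that shape. `ITextRuleSketch`, `TrimRule`, `UpperRule`, and `RulePipelineSketch` are hypothetical names, and an in-memory dictionary stands in for the XML configuration.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Minimal stand-in for ItextRule so the sketch compiles on its own.
public interface ITextRuleSketch
{
    string TextPro(string sourceStr, string key, int lev);
}

// Two toy rules, for illustration only.
public class TrimRule : ITextRuleSketch
{
    public string TextPro(string sourceStr, string key, int lev) => sourceStr.Trim();
}

public class UpperRule : ITextRuleSketch
{
    public string TextPro(string sourceStr, string key, int lev) => sourceStr.ToUpperInvariant();
}

public class RulePipelineSketch : ITextRuleSketch
{
    private readonly IList<string> ruleTypeNames;         // plays the role of <AllTextRules>
    private readonly IDictionary<string, string> config;  // plays the role of the XML key/value pairs
    private readonly Assembly ruleAssembly;               // where the rule types live
    private readonly Dictionary<string, List<ITextRuleSketch>> cache =
        new Dictionary<string, List<ITextRuleSketch>>();

    public RulePipelineSketch(IList<string> ruleTypeNames, IDictionary<string, string> config, Assembly ruleAssembly)
    {
        this.ruleTypeNames = ruleTypeNames;
        this.config = config;
        this.ruleAssembly = ruleAssembly;
    }

    public string TextPro(string sourceStr, string key, int lev)
    {
        string dicKey = key + lev;
        if (!cache.TryGetValue(dicKey, out List<ITextRuleSketch> rules))
        {
            rules = new List<ITextRuleSketch>();
            foreach (string typeName in ruleTypeNames)
            {
                // Resolve the configured name to a type and instantiate it by reflection.
                Type type = ruleAssembly.GetType(typeName, throwOnError: true);
                // Keep the rule only if its key (keyword + rule name + level) is configured.
                if (config.ContainsKey(key + type.Name + lev))
                    rules.Add((ITextRuleSketch)Activator.CreateInstance(type));
            }
            cache[dicKey] = rules;
        }

        // Chain the rules: each rule's output is the next rule's input.
        string result = sourceStr;
        foreach (ITextRuleSketch rule in rules)
            result = rule.TextPro(result, key, lev);
        return result;
    }
}
```

One deliberate difference: when no rule is configured, this sketch returns the input unchanged, whereas TextRuleAll above returns an empty string.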
5. Configuration file
XML configuration
<MyConfig>
  <!-- List of extended character processing rules -->
  <!-- TextIntercept: truncation rule -->
  <!-- StaticReplace: static replacement rule -->
  <AllTextRules>Collect.TextRule.TextIntercept,Collect.TextRule.StaticReplace,Collect.TextRule.TextUrlEncode,Collect.TextRule.TextFilter</AllTextRules>
  <!-- TextIntercept truncation rule configuration -->
  <Name1>email address is [content]</Name1>
  <!-- StaticReplace static replacement configuration -->
  <NameStaticReplace1>#,</NameStaticReplace1>
  <NameStaticReplace11>fuwentao,fwt</NameStaticReplace11>
  <!-- URL encoding rule configuration for Chinese text -->
  <NameTextUrlEncode1>city=[content]</NameTextUrlEncode1>
  <!-- Filter rule configuration -->
  <NameTextFilter1>.com,[content]http</NameTextFilter1>
</MyConfig>
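The post does not show how `SiteConfig.ConfigByKey` and `MyConfig.AllTextRules` read this file. Assuming a flat file like the one above, a reader could be as small as the sketch below, which uses LINQ to XML (`System.Xml.Linq`). `XmlConfigSketch` is a hypothetical name; returning an empty string for a missing key is a choice made to match the `!= ""` checks in the rule classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical config reader: each child element of the root becomes a
// key/value pair. XML comments are not elements, so they are skipped.
public class XmlConfigSketch
{
    private readonly Dictionary<string, string> values;

    public XmlConfigSketch(string xml)
    {
        values = XDocument.Parse(xml).Root
            .Elements()
            .ToDictionary(e => e.Name.LocalName, e => e.Value);
    }

    // Missing keys yield "" so callers can keep testing against "".
    public string ConfigByKey(string key) =>
        values.TryGetValue(key, out string value) ? value : "";

    // The <AllTextRules> list split into trimmed type names.
    public string[] AllTextRules() =>
        ConfigByKey("AllTextRules")
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Trim())
            .ToArray();
}
```

Trimming the type names matters: a leading space left over from `, ` would make reflection fail to find the type.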
After the refactoring, the client call looks like this:
string testStr = "I'm fuwentao, and my email address is fwt1314111#163.com, Website: http://www.mywaysoft.net/cityw.shanghai";
TextRuleAll cmd = new TextRuleAll();
string rel = cmd.TextPro(testStr, "Name", 1); // result
A single call to cmd.TextPro does all the work, which is much simpler than before.
The design is also flexible: to add a new processing rule, you only need to write a new rule class and register it in the configuration file (in <AllTextRules> plus its own keys); the client code does not change.