Stripping every HTML tag from a piece of content can make it hard to read (think of losing the a and img tags). It is usually better to remove some tags and keep others.
With regular expressions, matching text that contains a given string is easy to understand, but matching text that does not contain a given string (a whole string, not a single character or a character class) is genuinely confusing. The code is as follows:
<(?!((/?\s?li)|(/?\s?ul)|(/?\s?a)|(/?\s?img)|(/?\s?br)|(/?\s?span)|(/?\s?b)))[^>]+>
This regular expression matches every HTML tag that is not li/ul/a/img/br/span/b. For the requirement above, that means the tags listed here are kept and every other tag is deleted. It took me a long time to work this out.
(?!exp) is a negative lookahead: it matches a position that is not followed by exp.
/?\s? covers the optional slash of a closing tag and one optional whitespace character. At first I tried writing it directly after the leading <, outside the lookahead, but that version failed in testing.
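To make the placement issue concrete, here is a minimal sketch. It assumes the failed attempt looked like the Outside pattern below, which is a guess and not taken from the original code; the class name LookaheadPlacementDemo and the single kept tag a are also only for illustration. When /? sits before the lookahead, the engine can backtrack and let it match nothing, so the lookahead then sees "/a" instead of "a" and the closing tag gets deleted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadPlacementDemo
{
    // Optional "/" and whitespace INSIDE the lookahead:
    // both "<a ...>" and "</a>" are rejected, so both survive.
    public const string Inside = @"<(?!/?\s?a\b)[^>]+>";

    // Hypothetical failed variant: optional "/" and whitespace BEFORE the lookahead.
    // On "</a>" the engine backtracks "/?" to an empty match, the lookahead
    // then inspects "/a" (not "a"), passes, and "</a>" is removed.
    public const string Outside = @"</?\s?(?!a\b)[^>]+>";

    public static string Strip(string pattern, string html)
    {
        return Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<p>See <a href=\"x.html\">this</a></p>";
        Console.WriteLine(Strip(Inside, html));   // See <a href="x.html">this</a>
        Console.WriteLine(Strip(Outside, html));  // See <a href="x.html">this
    }
}
```

The second output loses the closing </a>, which is consistent with the kind of test failure described above.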
The following simple function joins the tags to be retained into a regular expression and then deletes all other tags. The code is as follows:
using System.Text.RegularExpressions;

private static string RemoveSpecifyHtml(string ctx) {
    string[] holdTags = { "a", "img", "br", "strong", "b", "span" }; // tags to be retained
    // Pattern shape, e.g.: <(?!((/?\s?li)|(/?\s?ul)|(/?\s?a)|(/?\s?img)|(/?\s?br)|(/?\s?span)|(/?\s?b)))[^>]+>
    string regStr = string.Format(@"<(?!((/?\s?{0})))[^>]+>", string.Join(@")|(/?\s?", holdTags));
    Regex reg = new Regex(regStr, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);
    // Every tag that is not in holdTags is replaced with an empty string.
    return reg.Replace(ctx, "");
}
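A short usage sketch may help show what the function does to real markup. The wrapper class TagStripperDemo, the public modifier, and the sample HTML are assumptions added for the demo; the pattern-building logic follows the function above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripperDemo
{
    // Same approach as RemoveSpecifyHtml, made public so the demo can call it.
    public static string RemoveSpecifyHtml(string ctx)
    {
        string[] holdTags = { "a", "img", "br", "strong", "b", "span" };
        // Join the kept names into one alternation inside the negative lookahead.
        string regStr = string.Format(@"<(?!((/?\s?{0})))[^>]+>", string.Join(@")|(/?\s?", holdTags));
        return Regex.Replace(ctx, regStr, "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<div><p>Hi <a href=\"x.html\">there</a><br/></p></div>";
        // div and p are not in holdTags, so their opening and closing tags are removed.
        Console.WriteLine(RemoveSpecifyHtml(html));
        // Hi <a href="x.html">there</a><br/>
    }
}
```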
Fixed:
With the regular expression above, keeping li also keeps link when the code actually runs, and keeping a also keeps addr, because the lookahead only checks how the tag name begins. The solution is to add the \b word-boundary assertion after each tag name. The code is as follows:
<(?!((/?\s?li\b)|(/?\s?ul\b)|(/?\s?a\b)|(/?\s?img\b)|(/?\s?br\b)|(/?\s?span\b)|(/?\s?b\b)))[^>]+>
private static string RemoveSpecifyHtml(string ctx) {
    string[] holdTags = { "a", "img", "br", "strong", "b", "span", "li" }; // tags to be retained
    // Pattern shape, e.g.: <(?!((/?\s?li\b)|(/?\s?ul\b)|(/?\s?a\b)|(/?\s?img\b)|(/?\s?br\b)|(/?\s?span\b)|(/?\s?b\b)))[^>]+>
    // \b also follows {0} so that the last tag name in the list gets a word boundary too.
    string regStr = string.Format(@"<(?!((/?\s?{0}\b)))[^>]+>", string.Join(@"\b)|(/?\s?", holdTags));
    Regex reg = new Regex(regStr, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);
    return reg.Replace(ctx, "");
}
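The effect of the \b fix can be checked side by side. The sketch below is only an illustration: the class BoundaryDemo, the helper BuildPattern with its useBoundary flag, and the sample markup are hypothetical names introduced here, not part of the original function.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Builds the tag-stripping pattern; useBoundary appends \b after every kept name.
    public static string BuildPattern(string[] holdTags, bool useBoundary)
    {
        string suffix = useBoundary ? @"\b" : "";
        string alternatives = string.Join(suffix + @")|(/?\s?", holdTags);
        return @"<(?!((/?\s?" + alternatives + suffix + ")))[^>]+>";
    }

    public static string Strip(string html, string[] holdTags, bool useBoundary)
    {
        return Regex.Replace(html, BuildPattern(holdTags, useBoundary), "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string[] keep = { "a", "li" };
        string html = "<link rel=\"x\"><article><li><a href=\"#\">one</a></li></article>";

        // Without \b, "link" passes as "li" and "article" passes as "a": nothing is removed.
        Console.WriteLine(Strip(html, keep, false));
        // <link rel="x"><article><li><a href="#">one</a></li></article>

        // With \b, only tags named exactly "a" or "li" survive.
        Console.WriteLine(Strip(html, keep, true));
        // <li><a href="#">one</a></li>
    }
}
```

Appending the suffix after the last name as well as inside the separator is what keeps every alternative, including the final one, bounded.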