After fetching data from a web page (a description field, for example), displaying it as-is can break the page: it may contain an unclosed HTML tag, or use some more "obscure" HTML tag that ruins the layout. Deleting every HTML tag hurts readability, because links and images (the a and img tags) are lost too, so it is better to delete some tags and keep others.
In regular expressions, testing that text contains certain strings is easy to understand. Testing that it does not contain certain strings (whole strings rather than single characters, and several of them rather than just one) is much more puzzling.
<(?!((/?\s?li)|(/?\s?ul)|(/?\s?a)|(/?\s?img)|(/?\s?br)|(/?\s?span)|(/?\s?b)))[^>]+>
This regex matches any HTML tag that is not li/ul/a/img/br/span/b. In terms of the requirement above, replacing its matches with an empty string deletes every tag except the ones listed here. It took me a long time of trial and error to get it right.
(?!exp) matches a position that is not followed by exp.
/?\s? allows for an optional slash (closing tags) and an optional whitespace character. At first I tried writing it next to the leading <, outside the lookahead, but that failed in testing.
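To make the placement issue concrete, here is a minimal sketch comparing the two positions of /?\s?. The class name LookaheadPlacementDemo, the method names, and the sample HTML are assumptions made for illustration, and reading the failed attempt as "outside the lookahead" is only one plausible interpretation of the remark above. With /?\s? outside the lookahead, the regex engine can backtrack and let /? match nothing, so the lookahead is then tested against /li instead of li and the closing tag slips through.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaheadPlacementDemo
{
    // /?\s? inside the lookahead: both <li> and </li> are protected.
    public static string StripInside(string html) =>
        Regex.Replace(html, @"<(?!/?\s?li)[^>]+>", "", RegexOptions.IgnoreCase);

    // /?\s? outside the lookahead: /? can backtrack to match nothing,
    // so for </li> the lookahead is tested against "/li" and passes.
    public static string StripOutside(string html) =>
        Regex.Replace(html, @"</?\s?(?!li)[^>]+>", "", RegexOptions.IgnoreCase);

    static void Main()
    {
        string html = "<p><li>item</li></p>";
        Console.WriteLine(StripInside(html));  // <li>item</li>
        Console.WriteLine(StripOutside(html)); // <li>item
    }
}
```

In the second call the opening li survives but its closing tag is removed, which is the kind of mismatch that would show up as a failed test.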
Here is a simple function that joins the tags to keep into a regular expression and then deletes all the unwanted tags:
// Requires: using System.Text.RegularExpressions;
private static string RemoveSpecifyHtml(string ctx) {
    string[] holdTags = { "a", "img", "br", "strong", "b", "span" }; // tags to keep
    // Pattern shape, e.g.: <(?!((/?\s?li)|(/?\s?ul)|(/?\s?a)|(/?\s?img)|(/?\s?br)|(/?\s?span)|(/?\s?b)))[^>]+>
    string regStr = string.Format(@"<(?!((/?\s?{0})))[^>]+>", string.Join(@")|(/?\s?", holdTags));
    Regex reg = new Regex(regStr, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);
    return reg.Replace(ctx, "");
}
----------------------------
Correction:
The regex above has a flaw: if li is kept, you will find in practice that link is kept as well, and keeping a also keeps addr. The fix is to add a \b (word boundary) assertion after each tag name.
<(?!((/?\s?li\b)|(/?\s?ul\b)|(/?\s?a\b)|(/?\s?img\b)|(/?\s?br\b)|(/?\s?span\b)|(/?\s?b\b)))[^>]+>
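The effect of \b can be checked with a short sketch. The class name WordBoundaryDemo, the method names, and the sample HTML are assumptions for illustration. The two patterns differ only in the \b after the tag names.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordBoundaryDemo
{
    // Without \b: "li" also protects <link>, and "a" also protects <address>.
    public static string StripLoose(string html) =>
        Regex.Replace(html, @"<(?!/?\s?(li|a))[^>]+>", "", RegexOptions.IgnoreCase);

    // With \b: the tag name must end right after "li" or "a".
    public static string StripStrict(string html) =>
        Regex.Replace(html, @"<(?!/?\s?(li|a)\b)[^>]+>", "", RegexOptions.IgnoreCase);

    static void Main()
    {
        string html = "<link rel=\"x\"><address>home</address><a href=\"#\">go</a>";
        Console.WriteLine(StripLoose(html));  // the input comes back unchanged
        Console.WriteLine(StripStrict(html)); // home<a href="#">go</a>
    }
}
```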
// Requires: using System.Text.RegularExpressions;
private static string RemoveSpecifyHtml(string ctx) {
    string[] holdTags = { "a", "img", "br", "strong", "b", "span", "li" }; // tags to keep
    // Pattern shape, e.g.: <(?!((/?\s?li\b)|(/?\s?ul\b)|(/?\s?a\b)|(/?\s?img\b)|(/?\s?br\b)|(/?\s?span\b)|(/?\s?b\b)))[^>]+>
    // \b also appears in the format string: Join only inserts it between tags, never after the last one.
    string regStr = string.Format(@"<(?!((/?\s?{0}\b)))[^>]+>", string.Join(@"\b)|(/?\s?", holdTags));
    Regex reg = new Regex(regStr, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);
    return reg.Replace(ctx, "");
}
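For reference, here is a minimal usage sketch of the same idea. It is a variant, not the function above: the TagFilter class, the RemoveTagsExcept name, the params signature, the use of Regex.Escape, and the sample input are all assumptions for illustration. It factors /?\s? and \b out of the alternation so each tag name appears only once in the pattern, and it expects at least one tag name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TagFilter
{
    // Hypothetical helper: removes every tag whose name is not in keepTags.
    public static string RemoveTagsExcept(string html, params string[] keepTags)
    {
        // Escape each name so it is treated literally, then build e.g. b|br|strong.
        string names = string.Join("|", keepTags.Select(t => Regex.Escape(t)));
        string pattern = @"<(?!/?\s?(?:" + names + @")\b)[^>]+>";
        return Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
    }

    static void Main()
    {
        string html = "<div><b>bold</b> <strong>s</strong><br/><script>x</script></div>";
        Console.WriteLine(RemoveTagsExcept(html, "b", "br", "strong"));
        // <b>bold</b> <strong>s</strong><br/>x
    }
}
```

Note that only the tags themselves are removed. The text between a removed pair of tags, such as the x inside script here, stays in the output.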