Deleting specified HTML tags with a regular expression

Source: Internet
Author: User
Tags: html tags, join, regular expression

When text fetched from a web page (a description field, for example) is displayed as is, an unclosed HTML tag in it can break the page, and more "obscure" HTML tags can disrupt the layout. Deleting every HTML tag can make the text hard to read, because tags such as a and img carry useful content. It is better to delete some tags and keep others.

In a regular expression it is easy to say that the text must contain a certain string. Saying that it must not contain a certain string (a whole string, not a single character; any one of several strings, not all of them) is much less obvious.

<(?!((/?\s?li)|(/?\s?ul)|(/?\s?a)|(/?\s?img)|(/?\s?br)|(/?\s?span)|(/?\s?b)))[^>]+>

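This matches any HTML tag, opening or closing, whose name does not begin with li, ul, a, img, br, span, or b. For the requirement above, that means every tag except the ones listed here is deleted. It took me a long time of trial and error to arrive at it.

(?!exp) matches a position that is not followed by exp.

I first tried writing /?\s? directly after the leading <, in front of the lookahead, but the test failed. A likely reason is that both parts are optional: for </li> the engine can backtrack, let /? match nothing, and run the lookahead at "/li", which does not start with a kept tag name, so the closing tag is deleted anyway.

Here is a simple function that takes the list of tags to keep, builds the regular expression from it, and deletes the unwanted tags.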
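To make the lookahead idea concrete outside of HTML, here is a minimal sketch. The class name LookaheadDemo and the strings "foo" and "bar" are made up for illustration; only Regex and its Matches method come from .NET.

```csharp
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Matches "foo" only at positions where the following text is NOT "bar".
    // The lookahead (?!bar) inspects the text ahead without consuming it.
    private static readonly Regex FooNotBar = new Regex(@"foo(?!bar)");

    // Counts how many times "foo" appears without "bar" right after it.
    public static int CountMatches(string input)
    {
        return FooNotBar.Matches(input).Count;
    }
}
```

Calling LookaheadDemo.CountMatches("foobar") finds nothing, while "foobaz" yields one match, because the check applies to the whole string "bar" and not to a single character.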
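The effect of where /?\s? is placed can be checked with a small sketch. This is an illustration under assumptions, not the author's original test: the class name PrefixPlacementDemo and the single kept tag li are chosen only to keep the patterns short.

```csharp
using System.Text.RegularExpressions;

public static class PrefixPlacementDemo
{
    // Optional "/" and whitespace INSIDE the lookahead:
    // both <li> and </li> are rejected by the lookahead, so both survive.
    public static readonly Regex Inside = new Regex(@"<(?!/?\s?li)[^>]+>");

    // Same prefix moved OUTSIDE, in front of the lookahead:
    // for </li> the engine can let /? match nothing, the lookahead then
    // sees "/li" (not "li"), succeeds, and the closing tag is matched.
    public static readonly Regex Outside = new Regex(@"</?\s?(?!li)[^>]+>");

    // Deletes every tag the given pattern matches.
    public static string Strip(Regex pattern, string html)
    {
        return pattern.Replace(html, "");
    }
}
```

With the input <ul><li>x</li></ul>, the Inside pattern leaves <li>x</li>, while the Outside pattern leaves <li>x with the closing tag gone.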

// requires: using System.Text.RegularExpressions;
private static string RemoveSpecifyHtml(string ctx) {
    string[] holdTags = { "a", "img", "br", "strong", "b", "span" }; // tags to keep

    // Builds a pattern of the same form as:
    // <(?!((/?\s?li)|(/?\s?ul)|(/?\s?a)|(/?\s?img)|(/?\s?br)|(/?\s?span)|(/?\s?b)))[^>]+>
    string regStr = string.Format(@"<(?!((/?\s?{0})))[^>]+>", string.Join(@")|(/?\s?", holdTags));

    Regex reg = new Regex(regStr, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // Every tag the pattern matches is replaced with an empty string.
    return reg.Replace(ctx, "");
}

----------------------------

Correction:

With the expression above, keeping li turns out to keep link as well, and keeping a also keeps address, because the lookahead only checks how the tag name begins. The solution is to add a \b (word boundary) assertion after each tag name.

<(?!((/?\s?li\b)|(/?\s?ul\b)|(/?\s?a\b)|(/?\s?img\b)|(/?\s?br\b)|(/?\s?span\b)|(/?\s?b\b)))[^>]+>
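The difference the \b assertion makes can be seen in a short sketch. The class name BoundaryDemo and the reduced tag list (only li and a) are assumptions made to keep the example small.

```csharp
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Without \b: any tag whose name merely STARTS with "li" or "a" is kept.
    public static readonly Regex WithoutBoundary =
        new Regex(@"<(?!((/?\s?li)|(/?\s?a)))[^>]+>", RegexOptions.IgnoreCase);

    // With \b: the kept name must end at a word boundary, so <link> and
    // <address> no longer pass as <li> and <a>.
    public static readonly Regex WithBoundary =
        new Regex(@"<(?!((/?\s?li\b)|(/?\s?a\b)))[^>]+>", RegexOptions.IgnoreCase);

    // Deletes every tag the given pattern matches.
    public static string Strip(Regex pattern, string html)
    {
        return pattern.Replace(html, "");
    }
}
```

For the input <link rel=x><address>t</address><a href=#>k</a>, the first pattern removes nothing, while the second leaves t<a href=#>k</a>.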

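// requires: using System.Text.RegularExpressions;
private static string RemoveSpecifyHtml(string ctx) {
    string[] holdTags = { "a", "img", "br", "strong", "b", "span", "li" }; // tags to keep

    // Builds a pattern of the same form as:
    // <(?!((/?\s?li\b)|(/?\s?ul\b)|(/?\s?a\b)|(/?\s?img\b)|(/?\s?br\b)|(/?\s?span\b)|(/?\s?b\b)))[^>]+>
    // The \b after {0} covers the last tag in the list; the separator passed to Join covers the others.
    string regStr = string.Format(@"<(?!((/?\s?{0}\b)))[^>]+>", string.Join(@"\b)|(/?\s?", holdTags));

    Regex reg = new Regex(regStr, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // Every tag the pattern matches is replaced with an empty string.
    return reg.Replace(ctx, "");
}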
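As a variation on the function above, the sketch below takes the tags to keep as a parameter instead of hard-coding them. The names TagFilter and StripTagsExcept are hypothetical, and the pattern is a condensed form of the one above (a single group with one \b instead of one group per tag), so treat it as one possible arrangement rather than the original method. Regex.Escape is applied in case a tag name ever contains a regex metacharacter.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class TagFilter
{
    // Removes every HTML tag from html except those named in keepTags.
    public static string StripTagsExcept(string html, params string[] keepTags)
    {
        // With nothing to keep, every tag is removed.
        if (keepTags == null || keepTags.Length == 0)
        {
            return Regex.Replace(html, @"<[^>]+>", "");
        }

        // Escape each name, then join them into one alternation: a|img|br
        string names = string.Join("|", keepTags.Select(t => Regex.Escape(t)));

        // "<" not followed by (optional "/", optional whitespace, a kept
        // name ending at a word boundary), then the rest of the tag.
        string pattern = @"<(?!/?\s?(?:" + names + @")\b)[^>]+>";

        return Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
    }
}
```

For example, TagFilter.StripTagsExcept("<div><b>x</b><br/><link></div>", "b", "br") returns <b>x</b><br/>. Like the original, this relies on [^>]+ and so assumes no attribute value contains a > character.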
