Removing HTML Tags with Regular Expressions in Java
This article shows how to remove HTML tags with Java regular expressions. When content is entered in a rich-text editor, its style tags are submitted to the back end along with the text and saved to the database. When an abstract is displayed, for example the first 50 characters of the body, the raw markup cannot be used directly: all HTML tags must be removed first, and the remaining text is then truncated to 50 characters. The following method does the tag removal with Java regular expressions. The code is as follows:
Note: This is a Java regular expression approach to removing HTML tags.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    private static final String regEx_script = "<script[^>]*?>[\\s\\S]*?<\\/script>"; // regular expression for script blocks
    private static final String regEx_style = "<style[^>]*?>[\\s\\S]*?<\\/style>"; // regular expression for style blocks
    private static final String regEx_html = "<[^>]+>"; // regular expression for any HTML tag
    private static final String regEx_space = "\\s*|\t|\r|\n"; // spaces, tabs, carriage returns and line breaks
    private static final String regEx_w = "<w[^>]*?>[\\s\\S]*?<\\/w[^>]*?>"; // regular expression for all w tags

    /**
     * @param htmlStr
     * @return delete Html tags
     * @author LongJin
     */
    public static String delHTMLTag(String htmlStr) {
        // filter the w tags
        Pattern p_w = Pattern.compile(regEx_w, Pattern.CASE_INSENSITIVE);
        Matcher m_w = p_w.matcher(htmlStr);
        htmlStr = m_w.replaceAll("");

        // filter the script tags
        Pattern p_script = Pattern.compile(regEx_script, Pattern.CASE_INSENSITIVE);
        Matcher m_script = p_script.matcher(htmlStr);
        htmlStr = m_script.replaceAll("");

        // filter the style tags
        Pattern p_style = Pattern.compile(regEx_style, Pattern.CASE_INSENSITIVE);
        Matcher m_style = p_style.matcher(htmlStr);
        htmlStr = m_style.replaceAll("");

        // filter the remaining HTML tags
        Pattern p_html = Pattern.compile(regEx_html, Pattern.CASE_INSENSITIVE);
        Matcher m_html = p_html.matcher(htmlStr);
        htmlStr = m_html.replaceAll("");

        // filter all whitespace, including spaces between words
        Pattern p_space = Pattern.compile(regEx_space, Pattern.CASE_INSENSITIVE);
        Matcher m_space = p_space.matcher(htmlStr);
        htmlStr = m_space.replaceAll("");

        htmlStr = htmlStr.replaceAll("&nbsp;", ""); // filter &nbsp; entities (assumed; the literal was lost in the original listing)
        return htmlStr.trim(); // return the plain text
    }
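The article mentions cutting the cleaned text down to 50 characters for the abstract but does not show that step. The following is a minimal sketch of one way to do it, not the author's code. The class name `HtmlAbstract` and the methods `stripTags` and `makeAbstract` are hypothetical names chosen for illustration. It also assumes, unlike `delHTMLTag`, that a single space should be kept between words, which suits English text better.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: strips HTML and keeps the first N characters as an abstract. */
final class HtmlAbstract {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern SCRIPT =
            Pattern.compile("<script[^>]*?>[\\s\\S]*?</script>", Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE =
            Pattern.compile("<style[^>]*?>[\\s\\S]*?</style>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern SPACE = Pattern.compile("\\s+");

    /** Removes script/style blocks and all remaining tags, then collapses whitespace. */
    static String stripTags(String html) {
        String text = SCRIPT.matcher(html).replaceAll("");
        text = STYLE.matcher(text).replaceAll("");
        text = TAG.matcher(text).replaceAll("");
        text = text.replace("&nbsp;", " ");
        // Assumption: keep one space between words instead of removing all whitespace.
        return SPACE.matcher(text).replaceAll(" ").trim();
    }

    /** Returns at most maxChars characters (counted as code points) of the stripped text. */
    static String makeAbstract(String html, int maxChars) {
        String text = stripTags(html);
        if (text.codePointCount(0, text.length()) <= maxChars) {
            return text;
        }
        // offsetByCodePoints avoids cutting a surrogate pair in half.
        return text.substring(0, text.offsetByCodePoints(0, maxChars));
    }
}
```

With this sketch, `HtmlAbstract.makeAbstract(content, 50)` would give the abstract text for a stored `content` string. Counting by code points instead of `char` values is a design choice so that characters outside the Basic Multilingual Plane are not split.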
P.S. This method is for reference only. If you have any questions, please leave a comment.