Several examples of deleting HTML tags with regular expressions in Java

Source: Internet
Author: User

Example 1

To display a summary of a news item or blog post, you first need to remove the HTML tags from its content. A regular expression can do this:

The code is as follows:

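/**
 * Deletes the HTML formatting from the input string.
 *
 * @param input the HTML fragment to clean
 * @return the plain text, or "" if input is null or blank
 */
public static String splitAndFilterString(String input) {
    if (input == null || input.trim().equals("")) {
        return "";
    }
    // Remove entities such as &nbsp; or &amp; (1 to 10 letters), then every <...> tag,
    // then any stray '<', '>', '/', '(' or ')' characters (these are removed from ordinary text too)
    String str = input.replaceAll("&[a-zA-Z]{1,10};", "")
            .replaceAll("<[^>]*>", "")
            .replaceAll("[(/>)<]", "");
    return str;
}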
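To try the Example 1 method on its own, here is a minimal runnable sketch that wraps it in a class. The class name SummaryDemo and the sample inputs are hypothetical, chosen for illustration. The second input shows a side effect of the last replaceAll: slashes and parentheses in ordinary text are deleted as well.

```java
public class SummaryDemo {
    // Same three-step cleanup as splitAndFilterString in Example 1
    public static String splitAndFilterString(String input) {
        if (input == null || input.trim().isEmpty()) {
            return "";
        }
        return input.replaceAll("&[a-zA-Z]{1,10};", "") // entities such as &nbsp;
                .replaceAll("<[^>]*>", "")               // complete tags
                .replaceAll("[(/>)<]", "");              // stray < > / ( ) characters
    }

    public static void main(String[] args) {
        System.out.println(splitAndFilterString("<p>Hello&nbsp;<b>world</b></p>")); // prints Helloworld
        System.out.println(splitAndFilterString("<i>ratio 1/2 (approx)</i>"));      // prints ratio 12 approx
    }
}
```

Because &nbsp; is deleted rather than turned into a space, the words on either side of it run together.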

Regular expression that removes all script blocks:
content = content.replaceAll("<script[^>]*?>[\\s\\S]*?<\\/script>", "");
Regular expression that removes all style blocks:
content = content.replaceAll("<[\\s]*?style[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?style[\\s]*?>", "");
Remove all HTML tags except p and br:
content = content.replaceAll("</?(?!br|/?p)[^>]*>", "");
Remove all HTML tags except p:
content = content.replaceAll("</?(?!/?p)[^>]*>", "");
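The one-liners above can be checked with a small sketch that applies them in order. It adds the embedded flag (?i) to each pattern so that upper-case tags such as <SCRIPT> also match, since String.replaceAll takes no flags argument. The class TagFilterDemo and the method keepParagraphs are hypothetical names.

```java
public class TagFilterDemo {
    // Removes <script> and <style> blocks, then every tag except p and br.
    // (?i) makes each match case-insensitive; [\s\S] also matches line breaks.
    public static String keepParagraphs(String content) {
        content = content.replaceAll("(?i)<script[^>]*?>[\\s\\S]*?<\\/script>", "");
        content = content.replaceAll(
                "(?i)<[\\s]*?style[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?style[\\s]*?>", "");
        content = content.replaceAll("(?i)</?(?!br|/?p)[^>]*>", "");
        return content;
    }

    public static void main(String[] args) {
        String html = "<div><SCRIPT>alert(1)</SCRIPT><p>One<br/>Two</p><span>Three</span></div>";
        System.out.println(keepParagraphs(html));              // prints <p>One<br/>Two</p>Three
        System.out.println(keepParagraphs("<pre>code</pre>")); // prints <pre>code</pre>
    }
}
```

The lookahead (?!br|/?p) only checks the first letters of the tag name, so any tag that starts with "p" or "br", such as <pre> or <param>, is kept as well.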

Example 2

The code is as follows:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlUtil {
    // Regular expression for <script> blocks, including their contents
    private static final String regEx_script = "<script[^>]*?>[\\s\\S]*?<\\/script>";
    // Regular expression for <style> blocks, including their contents
    private static final String regEx_style = "<style[^>]*?>[\\s\\S]*?<\\/style>";
    // Regular expression for any remaining HTML tag
    private static final String regEx_html = "<[^>]+>";
    // Runs of whitespace: spaces, tabs, carriage returns and line feeds
    private static final String regEx_space = "\\s+";

    /**
     * Deletes HTML tags.
     *
     * @param htmlStr the HTML source
     * @return the text with script blocks, style blocks and tags removed
     */
    public static String delHTMLTag(String htmlStr) {
        Pattern p_script = Pattern.compile(regEx_script, Pattern.CASE_INSENSITIVE);
        Matcher m_script = p_script.matcher(htmlStr);
        htmlStr = m_script.replaceAll(""); // filter script blocks

        Pattern p_style = Pattern.compile(regEx_style, Pattern.CASE_INSENSITIVE);
        Matcher m_style = p_style.matcher(htmlStr);
        htmlStr = m_style.replaceAll(""); // filter style blocks

        Pattern p_html = Pattern.compile(regEx_html, Pattern.CASE_INSENSITIVE);
        Matcher m_html = p_html.matcher(htmlStr);
        htmlStr = m_html.replaceAll(""); // filter HTML tags

        Pattern p_space = Pattern.compile(regEx_space, Pattern.CASE_INSENSITIVE);
        Matcher m_space = p_space.matcher(htmlStr);
        htmlStr = m_space.replaceAll(" "); // collapse whitespace and line breaks into one space
        return htmlStr.trim(); // return the text string
    }

    public static String getTextFromHtml(String htmlStr) {
        htmlStr = delHTMLTag(htmlStr);
        htmlStr = htmlStr.replaceAll("&nbsp;", "");
        // Keep only the first sentence, if the text contains a period
        int end = htmlStr.indexOf('.');
        if (end >= 0) {
            htmlStr = htmlStr.substring(0, end + 1);
        }
        return htmlStr;
    }

    public static void main(String[] args) {
        String str = "<div style='text-align:center;'> cleaning up the \"Four Winds\" <br/>"
                + "<span style='font-size:14px;'> </span>"
                + "<span style='font-size:18px;'>The company held a mobilization meeting"
                + " for the party's Mass Line education practices</span><br/></div>";
        System.out.println(getTextFromHtml(str));
    }
}
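Example 2 recompiles all four patterns every time delHTMLTag runs. Because java.util.regex.Pattern objects are immutable and safe to share between threads, a common variant compiles them once into static final fields. The following is a minimal sketch under that approach; the class name HtmlText is hypothetical, and, like the whitespace step above, it collapses each whitespace run into a single space.

```java
import java.util.regex.Pattern;

public class HtmlText {
    // Compiled once; Pattern instances are immutable and thread-safe
    private static final Pattern SCRIPT =
            Pattern.compile("<script[^>]*?>[\\s\\S]*?<\\/script>", Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE =
            Pattern.compile("<style[^>]*?>[\\s\\S]*?<\\/style>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern SPACE = Pattern.compile("\\s+");

    // Same order as delHTMLTag: scripts, styles, remaining tags, then whitespace
    public static String delHTMLTag(String html) {
        String s = SCRIPT.matcher(html).replaceAll("");
        s = STYLE.matcher(s).replaceAll("");
        s = TAG.matcher(s).replaceAll("");
        s = SPACE.matcher(s).replaceAll(" ");
        return s.trim();
    }

    public static void main(String[] args) {
        String html = "<html><style>b{}</style><body>\n  <b>Hello</b>\n  world"
                + "<script>x()</script></body></html>";
        System.out.println(delHTMLTag(html)); // prints Hello world
    }
}
```

The behavior matches the per-call version; the only difference is that compilation cost is paid once when the class loads.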

Example 3

The code is as follows:

/**
 * Deletes HTML tags.
 *
 * @param inputString the string containing HTML tags
 * @return the text with script blocks, style blocks and tags removed, or null if inputString is null
 */
public static String htmlRemoveTag(String inputString) {
    if (inputString == null) {
        return null;
    }
    String htmlStr = inputString; // string containing HTML tags
    String textStr = "";
    java.util.regex.Pattern p_script;
    java.util.regex.Matcher m_script;
    java.util.regex.Pattern p_style;
    java.util.regex.Matcher m_style;
    java.util.regex.Pattern p_html;
    java.util.regex.Matcher m_html;
    try {
        // Regular expression for script blocks; a simpler alternative is <script[^>]*?>[\s\S]*?<\/script>
        String regEx_script = "<[\\s]*?script[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?script[\\s]*?>";
        // Regular expression for style blocks; a simpler alternative is <style[^>]*?>[\s\S]*?<\/style>
        String regEx_style = "<[\\s]*?style[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?style[\\s]*?>";
        String regEx_html = "<[^>]+>"; // regular expression for any HTML tag

        p_script = java.util.regex.Pattern.compile(regEx_script, java.util.regex.Pattern.CASE_INSENSITIVE);
        m_script = p_script.matcher(htmlStr);
        htmlStr = m_script.replaceAll(""); // filter script blocks

        p_style = java.util.regex.Pattern.compile(regEx_style, java.util.regex.Pattern.CASE_INSENSITIVE);
        m_style = p_style.matcher(htmlStr);
        htmlStr = m_style.replaceAll(""); // filter style blocks

        p_html = java.util.regex.Pattern.compile(regEx_html, java.util.regex.Pattern.CASE_INSENSITIVE);
        m_html = p_html.matcher(htmlStr);
        htmlStr = m_html.replaceAll(""); // filter HTML tags

        textStr = htmlStr;
    } catch (Exception e) {
        e.printStackTrace();
    }
    return textStr; // return the text string
}
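The two script patterns mentioned in Example 3 differ in one respect: the longer one also accepts whitespace after "<" and around "/" in the tags, as in "< / script >". The hypothetical class below compares them on such input.

```java
import java.util.regex.Pattern;

public class ScriptPatternDemo {
    // Simple form: requires exactly "<script" and "</script>"
    static final Pattern SIMPLE =
            Pattern.compile("<script[^>]*?>[\\s\\S]*?<\\/script>", Pattern.CASE_INSENSITIVE);
    // Tolerant form from Example 3: allows whitespace after "<" and around "/"
    static final Pattern TOLERANT = Pattern.compile(
            "<[\\s]*?script[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?script[\\s]*?>",
            Pattern.CASE_INSENSITIVE);

    // Deletes every match of the given pattern
    static String strip(Pattern p, String html) {
        return p.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "a< script >evil()< / script >b";
        System.out.println(strip(SIMPLE, html));   // prints the input unchanged
        System.out.println(strip(TOLERANT, html)); // prints ab
    }
}
```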
