A regular expression is a pattern that describes a set of strings; it is used to match, split, and replace text.
1. The String class provides methods for matching, splitting, and replacing strings with regular expressions.
Check whether the whole string matches a given regular expression: boolean matches(String regex);
Split a string around the matches of a given regular expression: String[] split(String regex);
Replace every substring that matches the regular expression with another string: String replaceAll(String regex, String replacement);
2. Here are some common uses of regular expressions
(1)
String regex = "[1-9][0-9]{4,15}";
[1-9] means the first character must be a digit from 1 to 9.
[0-9] means a character can be any digit from 0 to 9.
{4,15} means the preceding element, here [0-9], must be repeated at least 4 and at most 15 times.
Taken together, the expression matches a string whose first digit is in 1-9, followed by 4 to 15 more digits in 0-9, so 5 to 16 digits in total.
For example:
10175 matches.
10 does not match, because [0-9]{4,15} requires at least 4 digits after the first one, and here there is only one.
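The rule above can be checked directly with String.matches. The following is a minimal sketch; the class name MatchDemo and the helper isValid are made up for illustration and are not part of the original article.

```java
// Hypothetical demo class: checks strings against the pattern from example (1).
class MatchDemo {
    // First digit 1-9, then 4 to 15 digits 0-9 (5 to 16 digits in total).
    static final String REGEX = "[1-9][0-9]{4,15}";

    // String.matches succeeds only if the WHOLE string matches the pattern.
    static boolean isValid(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isValid("10175")); // true
        System.out.println(isValid("10"));    // false: only one digit after the first
        System.out.println(isValid("01234")); // false: the first digit may not be 0
    }
}
```

Because matches tests the entire string, no ^ or $ anchors are needed in the pattern.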
(2)
[a-zA-Z0-9_]{6} matches exactly 6 characters, each of which is a letter (a-z or A-Z), a digit (0-9), or an underscore (_).
+ means the preceding element occurs one or more times.
* means the preceding element occurs zero or more times.
? means the preceding element occurs zero times or once.
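As a quick illustration of these quantifiers, here is a small sketch. The class QuantifierDemo, its method names, and the sample patterns [0-9]+, [0-9]* and -?[0-9]+ are assumptions chosen for the demo; only [a-zA-Z0-9_]{6} comes from the text above.

```java
// Hypothetical demo class: one helper per quantifier discussed above.
class QuantifierDemo {
    // {6}: exactly six letters, digits, or underscores.
    static boolean isSixWordChars(String s) { return s.matches("[a-zA-Z0-9_]{6}"); }

    // +: one or more digits.
    static boolean oneOrMoreDigits(String s) { return s.matches("[0-9]+"); }

    // *: zero or more digits, so the empty string also matches.
    static boolean zeroOrMoreDigits(String s) { return s.matches("[0-9]*"); }

    // ?: an optional leading minus sign, then at least one digit.
    static boolean optionallySigned(String s) { return s.matches("-?[0-9]+"); }

    public static void main(String[] args) {
        System.out.println(isSixWordChars("ab_123")); // true
        System.out.println(isSixWordChars("ab_12"));  // false: only five characters
        System.out.println(oneOrMoreDigits(""));      // false
        System.out.println(zeroOrMoreDigits(""));     // true
        System.out.println(optionallySigned("-42"));  // true
        System.out.println(optionallySigned("--42")); // false: at most one minus sign
    }
}
```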
(3) Splitting strings with a regular expression
String str = "SJD.KSDJ.SKDJF";
String regex = "\\.";
Note: in a regular expression, . is a special symbol that matches any character. To split on a literal dot, we have to escape it with \ so that it is treated as an ordinary character.
Because \ is also a special symbol in a Java string literal, two backslashes are needed to write one, which gives "\\.". To match a literal \ itself, write "\\\\".
String[] ss = str.split(regex); // returns the array {"SJD", "KSDJ", "SKDJF"}
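The two escaping cases described above, a literal dot and a literal backslash, can be tried out with a short sketch. The class SplitDemo, its helper names, and the sample path C:\temp\data are invented for this illustration.

```java
import java.util.Arrays;

// Hypothetical demo class: splitting on characters that are special in a regex.
class SplitDemo {
    // "\\." in Java source is the regex \. which matches a literal dot.
    static String[] splitOnDot(String s) {
        return s.split("\\.");
    }

    // "\\\\" in Java source is the regex \\ which matches a single literal backslash.
    static String[] splitOnBackslash(String s) {
        return s.split("\\\\");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnDot("SJD.KSDJ.SKDJF")));
        // prints [SJD, KSDJ, SKDJF]
        System.out.println(Arrays.toString(splitOnBackslash("C:\\temp\\data")));
        // prints [C:, temp, data]
    }
}
```

Splitting on an unescaped "." would treat every character as a separator, and the result would be an empty array, because split drops trailing empty strings.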
(4) Replacing the substrings that match a regular expression
Replace every run of 5 or more consecutive digits in a string with #
String str = "ABCD1334546LASJDFLDSF2343424SDJ";
String regex = "[0-9]{5,}";
String newStr = str.replaceAll(regex, "#");
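To see the result of this replacement, the snippet can be wrapped in a runnable class. This is only a sketch; the class ReplaceDemo and the method maskLongDigitRuns are hypothetical names.

```java
// Hypothetical demo class: replaces runs of five or more digits with "#".
class ReplaceDemo {
    static String maskLongDigitRuns(String s) {
        // {5,} has no upper bound: five or more consecutive digits form one match.
        return s.replaceAll("[0-9]{5,}", "#");
    }

    public static void main(String[] args) {
        System.out.println(maskLongDigitRuns("ABCD1334546LASJDFLDSF2343424SDJ"));
        // prints ABCD#LASJDFLDSF#SDJ
        System.out.println(maskLongDigitRuns("AB1234CD"));
        // prints AB1234CD (a run of four digits is too short to match)
    }
}
```

Each run of digits is replaced by a single #, not by one # per digit, because the quantifier makes the whole run one match.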
(5) Getting the substrings that match a regular expression
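Pattern p = Pattern.compile(regex); // regex is the regular expression, as a String
Matcher m = p.matcher(str);         // str is the String to search
while (m.find())                    // find() moves on to the next match, if there is one
{
    System.out.println(m.group());  // group() returns the text of the current match
}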
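The find/group loop above can be collected into a small helper that returns every match as a list. A minimal sketch follows; the class FindDemo and the method findAll are illustrative names, and the input reuses the string from example (4).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: gathers every substring that matches a pattern.
class FindDemo {
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        // Each call to find() looks for the next match after the previous one.
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("[0-9]{5,}", "ABCD1334546LASJDFLDSF2343424SDJ"));
        // prints [1334546, 2343424]
    }
}
```

Unlike String.matches, find() searches for the pattern anywhere inside the input, so the input as a whole does not have to match.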
3. Writing a web crawler
We write a program that reads all the e-mail addresses out of a web page and stores them in a text file.
/*
Web crawler
That is, get the strings that match a regular expression from a web page.
Here: collect the e-mail addresses found on the page.
*/
import java.io.*;
import java.util.regex.*;
import java.net.*;
class MailTest
{
    public static void main(String[] args) throws Exception
    {
        getMailAddr();
    }
    public static void getMailAddr() throws Exception
    {
        URL url = new URL("http://bbs.jb51.net/topics/390148495");
        URLConnection con = url.openConnection();
        // 6-12 letters, digits or underscores, then @, a domain name, and one or more ".suffix" parts
        String regex = "[a-zA-Z0-9_]{6,12}@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+";
        Pattern p = Pattern.compile(regex);
        // try-with-resources closes the reader and the writer even if an exception is thrown
        try (BufferedReader bufIn = new BufferedReader(new InputStreamReader(con.getInputStream()));
             BufferedWriter bufw = new BufferedWriter(new FileWriter("E://mailaddress.txt")))
        {
            String str = null;
            // Read the page line by line and write out every address found on each line
            while ((str = bufIn.readLine()) != null)
            {
                Matcher m = p.matcher(str);
                while (m.find())
                {
                    String ss = m.group();
                    bufw.write(ss, 0, ss.length());
                    bufw.newLine();
                    bufw.flush();
                }
            }
        }
    }
}
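The matching part of the crawler can be tried without a network connection by running the same pattern over an in-memory string. This is a sketch under that assumption; the class MailExtractDemo, the method extract, and the sample addresses are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: applies the crawler's e-mail pattern to a plain string.
class MailExtractDemo {
    // Same pattern as in the crawler above.
    static final Pattern MAIL =
            Pattern.compile("[a-zA-Z0-9_]{6,12}@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+");

    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = MAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample text; "ab@x.com" is skipped because "ab" is shorter than 6 characters.
        String text = "Contact: zhangsan@example.com or lisi_2013@mail.example.cn; bad: ab@x.com";
        System.out.println(extract(text));
        // prints [zhangsan@example.com, lisi_2013@mail.example.cn]
    }
}
```

Note that the pattern is not anchored: if the part before @ is longer than 12 characters, find() still reports a match that starts at the last 12 of them, so the pattern is a rough filter and not a full e-mail validator.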