To extract the URLs from a string of page content, we match them with regular expressions. Below is a summary of several examples of matching link addresses. Two techniques are used:
1. The find() method of Matcher in the regular expression package.
2. The replaceAll(String regex, String replacement) method of the String object. It strips the unneeded parts of a match so that only the required URL and link text remain.
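The two techniques listed above can be sketched together in a few lines. This is only a minimal illustration: the class name FindAndReplaceSketch and the helper names firstHrefAttribute and stripWrapper are made up for this sketch, and it assumes the href value is enclosed in double quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAndReplaceSketch {
    // Technique 1: Matcher.find() locates the next substring that matches the pattern.
    // Assumption: the href value is enclosed in double quotes.
    static String firstHrefAttribute(String html) {
        Matcher m = Pattern.compile("href=\"[^\"]*\"").matcher(html);
        return m.find() ? m.group() : null;
    }

    // Technique 2: String.replaceAll(regex, replacement) deletes the parts we do not need,
    // here the literal "href=" and the double quotes.
    static String stripWrapper(String attribute) {
        return attribute.replaceAll("href=|\"", "");
    }

    public static void main(String[] args) {
        String attribute = firstHrefAttribute("<a href=\"http://www.111cn.net\">home</a>");
        System.out.println(attribute);               // the whole matched attribute
        System.out.println(stripWrapper(attribute)); // only the URL
    }
}
```

The first line printed is the full attribute, the second is the bare URL once replaceAll has removed the wrapper.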
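Example 1. The simplest case.
The code is as follows:

    String content = "<a href=\"URL\">";
    // Capture everything between the double quotes that follow href=
    String pattern = "href=\"([^\"]*)\"";
    // The original flag value 2 is Pattern.CASE_INSENSITIVE
    Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    Matcher m = p.matcher(content);
    if (m.find()) {
        System.out.println("url=" + m.group(1));
    }

Example 2. The code above can only extract a URL whose href value is enclosed in double quotes. We can improve it to extract the URL however the value is written: in double quotes, in single quotes, or with no quotes at all.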
The code is as follows:

    package com.gong.example;

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Simple {
        public static void main(String[] args) {
            // Three anchors: double-quoted, single-quoted, and unquoted href values
            String input = "<a style=\"\" href = \"http://www.111cn.net\" target=\"_blank\">www.zjsyc.com</a>"
                    + "<a href = 'http://www.111cn.net' target='_blank'>www.163.com</a>"
                    + "<a href=http://www.hzhuti.com target=_blank >www.yahoo.com</a>";
            // href followed by a double-quoted, single-quoted, or unquoted value
            String patternString = "href\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^'\">\\s]+)";
            Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
            Matcher matcher = pattern.matcher(input);
            while (matcher.find()) {
                String link = matcher.group();
                System.out.println(link);
                // Remove "href =" (in any letter case) and the opening quote, if present
                link = link.replaceAll("(?i)href\\s*=\\s*['\"]?", "");
                System.out.println("--" + link);
                // Remove the remaining closing quote
                link = link.replaceAll("['\"]", "");
                System.out.println("---" + link);
            }
        }
    }
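As an alternative to cleaning each match with replaceAll, the URL can be read straight out of a capture group. The following is a minimal sketch of that idea, not the original author's code: the class name HrefGroups and the method extractUrls are hypothetical, and the pattern assumes the same three ways of writing an href value as Example 2 (double quotes, single quotes, no quotes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefGroups {
    // Group 1: double-quoted value, group 2: single-quoted value, group 3: unquoted value.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^'\">\\s]+))",
            Pattern.CASE_INSENSITIVE);

    static List<String> extractUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // Only the group of the alternative that matched is non-null.
            for (int g = 1; g <= 3; g++) {
                if (m.group(g) != null) {
                    urls.add(m.group(g));
                    break;
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String input = "<a style=\"\" href = \"http://www.111cn.net\" target=\"_blank\">www.zjsyc.com</a>"
                + "<a href = 'http://www.111cn.net' target='_blank'>www.163.com</a>"
                + "<a href=http://www.hzhuti.com target=_blank >www.yahoo.com</a>";
        for (String url : extractUrls(input)) {
            System.out.println(url);
        }
    }
}
```

Because the quotes sit outside the capture groups, no second cleanup pass is needed. The trade-off is that the caller has to check which group matched.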
Example 3. We can extend the approach further to extract both the URL and the link text.
/*
 * Function description: parse the string s and extract the hyperlinks and link text it contains
 * March 30, 2008
 * Program Life Blog
 */
The code is as follows:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegTest {
        public static void main(String[] args) {
            // String s = "<p id=km> <a href=http://down.111cn.net> space </a> | <a";
            String s = "</p><p style=height:14px><a href=http://mb.111cn.net>Enterprise Promotion</a> | "
                    + "<a href=http://code.111cn.net>Search</a> | "
                    + "<a href=/home.html>about Baidu</a> | "
                    + "<a href=http://www.111cn.net>about baidu</a></p>"
                    + "<p id=b>©2008 Baidu <a href=http://www.111cn.net>use of Baidu Pre-read</a> "
                    + "<a href=http://www.miibeian.gov.cn target=_blank>Beijing ICP Certificate No. No. 030173</a> "
                    + "<a href=http://www.hzhuti.com></a></p></center></body>";
            String regex = "<a.*?/a>";          // one whole anchor element, matched lazily
            // String regex = "<a.*>(.*)</a>";
            String s2 = ">.*?</a>";             // title section
            String s3 = "href=[^\\s>]*";        // URL section: stop at whitespace or '>' so that
                                                // attributes such as target=_blank are not included
            Pattern pt = Pattern.compile(regex);
            Pattern pt2 = Pattern.compile(s2);
            Pattern pt3 = Pattern.compile(s3);
            Matcher mt = pt.matcher(s);
            while (mt.find()) {
                System.out.println(mt.group());
                System.out.println();
                // Link text: drop the leading '>' and the trailing "</a>"
                Matcher mt2 = pt2.matcher(mt.group());
                while (mt2.find()) {
                    System.out.println("Title: " + mt2.group().replaceAll(">|</a>", ""));
                }
                // URL: drop the leading "href="
                Matcher mt3 = pt3.matcher(mt.group());
                while (mt3.find()) {
                    System.out.println("URL: " + mt3.group().replaceAll("href=", ""));
                }
            }
        }
    }
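The URL and the link text can also be captured in one pass with two capture groups, instead of running two extra patterns over every anchor. The sketch below is an illustration under stated assumptions rather than a full HTML parser: the class name LinkTextSketch and the method extractLinks are hypothetical, and it assumes simple anchors whose link text contains no nested `</a>` and whose href value has no spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkTextSketch {
    // Group 1: href value (with or without surrounding quotes).
    // Group 2: everything between the opening <a ...> tag and the closing </a>.
    private static final Pattern LINK = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']?([^\"'\\s>]+)[\"']?[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns one {url, text} pair per anchor, in document order.
    static List<String[]> extractLinks(String html) {
        List<String[]> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(new String[] { m.group(1), m.group(2).trim() });
        }
        return links;
    }

    public static void main(String[] args) {
        String s = "<p><a href=http://mb.111cn.net>Enterprise Promotion</a> | "
                + "<a href=http://www.miibeian.gov.cn target=_blank>Beijing ICP</a></p>";
        for (String[] link : extractLinks(s)) {
            System.out.println("URL: " + link[0]);
            System.out.println("Title: " + link[1]);
        }
    }
}
```

Pages with nested markup or malformed tags are easier to handle with a dedicated HTML parser. For short, regular fragments like the ones in this article, a single pattern keeps the code compact.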