This article describes how to obtain all links on a specified URL's page in PHP. From: http://www.uphtm.com/php/253.html
Crawling links from other websites is actually a common task for developers. A friend recently put together PHP code that retrieves all the links on a specified URL's page; let's take a look.
The following code obtains all links on the specified URL's page, that is, the href attribute of every a tag:
// Obtain the HTML code of the page
$html = file_get_contents('http://www.111cn.net');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Select every a element under body
$hrefs = $xpath->evaluate('/html/body//a');
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    echo $url . "<br>";
}
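The same extraction can be done without XPath at all. The sketch below uses DOMDocument::getElementsByTagName() on an inline HTML string (an invented sample, so it runs offline without fetching any site):

```php
<?php
// Invented sample markup standing in for a downloaded page
$html = '<html><body>'
      . '<a href="http://example.com/a">A</a>'
      . '<a href="/relative">B</a>'
      . '</body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);

// getElementsByTagName('a') walks every a tag, like the XPath query above
$links = array();
foreach ($dom->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href');
}

print_r($links);
```

Both approaches tolerate the messy HTML found on real pages, because loadHTML() repairs the markup before the tags are walked.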
This code collects the href attribute of every a tag, but an href value is not necessarily an absolute link. We can filter the results and keep only addresses that start with http:
// Obtain the HTML code of the page
$html = file_get_contents('http://www.111cn.net');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Select every a element under body
$hrefs = $xpath->evaluate('/html/body//a');
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');

    // Keep only links starting with http
    if (substr($url, 0, 4) == 'http')
        echo $url . "<br>";
}
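A hedged alternative to the substr() prefix check: parse_url() reads the URL scheme directly, which also keeps https:// links and cleanly rejects mailto: and javascript: values. The candidate values below are invented samples:

```php
<?php
// Invented sample href values, as might come out of the loop above
$candidates = array(
    'http://example.com',
    'https://example.com',
    '/about',
    'mailto:someone@example.com',
);

$kept = array();
foreach ($candidates as $url) {
    // parse_url() returns the scheme, or null for relative paths
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if ($scheme === 'http' || $scheme === 'https') {
        $kept[] = $url;
    }
}

print_r($kept);
```

The substr() check in the article would drop https:// links unless the prefix length is widened; checking the parsed scheme avoids that pitfall.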
The next example uses the fopen() function to read a specified webpage, find all the links in it, and number them; it also lets you count how many links there are. This approach suits situations where webpage content needs to be collected. The example reads the Baidu homepage and lists every link found there. The code has been tested and works:
<?php
if (empty($url)) $url = "http://www.baidu.com/"; // URL of the page to collect
$site = substr($url, 0, strpos($url, "/", 8));   // site root, e.g. http://www.baidu.com
$base = substr($url, 0, strrpos($url, "/") + 1); // directory of the current page
$fp = fopen($url, "r");                          // open the URL
$contents = "";
while (!feof($fp)) $contents .= fread($fp, 1024);
$pattern = "|href=['\"]?([^'\"]+)['\"]|U";
preg_match_all($pattern, $contents, $regArr, PREG_SET_ORDER); // match every href=
for ($i = 0; $i < count($regArr); $i++) {
    // eregi() was removed in PHP 7; strpos() detects "://" instead
    if (strpos($regArr[$i][1], "://") === false) {  // relative path
        if (substr($regArr[$i][1], 0, 1) == "/") {  // starts from the site root
            echo "link " . ($i + 1) . ": " . $site . $regArr[$i][1] . "<br>"; // root directory
        } else {
            echo "link " . ($i + 1) . ": " . $base . $regArr[$i][1] . "<br>"; // current directory
        }
    } else {
        echo "link " . ($i + 1) . ": " . $regArr[$i][1] . "<br>"; // absolute link
    }
}
fclose($fp);
?>
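As for counting the links, preg_match_all() already returns the number of matches, so the count comes for free. A minimal offline sketch, run against an invented HTML snippet rather than a fetched page:

```php
<?php
// Invented sample standing in for the downloaded page contents
$contents = '<a href="/a">A</a> <a href="http://example.com/b">B</a>';

// Same ungreedy href pattern as in the listing above
$pattern = "|href=['\"]?([^'\"]+)['\"]|U";

// preg_match_all() returns how many full matches it found
$count = preg_match_all($pattern, $contents, $regArr, PREG_SET_ORDER);

echo "Found " . $count . " links\n";
foreach ($regArr as $match) {
    echo $match[1] . "\n"; // capture group 1 holds the href value
}
```

With PREG_SET_ORDER each element of $regArr is one match set, so $regArr[$i][1] is the href value of the i-th link, exactly as the listing above indexes it.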
The above shows how to obtain all links on a specified URL's page in PHP. I hope it is helpful to readers interested in the PHP tutorial.