1. Getting started
First download and unpack the library, then include simple_html_dom.php in your script. Next, load the HTML to be processed. Three loading modes are supported: from a URL, from a string, and from a file.
The code is as follows:
<?php
require_once 'simple_html_dom.php';

// Load from a URL
$html = file_get_html('http://www.111cn.net');
// Load from a string
$html = str_get_html('<html><body>Hello!</body></html>');
// Load from a file
$html = file_get_html('example.htm');

To parse an online page from a string, you must first download it yourself. cURL is the better tool for this; enable the php_curl extension in php.ini before using it.

$url = 'http://www.111cn.net';
$ci = curl_init();
curl_setopt($ci, CURLOPT_URL, $url);
// Disable TLS certificate and host-name verification
curl_setopt($ci, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ci, CURLOPT_SSL_VERIFYHOST, false);
// Return the response body as a string instead of printing it
curl_setopt($ci, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ci);
curl_close($ci);
// Parse the downloaded HTML
$html = str_get_html($result);
2. Searching for HTML elements
The find method searches the document and returns an array of matching element objects; if you pass an index as the second argument, it returns a single element instead. Common searches are shown below.
The code is as follows:
// Find all hyperlink elements
$alink = $html->find('a');
// Find the sixth link (the index is zero-based)
$alink = $html->find('a', 5);
// Find the div whose id is main
$mainDiv = $html->find('div[id=main]');
// Find all divs that have an id attribute
$idDiv = $html->find('div[id]');
// Find all elements that have an id attribute
$idAll = $html->find('[id]');
// Find elements whose class is info
$classInfo = $html->find('.info');
// Find nested elements (li elements inside a ul)
$ret = $html->find('ul li');
// Find several element types at once
$ret = $html->find('a, img, p');
// ...
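A minimal sketch of what find() actually returns, run against a small inline string. It assumes simple_html_dom.php sits in the same directory as the script, and the sample markup is hypothetical. Without an index you get an array; with an index you get one element, or null when nothing exists at that position.

```php
<?php
// Assumes simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical sample markup, for illustration only.
$html = str_get_html(
    '<div id="main"><ul><li class="info">One</li><li>Two</li></ul>'
    . '<a href="/a">A</a><a href="/b">B</a></div>'
);

// Without an index, find() returns an array of matches.
$links = $html->find('a');
echo count($links), "\n";          // 2

// With an index, find() returns a single element; the index is zero-based.
$secondHref = $html->find('a', 1)->href;
echo $secondHref, "\n";            // /b

// A negative index counts back from the last match.
$lastItem = $html->find('li', -1)->plaintext;
echo $lastItem, "\n";              // Two

// Class and descendant selectors can be combined.
$infoItem = $html->find('ul li.info', 0)->plaintext;
echo $infoItem, "\n";              // One

// An index past the end yields null, so check before using the result.
$missing = $html->find('a', 5);
echo $missing === null ? "no sixth link\n" : $missing->href . "\n";

$html->clear(); // release the memory held by the parser
```

Checking for null matters in practice: calling ->href on a missing element is an error, which is easy to hit with the `find('a', 5)` form above.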
3. Miscellaneous
Built-in methods let you move from an element to its relatives: parent() returns the parent element, children() returns the array of child elements, first_child() and last_child() return the first and last child, and prev_sibling() and next_sibling() return the previous and next adjacent element.
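The traversal methods can be sketched on a three-item list. This assumes simple_html_dom.php is available next to the script; the markup is hypothetical and has no whitespace between tags to keep the output predictable. Passing an index to children() returns a single child rather than the array.

```php
<?php
// Assumes simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical sample list.
$html = str_get_html('<ul id="list"><li>First</li><li>Second</li><li>Third</li></ul>');
$ul = $html->find('ul', 0);

$first  = $ul->first_child();      // first <li>
$last   = $ul->last_child();       // last <li>
$middle = $first->next_sibling();  // the <li> after the first one
$back   = $middle->prev_sibling(); // back to the first <li>
$parent = $middle->parent();       // the enclosing <ul>
$items  = $ul->children();         // array of all child elements
$third  = $ul->children(2);        // one child, zero-based index

echo $first->plaintext, ' ', $middle->plaintext, ' ', $last->plaintext, "\n"; // First Second Third
echo $back->plaintext, ' ', $third->plaintext, "\n";                          // First Third
echo $parent->id, ' has ', count($items), " children\n";                      // list has 3 children
```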
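Attribute selectors also support simple pattern filters in the format [attribute=value], with variants such as [!attribute] (attribute absent), [attribute!=value], [attribute^=value] (starts with), [attribute$=value] (ends with) and [attribute*=value] (contains).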
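A minimal sketch of these attribute filters on a few hypothetical links, assuming simple_html_dom.php is available. The helper `texts()` is hypothetical and only exists to turn each result array into a list of link texts that is easy to inspect.

```php
<?php
// Assumes simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical sample links.
$html = str_get_html(
    '<a href="https://example.com/report.pdf">Report</a>'
    . '<a href="http://example.org/about">About</a>'
    . '<a name="top">Top</a>'
);

// Hypothetical helper: map an array of elements to their plain text.
function texts(array $elements): array {
    return array_map(function ($e) { return $e->plaintext; }, $elements);
}

$startsWith = texts($html->find('a[href^=https]'));       // href begins with "https"
$endsWith   = texts($html->find('a[href$=.pdf]'));        // href ends with ".pdf"
$contains   = texts($html->find('a[href*=example.org]')); // href contains "example.org"
$noHref     = texts($html->find('a[!href]'));             // no href attribute at all

print_r([$startsWith, $endsWith, $contains, $noHref]);
```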
Each element object has four basic properties:
tag - the HTML tag name
innertext - the inner HTML
outertext - the outer HTML
plaintext - the text inside the element, with tags stripped
Attribute values are read as properties of the element:
// Read the href attribute of $alink
$link = $alink->href;
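The four properties and attribute access can be sketched on one hypothetical element, assuming simple_html_dom.php is available. The isset() check at the end is the library's way of testing whether an attribute exists.

```php
<?php
// Assumes simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical sample element.
$html = str_get_html('<div id="box"><b>Hello</b> world</div>');
$div  = $html->find('div', 0);

$tag   = $div->tag;        // div
$inner = $div->innertext;  // <b>Hello</b> world
$outer = $div->outertext;  // <div id="box"><b>Hello</b> world</div>
$text  = $div->plaintext;  // Hello world
$id    = $div->id;         // attribute read as a property: box

// isset() on an attribute name reports whether the element has that attribute.
$hasTitle = isset($div->title); // false

echo $tag, "\n", $inner, "\n", $outer, "\n", $text, "\n", $id, "\n";
```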
Assigning to these properties changes the document: setting an attribute to null removes it, and assigning to outertext replaces, wraps, or removes the element itself.
The code is as follows:
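// Remove the href attribute from a link
$alink->href = null;
// The following are alternative edits; $ret must be a single element
// (e.g. from find() with an index), not an array
// Wrap the element in a new div
$ret->outertext = '<div class="nav">' . $ret->outertext . '</div>';
// Remove the element
$ret->outertext = '';
// Append content after the element
$ret->outertext = $ret->outertext . '<div>other</div>';
// Insert content before the element
$ret->outertext = '<div>Welcome</div>' . $ret->outertext;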
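A short sketch that applies such edits to one link and then retrieves the result. It assumes simple_html_dom.php is available and uses hypothetical markup; save() returns the whole modified document as a string.

```php
<?php
// Assumes simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical sample document.
$html = str_get_html('<p><a href="/old">Link</a></p>');
$a = $html->find('a', 0);

$a->href  = null;     // remove the href attribute
$a->class = 'plain';  // add a new attribute
// Wrap the element; outertext already reflects the attribute changes above
$a->outertext = '<span class="nav">' . $a->outertext . '</span>';

// save() serializes the modified document back to a string.
$result = $html->save();
echo $result, "\n";

$html->clear(); // release the memory held by the parser
```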