This article introduces how to use the Simple HTML DOM parser for PHP.
1. Getting Started
First, download and extract the library, then include the simple_html_dom.php file in the script that will do the parsing. Next, load the HTML to be processed. Three loading modes are supported: from a URL, from a string, and from a file.
The code is as follows:

    require_once('simple_html_dom.php');

    // Load from a URL
    $html = file_get_html('http://www.111cn.net');

    // Load from a string
    $html = str_get_html('<html><body>Hello World!</body></html>');

    // Load from a file
    $html = file_get_html('example.htm');

To load a remote page from a string, you must first download it from the network. cURL is a good choice for this; enable the php_curl extension in the PHP configuration file.

    $url = 'http://www.111cn.net';
    $ci = curl_init();
    curl_setopt($ci, CURLOPT_URL, $url);
    // Note: these two options turn off SSL certificate checks
    curl_setopt($ci, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ci, CURLOPT_SSL_VERIFYHOST, false);
    // Return the response as a string instead of printing it
    curl_setopt($ci, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($ci);
    // Parse the downloaded markup
    $html = str_get_html($result);
2. Searching for HTML Elements
Use the find function to search. It returns an array of element objects, or a single element object when an index is passed as the second argument. Common searches are as follows.
The code is as follows:

    // Find all hyperlink elements
    $alink = $html->find('a');

    // Find a single hyperlink by zero-based index (index 5 is the sixth link)
    $alink = $html->find('a', 5);

    // Find the div whose id is main
    $mainDiv = $html->find('div[id=main]');

    // Find all divs that have an id attribute
    $idDiv = $html->find('div[id]');

    // Find all elements that have an id attribute
    $idAll = $html->find('[id]');

    // Find elements whose class is info
    $classInfo = $html->find('.info');

    // Nested (descendant) selectors are supported
    $ret = $html->find('ul li');

    // Find several kinds of elements at once
    $ret = $html->find('a, img, p');

    // ...
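As a minimal sketch of how the two forms of find behave, the snippet below parses a small, made-up HTML string (the markup and variable names are illustrative assumptions, not part of the original article) and assumes simple_html_dom.php sits next to the script.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical sample markup, used only for illustration.
$html = str_get_html(
    '<div id="main"><ul>'
    . '<li><a href="/a">First</a></li>'
    . '<li><a href="/b">Second</a></li>'
    . '</ul></div>'
);

// Without an index, find() returns an array of matching elements.
$hrefs = [];
foreach ($html->find('a') as $a) {
    $hrefs[] = $a->href;
}

// With a zero-based index, find() returns a single element,
// or null when there is no element at that position.
$second  = $html->find('a', 1);
$missing = $html->find('a', 5);

print_r($hrefs);
echo $second->plaintext, "\n";
var_dump($missing === null);
```

Checking for null before using an indexed result avoids a fatal error when the page contains fewer matches than expected.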
3. Miscellaneous
Built-in functions let you navigate from one element to its relatives: parent returns the parent element, children returns an array of child elements, first_child returns the first child element, last_child returns the last child element, prev_sibling returns the previous sibling element, and next_sibling returns the next sibling element.
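The traversal functions can be sketched as follows. The list markup and variable names are assumptions made up for this example, and simple_html_dom.php is assumed to be in the same directory.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical sample markup, used only for illustration.
$html = str_get_html('<ul id="menu"><li>one</li><li>two</li><li>three</li></ul>');

$list   = $html->find('ul', 0); // the list itself
$middle = $html->find('li', 1); // the second list item

// Walk up, down, and sideways from the selected elements.
$parentId   = $middle->parent()->id;
$childCount = count($list->children());
$first      = $list->first_child()->innertext;
$last       = $list->last_child()->innertext;
$previous   = $middle->prev_sibling()->innertext;
$next       = $middle->next_sibling()->innertext;

echo "$parentId $childCount $first $last $previous $next\n";
```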
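Attribute selectors also support simple regular-expression-style filters, written in forms such as [attribute=value], [attribute^=value] (starts with), [attribute$=value] (ends with), and [attribute*=value] (contains).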
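A short sketch of attribute filtering is shown below. The links in the sample string are invented for illustration, and the example assumes simple_html_dom.php is in the same directory.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical sample markup, used only for illustration.
$html = str_get_html(
    '<a href="https://example.com/report.pdf">Report</a>'
    . '<a href="/local/page.html">Page</a>'
    . '<a name="top">Top</a>'
);

$withHref = $html->find('a[href]');           // links that have an href attribute
$exact    = $html->find('a[name=top]');       // attribute equals a value
$https    = $html->find('a[href^=https]');    // attribute starts with a value
$pdf      = $html->find('a[href$=pdf]');      // attribute ends with a value
$local    = $html->find('a[href*=local]');    // attribute contains a value

foreach ($pdf as $a) {
    echo $a->href, "\n";
}
```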
Each element object has four basic properties:
tag - returns the HTML tag name
innertext - returns the inner HTML
outertext - returns the outer HTML
plaintext - returns the text inside the HTML tag
To read the value of an element's attribute:

    // Get the href value of $alink
    $link = $alink->href;
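To make the difference between the four properties concrete, here is a small sketch on a made-up fragment (the markup is an assumption for illustration only; simple_html_dom.php is assumed to be in the same directory).

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical sample markup, used only for illustration.
$html = str_get_html('<div id="main"><b>Hello</b> World</div>');
$div  = $html->find('div', 0);

$tag   = $div->tag;        // tag name of the element
$inner = $div->innertext;  // markup between the opening and closing tags
$outer = $div->outertext;  // the element including its own tags
$plain = $div->plaintext;  // text content with the tags removed
$id    = $div->id;         // any attribute can be read as a property

echo $tag, "\n", $inner, "\n", $outer, "\n", $plain, "\n", $id, "\n";
```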
You can add, modify, or delete elements and attributes by assigning new values to these properties.
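The code is as follows:

    // Remove the href attribute from a link
    $alink->href = null;

    // Modify an element ($ret must be a single element, e.g. $html->find('ul li', 0))
    // Wrap the element in another element
    $ret->outertext = '<div>' . $ret->outertext . '</div>';
    // Remove the element
    $ret->outertext = '';
    // Append content after the element
    $ret->outertext = $ret->outertext . 'Other';
    // Insert content before the element
    $ret->outertext = 'Welcome' . $ret->outertext;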
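Changes made this way live in the parsed document until it is written back out. The sketch below, which uses an invented fragment and assumes simple_html_dom.php is in the same directory, edits a link and then serializes the document with the library's save method.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical sample markup, used only for illustration.
$html = str_get_html('<p><a href="/old">link</a></p>');
$a    = $html->find('a', 0);

$a->href      = '/new';          // change an existing attribute
$a->class     = 'external';      // add a new attribute
$a->innertext = 'updated link';  // replace the element's content

// save() returns the whole document as a string.
$result = $html->save();
echo $result, "\n";
```

Passing a file path to save is also supported by the library if the result should be written to disk rather than kept in a variable.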