PHP HTML parser: Simple HTML DOM usage instructions

Source: Internet
Author: User


1. Getting started

First, download and decompress the package, then include the simple_html_dom.php file in the script that will do the parsing. Next, load the HTML to be processed. Three loading modes are supported: from a URL, from a string, and from a file.

The code is as follows:

<?php
require_once('simple_html_dom.php');
// Load from a URL
$html = file_get_html('http://');
// Load from a string
$html = str_get_html('...'); // pass the HTML markup as a string
// Load from a file
$html = file_get_html('example.htm');

To load a remote page from a string, first download it from the network. cURL is the better choice for this; the php_curl extension must be enabled in the PHP configuration file (php.ini).

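$url = 'http://';
$ci = curl_init();
curl_setopt($ci, CURLOPT_URL, $url);
// Disable SSL certificate checks (insecure; use only for testing)
curl_setopt($ci, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ci, CURLOPT_SSL_VERIFYHOST, false);
// Return the response as a string instead of printing it
curl_setopt($ci, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ci);
// Parse the downloaded markup
$html = str_get_html($result);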
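
The download step can also be wrapped in a small helper that checks whether the transfer succeeded before the result is parsed. The following is only a sketch: the function name fetch_page and the 10-second timeout are assumptions made for illustration and are not part of Simple HTML DOM.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): download a page with
// cURL and return its body as a string, or null if the transfer fails.
function fetch_page($url)
{
    $ci = curl_init();
    curl_setopt($ci, CURLOPT_URL, $url);
    curl_setopt($ci, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ci, CURLOPT_TIMEOUT, 10);          // assumed limit: give up after 10 seconds

    $result = curl_exec($ci);
    if ($result === false) {
        // curl_exec returns false on failure; curl_error describes the cause
        error_log('cURL error: ' . curl_error($ci));
        return null;
    }
    return $result;
}
```

When the helper returns a string, that string can be passed to str_get_html in the same way as in the code above.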

2. Finding HTML elements
Use the find method to search the document. It returns an array of element objects, or a single element object when an index is passed as the second argument. Common searches are as follows.

The code is as follows:
// Find all hyperlink elements
$alink = $html->find('a');
// Find a single link element by index (zero-based, so 5 is the sixth link)
$alink = $html->find('a', 5);
// Find the div whose id is main
$mainDiv = $html->find('div[id=main]');
// Find all divs that have an id attribute
$idDiv = $html->find('div[id]');
// Find all elements that have an id attribute
$idAll = $html->find('[id]');
// Find elements whose class is info
$classInfo = $html->find('.info');
// Find nested descendant elements
$ret = $html->find('ul li');
// Find several kinds of elements at once
$ret = $html->find('a, img, p');
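
Because find returns an array unless an index is given, the results are usually consumed in a loop. A minimal sketch, assuming simple_html_dom.php sits next to the script; the HTML string and variable names are made up for illustration:

```php
<?php
require_once('simple_html_dom.php');

// Illustrative markup; any HTML string works here
$html = str_get_html('<ul><li><a href="/a">A</a></li><li><a href="/b">B</a></li></ul>');

// Without an index, find returns an array of elements
$hrefs = array();
foreach ($html->find('a') as $a) {
    $hrefs[] = $a->href;
}

// With an index, find returns one element, or null if there is no match
$second = $html->find('a', 1);
if ($second !== null) {
    echo $second->href, "\n";
}
```

Checking for null before using an indexed result avoids errors when the page does not contain the expected element.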

3. Miscellaneous
You can use built-in methods to navigate from one element to another: parent returns the parent element, children returns an array of child elements, first_child returns the first child element, last_child returns the last child element, prev_sibling returns the previous adjacent element, and next_sibling returns the next adjacent element.
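
A short sketch of these navigation methods, assuming simple_html_dom.php sits next to the script; the markup and variable names are illustrative only:

```php
<?php
require_once('simple_html_dom.php');

$html = str_get_html('<div id="main"><p>first</p><p>second</p><p>third</p></div>');
$div = $html->find('div[id=main]', 0);

$first  = $div->first_child();    // the first <p>
$last   = $div->last_child();     // the last <p>
$second = $first->next_sibling(); // the <p> that follows the first one
$kids   = $div->children();       // array of all child elements

echo count($kids), "\n";                       // number of child elements
echo $second->parent()->id, "\n";              // id of the enclosing div
echo $second->prev_sibling()->plaintext, "\n"; // text of the preceding <p>
```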

Attribute selectors can also be filtered with a simple regular-expression-like syntax, written in the bracketed [attribute...] form used in the searches above (for example div[id=main]).
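
As a sketch of what such filters can look like: the library's selector syntax includes the ^=, $= and *= operators for prefix, suffix and substring matches on an attribute value. The text above does not spell these out, so treat the example as an illustration under that assumption; the markup is made up, and simple_html_dom.php is assumed to sit next to the script.

```php
<?php
require_once('simple_html_dom.php');

$html = str_get_html(
    '<a href="http://example.com/report-pdf">remote</a>' .
    '<a href="local-page">local</a>'
);

// href starts with "http"
$remote = $html->find('a[href^=http]');
// href ends with "pdf"
$pdf = $html->find('a[href$=pdf]');
// href contains "page"
$pages = $html->find('a[href*=page]');

echo count($remote), ' ', count($pdf), ' ', count($pages), "\n";
```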

Each element object has four basic properties:
tag - returns the HTML tag name
innertext - returns the inner HTML
outertext - returns the outer HTML
plaintext - returns the text content with the HTML tags stripped
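
The four properties can be compared side by side on one element. A minimal sketch, assuming simple_html_dom.php sits next to the script; the markup is illustrative only:

```php
<?php
require_once('simple_html_dom.php');

$html = str_get_html('<div id="box"><b>Hello</b> world</div>');
$div = $html->find('div', 0);

echo $div->tag, "\n";       // the tag name of the element
echo $div->innertext, "\n"; // markup between the opening and closing div tags
echo $div->outertext, "\n"; // the same markup including the div tags themselves
echo $div->plaintext, "\n"; // text only, with the <b> tags stripped
```

Note that the property names are lowercase; PHP property names are case-sensitive.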

Reading an element's attribute value:

// Get the href value of $alink (a single element, e.g. from find('a', 5))
$link = $alink->href;

You can add, modify, or delete elements and attributes by assigning to these properties.

The code is as follows:

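// Remove the href attribute from a link
$alink->href = null;
// Modify elements ($ret must be a single element, e.g. $html->find('ul li', 0))
// Wrap the element in a div
$ret->outertext = '<div class="nav">' . $ret->outertext . '</div>';
// Remove the element
$ret->outertext = '';
// Append a new element after it
$ret->outertext = $ret->outertext . '<div>other</div>';
// Insert a new element before it
$ret->outertext = '<div>Welcome</div>' . $ret->outertext;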
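
Changes made this way live in the parsed document, so they show up once the document is turned back into a string. A minimal sketch using the library's save method, assuming simple_html_dom.php sits next to the script; the markup and variable names are illustrative only:

```php
<?php
require_once('simple_html_dom.php');

$html = str_get_html('<div id="menu"><a href="/old">Old link</a></div>');

$a = $html->find('a', 0);
$a->href = null;            // drop the href attribute
$a->innertext = 'New text'; // replace the link's inner HTML

$div = $html->find('div[id=menu]', 0);
$div->class = 'nav';        // add a new attribute

$output = $html->save();    // serialize the modified document to a string
echo $output, "\n";
```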
