Getting started with the PHP Simple HTML DOM Parser

Source: Internet
Author: User

Parsing an HTML document tree in PHP has long been a problem, and Simple HTML DOM Parser solves it well. You can use this PHP class (PHP 5 or later) to parse HTML documents and operate on their elements. The parser handles not only valid HTML, but also documents that do not comply with W3C standards.

It uses jQuery-like element selectors to locate elements by id, class, and tag, and it also lets you add, delete, and modify nodes in the document tree. Of course, even such a powerful HTML DOM parser is not perfect: you need to watch memory consumption carefully when using it. But don't worry. In this article, I will also show how to avoid excessive memory consumption.

Getting started

After including the class file, you can load an HTML document in three ways:

Load html documents from URLs

Load html documents from strings

Load html documents from files

<?php
// Include the parser class file
require_once 'simple_html_dom.php';
// Create a DOM instance
$html = new simple_html_dom();
// Load from a URL
$html->load_file('http://www.cnphp.info/php-simple-html-dom-parser-intro.html');
// Load from a string
$html->load('<html><body>Loading html documents from strings</body></html>');
// Load from a file
$html->load_file('path/file/test.html');
?>

To load an HTML document from a string, you first need to obtain the HTML, for example by downloading it from the network. We recommend using cURL to fetch the HTML document and then loading it into the DOM.
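
The cURL approach could look roughly like the sketch below. The helper name `fetch_html`, the placeholder URL, the timeout, and the location of `simple_html_dom.php` are all assumptions made for this example; only the cURL functions and the parser's `load()`, `find()`, and `clear()` come from PHP and the library.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

// Hypothetical helper (not part of the library): download a page with cURL.
// Returns the response body as a string, or null if the request failed.
function fetch_html($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $body = curl_exec($ch);                         // false on failure

    // The cURL handle is released when $ch goes out of scope.
    return is_string($body) ? $body : null;
}

$body = fetch_html('http://www.example.com/'); // placeholder URL
if ($body !== null && $body !== '') {
    $html = new simple_html_dom();
    $html->load($body);
    echo count($html->find('a')), " links found\n";
    $html->clear();
}
```

Fetching the page yourself means you decide what happens on timeouts and redirects before the parser ever sees the markup.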

Searching for HTML elements

You can use the find function to find elements in an HTML document. Called without an index, it returns an array of element objects; called with an index, it returns a single element object. You then work with these objects through the methods and properties of the parser classes. For example:

<?php
// Find all hyperlink elements in the HTML document
$a = $html->find('a');
// Find the (N)th hyperlink in the document (zero-based); returns null if it is not found
$a = $html->find('a', 0);
// Find the p element whose id is main
$main = $html->find('p[id=main]', 0);
// Find all p elements that have an id attribute
$ps = $html->find('p[id]');
// Find all elements that have an id attribute
$ps = $html->find('[id]');
?>
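
To make the two return shapes of find concrete, here is a small sketch. The sample markup is made up for the example, and the path to `simple_html_dom.php` is an assumption; `str_get_html()` is the library's helper for building a DOM object from a string.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

// Sample markup, made up for this example.
$html = str_get_html(
    '<div><a href="/one">One</a><a href="/two">Two</a><p id="main">Hello</p></div>'
);

// Without an index, find() returns an array that can be looped over.
$hrefs = array();
foreach ($html->find('a') as $a) {
    $hrefs[] = $a->href;
}
echo implode(', ', $hrefs), "\n";

// With an index, find() returns a single element, or null when nothing matches.
$first     = $html->find('a', 0);
$firstHref = $first->href;
$missing   = $html->find('table', 0);
var_dump($missing === null);

$html->clear();
```

Checking for null before using an indexed result avoids calling a property on a non-object when the page does not contain the element you expect.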

You can also locate elements with jQuery-like selectors:

<?php
// Find the element whose id is container
$ret = $html->find('#container');
// Find all elements with class = foo
$ret = $html->find('.foo');
// Find all a and img elements
$ret = $html->find('a, img');
// Find all a and img elements that have a title attribute
$ret = $html->find('a[title], img[title]');
?>

The parser also supports searching for nested (descendant) elements:

<?php
// Find all li items inside ul lists
$ret = $html->find('ul li');
// Find the li items with class = selected inside ul lists
$ret = $html->find('ul li.selected');
?>
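
A quick way to see what the nested selectors match is to run them against a tiny document. The markup below is invented for the example, and the include path is an assumption.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

// Sample markup, made up for this example: one ul and one ol.
$html = str_get_html(
    '<ul><li>A</li><li class="selected">B</li></ul><ol><li>C</li></ol>'
);

// Only li elements inside a ul are matched; the li inside the ol is skipped.
$ulItems = count($html->find('ul li'));

// Narrow the match further by class.
$selectedText = $html->find('ul li.selected', 0)->plaintext;

echo $ulItems, " items, selected: ", $selectedText, "\n";

$html->clear();
```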

If you find selectors awkward for this, you can use the built-in functions to locate an element's parent, children, and siblings.

<?php
// Returns the parent element.
$e->parent();
// Returns the array of child elements.
$e->children();
// Returns the child element at the given index.
$e->children(0);
// Returns the first child element.
$e->first_child();
// Returns the last child element.
$e->last_child();
// Returns the previous sibling element.
$e->prev_sibling();
// Returns the next sibling element.
$e->next_sibling();
?>
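
As a rough illustration of these traversal functions, the sketch below walks a three-item list. The markup and variable names are made up, and the include path is an assumption.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

// Sample markup, made up for this example.
$html = str_get_html('<ul id="menu"><li>First</li><li>Second</li><li>Third</li></ul>');

$ul     = $html->find('ul', 0);
$second = $ul->children(1);                 // child at index 1

$childCount = count($ul->children());       // number of child elements
$parentId   = $second->parent()->id;        // id attribute of the parent ul
$firstText  = $ul->first_child()->innertext;
$lastText   = $ul->last_child()->innertext;
$prevText   = $second->prev_sibling()->innertext;
$nextText   = $second->next_sibling()->innertext;

echo "$parentId: $firstText .. $lastText ($childCount items)\n";

$html->clear();
```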

Element attribute operations

Attribute selectors support a few simple, regular-expression-like operators:

[attribute] - selects HTML elements that have the given attribute

[attribute=value] - selects HTML elements whose attribute equals the given value

[attribute!=value] - selects HTML elements whose attribute does not equal the given value

[attribute^=value] - selects HTML elements whose attribute value starts with the given value

[attribute$=value] - selects HTML elements whose attribute value ends with the given value

[attribute*=value] - selects HTML elements whose attribute value contains the given value
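
A minimal sketch of some of these operators in use follows. The three links are made up for the example and the include path is an assumption; the selector syntax itself is the one listed above.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

// Sample markup, made up for this example.
$html = str_get_html(
    '<a href="http://example.com/guide.pdf" title="guide">Guide</a>'
    . '<a href="https://example.com/index.html">Home</a>'
    . '<a name="top">Top</a>'
);

$withHref   = count($html->find('a[href]'));           // has an href attribute
$secure     = count($html->find('a[href^=https]'));    // href starts with https
$pdfLinks   = count($html->find('a[href$=pdf]'));      // href ends with pdf
$sameSite   = count($html->find('a[href*=example]'));  // href contains example
$guideLinks = count($html->find('a[title=guide]'));    // title equals guide

echo "$withHref with href, $secure secure, $pdfLinks PDF\n";

$html->clear();
```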

Accessing element attributes in the parser

Element attributes in the DOM are exposed as properties of the element object:

<?php
// In this example, the href value of the anchor $a is assigned to the $link variable.
$link = $a->href;
?>

Or:

<?php
$link = $html->find('a', 0)->href;
?>

Each object has four basic properties:

tag - returns the tag name

innertext - returns the innerHTML

outertext - returns the outerHTML

plaintext - returns the text content with the HTML tags stripped
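
The difference between these properties is easiest to see on a small element. The markup below is made up for the example, and the include path is an assumption.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

// Sample markup, made up for this example.
$html = str_get_html('<div id="box"><b>Hello</b> world</div>');
$div  = $html->find('div', 0);

$tag   = $div->tag;        // name of the element
$inner = $div->innertext;  // markup between the opening and closing tag
$outer = $div->outertext;  // the element itself, including its own tags
$plain = $div->plaintext;  // text only, tags removed

echo $tag, "\n", $inner, "\n", $outer, "\n", $plain, "\n";

$html->clear();
```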

Editing elements in the parser

Editing element attributes works much like reading them:

<?php
// Assign a new value to the href attribute of the anchor $a
$a->href = 'http://www.cnphp.info';
// Remove the href attribute
$a->href = null;
// Check whether the href attribute exists
if (isset($a->href)) {
    // Code
}
?>

The parser does not have dedicated methods for adding or deleting elements, but you can work around this by modifying outertext:

<?php
// Wrap an element
$e->outertext = '<div>' . $e->outertext . '</div>';
// Delete an element
$e->outertext = '';
// Append an element after it
$e->outertext = $e->outertext . '<div>foo</div>';
// Insert an element before it
$e->outertext = '<div>foo</div>' . $e->outertext;
?>
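
Put together, the outertext workaround might be used as in the sketch below. The markup, the `ad` class, and the `section` wrapper are invented for the example, and the include path is an assumption.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

// Sample markup, made up for this example.
$html = str_get_html('<div><p class="ad">Buy now</p><p>Content</p></div>');

// Delete the first paragraph by emptying its outertext.
$html->find('p.ad', 0)->outertext = '';

// Wrap the second paragraph in a section element.
$p = $html->find('p', 1);
$p->outertext = '<section>' . $p->outertext . '</section>';

// save() without arguments returns the modified document as a string.
$result = $html->save();
echo $result, "\n";

$html->clear();
```

Changes made this way affect the generated output, not the parsed tree, so later find calls still see the original elements. If you need to query the modified document, one option is to load the saved string into a new DOM object.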

Saving the modified html DOM document is also very simple:

<?php
// Converting the DOM object to a string yields the modified HTML
$doc = $html;
// Output
echo $doc;
?>
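
If you want the result in a file instead of on the page, the save method also accepts a file path. The sketch below assumes a writable temporary directory; the file name `result.html`, the sample markup, and the include path are made up for the example.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

// Sample markup, made up for this example.
$html = str_get_html('<p>Hello</p>');
$html->find('p', 0)->innertext = 'Goodbye';

// Without arguments, save() returns the document as a string.
$asString = $html->save();

// With a path, save() also writes the document to that file.
$target = sys_get_temp_dir() . '/result.html'; // hypothetical output path
$html->save($target);

echo $asString, "\n";

$html->clear();
```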

How to avoid excessive memory consumption by the parser

At the beginning of this article, I mentioned that Simple HTML DOM Parser can consume too much memory. If a PHP script uses too much memory, the website may stop responding or run into other serious problems. The solution is simple: after the parser has loaded an HTML document and you have finished using it, remember to clear the object. Of course, don't take the problem too seriously. If you load only two or three documents, it makes little difference whether you clean up or not. Once you load five or more documents, you really should free the memory each time you finish with one. ^_^

<?php
// Release the memory held by the parsed document
$html->clear();
?>
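
When many documents are processed in a loop, the cleanup can be done once per iteration, roughly as sketched below. The in-memory sample documents stand in for pages you would normally download, and the include path is an assumption.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

// Sample documents, made up for this example.
$documents = array(
    '<p>one</p>',
    '<p>two</p><p>three</p>',
);

$counts = array();
foreach ($documents as $source) {
    $html = str_get_html($source);

    // Copy out the values you need before clearing.
    $counts[] = count($html->find('p'));

    // Free the parsed tree before moving on to the next document.
    $html->clear();
    unset($html);
}

echo implode(', ', $counts), "\n";
```

Copying plain values (strings, counts) out of the DOM before calling clear matters, because element objects should not be used after the document they belong to has been cleared.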
