Analysis of a problem caused by the web browser dynamically adding code that modifies an HTML page

Source: Internet
Author: User
Tags: xpath

Introduction: On the web, the browser loads a page and presents the final content to the user. But does the HTML that the browser finally presents match the code stored on the server?

1. Why does XPath not work correctly?

XPath is a quick way to locate elements in a page. One day, however, an XPath copied from a page in the browser matched nothing when it was used in code. What went wrong?

Page Address: http://www.66ip.cn/

The goal is to extract the IP addresses listed on the page. Using the developer tools built into the Chrome browser, we copy the XPath of one of the matching rows:

//*[@id="main"]/div/div[1]/table/tbody/tr[2]

Next, use this XPath in code to match the rows of the table:

import requests
from lxml import etree

url = "http://www.66ip.cn"
pattern = '//*[@id="main"]/div/div[1]/table/tbody/tr'

# Download the page and parse the response text into an element tree.
r = requests.get(url)
tree = etree.HTML(r.text)

# Apply the XPath and report how many nodes matched.
nodes = tree.xpath(pattern)
print("Node length: %s" % len(nodes))

We expect the number of matched nodes to equal the number of tr elements in the table, but the length of nodes is actually 0. What went wrong?

2. Problem analysis

The first suspect is the XPath itself, but carefully rechecking how it was extracted turned up no mistake.

Is the content the code reads different from what the browser shows? Does the site apply some other processing? To find out, we extracted the content of the response.

Inspecting r.text, the text attribute of the Response object, shows that the content we expect to match is there. So why does the match fail?

There are only two possible answers: either the XPath is wrong, or the response differs from the content in the browser.

The XPath was generated by the browser itself, so it should be correct for what the browser displays. That leaves an inconsistency between what the browser shows and what the code receives. Comparing the two reveals the following:

Page fragment in the browser:

<table width="100%" border="2px" cellspacing="0px" bordercolor="#6699ff">
    <tbody><tr><td>ip</td><td>Port number</td><td>Proxy location</td><td>Proxy type</td><td>Verification time</td></tr> .....
    </tbody>
</table>

The fragment the code actually receives:

<table width="100%" border="2px" cellspacing="0px" bordercolor="#6699ff">
    <tr><td>ip</td><td>Port number</td><td>Proxy location</td><td>Proxy type</td><td>Verification time</td></tr> ......
</table>

The difference is that the <tbody> element does not exist in the source the server actually sends. The browser adds it on its own while displaying the page, so the XPath copied from the browser does not match the markup the code receives, and the match returns nothing.
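
One way to confirm this kind of difference without reading the markup by eye is to list the tags that literally occur in the raw response text. The sketch below is only an illustration: it uses Python's standard-library html.parser, and the helper name source_has_tag and both sample strings are hypothetical stand-ins for r.text and for markup copied from the browser's developer tools.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every start tag that literally appears in the text."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase and does not
        # insert implied elements such as <tbody>.
        self.tags.add(tag)


def source_has_tag(html_text, tag):
    """Return True if `tag` occurs as a start tag in `html_text`."""
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    return tag.lower() in collector.tags


# Hypothetical samples: what the server sent vs. what the browser displays.
server_html = "<table><tr><td>ip</td><td>Port number</td></tr></table>"
browser_html = "<table><tbody><tr><td>ip</td><td>Port number</td></tr></tbody></table>"

print(source_has_tag(server_html, "tbody"))   # False
print(source_has_tag(browser_html, "tbody"))  # True
```

If a step of the copied XPath names a tag that this check does not find in the raw response, that step is a likely reason for an empty match.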

3. Problem solving

After removing tbody from the XPath, rerunning the code returns the expected content.
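
The effect of the fix can be reproduced offline. The following sketch uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages; ElementTree only accepts relative paths, hence the leading dot. The sample markup and the 192.0.2.x addresses are made up for illustration and are not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed stand-in for the markup the server sends (no <tbody>).
SAMPLE = """
<html><body>
<div id="main"><div><div>
<table width="100%">
<tr><td>ip</td><td>Port number</td></tr>
<tr><td>192.0.2.1</td><td>8080</td></tr>
<tr><td>192.0.2.2</td><td>3128</td></tr>
</table>
</div></div></div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# Path as copied from the browser, including the tbody step it added.
with_tbody = root.findall(".//*[@id='main']/div/div[1]/table/tbody/tr")
# Same path with the tbody step removed.
without_tbody = root.findall(".//*[@id='main']/div/div[1]/table/tr")
# A looser path that matches rows whether or not a tbody is present.
any_depth = root.findall(".//*[@id='main']//table//tr")

print(len(with_tbody), len(without_tbody), len(any_depth))  # 0 3 3
```

The looser path trades precision for tolerance: it keeps working if the markup does contain a tbody, but it would also pick up rows of any table nested inside the first one.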

4. Extension analysis

To simplify the problem, we made another simple HTML page and looked at how the browser displays its code:

The HTML code in the editor:

How the browser displays it:

   

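In the browser, you can see that a tbody child element has been added to wrap the tr content, that the content of tfoot has moved up, and that tfoot itself is now empty. Isn't that strange?

The code above was tested in both Firefox and Chrome with the same result, so this appears to be standard browser behavior. It is consistent with the parsing rules of the HTML Standard, under which a tr that appears directly inside a table is wrapped in an implied tbody.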

5. Summary

To display the HTML from the server properly, the browser makes adjustments of its own. When analyzing a page's code, keep in mind that some tags are added by the browser while it parses the page and are not part of the content the server actually sent.
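
As a small convenience for paths copied from the developer tools, the tbody steps can be stripped automatically. This is a minimal sketch, assuming the server markup really has no tbody (if it does, the step must stay); the function name strip_tbody is hypothetical.

```python
import re

# Matches a "/tbody" or "/tbody[n]" step that is followed by another step
# or by the end of the path.
_TBODY_STEP = re.compile(r"/tbody(?:\[\d+\])?(?=/|$)")


def strip_tbody(xpath):
    """Remove tbody steps from an XPath copied out of the browser."""
    return _TBODY_STEP.sub("", xpath)


copied = '//*[@id="main"]/div/div[1]/table/tbody/tr[2]'
print(strip_tbody(copied))  # //*[@id="main"]/div/div[1]/table/tr[2]
```

It is worth checking the raw response for a tbody tag first, because a page whose source does include tbody needs the step left in place.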
