An analysis of how web browsers dynamically add code that modifies the HTML page

Last Update:2017-04-21 Source: Internet

Author: User

Tags xpath


Introduction: On the web, the browser loads a page and presents the final content to the user. But does the HTML that the browser finally presents match the code stored on the server?

1. Why does XPath not work correctly?

XPath is a quick way to locate page elements. Then one day you discover that an XPath copied from a page in the browser matches nothing when used in code. What is the problem?

Page Address: http://www.66ip.cn/

To extract the IP addresses listed on the page, we use the developer tools built into the Chrome browser to copy the XPath of one matching row:

//*[@id="main"]/div/div[1]/table/tbody/tr[2]

Next, you need to use the XPath in your code to match the contents of this path:

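from lxml import etree
import requests
url = "http://www.66ip.cn"
# XPath copied from the browser, with the row index dropped so that every tr matches
pattern = '//*[@id="main"]/div/div[1]/table/tbody/tr'
r = requests.get(url)
# Parse the raw response text into an element tree
tree = etree.HTML(r.text)
nodes = tree.xpath(pattern)
print("Node length: %s" % len(nodes))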

We expect the result to contain every tr element in the table, but the length of nodes is actually 0. What went wrong?

2. Problem analysis

The first question is whether the XPath itself is wrong. Carefully rechecking the extraction process shows no problem.

Is the page content read by the code inconsistent with what the browser actually shows, or does the site apply some other processing? Following this idea, we extract the content of the response:

Looking at the text attribute of the Response object, r.text, we find that the content we expect to match is there. So why does the match fail?

There are only two possible answers: either the XPath is wrong, or the response content differs from the content in the browser.

The XPath was generated by the browser itself, so it should be correct. That leaves only a difference between what the browser shows and what the code actually receives. Comparing the two reveals the following:

Page fragments in the browser:

<table width="100%" border="2px" cellspacing="0px" bordercolor="#6699ff">
    <tbody>
        <tr><td>ip</td><td>Port number</td><td>Agent location</td><td>proxy type</td><td>authentication time</td></tr>
        .....
    </tbody>
</table>

The fragment that the code actually receives is as follows:

<table width="100%" border="2px" cellspacing="0px" bordercolor="#6699ff">
    <tr><td>ip</td><td>Port number</td><td>Agent location</td><td>proxy type</td><td>authentication time</td></tr>
    ......
</table>

The difference is that <tbody> does not exist in the code the server actually sends. The browser adds it on its own while building the page for display. As a result, the XPath copied from the browser contains a tbody step that the real source lacks, so the path cannot match anything in the response.
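One way to confirm this kind of difference without reading the markup by eye is to count the tags in the raw response. The following is a minimal sketch using only the standard library's html.parser, which reports the tags as written and does not insert missing ones. The names TagCounter and count_tags, and the sample server_html string (a stand-in for r.text with made-up row data), are assumptions for illustration and are not part of the original article.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each start tag occurs in an HTML string."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        # html.parser reports tags as written in the source (lowercased)
        # and does not add implied tags such as tbody.
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html_text):
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


# Hypothetical stand-in for r.text, shaped like the fragment the server sends
server_html = (
    '<table width="100%" border="2px">'
    "<tr><td>ip</td><td>Port number</td></tr>"
    "<tr><td>1.2.3.4</td><td>8080</td></tr>"
    "</table>"
)

counts = count_tags(server_html)
print("tbody tags:", counts.get("tbody", 0))
print("tr tags:", counts.get("tr", 0))
```

If the tbody count is 0 while the browser's element view shows a tbody, the tag was added by the browser and should not appear in an XPath applied to the raw response.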

3. Problem solving

After removing tbody from the XPath, re-running the code returns the expected content.
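The effect of dropping the tbody step can be reproduced offline. This is a sketch under stated assumptions: it swaps lxml for the standard library's xml.etree.ElementTree (which supports only a subset of XPath and needs well-formed input), and server_html is a hypothetical, simplified stand-in for the real response with made-up row data.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the server response (no tbody)
server_html = """
<html><body><div id="main"><div><div>
<table>
<tr><td>ip</td><td>Port number</td></tr>
<tr><td>1.2.3.4</td><td>8080</td></tr>
</table>
</div></div></div></body></html>
"""

root = ET.fromstring(server_html)

# Path as copied from the browser, including the tbody step it inserted
browser_path = ".//*[@id='main']/div/div[1]/table/tbody/tr"
# The same path with the tbody step removed
fixed_path = browser_path.replace("/tbody", "")

print("with tbody:", len(root.findall(browser_path)))
print("without tbody:", len(root.findall(fixed_path)))
```

In the lxml code, the equivalent change is to use '//*[@id="main"]/div/div[1]/table/tr' as the pattern.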

4. Extension analysis

To simplify the problem, build another simple HTML page and look at how its code appears in the browser:

The HTML code in the editor: (image not preserved)

Display effect in the browser: (image not preserved)

In the browser you can see that a tbody child tag has been added, wrapping the tr content. The tfoot has moved up, and the tfoot itself is empty. Isn't that weird?

The above code was tested in Firefox and Chrome with the same result, so this appears to be standard processing that browsers perform.

5. Summary

To display the HTML code from the server better, the browser makes corresponding adjustments to it. When analyzing a page's code, keep in mind that some tags are added by the browser itself while it parses the page, and are not part of the actual page content.
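One way to guard against this in scraping code is to write paths that do not depend on tbody at all. The sketch below uses a descendant step ("//") between table and tr so the same path matches both the server source and the browser's version. It is an illustration only: it uses the standard library's xml.etree.ElementTree rather than lxml, and the two sample fragments and the count_rows helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: the same table as sent by a server
# and as it would appear after a browser inserts tbody
server_html = "<div><table><tr><td>a</td></tr><tr><td>b</td></tr></table></div>"
browser_html = (
    "<div><table><tbody><tr><td>a</td></tr><tr><td>b</td></tr></tbody></table></div>"
)

# "//" between table and tr matches rows at any depth below the table,
# so the path works with or without a tbody in between
tolerant_path = ".//table//tr"


def count_rows(markup):
    return len(ET.fromstring(markup).findall(tolerant_path))


print("server source:", count_rows(server_html))
print("browser version:", count_rows(browser_html))
```

The trade-off is that a descendant step also matches rows of any table nested inside the target table, so a stricter path may be preferable when tables are nested.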


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
