Javascript-[PHP] [. NET] [JS] [AJAX] questions about crawling web page source code

Source: Internet
Author: User
For example, you can only view the source code when the web page is loaded for the first time using a browser. However, many web pages now use AJAX technology. In fact, it will be asynchronously loaded multiple times, and the final result will be much worse than the original source code. Now I want to get the final page... example:
You can only view the source code when the web page is loaded for the first time.
However, many web pages now use AJAX technology. In fact, it will be asynchronously loaded multiple times, and the final result will be much worse than the original source code.
Now I want to obtain the source code when the web page is finally loaded.
In other words, I want to obtain the actual source code of the webpage each time AJAX obtains the value, and then modifies the source code through JS.
In theory, there is such a real source code, right.
It can also be obtained using Chrome's review elements.

But now I want to use PHP,. NET, or JS ......
I don't know if you have any good methods ......
PC and WEB ...... Are there similar functions, frameworks, class libraries, methods ......
All kinds of ideas ......

Http://www.moonlord.cn

Reply content:

Example:
You can only view the source code when the web page is loaded for the first time.
However, many web pages now use AJAX technology. In fact, it will be asynchronously loaded multiple times, and the final result will be much worse than the original source code.
Now I want to obtain the source code when the web page is finally loaded.
In other words, I want to obtain the actual source code of the webpage each time AJAX obtains the value, and then modifies the source code through JS.
In theory, there is such a real source code, right.
It can also be obtained using Chrome's review elements.

But now I want to use PHP,. NET, or JS ......
I don't know if you have any good methods ......
PC and WEB ...... Are there similar functions, frameworks, class libraries, methods ......
All kinds of ideas ......

Http://www.moonlord.cn

My previous practices are:
1. Use firebug to capture packets and check the api address of the ajax request.
2. Check the parameters of the api request. If there are no parameters, go to step 1.
3. If the api parameters are on the webpage.
4. Go to the page to find the api parameters. (All parameters must be regular. If there is no regular rule, it is impossible for him to make the webpage dynamic .)
5. Then, use the required parameters of the api to obtain the address of the api. (If you are lucky, json data is directly returned, so you don't have to deal with html)

PhantomJS, CasperJS

Net WebBrowser

You can only view the source code when the web page is loaded for the first time.
Who said this?

It is asynchronous loading. After loading, what you see is the fully loaded HTML code.
Many crawlers are captured, and Python is readily available.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.