I am using curl to simulate a login, and I find that the URL that issues the cookie keeps changing. The first step is to open the homepage with a GET request. The second step is to click the login button on the homepage; a login box is displayed, and POSTing the form data completes the login.
The second step requires a cookie value generated during the first step.
The first step is to access the homepage, but in a browser this automatically triggers requests for nearly 10 more URLs. Many of these URLs are requested by the JS loaded on the homepage.
The cookie is written to the client by one of these pages.
But how many of these pages do I need to simulate to get this cookie?
The pages also change: some URLs appear on one visit and do not appear the next time the homepage is loaded. In short, I don't know which page issued the cookie on any given visit.
curl does not seem to be able to automatically request the pages that the JS on the first page would load, the way a browser does?
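To make the difference from a browser concrete, here is a minimal sketch in Python (standard library only) of what a non-browser client actually receives: the homepage HTML with its script tags listed, none of them executed. The names `ScriptCollector` and `list_scripts` are made up for this illustration, and it assumes the homepage HTML has already been downloaded as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptCollector(HTMLParser):
    """Collect the src of every <script> tag; nothing is executed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.script_urls = []   # external scripts, resolved to absolute URLs
        self.inline_count = 0   # <script> blocks with no src attribute

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            self.script_urls.append(urljoin(self.base_url, src))
        else:
            self.inline_count += 1


def list_scripts(html, base_url):
    """Return (external script URLs, number of inline scripts)."""
    collector = ScriptCollector(base_url)
    collector.feed(html)
    return collector.script_urls, collector.inline_count
```

This only lists where the JS lives. The requests that the JS would send at run time do not appear in the HTML, which is why a plain HTTP client never sees them.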
Simulating the pages one by one seems impossible.
I was thinking that curl could request a page and automatically send the subsequent requests, so that all the cookies would be collected. But it does not seem to work like that: curl requests only one URL at a time.
Is curl simply powerless here? After all, curl is not a browser.
Any pointers? If curl cannot be used, what can?
Reply to discussion (solution)
Clear all cookies, then browse the page while capturing packets to see exactly which URLs are requested. Repeat until you catch the URL that sets the cookie.
Then request the URL that generates the cookie, followed by the target page.
This is a cumbersome process. curl is not a smart robot, just a transfer tool that supports multiple protocols.
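The capture step above can be partly automated. A rough sketch, assuming the traffic was exported from the browser's developer tools as a HAR file (a JSON format); the function names and the cookie name `session_token` are hypothetical:

```python
import json


def find_cookie_setters(har, cookie_name):
    """Return the URLs in a HAR capture whose response sets cookie_name."""
    setters = []
    for entry in har["log"]["entries"]:
        found = False
        for header in entry["response"]["headers"]:
            if header["name"].lower() != "set-cookie":
                continue
            # Some exporters join several Set-Cookie values with newlines.
            for line in header["value"].split("\n"):
                # A Set-Cookie value starts with "name=value".
                if line.split("=", 1)[0].strip() == cookie_name:
                    found = True
        if found:
            setters.append(entry["request"]["url"])
    return setters


def load_har(path):
    """Read a HAR file exported from the browser."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Usage would look like `find_cookie_setters(load_har("capture.har"), "session_token")`. Running it over several captures shows whether the cookie always comes from the same URL pattern. One caveat: a cookie written by JS through `document.cookie` never appears in a Set-Cookie header, so an empty result may mean the cookie is created on the client side.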
Thank you. The cookies were definitely cleared before each request. I just found that the first page loaded contains a lot of JS code, that this code sends further requests, and that the cookie I need is issued in those requests. In addition, the URLs produced by the JS contain some complex variables, so I would probably have to understand the tricky JavaScript that builds them. For all of curl's power elsewhere, it seems to help very little here; even handling these GET requests is quite difficult.
Does curl execute js?
Even professional browsers still have compatibility issues.
When I run into this kind of problem, I generally skip it. It is a question of cost.
Two solutions:
1. Install a JS engine (V8 is a powerful option at present). Extract the JS from the homepage and hand it to the engine to run.
2. First, manually find the URLs of the 10 requests; even if they look random, there is some pattern. Then determine which one generates the cookie, and record the cookie from that request. In short, this is two operations: request the cookie first, then POST.
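The two operations in method 2 could look roughly like this in Python. This is a sketch under assumptions, not a tested login for any real site: the function names are invented here, and `cookie_url`, `login_url`, and the form fields are placeholders you would fill in from your own capture. The shared cookie jar plays the role of curl's `-c`/`-b` cookie file.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_session():
    """Build an opener that stores received cookies and resends them."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def login(opener, cookie_url, login_url, form):
    """Request the cookie first, then POST the login form."""
    # Operation 1: fetch the URL that issues the cookie.
    with opener.open(cookie_url, timeout=30) as resp:
        resp.read()
    # Operation 2: POST the form; the jar attaches the cookie automatically.
    data = urllib.parse.urlencode(form).encode("utf-8")
    with opener.open(login_url, data=data, timeout=30) as resp:
        return resp.status, resp.read()
```

If the cookie URL contains variables generated by the JS, those still have to be reproduced by hand before this can work; the sketch only covers carrying the cookie from the first request to the second.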
Method 1 carries the cost of learning V8, so I would not consider it. Method 2 costs a lot of time sifting through requests. I would rather log in once manually and capture all the cookies after login for subsequent programs to use ......
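For the "log in once manually and reuse the cookies" approach, a minimal sketch, assuming the browser's cookies were exported to a Netscape-format cookies.txt file (the same format curl reads with `-b` and writes with `-c`). The function name and the file path are placeholders:

```python
import urllib.request
from http.cookiejar import MozillaCookieJar


def opener_from_cookie_file(path):
    """Build an opener preloaded with cookies from a Netscape-format file."""
    jar = MozillaCookieJar()
    # ignore_discard=True also keeps cookies marked as session-only.
    jar.load(path, ignore_discard=True, ignore_expires=False)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar
```

After `opener, jar = opener_from_cookie_file("cookies.txt")`, every `opener.open(url)` call sends the saved cookies for matching domains, until the server expires the session.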
In the end, it is a matter of value trade-offs. If you need to log in 1000 times, it is worth taking the time to work this out.
If you only log in once and then fetch the next 1000 pages, it is worth just logging in manually.