In a recent project I needed to pull some data from web pages. Static pages are easy: as long as you have the URL, you can request it and parse the returned HTML. Some pages, however, are generated dynamically. When you page through the results, for example, the URL in the address bar does not change, so getting the content of a given page takes more work. Below I use https://honors.libraries.psu.edu/browse/author/all/ as an example to explain how to obtain the HTML of a dynamic web page.
1. Open this website with IE9: https://honors.libraries.psu.edu/browse/author/all/
2. Press F12 to bring up the developer tools
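In the developer tools, open the "Network" tab and click "Start capturing", then click the "next page" link on the page.
3. Inspect the captured request
Select the captured request and click "Go to detailed view" to see its request headers, cookies, and request body (the POST data).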
4. Bind the captured parameters to a C# HttpWebRequest object
// Requires: using System.IO; using System.Net; using System.Text;

/// <summary>
/// Uses the HTTPS protocol to send a POST request and returns the response body.
/// </summary>
/// <param name="url">URL address</param>
/// <param name="strPostData">Data sent</param>
/// <param name="encoding">Page encoding</param>
/// <returns>The HTML returned by the server</returns>
public string OpenReadWithHttps(string url, string strPostData, Encoding encoding)
{
    // Cookies copied from the captured request
    CookieContainer cc = new CookieContainer();
    cc.Add(new Cookie("csrftoken", "04696113ff3ee3e8220dd9044921e100", "/browse/author/all/", "honors.libraries.psu.edu"));
    cc.Add(new Cookie("__utma", "148028590.1404245236.1416720957.1416734716.1416748914.3", "/browse/author/all/", "honors.libraries.psu.edu"));
    cc.Add(new Cookie("__utmz", "148028590.1416720957.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)", "/browse/author/all/", "honors.libraries.psu.edu"));
    cc.Add(new Cookie("__utmb", "148028590.2.10.1416748914", "/browse/author/all/", "honors.libraries.psu.edu"));
    cc.Add(new Cookie("__utmc", "148028590", "/browse/author/all/", "honors.libraries.psu.edu"));

    // Request headers copied from the captured request
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.CookieContainer = cc;
    request.Method = "POST";
    request.Accept = "text/html, application/xhtml+xml, */*";
    request.ContentType = "application/x-www-form-urlencoded";
    request.Referer = "https://honors.libraries.psu.edu/browse/author/all/";
    request.KeepAlive = true;
    request.UserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)";
    request.Host = "honors.libraries.psu.edu";
    request.Headers.Add(HttpRequestHeader.AcceptLanguage, "en-us");
    request.Headers.Add(HttpRequestHeader.CacheControl, "no-cache");
    // Sends "Accept-Encoding: gzip, deflate" and decompresses the response,
    // so the StreamReader below receives plain text rather than compressed bytes.
    request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

    byte[] buffer = encoding.GetBytes(strPostData);
    request.ContentLength = buffer.Length;
    using (Stream writer = request.GetRequestStream()) // Gets the request stream
    {
        writer.Write(buffer, 0, buffer.Length); // Writes the request parameters to the stream
    } // Closes the request stream

    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream(), encoding))
    {
        return reader.ReadToEnd();
    }
}
Parameter description:
url: the requested address; strPostData: the data sent in the POST body; encoding: the page encoding
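The csrftoken value hard-coded in step 4 was copied from one captured browser session, so it will probably stop working once that session expires. The following is a minimal sketch of one way around this, assuming the site sets a cookie named csrftoken in response to a plain GET of the same URL (this is not confirmed by the capture above). The class CsrfHelper and its methods GetCookieValue and FetchCsrfToken are hypothetical names introduced for illustration.

```csharp
using System;
using System.Net;

public static class CsrfHelper
{
    // Returns the value of the named cookie stored for the given URL,
    // or null when the container holds no such cookie.
    public static string GetCookieValue(CookieContainer cookies, string url, string name)
    {
        Cookie cookie = cookies.GetCookies(new Uri(url))[name];
        return cookie == null ? null : cookie.Value;
    }

    // Assumption: a GET of the page makes the server send a fresh
    // csrftoken cookie, which the shared CookieContainer then stores.
    public static string FetchCsrfToken(string url, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // The body is not needed here; only the cookies set by the response matter.
        }
        return GetCookieValue(cookies, url, "csrftoken");
    }
}
```

If this assumption holds, the same CookieContainer can be passed on to the POST request, and the returned token can be used as the csrfmiddlewaretoken field of the POST data in place of the copied value.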
5. Call
private void button2_Click(object sender, EventArgs e)
{
    string url = "https://honors.libraries.psu.edu/browse/author/all/";
    // POST body copied from the captured request; page=9 requests the ninth page
    string strPostData = "csrfmiddlewaretoken=04696113ff3ee3e8220dd9044921e100&browse_start=all&browse_type=author&page=9&display=50&num_display_items=50";
    textBox1.Text = OpenReadWithHttps(url, strPostData, Encoding.UTF8);
}
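The POST data in the call is a single hand-written string, which makes it awkward to request a different page. As a rough sketch, the body could be assembled from the captured field names instead. PostDataBuilder, Build, and ForPage are hypothetical names, and the fixed values (all, author, 50) are simply the ones seen in the captured request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PostDataBuilder
{
    // Joins form fields into an application/x-www-form-urlencoded body,
    // escaping each key and value.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    // Builds the body for one result page, keeping the fields in the
    // same order as the captured request.
    public static string ForPage(string csrfToken, int page)
    {
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("csrfmiddlewaretoken", csrfToken),
            new KeyValuePair<string, string>("browse_start", "all"),
            new KeyValuePair<string, string>("browse_type", "author"),
            new KeyValuePair<string, string>("page", page.ToString()),
            new KeyValuePair<string, string>("display", "50"),
            new KeyValuePair<string, string>("num_display_items", "50")
        };
        return Build(fields);
    }
}
```

Calling PostDataBuilder.ForPage(token, n) in a loop over n would then give the body for each page in turn, with the result passed as the POST data argument of the request method from step 4.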
Summary: capture the page request with the IE9 developer tools, read off the request parameters (headers, cookies, and POST data), and then bind those parameters to an HttpWebRequest object to issue the same request from code.
Get HTML code for dynamic Web pages with IE9 developer tools