Lessons learned retrieving your World Community Grid task information with a web crawler

Source: Internet
Author: User

Getting at the World Community Grid task page took many twists and turns. At first I assumed it worked like any other page: POST the login information directly and get the login cookie back. That approach failed, and the server simply returned the login error page. I then shelved the problem for a long time, until I found that the IE9 developer tools provide a friendly view of captured network traffic. The data is similar to what Wireshark captures, but everything unrelated to HTTP is filtered out, so it is easier to read.

After capturing the login operation, you can see which requests the browser sent to the server and how the server responded.

Tracing the POST request sent during login shows that the actual request address is /j_security_check, which can also be seen in the JavaScript on the page. Next, set the corresponding key-value pairs on request.Headers based on the captured request. The biggest detour for me was cookie handling. At first I assigned the cookie content to request.Headers["Cookie"], and the login always failed. After a series of attempts, I switched to a CookieContainer for the cookies to send: write the captured key-value pairs such as "__utma" into a CookieContainer object and assign that object to request.CookieContainer. With that, the login succeeds and returns cookies such as JSESSIONID along with the access authentication information, and you can then request the target page to get the content you want.
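The way a CookieContainer decides what to send can be checked offline, without logging in. The following is a minimal sketch rather than the author's code: the class and method names (CookieContainerSketch, BuildContainer) are made up for illustration, and only two of the captured Google Analytics cookies are used. It shows that the container scopes each cookie to a host and path and builds the Cookie header itself for a matching URI.

```csharp
using System;
using System.Net;

public static class CookieContainerSketch
{
    // Hypothetical helper: stores two captured cookies, scoped to the given host.
    public static CookieContainer BuildContainer(string host)
    {
        var cc = new CookieContainer();
        cc.Add(new Cookie("__utmc", "2464182", "/", host));
        cc.Add(new Cookie("__utmb", "2464182", "/", host));
        return cc;
    }

    public static void Main()
    {
        const string host = "secure.worldcommunitygrid.org";
        CookieContainer cc = BuildContainer(host);

        // For a URI on the matching host, the container builds the Cookie header.
        var loginUri = new Uri("https://" + host + "/j_security_check");
        Console.WriteLine(cc.GetCookieHeader(loginUri));

        // For an unrelated host, no cookies apply and the header is empty.
        var otherUri = new Uri("https://example.com/");
        Console.WriteLine(cc.GetCookieHeader(otherUri) == "");
    }
}
```

Because the same container object also receives the cookies set by the server's response, passing it on to later requests is what carries the logged-in session forward.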

The basic code is as follows:

    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    // Constant.Password1 and TransportTool.GetAndGetHtml are the author's own helpers and are not shown here.
    string url = "https://secure.worldcommunitygrid.org/j_security_check";
    string resultUrl = "https://secure.worldcommunitygrid.org/ms/viewBoincResults.do?filterDevice=0&filterStatus=4&projectId=-1&sortBy=returnedTime&pageNum=1";
    HttpWebRequest request = null;
    HttpWebResponse response = null;
    Stream requestStream = null;
    CookieContainer cc = new CookieContainer();
    try
    {
        // URL-encode the password so special characters survive the form post
        string formData = "j_username=zl860628&j_password=" + Uri.EscapeDataString(Constant.Password1);
        ASCIIEncoding encoding = new ASCIIEncoding();
        byte[] data = encoding.GetBytes(formData);

        // Add the captured cookies to the container that is sent with the request
        cc.Add(new Cookie("__utma", "2464182.1748491084.1301203514.1301203514.1301231617.2", "/", "secure.worldcommunitygrid.org"));
        cc.Add(new Cookie("__utmz", "2464182.1301203514.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none)", "/", "secure.worldcommunitygrid.org"));
        cc.Add(new Cookie("__utmc", "2464182", "/", "secure.worldcommunitygrid.org"));
        cc.Add(new Cookie("__utmb", "2464182", "/", "secure.worldcommunitygrid.org"));

        request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; zh-CN; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
        request.Accept = "text/html, application/xhtml+xml, */*";
        request.ContentLength = data.Length;
        request.CookieContainer = cc;
        request.KeepAlive = true;
        request.Timeout = 100 * 1000;
        request.Headers["Accept-Encoding"] = "gzip, deflate";
        request.Headers["Accept-Language"] = "en";
        request.Headers["Accept-Charset"] = "gb2312, utf-8";
        // Kept from the first attempt: setting this header alone was not enough to log in
        request.Headers["Cookie"] = "__utma=2464182.1748491084.1301203514.1301203514.1301231617.2; __utmz=2464182.1301203514.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); __utmc=2464182; __utmb=2464182";

        // Send the form data, then close the stream before reading the response
        requestStream = request.GetRequestStream();
        requestStream.Write(data, 0, data.Length);
        requestStream.Close();

        response = (HttpWebResponse)request.GetResponse();
        response.Cookies = cc.GetCookies(request.RequestUri);
        // Obtain the cookies after login
        cc.Add(response.Cookies);
        // Use the cookies to obtain the task result page
        string html = TransportTool.GetAndGetHtml(resultUrl, cc, Encoding.Default);
    }
    catch (WebException ex)
    {
        // Report the failure instead of silently swallowing it
        Console.Error.WriteLine(ex.Message);
    }
    finally
    {
        if (response != null) response.Close();
        if (request != null) request.Abort();
    }
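The code above calls TransportTool.GetAndGetHtml, a helper whose body is not part of this article. What follows is only a guess at what such a helper might look like, assuming it issues a GET with the logged-in CookieContainer and decodes the body with the given encoding. The class name TransportToolSketch and the BuildGetRequest split are hypothetical. HttpWebRequest is used to stay consistent with the code above, although newer .NET versions mark it obsolete in favor of HttpClient.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class TransportToolSketch
{
    // Builds (but does not send) a GET request that reuses the logged-in cookies.
    public static HttpWebRequest BuildGetRequest(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.CookieContainer = cookies;
        // Let the framework decompress gzip/deflate response bodies.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        request.Timeout = 100 * 1000;
        return request;
    }

    // Sends the request and decodes the response body with the given encoding.
    public static string GetAndGetHtml(string url, CookieContainer cookies, Encoding encoding)
    {
        HttpWebRequest request = BuildGetRequest(url, cookies);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(), encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Building the request makes no network call, so this runs offline.
        var cc = new CookieContainer();
        HttpWebRequest request = BuildGetRequest(
            "https://secure.worldcommunitygrid.org/ms/viewBoincResults.do?pageNum=1", cc);
        Console.WriteLine(request.Method + " " + request.RequestUri.Host);
    }
}
```

The important design point is that the same CookieContainer instance used for the login POST is passed in, so the JSESSIONID cookie set by the server travels with the follow-up request.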

 
