An iterative approach to building the HttpHelper used to capture millions of user records

Source: Internet
Author: User

What is HttpHelper?

HttpHelper is a utility class that wraps access to resources available on the network. It uses the HTTP protocol, hence the name HttpHelper.

Background: why HttpHelper came about

WebClient makes it easy to fetch resources from the network, for example:

    WebClient client = new WebClient();
    string html = client.DownloadString("https://www.baidu.com/");

This fetches the source of the Baidu home page. However, WebClient is so heavily encapsulated that it is sometimes not flexible enough. When finer control over the lower-level details is needed, it is time to build our own tool for fetching network resources.

HttpHelper: first version

Now let's build our own download tool. At the very beginning it looks like this:

    using System.IO;
    using System.Net;
    using System.Text;

    public class HttpHelp
    {
        public static string DownloadString(string url)
        {
            string source = string.Empty;
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                using (Stream stream = response.GetResponseStream())
                {
                    // Read the whole response body as UTF-8 text.
                    using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
                    {
                        source = reader.ReadToEnd();
                    }
                }
            }
            return source;
        }
    }

Programs always run into all kinds of exceptions, so the next step is to add a try/catch statement:

    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    public class HttpHelp
    {
        public static string DownloadString(string url)
        {
            string source = string.Empty;
            try
            {
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                {
                    using (Stream stream = response.GetResponseStream())
                    {
                        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
                        {
                            source = reader.ReadToEnd();
                        }
                    }
                }
            }
            catch
            {
                // On any failure, report the URL and fall through to return an empty string.
                Console.WriteLine("error, the requested URL is {0}", url);
            }
            return source;
        }
    }

Requesting resources is I/O-intensive and particularly time-consuming, so the download should run asynchronously:

    // Requires: using System.Threading.Tasks;
    public static async Task<string> DownloadString(string url)
    {
        // Run the blocking download on a thread-pool thread.
        return await Task.Run(() =>
        {
            string source = string.Empty;
            try
            {
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                {
                    using (Stream stream = response.GetResponseStream())
                    {
                        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
                        {
                            source = reader.ReadToEnd();
                        }
                    }
                }
            }
            catch
            {
                Console.WriteLine("error, the requested URL is {0}", url);
            }
            return source;
        });
    }
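
Wrapping the blocking calls in Task.Run still occupies a thread-pool thread for the whole duration of the request. As a possible alternative, the sketch below awaits the I/O directly with GetResponseAsync and ReadToEndAsync, which are available from .NET Framework 4.5 onward. The class name AsyncHttpHelp and the method name DownloadStringAsync are placeholders chosen for this example, not names from the original code.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Threading.Tasks;

public static class AsyncHttpHelp
{
    // Hypothetical helper: awaits the network I/O instead of blocking a thread.
    public static async Task<string> DownloadStringAsync(string url)
    {
        string source = string.Empty;
        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            using (WebResponse response = await request.GetResponseAsync())
            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
            {
                source = await reader.ReadToEndAsync();
            }
        }
        catch (Exception)
        {
            // Same policy as the synchronous version: report the URL, return an empty string.
            Console.WriteLine("error, the requested URL is {0}", url);
        }
        return source;
    }
}
```

From inside another async method, a caller would write: string html = await AsyncHttpHelp.DownloadStringAsync("https://www.baidu.com/");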

HttpHelper: refinements

To make the server believe the request was issued by a browser, set the User-Agent header:

    request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0";

Some resources require permission, so the request has to pass as a logged-in user. HTTP is stateless, and the information that identifies the user is carried in a cookie, so send the request with a cookie:

    request.Headers.Add("Cookie", "paste the cookie copied from the browser here");
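
Instead of writing the Cookie header by hand, HttpWebRequest can also take its cookies from a CookieContainer. The following is only a sketch: the helper name CreateRequestWithCookie is made up for illustration, and the cookie name and value passed to it would be whatever the browser shows for the target site.

```csharp
using System;
using System.Net;

public static class CookieExample
{
    // Hypothetical helper: builds a request that carries one cookie for its own URL.
    public static HttpWebRequest CreateRequestWithCookie(string url, string name, string value)
    {
        Uri uri = new Uri(url);
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);

        // The container supplies matching cookies for the request and also
        // stores cookies that the server sets in its responses.
        request.CookieContainer = new CookieContainer();
        request.CookieContainer.Add(uri, new Cookie(name, value));
        return request;
    }
}
```

Reusing one CookieContainer across several requests keeps the session alive without copying the header each time.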

To round things off, set a timeout:

    request.Timeout = 5000; // Timeout is in milliseconds, so this is 5 seconds

Some websites serve resources compressed with gzip, which saves bandwidth. To accept them, add request.Headers.Add("Accept-Encoding", "gzip, deflate") to the request headers, and decompress the response stream according to the Content-Encoding the server reports. Brotli (br) is not advertised because nothing in this code decompresses it. With these changes, HttpHelper looks like this:

    // Requires: using System.IO.Compression;
    public static string DownloadString(string url)
    {
        string source = string.Empty;
        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0";
            request.Headers.Add("Cookie", "Here is the cookie");
            request.Headers.Add("Accept-Encoding", "gzip, deflate");
            request.KeepAlive = true; // enable persistent connections
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream dataStream = response.GetResponseStream())
            {
                string encoding = (response.ContentEncoding ?? string.Empty).ToLower();
                if (encoding.Contains("gzip")) // decompress gzip
                {
                    using (GZipStream stream = new GZipStream(dataStream, CompressionMode.Decompress))
                    using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
                    {
                        source = reader.ReadToEnd();
                    }
                }
                else if (encoding.Contains("deflate")) // decompress deflate
                {
                    using (DeflateStream stream = new DeflateStream(dataStream, CompressionMode.Decompress))
                    using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
                    {
                        source = reader.ReadToEnd();
                    }
                }
                else // uncompressed response
                {
                    using (StreamReader reader = new StreamReader(dataStream, Encoding.UTF8))
                    {
                        source = reader.ReadToEnd();
                    }
                }
            }
            request.Abort();
        }
        catch
        {
            Console.WriteLine("error, the requested URL is {0}", url);
        }
        return source;
    }
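
The manual branching on Content-Encoding can usually be avoided. HttpWebRequest has an AutomaticDecompression property; when it is set, the framework requests compressed content and decompresses the response stream itself. A minimal sketch, in which the class name DecompressingHttpHelp and the helper CreateRequest are assumptions made for this example:

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class DecompressingHttpHelp
{
    // Hypothetical helper: creates a request that asks the framework to handle gzip and deflate.
    public static HttpWebRequest CreateRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }

    public static string DownloadString(string url)
    {
        string source = string.Empty;
        try
        {
            HttpWebRequest request = CreateRequest(url);
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
            {
                // The stream is already decompressed at this point.
                source = reader.ReadToEnd();
            }
        }
        catch
        {
            Console.WriteLine("error, the requested URL is {0}", url);
        }
        return source;
    }
}
```

The hand-written version remains useful when the decompression step itself needs to be controlled or inspected.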

If requests are sent too frequently, the server will reject them and return 429 (Too Many Requests). At that point a proxy is needed: our request is submitted to the proxy server, the proxy server forwards it to the target server, and the response comes back to us through the proxy. As long as the proxy is switched constantly, the server will not reject the program for requesting too often.

    var proxy = new WebProxy("Address", 8080); // proxy host, followed by the port number
    request.Proxy = proxy; // set the proxy on the HttpWebRequest
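
Switching proxies constantly can be done with a simple round-robin over a list of proxies. The sketch below shows one way to do that. ProxyRotator is a hypothetical class, not part of the original code, and the addresses passed to it are placeholders to be replaced with real proxy servers.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;

public class ProxyRotator
{
    private readonly List<WebProxy> _proxies;
    private int _counter = -1;

    public ProxyRotator(IEnumerable<WebProxy> proxies)
    {
        _proxies = new List<WebProxy>(proxies);
        if (_proxies.Count == 0)
            throw new ArgumentException("At least one proxy is required.", "proxies");
    }

    // Returns the next proxy in round-robin order; safe to call from several threads.
    public WebProxy Next()
    {
        // Masking the sign bit keeps the index non-negative if the counter overflows.
        int ticket = Interlocked.Increment(ref _counter) & int.MaxValue;
        return _proxies[ticket % _proxies.Count];
    }

    // Creates a request for the URL that is routed through the next proxy.
    public HttpWebRequest CreateRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Proxy = Next();
        return request;
    }
}
```

Each call to CreateRequest then uses a different proxy, so consecutive requests reach the target server from different addresses.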

How to obtain proxies is covered in a separate blog post.
