Data collection program (Web thieves)

Source: Internet
Author: User

A so-called data collection program is really just a web page "thief" program, that is, a scraper (don't scold me). I have finished writing one, so I am posting some notes here in the hope that we can study it together.

1. Logging in before the download. Some websites only show the data after you log in, so the program has to send the login username and password. I did log in, but the server was no pushover: it redirected me to another page, and two sessions were created in total. I did not know how to capture the second session from code, so I took a shortcut ^-^. I captured the session cookie with a packet-capture tool called Ethereal, then added it to the headers of the HTTP request with the following code.
// Requires: using System.Net;
WebClient myWebClient = new WebClient();
string sessionKey = textBox78.Text;   // session cookie copied from the Ethereal capture
string refererUrl = textBox77.Text;   // page the request should appear to come from
myWebClient.Headers.Clear();
myWebClient.Headers.Add("Cookie", sessionKey);
myWebClient.Headers.Add("Referer", refererUrl);
myWebClient.Headers.Add("User-Agent", "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031107 Debian/1.5-3");
This way the server is fooled into treating the request as coming from a logged-in browser, haha.
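The manual cookie copy described above can often be avoided by letting the HTTP client keep the cookies itself. The following is only a minimal sketch under stated assumptions, not the author's method: it swaps WebClient for HttpClient with a CookieContainer, which records cookies set by the login response and by redirect responses along the way. The class name SessionLogin, the form field names "username" and "password", and both URIs are hypothetical and must be adapted to the target site.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class SessionLogin
{
    // Builds a client that stores every cookie it receives in the given
    // container and follows redirects automatically.
    public static HttpClient CreateClient(CookieContainer cookies)
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = cookies,
            UseCookies = true,
            AllowAutoRedirect = true
        };
        var client = new HttpClient(handler);
        // Same browser-like User-Agent as in the WebClient code above.
        client.DefaultRequestHeaders.TryAddWithoutValidation(
            "User-Agent",
            "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031107 Debian/1.5-3");
        return client;
    }

    // Posts the login form, then downloads the protected page with the
    // cookies collected during login. Field names are assumptions.
    public static async Task<string> LoginAndDownloadAsync(
        HttpClient client, Uri loginUri, Uri dataUri, string user, string password)
    {
        var form = new FormUrlEncodedContent(new Dictionary<string, string>
        {
            { "username", user },      // hypothetical field name
            { "password", password }   // hypothetical field name
        });
        using (HttpResponseMessage login = await client.PostAsync(loginUri, form))
        {
            login.EnsureSuccessStatusCode();
        }
        return await client.GetStringAsync(dataUri);
    }
}
```

After the login call, cookies.GetCookies(loginUri) can be inspected to see which session cookies the server issued. Sites that rely on JavaScript or hidden form tokens for login will need more than this.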

2. The second part is downloading the page source.
// Requires: using System.Text;
byte[] myDataBuffer = myWebClient.DownloadData(remoteUri);
download = Encoding.Default.GetString(myDataBuffer);
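Encoding.Default decodes with the local machine's code page, which can garble pages served in a different character set. As a hedged sketch (the class PageDecoder and its UTF-8 fallback are assumptions, not part of the original program), the charset can instead be read from the Content-Type response header:

```csharp
using System;
using System.Text;

public static class PageDecoder
{
    // Decodes a downloaded body using the charset named in a Content-Type
    // value such as "text/html; charset=utf-8". Falls back to UTF-8 (an
    // assumption) when no charset is given or the name is not supported.
    public static string Decode(byte[] body, string contentType)
    {
        Encoding encoding = Encoding.UTF8;
        if (!string.IsNullOrEmpty(contentType))
        {
            const string key = "charset=";
            int pos = contentType.IndexOf(key, StringComparison.OrdinalIgnoreCase);
            if (pos >= 0)
            {
                string name = contentType.Substring(pos + key.Length);
                int end = name.IndexOf(';');
                if (end >= 0)
                {
                    name = name.Substring(0, end);
                }
                name = name.Trim().Trim('"', '\'');
                try
                {
                    encoding = Encoding.GetEncoding(name);
                }
                catch (ArgumentException)
                {
                    // Unknown or unavailable charset: keep the fallback.
                }
            }
        }
        return encoding.GetString(body);
    }
}
```

With the WebClient from step 1, the header is available after the download as myWebClient.ResponseHeaders["Content-Type"], so the call would look like PageDecoder.Decode(myDataBuffer, myWebClient.ResponseHeaders["Content-Type"]). Pages that declare their charset only in a meta tag are not handled by this sketch.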

3. The third part is extracting the data. I read the downloaded stream into a string, use IndexOf to find the positions of the two key marker strings, and then use Substring to take the text between them. I know this is clumsy, but I find regular expressions hard to use (advice is welcome). After extracting the strings, I use the following function to remove the HTML code:
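For the "text between two markers" step, a regular expression does not have to be complicated. The sketch below is one possible approach, not the author's code: it escapes both markers so they are matched literally and captures the shortest run of text between them. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class FieldExtractor
{
    // Returns the text between the first occurrence of startMarker and the
    // next occurrence of endMarker, or null when no such pair exists.
    public static string Between(string text, string startMarker, string endMarker)
    {
        // Regex.Escape makes the markers literal; "(.*?)" is a lazy capture,
        // and Singleline lets "." span line breaks inside the field.
        string pattern = Regex.Escape(startMarker) + "(.*?)" + Regex.Escape(endMarker);
        Match match = Regex.Match(text, pattern, RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

For example, FieldExtractor.Between(download, "<td class=\"price\">", "</td>") would return the contents of the first such cell, where the marker strings are placeholders for whatever surrounds the wanted field on the real page.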
// Requires: using System.Text.RegularExpressions;
private string StripHTML(string strHtml)
{
    // Patterns and replacements are paired by index.
    string[] aryReg = {
        @"<script[^>]*?>.*?</script>",
        @"<(\/\s*)?!?((\w+:)?\w+)(\w+(\s*=?\s*(([""'])(\\[""'tbnr]|[^\7])*?\7|\w+)|.{0})|\s)*?(\/\s*)?>",
        @"([\r\n])[\s]+",
        @"&(quot|#34);",
        @"&(amp|#38);",
        @"&(lt|#60);",
        @"&(gt|#62);",
        @"&(nbsp|#160);",
        @"&(iexcl|#161);",
        @"&(cent|#162);",
        @"&(pound|#163);",
        @"&(copy|#169);",
        @"&#(\d+);",
        @"-->",
        @"<!--.*\n"
    };

    string[] aryRep = {
        "",
        "",
        "",
        "\"",
        "&",
        "<",
        ">",
        " ",
        "\xa1", // chr(161)
        "\xa2", // chr(162)
        "\xa3", // chr(163)
        "\xa9", // chr(169)
        "",
        "\r\n",
        ""
    };

    string strOutput = strHtml;
    for (int i = 0; i < aryReg.Length; i++)
    {
        Regex regex = new Regex(aryReg[i], RegexOptions.IgnoreCase);
        strOutput = regex.Replace(strOutput, aryRep[i]);
    }

    // Strings are immutable, so the result of Replace must be assigned back.
    strOutput = strOutput.Replace("<", "");
    strOutput = strOutput.Replace(">", "");
    strOutput = strOutput.Replace("\r\n", "");

    return strOutput;
}
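The long entity table in the function above can also be replaced by the framework's own decoder. The following is a simplified sketch of that alternative, not a drop-in equivalent of the function: it removes script blocks and tags with two short patterns and lets WebUtility.HtmlDecode handle the entities. The class name HtmlText is hypothetical, and regex-based tag removal remains approximate on malformed HTML.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Converts an HTML fragment to plain text: drops <script> blocks and
    // tags, decodes entities, and collapses runs of whitespace.
    public static string ToPlainText(string html)
    {
        string noScripts = Regex.Replace(
            html,
            @"<script[^>]*>.*?</script>",
            "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        string noTags = Regex.Replace(noScripts, @"<[^>]+>", "");
        string decoded = WebUtility.HtmlDecode(noTags);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

Unlike the function above, this version keeps any "<" and ">" characters that come from decoded entities such as &lt; and &gt;.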

4. After that, the data is stored in a database, which needs no explanation. However, when I wrote the data, an exception was thrown saying that my field was too long to be written to the database. I am using Access for now; next I will try SQL.
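One likely cause of the "field too long" exception, assuming the target column is an ordinary Access Text field, is that such fields hold at most 255 characters. Changing the column to a Memo (Long Text) field removes that limit. If the column type cannot be changed, the value can be shortened before the insert. The helper below is a hypothetical sketch of that second option; note that it silently discards the tail of the text.

```csharp
using System;

public static class FieldLength
{
    // Maximum length of an Access short Text column.
    public const int AccessShortTextMax = 255;

    // Returns value unchanged when it fits, otherwise its first maxLength
    // characters. A null value is passed through as null.
    public static string Fit(string value, int maxLength)
    {
        if (maxLength < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxLength));
        }
        if (value == null || value.Length <= maxLength)
        {
            return value;
        }
        return value.Substring(0, maxLength);
    }
}
```

A call such as FieldLength.Fit(strOutput, FieldLength.AccessShortTextMax) would then be applied to each scraped string just before it is bound to the insert command.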

 

5. If you have any good suggestions, please share them with me so that we can make progress together.