The so-called data collection program is really just a web page scraper (please don't scold me). Now that I have written one, I am posting some notes here. I hope we can study it together.
1. The first part is downloading the data. Some websites require you to log in before they show the data, so the program has to send a username and password. I did log in, but the server is no pushover: it redirected me, and in total two sessions were generated. I did not know how to capture the second session in code, so I took a shortcut ^-^. I captured the session cookie with a packet sniffer called Ethereal, then added it to the headers of the HTTP request with the following code.
WebClient myWebClient = new WebClient();
// Session cookie captured with the sniffer, pasted into a text box
string sessionKey = textBox78.Text;
// Page the request should appear to come from
string refererUrl = textBox77.Text;
myWebClient.Headers.Clear();
myWebClient.Headers.Add("Cookie", sessionKey);
myWebClient.Headers.Add("Referer", refererUrl);
myWebClient.Headers.Add("User-Agent", "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031107 Debian/1.5-3");
This way the server treats the request as coming from a logged-in browser, haha.
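For anyone who would rather not copy the cookie by hand, here is a rough sketch of one common alternative: a WebClient subclass that keeps a CookieContainer, so cookies set during login (including those set while redirects are followed) should be reused on later requests. This is not what I actually ran. The class name CookieAwareWebClient, the method LoginAndDownload, and the form field names "username" and "password" are all assumptions for illustration; a real site will use its own field names.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;
using System.Text;

// Hypothetical helper (not from the original program): a WebClient that
// remembers cookies between requests.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    public CookieContainer Cookies
    {
        get { return cookies; }
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest http = request as HttpWebRequest;
        if (http != null)
        {
            // Attach the shared container so session cookies are stored and resent.
            http.CookieContainer = cookies;
        }
        return request;
    }

    // Posts a login form, then downloads a page with the same cookies.
    // The field names "username" and "password" are assumptions.
    public string LoginAndDownload(string loginUrl, string user, string password, string dataUrl)
    {
        NameValueCollection form = new NameValueCollection();
        form.Add("username", user);
        form.Add("password", password);
        UploadValues(loginUrl, form); // sends an HTTP POST
        byte[] raw = DownloadData(dataUrl);
        return Encoding.Default.GetString(raw);
    }
}
```

If the site checks the Referer or User-Agent, those headers can still be added to Headers before each call, as in the code above.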
2. The second part is downloading the page source.
// Fetch the raw bytes, then decode them with the system default encoding
byte[] myDataBuffer = myWebClient.DownloadData(remoteUri);
download = Encoding.Default.GetString(myDataBuffer);
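One caveat about the decoding step: Encoding.Default is the machine's default code page, which may not match the page. Below is a small sketch, under the assumption that the server reports its charset in the Content-Type response header, of picking the encoding from that header and falling back otherwise. The names CharsetHelper and FromContentType are made up for this example.

```csharp
using System;
using System.Text;

// Hypothetical helper: choose an Encoding from a Content-Type header value
// such as "text/html; charset=utf-8".
public static class CharsetHelper
{
    public static Encoding FromContentType(string contentType, Encoding fallback)
    {
        if (string.IsNullOrEmpty(contentType)) return fallback;

        foreach (string part in contentType.Split(';'))
        {
            string p = part.Trim();
            if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            {
                string name = p.Substring("charset=".Length).Trim('"', ' ');
                try
                {
                    return Encoding.GetEncoding(name);
                }
                catch (ArgumentException)
                {
                    return fallback; // unknown charset name
                }
                catch (NotSupportedException)
                {
                    return fallback; // charset not available on this platform
                }
            }
        }
        return fallback; // header had no charset parameter
    }
}
```

With a WebClient, the header value is available after the download as ResponseHeaders["Content-Type"], so the result of FromContentType(..., Encoding.Default) could replace Encoding.Default in the call to GetString.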
3. The third part is matching the data. I read the downloaded stream into a string, use IndexOf to find the positions of the two key markers, and then use Substring to cut out the text between them. I know this is crude, but I find regular expressions hard to use (any advice is welcome). After extracting the string, I use the following function to strip the HTML:
// Requires: using System.Text.RegularExpressions;
private string StripHTML(string strHtml)
{
    // Patterns to match, in order; aryRep holds the replacement for each one
    string[] aryReg = {
        @"<script[^>]*?>.*?</script>",
        @"<(\/\s*)?!?((\w+:)?\w+)(\w+(\s*=?\s*(([""'])(\\[""'tbnr]|[^\7])*?\7|\w+)|.{0})|\s)*?(\/\s*)?>",
        @"([\r\n])[\s]+",
        @"&(quot|#34);",
        @"&(amp|#38);",
        @"&(lt|#60);",
        @"&(gt|#62);",
        @"&(nbsp|#160);",
        @"&(iexcl|#161);",
        @"&(cent|#162);",
        @"&(pound|#163);",
        @"&(copy|#169);",
        @"&#(\d+);",
        @"-->",
        @"<!--.*\n"
    };
    string[] aryRep = {
        "",
        "",
        "",
        "\"",
        "&",
        "<",
        ">",
        " ",
        "\xa1", // chr(161)
        "\xa2", // chr(162)
        "\xa3", // chr(163)
        "\xa9", // chr(169)
        "",
        "\r\n",
        ""
    };
    string strOutput = strHtml;
    for (int i = 0; i < aryReg.Length; i++)
    {
        Regex regex = new Regex(aryReg[i], RegexOptions.IgnoreCase);
        strOutput = regex.Replace(strOutput, aryRep[i]);
    }
    // Strings are immutable, so the result of Replace must be assigned back
    strOutput = strOutput.Replace("<", "");
    strOutput = strOutput.Replace(">", "");
    strOutput = strOutput.Replace("\r\n", "");
    return strOutput;
}
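To make the extraction step in part 3 concrete, here is a minimal sketch of the IndexOf plus Substring approach, next to a regular expression version for comparison. The class name TextExtractor and both method names are made up for this example, and the markers are whatever fixed strings surround the data on the target page. Both versions return null when a marker is missing.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: pull out the text between two known markers.
public static class TextExtractor
{
    // IndexOf/Substring version: text between the first 'start' marker
    // and the next 'end' marker after it.
    public static string ExtractBetween(string source, string start, string end)
    {
        int from = source.IndexOf(start, StringComparison.Ordinal);
        if (from < 0) return null;
        from += start.Length; // skip past the start marker itself

        int to = source.IndexOf(end, from, StringComparison.Ordinal);
        if (to < 0) return null;

        return source.Substring(from, to - from);
    }

    // Regex version: the markers are escaped so they match literally,
    // and the lazy group (.*?) captures the shortest text between them.
    public static string ExtractBetweenRegex(string source, string start, string end)
    {
        string pattern = Regex.Escape(start) + "(.*?)" + Regex.Escape(end);
        Match m = Regex.Match(source, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For a page containing <td class="price">42</td>, calling either method with the start marker <td class="price"> and the end marker </td> yields 42. RegexOptions.Singleline lets the captured text span line breaks, which matters when the data sits across several lines of HTML.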
4. After that, the data is stored in the database, which needs no explanation. However, when I write the data an exception occurs, saying that my field is too long to be written to the database. I am using Access; I will try SQL Server instead.
5. If you have any good suggestions, please share them so we can all make progress together.