ASP.NET tips: an introduction to data collection programs
First, a concept: a so-called data collection program is what is also known as a web page thief program, that is, a scraper (don't scold me). I have written one, and I hope we can study it together.
ASP.NET tips: data collection program, step 1, logging in
Everything starts with downloading the data. Some websites only show the data after you log in, so we have to send a username and password. In my case the login itself worked, but the server was no pushover: it redirected me and generated two sessions along the way, and I did not know how to capture the second session in code. So I took a shortcut ^-^: I captured the session cookie with a packet sniffer called Ethereal (now known as Wireshark) and added it to the headers of the HTTP request with the following code.
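- WebClient myWebClient = new WebClient();
- // The captured values are read from two text boxes on the form
- string sessionkey = textBox78.Text;   // session cookie string captured with the sniffer
- string refererurl = textBox77.Text;   // page the request claims to come from
- myWebClient.Headers.Clear();
- myWebClient.Headers.Add("Cookie", sessionkey);
- myWebClient.Headers.Add("Referer", refererurl);
- // Present the request as coming from an ordinary browser
- myWebClient.Headers.Add("User-agent", "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031107 Debian/1.5-3");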
This way the server is fooled into treating the request as part of the logged-in browser session, haha.
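For the part I could not work out, capturing the second session in code, one approach that may work is to let HttpWebRequest collect the cookies itself through a CookieContainer while it follows the redirect. The following is only a minimal sketch under assumptions: the class and method names (LoginSketch, BuildLoginRequest, Login) and the form field names username and password are hypothetical, and a real site may use different fields or extra hidden inputs.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class LoginSketch
{
    // Builds a POST request whose cookies are stored in the given container.
    // With AllowAutoRedirect enabled, cookies set during redirects are kept too.
    public static HttpWebRequest BuildLoginRequest(string loginUrl, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(loginUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookies;
        request.AllowAutoRedirect = true;
        return request;
    }

    // Sends the login form and returns the response body.
    // "username" and "password" are assumed field names, not taken from any real site.
    public static string Login(string loginUrl, string user, string password, CookieContainer cookies)
    {
        HttpWebRequest request = BuildLoginRequest(loginUrl, cookies);
        byte[] body = Encoding.UTF8.GetBytes(
            "username=" + Uri.EscapeDataString(user) +
            "&password=" + Uri.EscapeDataString(password));
        request.ContentLength = body.Length;

        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

After Login returns, the same CookieContainer can be attached to later HttpWebRequest objects for the data pages, so the session cookies would not need to be copied by hand from a sniffer.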
ASP.NET tips: data collection program, step 2, downloading the page
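- // remoteUri is the address of the page to collect
- byte[] myDataBuffer = myWebClient.DownloadData(remoteUri);
- // Encoding.Default may not match the page's charset; if the text comes out garbled, try Encoding.UTF8 or the charset the page declares
- download = Encoding.Default.GetString(myDataBuffer);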
ASP.NET tips: data collection program, step 3, matching the data
I read the downloaded data into a string and then use IndexOf to find the positions of two marker strings, taking the text between them. I know this is crude, but I find regular expressions hard to use (advice is welcome). After extracting the string, I use the following function to strip the HTML:
- private string StripHTML(string strHtml)
- {
- string [] aryReg ={
- @"<script[^>]*?>[\s\S]*?</script>", // [\s\S] so that scripts spanning several lines are matched as well
- @"<(\/\s*)?!?((\w+:)?\w+)(\w+(\s*=?\s*(([""'])(\\[""'tbnr]|[^\7])*?\7|\w+)|.{0})|\s)*?(\/\s*)?>",
- @"([\r\n])[\s]+",
- @"&(quot|#34);",
- @"&(amp|#38);",
- @"&(lt|#60);",
- @"&(gt|#62);",
- @"&(nbsp|#160);",
- @"&(iexcl|#161);",
- @"&(cent|#162);",
- @"&(pound|#163);",
- @"&(copy|#169);",
- @"&#(\d+);",
- @"-->",
- @"<!--.*\n"
- };
-
- string [] aryRep = {
- "",
- "",
- "",
- "\"",
- "&",
- "<",
- ">",
- " ",
- "\xa1",//chr(161),
- "\xa2",//chr(162),
- "\xa3",//chr(163),
- "\xa9",//chr(169),
- "",
- "\r\n",
- ""
- };
-
- string strOutput = strHtml;
- for(int i = 0;i<aryReg.Length;i++)
- {
- Regex regex = new Regex(aryReg[i],RegexOptions.IgnoreCase );
- strOutput = regex.Replace(strOutput,aryRep[i]);
-
- }
-
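- // Strings are immutable: Replace returns a new string, so the result must be assigned back
- strOutput = strOutput.Replace("<", "");
- strOutput = strOutput.Replace(">", "");
- strOutput = strOutput.Replace("\r\n", "");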
-
-
- return strOutput;
- }
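The IndexOf extraction described at the start of this step, and a regular-expression version of it, can be sketched roughly as follows. This is not my original code: the class name Extractor, the method names, and the idea of passing the two markers as parameters are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Extractor
{
    // IndexOf version: returns the text between the first 'start' marker
    // and the next 'end' marker, or null if either marker is missing.
    public static string Between(string html, string start, string end)
    {
        int from = html.IndexOf(start, StringComparison.Ordinal);
        if (from < 0) return null;
        from += start.Length;

        int to = html.IndexOf(end, from, StringComparison.Ordinal);
        if (to < 0) return null;

        return html.Substring(from, to - from);
    }

    // Regex version: the markers are escaped so they are matched literally,
    // and Singleline lets '.' match line breaks inside the captured text.
    public static string BetweenRegex(string html, string start, string end)
    {
        string pattern = Regex.Escape(start) + "(.*?)" + Regex.Escape(end);
        Match match = Regex.Match(html, pattern, RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Both methods take the first match only. For pages with many records, Regex.Matches with the same pattern would return every occurrence, which is probably where the regular expression starts to pay off over IndexOf.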
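After that, the data is stored in the database, which needs no explanation. However, when I wrote the data an exception occurred, saying that my field was too long to be written to the database. I am using Access (a Text field there holds at most 255 characters, so this may be the limit I hit); I will try SQL Server next.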
That is all for this introduction to the ASP.NET data collection program. I hope it helps you write your own data collection programs with ASP.NET.