This program automatically reads the information displayed on other web pages, much like a crawler. Suppose we have a system that needs to extract the song search rankings from the Baidu site. An analysis system then analyzes the collected data and provides reference data for the business. To meet this requirement, we need to simulate a browser visiting the web page, parse the page data, and finally write the parsed result, that is, the collated data, to the database. So the approach is:

1. Send an HttpRequest.
2. Receive the returned HttpResponse and get the HTML source of the target page.
3. Extract the part of the source that contains the data.
4. Generate an HtmlDocument from that HTML source and loop over it to extract the data.
5. Write the data to the database.

The procedure is as follows:

    // Get the HTML source of the page at the given URL
    private string GetWebContent(string url)
    {
        string strResult = "";
        try
        {
            // Declare an HttpWebRequest
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            // Set the connection timeout (in milliseconds)
            request.Timeout = 30000;
            request.Headers.Set("Pragma", "no-cache");
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();
            Stream streamReceive = response.GetResponseStream();
            Encoding encoding = Encoding.GetEncoding("GB2312");
            StreamReader streamReader = new StreamReader(streamReceive, encoding);
            strResult = streamReader.ReadToEnd();
        }
        catch
        {
            MessageBox.Show("error");
        }
        return strResult;
    }

To use HttpWebRequest and HttpWebResponse, add the namespace reference (Stream and StreamReader additionally need System.IO, and Encoding needs System.Text):

    using System.Net;

The following handler drives the whole process. The helpers SplitName, AddLine and InsertData and the DataTable dt are not shown in this article.

    private void button1_Click(object sender, EventArgs e)
    {
        // URL to crawl
        string url = "http://list.mp3.baidu.com/topso/mp3topsong.html?id=1#top2";
        // Get the source of the specified URL
        string strWebContent = GetWebContent(url);
        richTextBox1.Text = strWebContent;

        // Take out the part of the source that contains the data
        int iBodyStart = strWebContent.IndexOf("<body", 0);
        int iStart = strWebContent.IndexOf("Song TOP500", iBodyStart);
        int iTableStart = strWebContent.IndexOf("<table", iStart);
        int iTableEnd = strWebContent.IndexOf("</table>", iTableStart);
        // + 8 keeps the closing "</table>" tag (8 characters)
        string strWeb = strWebContent.Substring(iTableStart, iTableEnd - iTableStart + 8);

        // Generate the HtmlDocument
        WebBrowser webb = new WebBrowser();
        webb.Navigate("about:blank");
        HtmlDocument htmlDoc = webb.Document.OpenNew(true);
        htmlDoc.Write(strWeb);
        HtmlElementCollection htmlTR = htmlDoc.GetElementsByTagName("TR");

        // Each row holds three songs: cells 0/1, 2/3 and 4/5
        foreach (HtmlElement tr in htmlTR)
        {
            string strID = tr.GetElementsByTagName("TD")[0].InnerText;
            string strName = SplitName(tr.GetElementsByTagName("TD")[1].InnerText, "Musicname");
            string strSinger = SplitName(tr.GetElementsByTagName("TD")[1].InnerText, "Singer");
            strID = strID.Replace(".", "");
            // Insert into the DataTable
            AddLine(strID, strName, strSinger, "0");

            string strID1 = tr.GetElementsByTagName("TD")[2].InnerText;
            string strName1 = SplitName(tr.GetElementsByTagName("TD")[3].InnerText, "Musicname");
            string strSinger1 = SplitName(tr.GetElementsByTagName("TD")[3].InnerText, "Singer");
            strID1 = strID1.Replace(".", "");
            AddLine(strID1, strName1, strSinger1, "0");

            string strID2 = tr.GetElementsByTagName("TD")[4].InnerText;
            string strName2 = SplitName(tr.GetElementsByTagName("TD")[5].InnerText, "Musicname");
            string strSinger2 = SplitName(tr.GetElementsByTagName("TD")[5].InnerText, "Singer");
            strID2 = strID2.Replace(".", "");
            AddLine(strID2, strName2, strSinger2, "0");
        }

        // Insert into the database
        InsertData(dt);
        dataGridView1.DataSource = dt.DefaultView;
    }
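Step 3, cutting the data table out of the page source, depends on every IndexOf call finding its target. IndexOf returns -1 when the text is missing, and passing a negative position on to the next IndexOf or to Substring throws ArgumentOutOfRangeException, so a change in the page layout would crash the handler. The following is a minimal sketch of a guarded version. The class name HtmlSlice and the method name ExtractTableAfter are hypothetical and do not come from the original program.

```csharp
using System;

// Hypothetical helper, not part of the original program.
public static class HtmlSlice
{
    // Returns the first <table>...</table> block that follows `marker`,
    // or null when the marker or either table tag cannot be found.
    public static string ExtractTableAfter(string html, string marker)
    {
        if (string.IsNullOrEmpty(html) || string.IsNullOrEmpty(marker))
            return null;

        // Locate the text that precedes the table we want.
        int markerPos = html.IndexOf(marker, StringComparison.Ordinal);
        if (markerPos < 0) return null;

        // Find the opening tag after the marker (tag case may vary).
        int tableStart = html.IndexOf("<table", markerPos, StringComparison.OrdinalIgnoreCase);
        if (tableStart < 0) return null;

        // Find the first closing tag after the opening tag.
        const string closeTag = "</table>";
        int tableEnd = html.IndexOf(closeTag, tableStart, StringComparison.OrdinalIgnoreCase);
        if (tableEnd < 0) return null;

        // Include the closing tag itself in the result.
        return html.Substring(tableStart, tableEnd - tableStart + closeTag.Length);
    }
}
```

A caller would pass the downloaded page source and the heading text that precedes the table, then check the result for null before building the HtmlDocument. Like the original code, this sketch stops at the first closing tag, so it assumes the ranking table contains no nested tables.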