C# code for automatically capturing remote web page information

Source: Internet
Author: User

A program can automatically read the information displayed on other websites, much as a crawler does. For example, suppose we have a system that extracts the song rankings from the Baidu website; an analysis system then analyzes the obtained data and provides reference data for businesses.
To meet this requirement, we need to simulate a browser visiting the web page, get the page data for analysis, and finally write the analysis results, that is, the sorted data, into the database. The approach is:
1. Send an HttpWebRequest.
2. Receive the HttpWebResponse and obtain the HTML source of the target page.
3. Extract the part of the source code that contains the data.
4. Generate an HtmlDocument from that HTML source and loop over it to fetch the data.
5. Write the data to the database.
The code is as follows:

// Obtain the HTML source code of the web page at the given URL
private string GetWebContent(string url)
{
    string strResult = "";
    try
    {
        // Declare an HttpWebRequest
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        // Set the connection timeout (in milliseconds)
        request.Timeout = 30000;
        request.Headers.Set("Pragma", "no-cache");
        // The page is served as gb2312, so decode it with that encoding
        Encoding encoding = Encoding.GetEncoding("gb2312");
        // Dispose the response, stream, and reader once the page has been read
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream streamReceive = response.GetResponseStream())
        using (StreamReader streamReader = new StreamReader(streamReceive, encoding))
        {
            strResult = streamReader.ReadToEnd();
        }
    }
    catch
    {
        MessageBox.Show("error");
    }
    return strResult;
}
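To use HttpWebRequest and HttpWebResponse, you must reference the System.Net namespace. Stream and StreamReader come from System.IO, and Encoding comes from System.Text.
using System.Net;
using System.IO;
using System.Text;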
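GetWebContent always decodes the response as gb2312, so a page served in another encoding would come back garbled. As a minimal sketch, assuming .NET Core 3.0 or later (where legacy code pages such as gb2312 must be registered through CodePagesEncodingProvider), the hypothetical helper below picks the encoding from a Content-Type header value and falls back to gb2312. The names PageDecoder and Decode are illustrative and are not part of the original program.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original article.
public static class PageDecoder
{
    static PageDecoder()
    {
        // ASSUMPTION: running on .NET Core 3.0+ / .NET 5+, where gb2312 is
        // only available after the code-pages provider is registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes a response body using the charset named in a Content-Type
    // header value; falls back to gb2312 when no usable charset is present.
    public static string Decode(byte[] body, string contentType)
    {
        Encoding encoding = Encoding.GetEncoding("gb2312");
        const string key = "charset=";
        int pos = contentType == null
            ? -1
            : contentType.IndexOf(key, StringComparison.OrdinalIgnoreCase);
        if (pos >= 0)
        {
            // Take the text after "charset=" up to the next ';' and drop quotes.
            string name = contentType.Substring(pos + key.Length)
                                     .Split(';')[0].Trim().Trim('"');
            if (name.Length > 0)
            {
                try
                {
                    encoding = Encoding.GetEncoding(name);
                }
                catch (ArgumentException)
                {
                    // Unknown charset name: keep the gb2312 fallback.
                }
            }
        }
        return encoding.GetString(body);
    }
}
```

With an HttpWebResponse, the header value would come from response.ContentType and the bytes from the response stream. On the .NET Framework, gb2312 is built in and the registration step should not be needed.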

The event handler that drives the whole process is as follows:
private void button1_Click(object sender, EventArgs e)
{
    // The URL to be crawled
    string url = "http://list.mp3.baidu.com/topso/mp3topsong.html?id=1#top2";

    // Obtain the source code of the specified URL
    string strWebContent = GetWebContent(url);

    richTextBox1.Text = strWebContent;
    // Extract the part of the source code that contains the data:
    // find the marker text, then the first table that follows it
    int iBodyStart = strWebContent.IndexOf("<body", 0);
    int iStart = strWebContent.IndexOf("song top500", iBodyStart);
    int iTableStart = strWebContent.IndexOf("<table", iStart);
    int iTableEnd = strWebContent.IndexOf("</table>", iTableStart);
    // + 8 keeps the closing "</table>" tag (8 characters)
    string strWeb = strWebContent.Substring(iTableStart, iTableEnd - iTableStart + 8);

    // Generate an HtmlDocument from the extracted table
    WebBrowser webb = new WebBrowser();
    webb.Navigate("about:blank");
    HtmlDocument htmlDoc = webb.Document.OpenNew(true);
    htmlDoc.Write(strWeb);
    HtmlElementCollection htmlTR = htmlDoc.GetElementsByTagName("TR");
    // Each row holds three songs: cells 0-1, 2-3, and 4-5
    foreach (HtmlElement tr in htmlTR)
    {
        string strID = tr.GetElementsByTagName("TD")[0].InnerText;
        string strName = SplitName(tr.GetElementsByTagName("TD")[1].InnerText, "musicname");
        string strSinger = SplitName(tr.GetElementsByTagName("TD")[1].InnerText, "Singer");
        // Strip the trailing dot from the rank, e.g. "1." becomes "1"
        strID = strID.Replace(".", "");
        // Insert a row into the DataTable
        AddLine(strID, strName, strSinger, "0");

        string strID1 = tr.GetElementsByTagName("TD")[2].InnerText;
        string strName1 = SplitName(tr.GetElementsByTagName("TD")[3].InnerText, "musicname");
        string strSinger1 = SplitName(tr.GetElementsByTagName("TD")[3].InnerText, "Singer");
        // Insert a row into the DataTable
        strID1 = strID1.Replace(".", "");
        AddLine(strID1, strName1, strSinger1, "0");

        string strID2 = tr.GetElementsByTagName("TD")[4].InnerText;
        string strName2 = SplitName(tr.GetElementsByTagName("TD")[5].InnerText, "musicname");
        string strSinger2 = SplitName(tr.GetElementsByTagName("TD")[5].InnerText, "Singer");
        // Insert a row into the DataTable
        strID2 = strID2.Replace(".", "");
        AddLine(strID2, strName2, strSinger2, "0");

    }
    // Insert the collected rows into the database
    InsertData(dt);

    // Show the collected rows in the grid
    dataGridView1.DataSource = dt.DefaultView;
}
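The handler relies on SplitName, AddLine, and the DataTable dt, none of which are shown in the article, and its IndexOf chain throws if the marker or the table is missing from the page. The sketch below is one possible shape for those pieces, not the author's code. It assumes that a cell's text looks like "Title - Singer" and that dt has four string columns; the column names, the class name RankingHelpers, and the extra dt parameter on AddLine are all hypothetical, and the article does not say what the fourth value ("0") means.

```csharp
using System;
using System.Data;

// Hypothetical helpers, not part of the original article.
public static class RankingHelpers
{
    // Finds the first <table>...</table> block after `marker`.
    // Returns false instead of throwing when anything is missing.
    public static bool TryExtractTable(string html, string marker, out string table)
    {
        table = "";
        const string closeTag = "</table>";
        int markerPos = html.IndexOf(marker, StringComparison.Ordinal);
        if (markerPos < 0) return false;
        int start = html.IndexOf("<table", markerPos, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return false;
        int end = html.IndexOf(closeTag, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return false;
        table = html.Substring(start, end - start + closeTag.Length);
        return true;
    }

    // ASSUMPTION: cell text has the form "Title - Singer". The real page
    // layout may differ, so the separator would need to be adjusted.
    public static string SplitName(string cellText, string part)
    {
        if (cellText == null) return "";
        const string separator = " - ";
        int sep = cellText.IndexOf(separator, StringComparison.Ordinal);
        bool wantTitle = part == "musicname";
        if (sep < 0) return wantTitle ? cellText.Trim() : "";
        return wantTitle
            ? cellText.Substring(0, sep).Trim()
            : cellText.Substring(sep + separator.Length).Trim();
    }

    // ASSUMPTION: four string columns; the names are illustrative only.
    public static DataTable CreateTable()
    {
        DataTable dt = new DataTable("Songs");
        dt.Columns.Add("Id", typeof(string));
        dt.Columns.Add("Name", typeof(string));
        dt.Columns.Add("Singer", typeof(string));
        dt.Columns.Add("Flag", typeof(string));
        return dt;
    }

    // Appends one song to the table, in column order.
    public static void AddLine(DataTable dt, string id, string name, string singer, string flag)
    {
        dt.Rows.Add(id, name, singer, flag);
    }
}
```

Checking the result of TryExtractTable before building the HtmlDocument lets the handler report a changed page layout instead of failing with an ArgumentOutOfRangeException.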
