C# image collection software with automatic page turning and automatic classification (an essential tool for collecting images)
A website administrator may want to download another site's full data to their own website, or save some of another site's content to their own server, extracting the relevant fields from that content and publishing them to their own website system. Sometimes the files related to a webpage, such as images and attachments, also need to be saved locally.
Image collection software can collect pictures of various formats from any website. It takes the pictures embedded in articles, news items, and posts, classifies them in sequence, and saves them to your computer. Images from all posts on any forum website can be collected locally, which makes it easy to filter out advertisements. It is an essential tool for websites, webmasters, and anyone who likes to collect beautiful pictures.
This article demonstrates how to use the C# WebBrowser control to automatically identify the next page, classify images by title, and automatically download images without duplicates, as shown in figure 1. The complete source code is provided in the accompanying code download.
Figure 1
Demo program structure
Create the demo program: in Visual Studio 2013, create a C# Windows Forms application and name it ImgSpider. The Controls folder contains the encapsulated Label and TextBox controls; the Core folder contains the base class used by the dictionary entity; the Entity folder contains the class for reading the configuration file dict.xml; the Helper folder contains DownLoadHelper for downloading images, HtmlParserHelper for parsing HTML, and XmlHelper for reading XML documents. The form file frmautobor.pdf is the program's operation interface. Figure 2 shows the overall structure of the demo program.
Program execution form file frmautobor.pdf
The form file frmautobor.pdf uses the following controls:
First, the WebBrowser Control
WebBrowser is a .NET control class that was added in .NET Framework 2.0. The WebBrowser class lets you navigate webpages inside a form. You can use the WebBrowser control to host webpages and other browser-enabled documents in Windows Forms applications. For example, you can use it to provide integrated HTML-based user help or web browsing capabilities in an application. You can also use it to add existing web-based controls to Windows Forms client applications.
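As a minimal sketch of hosting the control, the form below docks a WebBrowser and loads one page when the form opens. The class name BrowserHostForm and the URL are placeholders chosen for illustration; they are not taken from the ImgSpider source.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form, not part of ImgSpider: hosts a WebBrowser and loads one page.
public class BrowserHostForm : Form
{
    private readonly WebBrowser browser = new WebBrowser();

    public BrowserHostForm()
    {
        browser.Dock = DockStyle.Fill;
        browser.ScriptErrorsSuppressed = true; // hide script error dialogs raised by the page
        browser.DocumentCompleted += OnDocumentCompleted;
        Controls.Add(browser);

        // Placeholder URL: replace with the site you want to browse.
        Load += (sender, e) => browser.Navigate("http://example.com/");
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted also fires for frames; react only to the top-level document.
        if (e.Url != browser.Url) return;
        Text = browser.DocumentTitle; // show the page title in the window caption
    }

    [STAThread] // the WebBrowser control requires a single-threaded apartment
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new BrowserHostForm());
    }
}
```

The DocumentCompleted handler is the natural place to start reading the page, because browser.Document is only fully available once loading has finished.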
Second, the Timer Control
The Timer control raises its Tick event at a set interval, so that code can be executed periodically.
Third, the DataGridView Control
Binding data to the DataGridView control is simple and intuitive. In most cases, you only need to set the DataSource property.
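A short sketch of that binding follows. The row type DownloadRecord and its properties are assumptions made for the example, since the article does not list the columns the demo program displays.

```csharp
using System.ComponentModel;
using System.Windows.Forms;

// Hypothetical row type; the demo's actual columns are not shown in the article.
public class DownloadRecord
{
    public string Title { get; set; }
    public string ImageUrl { get; set; }
    public bool Downloaded { get; set; }
}

public static class GridBindingDemo
{
    // Creates a read-only grid bound to the given list of records.
    public static DataGridView CreateGrid(BindingList<DownloadRecord> records)
    {
        var grid = new DataGridView
        {
            Dock = DockStyle.Fill,
            AutoGenerateColumns = true, // build one column per public property
            ReadOnly = true
        };
        grid.DataSource = records; // setting DataSource is all the binding requires
        return grid;
    }
}
```

Using a BindingList rather than a plain List means that rows added to the list later appear in the grid without rebinding.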
Click "collect image".
This calls WebBrowser.Navigate. Note that the collection operation can continue only after the WebBrowser control has finished loading the page. Once the page has loaded, the pending collection tasks are executed at regular intervals by the Timer control.
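The load-then-poll pattern described above might look roughly like the following. This is a sketch under assumptions, not the demo's actual code: the class name TimedCollector, the URL queue, and the ProcessCurrentPage placeholder are all invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical sketch: navigate, wait for the page to finish loading,
// then let a Timer move on to the next pending URL.
public class TimedCollector
{
    private readonly WebBrowser browser;
    private readonly Timer timer = new Timer();
    private readonly Queue<string> pendingUrls = new Queue<string>();
    private bool pageReady;

    public TimedCollector(WebBrowser browser, int intervalMs)
    {
        this.browser = browser;
        timer.Interval = intervalMs;
        timer.Tick += OnTick;
        browser.DocumentCompleted += (s, e) =>
        {
            // Ignore frame loads; only the top-level document counts.
            if (e.Url == browser.Url) pageReady = true;
        };
    }

    public void Start(IEnumerable<string> urls)
    {
        foreach (var url in urls) pendingUrls.Enqueue(url);
        timer.Start();
        NavigateNext();
    }

    private void OnTick(object sender, EventArgs e)
    {
        if (!pageReady) return; // page still loading: check again on the next tick
        ProcessCurrentPage();
        NavigateNext();
    }

    private void NavigateNext()
    {
        if (pendingUrls.Count == 0) { timer.Stop(); return; }
        pageReady = false;
        browser.Navigate(pendingUrls.Dequeue());
    }

    private void ProcessCurrentPage()
    {
        // Placeholder for the real collection work on browser.Document.
        Console.WriteLine("Loaded: " + browser.Url);
    }
}
```

Because System.Windows.Forms.Timer raises Tick on the UI thread, the handler can touch the WebBrowser control directly without any cross-thread marshalling.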
Obtain the title URL of the current page and determine whether the page has already been downloaded.
Process the title URL to be downloaded by navigating to it in the WebBrowser control.
Obtain the URL of the image under the current title, and navigate to the next image URL through the WebBrowser control.
Download the images, decide whether to split them into folders by title, and update the Access database.
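The "already downloaded" check in the first of these steps can be illustrated with an in-memory set. Note that this is a stand-in: the demo program records its state in an Access database, whereas the hypothetical DownloadedPages class below keeps it only for the lifetime of the process.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the "already downloaded" check.
public class DownloadedPages
{
    private readonly HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);

    // Returns true the first time a URL is offered, false for repeats or empty input.
    public bool TryMarkDownloaded(string url)
    {
        if (string.IsNullOrWhiteSpace(url)) return false;
        return seen.Add(url.Trim());
    }

    // Returns true if the URL has been marked before.
    public bool IsDownloaded(string url)
    {
        return url != null && seen.Contains(url.Trim());
    }
}
```

A caller would skip navigation whenever TryMarkDownloaded returns false, which is what keeps images from being downloaded twice.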
The main method of the DownLoadHelper.cs file is:
Download images
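The actual DownLoadHelper.cs is in the accompanying download. As a rough sketch of what such a helper could do, the hypothetical ImageSaver class below builds a target path (optionally inside a folder named after the title) and downloads the file with WebClient, skipping files that already exist. WebClient is used on the assumption that the project targets .NET Framework, as the article's Visual Studio 2013 setup suggests.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper; not the real DownLoadHelper from the demo program.
public static class ImageSaver
{
    // Replaces characters that are not allowed in file names with '_'.
    public static string SanitizeName(string title)
    {
        foreach (char c in Path.GetInvalidFileNameChars())
            title = title.Replace(c, '_');
        return title.Trim();
    }

    // Builds root\title\file when splitting by title, otherwise root\file.
    public static string BuildTargetPath(string root, string title, string imageUrl, bool splitByTitle)
    {
        // LocalPath drops the query string, so "1.jpg?x=2" becomes "1.jpg".
        string fileName = Path.GetFileName(new Uri(imageUrl).LocalPath);
        string folder = splitByTitle ? Path.Combine(root, SanitizeName(title)) : root;
        return Path.Combine(folder, fileName);
    }

    // Downloads the image unless the target file already exists.
    // Returns true if a download took place.
    public static bool Download(string root, string title, string imageUrl, bool splitByTitle)
    {
        string target = BuildTargetPath(root, title, imageUrl, splitByTitle);
        if (File.Exists(target)) return false;
        Directory.CreateDirectory(Path.GetDirectoryName(target));
        using (var client = new WebClient())
        {
            client.DownloadFile(imageUrl, target);
        }
        return true;
    }
}
```

Keeping the path logic in BuildTargetPath, separate from the network call, makes the "split by title" decision easy to check without downloading anything.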
Main methods of the HtmlParserHelper.cs file
First, get the current image URL and the next image URL
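How the image URLs are found depends on the markup of the target site, and the real HtmlParserHelper.cs may work differently. As one possible sketch, assuming the images appear as ordinary img tags, the hypothetical ImageUrlParser below pulls every src value out of the HTML with a regular expression and resolves it against the page URL.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical parser; the real HtmlParserHelper.cs may work differently.
public static class ImageUrlParser
{
    // Matches the src attribute of an <img> tag (but not data-src and similar).
    private static readonly Regex ImgSrc = new Regex(
        "<img[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns absolute image URLs in document order, without duplicates.
    public static List<string> GetImageUrls(string html, string pageUrl)
    {
        var baseUri = new Uri(pageUrl);
        var result = new List<string>();
        var seen = new HashSet<string>();
        foreach (Match m in ImgSrc.Matches(html))
        {
            Uri absolute;
            // Resolve relative paths such as "thumbs/b.png" against the page URL.
            if (Uri.TryCreate(baseUri, m.Groups[1].Value, out absolute)
                && seen.Add(absolute.AbsoluteUri))
            {
                result.Add(absolute.AbsoluteUri);
            }
        }
        return result;
    }
}
```

A regular expression is enough for a sketch like this; inside the form itself, the WebBrowser control's Document.Images collection is an alternative that avoids parsing HTML text by hand.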
Second, get the current page URL and next page URL
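Next-page detection is also site-specific. A simple heuristic, shown here only as an assumption-laden sketch, is to look for the first link whose visible text contains a marker such as "Next". The class name NextPageParser and the nextText parameter are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical next-page finder; assumes the link text identifies the next page.
public static class NextPageParser
{
    // Captures the href value (group 1) and the inner HTML (group 2) of each <a> tag.
    private static readonly Regex Anchor = new Regex(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"'][^>]*>(.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the absolute URL of the first link whose text contains nextText,
    // or null if no such link exists.
    public static string GetNextPageUrl(string html, string pageUrl, string nextText)
    {
        var baseUri = new Uri(pageUrl);
        foreach (Match m in Anchor.Matches(html))
        {
            // Strip nested tags such as <b> so only the visible text is compared.
            string text = Regex.Replace(m.Groups[2].Value, "<[^>]+>", "").Trim();
            if (text.IndexOf(nextText, StringComparison.OrdinalIgnoreCase) < 0) continue;

            Uri absolute;
            if (Uri.TryCreate(baseUri, m.Groups[1].Value, out absolute))
                return absolute.AbsoluteUri;
        }
        return null;
    }
}
```

When the method returns null, the collector has reached the last page and can stop turning pages.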
End
This article demonstrated how to use the C# WebBrowser control to build image collection software that automatically turns pages and automatically classifies images (an essential tool for collecting beautiful pictures), as shown in figure 1. The complete source code is provided in the accompanying code download.