C# HTML parsing example: the case of the stars


[Preface]

After moving from CSDN to cnblogs for a while, I found that cnblogs has a feature similar to one on CSDN: Flash, short microblog-style posts. Flash uses a lucky-star mechanism, which has led plenty of people with time on their hands to post over and over just to farm stars... I don't know what the stars are good for, but I tried a few times out of boredom. Since lucky stars are handed out at random, I had an idea: keep posting, and delete any post that doesn't get a star so it doesn't look like spamming. Doing this by hand is tedious, so I decided to write code for it, which sets out the requirements: fetch the HTML, parse it, log in by submitting the form automatically, publish automatically, check whether a post got a star, delete it if not, and so on.

[Solution] WebBrowser

Because I just wanted to play around and didn't care about complexity or polish, my first thought was to use the WebBrowser control, grab the HTML, and parse it by hand.

Step 1: Form filling

First, place a WebBrowser control on the form and, for convenience, set its Url property directly to http://passport.cnblogs.com/login.aspx.

Then open http://passport.cnblogs.com/login.aspx and view the page source to find the input fields of the login box.

   

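    // Inside wbBlog_DocumentCompleted, while the login page is showing.
    // The ids "tbUserName", "tbPassword" and "btnLogin" are assumptions; take the real
    // ones from the login page source. userName and password are your own credentials.
    HtmlDocument doc = wbBlog.Document;
    HtmlElement em = doc.GetElementById("tbUserName");
    em.SetAttribute("value", userName);
    em = doc.GetElementById("tbPassword");
    em.SetAttribute("value", password);
    em = doc.GetElementById("btnLogin");
    em.InvokeMember("click");   // submit the login form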

If the login succeeds, the page redirects to the home page, and the program then needs to navigate to the Flash URL. Checking the current URL also has to happen in the DocumentCompleted event, because we must wait until the page has finished loading before deciding.

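    // Also in wbBlog_DocumentCompleted: landing on the home page means the login succeeded.
    // HomePageUrl and FlashPageUrl are placeholders for the real addresses.
    isLogIn = (wbBlog.Url.ToString() == HomePageUrl);
    if (isLogIn)
        wbBlog.Navigate(FlashPageUrl);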

At this point you may notice that a lot has to happen in the DocumentCompleted event. Could the pieces conflict, or run more than once? We need some flags to control this. In the code above, the isLogIn variable decides whether DocumentCompleted performs the login check or fills in the form.
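The same idea can be taken one step further by replacing the separate booleans with a single stage value, so that each DocumentCompleted call does exactly one thing. The following is a minimal sketch of that dispatch logic, kept apart from the WebBrowser so it runs on its own; the stage names, the `NextAction` function and the action strings are hypothetical, and a real handler would call the form-filling, navigation, publishing and star-checking code instead of returning a name.

```csharp
using System;

// Hypothetical stage names that replace the isLogIn / isPublish booleans.
const string LoggingIn = "LoggingIn";
const string ReadyToPublish = "ReadyToPublish";
const string Published = "Published";

// Decide what one DocumentCompleted call should do, given the current stage and URL.
string NextAction(string stage, string currentUrl, string homePageUrl) => stage switch
{
    // Landing on the home page while logging in means the login succeeded.
    LoggingIn when currentUrl == homePageUrl => "NavigateToFlash",
    // Otherwise we are still on the login page: fill in and submit the form.
    LoggingIn => "FillLoginForm",
    // The Flash page has loaded after login: publish a post.
    ReadyToPublish => "PublishFlash",
    // The page refreshed after publishing: look for the star.
    Published => "CheckForStar",
    // Any other stage: do nothing on this event.
    _ => "Ignore",
};

Console.WriteLine(NextAction(LoggingIn, "http://example.com/", "http://example.com/"));
```

Because only one stage is active at a time, the login check and the form filling can no longer both run on the same event.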

Step 2: Publish a Flash post

After a successful login, the WebBrowser goes to the Flash page. Now the program has to publish Flash posts automatically; the principle is again filling in a form and submitting it. In the page source, the relevant element is the Flash input box, which shows the prompt "What are you doing? What are you thinking?".

The program needs to keep filling in and submitting the form, then check whether the post got a star; once it does, the loop exits.

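    // Fill in the Flash box and click Publish. FlashInputId and PublishButtonId are
    // placeholders for the ids in the Flash page source; content is a field holding
    // the text to publish, kept for the star check in Step 3.
    HtmlDocument doc = wbBlog.Document;
    HtmlElement em = doc.GetElementById(FlashInputId);
    em.SetAttribute("value", content);
    em = doc.GetElementById(PublishButtonId);
    em.InvokeMember("click");
    isPublish = true;   // the next DocumentCompleted should run the star check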

Submitting the form refreshes the page, so the star check also has to run in the DocumentCompleted event, and an isPublish flag indicates whether that check should run.

Step 3: Determine whether a star exists

By analyzing the source of the Flash page, we can see that each Flash post is rendered like this:

Author: content   Reply   31 minutes ago

Therefore, first get all divs whose id has the form feed_content_***, then check whether the element contains the content just published. If it does, lock onto that element and check whether it contains the lucky-star image.

If it does, show a message saying the Flash post was published with a star. If not, delete the post and publish a new one. Note that a newly published Flash post has no delete option; you have to refresh the page before the option appears, and the same applies to checking for a star.

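    // Runs in DocumentCompleted when isPublish is true.
    // StarMarkup (the lucky-star markup from the page source) and DeleteFlash (Step 4)
    // are placeholder names.
    HtmlDocument doc = wbBlog.Document;
    foreach (HtmlElement em in doc.GetElementsByTagName("div"))
    {
        string str = em.Id;
        // Flash posts live in divs whose id looks like feed_content_***.
        if (!string.IsNullOrEmpty(str) && str.Contains("feed_content_") && em.OuterHtml.Contains(content))
        {
            if (em.OuterHtml.Contains(StarMarkup))
                MessageBox.Show("Published with a star: " + content);
            else
                DeleteFlash(em);
            break;
        }
    }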
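The matching rule of this step (an id starting with feed_content_, then the text just published, then the star) can be separated from the WebBrowser types and tested on plain strings. A minimal sketch under that assumption: `CheckPost` and its "Star" / "NoStar" / "NotFound" results are hypothetical, each div is passed as an (id, OuterHtml) pair, and the star markup is a parameter because its exact form has to be taken from the page source.

```csharp
using System;
using System.Collections.Generic;

// Returns "Star" if the post containing our content also contains the star markup,
// "NoStar" if the post is there without it, or "NotFound" if no matching post is on the page.
string CheckPost(IEnumerable<(string Id, string OuterHtml)> divs, string content, string starMarkup)
{
    foreach (var (id, outerHtml) in divs)
    {
        // Only Flash post containers, whose ids look like feed_content_***.
        if (string.IsNullOrEmpty(id) || !id.StartsWith("feed_content_", StringComparison.Ordinal))
            continue;
        // Lock onto the post that contains the text we just published.
        if (!outerHtml.Contains(content, StringComparison.Ordinal))
            continue;
        return outerHtml.Contains(starMarkup, StringComparison.Ordinal) ? "Star" : "NoStar";
    }
    return "NotFound";
}

// Made-up sample divs; the class="star" marker is illustrative only.
var divs = new List<(string, string)>
{
    ("feed_content_1", "<div>someone else's post</div>"),
    ("feed_content_2", "<div>test 42 <img class=\"star\"></div>"),
};
Console.WriteLine(CheckPost(divs, "test 42", "class=\"star\""));
```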

Step 4: Delete the Flash post

This is where I got stuck. Having got the div whose id is feed_content_***, I wanted the delete link inside it, but that <a> element has no id, so how do we get it? It looked like a regular expression would be needed. Being lazy, I instead got all the <a> elements and checked whether the title attribute is "delete this flash" to find the link.
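The regular-expression route mentioned above, which I skipped, could look roughly like this. It is a minimal sketch with a hypothetical helper `FindLinkTagByTitle`: it scans the `<a ...>` start tags in a fragment of HTML and returns the first one whose `title` attribute equals the given text. It assumes double-quoted attribute values and is more fragile than the DOM lookup the program uses.

```csharp
using System;
using System.Text.RegularExpressions;
#nullable enable

// Return the first <a ...> start tag whose title attribute equals `title`, or null.
// Assumes attribute values are enclosed in double quotes.
string? FindLinkTagByTitle(string html, string title)
{
    // <a\b avoids matching tags such as <abbr>.
    foreach (Match tag in Regex.Matches(html, "<a\\b[^>]*>", RegexOptions.IgnoreCase))
    {
        Match t = Regex.Match(tag.Value, "\\btitle\\s*=\\s*\"([^\"]*)\"", RegexOptions.IgnoreCase);
        if (t.Success && t.Groups[1].Value == title)
            return tag.Value;
    }
    return null;
}

// A made-up post fragment with a reply link and a delete link.
string div = "<div id=\"feed_content_7\">hi <a href=\"#\" title=\"reply\">reply</a> " +
             "<a href=\"javascript:void(0)\" title=\"delete this flash\">x</a></div>";
Console.WriteLine(FindLinkTagByTitle(div, "delete this flash"));
```

The returned tag still has to be turned into something clickable, which is why the program sticks with `HtmlElement` objects.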

 

Once we have the link element, we can trigger its click. However, there is a new problem: clicking Delete pops up a confirm dialog, which brings back an old question, namely how to suppress the confirm and alert dialogs a web page shows. Here we use a crude method: make every confirm() call on the page return true automatically. First, add references to Microsoft.mshtml and Interop.SHDocVw. The code is as follows:

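    // em is the feed_content_*** div from Step 3. DeleteLinkTitle holds the link's
    // title text ("delete this flash").
    HtmlElementCollection hrefs = em.GetElementsByTagName("a");
    foreach (HtmlElement h in hrefs)
    {
        // The delete link has no id, so match it by its title attribute.
        if (h.GetAttribute("title") == DeleteLinkTitle)
        {
            // Redefine confirm() on the page so the delete confirmation returns true silently.
            var browser = (SHDocVw.WebBrowser)wbBlog.ActiveXInstance;
            var vDocument = (mshtml.IHTMLDocument2)browser.Document;
            vDocument.parentWindow.execScript("function confirm(str){return true;}", "javascript");
            h.InvokeMember("click");
            break;
        }
    }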

At this point the basic functions are implemented. Next, publishing, checking, and deleting need to run in a loop. Note that refreshing the Flash list and deleting a post both update part of a div through Ajax, so the WebBrowser does not raise the DocumentCompleted event. Instead, we can write a DoEvents method along the lines of WinForms' Application.DoEvents, call it, and Sleep for a while; after that, the refreshed page content can be read.

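    // DoEvents built on a WPF DispatcherFrame (needs a reference to WindowsBase and
    // using System.Windows.Threading). It processes pending messages, including the
    // WebBrowser's Ajax updates, and then returns. Call it together with Thread.Sleep
    // until the refreshed content appears.
    public static void DoEvents()
    {
        DispatcherFrame frame = new DispatcherFrame();
        Dispatcher.CurrentDispatcher.BeginInvoke(DispatcherPriority.Background,
            new DispatcherOperationCallback(ExitFrame), frame);
        Dispatcher.PushFrame(frame);
    }

    private static object ExitFrame(object f)
    {
        ((DispatcherFrame)f).Continue = false;   // stop the nested message loop
        return null;
    }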
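The "Sleep and DoEvents" idea can be wrapped in a small polling helper that waits for the Ajax refresh instead of sleeping for a fixed time. A minimal sketch, assuming a hypothetical `WaitUntil` helper: in the program, `condition` might check whether the page text contains the post just published, and `pump` would be the DoEvents method; the demo below substitutes a counter so it runs without a browser.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Poll until condition() returns true or the timeout elapses; returns whether it succeeded.
// pump runs between sleeps so the UI thread keeps processing messages; sleeping alone
// on the UI thread would block the page's Ajax update.
bool WaitUntil(Func<bool> condition, TimeSpan timeout, TimeSpan interval, Action pump)
{
    var clock = Stopwatch.StartNew();
    while (!condition())
    {
        if (clock.Elapsed >= timeout)
            return false;
        pump();                 // let pending messages (e.g. Ajax updates) run
        Thread.Sleep(interval); // give the page a moment before checking again
    }
    return true;
}

// Stand-alone demo: the "page" counts as refreshed after three pump calls.
int pumps = 0;
bool ready = WaitUntil(() => pumps >= 3, TimeSpan.FromSeconds(5), TimeSpan.FromMilliseconds(10), () => pumps++);
Console.WriteLine($"ready={ready}, pumps={pumps}");
```

Returning false on timeout lets the loop give up on a post instead of hanging when the refresh never arrives.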

[End]

At this point the program was written and running, and it kept publishing and deleting posts as intended, but there was a problem I hadn't foreseen. Although stars on cnblogs are awarded at random, posts with the same content, or posts published too close together, are excluded. Flash also has two limits: a single user can have only five posts on the same page, and there is a daily cap on how many Flash posts a user can publish. I don't know the exact number; the limit message appeared after the program had cycled through a few hundred posts. So although the program works, it didn't achieve what I set out to do, which is disappointing. Another approach would be to publish five different Flash posts at a time (a few seconds apart), then check which ones got stars, delete those without and keep those with; that should comply with the rules. Of course, coming up with that much throwaway content is a chore, and I haven't implemented this part. In any case, the main point of this article is HTML parsing, isn't it?
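The five-posts-at-a-time idea was not implemented. Here is a minimal sketch of its planning side, under stated assumptions: `PlanBatch` and `Partition` are hypothetical helpers, the batch of five comes from the per-page limit described above, and the 10-second gap in the demo is only an example of "a few seconds". Publishing and deleting would still be done by the code from Steps 2 and 4.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Give each post in a batch its own publish time, `gap` apart, and reject duplicate
// contents, because identical or closely spaced posts are excluded from stars.
List<(string Content, DateTime PublishAt)> PlanBatch(IReadOnlyList<string> contents, DateTime start, TimeSpan gap)
{
    if (contents.Distinct().Count() != contents.Count)
        throw new ArgumentException("Post contents must be distinct.", nameof(contents));
    return contents.Select((c, i) => (Content: c, PublishAt: start + gap * i)).ToList();
}

// After the batch is published and the page refreshed:
// keep the posts that got a star and collect the rest for deletion.
(List<string> Keep, List<string> Delete) Partition(IEnumerable<(string Content, bool HasStar)> results)
{
    var list = results.ToList();
    var starred = list.Where(r => r.HasStar).Select(r => r.Content).ToList();
    var others = list.Where(r => !r.HasStar).Select(r => r.Content).ToList();
    return (starred, others);
}

var plan = PlanBatch(new[] { "post A", "post B", "post C", "post D", "post E" },
                     new DateTime(2012, 1, 1, 12, 0, 0), TimeSpan.FromSeconds(10));
foreach (var p in plan)
    Console.WriteLine($"{p.PublishAt:HH:mm:ss}  {p.Content}");
```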

Looking back, if you really wanted such a tool, parsing HTML through the WebBrowser makes the program hard to maintain and slow to run. For HTML parsing I recommend the Html Agility Pack; and instead of driving the page, you can capture the traffic with a web development tool, analyze the requests, and then send the Ajax requests directly. For example, the earlier code that finds the Flash divs and their content and checks for stars is convoluted; with the Html Agility Pack, an XPath expression gets it all done easily.

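    // Requires the HtmlAgilityPack library. FlashPageUrl and StarXPath (a relative XPath
    // to the star image) are placeholders; the page is assumed to be UTF-8.
    string pageUrl = FlashPageUrl;
    using (WebClient wc = new WebClient())
    {
        byte[] pageSourceBytes = wc.DownloadData(pageUrl);
        string pageSource = Encoding.UTF8.GetString(pageSourceBytes);
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(pageSource);
        // Every Flash post is a div whose id starts with feed_content_.
        string xpath = "//div[starts-with(@id, 'feed_content_')]";
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);   // null if nothing matches
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes)
            {
                HtmlNode img = node.SelectSingleNode(StarXPath);
                if (img != null)
                    Console.WriteLine("Star: " + node.InnerText.Trim());
            }
        }
    }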
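Sending the requests directly replaces the WebBrowser with an HTTP client. The captured request is not shown in this article, so the following is only a sketch: the form field name `content`, the endpoint passed to the hypothetical `PublishAsync`, and the need to carry the login cookies are assumptions to check against the traffic you capture. It uses `HttpClient` and `FormUrlEncodedContent` from `System.Net.Http`.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Build a form-encoded body like the one the page's Ajax call would send.
// The field name "content" is hypothetical; use the names seen in the captured request.
HttpContent BuildPublishBody(string content) =>
    new FormUrlEncodedContent(new Dictionary<string, string> { ["content"] = content });

// POST the body to the captured endpoint. The HttpClient must carry the login
// cookies (for example through an HttpClientHandler with a CookieContainer).
async Task<string> PublishAsync(HttpClient client, Uri endpoint, string content)
{
    using HttpResponseMessage response = await client.PostAsync(endpoint, BuildPublishBody(content));
    response.EnsureSuccessStatusCode();
    return await response.Content.ReadAsStringAsync();
}

// Offline check of the encoded body; no request is sent here.
string body = await BuildPublishBody("hello world & more").ReadAsStringAsync();
Console.WriteLine(body);
```

The response of such a call can then be loaded into the Html Agility Pack and checked with the same XPath as above.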

The program's source code is attached. If you are interested, you can refactor it and implement the missing features.

