This approach has a few flaws I haven't had time to refine, but it gets the result. Let's look at how to write this regular expression:
URL: http://news.szhome.com/83642.html
Content:
The code is as follows:
object></div>
</div>
<div class="share"><div class="linkshare" style="right: 0">
I want the code between these two tags. The END marker problem is solved, but frustratingly the START marker fails to match because there is a line break between the second DIV and the third DIV. I was stumped and didn't know how to handle this in a regular expression.
Even more frustrating, the page contains several repeated tags, and I'm not very familiar with regular expressions. My workaround is as follows:
The code is as follows:
// Needs: using System.Text; using System.Text.RegularExpressions;
MatchCollection mc = Regex.Matches(ghoPage.Trim(), @"(?<=<div class=['""]txtmsg['""]>)[\s\S]*?(?=<div class=['""]share['""]><div class=)", RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
foreach (Match mm in mc)
{
    // Drop the first 1933 characters (the two Flash ad DIVs) and keep the rest of this match.
    sb.Append(mm.Value.Substring(1933));
}
I worked out that the two Flash ad DIVs are 1933 characters long wherever they appear, so I cut that many characters off the front of each match and got the text I wanted. The drawback is that if the site ever changes the length of those two Flash ad DIVs, the extracted data will no longer be complete. If you're interested, take a look at how to handle this DIV regex problem with line breaks.
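One common way to make a pattern survive line breaks is to allow optional whitespace (`\s*`) between adjacent tags instead of requiring them to be written back to back. The sketch below is a hedged illustration, not the author's code: the helper name `ExtractBetweenDivs` and the sample markup are hypothetical, and it assumes the page uses the `txtmsg` and `share` class names from the pattern above. It captures the text in a named group rather than using lookarounds. It does not remove the Flash ad markup, so a trim like the 1933-character one (or some other cleanup) would still be needed. It uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: return the text between <div class="txtmsg"> and
// <div class="share"><div class=..., tolerating whitespace (including
// line breaks) between the tags.
static List<string> ExtractBetweenDivs(string html)
{
    // \s* accepts spaces, tabs and newlines where the page may break lines;
    // [\s\S]*? lazily matches any characters, newlines included.
    var pattern = new Regex(
        @"<div\s+class\s*=\s*['""]txtmsg['""]\s*>(?<body>[\s\S]*?)<div\s+class\s*=\s*['""]share['""]\s*>\s*<div\s+class\s*=",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    var results = new List<string>();
    foreach (Match m in pattern.Matches(html))
    {
        results.Add(m.Groups["body"].Value);
    }
    return results;
}

// Hypothetical fragment with line breaks before and inside the share DIVs.
string sample =
    "<div class=\"txtmsg\"><p>News text</p>\r\n</div>\r\n" +
    "<div class=\"share\">\n  <div class=\"linkshare\" style=\"right: 0\">";

foreach (var body in ExtractBetweenDivs(sample))
{
    Console.WriteLine(body);
}
```

Because the end of the match is anchored on the `share` DIV rather than on a fixed character count, the extraction no longer depends on how the page happens to wrap its lines.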
The program uses a custom BUTTON control that blocks repeated clicks once it has been clicked, and then performs some checks. The approach works well and captures reliably every time. Because I don't use it often, I didn't write it as a Windows service, but a program like this could run as a Windows service with its rules written in an INI file. If the capture rules and regular expressions are also kept in the configuration file, capturing and recording can be fully automated.
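Keeping the capture rules in a configuration file could look roughly like the following. This is a minimal sketch under stated assumptions, not the author's implementation: the `name=pattern` line format, the helper names `LoadRules` and `ApplyRules`, and the sample rule are all hypothetical. The lines come from an in-memory array to keep the example self-contained; in practice they might be read with `File.ReadAllLines`. Hosting it as a Windows service is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical loader: parse INI-style "name=pattern" lines into compiled rules.
// Blank lines, ';' comments and [section] headers are skipped.
static Dictionary<string, Regex> LoadRules(IEnumerable<string> lines)
{
    var parsed = new Dictionary<string, Regex>(StringComparer.OrdinalIgnoreCase);
    foreach (var raw in lines)
    {
        var line = raw.Trim();
        if (line.Length == 0 || line.StartsWith(";") || line.StartsWith("[")) continue;
        int eq = line.IndexOf('=');
        if (eq <= 0) continue; // not a key=value line
        // Split on the first '=' only, so the pattern itself may contain '='.
        string name = line.Substring(0, eq).Trim();
        string pattern = line.Substring(eq + 1).Trim();
        parsed[name] = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
    return parsed;
}

// Apply every rule to a page; keep group 1 if the rule has one, else the whole match.
static Dictionary<string, List<string>> ApplyRules(Dictionary<string, Regex> rules, string html)
{
    var captured = new Dictionary<string, List<string>>();
    foreach (var rule in rules)
    {
        var values = new List<string>();
        foreach (Match m in rule.Value.Matches(html))
        {
            values.Add(m.Groups.Count > 1 ? m.Groups[1].Value : m.Value);
        }
        captured[rule.Key] = values;
    }
    return captured;
}

// Hypothetical rules file contents.
string[] ini =
{
    "; capture rules",
    "[szhome]",
    @"title=<title>([\s\S]*?)</title>",
};

var loadedRules = LoadRules(ini);
string page = "<html><title>Example</title></html>";
foreach (var entry in ApplyRules(loadedRules, page))
{
    Console.WriteLine($"{entry.Key}: {string.Join(" | ", entry.Value)}");
}
```

Changing what gets captured then means editing the rules file rather than recompiling, which is what makes unattended capture practical.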
The code is very short. Anyone interested in this kind of capture can try it.