Parsing a web page with the IE WebControl to get specific data from a specific page:

Set oDocument = Form2.m_IE.Document
Set oelement = oDocument.Forms("searchdetail")
Set oListTableElement = oelement.children(0)

The advantage of children(0) is that it is simple. The disadvantage is that the code says nothing about how to read the information or what each node value actually means, so the extraction rules are hard to pull out into a flexible configuration file. Because this is a ChildNodes-by-ChildNodes traversal, the depth and meaning of each step cannot be configured flexibly.

I have also tried a regular expression written specifically for the HTML of one particular site, because the pages I want to process always contain some fixed code. I found that a single expression can parse a whole series of values into one MatchCollection. I used the tool "The Regulator" to test it. The results were as follows: the group "hiddentonenames" always yields the collection of XX names, in order; the group "hiddenspnames" always yields the collection of XX names, in order; and so on.

In C#, you can use the following code to read the value of each group:

foreach (Match match in matchCollection)
{
    // One Match per ringtone; each named group holds one field.
    Group groupToneNames = match.Groups["hiddentonenames"];
    Group groupSpNames = match.Groups["hiddenspnames"];
    Group groupSingers = match.Groups["hiddensingers"];
}

In this way, I can write a special regular expression for the portal style of each site but have them all output the same group names, so the code that reads each field of a ringtone stays fixed. Only the regular expression for each site needs to be modified.

In addition, I am very grateful to Wang Hui for his wonderful article "Web page crawling practices"!
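The idea of one regular expression per site with shared group names can be sketched roughly as follows. This is a minimal Python illustration, not the original code. Only the three group names come from the post; the site keys, the HTML fragments, and the patterns themselves are hypothetical. Note that Python writes named groups as (?P<name>...), whereas .NET uses (?<name>...).

```python
import re

# Hypothetical per-site patterns. Only the group names
# (hiddentonenames, hiddenspnames, hiddensingers) come from the post;
# the markup each pattern targets is invented for illustration.
SITE_PATTERNS = {
    "site_a": re.compile(
        r'<td class="tone">(?P<hiddentonenames>[^<]*)</td>\s*'
        r'<td class="sp">(?P<hiddenspnames>[^<]*)</td>\s*'
        r'<td class="singer">(?P<hiddensingers>[^<]*)</td>'
    ),
    "site_b": re.compile(
        r'<li data-singer="(?P<hiddensingers>[^"]*)" '
        r'data-sp="(?P<hiddenspnames>[^"]*)">(?P<hiddentonenames>[^<]*)</li>'
    ),
}

# The field list is fixed; it never changes when a new site is added.
FIELDS = ("hiddentonenames", "hiddenspnames", "hiddensingers")


def extract_ringtones(site, html):
    """Return one dict per match, keyed by the shared group names."""
    pattern = SITE_PATTERNS[site]
    return [{f: m.group(f) for f in FIELDS} for m in pattern.finditer(html)]


# Hypothetical sample pages, one per site layout.
SAMPLE_A = (
    '<tr><td class="tone">Song One</td> <td class="sp">SP-1</td> '
    '<td class="singer">Singer X</td></tr>'
    '<tr><td class="tone">Song Two</td> <td class="sp">SP-2</td> '
    '<td class="singer">Singer Y</td></tr>'
)
SAMPLE_B = '<ul><li data-singer="Singer X" data-sp="SP-1">Song One</li></ul>'

if __name__ == "__main__":
    for site, html in (("site_a", SAMPLE_A), ("site_b", SAMPLE_B)):
        for row in extract_ringtones(site, html):
            print(site, row)
```

Supporting another site in this sketch means adding one entry to SITE_PATTERNS; extract_ringtones and the code that consumes its output stay untouched, which is the point made above.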