I. A few words up front
I know that with MSHTML you can call get_links on IHTMLDocument2 to obtain an IHTMLElementCollection, take each IHTMLAnchorElement from that collection, and then call get_href on it to get every link on the page. But that is the MSHTML approach. Personally, I prefer to do things myself and dislike using code whose internals I cannot see (even though Microsoft writes better code than I do). So I wrapped up a class that obtains URLs by analyzing the page's markup. I know it has flaws, which is why I am publishing it: I hope someone will build a newer, better version on top of it.
II. About this class
Some will say: isn't extracting the URLs of a page just a matter of looking for href=...? Many things are easy to say, but doing them well is another matter, and you only find out how hard they are when you try. For example, some links take the form url=.... There are also links inside JavaScript (which I still have trouble analyzing), relative addresses, and so on. This class handles as many of these cases as I could manage, but my skill is limited and it is far from perfect.
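To make the "just look for href=" idea concrete, here is a minimal sketch of such a scanner. It is not the code of this class: Link, ToLower, IsSpace and ExtractAnchors are hypothetical names, and std::string stands in for CString so that the sketch compiles without MFC. It handles double-quoted, single-quoted and unquoted values, and nothing more.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for the article's HyperLink structure.
struct Link {
    std::string href;  // link address
    std::string text;  // link text
};

// Lowercase copy, used only for case-insensitive searching.
static std::string ToLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

static bool IsSpace(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Collects the href value and inner text of every <a ...>...</a> element.
std::vector<Link> ExtractAnchors(const std::string& html) {
    std::vector<Link> links;
    const std::string lower = ToLower(html);
    const std::size_t npos = std::string::npos;
    std::size_t pos = 0;
    while ((pos = lower.find("<a", pos)) != npos) {
        // Require whitespace after "<a" so tags such as <abbr> are skipped.
        if (pos + 2 >= lower.size() || !IsSpace(lower[pos + 2])) { pos += 2; continue; }
        const std::size_t tagEnd = lower.find('>', pos);
        if (tagEnd == npos) break;
        std::size_t v = lower.find("href", pos);
        if (v == npos || v > tagEnd) { pos = tagEnd + 1; continue; }
        v += 4;
        while (v < tagEnd && IsSpace(html[v])) ++v;   // allow "href = ..."
        if (html[v] != '=') { pos = tagEnd + 1; continue; }
        ++v;
        while (v < tagEnd && IsSpace(html[v])) ++v;

        Link link;
        std::size_t valueEnd;
        if (html[v] == '"' || html[v] == '\'') {
            // Quoted value: runs to the matching quote.
            valueEnd = html.find(html[v], v + 1);
            if (valueEnd == npos) break;
            link.href = html.substr(v + 1, valueEnd - v - 1);
        } else {
            // Unquoted value: runs to whitespace or the end of the tag.
            valueEnd = v;
            while (valueEnd < tagEnd && !IsSpace(html[valueEnd])) ++valueEnd;
            link.href = html.substr(v, valueEnd - v);
        }

        // The link text is whatever lies between the tag's '>' and "</a".
        const std::size_t gt = html.find('>', valueEnd);
        if (gt == npos) break;
        const std::size_t close = lower.find("</a", gt + 1);
        if (close == npos) break;
        link.text = html.substr(gt + 1, close - gt - 1);
        links.push_back(link);
        pos = close + 3;
    }
    return links;
}
```

Even this small version shows where the trouble starts: it ignores links built by scripts, keeps any nested tags inside the link text, and leaves relative addresses unresolved.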
III. The interface of this class
The class has a single interface function, its constructor, declared as follows:
CWebHost(const CString& m_str_webcode, /* page source code */
vector<HyperLink>& m_vec_URL, /* extracted URLs and their link titles */
CString& str_URL); /* URL of this page */
HyperLink is a structure that I defined in URLSturct_.h:
//URLSturct_.h
//hyperlink data structure
#ifndef _____HyperLinkTag_h____
#define _____HyperLinkTag_h____
//hyperlink record
typedef struct tagHyperLinkTag{
//link address;
CString str_Hyperlink;
//link text;
CString str_HyperlinkText;
}HyperLink;
#endif
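The constructor also receives the URL of the page itself, which is exactly what a relative address needs in order to be turned into an absolute one. The following is only a sketch of that resolution step under my own assumptions, not the class's actual code: SplitBase, NormalizePath and ResolveUrl are hypothetical names, std::string replaces CString, and only the common cases are covered (this is not a full RFC 3986 implementation, and schemes such as mailto: or javascript: are not recognized).

```cpp
#include <string>
#include <vector>

// Splits "scheme://host/path" into origin ("scheme://host") and path ("/path").
// Assumes base is an absolute URL.
static void SplitBase(const std::string& base, std::string& origin, std::string& path) {
    const std::size_t scheme = base.find("://");
    const std::size_t hostStart = (scheme == std::string::npos) ? 0 : scheme + 3;
    const std::size_t slash = base.find('/', hostStart);
    if (slash == std::string::npos) { origin = base; path = "/"; }
    else { origin = base.substr(0, slash); path = base.substr(slash); }
}

// Collapses "." and ".." segments of an absolute path.
static std::string NormalizePath(const std::string& path) {
    std::vector<std::string> parts;
    std::size_t i = 0;
    while (i <= path.size()) {
        std::size_t j = path.find('/', i);
        if (j == std::string::npos) j = path.size();
        const std::string seg = path.substr(i, j - i);
        if (seg == "..") { if (!parts.empty()) parts.pop_back(); }
        else if (!seg.empty() && seg != ".") parts.push_back(seg);
        i = j + 1;
    }
    std::string out;
    for (const std::string& p : parts) out += "/" + p;
    // Keep a trailing slash for directories, and never return an empty path.
    if (out.empty() || path.back() == '/') out += "/";
    return out;
}

// Resolves a link found on the page against the page's own URL.
std::string ResolveUrl(const std::string& base, const std::string& link) {
    if (link.empty()) return base;

    // Already absolute: "scheme://" appears before any '/', '?' or '#'.
    const std::size_t s = link.find("://");
    if (s != std::string::npos && link.find_first_of("/?#") == s + 1) return link;

    // Protocol-relative: reuse the scheme of the page.
    if (link.compare(0, 2, "//") == 0) {
        const std::size_t b = base.find("://");
        return (b == std::string::npos ? std::string("http") : base.substr(0, b)) + ":" + link;
    }

    std::string origin, basePath;
    SplitBase(base, origin, basePath);
    basePath = basePath.substr(0, basePath.find_first_of("?#"));  // drop the page's query

    // Keep the link's query or fragment out of the path normalization.
    const std::size_t q = link.find_first_of("?#");
    const std::string rel = link.substr(0, q);
    const std::string suffix = (q == std::string::npos) ? "" : link.substr(q);

    if (rel.empty()) return origin + basePath + suffix;            // "?x=1" or "#top"
    if (rel[0] == '/') return origin + NormalizePath(rel) + suffix; // root-relative
    const std::string dir = basePath.substr(0, basePath.rfind('/') + 1);
    return origin + NormalizePath(dir + rel) + suffix;              // document-relative
}
```

For a page at http://example.com/dir/page.html, the link ../img/a.gif resolves to http://example.com/img/a.gif and /root.html resolves to http://example.com/root.html.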
IV. List of functions in this class
Function name | Purpose
CWebHost(...); | Constructor
void Nwebcontent(...); |
void ongethtmlurl(...); | Gets the HTML URLs
void ongetjumpurl(...); | Gets the URL of a jump (redirect)
void onreturnframeurl(...); |
CString onconversionurl(...); |
void onAnalysejavascrript(...); | Returns the URLs found in JavaScript code
CString ongetlinktext(...); | Gets the link text of a URL
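As an illustration of the "jump" case, here is a minimal sketch that assumes the jump is a meta refresh tag of the form content="3; url=...", which is one place the url=... pattern shows up. ExtractJumpUrl and ToLower are hypothetical names and std::string replaces CString; this is not the body of any function in the table.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lowercase copy, used only for case-insensitive searching.
static std::string ToLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns the target of the first <meta http-equiv="refresh"> tag,
// or an empty string when the page contains no such tag.
std::string ExtractJumpUrl(const std::string& html) {
    const std::string lower = ToLower(html);
    std::size_t pos = 0;
    while ((pos = lower.find("<meta", pos)) != std::string::npos) {
        const std::size_t tagEnd = lower.find('>', pos);
        if (tagEnd == std::string::npos) break;
        const std::string tag = lower.substr(pos, tagEnd - pos);
        const std::size_t u = tag.find("url=");
        if (tag.find("refresh") != std::string::npos && u != std::string::npos) {
            std::size_t start = pos + u + 4;
            // Some pages wrap the URL in its own quotes inside content="...".
            if (html[start] == '"' || html[start] == '\'') ++start;
            // The URL ends at a quote, whitespace, or the end of the tag.
            std::size_t end = html.find_first_of("\"'> \t\r\n", start);
            if (end == std::string::npos) end = html.size();
            return html.substr(start, end - start);
        }
        pos = tagEnd + 1;
    }
    return "";
}
```

The returned address may itself be relative, so it still has to be resolved against the URL of the page.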
V. The processing flow of this class
VI. Detailed code
There is too much code to reproduce here in the space available, so please read it in the accompanying source code.
VII. A few closing words
This way of analyzing pages has its flaws. I hope you will raise plenty of issues, or simply write a new class of your own. My writing has been poor ever since I was a child, so if anything here is hard to follow, please don't take offense.
Compiled with VC 7.1 on Windows Server 2003.
Source code accompanying this article