A class that extracts URLs and URL titles by parsing the tags in a Web page


I. A few words up front

I know that with MSHTML you can call IHTMLDocument2::get_links to obtain an IHTMLElementCollection, retrieve each IHTMLAnchorElement from that collection, and then call IHTMLAnchorElement::get_href to get every link on the page. But that is the MSHTML approach. Personally, I prefer to do things myself, and I dislike relying on components whose internals I cannot see (even though Microsoft writes better code than I do). So I wrapped up a class that obtains URLs by parsing the markup of the page. I know it has flaws, which is why I am publishing it: I hope someone will build a newer and better version on top of my class.

II. About this class

Some people may ask: isn't extracting the URLs of a page just a matter of parsing href=...? Many things are easy to say, but you only find out what is really involved when you try to do them well. For example, some links take the form url=.... Others are embedded in JavaScript (and I still have problems analyzing JavaScript), others are relative addresses, and so on. In this class I have handled as many of these cases as I could, but my ability is limited, so it is far from perfect.
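
To make the href=... and url=... cases concrete, here is a minimal sketch of scanning markup for an attribute key and collecting the value that follows it. It is not the CWebHost code: it uses std::string instead of CString so that it is self-contained, and the names ToLower and ExtractAttributeValues are hypothetical. It assumes a plain substring match on the key is acceptable, so it may also pick up matches inside scripts or comments.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: lowercase copy, so "HREF=" and "href=" match alike.
static std::string ToLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Collects the value that follows every occurrence of `key`
// (for example "href=" or "url="), whether the value is quoted or not.
std::vector<std::string> ExtractAttributeValues(const std::string& html,
                                                const std::string& key) {
    assert(!key.empty());  // an empty key would match at every position
    std::vector<std::string> values;
    const std::string lower = ToLower(html);  // same length as html
    const std::string lowerKey = ToLower(key);
    std::size_t pos = 0;
    while ((pos = lower.find(lowerKey, pos)) != std::string::npos) {
        std::size_t start = pos + lowerKey.size();
        pos = start;
        if (start >= html.size()) break;
        std::size_t end;
        const char quote = html[start];
        if (quote == '"' || quote == '\'') {
            ++start;                        // skip the opening quote
            end = html.find(quote, start);  // value runs to the matching quote
        } else {
            // Unquoted value: stop at whitespace, tag end, or a quote.
            end = html.find_first_of(" \t\r\n>\"'", start);
        }
        if (end == std::string::npos) end = html.size();
        if (end > start) values.push_back(html.substr(start, end - start));
        pos = end;
    }
    return values;
}
```

Calling ExtractAttributeValues(page, "href=") collects ordinary links, while calling it with "url=" picks up targets such as the one in a meta refresh tag. Links built inside JavaScript are not covered by this sketch.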

III. The interface of this class

The only interface function is the class constructor, declared as follows:

CWebHost(const CString& m_str_webcode, /* source code of the Web page */
     vector<HyperLink>& m_vec_URL, /* the URLs and URL titles obtained by the analysis */
     CString& str_URL); /* URL of this page */

HyperLink is a structure I defined in the URLSturct_.h header:

//URLSturct_.h
//Hyperlink data structure
//Note: CString requires MFC (<afx.h>) or ATL (<atlstr.h>) to be included first
#ifndef _____HyperLinkTag_h____
#define _____HyperLinkTag_h____
//Hyperlink record
typedef struct tagHyperLinkTag{
  //link address
  CString str_Hyperlink;
  //link text
  CString str_HyperlinkText;
}HyperLink;
#endif
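
The constructor also takes the URL of the page itself. One plausible use for it (an assumption, since the class internals are not shown here) is turning relative addresses into absolute ones before they are stored in a HyperLink. The sketch below illustrates that idea with std::string; PortableHyperLink, HasScheme and ResolveAgainstPage are hypothetical names, example.com is a placeholder host, and "." and ".." path segments are deliberately left unnormalized to keep it short.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Portable stand-in for HyperLink (std::string instead of CString).
struct PortableHyperLink {
    std::string address;  // link address
    std::string text;     // link text
};

// True if `url` starts with a scheme such as "http:" or "mailto:".
static bool HasScheme(const std::string& url) {
    const std::size_t colon = url.find(':');
    if (colon == std::string::npos || colon == 0) return false;
    if (!std::isalpha(static_cast<unsigned char>(url[0]))) return false;
    for (std::size_t i = 1; i < colon; ++i) {
        const unsigned char c = static_cast<unsigned char>(url[i]);
        if (!std::isalnum(c) && c != '+' && c != '-' && c != '.') return false;
    }
    return true;
}

// Resolves `link` against the absolute URL of the page it was found on.
std::string ResolveAgainstPage(const std::string& pageUrl,
                               const std::string& link) {
    const std::size_t schemeEnd = pageUrl.find("://");
    assert(schemeEnd != std::string::npos);  // pageUrl must be absolute
    if (link.empty()) return pageUrl;
    if (HasScheme(link)) return link;  // already absolute
    // Protocol-relative link: reuse the scheme of the page.
    if (link.compare(0, 2, "//") == 0)
        return pageUrl.substr(0, schemeEnd + 1) + link;
    // Fragment-only link: keep the page URL, replace its fragment.
    if (link[0] == '#') return pageUrl.substr(0, pageUrl.find('#')) + link;

    const std::size_t hostStart = schemeEnd + 3;
    // Drop the query and fragment of the page URL.
    const std::string base =
        pageUrl.substr(0, pageUrl.find_first_of("?#", hostStart));
    const std::size_t pathStart = base.find('/', hostStart);
    const std::string origin = base.substr(0, pathStart);  // scheme + host

    if (link[0] == '/') return origin + link;  // root-relative
    if (link[0] == '?') return base + link;    // query-only
    if (pathStart == std::string::npos) return origin + "/" + link;
    // Relative to the directory of the current page.
    return base.substr(0, base.rfind('/') + 1) + link;
}
```

For instance, resolving "a.html" against "http://example.com/docs/index.html" yields "http://example.com/docs/a.html", and resolving "/img/x.png" against the same page yields "http://example.com/img/x.png".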

IV. List of functions in this class

Function name                       Feature
CWebHost(...);                      Constructor
void Nwebcontent(...);
void ongethtmlurl(...);             Gets the HTML URLs
void ongetjumpurl(...);             Gets the URL of a jump (redirect)
void onreturnframeurl(...);
CString onconversionurl(...);
void onAnalysejavascrript(...);     Returns the URLs found in JavaScript code
CString ongetlinktext(...);         Gets the link text of a URL
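
As an illustration of the last entry, the link text of an anchor can be taken as the visible characters between the end of the opening tag and the closing </a>. The following is only a sketch of that idea, not the body of ongetlinktext: LinkTextAt is a hypothetical name, std::string stands in for CString, and the code assumes that no '>' character appears inside an attribute value of the opening tag.

```cpp
#include <cassert>
#include <string>

// Returns the visible text of the anchor whose "<a" starts at `anchorPos`,
// with nested tags removed and surrounding whitespace trimmed.
std::string LinkTextAt(const std::string& html, std::size_t anchorPos) {
    assert(anchorPos <= html.size());  // position must lie inside the page
    const std::size_t open = html.find('>', anchorPos);
    if (open == std::string::npos) return std::string();

    std::string text;
    bool inTag = false;
    for (std::size_t i = open + 1; i < html.size(); ++i) {
        // Stop at the closing tag of the anchor.
        if (html.compare(i, 4, "</a>") == 0 || html.compare(i, 4, "</A>") == 0)
            break;
        if (html[i] == '<') {
            inTag = true;   // entering a nested tag such as <b>
        } else if (html[i] == '>') {
            inTag = false;  // leaving the nested tag
        } else if (!inTag) {
            text += html[i];
        }
    }

    // Trim leading and trailing whitespace.
    const char* ws = " \t\r\n";
    const std::size_t first = text.find_first_not_of(ws);
    if (first == std::string::npos) return std::string();
    const std::size_t last = text.find_last_not_of(ws);
    return text.substr(first, last - first + 1);
}
```

The result can be paired with the address of the same anchor to fill one link record, address plus text.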

V. Processing flow of this class

VI. Detailed code

There is too much code to include here given the limited space, so please refer to the source code.

VII. A few more words

This kind of parsing has flaws. I hope you will raise plenty of issues, or simply write a new class yourself. My writing has been poor since childhood, so if anything here is hard to understand, please bear with me.

Compiled with VC 7.1 on Windows Server 2003

Supporting source code for this article
