Introduction
I have recently been working on a web crawler. To make full use of the crawler's throughput, I need asynchronous HTTP: both DNS resolution and downloading must be asynchronous. This component is built on WinHTTP. Otherwise you would have to build the wheel from the ground up (an IOCP-based asynchronous framework).
For background on WinHTTP, MSDN has many introductions.
Description
This component is easy to use. You only need to implement a few callbacks:
1. The HTTP header has been received
2. The entire HTTP content is complete
3. Read the HTTP body
4. The URL was redirected
5. Handle errors
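The five callbacks listed above could be modelled roughly as follows. This is a minimal, portable sketch, not the component's actual code: `response_callbacks` and `replay_response` are hypothetical names, the canned response stands in for what WinHTTP would deliver asynchronously, and treating a `false` return from the read callback as "stop reading" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical bundle of the five callbacks (names are illustrative only).
struct response_callbacks {
    std::function<void(const std::string& headers)> on_header;     // 1. header received
    std::function<void(bool ok)> on_complete;                       // 2. whole response finished
    std::function<bool(const char* buf, std::size_t len)> on_read;  // 3. body chunk; false = stop (assumed)
    std::function<void(const std::string& url)> on_redirect;        // 4. server redirected
    std::function<void(const std::string& msg)> on_error;           // 5. something failed
};

// Replays a canned response through the callbacks in the order a transfer
// would report it. Returns the number of body bytes delivered.
inline std::size_t replay_response(const response_callbacks& cb,
                                   const std::string& headers,
                                   const std::vector<std::string>& chunks) {
    if (!cb.on_header || !cb.on_read || !cb.on_complete) {
        if (cb.on_error) cb.on_error("missing callback");
        return 0;
    }
    cb.on_header(headers);
    std::size_t delivered = 0;
    for (const std::string& chunk : chunks) {
        if (!cb.on_read(chunk.data(), chunk.size())) {
            cb.on_complete(false);  // the reader asked to stop early
            return delivered;
        }
        delivered += chunk.size();
    }
    cb.on_complete(true);
    return delivered;
}
```

Keeping the callbacks as `std::function` members lets the caller pass free functions or lambdas; each one can be filled in independently, and only the three that every response needs are checked before the replay starts.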
Several important concepts are abstracted from WinHTTP:
1. url: a simple wrapper for the URL.
2. session: each HTTP access creates a session, which records state and adjusts the access policy attributes.
3. connection: this object is required to connect to the HTTP server. A connection is constructed from a session and a URL.
4. request: once connected to the server, you request the corresponding resource on it. This object also controls the request method (for example GET).
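To illustrate why the URL is wrapped at all: the host and port are what a connection needs (WinHttpConnect takes a server name and a port), while the path is what a request needs (WinHttpOpenRequest takes a verb and an object name). The sketch below splits a URL into those parts by hand. It is only an illustration: `simple_url` and `parse_url` are hypothetical names, the post does not say how the component parses URLs (WinHTTP itself offers WinHttpCrackUrl for this), and the parser ignores query-only URLs, user info and IPv6 literals.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the component's url wrapper.
struct simple_url {
    std::wstring scheme;
    std::wstring host;
    unsigned short port = 0;
    std::wstring path;
};

// Splits "scheme://host[:port][/path]" into its parts.
// Returns false for input it does not understand.
inline bool parse_url(const std::wstring& text, simple_url& out) {
    const std::wstring::size_type scheme_end = text.find(L"://");
    if (scheme_end == std::wstring::npos || scheme_end == 0) return false;
    out.scheme = text.substr(0, scheme_end);

    const std::wstring::size_type host_begin = scheme_end + 3;
    std::wstring::size_type path_begin = text.find(L'/', host_begin);
    if (path_begin == std::wstring::npos) path_begin = text.size();
    std::wstring authority = text.substr(host_begin, path_begin - host_begin);
    out.path = path_begin < text.size() ? text.substr(path_begin) : L"/";

    // Default port by scheme; an explicit ":port" overrides it.
    out.port = (out.scheme == L"https") ? 443 : 80;
    const std::wstring::size_type colon = authority.find(L':');
    if (colon != std::wstring::npos) {
        const std::wstring digits = authority.substr(colon + 1);
        if (digits.empty() || digits.size() > 5) return false;
        unsigned long value = 0;
        for (wchar_t c : digits) {
            if (c < L'0' || c > L'9') return false;
            value = value * 10 + static_cast<unsigned long>(c - L'0');
        }
        if (value == 0 || value > 65535) return false;
        out.port = static_cast<unsigned short>(value);
        authority.erase(colon);
    }
    if (authority.empty()) return false;
    out.host = authority;
    return true;
}
```

With the pieces separated like this, one url object can feed both the connection (host, port) and the request (path), which matches how the example in this post passes the same `url` to both constructors.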
Example
#include <tchar.h>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <string>
// The component's own header must be included here as well.

// Called once the response headers have been received.
void header(const http::request &req, std::uint32_t size)
{
    http::query::raw_headers accept;
    req.query_http_header(accept);

    http::query::content_type content_type;
    req.query_http_header(content_type);

    http::query::date date;
    req.query_http_header(date);

    http::query::expires expires;
    req.query_http_header(expires);

    std::wcout << (const wchar_t *)accept.buffer() << std::endl << size << std::endl;
}

// Called when the whole response has finished.
void response_complete(bool suc)
{
    assert(suc);
}

// Called for each block of body data.
bool read(const char *buf, size_t len)
{
    std::cout.write(buf, len);
    return true;
}

// Called on redirect. The URL is a wide string, so print it with wcout.
void redirect(const wchar_t *url, size_t len)
{
    std::wcout << url << std::endl;
}

// Called when an error occurs.
void error(const std::string &msg)
{
    std::cout << msg << std::endl;
}

int _tmain(int argc, _TCHAR* argv[])
{
    try
    {
        http::session session(L"test");

        std::chrono::seconds second(5);
        session.set_timeout(second, second, second, second);

        http::url url(L"http://www.baidu.com");
        http::connection con(session, url);
        http::request request(con, L"GET", url);

        /*
        request.set_option(http::option::security(SECURITY_FLAG_IGNORE_CERT_CN_INVALID
            | SECURITY_FLAG_IGNORE_CERT_DATE_INVALID
            | SECURITY_FLAG_IGNORE_UNKNOWN_CA));
        request.set_option(http::option::user_name(L"default proxy name"));
        request.set_option(http::option::password(L"default proxy password"));
        */

        request.register_callback(&header, &response_complete, &read, &redirect, &error);
        request.send_request(L"");

        // The request is asynchronous: wait here so the objects stay alive.
        system("pause");
    }
    catch(std::exception &e)
    {
        std::cerr << e.what() << std::endl;
    }

    system("pause");
    return 0;
}
As you can see, the interface is very simple to use. Here are a few notes.
1. std::chrono::seconds: this comes from the time library recently added to the standard library. For details, see the documentation.
2. request.set_option(...): configures the attributes of the access policy. A set of options is predefined. For details, see the source code.
3. request.register_callback: registers the callbacks for each processing stage. In order, they are:
- HTTP header fully received callback
- Response complete callback, which reports whether the response succeeded
- Data read callback
- Redirect callback
- Error handling callback
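Regarding the first note: WinHTTP's own WinHttpSetTimeouts takes its four timeouts (resolve, connect, send, receive) as `int` values in milliseconds, so a wrapper that accepts std::chrono durations has to convert somewhere. The sketch below shows one way that conversion might look. `to_winhttp_ms` is a hypothetical helper, and whether the component's `set_timeout` does anything like this is an assumption; the post does not show its implementation.

```cpp
#include <cassert>
#include <chrono>
#include <climits>

// Hypothetical helper: converts a duration to a millisecond count that fits
// in an int. Coarser units such as std::chrono::seconds convert to
// milliseconds implicitly at the call site. Negative values clamp to 0 and
// values above INT_MAX clamp to INT_MAX.
inline int to_winhttp_ms(std::chrono::milliseconds d) {
    const auto ms = d.count();
    if (ms < 0) return 0;
    if (ms > INT_MAX) return INT_MAX;
    return static_cast<int>(ms);
}
```

Taking `std::chrono::milliseconds` as the parameter type means callers can pass seconds, minutes or hours without writing any conversion themselves, while a finer unit such as microseconds would require an explicit `duration_cast`, making the loss of precision visible.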
Environment
Note: This component is written in C++0x; the development environment is VS2010 or later.
Apologies
Of course, this component is far from perfect, but it is enough for my application. It is also easy to extend if needed.
For example, you could add authentication and proxy settings.
If you find anything wrong, please point it out so we can discuss it. Thank you.