What about search engines (web download)

[Notice: Copyright reserved. Reprinting is welcome, but not for commercial purposes. Contact email: feixiaoxing @ 163.com]

In the previous article we only introduced the topic of search engines. From here on we will analyze the individual parts of a search engine one by one. First, a short digression. China's Internet market already has many search engine websites. Besides Baidu and Google, several portal websites have built their own, including sogou, youdao, and sousuo. Beyond the portals, other Chinese enterprises have also launched search websites, such as pangu search and instant search. Search gets this much attention for two reasons. First, it is an essential tool for using the Internet. Second, it is one of the few Internet products so far with a clear profit model. For the numbers, see Baidu's quarterly financial reports. Search is very profitable: hundreds of thousands of merchants nationwide each contribute tens of thousands or even hundreds of thousands of dollars per year, which adds up to a very good business model. Profits on that scale are probably comparable to those of the real estate industry.

Building a search engine is not complicated, but building a good one is. The same holds in every industry. Since the purpose of this series is not to make money, we can start with the most basic points and work through the complex parts and the optimizations gradually.

I am not experienced at writing webpage download code. The best way to download a webpage is to open a socket and carry out the HTTP exchange yourself, but I did not want to wait that long. Instead, I found a method online that uses the Windows network library (WinINet) to download webpages. It may not be as efficient as hand-written socket code, but that does not matter: we can learn the process first and consider a socket-based optimization later.
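To give a rough idea of what the socket route involves, here is a minimal sketch of the text a hand-written downloader would send after connecting to the server on port 80. It is not from the original article: `split_url` and `build_get_request` are hypothetical helper names, and the sketch assumes a plain `http://` URL with no port number, query handling, or HTTPS. The socket calls themselves (connect, send, receive) are left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the original article).
 * Splits "http://host/path" into host and path.
 * Returns 0 on success, -1 on a malformed URL or a too-small buffer. */
int split_url(const char* url, char* host, size_t host_size,
              char* path, size_t path_size)
{
    const char* prefix = "http://";
    const char* start;
    const char* slash;
    size_t host_len;

    if (strncmp(url, prefix, strlen(prefix)) != 0)
    {
        return -1;
    }

    start = url + strlen(prefix);
    slash = strchr(start, '/');
    host_len = slash ? (size_t)(slash - start) : strlen(start);
    if (host_len == 0 || host_len >= host_size)
    {
        return -1;
    }

    memcpy(host, start, host_len);
    host[host_len] = '\0';

    /* A URL without a path refers to the root document. */
    if (NULL == slash)
    {
        slash = "/";
    }
    if (strlen(slash) >= path_size)
    {
        return -1;
    }
    strcpy(path, slash);
    return 0;
}

/* Hypothetical helper: writes an HTTP/1.0 GET request into out.
 * Returns the request length, or -1 if out is too small. */
int build_get_request(const char* host, const char* path,
                      char* out, size_t out_size)
{
    int n = snprintf(out, out_size,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host);

    if (n < 0 || (size_t)n >= out_size)
    {
        return -1;
    }
    return n;
}
```

A socket-based downloader would send this request and then read the reply until the server closes the connection. The reply starts with a status line and headers, so the page body begins only after the first blank line.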

#include <stdio.h>
#include <windows.h>
#include <wininet.h>

#define U8 unsigned char
#define MAX_BLOCK_SIZE 1024

#pragma comment(lib, "wininet.lib")

/* Download the page at url and save it to the local file path. */
static void download(const char* url, const char* path)
{
    U8 buffer[MAX_BLOCK_SIZE];
    DWORD iNumber;    /* bytes returned by the last read */
    FILE* hFile;
    HINTERNET hSession;
    HINTERNET hUrl;

    /* The ANSI (A) variants match the char* strings used here. */
    hSession = InternetOpenA("RookIE/1.0", INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);
    if(NULL == hSession)
    {
        return;
    }

    hUrl = InternetOpenUrlA(hSession, url, NULL, 0, INTERNET_FLAG_DONT_CACHE, 0);
    if(NULL == hUrl)
    {
        goto error1;
    }

    hFile = fopen(path, "wb");
    if(NULL == hFile)
    {
        goto error2;
    }

    /* Stop when a read fails or returns zero bytes (end of data). */
    iNumber = 0;
    while(InternetReadFile(hUrl, buffer, MAX_BLOCK_SIZE, &iNumber) && iNumber > 0)
    {
        fwrite(buffer, sizeof(char), iNumber, hFile);
    }

    fclose(hFile);

error2:
    InternetCloseHandle(hUrl);
error1:
    InternetCloseHandle(hSession);
}

int main(int argc, char* argv[])
{
    download("http://www.baidu.com", "C:/www.baidu.com.html");
    return 0;
}

This code is not complex, but it is enough for our purposes. The download function downloads a single webpage. It takes two parameters: the first is the URL of the page to download, and the second is the local path where the page is saved. The basic operation is to open the URL with InternetOpenUrl, read the page data with InternetReadFile, and save it locally with fwrite. The Windows network library spares us a lot of work, but that is not the important part. What matters most for us is understanding how a webpage is downloaded.
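The read-then-write loop at the heart of download can also be tried without a network connection. The sketch below is an illustration rather than part of the original program: it uses fread on an ordinary FILE* as a stand-in for InternetReadFile, and the names `copy_blocks` and `BLOCK_SIZE` are hypothetical.

```c
#include <stdio.h>

#define BLOCK_SIZE 1024

/* Hypothetical helper: copies everything from in to out in fixed-size
 * blocks, the same pattern download uses with InternetReadFile and fwrite.
 * Returns the number of bytes copied, or -1 if a write comes up short. */
long copy_blocks(FILE* in, FILE* out)
{
    unsigned char buffer[BLOCK_SIZE];
    size_t got;
    long total = 0;

    /* A read that returns zero bytes means there is nothing left. */
    while ((got = fread(buffer, 1, sizeof(buffer), in)) > 0)
    {
        if (fwrite(buffer, 1, got, out) != got)
        {
            return -1;
        }
        total += (long)got;
    }

    return total;
}
```

The point to notice is that each pass writes only as many bytes as the read returned, since the last block is usually shorter than the buffer.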
