1. Page crawling must be fast and comprehensive
We know that content on the Internet is updated constantly: every day huge numbers of people publish new pages or update old ones, and the search engine has to crawl, out of this vast sea of information, the pages that best match what users are searching for. Faced with the existing mass of information and its geometric growth, the search engine's workload is enormous, and every index update takes a great deal of time. In the early days of search engines, the update cycle could sometimes stretch to several months. Imagine how many pages are updated or newly created in a few months: search results like that inevitably lag behind.
To return the best search results, the spider must crawl as many pages as possible, which requires the search engine to solve many technical problems; this is also one of the major challenges it faces.
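To make the crawling step concrete, here is a minimal breadth-first crawler sketch in Python. It is purely illustrative and not how any real search engine is built; the seed list, page limit, and naive link extraction are assumptions for the example. Real spiders add robots.txt handling, politeness delays, deduplication, and distributed URL queues.

```python
# Minimal breadth-first crawler sketch (illustrative only).
from collections import deque
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen
import re

def crawl(seed_urls, max_pages=100):
    frontier = deque(seed_urls)   # URLs waiting to be fetched
    seen = set(seed_urls)         # avoid re-crawling the same URL
    pages = {}                    # url -> raw HTML

    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue              # skip pages that fail to download
        pages[url] = html
        # Extract outgoing links and add unseen ones to the frontier.
        for href in re.findall(r'href="([^"#]+)"', html):
            link = urljoin(url, href)
            if urlparse(link).scheme in ("http", "https") and link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages
```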
2. Storing massive amounts of data
The amount of information on the Internet is enormous, almost beyond imagination, and a great deal of new information is generated every day. After crawling these pages, the search engine must store them in some data format; the data structures have to be well designed and highly scalable, writes have to be fast, and reads have to be fast enough as well.
Besides storing the page content itself, the search engine must also store the link relationships between pages, historical data, and a great deal of index information in order to index and rank pages better. The volume of this data is huge, and storing and reading data at such a scale certainly poses many technical challenges.
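As a rough illustration of what has to be stored, the toy in-memory layout below keeps one record per page plus the link relationships between pages. The class and field names are assumptions for the sketch, not any engine's actual schema; a production system would shard this data across many machines.

```python
# Illustrative record layout for crawled pages and their link graph.
from dataclasses import dataclass, field
from typing import Dict, List
import time

@dataclass
class PageRecord:
    url: str
    html: str
    fetched_at: float = field(default_factory=time.time)   # crawl history
    outlinks: List[str] = field(default_factory=list)       # links this page points to

class PageStore:
    """Toy in-memory store; real systems distribute this across machines."""
    def __init__(self):
        self.pages: Dict[str, PageRecord] = {}
        self.inlinks: Dict[str, List[str]] = {}  # target url -> pages linking to it

    def put(self, record: PageRecord) -> None:
        self.pages[record.url] = record
        for target in record.outlinks:
            self.inlinks.setdefault(target, []).append(record.url)
```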
3. Index processing must be fast, efficient, and scalable
After the search engine crawls and stores page data, it still has to perform a great deal of index processing on those pages, such as computing link relationships between pages and building the forward and inverted indexes; Google's PR (PageRank) calculation is one example. A search engine must do a large amount of indexing work in order to return search results quickly, and because large numbers of new pages keep appearing while the indexes are being built, the index-processing programs must also scale well.
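The forward and inverted indexes mentioned above can be sketched in a few lines. This assumes a naive lowercase whitespace tokenizer and ignores stemming, stop words, term positions, and incremental updates, all of which real engines must handle.

```python
# Forward index: doc id -> terms it contains.
# Inverted index: term -> doc ids that contain it (the structure queries hit).
from collections import defaultdict

def build_indexes(documents):
    """documents: dict of doc_id -> text. Tokenization is a naive
    lowercase whitespace split, just to show the two index directions."""
    forward = {}
    inverted = defaultdict(set)
    for doc_id, text in documents.items():
        terms = text.lower().split()
        forward[doc_id] = terms
        for term in terms:
            inverted[term].add(doc_id)
    return forward, inverted

docs = {1: "search engine index", 2: "search results page", 3: "page rank"}
forward, inverted = build_indexes(docs)
print(inverted["search"])  # {1, 2}
print(inverted["page"])    # {2, 3}
```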
4. Quick and accurate query processing
The preceding steps all run in the search engine's backend; the query stage is the only step users can see. We type a keyword into the search box, click Search, and the engine usually returns results in less than a second. It looks simple on the surface, but for the search engine it is a very complex process involving many algorithms: in under a second it must find the pages that match the query, rank them reasonably, and place the best ones at the front. We know that Baidu shows us at most 76 pages of results, while Google shows a little more, up to 100 pages.
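To show how the query stage can use the inverted index, here is a toy query handler that reuses the forward/inverted indexes from the indexing sketch above. It intersects the posting sets of the query terms and ranks by raw term frequency; this is only a stand-in for the far richer ranking signals (link analysis, freshness, user behavior) that real engines combine.

```python
# Toy query handler: AND semantics over the inverted index, ranked by
# how often the query terms appear in each candidate document.
def search(query, forward, inverted, max_results=10):
    terms = query.lower().split()
    if not terms:
        return []
    # Candidate docs must contain every query term.
    candidates = set.intersection(*(inverted.get(t, set()) for t in terms))
    # Score each candidate by total occurrences of the query terms.
    scored = [(sum(forward[d].count(t) for t in terms), d) for d in candidates]
    return [d for _, d in sorted(scored, reverse=True)[:max_results]]

print(search("search page", forward, inverted))  # [2]
```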