Four solutions for centralized/distributed search engines

Source: Internet
Author: User
Tags: hash
As a search engine's index size and query volume grow, index updates become less efficient and the load on its servers rises, so the overall utilization of the system keeps falling. Combined with the difficulty of storing massive amounts of data, this makes a well-designed distributed architecture a key factor in a search engine's future development.
What are the core problems of a distributed search engine?
1. Distributed information acquisition and computation, with unified data
This covers distributing the crawlers (or equivalent data acquisition mechanisms) and managing information processing in a unified way.
2. Distributed storage and management of processed data
This mainly concerns the mechanisms for accurately locating, updating, adding, deleting, and moving files.
3. Distribution of the front-end search service
This is the mechanism for distributing and handling large numbers of concurrent requests.
Based on the above three basic requirements, the following four types of distributed search engines can be constructed:
1. Distributed meta search engine
2. Hash distribution search engine
3. P2P distributed search engine
4. Local traversal search engine
The following describes four types of scalable search engines:
1. Distributed meta search:
Several independent search engines (units) each serve part of the data, and a central search engine combines the results from these units into the complete result set.
This design requires every unit to use the same ranking algorithm and essentially the same output structure, so that the central search engine can rank the combined results.
The key design point is that the units' indexes must not overlap. Data collection (crawling) can be handled by an independent system that obtains the information and then distributes it to the units according to fixed rules.
Advantages: the design is simple and quick to build, and any unit can be removed at any time without much impact.
Disadvantages: it is not a good solution for large-scale concurrency.
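The merge step performed by the central search engine can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes each unit returns a list of `(score, doc_id)` pairs already sorted by descending score, and that scores are comparable across units because every unit uses the same ranking algorithm. The function name and the sample data are hypothetical.

```python
import heapq
from itertools import islice


def merge_unit_results(unit_results, k):
    """Merge per-unit hit lists (each sorted by descending score) into a global top-k.

    unit_results: list of lists of (score, doc_id) tuples, one list per unit.
    """
    # heapq.merge lazily merges already-sorted inputs; reverse=True keeps
    # the descending order that each unit is assumed to produce.
    merged = heapq.merge(*unit_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, k))


# Hypothetical results from three units with non-overlapping indexes.
unit_a = [(0.92, "doc-a1"), (0.40, "doc-a2")]
unit_b = [(0.87, "doc-b1"), (0.55, "doc-b2")]
unit_c = [(0.60, "doc-c1")]

top3 = merge_unit_results([unit_a, unit_b, unit_c], 3)
print(top3)
```

Because the unit indexes do not overlap, no deduplication is needed here. Note that every query still reaches every unit, which is consistent with the concurrency weakness listed above.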
2. Hash distribution search engine
The index servers and document servers are partitioned by a hash of the query terms, so any index term can be located precisely on a specific index server and, from there, on the correct document server.
Advantages: compression, simple design
Disadvantages: it is difficult to dynamically adjust the capacity of a single index server or document server.
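A minimal sketch of term-based hash routing is shown below. The choice of MD5 as the hash, the modulo placement rule, and the server names are assumptions made for illustration; the text does not specify any of them. The second helper illustrates the disadvantage: with plain modulo placement, changing the number of servers reassigns most terms.

```python
import hashlib

# Hypothetical index server names.
INDEX_SERVERS = ["index-0", "index-1", "index-2", "index-3"]


def shard_for_term(term, num_shards):
    """Map an index term to a shard number with a stable hash."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def index_server_for(term, servers=INDEX_SERVERS):
    """Return the server responsible for a term."""
    return servers[shard_for_term(term, len(servers))]


def moved_fraction(terms, old_count, new_count):
    """Fraction of terms whose shard changes when the shard count changes."""
    moved = sum(
        1 for t in terms
        if shard_for_term(t, old_count) != shard_for_term(t, new_count)
    )
    return moved / len(terms)


terms = [f"term{i}" for i in range(1000)]
print(index_server_for("search"))
print(moved_fraction(terms, 4, 5))
```

The same term always routes to the same server, which is what makes lookups precise. Going from four to five servers under this rule moves most terms, so capacity cannot be adjusted without a large reshuffle of index data.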
3. Peer-to-peer (P2P) search engine
The well-known Napster used this design. It relied on a centralized index to match file sources held on individual computers distributed around the world, which made it one of the world's largest P2P search engines.
In this design, the central index server records only the most essential information, such as location (IP address, serial number), song name, and author. Other information can be obtained from any online computer that holds the full data for the item. A P2P system can also cache along intermediate routes based on searches: some search results are stored on a single node or on similar nodes to speed up later searches.
Advantages: it can scale to a very large size, and there is basically no maintenance cost.
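The centralized-index idea can be sketched as below. This is an illustration of the general pattern only, not a description of Napster's actual protocol: the class, its method names, and the peer addresses are hypothetical. The index stores just the locating metadata, and the content itself stays on the peers.

```python
from collections import defaultdict


class CentralIndex:
    """Central index that records which peers hold which titles."""

    def __init__(self):
        self._by_title = defaultdict(set)  # title -> {(peer, artist), ...}
        self._by_peer = defaultdict(set)   # peer -> {title, ...}

    def register(self, peer, title, artist):
        """Record that a peer (e.g. "ip:port") offers a title."""
        key = title.lower()
        self._by_title[key].add((peer, artist))
        self._by_peer[peer].add(key)

    def unregister_peer(self, peer):
        """Drop every entry of a peer that has gone offline."""
        for key in self._by_peer.pop(peer, set()):
            self._by_title[key] = {
                entry for entry in self._by_title[key] if entry[0] != peer
            }

    def search(self, title):
        """Return the (peer, artist) pairs currently offering a title."""
        return sorted(self._by_title.get(title.lower(), set()))


index = CentralIndex()
index.register("192.0.2.10:6699", "Song A", "Artist X")
index.register("192.0.2.20:6699", "Song A", "Artist X")
print(index.search("song a"))
```

A client would then fetch the file directly from one of the returned peers, so the central server carries only lookup traffic.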
