Four centralized distributed search engine designs

Source: Internet
Author: User

Once a search engine's index size and query volume reach a certain level, index updates become steadily slower and the load on the servers steadily rises, so the effective utilization of the whole system keeps falling. Combined with the difficulty of storing massive amounts of data, this makes a good distributed design a key factor in the future development of a search engine.

What are the main core issues of distributed search engines?

1. Distributed information acquisition, computation, and data unification
This covers distributing the crawlers (or the corresponding data acquisition mechanisms) and managing information processing in a unified way.

2. Distributed storage and management of processed data
This mainly refers to mechanisms for accurately locating, updating, adding, deleting, and moving files.

3. Distribution of front-end search services
This is the mechanism for distributing and processing large-scale concurrent requests.

Based on the above three basic requirements, the following four types of distributed search engines can be constructed:
1. Distributed meta search engine
2. Hash distribution search engine
3. P2P distributed search engine
4. Local traversal search engine

The following describes four types of scalable search engines:
1. Distributed meta search engine
Several independent unit search engines each hold part of the index, and a central search engine combines the results they return into the complete result set.
This design requires every unit search engine to use the same sorting algorithm and essentially the same output data structure, so that the central search engine can sort the combined results.
The key design point is that the indexes held by the units must not duplicate one another. Data collection (crawling) can be handled by an independent system that gathers the information and then distributes it to the units according to fixed rules.
Advantages: The design is simple and quick to build, and any unit can be removed at any time without much impact.
Disadvantages: It is not a good solution for large-scale concurrency.
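
The merge step performed by the central search engine can be sketched as follows. This is only an illustration under stated assumptions: each unit is assumed to return (score, doc_id) pairs already sorted by descending score, the scores are assumed to be comparable because all units share one sorting algorithm, and the names merge_unit_results, unit_a, unit_b, and unit_c are made up for the example.

```python
import heapq
from itertools import islice

def merge_unit_results(unit_results, k):
    """Merge per-unit result lists into one global top-k list.

    Each list must already be sorted by descending score, which is why
    every unit has to use the same sorting algorithm and output format.
    """
    # heapq.merge expects ascending order, so sort on the negated score.
    merged = heapq.merge(*unit_results, key=lambda hit: -hit[0])
    return list(islice(merged, k))

# Hypothetical results from three unit search engines: (score, doc_id).
unit_a = [(0.92, "a1"), (0.40, "a2")]
unit_b = [(0.85, "b1"), (0.77, "b2")]
unit_c = [(0.60, "c1")]

top3 = merge_unit_results([unit_a, unit_b, unit_c], k=3)
print(top3)  # [(0.92, 'a1'), (0.85, 'b1'), (0.77, 'b2')]
```

Because the unit indexes do not overlap, the sketch needs no deduplication step; if the units could hold the same document, the central engine would also have to drop repeats.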

2. Hash distribution search engine
The index servers and document servers are assigned by hashing the query terms, so that any index term can be located on exactly one index server and, from there, on the correct document server.

Advantages: compression, simple design
Disadvantages: it is difficult to dynamically adjust the capacity of a single Index Server or document server.
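
A minimal sketch of hash placement, and of why capacity is hard to adjust, might look like this. It assumes simple modulo placement over a SHA-256 digest; the article does not say which hash function or placement rule is used, and shard_for, moved_keys, and the sample terms are hypothetical names chosen for the example.

```python
import hashlib

def shard_for(key: str, num_servers: int) -> int:
    """Map a key to a server number in [0, num_servers) with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers

def moved_keys(keys, old_count, new_count):
    """Count keys whose server changes when the number of servers changes."""
    return sum(1 for k in keys
               if shard_for(k, old_count) != shard_for(k, new_count))

terms = [f"term{i}" for i in range(1000)]    # made-up index terms
index_server = shard_for("distributed", 4)   # which of 4 index servers holds this term
moved = moved_keys(terms, 4, 5)              # terms that would move if a 5th server were added
print(index_server, moved)
```

Every lookup is a single hash computation, which is what keeps the design simple. The cost shows up in the second number: with plain modulo placement, most terms land on a different server as soon as the server count changes, so resizing means moving most of the index.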

3. Peer-to-peer (P2P) search engine
The famous Napster used this design. It relied on a centralized index to match searches against files held on individual computers distributed around the world, which made it one of the world's largest P2P search engines.
In this design, the central index server records only a small amount of key information, such as location (IP, serial number), song name, and author. All other information can be obtained from any computer that is online and holds the full information for the item. A P2P network can also cache search results along intermediate routes, that is, store some results on a single node or on similar nodes, to speed up searches.

Advantages: It can grow extremely large, with almost no maintenance cost.
Disadvantages: Updating the central server is very inefficient, and the information sources are unstable.
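
The division of labor between the central index server and the peers can be sketched like this. It is a toy model, not Napster's actual protocol: the class CentralIndex, its methods, and the peer addresses are all hypothetical, and the index keeps only a song title and the peer locations, leaving the files themselves on the peers.

```python
from collections import defaultdict

class CentralIndex:
    """Toy central index: maps a song title to the peers that hold it."""

    def __init__(self):
        self._peers_by_title = defaultdict(set)

    def register(self, peer_addr, titles):
        """A peer comes online and announces the titles it can serve."""
        for title in titles:
            self._peers_by_title[title.lower()].add(peer_addr)

    def unregister(self, peer_addr):
        """A peer goes offline; drop it from every entry it appears in."""
        for peers in self._peers_by_title.values():
            peers.discard(peer_addr)

    def locate(self, title):
        """Return the peers currently known to hold the title."""
        return sorted(self._peers_by_title.get(title.lower(), ()))

index = CentralIndex()
index.register("10.0.0.1:6699", ["Song A", "Song B"])  # made-up peers and titles
index.register("10.0.0.2:6699", ["Song B"])

print(index.locate("song b"))  # ['10.0.0.1:6699', '10.0.0.2:6699']
index.unregister("10.0.0.1:6699")
print(index.locate("song b"))  # ['10.0.0.2:6699']
print(index.locate("Song A"))  # []
```

The unregister method hints at both disadvantages: every peer that joins or leaves forces an update on the central server, and a title stops being available as soon as the last peer holding it goes offline.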

4. Local traversal Search Engine
This type of search engine can be designed in many ways. A more feasible approach is to cluster the information and build an information tree from the clusters, so that a search only needs to traverse the tree along one branch. The partial traversal must follow definite rules, and from the initial design stage each index entry must be placed fairly accurately, on a suitable node, to ensure search efficiency.

Advantages: easy solution to compression, high search accuracy, and high search efficiency
Disadvantages: complicated design, difficult to adjust the location of the index Node
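
One possible reading of the information tree is sketched below. The article does not specify the traversal rule, so this sketch assumes each node stores a cluster centroid and the search always descends into the child whose centroid is closest to the query vector; Node, sq_dist, search, and the two-dimensional sample data are all made up for the illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    centroid: tuple                               # cluster center of this subtree
    children: list = field(default_factory=list)  # sub-clusters, empty for a leaf
    doc_ids: list = field(default_factory=list)   # documents stored at a leaf

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(root, query_vec):
    """Follow one branch: at each level pick the child nearest the query."""
    node, visited = root, 1
    while node.children:
        node = min(node.children, key=lambda c: sq_dist(c.centroid, query_vec))
        visited += 1
    return node.doc_ids, visited

# Hypothetical tree with two top-level clusters.
tech = Node((0.1, 0.9), children=[
    Node((0.0, 1.0), doc_ids=["t1", "t2"]),
    Node((0.3, 0.8), doc_ids=["t3"]),
])
sports = Node((0.9, 0.1), doc_ids=["s1"])
root = Node((0.5, 0.5), children=[sports, tech])

docs, visited = search(root, (0.05, 0.95))
print(docs, visited)  # ['t1', 't2'] 3
```

Only three of the five nodes are visited here, which is where the search efficiency comes from. The same structure shows the difficulty named above: if a document is attached to the wrong leaf, a one-branch search never reaches it, so moving index nodes later is costly.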

In general, there are many ways to design a search engine. I believe there will be more clever design solutions in the future.
