SenseiDB Architecture Design Analysis

Last Update:2018-12-06 Source: Internet

Author: User


1. Sensei Introduction

Sensei is an open-source, distributed, real-time, semi-structured database developed by LinkedIn. According to the official website, it mainly supports the following features:

Full-text search
Real-time update
Faceted search
Key-value query
High performance under concurrent updates and queries
Hadoop integration

After a preliminary investigation of the project, I found that it wraps a full-text index with the Browse Query Language (BQL), an SQL-like query syntax. In my view, the project's main advantages are as follows:

BQL syntax support

BQL is easier to understand than Lucene query syntax and is friendlier to developers who already know SQL.

Simple cluster maintenance

Once a new node has its own configuration in place, its process joins the cluster automatically on startup. There is no need to modify the configuration of other servers or to maintain a central server list.

Real-time indexing support

Real-time indexing and retrieval are implemented internally through Zoie, an open-source real-time indexing engine also developed by LinkedIn. At the underlying layer, Zoie is built on Lucene's RAMDirectory.

This article mainly introduces the system architecture of the project and some key processes (indexing, retrieval, resizing, and error recovery).

2. Overall Architecture

The overall SenseiDB system architecture is as follows (diagram from the official website):

The Sensei system consists of the following parts:

Sensei Node

This is where indexing and retrieval requests are actually processed and where the index data is stored. A node contains the following modules:

Zoie System

Stores index data through Zoie and provides the real-time indexing service. One server can store data for multiple partitions, and each partition corresponds to one ZoieSystem object.

Data Provider

Obtains data from the data gateway and passes it to the ZoieSystem for indexing. Data is routed according to the partition information.

Broker

Receives user queries, distributes them to the appropriate Sensei nodes according to the balance policy and load-balancing strategy, merges the results from the different partitions, and then formats and returns the output to the user.

ZooKeeper

Maintains the list of server nodes in the cluster and related information so that the nodes can coordinate with one another directly. This information rarely needs manual changes: you only configure each node (for example, its node number and the list of partitions it stores), and the node updates the data in ZooKeeper automatically when it starts. As a result, managing a Sensei cluster is quite simple.

Data Gateway

Supplies the data sources for Sensei. The following data sources are currently supported:

JMS

Uses topic subscription so that every Sensei node receives all the data. It is suited to online updates but cannot support failure recovery.

File

Obtains the index data from a specified file. It is mostly used to import raw data when creating an index.

JDBC

In effect, this is a persistent message queue implemented on top of a database. It requires a database table with a timestamp-like column. Each Sensei node periodically fetches the rows that are newer than the last version it obtained and indexes them.

Kafka

A message queue developed by LinkedIn. I have not investigated this option.
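The JDBC data source described above can be pictured as a polling loop over a versioned table. The following is a minimal sketch using Python's built-in sqlite3 module in place of a real JDBC connection. The table name `events`, the column names, and the `publish` and `poll` helpers are all hypothetical and are not taken from Sensei.

```python
import sqlite3

# In-memory database standing in for the real source database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (version INTEGER PRIMARY KEY, doc TEXT NOT NULL)"
)


def publish(doc):
    """Append a document; SQLite assigns the next version number."""
    cur = conn.execute("INSERT INTO events (doc) VALUES (?)", (doc,))
    conn.commit()
    return cur.lastrowid


def poll(last_version):
    """Return rows newer than last_version plus the new high-water mark."""
    rows = conn.execute(
        "SELECT version, doc FROM events WHERE version > ? ORDER BY version",
        (last_version,),
    ).fetchall()
    new_version = rows[-1][0] if rows else last_version
    return rows, new_version


publish("doc a")
publish("doc b")
first_batch, version = poll(0)          # a node starting from scratch
publish("doc c")
second_batch, version = poll(version)   # only the row added since then
```

Because the rows stay in the table, a node that remembers the last version it processed can always resume from that point.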

Note that in the Sensei implementation, the broker and the Sensei node run in the same process. They are not physically separated.

3. Balance Policy

Like most distributed applications, Sensei balances data by hashing the value of a balancing field. On top of the hash it provides a mapping layer: when the system is deployed, the whole data set is divided into multiple partitions, and each server loads the data of several partitions at once.
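The two-level mapping (value to partition, partition to server) can be sketched as follows. This is an illustration only: the CRC32 hash, the partition count of 8, and the node layout are assumptions made for the example, not Sensei's actual hash function or configuration.

```python
import zlib

NUM_PARTITIONS = 8  # assumed; fixed when the system is deployed

# Hypothetical layout: node id -> partitions that node loads.
NODE_PARTITIONS = {
    1: [0, 1, 2, 3],
    2: [4, 5, 6, 7],
}


def partition_for(balancing_value, num_partitions=NUM_PARTITIONS):
    """Map a balancing-field value to a partition with a stable hash."""
    return zlib.crc32(balancing_value.encode("utf-8")) % num_partitions


def nodes_for(partition, layout=NODE_PARTITIONS):
    """Return the ids of all nodes that host the given partition."""
    return sorted(node for node, parts in layout.items() if partition in parts)


partition = partition_for("user-42")
hosts = nodes_for(partition)
```

The point of the indirection is that the value-to-partition step never changes after deployment, while the partition-to-server step can be rearranged when servers are added.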

4. Indexing Process

The data gateway obtains the new index data.
Each Sensei node fetches all the new index data from the data gateway, keeps the data that belongs to its own partitions, and discards the rest. This way the data gateway does not need to know the partition policy, which keeps adding nodes relatively simple.
When processing index data, a Sensei node starts one Zoie system per partition, so each node holds an independent index for every partition it hosts.
The partitions of different Sensei nodes can overlap so that nodes act as backups for one another. Backup nodes can also serve retrieval requests, which increases the load the cluster can handle.
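The steps above can be sketched as a node that reads the full event stream and keeps only what belongs to its partitions. The class `SenseiNodeSketch`, the event shape, and the CRC32 hash are hypothetical; a plain list stands in for each per-partition Zoie system.

```python
import zlib

NUM_PARTITIONS = 4  # assumed for the example


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Stable hash of the balancing field (here the document id)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class SenseiNodeSketch:
    """Hypothetical node: one independent in-memory index per partition."""

    def __init__(self, node_id, partitions):
        self.node_id = node_id
        # One list per hosted partition, standing in for one Zoie system each.
        self.indexes = {p: [] for p in partitions}

    def consume(self, events):
        """Read every event, index our own, silently drop the rest."""
        for event in events:
            index = self.indexes.get(partition_for(event["id"]))
            if index is not None:
                index.append(event)


events = [{"id": "doc-%d" % i, "body": "text %d" % i} for i in range(20)]

node_a = SenseiNodeSketch(1, [0, 1])
node_b = SenseiNodeSketch(2, [2, 3])
node_c = SenseiNodeSketch(3, [0, 1])  # overlaps node_a, so it acts as a backup

for node in (node_a, node_b, node_c):
    node.consume(events)  # every node sees the whole stream
```

Since filtering happens on the node side, the producer of the stream needs no knowledge of which node hosts which partition.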

5. Search Process

A user's retrieval request can be sent to any broker. All brokers are equivalent and provide the same retrieval service.
The broker distributes the request to the appropriate Sensei nodes according to the load-balancing policy and the balancing field in the user's query.
Because each Sensei node is responsible for data in multiple partitions, the request sent to a node must also specify which partitions to search.
For a global search, the broker merges the results from multiple Sensei nodes.
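The search steps above amount to a scatter-gather pattern. The sketch below shows two pieces of it under stated assumptions: choosing one replica per partition (random choice is used here as a placeholder for the real load-balancing policy) and merging per-node hit lists that are already sorted by descending score. The function names and the `(score, doc_id)` hit shape are hypothetical.

```python
import heapq
import itertools
import random


def plan_requests(layout, wanted_partitions, rng=random):
    """Pick one hosting node per partition and group partitions by node."""
    plan = {}
    for partition in wanted_partitions:
        replicas = [n for n, parts in sorted(layout.items()) if partition in parts]
        if not replicas:
            raise LookupError("no node hosts partition %d" % partition)
        node = rng.choice(replicas)  # placeholder load-balancing policy
        plan.setdefault(node, []).append(partition)
    return plan


def merge_top_k(per_node_hits, k):
    """Merge hit lists sorted by descending score and keep the best k."""
    merged = heapq.merge(*per_node_hits, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, k))


# Hypothetical cluster: nodes 1 and 3 both host partitions 0 and 1.
layout = {1: [0, 1], 2: [2, 3], 3: [0, 1]}
plan = plan_requests(layout, range(4), random.Random(0))

top = merge_top_k(
    [
        [(0.9, "a"), (0.4, "b")],  # hits returned by one node
        [(0.7, "c"), (0.1, "d")],  # hits returned by another node
    ],
    k=3,
)
```

Each entry in `plan` corresponds to one request to a node, carrying the list of partitions that node should search.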

6. Error Recovery

If the data gateway fails, the system stops indexing new data, but search is not affected. After the data gateway restarts, each Sensei node resumes fetching new data.
If a Sensei node fails and the data source is persistent (for example, files, HTTP, or databases), the restarted node re-pulls data starting from the last data version it obtained and rebuilds its index, so recovery is automatic. For a non-persistent data source such as JMS, however, index data may be lost.
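The difference between the two cases can be illustrated with a small simulation. `DurableLog` and `Topic` are hypothetical stand-ins for a persistent source and a non-durable topic subscription; neither is a Sensei or JMS API, and the node "crash" is modeled simply as a period in which the node does not read.

```python
class DurableLog:
    """Hypothetical persistent source: rows are kept and versioned 1, 2, 3..."""

    def __init__(self):
        self._rows = []

    def publish(self, doc):
        self._rows.append(doc)

    def read_after(self, version):
        """Return (version, doc) pairs newer than the given version."""
        return list(enumerate(self._rows[version:], start=version + 1))


class Topic:
    """Hypothetical non-durable topic: delivers only to current subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def unsubscribe(self, callback):
        self._subscribers.remove(callback)

    def publish(self, doc):
        for callback in list(self._subscribers):
            callback(doc)


def run_with_durable_source():
    log, index, version = DurableLog(), [], 0
    log.publish("d1")
    log.publish("d2")
    for version, doc in log.read_after(version):
        index.append(doc)
    log.publish("d3")  # published while the node is down
    for version, doc in log.read_after(version):  # restart: resume from version 2
        index.append(doc)
    return index


def run_with_topic():
    topic, index = Topic(), []
    on_message = index.append
    topic.subscribe(on_message)
    topic.publish("d1")
    topic.publish("d2")
    topic.unsubscribe(on_message)  # node goes down
    topic.publish("d3")            # nobody is listening, so this is lost
    topic.subscribe(on_message)    # node comes back
    topic.publish("d4")
    return index


durable_result = run_with_durable_source()
topic_result = run_with_topic()
```

With the durable source the node ends up with every document; with the topic, the document published during the outage never reaches the index.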

7. Online resizing

The balance policy provided by Sensei is a mapping relationship. Expansion is therefore limited to moving some of the partitions hosted on one server onto additional servers. If a server hosts only one partition, the only option is to add a backup copy to reduce the load.
Sensei nodes pull index data from the data source themselves. To add a node, you therefore only need to configure its partition information and start it. The new node automatically pulls the indexing requests from the data source, rebuilds the index, and joins the cluster. Judging from the code, however, online resizing does not seem possible when JMS is the data source. In that case you have to stop sending indexing requests to JMS (that is, stop indexing), copy the existing partition index files to the new node, start the new server, and then resume sending indexing requests to JMS.
It is worth noting that the number of partitions chosen at the outset is a trade-off. With too many partitions, the index is physically split into many pieces: searches scoped to an individual (a single balancing-field value) become more efficient, but global search performance suffers. With too few partitions, later expansion is limited.
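The expansion constraint can be made concrete with a small sketch of the partition-to-node layout. The helpers `split_node` and `add_replica` are hypothetical and only manipulate a dictionary; in a real cluster the equivalent change would be made in each node's configuration.

```python
def split_node(layout, old_node, new_node):
    """Move the upper half of old_node's partitions onto new_node."""
    parts = sorted(layout[old_node])
    if len(parts) < 2:
        raise ValueError("a single partition cannot be split; add a replica instead")
    half = len(parts) // 2
    new_layout = dict(layout)
    new_layout[old_node] = parts[:half]
    new_layout[new_node] = parts[half:]
    return new_layout


def add_replica(layout, source_node, new_node):
    """Give new_node a copy of source_node's partitions to share the load."""
    new_layout = dict(layout)
    new_layout[new_node] = list(layout[source_node])
    return new_layout


# One server hosting four partitions can be split across two servers.
before = {1: [0, 1, 2, 3]}
after = split_node(before, 1, 2)

# A server with a single partition can only be backed up, not split.
replicated = add_replica({1: [0]}, 1, 2)
```

The total number of partitions never changes in either operation, which is why the partition count chosen at deployment caps how far the data can be spread.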

8. References

Official homepage: www.senseidb.com
Shadowhunter blog: http://johnnychenjun.blog.163.com/blog/#m=0&t=3&c=sensei

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
