SenseiDB Architecture Design Analysis

Source: Internet
Author: User
1. Sensei Introduction

Sensei is an open-source, distributed, real-time, semi-structured database developed by LinkedIn. It mainly supports the following features (translated from the official website):

  • Full-text search
  • Real-time update
  • Faceted search
  • Key-value queries
  • High-performance concurrent updates and queries
  • Integration with Hadoop

After a preliminary investigation of the project, we found that it wraps a full-text index in a SQL-like query language called Browse Query Language (BQL). In my view, the main advantages of the project are as follows:

  1. Supports BQL syntax

Compared with Lucene query syntax, BQL is easier to understand and friendlier to developers who already know SQL.

  2. Simple cluster maintenance

Once a new node's own configuration is set, the process joins the cluster automatically after it starts. You do not need to modify the configuration of other servers or a central server list.

  3. Supports real-time indexing

Real-time indexing and retrieval are implemented internally through Zoie. Zoie is an open-source real-time indexing engine developed by LinkedIn, built on Lucene's RAMDirectory at the underlying layer.


This article mainly introduces the system architecture of the project and some key processes (indexing, search, resizing, and error recovery).

2. Overall Architecture

The overall SenseiDB system architecture is as follows (from the official website):

The Sensei system consists of the following parts:

  1. Sensei Node

Does the actual processing of indexing and search requests and stores the index data. It contains the following modules:

    1. Zoie System

Stores index data through Zoie and provides the real-time indexing service. A server can store data for multiple partitions, and each partition corresponds to one ZoieSystem object.

    2. Data Provider

Obtains data from the data gateway and passes it to the ZoieSystem for indexing. The data is distributed based on the partition information.

  2. Broker

Receives user queries, distributes them to the Sensei nodes according to the balancing policy and load balancing, merges the results from the different partitions, and then formats and returns the output to the user.

  3. ZooKeeper

Maintains the list of server nodes in the cluster and related information so that the server nodes can coordinate with each other directly. This information rarely needs manual modification: you only configure each node (for example, its node number and the list of partitions it stores), and the node updates the data in ZooKeeper automatically after it starts. As a result, Sensei cluster management is quite simple.

  4. Data Gateway

The module that supplies data to Sensei. It currently supports the following data sources:

    1. JMS

Uses topic subscription so that all Sensei nodes receive the data. It is used for online updates but cannot provide safe recovery.

    2. File

Reads index data from a specified file. It is mostly used to import raw data when an index is first created.

    3. JDBC

In effect, a persistent message queue implemented on top of a database. It requires a database table with a timestamp-like column. The Sensei node periodically fetches the data added since the last version it obtained and indexes it.

    4. Kafka

A message queue developed by LinkedIn; it was not investigated for this article.
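
The JDBC data source described above can be pictured as a polling loop over a versioned table. The following is only a minimal sketch using Python's built-in sqlite3 module, not Sensei's actual gateway code; the table name events, its columns, and the function poll_once are hypothetical names chosen for illustration.

```python
import sqlite3

def poll_once(conn, last_version, batch_size=100):
    """Fetch rows newer than last_version; return (docs, new_last_version)."""
    rows = conn.execute(
        "SELECT version, doc FROM events WHERE version > ? "
        "ORDER BY version LIMIT ?",
        (last_version, batch_size),
    ).fetchall()
    docs = [doc for _version, doc in rows]
    # Advance the checkpoint only as far as the rows actually read.
    new_version = rows[-1][0] if rows else last_version
    return docs, new_version

# Hypothetical table with a timestamp-like, monotonically increasing column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (version INTEGER PRIMARY KEY, doc TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

first_docs, checkpoint = poll_once(conn, 0)

# A later poll only sees rows added after the checkpoint.
conn.execute("INSERT INTO events VALUES (4, 'd')")
second_docs, checkpoint2 = poll_once(conn, checkpoint)
```

Because the table keeps every row, a node that remembers its checkpoint can always resume from it, which is what makes this source persistent.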

In the actual Sensei implementation, the broker and the Sensei node run in the same process; there is no physical separation between them.

3. Balance Policy

Like most distributed applications, Sensei balances data by hashing the value of a balancing field. It adds a hash mapping layer on top: when the system is deployed, the whole data set is divided into multiple partitions, and one server loads data from several partitions at the same time.
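
The two-level mapping (key to partition, partition to server) can be sketched as follows. This is an illustration under assumptions, not Sensei's own code: the CRC32 hash, the partition count of 8, and the node names are all hypothetical.

```python
import zlib

NUM_PARTITIONS = 8  # assumed value, fixed when the system is deployed

# Hypothetical layout: the partitions each server loads.
SERVER_PARTITIONS = {
    "node-1": [0, 1, 2, 3],
    "node-2": [4, 5, 6, 7],
}

def partition_of(key):
    """Level 1: hash the balancing-field value to a partition number."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

def servers_for(partition):
    """Level 2: look up which servers currently load that partition."""
    return sorted(name for name, parts in SERVER_PARTITIONS.items()
                  if partition in parts)

p = partition_of("user-42")
owners = servers_for(p)
```

Only the second level changes when servers are added; the key-to-partition hash stays fixed, so existing data never has to be rehashed.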

4. Indexing Process
  1. The data gateway obtains the new index data.
  2. Each Sensei node obtains all the new index data from the data gateway and keeps only the data that belongs to its own partitions; all other data is discarded. This way, the data gateway does not need to know the partition policy, which keeps adding nodes relatively simple.
  3. When processing index data, a Sensei node starts one Zoie system per partition; that is, each Sensei node keeps an independent index for each partition it hosts.
  4. The partitions on different Sensei nodes can overlap so that the nodes back each other up. The backup replicas can also serve search requests, which increases query capacity.
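
The steps above can be sketched as a node that reads the full stream and keeps only the events for its own partitions. This is a simplified model, not Sensei's implementation: the class name, the event shape, and the rule "uid modulo partition count" are assumptions, and a plain list stands in for each Zoie system.

```python
class SenseiNodeSketch:
    def __init__(self, partitions, num_partitions):
        self.num_partitions = num_partitions
        # One independent index per hosted partition
        # (a list stands in for a Zoie system here).
        self.indexes = {p: [] for p in partitions}

    def consume(self, event):
        """Index the event if it belongs to a hosted partition, else drop it."""
        partition = event["uid"] % self.num_partitions  # assumed sharding rule
        if partition in self.indexes:
            self.indexes[partition].append(event["uid"])
            return True
        return False

# Two nodes over 3 partitions; partition 1 is hosted on both (a replica).
node_a = SenseiNodeSketch([0, 1], 3)
node_b = SenseiNodeSketch([1, 2], 3)

# Every node sees every event; the gateway knows nothing about partitions.
for uid in range(6):
    event = {"uid": uid}
    node_a.consume(event)
    node_b.consume(event)
```

Here both nodes end up with the same contents for partition 1, so either one can answer queries for it.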
5. Search Process
  1. A user's search request can be sent to any broker. All brokers are equivalent and provide the same search service.
  2. The broker distributes the request to different Sensei nodes according to the load-balancing policy and the balancing field in the user's query.
  3. Because each Sensei node is responsible for data in multiple partitions, the request sent to a Sensei node must also specify which partitions to search.
  4. For a global search, the broker merges the results from multiple Sensei nodes.
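
The search flow above is a scatter-gather pattern. The sketch below illustrates it with in-memory data; the function names, the document fields, and the substring match are hypothetical simplifications and do not reflect Sensei's real request format or scoring.

```python
import heapq

def search_node(node_docs, partitions, term, limit):
    """Search only the requested partitions on one node; return top hits."""
    hits = []
    for p in partitions:
        for doc in node_docs.get(p, []):
            if term in doc["text"]:
                hits.append((doc["score"], doc["uid"]))
    return heapq.nlargest(limit, hits)

def broker_search(cluster, routing, term, limit):
    """Send the query to each node with its partition list, then merge."""
    partial = [search_node(cluster[node], parts, term, limit)
               for node, parts in routing.items()]
    merged = heapq.nlargest(limit, (hit for hits in partial for hit in hits))
    return [uid for _score, uid in merged]

# Hypothetical cluster: node name -> partition -> documents.
cluster = {
    "node-1": {
        0: [{"uid": 10, "text": "red car", "score": 0.9},
            {"uid": 11, "text": "blue car", "score": 0.4}],
        1: [{"uid": 12, "text": "red bike", "score": 0.7}],
    },
    "node-2": {
        2: [{"uid": 13, "text": "red truck", "score": 0.8},
            {"uid": 14, "text": "green car", "score": 0.5}],
    },
}
routing = {"node-1": [0, 1], "node-2": [2]}  # partitions to search per node

top = broker_search(cluster, routing, "red", 2)
```

Each node returns at most limit hits, so the broker merges small partial lists rather than full result sets.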
6. Error Recovery
  1. If the data gateway fails, the system stops indexing new data, but search is not affected. After the data gateway is restarted, each Sensei node continues to obtain new data.
  2. If a Sensei node fails, then for persistent data sources such as files, HTTP, and databases, the restarted node re-pulls the data starting from the last data version it indexed and rebuilds the index, so recovery is automatic. For JMS, which is a non-persistent data source, index data may be lost.
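
The difference between the two recovery cases can be illustrated with a toy model. All class names here are hypothetical, and the model assumes the node's last indexed version survives a restart (for example, because it is stored with the index); it is a sketch of the idea, not of Sensei's recovery code.

```python
class PersistentSource:
    """Keeps every event, so a consumer can re-read from any version."""
    def __init__(self):
        self.log = []

    def publish(self, doc):
        self.log.append(doc)

    def read_from(self, version):
        # Versions are 1-based positions in the log.
        return list(enumerate(self.log[version:], start=version + 1))

class TopicSource:
    """Delivers only to nodes subscribed at publish time, then forgets."""
    def __init__(self):
        self.subscribers = []

    def publish(self, doc):
        for node in self.subscribers:
            node.index.append(doc)

class Node:
    def __init__(self):
        self.index = []
        self.version = 0  # assumed to survive a restart

    def catch_up(self, source):
        for version, doc in source.read_from(self.version):
            self.index.append(doc)
            self.version = version

# Persistent source: events published while the node is down are replayed.
store, pull_node = PersistentSource(), Node()
store.publish("a")
pull_node.catch_up(store)
store.publish("b")          # node is down here
store.publish("c")
pull_node.catch_up(store)   # node restarts and resumes from version 1

# Topic source: events published while the node is down are lost.
topic, push_node = TopicSource(), Node()
topic.subscribers.append(push_node)
topic.publish("a")
topic.subscribers.remove(push_node)   # node is down here
topic.publish("b")
topic.subscribers.append(push_node)   # node restarts
topic.publish("c")
```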
7. Online Resizing
  1. The balance policy provided by Sensei is a mapping relationship. Therefore, Sensei can only be expanded by moving some of the partitions hosted on one server to other servers. If a server hosts only one partition, the only option is to create backup replicas to reduce the load.
  2. Indexes in Sensei are built from data that each Sensei node pulls from the data source. Therefore, you only need to configure the partition information for the new Sensei node and start it; the new node automatically pulls the indexing requests from the data source, rebuilds the index, and joins the cluster. However, judging from the code, online resizing does not appear to be possible when JMS is the data source. In that case you must stop sending indexing requests to JMS (that is, stop indexing), copy the existing partition index files to the new node, start the new server, and then resume sending indexing requests to JMS.
  3. Note that the initial partition count matters. With too many partitions, the index is physically split into many pieces, which is efficient for searches that target a single partition but hurts global search performance. With too few partitions, expansion is limited.
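
The constraint in the first point can be made concrete with a small sketch. The function split_server and the layout dictionary are hypothetical; the point is only that resizing redistributes existing partitions and never changes how many partitions exist.

```python
def split_server(layout, old, new):
    """Move half of old's partitions to new; the partition count is unchanged."""
    parts = sorted(layout[old])
    if len(parts) < 2:
        # A single partition cannot be split further; it can only be replicated.
        raise ValueError("server hosts one partition: add a replica instead")
    half = len(parts) // 2
    result = dict(layout)
    result[old] = parts[:half]
    result[new] = parts[half:]
    return result

layout = {"node-1": [0, 1, 2, 3]}
layout2 = split_server(layout, "node-1", "node-2")
layout3 = split_server(layout2, "node-1", "node-3")

try:
    split_server(layout3, "node-1", "node-4")
    can_split_again = True
except ValueError:
    can_split_again = False
```

With four partitions, the cluster in this sketch can grow to at most four servers that each hold distinct data, which is why the initial partition count bounds later expansion.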
8. References
  1. Official homepage: www.senseidb.com
  2. Shadowhunter blog: http://johnnychenjun.blog.163.com/blog/#m=0&t=3&c=sensei
