Editor's note: In the "Pioneer" series article on how Wang Tao built a high-performance NoSQL database to go beyond MongoDB, we talked with Wang Tao about his experience building SequoiaDB, a high-performance commercial database. Because readers need data to judge the strengths of each NoSQL product, we also invited experts at home and abroad to benchmark four NoSQL databases (MongoDB, SequoiaDB, Cassandra, and HBase) and to publish the test cases, the related data, and the test rules. This installment presents the assessment from the domestic experts and is for reference only. PS: For easier circulation, a PDF version has also been prepared for download.
Test data
In this test report, we use the standard YCSB benchmark published by Yahoo! to compare MongoDB, SequoiaDB, Cassandra, and HBase, and we try to identify the application scenarios that suit each product. In the test configuration, we did our best to give every product a high-availability configuration, and we used eventual consistency as the consistency level.
The test compares two types of NoSQL database: document-oriented databases and Bigtable-style wide-column databases. Because each type has many unique characteristics of its own, the evaluation results cannot cover every feature one by one. This test focuses on database performance under different workload types and relies only on the standard test flow provided by YCSB.
This report describes the physical test environment and the configuration in detail so that readers can independently validate the results in their own environments.
Test Summary
1. Test product
This test compares two types of NoSQL database, covering four products:
- MongoDB (document store, v2.6.1)
- SequoiaDB (document store, v1.8)
- HBase (wide-column store, v0.94.6, CDH 4.5.0)
- Cassandra (wide-column store, v1.1.12)

MongoDB is currently the NoSQL database with the largest market share and is probably the product most readers care about. It offers rich database functionality and is known as the NoSQL product closest to a relational database. SequoiaDB was created by researchers from the former IBM DB2 team. It is said to compete head-on with MongoDB in performance and functionality, and it provides many of the same features as MongoDB, such as sharding and multiple indexes.
HBase is a member of the Hadoop ecosystem and has been adopted by a wide range of enterprises and internet users. The version we use is 0.94.6, as shipped in the CDH 4.5.0 installation packages. Cassandra is a product similar to HBase. It was developed and open-sourced by Facebook and also has a broad user base.
Our tests use the Yahoo! Cloud Serving Benchmark (YCSB), released by Yahoo! Research, and we modified and adapted its interfaces to the latest version of each product. We also provide the SequoiaDB YCSB test interface in the appendix following the text.
It should be emphasized again that each product has its own suitable scenarios. YCSB is a test framework provided by Yahoo! Research, but in many scenarios it does not fully show the characteristics of each product. In this test, we tried to use the YCSB framework to produce the most objective assessment possible. Readers who have questions about the test results or configurations are welcome to adjust them to their own needs and publish their results for reference.
2. Test Scenario
The YCSB test framework provides a rich scenario configuration mechanism that lets users choose the amount of data to import and the proportion of insert, update, and query operations. In this test, we imported 100 million records and compared the following scenarios.

| Scenario No. | Scenario category | Description |
|---|---|---|
| 1 | Single-record import | Single-record import |
| 2 | Batch-record import | Batch-record import |
| 3 | Query only | 100% query |
| 4 | Query/import balanced | 50% import, 50% query |
| 5 | Update-heavy | 95% update, 5% query |
| 6 | Query-heavy | 95% query, 5% update |
| 7 | Query latest | 95% query, 5% import |

For data import, we distinguish two scenarios: single-record insert and bulk insert. Some databases, in their default configuration, have the client package a batch of records and send them to the server together. We classify this as batch-record import, even though the interface is a single-record operation.
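As a rough illustration of how the query and update mixes in scenarios 3 to 7 could be expressed, the sketch below renders each mix as YCSB CoreWorkload properties. The property names (`recordcount`, `readproportion`, `updateproportion`, `insertproportion`, `requestdistribution`) are standard YCSB core workload properties, but the mapping itself, the helper name `to_properties`, and the use of the `latest` request distribution for scenario 7 are assumptions, not the configuration actually used in this test.

```python
# Operation mixes for scenarios 3-7, written as YCSB CoreWorkload properties.
# The proportions come from the scenario list; the mapping to property names
# and the "latest" distribution for scenario 7 are assumptions.
SCENARIOS = {
    3: {"readproportion": 1.0, "updateproportion": 0.0, "insertproportion": 0.0},
    4: {"readproportion": 0.5, "updateproportion": 0.0, "insertproportion": 0.5},
    5: {"readproportion": 0.05, "updateproportion": 0.95, "insertproportion": 0.0},
    6: {"readproportion": 0.95, "updateproportion": 0.05, "insertproportion": 0.0},
    7: {"readproportion": 0.95, "updateproportion": 0.0, "insertproportion": 0.05,
        "requestdistribution": "latest"},
}


def to_properties(scenario_id, record_count=100_000_000):
    """Render one scenario as the text of a YCSB workload properties file."""
    mix = SCENARIOS[scenario_id]
    # Sanity check: the operation proportions must add up to 100%.
    total = sum(v for v in mix.values() if isinstance(v, float))
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"scenario {scenario_id}: proportions sum to {total}")
    lines = [f"recordcount={record_count}"]
    lines += [f"{key}={value}" for key, value in mix.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_properties(6))
```

Scenarios 1 and 2 are pure load phases, so they need no operation mix and are left out of the mapping.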
The data written and queried simulates typical log records and has the following characteristics:

| Attribute | Description |
|---|---|
| Number of fields | 10 |
| Field name length | 6 bytes |
| Total record size | 100 bytes |
| Field type | All strings |
| Primary key length | 23 bytes |
| Total records | 100 million |
| Total raw data | Approximately 100 GB |
| Number of data copies | 3 |

SequoiaDB and MongoDB are each configured with one primary and two secondaries. For HBase, the HDFS replication factor is set to 3. The Cassandra tables are created with the parameter replication_factor=2.
For the consistency level, we use eventual consistency, the weakest level, and the write concern is set to 1.
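To make the effect of the replica settings concrete, the following back-of-the-envelope sketch estimates how much data each of the four servers stores. It assumes the data is spread evenly across the nodes and ignores indexes, compression, and storage-engine overhead, none of which the report quantifies. The function name `per_node_gb` is hypothetical.

```python
RAW_DATA_GB = 100   # total raw data stated in the report
NODES = 4           # number of physical database servers


def per_node_gb(replicas, raw_gb=RAW_DATA_GB, nodes=NODES):
    """Estimated data per server, assuming an even spread across nodes."""
    return raw_gb * replicas / nodes


if __name__ == "__main__":
    print(per_node_gb(3))  # three copies: 75.0
    print(per_node_gb(2))  # replication_factor=2: 50.0
```

Under these assumptions, a product holding three copies keeps about 75 GB per server, while a replication factor of 2 yields about 50 GB per server.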
3. Test environment
The test environment consists of four Dell R520 physical machines that serve as database servers. The YCSB programs that generate the data run on the same physical machines as the databases.
Note: Running the YCSB data generator on a separate server can make the Gigabit network a bottleneck.
The entire cluster topology is shown in Figure 1:
Figure 1: Test cluster topology
Server environment. The four database servers are Dell R520 physical machines, each configured as follows:

| Type | Parameter |
|---|---|
| CPU | Intel(R) Xeon(R) CPU E5-2420, 1.9 GHz (6 cores) |
| Memory | DDR3, 48 GB |
| Disk | 6 internal SATA hard drives, 2 TB each |
| Network | Gigabit Ethernet |
| Operating system | Red Hat Enterprise Linux Server release 6.4 (kernel release 2.6.32-358.el6.x86_64) |
| JDK | Oracle JDK 1.6 |
4. Test methods
This test follows the standard YCSB procedure and is performed on four physical machines. The test process for each product is as follows:
1. Install the software and deploy a four-node cluster, configured as far as possible according to these guidelines:
   - High-availability configuration
   - Eventual consistency
   - Features consistent with a single-node environment
   - Full use of hardware resources
2. Deploy YCSB on the four physical machines.
3. Run YCSB so that each instance generates records and writes them to the cluster on its local machine, and collect the statistics YCSB produces.
4. Record the results in an Excel file.
5. Repeat for the other scenarios.
Concurrency is based on the following rules:
- Single-record insert: 24 threads per server
- Bulk-record insert: 8 threads per server
- All other operations: 36 threads per server
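The per-server thread rules translate into cluster-wide concurrency as sketched below. This assumes exactly one YCSB client process per server on each of the four machines, which the report implies but does not state outright. The operation labels and the helper `cluster_threads` are hypothetical names used for illustration.

```python
# Threads per server, taken from the concurrency rules in the report.
THREADS_PER_SERVER = {
    "single_insert": 24,
    "bulk_insert": 8,
    "other": 36,   # every operation type not listed above
}
SERVERS = 4  # assumption: one YCSB client process on each of the four machines


def cluster_threads(operation):
    """Total concurrent YCSB threads across the cluster for an operation type."""
    per_server = THREADS_PER_SERVER.get(operation, THREADS_PER_SERVER["other"])
    return per_server * SERVERS


if __name__ == "__main__":
    for op in ("single_insert", "bulk_insert", "query"):
        print(op, cluster_threads(op))
```

With four servers, this gives 96 threads for single-record inserts, 32 for bulk inserts, and 144 for all other operations.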