The contest over time series data processing: MongoDB vs. Cassandra

Source: Internet
Author: User

MongoDB and Cassandra are two of the most popular NoSQL databases. MongoDB is the undisputed popularity leader of the NoSQL field, while Cassandra has long held the top spot among column stores, although for various reasons it has drawn less attention than HBase. Recently, the director of operations and architecture at MyDrive Solutions shared the company's experience with these two databases and compared them. The account follows:


Using Cassandra to replace MongoDB for managing time series data
Use case
MyDrive runs a data processing platform hosted on AWS, with the processing carried out by Resque workers. As a telematics company, we need to handle large volumes of time series data, and we initially used MongoDB for this. MongoDB was chosen as a temporary option to get started.
MongoDB's performance was reasonably good, but the processing time for loads of different sizes was unpredictable, which caused a lot of trouble and made it difficult to optimize our data processing pipelines. By MongoDB's design (as with most relational databases), returning 30 documents and returning 300 documents trigger different I/O loads, so 30 documents are always returned faster than 300. In either case the processing speed was not very fast, even with very good indexes and appropriately sized instances.
As a result, we had to scale up our MongoDB instances to some extent.
Solution
From the start, we set our sights on Cassandra, because the version after AboutUs 5.0 uses Cassandra. It has proven its strong reliability, visibility, and resilience to us. These qualities made Cassandra a worthy choice as our main data store once we consolidated our landscape.
Our typical data flow is: write data >> read it back >> modify >> write >> read again, with these operations performed by different users. Although this is not the access pattern Cassandra was designed around, Cassandra suits the way we query very well.
Cassandra is really good for time series data, because you can write one column for each period of the series and then query a given time range through partial string comparison. Using columns rather than rows for the periods is more efficient, because loading only a single row per query yields a large I/O saving. Cassandra then needs at most one seek, after which it reads all the remaining data sequentially, because the data is written to disk in order.
We use the timestamp as the first half of the ID, so that we can run a range query over any period of time: each row represents a single data set, and each column represents one period of the time series. This allows all data to be queried by key plus a start and end time.
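The layout described above can be sketched in plain Python without a running cluster. This is only a toy model under stated assumptions, not MyDrive's schema or a Cassandra client API: the names WideRow, column_id, the "value" suffix, the row key "dataset-1", and the timestamp format are all hypothetical. It shows how a sortable timestamp at the front of each column ID lets one row be sliced by start and end time using nothing but string comparison.

```python
from bisect import bisect_left
from datetime import datetime, timedelta, timezone

# Assumed format: fixed-width, so string order equals chronological order.
TS_FORMAT = "%Y%m%dT%H%M%S"


def column_id(ts, suffix="value"):
    """Build a column ID whose first half is a sortable timestamp."""
    return ts.strftime(TS_FORMAT) + ":" + suffix


class WideRow:
    """Toy stand-in for one row whose columns are kept sorted by ID."""

    def __init__(self):
        self._ids = []      # column IDs, always kept in sorted order
        self._values = {}   # column ID -> stored value

    def write(self, ts, value, suffix="value"):
        cid = column_id(ts, suffix)
        if cid not in self._values:
            # Insert at the sorted position so slices stay contiguous.
            self._ids.insert(bisect_left(self._ids, cid), cid)
        self._values[cid] = value

    def slice(self, start, end):
        """Return columns with start <= timestamp < end.

        Only the timestamp prefixes are compared, as plain strings.
        """
        lo = bisect_left(self._ids, start.strftime(TS_FORMAT))
        hi = bisect_left(self._ids, end.strftime(TS_FORMAT))
        return [(cid, self._values[cid]) for cid in self._ids[lo:hi]]


# One row per data set, looked up by key; one column per period.
store = {}
row = store.setdefault("dataset-1", WideRow())
base = datetime(2013, 1, 1, tzinfo=timezone.utc)
for minute in range(5):
    row.write(base + timedelta(minutes=minute), float(minute))

# Query by key plus start and end time.
window = store["dataset-1"].slice(base + timedelta(minutes=1),
                                  base + timedelta(minutes=3))
print(window)
```

Because the columns of a row stay sorted, the slice is one contiguous run, which mirrors the single seek followed by a sequential read described above.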
Given the type of our workload, the number of write operations can seriously affect read performance when using MongoDB, which is not the case with Cassandra.
Comparison
The chart shows performance over half a year; the blue line represents the portion affected by data store performance. There are clearly large swings in performance during the period when we were optimizing MongoDB and scaling its hardware, whereas none of this affected Cassandra.
We phased out MongoDB gradually, first stopping writes to it and finally stopping reads from it (marked with red arrows on the chart). What the chart does not show is that many batch jobs were added after the transition to Cassandra.
