Cassandra's Internal Data Storage Structure

Source: Internet
Author: User
Tags cassandra

Data storage rules in Cassandra

  1. Data: stores the actual data files (sstables). Multiple directories can be specified.
  2. Commitlog: stores data that has not yet been written to an sstable (every write goes into this log first).
  3. Cache: stores the system's cached data (loaded from this directory when the service restarts).
Arranging the locations of these directories sensibly, for example by keeping the commitlog and the data files on separate disks, can improve performance.
Commitlog

A commitlog consists of two parts: Commitlog-XXXX.log and Commitlog-XXXX.log.header. The Commitlog-XXXX.log file stores the values of the most recent update operations, and Commitlog-XXXX.log.header records which data has already been flushed from the memtable to an sstable. Cassandra has two modes for syncing the commitlog: "periodic" and "batch". When a commitlog file grows beyond a size threshold, a new commitlog file is created. Every update in Cassandra is written to the commitlog, and the file is synced to disk at intervals, so that buffered writes also end up safely in the commitlog.
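The behavior described above (append every update, roll over to a new file past a size threshold, sync either per write or at intervals) can be illustrated with a small sketch. This is a toy model, not Cassandra's implementation: the `CommitLog` class, its record format, and the `segment_limit_bytes` parameter are all made up for illustration, and the periodic timer is left to the caller.

```python
import os
import tempfile


class CommitLog:
    """Toy append-only log with segment rollover and two sync modes (hypothetical)."""

    def __init__(self, directory, segment_limit_bytes=64, sync_mode="periodic"):
        if sync_mode not in ("periodic", "batch"):
            raise ValueError("sync_mode must be 'periodic' or 'batch'")
        self.directory = directory
        self.segment_limit_bytes = segment_limit_bytes
        self.sync_mode = sync_mode
        self.segment_id = 0
        self.segment_size = 0
        self.file = None
        self._open_new_segment()

    def _open_new_segment(self):
        # Make the old segment durable before switching to a new file.
        if self.file is not None:
            self.sync()
            self.file.close()
        self.segment_id += 1
        self.segment_size = 0
        name = f"Commitlog-{self.segment_id:04d}.log"
        self.file = open(os.path.join(self.directory, name), "ab")

    def append(self, key, value):
        record = f"{key}={value}\n".encode("utf-8")
        # Roll over once the current segment would exceed the size threshold.
        if self.segment_size > 0 and self.segment_size + len(record) > self.segment_limit_bytes:
            self._open_new_segment()
        self.file.write(record)
        self.segment_size += len(record)
        if self.sync_mode == "batch":
            self.sync()  # batch: force the write to disk before returning

    def sync(self):
        # In periodic mode the caller would invoke this from a timer.
        self.file.flush()
        os.fsync(self.file.fileno())

    def close(self):
        self.sync()
        self.file.close()
```

In batch mode every `append` pays for an `fsync`, trading latency for durability; in periodic mode a crash can lose whatever was written since the last `sync`.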
Memtable

After data is written to the commitlog, it is cached in a memtable (each memtable serves one column family). When the memtable's size, number of operations, or age exceeds its threshold, the data is written to disk to form an sstable file. The data in a memtable is held in a ConcurrentSkipListMap<DecoratedKey, ColumnFamily>, so it is written out in key order. The advantage of using a memtable is that it turns random I/O into sequential I/O (a technique that seems to be used in many systems).
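A minimal sketch of this buffer-then-flush idea follows. It is not Cassandra's code: Python has no built-in skip list, so a sorted key list maintained with `bisect` stands in for the ConcurrentSkipListMap, and the `Memtable` class, the `write` helper, and the `max_entries` threshold are hypothetical names chosen for the example (only an entry-count threshold is modeled).

```python
import bisect


class Memtable:
    """Toy in-memory write buffer that keeps its keys sorted (hypothetical)."""

    def __init__(self, max_entries=3):
        self.max_entries = max_entries
        self._keys = []    # always sorted; stands in for the skip list
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while preserving key order
        self._values[key] = value           # later writes overwrite earlier ones

    def is_full(self):
        return len(self._keys) >= self.max_entries

    def flush(self):
        """Return all entries in key order and reset the buffer."""
        rows = [(k, self._values[k]) for k in self._keys]
        self._keys, self._values = [], {}
        return rows


def write(memtable, sstables, key, value):
    """Buffer one write; when the threshold is reached, flush a sorted run."""
    memtable.put(key, value)
    if memtable.is_full():
        # One sequential pass over sorted rows replaces many random writes.
        sstables.append(memtable.flush())
```

Writes arrive in arbitrary key order, but each flushed run is already sorted, which is what allows it to be written to disk in a single sequential pass.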
Sstable

An sstable of the column family CF1 consists of the following files:
  1. Cf1-e-1-Data.db
  2. Cf1-e-1-Filter.db
  3. Cf1-e-1-Index.db
  4. Cf1-e-1-Statistics.db
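File names follow the pattern (column family name)-(version)-(file ID)-(component), where the component is one of the four parts listed above. Each part has its own role: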
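As a small illustration of this naming pattern, the sketch below splits a file name into its four fields. The `parse_sstable_name` function and the dictionary keys are hypothetical, and the code assumes only the layout shown in this article; other Cassandra versions use different naming schemes.

```python
def parse_sstable_name(filename):
    """Split a name such as 'Cf1-e-1-Data.db' into its four fields."""
    stem, _, _extension = filename.rpartition(".")
    # Split from the right so that a column family name containing '-' survives.
    column_family, version, file_id, component = stem.rsplit("-", 3)
    return {
        "column_family": column_family,
        "version": version,
        "file_id": int(file_id),
        "component": component,
    }
```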
  1. The filter file is used to quickly determine whether a key may be present in the sstable (it uses a Bloom filter for the check).
  2. The index file stores each key and its corresponding position in the data file. An in-memory lookup hashes the key and then performs a binary search (there is also a cache optimization, much like the usual approach in an OS kernel).
  3. The data file stores the data for each key, together with index information for some of its columns (the index is used to quickly locate the value being searched for).
  4. The statistics file stores statistics such as the number of columns and the number of rows contained in the sstable.
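The way the filter and index cooperate on a read can be sketched as follows. This is a simplified model under stated assumptions, not Cassandra's on-disk format: `BloomFilter` and `ToySSTable` are hypothetical classes, SHA-256 with a seed prefix stands in for the real hash functions, everything lives in memory, and keys are sorted by their raw value rather than by a hashed token.

```python
import bisect
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToySSTable:
    """In-memory stand-in for the Filter, Index and Data components."""

    def __init__(self, rows):
        self.keys = sorted(rows)                  # plays the role of Index.db
        self.data = [rows[k] for k in self.keys]  # plays the role of Data.db
        self.filter = BloomFilter()               # plays the role of Filter.db
        for key in self.keys:
            self.filter.add(key)

    def get(self, key):
        if not self.filter.might_contain(key):
            return None  # definitely absent: skip the index and data entirely
        i = bisect.bisect_left(self.keys, key)    # binary search over sorted keys
        if i < len(self.keys) and self.keys[i] == key:
            return self.data[i]
        return None  # the filter gave a false positive
```

The filter check is cheap and lets a read skip sstables that cannot contain the key; only when it answers "maybe" does the lookup pay for the index search and the data read.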

System keyspace

This part is similar to the practice of many relational databases. In Cassandra, in addition to user-defined keyspaces, there is a special keyspace named system, which has two roles:
  1. It manages Cassandra's system metadata.
  2. It stores hint data (for hinted handoff).
Corrections and criticism are welcome.
