Framework Introduction:
A summary of the Cassandra distributed database (since there is relatively little material available on Cassandra, this summary reflects only my personal understanding and is offered for reference only):
Cassandra is a NoSQL database: a lightweight distributed database based on column-family storage.
Thrift Framework:
The Cassandra client and server communicate through the Thrift framework. Thrift is a cross-language service framework that uses an interface definition language (IDL) to define RPC interfaces and data types. The Thrift compiler takes an IDL file and generates server-side and client-side interface library files for a specified programming language; users then implement their own logic on top of the generated server or client interfaces. (In Cassandra the server-side program is already implemented, so as long as it is running, the work left for us is writing the client.)

Thrift implements RPC (remote procedure call). A remote procedure call extends the idea of a local procedure call: in an ordinary program, calling a function defined in another source file is resolved at the compile/link stage, when the function's address within the executable is determined; the call then jumps through that pointer to the function's code and executes it. RPC makes calls to functions that do not live in the same executable work by means of network communication. The server and client agree in advance on the interface names and parameters. The server is responsible for implementing the functionality behind each interface. When the client invokes an interface of the same name, it uses the communication layer generated by the Thrift framework to send the function name and parameter list to the server; the server uses that information to find the pointer to the function of the same name (for example, in a map container keyed by function name), executes it, and sends the return value back to the client over the connection. When the client receives the return value, one RPC call is complete.
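To make the RPC idea concrete, here is a minimal, self-contained sketch (this is not Thrift itself, just an illustration of the principle): the server keeps a map from function names to the functions that implement them, the client sends the function name and arguments over a socket, and the return value travels back the same way. All names, the port, and the JSON wire format are made up for illustration.

```python
import json, socket, threading, time

# --- Server side: a map from function names to the functions implementing them ---
def add(a, b):
    return a + b

HANDLERS = {"add": add}   # the "map container" of function pointers

def serve(port):
    srv = socket.socket()
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    conn, _ = srv.accept()
    request = json.loads(conn.recv(4096).decode())        # receive function name + args
    result = HANDLERS[request["name"]](*request["args"])  # look up and execute
    conn.sendall(json.dumps({"result": result}).encode()) # send the return value back
    conn.close()
    srv.close()

# --- Client side: a remote call looks just like a local call ---
def remote_call(port, name, *args):
    cli = socket.socket()
    cli.connect(("127.0.0.1", port))
    cli.sendall(json.dumps({"name": name, "args": list(args)}).encode())
    reply = json.loads(cli.recv(4096).decode())
    cli.close()
    return reply["result"]

if __name__ == "__main__":
    threading.Thread(target=serve, args=(9091,), daemon=True).start()
    time.sleep(0.2)                          # give the toy server time to start
    print(remote_call(9091, "add", 1, 2))    # prints 3
```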
In Cassandra, a cassandra.thrift file is kept in the interface directory; it uses the IDL to define Cassandra's basic data structures and interfaces. Because Cassandra itself ships the interface library files compiled from cassandra.thrift by default, there is no need to compile them manually. These library files can be used directly to write Cassandra client programs that access the database server.
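As a rough sketch of what a client built on the Thrift-generated library might look like (the module and class names below follow the typical layout generated from cassandra.thrift for the old Thrift API; the exact package names, the port 9160, the keyspace and column family names, and the method signatures can differ between Cassandra versions, so treat this only as an outline, not as a verified recipe):

```python
import time
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
# The modules below are generated by the Thrift compiler from cassandra.thrift;
# their exact package names depend on the Cassandra release being used.
from cassandra import Cassandra
from cassandra.ttypes import Column, ColumnParent, ColumnPath, ConsistencyLevel

# Connect to a Cassandra node's Thrift port (9160 was the traditional default).
transport = TTransport.TFramedTransport(TSocket.TSocket("127.0.0.1", 9160))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Cassandra.Client(protocol)
transport.open()

client.set_keyspace("demo")   # hypothetical keyspace name

# Write one column under the key 18028682078.
col = Column(name=b"cust_name", value=b"Zheng", timestamp=int(time.time() * 1e6))
client.insert(b"18028682078", ColumnParent(column_family="customer"),
              col, ConsistencyLevel.ONE)

# Read it back.
result = client.get(b"18028682078",
                    ColumnPath(column_family="customer", column=b"cust_name"),
                    ConsistencyLevel.ONE)
print(result.column.value)

transport.close()
```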
Cassandra storage data structure:
The logical layer of the Cassandra database has a relatively simple data structure, consisting mainly of: 1. keyspace (equivalent to a tablespace in an Oracle database; as in Oracle, there are system keyspaces as well as user-created keyspaces); 2. column family (roughly an Oracle table); 3. column and supercolumn, the smallest elements in the Cassandra data structure. Each column is composed of three elements: name, value, and timestamp.
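A minimal sketch of this logical model (the keyspace, column family, row key, and column names are all hypothetical; the point is only to show the nesting of keyspace -> column family -> row key -> columns):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Column:                      # the smallest element: name, value, timestamp
    name: str
    value: str
    timestamp: int

@dataclass
class ColumnFamily:                # roughly an Oracle table
    name: str
    rows: Dict[str, List[Column]] = field(default_factory=dict)   # row key -> columns

@dataclass
class Keyspace:                    # roughly an Oracle tablespace
    name: str
    column_families: Dict[str, ColumnFamily] = field(default_factory=dict)

ks = Keyspace("demo")
ks.column_families["customer"] = ColumnFamily("customer")
ks.column_families["customer"].rows["18028682078"] = [
    Column("cust_name", "Zheng", 20240101000000),
]
```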
The write process of a Cassandra database cluster:
Suppose we write one column of data under the key 18028682078: column_name=cust_name, column_value=Zheng, timestamp=xxxx. The process of writing this column to the database may be as follows. First the key is hashed to obtain a token value, say 3; the token value determines the corresponding node (node 3). If that node is the local node the client is connected to, the write is handled locally; otherwise the data is sent to the appropriate node for storage via inter-node communication. If the column family has a replication strategy, for example a replication factor of 3, the data is also written to the 2 nodes that follow the node owning the token. To reduce random I/O, Cassandra uses a write cache called the memtable; each memtable corresponds to one column family. When data enters the memtable, the memtable looks for existing data under the key 18028682078 and merges the new column with it. Of course, before entering the memtable, the write is first recorded in the commit log; the commit log plays roughly the role of Oracle's redo log and is used to recover the cached memtable data if the database fails before the memtable is flushed. When the amount of data in the memtable reaches a threshold, it is flushed in sorted order to an SSTable file. An SSTable consists mainly of four files: the data file (the file that actually holds the data), the filter file (an auxiliary Bloom filter used to quickly determine whether a key may exist in this SSTable), the index file (which mainly records the position of each key's data in the data file), and the statistics file (which mainly holds statistical information). "SSTable" is the collective name for these four kinds of files.
Overall process: hash the key to get a token value -> find the corresponding node (and replica nodes) -> record the write in the commit log (Cassandra's redo log) -> enter the memtable write cache -> flush to an SSTable file (generating the four corresponding kinds of files).
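A toy sketch of this write path: the hashing, the replica placement, and the flush threshold below are all simplified stand-ins, not Cassandra's real algorithms, and serve only to mirror the order of steps (commit log -> memtable -> SSTable).

```python
import hashlib

NUM_NODES = 8
REPLICATION_FACTOR = 3
FLUSH_THRESHOLD = 4          # flush the memtable after this many columns (toy value)

commitlog = []               # stand-in for the commit log
memtable = {}                # row key -> {column name: (value, timestamp)}
sstables = []                # each flush produces one "SSTable" (here, a plain dict)

def token_for(key: str) -> int:
    """Hash the key to a token, then map the token to a node (simplified)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_NODES

def replica_nodes(key: str):
    """The owning node plus the next RF-1 nodes on the ring."""
    first = token_for(key)
    return [(first + i) % NUM_NODES for i in range(REPLICATION_FACTOR)]

def write(key, name, value, timestamp):
    commitlog.append((key, name, value, timestamp))          # 1. log the write first
    memtable.setdefault(key, {})[name] = (value, timestamp)  # 2. merge into the memtable
    if sum(len(cols) for cols in memtable.values()) >= FLUSH_THRESHOLD:
        flush()

def flush():
    """3. Dump the memtable, sorted by key, into a new SSTable."""
    sstables.append({k: dict(memtable[k]) for k in sorted(memtable)})
    memtable.clear()

write("18028682078", "cust_name", "Zheng", 1)
print(replica_nodes("18028682078"))   # e.g. [3, 4, 5]
```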
The read process of a Cassandra database cluster:
For example, to read all columns under the key 18028682078: first, the most suitable node is located (the exact algorithm is not entirely clear to me; it presumably takes network communication speed into account, such as whether nodes are in the same rack or network segment). A read request is then sent to the server on that node. If a higher consistency level is required, data also needs to be read from other nodes (so that multiple copies can be compared and the latest data chosen based on the timestamps). After receiving the read request, the server first looks for records in the memtable write cache, finding, for example, one column under the key 18028682078 (column_name=cust_name, column_value=Zheng, timestamp=xxxx). It then searches the row cache, finding, say, another column under the same key (column_name=cust_address, column_value=...). Next it looks in the key cache to see whether the key is present, which yields the offset of the key's data in the data file. If either of these two read caches misses, it is populated after the key's data has been found. The read and write caches may contain only part of the data in the SSTables; the rest must be retrieved from the data files. First the filter is used to quickly determine whether the key being looked for may be stored in a given SSTable file; if so, the index file is read (to speed up this process, the filter file and part of the index file are loaded into memory at startup) to find the offset of the key's data in the data file; finally the data file itself is read.
The overall process: the client sends a read request to the appropriate node (and replica nodes) -> look up the key's data in the memtable write cache -> look in the row cache and key cache -> finally go to the SSTable files -> finally, according to the required consistency level, reconcile the copies read from multiple nodes into one result that satisfies the consistency requirement.
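A toy sketch of this read path, continuing the structures from the write sketch above. Bloom filters, caches, and index files are reduced to plain sets and dictionaries; the code only mirrors the order of lookups (memtable -> row cache -> key cache -> filter -> index -> data file) and the newest-timestamp-wins merge, not the real implementation.

```python
def read_row(key, memtable, row_cache, key_cache, sstables):
    """Collect every version of every column for a key; the newest timestamp wins."""
    merged = {}   # column name -> (value, timestamp)

    def absorb(columns):
        for name, (value, ts) in columns.items():
            if name not in merged or ts > merged[name][1]:
                merged[name] = (value, ts)

    absorb(memtable.get(key, {}))       # 1. write cache (memtable)
    absorb(row_cache.get(key, {}))      # 2. row cache
    _ = key_cache.get(key)              # 3. key cache would give the data-file offset
    for sst in sstables:                # 4. SSTables: filter -> index -> data file
        if key in sst["filter"]:        #    Bloom-filter stand-in (a set of keys)
            offset = sst["index"][key]  #    index-file stand-in
            absorb(sst["data"][offset]) #    data-file stand-in
    row_cache[key] = dict(merged)       # populate the read cache after a miss
    return merged

sstables = [{"filter": {"18028682078"},
             "index": {"18028682078": 0},
             "data": {0: {"cust_age": ("25", 5)}}}]
print(read_row("18028682078",
               {"18028682078": {"cust_name": ("Zheng", 9)}}, {}, {}, sstables))
```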
Physical storage and compaction in Cassandra databases:
The Cassandra database stores data by column family, so the physical storage of a Cassandra column family (the equivalent of a table in a traditional database) differs both from a pure column store and from a pure row store. For example, the key 18028682078 has several columns of data, which in Cassandra might end up stored as follows: the memtable holds (column_name=cust_name, column_value=Zheng, timestamp=xxxx); SSTable1's data file holds two columns, (column_name=cust_address, column_value=..., timestamp=xxxx) and (column_name=cust_company, column_value=Guangdong Telecom, timestamp=xxxx), and these two columns are physically contiguous; SSTable2's data file holds two columns, (column_name=cust_sex, column_value=Male, timestamp=xxxx) and (column_name=cust_age, column_value=25, timestamp=xxxx), which are also physically contiguous. This situation arises because the data in the memtable is flushed to disk intermittently to form SSTable files: the first two columns written under the key 18028682078 were flushed to SSTable1 once the memtable reached its threshold, the next two were flushed to SSTable2 on the second flush, and the last column has not yet been flushed and still sits in the write cache. This storage pattern leads to a problem: the columns under a key become scattered, so looking up a column of data may require searching multiple SSTable files, which hurts query speed and consumes too many file handles. Hence the concept of compaction is introduced. Compaction here does not mean compressing the data with a character-level algorithm; it refers to merging multiple SSTable files so that, as far as possible, all the columns under one key are stored in a single SSTable file. Compaction also removes unneeded redundancy. Updates and deletes in Cassandra differ from a traditional database: in a sense Cassandra has no update operation. If the same column under the same key is written repeatedly, multiple versions of that data are kept in the data files rather than overwriting the old data, and the latest version is resolved at read time via read consistency. Deletes work the same way: deleting data in Cassandra only writes a deletion marker for the data; it is not actually removed from disk. As a result the Cassandra database accumulates a great deal of redundancy, and compaction is used to clear it out, for example by keeping only the latest copy of data that has been written multiple times and removing data carrying a deletion marker.
Overall: compaction removes unneeded data redundancy, merges valid data together, and improves read efficiency.
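A toy sketch of compaction as described above: several SSTables are merged into one, only the newest version of each column is kept, and columns whose newest version carries a deletion marker (tombstone) are dropped. The TOMBSTONE object and the sample values are made-up stand-ins.

```python
TOMBSTONE = object()   # stand-in for Cassandra's deletion marker

def compact(sstables):
    """Merge SSTables: keep the newest version per column, drop tombstoned columns."""
    merged = {}   # row key -> {column name: (value, timestamp)}
    for sst in sstables:
        for key, columns in sst.items():
            row = merged.setdefault(key, {})
            for name, (value, ts) in columns.items():
                if name not in row or ts > row[name][1]:
                    row[name] = (value, ts)
    # Strip columns whose newest version is a deletion marker.
    return {key: {n: v for n, v in cols.items() if v[0] is not TOMBSTONE}
            for key, cols in merged.items()}

sst1 = {"18028682078": {"cust_company": ("Guangdong Telecom", 1),
                        "cust_address": ("...", 1)}}
sst2 = {"18028682078": {"cust_company": ("Guangdong Telecom HQ", 5),  # newer version
                        "cust_address": (TOMBSTONE, 6)}}              # deleted later
print(compact([sst1, sst2]))
# {'18028682078': {'cust_company': ('Guangdong Telecom HQ', 5)}}
```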