Http://www.lupaworld.com/article-213231-1.html
OceanBase is a high-performance distributed database system that supports massive amounts of data, implementing cross-row and cross-table transactions over hundreds of billions of records and hundreds of TB of data. It was developed jointly by Taobao's Core System R&D Department together with the Operations, DBA, Advertising, and Application R&D teams.
What problem does OceanBase solve
The core assets of many companies are their various kinds of business data, such as Taobao's items, transactions, orders, and shopping preferences. These data are usually structured, with many associations among them, and the traditional relational database has been their natural home. With the rapid growth of the business, however, the data has ballooned: the number of records has grown from tens of millions to billions, and the data volume from hundreds of GB to several TB; in the future they may reach hundreds of billions of records and hundreds of TB, which traditional relational databases cannot handle. OceanBase addresses the storage and querying of this ever-growing structured data.
Viewed through Professor Eric Brewer's CAP theorem (Consistency, Availability, and tolerance of network Partition), e-commerce companies such as Taobao place higher demands on consistency and availability than on partition tolerance. Their data is characterized by a large and gradually growing total volume, with a modest amount of updates per unit of time, but high real-time requirements. This calls for a system that focuses on the C and A properties while also providing scalability and striking a balance among real-time behavior, cost, and performance.
The architecture of OceanBase
[Figure: schematic diagram of the OceanBase logical structure]
Basic concepts in the OceanBase architecture
Primary key
The row key, also called the primary key, is similar to a primary key in a DBMS. Unlike in a DBMS, an OceanBase primary key is always a binary string, although it may carry some internal structure. OceanBase stores table data sorted by primary key.
SSTable
A storage format used by OceanBase to hold the data of one or more tables over a contiguous range of primary keys.
Tablet
A range of a table partitioned by primary key (left-open, right-closed), usually containing one or several SSTables; the amount of data in a tablet is typically around 256MB.
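To make the range semantics concrete, here is a minimal illustrative sketch in C++ (not OceanBase source code; the type and field names such as SSTableRef and Tablet are hypothetical) of a tablet as a left-open, right-closed primary-key range that owns one or more SSTables:

    #include <cstdint>
    #include <string>
    #include <vector>

    // One on-disk SSTable belonging to a tablet (hypothetical representation).
    struct SSTableRef {
        std::string path;   // location of the SSTable file
        uint64_t    bytes;  // size of the SSTable in bytes
    };

    // A tablet: the (start_key, end_key] slice of a table's primary-key space.
    struct Tablet {
        std::string start_key;             // exclusive lower bound (left-open)
        std::string end_key;               // inclusive upper bound (right-closed)
        std::vector<SSTableRef> sstables;  // usually one or a few SSTables

        // Primary keys are binary strings compared lexicographically.
        bool contains(const std::string& key) const {
            return key > start_key && key <= end_key;
        }

        // Total data held; around 256MB in practice, beyond which the tablet
        // would be split (and undersized tablets merged).
        uint64_t size_bytes() const {
            uint64_t total = 0;
            for (const SSTableRef& s : sstables) total += s.bytes;
            return total;
        }
    };

Because tablets partition the key space, any primary key belongs to exactly one tablet, and a range query only needs to touch the few tablets whose ranges overlap the requested interval.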
Baseline data and dynamic data
OceanBase records the insertions, updates, and deletions made to a table over a period of time as increments, so that the table's main body of data stays relatively stable during that period. The incremental changes are called dynamic data (usually held in memory, also known as the memory table), while the main data that stays relatively stable over that period is the baseline data. Baseline data and dumped dynamic data (saved to SSD solid-state drives or disks) are stored in SSTable format.
Chunkserver
The servers that hold the baseline data; there are usually many of them. To avoid service interruption due to software or hardware failure, the same baseline data is usually kept in 3 copies stored on different Chunkservers.
Updateserver
The server that holds the dynamic data; typically there is a single one. To avoid service interruption caused by software or hardware failure, the Updateserver writes a commit log and usually runs as a two-machine hot-standby pair.
Mergeserver
The servers that merge static (baseline) and dynamic data; they often share a physical machine with a Chunkserver. Mergeservers give users access to complete, up-to-date data.
Rootserver
The master (configuration) server; typically there is a single one. To avoid service interruption caused by software or hardware failure, the Rootserver writes a commit log and usually runs as a two-machine hot-standby pair. Because the Rootserver's load is generally very light, it often shares a physical machine with the Updateserver.
Frozen
When a block of dynamic data (i.e., a memory table) has accumulated updates up to a certain point in time, or has grown to a certain size, OceanBase stops modifying that block; subsequent updates are written to a new dynamic data block (a new memory table), and the old block is never modified again. This process is called freezing.
Dump
The process of persisting a frozen dynamic data block (memory table), converting it to SSTable format and saving it to an SSD solid-state drive or disk, in order to free memory and make the data durable.
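The freeze and dump steps can be pictured with a short C++ sketch (hypothetical names, not OceanBase source): once the active memory table reaches a threshold it is frozen and a new one is opened, and a frozen table can later be dumped to SSTable form and its memory released:

    #include <cstddef>
    #include <map>
    #include <memory>
    #include <string>
    #include <vector>

    // Simplified memory table: primary key -> change record.
    using MemTable = std::map<std::string, std::string>;

    struct UpdateStore {
        std::shared_ptr<MemTable> active = std::make_shared<MemTable>();
        std::vector<std::shared_ptr<MemTable>> frozen;  // read-only after freezing
        std::size_t freeze_threshold = 100000;          // illustrative threshold

        void apply(const std::string& key, const std::string& change) {
            (*active)[key] = change;
            if (active->size() >= freeze_threshold) freeze();
        }

        // Freeze: stop modifying the current memory table, start a new one.
        void freeze() {
            frozen.push_back(active);
            active = std::make_shared<MemTable>();
        }

        // Dump: persist the oldest frozen table as an SSTable and free its memory;
        // reads can still be served from the dumped SSTable on SSD/disk.
        void dump_oldest() {
            if (frozen.empty()) return;
            write_sstable(*frozen.front());  // serialization details elided
            frozen.erase(frozen.begin());
        }

        void write_sstable(const MemTable& /*table*/) { /* write to SSD or disk */ }
    };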
Data Merge (merge)
The process of merging a queried item's baseline data with its dynamic data (i.e., its incremental inserts, updates, and deletes) to obtain the item's latest value. In addition, the process of combining old baseline data with frozen dynamic data to generate new baseline data is also called a data merge.
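A minimal C++ sketch of both kinds of merge (hypothetical types; real OceanBase changes are per-column and more elaborate, while here a change simply replaces or deletes a whole row):

    #include <map>
    #include <optional>
    #include <string>

    // A simplified dynamic change: either a deletion or a full replacement value.
    struct Change {
        bool deleted = false;
        std::string value;
    };

    using Baseline = std::map<std::string, std::string>;  // key -> baseline row
    using Delta    = std::map<std::string, Change>;       // key -> latest change

    // Query-time merge: baseline value + dynamic change => latest value.
    std::optional<std::string> read_latest(const Baseline& base, const Delta& delta,
                                           const std::string& key) {
        auto d = delta.find(key);
        if (d != delta.end()) {
            if (d->second.deleted) return std::nullopt;   // deleted in dynamic data
            return d->second.value;                       // updated/inserted value
        }
        auto b = base.find(key);
        if (b != base.end()) return b->second;            // unchanged baseline row
        return std::nullopt;                              // never existed
    }

    // Background merge: old baseline + frozen dynamic data => new baseline.
    Baseline merge_baseline(const Baseline& base, const Delta& frozen) {
        Baseline result = base;
        for (const auto& kv : frozen) {
            if (kv.second.deleted) result.erase(kv.first);
            else result[kv.first] = kv.second.value;
        }
        return result;
    }

In the real system the background merge is performed as a sequential pass over the baseline SSTables and the frozen memory table, producing new SSTables, which is why it is disk-friendly.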
Join
A table can be left-joined on its primary key with one or several other tables, similar to a natural join in a DBMS.
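A rough C++ sketch of a primary-key left join (hypothetical and heavily simplified; rows are just lists of column values): each row of the main table keeps its columns and, when the joined table has a row with the same key, that row's columns are appended:

    #include <map>
    #include <string>
    #include <vector>

    using Row   = std::vector<std::string>;      // simplified: a list of column values
    using Table = std::map<std::string, Row>;    // primary key -> row

    // Left join on the primary key: every row of the main table is kept, extended
    // with the joined table's columns when a matching key exists there.
    Table left_join(const Table& main_table, const Table& joined_table) {
        Table result;
        for (const auto& kv : main_table) {
            Row combined = kv.second;
            auto it = joined_table.find(kv.first);
            if (it != joined_table.end())
                combined.insert(combined.end(), it->second.begin(), it->second.end());
            result[kv.first] = combined;
        }
        return result;
    }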
COW
Short for copy-on-write. In OceanBase it refers specifically to the technique of copying BTree data and writing to the copy when updating, which avoids locking the structure against readers.
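The idea can be illustrated with a small copy-on-write sketch in C++ (hypothetical and much simpler than a real BTree; in a real system the publish step would be an atomic pointer swap): writers copy the data they modify and publish a new version, while readers keep using the old, immutable snapshot without taking locks:

    #include <map>
    #include <memory>
    #include <string>

    // Stand-in for the BTree contents; in practice only the touched nodes are copied.
    using Snapshot = std::map<std::string, std::string>;

    struct CowStore {
        std::shared_ptr<const Snapshot> current = std::make_shared<Snapshot>();

        // Readers take a snapshot; nothing they hold is ever modified in place.
        std::shared_ptr<const Snapshot> snapshot() const { return current; }

        // Writers copy, modify the copy, then publish it as the new version.
        void put(const std::string& key, const std::string& value) {
            auto copy = std::make_shared<Snapshot>(*current);  // copy on write
            (*copy)[key] = value;
            current = copy;                                    // publish new version
        }
    };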
Characteristics of OceanBase
OceanBase functionality
When designing and implementing OceanBase, the team set aside DBMS features that were not urgently needed, such as temporary tables and views, so that its limited R&D resources could be focused on the key problems. Currently OceanBase mainly addresses consistent data updates, high-performance cross-table read transactions, range queries, joins, full and incremental data dumps, and bulk data import.
OceanBase data access characteristics
Although the total amount of data is relatively large, as in many other industries, the insertions, updates, and deletions made to Taobao's data within a given period (such as an hour or a day) are limited, usually no more than tens of millions to hundreds of millions per day. Based on this characteristic, OceanBase records the modifications made within a period of time as increments (called dynamic data, usually kept in memory), so that the main body of the data remains relatively stable over that period (this is called the baseline data).
Because the dynamic data is relatively small, OceanBase typically keeps it in the memory of a single server, the Updateserver. Holding the insertions, updates, and deletions in memory greatly improves the system's write-transaction performance. Moreover, if each modification averages about 100 bytes, then 10GB of memory can record 100M (i.e., 100 million) modifications, and expanding the Updateserver's memory raises the number of modifications that can be held in memory. Furthermore, because a frozen memory table is never modified again, it can be converted to SSTable format and saved to an SSD solid-state drive or disk; its memory is released after the dump, and the SSTable on SSD can still serve reads with fairly high performance, which also eases the Updateserver's memory requirements in extreme cases. To cope with machine failures, the dynamic-data server Updateserver writes a commit log and runs as a two-machine (or even multi-machine) hot-standby setup. Because the Updateserver primary and standby are synchronized, the standby can also serve read requests.
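A quick check of the sizing arithmetic above, with illustrative numbers (the 100-byte average is an assumption used only to make the calculation concrete):

    #include <cstdint>
    #include <cstdio>

    int main() {
        const std::uint64_t memory_bytes     = 10ULL * 1000 * 1000 * 1000;  // 10GB
        const std::uint64_t bytes_per_change = 100;  // assumed average modification size
        // 10GB / 100 bytes = 100,000,000 modifications held in memory.
        std::printf("modifications held: %llu\n",
                    static_cast<unsigned long long>(memory_bytes / bytes_per_change));
        return 0;
    }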
Because the baseline data is relatively stable, OceanBase partitions it by primary key (also called the row key) into tablets and stores multiple replicas (typically 3) on multiple machines (Chunkservers), avoiding service interruption from a single machine failure; multiple replicas also increase the system's serving capacity. The size of an individual tablet can be configured to suit the application's data; tablets that become too small are merged, and oversized tablets are split.
Because each tablet stores a contiguous block of primary keys, an OceanBase range query by primary key translates into sequential disk reads, which is very efficient.
For dynamic data that has been frozen or dumped, OceanBase's Chunkservers, during off-peak periods, merge the baseline data with the frozen/dumped memory tables to generate new baseline data. This merge is effectively a range scan, i.e., a series of sequential disk reads and sequential disk writes, and is therefore also very efficient.
A traditional DBMS provides powerful transactions, good consistency, and short response times for queries and updates, but its data scale is severely limited and it lacks scalability. Modern cloud-computing systems offer huge data scale and good scalability, but lack cross-row and cross-table transactions, have weaker data consistency, and usually longer response times for queries and updates. OceanBase's design and implementation combine the advantages of both:
--------------------------------------------------------------------------------
Updateserver: similar to the DB role in a DBMS; provides cross-row and cross-table transactions, good consistency, and short response times for queries and updates.
Chunkserver: similar to a worker machine in cloud computing, such as GFS's chunkserver; keeps multiple replicas of the data (typically 3) at medium granularity (tablets of about 256MB), with automatic load balancing, failure recovery, and machine plug-and-play, so system capacity and performance can be expanded at any time.
Mergeserver: combines data from Chunkserver and Updateserver to obtain the latest data and achieve data consistency.
Rootserver: similar to the master in cloud computing, such as the GFS master; handles machine failure detection, load-balancing computation, load-migration scheduling, and so on.
--------------------------------------------------------------------------------
By combining the strengths of DBMS and cloud-computing technology in this way, OceanBase offers the traditional DBMS's cross-row and cross-table transactions, strong data consistency, and short response times for queries and updates, together with cloud computing's ability to manage massive data, automatic failure recovery, automatic load balancing, and good scalability.
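As a rough illustration of how these roles cooperate on a single-row read (hypothetical interfaces with stubbed bodies, not OceanBase's actual RPC API): the Rootserver locates the tablet, a Chunkserver replica supplies the baseline row, the Updateserver supplies the dynamic change, and the Mergeserver combines the two for the client:

    #include <optional>
    #include <string>

    // Stub clients standing in for the real RPC interfaces of each server role.
    struct RootserverClient {
        int locate_tablet(const std::string& /*key*/) { return 0; }  // tablet lookup
    };
    struct ChunkserverClient {
        std::optional<std::string> read_baseline(int /*tablet_id*/,
                                                 const std::string& /*key*/) {
            return std::nullopt;  // would read the tablet's SSTables on disk
        }
    };
    struct UpdateserverClient {
        std::optional<std::string> read_delta(const std::string& /*key*/) {
            return std::nullopt;  // would read the in-memory dynamic data
        }
    };

    // The Mergeserver-side read path: fetch baseline and delta, let the delta win.
    std::optional<std::string> mergeserver_get(RootserverClient& rs,
                                               ChunkserverClient& cs,
                                               UpdateserverClient& us,
                                               const std::string& key) {
        int tablet_id = rs.locate_tablet(key);
        std::optional<std::string> baseline = cs.read_baseline(tablet_id, key);
        std::optional<std::string> delta    = us.read_delta(key);
        return delta.has_value() ? delta : baseline;  // dynamic change supersedes baseline
    }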
OceanBase's current applications at Taobao
OceanBase is already in use in Taobao Favorites, storing users' favorite entries along with the detailed item and shop information, and supporting 40 to 50 million update operations per day. Applications waiting to go online include CTU, SNS, and others, with more than 2 billion updates and more than 2.5TB of updated data per day. OceanBase will gradually be rolled out across Taobao, and external collaborators are also welcome.
Main performance data
Test hardware and software environment
Red Hat Enterprise Linux Server Release 5.4 (Tikanga)
GCC version 4.1.2 20080704 (Red Hat 4.1.2-46)
Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
Chunkserver & Mergeserver: 16GB memory, 10 x 300GB SAS disks, no RAID
Updateserver & Rootserver: 48GB memory, 6 x 300GB SAS disks, RAID1
[Figure: test environment deployment diagram]
Test Data Scale
2.1 billion records, with the baseline data stored in 3 replicas.
Test schema
Two tables: table 1 has 21 columns and table 2 has 11 columns.
11 of the columns in table 1 have a join relationship with the 11 columns of table 2.
A single record is 500 bytes in size.
Test performance graphs
[Figure: range query performance]
[Figure: single-record query performance]
At peak load, a single Chunkserver outputs 90MB/s of data, approaching the limit of a gigabit NIC.
[Figure: update performance]