Principles of database sharding and an implementation based on ThinkPHP
Why shard tables and databases?
When a data table holds a large amount of data or is accessed very frequently, a single table can no longer handle the storage and access load. To reduce the burden on the database and speed up data access, we divide one table into multiple tables and store one type of data across several tables. When table sharding alone is not enough, we can also split the database and store the data in several databases.
Table sharding can be implemented in different ways depending on the requirements. Here is an example from one of my projects:
Requirement: the product and product_price tables have a one-to-many relationship. product_price stores the daily prices of products, and one product has many price rows. Because the data volume is large, with more than a million new rows every day, tables are sharded by day and databases by month. Tables are named like 'product_price2014-07-20': the original table name followed by the date (year-month-day).
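The naming rule can be expressed as a small pure function. This is a minimal sketch: shard_names() and quoted_table() are hypothetical helpers, not part of ThinkPHP or the original project, and they only build names without creating anything. The 'acbooking' database prefix is taken from the project code. Because the generated names contain '-', MySQL only accepts them as backtick-quoted identifiers.

```php
<?php
// Hypothetical helpers for the naming rule: one database per month,
// one product_price table per day (prefixes taken from the project code).
function shard_names(string $date): array
{
    $ts = strtotime($date);
    if ($ts === false) {
        throw new InvalidArgumentException("Invalid date: $date");
    }
    return [
        'database' => 'acbooking' . date('Y-m', $ts),       // e.g. acbooking2014-07
        'table'    => 'product_price' . date('Y-m-d', $ts), // e.g. product_price2014-07-20
    ];
}

// The names contain '-', which MySQL only accepts in backtick-quoted identifiers.
function quoted_table(string $date): string
{
    $n = shard_names($date);
    return '`' . $n['database'] . '`.`' . $n['table'] . '`';
}

echo quoted_table('2014-07-20'), "\n"; // `acbooking2014-07`.`product_price2014-07-20`
```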
The code for creating the database and the table is as follows:
function get_product_price_table() {
    $db_info = array();
    $month = date('Y-m');     // this month, e.g. 2014-07
    $today = date('Y-m-d');   // today, e.g. 2014-07-20
    // This month's database, today's table
    $db_name    = 'acbooking' . $month;
    $table_name = 'product_price' . $today;
    // Yesterday's date
    $yesterday = date('Y-m-d', strtotime('-1 day'));
    // Fetch yesterday's record; the last product_price id becomes the starting id of the new table
    $last_one_product_price = get_info('product_price_table_id', array('time' => $yesterday));
    if ($last_one_product_price['table_id_end'] > 0) {
        $table_id = $last_one_product_price['table_id_end'] + 1;
    } else {
        $table_id = 1;
    }
    $DB_P = C('DB_PREFIX');
    $dsn_base = C('DB_TYPE') . '://' . C('DB_USER') . ':' . C('DB_PWD') . '@' . C('DB_HOST') . '/';
    // Create the database: connect to the configured default database (DB_NAME),
    // because the new one does not exist yet. The SQL is omitted here; the names
    // contain '-', so they must be backtick-quoted in MySQL.
    $Model = M('', $DB_P, $dsn_base . C('DB_NAME'));
    $Model->execute($create_db_sql);      // CREATE DATABASE statement (omitted)
    // Create the data table inside the new database, starting its ids at $table_id
    $Model = M('', $DB_P, $dsn_base . $db_name);
    $Model->execute($create_table_sql);   // CREATE TABLE statement (omitted)
    // Return the new database and table names (execute() returns a row count, not a name)
    $db_info['database'] = $db_name;
    $db_info['table']    = $table_name;
    return $db_info;
}
Once the daily table has been created, all of that day's data is stored in it. Switching to a new table every day keeps each table small, which keeps data access efficient.
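One consequence of one table per day is that a query over several days has to visit several tables, and a range that crosses a month boundary also crosses databases. The following is a minimal sketch, assuming the acbooking/product_price naming used in get_product_price_table(); shards_for_range() is a hypothetical helper that only lists the shards to query.

```php
<?php
// Hypothetical helper: list every daily shard a date-range query must touch.
// Each day's prices live in their own table and each month in its own
// database, so a query over [from, to] visits one table per day.
function shards_for_range(string $from, string $to): array
{
    $shards = [];
    $day = new DateTime($from);
    $end = new DateTime($to);
    while ($day <= $end) {
        $shards[] = [
            'database' => 'acbooking' . $day->format('Y-m'),
            'table'    => 'product_price' . $day->format('Y-m-d'),
        ];
        $day->modify('+1 day');
    }
    return $shards;
}

// A range that crosses a month boundary spans two databases.
foreach (shards_for_range('2014-07-30', '2014-08-02') as $s) {
    echo $s['database'], '.', $s['table'], "\n";
}
```

The application would then run the same query against each listed table and combine the results.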
Why shard databases and tables?
1. What is the basic idea of database/table sharding?
Simply put, data originally stored in one database is split into blocks stored across multiple databases, and data originally stored in one table is split into blocks stored across multiple tables.
2. Why database/table sharding?
The amount of data in a database is not always predictable. Without database and table sharding, as time passes and the business grows, the number of tables in the database increases and each table holds more data, so the cost of insert, delete, update and query operations grows with it. In addition, because the data cannot be deployed across multiple servers, it is limited to the resources of a single server (CPU, disk, memory, IO, etc.), so both the amount of data the database can hold and its processing capacity eventually hit a bottleneck.
3. Strategies for database/table sharding.
There are two types of database/table sharding: vertical sharding and horizontal sharding.
3.1 What is vertical sharding? Tables are grouped by functional module and by how closely they are related, and each group is deployed to a different database. For example, we define a project database workDB, a product database payDB, a user database userDB and a log database logDB, which store the project definition tables, product definition tables, user data tables and log tables respectively.
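With vertical sharding, routing is a fixed lookup from table to database. A minimal sketch, assuming a static map: the database names come from the example above, while projectTable, productTable and logTable are hypothetical table names (only userTable appears in the article).

```php
<?php
// Vertical sharding: each table belongs to the database of its functional module.
// Table names other than userTable are hypothetical.
const VERTICAL_MAP = [
    'projectTable' => 'workDB',
    'productTable' => 'payDB',
    'userTable'    => 'userDB',
    'logTable'     => 'logDB',
];

function database_for_table(string $table): string
{
    if (!array_key_exists($table, VERTICAL_MAP)) {
        throw new InvalidArgumentException("No module owns table $table");
    }
    return VERTICAL_MAP[$table];
}

echo database_for_table('userTable'), "\n"; // userDB
```

Because the mapping is static, it is simple to implement, which is why vertical sharding is usually the easier first step.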
3.2 What is horizontal sharding? When a single table holds too much data, we divide its rows according to some rule, such as a hash of userID, and store them in multiple tables with the same structure, spread across different databases. For example, if the user data table in userDB holds a large amount of data, we can split userDB into several databases with the same structure (part0DB, part1DB, etc.), split the user table userTable into userTable0, userTable1 and so on, and then distribute these tables across the databases according to the chosen rule.
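A minimal sketch of such a rule, assuming 2 databases with 2 userTable copies each. Plain modulo on userID stands in for the "userID hash"; the shard counts and the locate_user() helper are illustrative assumptions, not taken from the article.

```php
<?php
// Horizontal sharding by userID: 2 databases x 2 tables = 4 slots (assumed counts).
const DB_COUNT = 2;
const TABLES_PER_DB = 2;

function locate_user(int $userId): array
{
    if ($userId < 0) {
        throw new InvalidArgumentException('userId must be non-negative');
    }
    $slot = $userId % (DB_COUNT * TABLES_PER_DB);              // 0 .. 3
    return [
        'database' => 'part' . intdiv($slot, TABLES_PER_DB) . 'DB', // part0DB or part1DB
        'table'    => 'userTable' . ($slot % TABLES_PER_DB),        // userTable0 or userTable1
    ];
}

print_r(locate_user(7)); // part1DB / userTable1
```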
3.3 Which approach should be used? It depends on where the data-volume bottleneck lies, weighed together with the project's business type.
If the database holds massive amounts of data because it has too many tables, and the project's business logic is clearly divided with low coupling between modules, vertical sharding, whose rules are simple and easy to implement, is the first choice.
If the database has few tables but a single table holds a very large amount of data or is accessed very heavily, horizontal sharding should be chosen. Horizontal sharding is more complex than vertical sharding because it physically splits data that logically belongs together. Besides choosing the split granularity, you must keep data and load evenly balanced across the shards, and the split adds ongoing data-management work for both the project team and the application.
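One concrete source of that later burden is changing the number of shards. Assuming simple modulo routing (an illustration, not the article's rule), the following counts how many rows change shard when the shard count grows; rows_moved() is a hypothetical helper.

```php
<?php
// Count rows whose shard changes when modulo routing goes from $oldShards to $newShards.
function rows_moved(int $maxId, int $oldShards, int $newShards): int
{
    $moved = 0;
    for ($id = 1; $id <= $maxId; $id++) {
        if ($id % $oldShards !== $id % $newShards) {
            $moved++;
        }
    }
    return $moved;
}

echo rows_moved(1000, 4, 5), "\n"; // 800: going from 4 to 5 shards moves 80% of the rows
```

A row stays put only when its id gives the same remainder for both counts, so most data has to be migrated; this is why the split granularity is worth planning up front.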
In real projects both situations often occur together, which requires balancing vertical and horizontal sharding. Our game project uses both: we first split the database vertically, and then split some tables horizontally, usually the user data tables.
4. Problems with database/table sharding.
4.1 Transaction problems.
After database/table sharding, data is stored in different databases, which makes transaction management difficult. Relying on the database's distributed transaction support carries a high performance cost, while having the application control transactions adds programming burden.
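To show what "the application controls the transaction" involves, here is a minimal in-memory sketch of a simplified two-phase commit: every shard first stages its write, and the writes are made visible only if all shards succeeded. Shard and cross_shard_write() are hypothetical stand-ins, not ThinkPHP classes, and a real implementation would also need durable logs and recovery for a crash between the two phases.

```php
<?php
// Hypothetical in-memory stand-in for one database shard.
class Shard
{
    public array $rows = [];
    private array $pending = [];
    private bool $failPrepare;

    public function __construct(bool $failPrepare = false)
    {
        $this->failPrepare = $failPrepare; // simulate a shard that cannot commit
    }

    // Phase 1: stage the write and vote on whether it can commit.
    public function prepare(string $key, $value): bool
    {
        if ($this->failPrepare) {
            return false;
        }
        $this->pending[$key] = $value;
        return true;
    }

    // Phase 2: make the staged writes visible.
    public function commit(): void
    {
        foreach ($this->pending as $key => $value) {
            $this->rows[$key] = $value;
        }
        $this->pending = [];
    }

    public function rollback(): void
    {
        $this->pending = [];
    }
}

// Applies every write or none: commit only if every shard prepared successfully.
function cross_shard_write(array $writes): bool
{
    $touched = [];
    foreach ($writes as [$shard, $key, $value]) {
        $touched[spl_object_id($shard)] = $shard;
        if (!$shard->prepare($key, $value)) {
            foreach ($touched as $s) {
                $s->rollback();
            }
            return false;
        }
    }
    foreach ($touched as $s) {
        $s->commit();
    }
    return true;
}
```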
4.2 Cross-database and cross-table joins.
After database/table sharding, it is hard to avoid placing logically related data in different tables and different databases. Join operations are then restricted: we cannot join tables that live in different database shards or different table shards, so work that a single query used to do may need several queries.
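When a join can no longer run inside the database, one common workaround is to query each shard separately and join the rows in the application. A minimal sketch of an in-memory hash join, assuming users come from one shard and orders from another; the column names and sample rows are hypothetical.

```php
<?php
// Application-side hash join of rows fetched from two different shards.
function hash_join(array $left, array $right, string $key): array
{
    // Build phase: index the left rows by join key.
    $index = [];
    foreach ($left as $row) {
        $index[$row[$key]][] = $row;
    }
    // Probe phase: match each right row against the index.
    $out = [];
    foreach ($right as $row) {
        foreach ($index[$row[$key]] ?? [] as $match) {
            $out[] = $match + $row; // left columns win on name clashes
        }
    }
    return $out;
}

$users  = [['user_id' => 1, 'name' => 'alice'], ['user_id' => 2, 'name' => 'bob']]; // from userDB
$orders = [['user_id' => 2, 'order_id' => 10], ['user_id' => 3, 'order_id' => 11]]; // from payDB
print_r(hash_join($users, $orders, 'user_id')); // one row: bob with order 10
```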
4.3 Additional data management burden and data operation pressure.
The most obvious additional burden is locating data and repeating insert, delete, update and query operations across shards. The application can solve these problems, but only at the cost of extra logic. For example, suppose a user data table userTable records user scores and the business needs the top 100 scores. Before table sharding, a single ORDER BY statement is enough; after table sharding, n ORDER BY statements are needed to locate the top entries of each table shard ......
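The merge step the application has to add can be sketched as follows. Each shard is assumed to have already run its own "ORDER BY score DESC LIMIT n"; because the overall top n must be among the per-shard top n lists, merging those lists and keeping the first n gives the correct answer. top_n_across_shards() and the sample data are hypothetical.

```php
<?php
// Merge per-shard top-n lists into the global top n (each list from ORDER BY score DESC LIMIT n).
function top_n_across_shards(array $perShardTop, int $n): array
{
    $all = array_merge(...array_values($perShardTop)); // at most shards * n rows
    usort($all, fn($a, $b) => $b['score'] <=> $a['score']);
    return array_slice($all, 0, $n);
}

$shards = [
    [['user' => 'a', 'score' => 90], ['user' => 'b', 'score' => 70]], // userTable0
    [['user' => 'c', 'score' => 95], ['user' => 'd', 'score' => 60]], // userTable1
    [['user' => 'e', 'score' => 80]],                                 // userTable2
];
$top = top_n_across_shards($shards, 2);
echo implode(', ', array_column($top, 'user')), "\n"; // c, a
```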
How do you insert new data into a table shard in ThinkPHP?
Table shards generally do not use auto-incrementing primary keys, so that their keys stay consistent with the primary keys of the main table.
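Without auto-increment, ids have to come from somewhere shared, much like the project code continues each day's ids from the previous day's table_id_end. A minimal in-memory sketch, assuming a single allocator that reserves consecutive id ranges; IdAllocator is hypothetical, and a real version would need to make each reservation atomic, for example by updating a counter row inside a database transaction.

```php
<?php
// Hypothetical central id allocator: hands out consecutive, non-overlapping id ranges.
class IdAllocator
{
    private int $last;

    public function __construct(int $lastIssued = 0)
    {
        $this->last = $lastIssued;
    }

    // Reserve $count consecutive ids and return [first, last].
    public function reserve(int $count): array
    {
        if ($count < 1) {
            throw new InvalidArgumentException('count must be >= 1');
        }
        $first = $this->last + 1;
        $this->last += $count;
        return [$first, $this->last];
    }
}

$ids = new IdAllocator(0);
[$from, $to] = $ids->reserve(3);   // first day's shard: ids 1..3
[$from2, $to2] = $ids->reserve(2); // next day's shard continues at 4..5
echo "$from-$to, $from2-$to2\n";   // 1-3, 4-5
```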