I'm writing a web crawler, and part of the data it collects is comments. Because there are so many comments, I want to split the crawled comments across multiple tables.
The current thinking is:
Each table holds 1 million rows (IDs 1-1000000 go in table1, IDs 1000001-2000000 go in table2, and so on).
Create a string key comment:totalNum in Redis to track the number of comments already in the database.
Based on comment:totalNum, determine the comment's primary key ID and which table it should be saved to.
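The routing step above (fixed-size ID ranges mapped to table names) can be sketched in Python; the `table1`, `table2`, ... names follow the scheme described, and the function name is my own:

```python
TABLE_SIZE = 1_000_000  # rows per table, as in the scheme above

def get_table_name(total_num: int) -> str:
    """Route the next comment (ID = total_num + 1) to its table.

    IDs 1..1000000 -> table1, 1000001..2000000 -> table2, ...
    """
    next_id = total_num + 1
    index = (next_id - 1) // TABLE_SIZE + 1
    return f"table{index}"

print(get_table_name(0))          # first comment (ID 1) -> table1
print(get_table_name(999_999))    # ID 1000000 -> table1
print(get_table_name(1_000_000))  # ID 1000001 -> table2
```

Note that the routing itself is pure arithmetic; the race described below comes only from reading and incrementing the counter in separate steps.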
The code currently looks like this:
$id = $redis->get( 'comment:totalNum' );
$tableName = getTableName( $id );
$sql = "insert xxx";
$res = $db->mysql_query( $sql );
if ( $res ) {
    $redis->incr( 'comment:totalNum' );
}
Because multiple processes run concurrently, there is a problem when a table approaches 1 million rows:
If one process reads comment:totalNum as 1000000, it decides the row belongs in table1 and goes off to write to the database. But if another process reads comment:totalNum at the same time, before the counter is incremented, it will also write to the database. The result is two records added, one with primary key 1,000,000 and one with 1,000,001, which is not what was intended.
So I would like to ask: is there a better way to make the following an atomic operation?
// atomic operation start
$id = $redis->get( 'comment:totalNum' );
$tableName = getTableName( $id );
$sql = "insert xxx";
$res = $db->mysql_query( $sql );
if ( $res ) {
    $redis->incr( 'comment:totalNum' );
}
// atomic operation end
Reply content:
1. MySQL comfortably handles around 10 million rows in a single table; 1 million is very small, so there is no rush to split tables.
2. Table sharding should be driven by actual demand.
Since you are saving comment records, shard by time instead. First gather some statistics: how much data arrives per day, and how long it would take to reach 10 million rows. If, say, one month's data reaches 10 million rows, then use one table per month.
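The one-table-per-month idea boils down to deriving the table name from the record's timestamp. A minimal Python sketch, where the `comment_YYYYMM` naming scheme is my assumption:

```python
from datetime import datetime

def monthly_table_name(ts: datetime) -> str:
    # one table per month, e.g. comment_201601 (naming scheme assumed)
    return f"comment_{ts.strftime('%Y%m')}"

print(monthly_table_name(datetime(2016, 1, 15)))  # comment_201601
print(monthly_table_name(datetime(2016, 2, 1)))   # comment_201602
```

With time-based sharding the table choice no longer depends on a shared counter, so the read-then-increment race disappears entirely.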
Your current scheme also has a strong dependency on Redis: if Redis goes down or loses data, you have to repair things by hand.
If you stick with your current approach, this is a conflict (race) problem, and you can use Redis's WATCH and transaction (MULTI/EXEC) features.
Talk is cheap. Show me the Code
$redis->watch("num"); // WATCH must come before the read it protects
$num = $redis->get("num");
// start the MySQL transaction
begin();
// insert the row into the table chosen from $num
insert xxxxxx
// start the Redis transaction and increment
$redis->multi();
$redis->incr("num");
$incr = $redis->exec();
// if another process changed num in the meantime, exec() returns false;
// otherwise it returns the incremented value
if ( ! $incr ) {
    // roll back the MySQL transaction
} else {
    // commit the MySQL transaction
}
What about taking the comment's ID modulo the number of tables and spreading inserts evenly across the tables? Wouldn't that be better? No atomic operation is needed then.
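The modulo approach from the last reply can be sketched like this; the shard count of 4 is chosen purely for illustration:

```python
NUM_TABLES = 4  # fixed shard count (assumption; changing it later means re-sharding)

def shard_table(comment_id: int) -> str:
    # even spread: consecutive IDs rotate through table0..table3
    return f"table{comment_id % NUM_TABLES}"

print(shard_table(1_000_000))  # table0
print(shard_table(1_000_001))  # table1
```

Since the table now depends only on the ID itself (e.g. from an auto-increment column or a single atomic Redis INCR), there is no separate read-then-write step left to race on.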