The evolution of storage for user participation records

Source: Internet
Author: User

1. Small data volume, fixed user operations:
Storage: MySQL
Solution: Use the UID as the key, with one row per user and one column per operation. For example, for uid=100 with the fixed operations A and B, the table structure is as follows:
Table_operation
UID Operation_a Operation_b
100 1 1

To check whether a user has participated in operation A or B, query the table directly with SQL: SELECT * FROM Table_operation WHERE UID=100 AND Operation_a=1.
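
The one-column-per-operation layout can be sketched as follows. This is a minimal illustration, not the author's code: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that the example is self-contained, and the lowercase column names and the `has_participated` helper are assumptions.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table_operation ("
    " uid INTEGER PRIMARY KEY,"
    " operation_a INTEGER NOT NULL DEFAULT 0,"
    " operation_b INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO Table_operation VALUES (100, 1, 1)")

# Only these fixed columns may be queried; this avoids building SQL
# from arbitrary input.
ALLOWED_COLUMNS = {"operation_a", "operation_b"}


def has_participated(uid: int, column: str) -> bool:
    """Return True if the user's flag for the given operation column is 1."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown operation column: {column}")
    row = conn.execute(
        f"SELECT 1 FROM Table_operation WHERE uid = ? AND {column} = 1",
        (uid,),
    ).fetchone()
    return row is not None
```

The fixed `ALLOWED_COLUMNS` set mirrors the limitation of this design: every new operation needs a schema change.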

Problem: Because the operations are fixed, this design is hard to extend. Supporting a new user operation means adding a column or adding a table. Adding a column is feasible while the data volume stays below a certain size (for example, 1 million rows); if the new operation is unrelated to the existing ones, storing it in a separate table performs well.

2. Small data volume, user operations not fixed:
Compared with case 1, this case adds the user operation as a variable in addition to the UID; that is, we need to track two variables: the user and the operation.
Storage: MySQL
Solution 1: Add an operation table that assigns each operation an ID (OID), and have the user-operation table store the UID and OID. When a user performs a new operation, insert a record into the user-operation table. The table structure is roughly as follows:

Table_operation_info
OID Name
1 Operation_a
2 Operation_b

Table_operation
UID OID
1 1
1 2

To check whether user 1 has performed operation A, use SQL: SELECT * FROM Table_operation WHERE UID=1 AND OID=1.
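
A minimal sketch of the two-table design, again using Python's sqlite3 module as a stand-in for MySQL (an assumption made for self-containment). The helper names `record_operation` and `has_performed` are illustrative.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_operation_info (
        oid  INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE Table_operation (
        uid INTEGER NOT NULL,
        oid INTEGER NOT NULL,
        PRIMARY KEY (uid, oid)
    );
    INSERT INTO Table_operation_info VALUES (1, 'Operation_a'), (2, 'Operation_b');
    INSERT INTO Table_operation VALUES (1, 1), (1, 2);
""")


def record_operation(uid: int, oid: int) -> None:
    """Insert one row when a user performs an operation (idempotent)."""
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    conn.execute(
        "INSERT OR IGNORE INTO Table_operation (uid, oid) VALUES (?, ?)",
        (uid, oid),
    )


def has_performed(uid: int, oid: int) -> bool:
    """Return True if a (uid, oid) row exists."""
    row = conn.execute(
        "SELECT 1 FROM Table_operation WHERE uid = ? AND oid = ?",
        (uid, oid),
    ).fetchone()
    return row is not None
```

Adding a new kind of operation is now a row in Table_operation_info rather than a schema change.
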
Problem: When users perform many operations, the user-operation table grows very quickly, and the data volume may eventually exceed what a single MySQL table can handle. The solution is described in the later cases.

3. Large data volume, fixed user operations
Storage: MySQL
Solution: Compared with case 1, the difference here is that the data volume is larger, too large for a single MySQL table to handle. The problem to solve is what to do when a single table grows too large, and the cost-effective approach is generally to split the table. The only variable in this case is the UID, so it is enough to split the table horizontally by UID.
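
One common way to split horizontally by UID is to take the UID modulo a fixed number of tables. The text does not specify the splitting rule, so the modulo rule, the shard count, and the table naming below are all assumptions.

```python
NUM_SHARDS = 16  # hypothetical number of split tables


def shard_table_for(uid: int, num_shards: int = NUM_SHARDS) -> str:
    """Map a UID to the name of the split table that holds its row."""
    if uid < 0:
        raise ValueError("uid must be non-negative")
    return f"Table_operation_{uid % num_shards}"
```

For example, `shard_table_for(100)` selects `Table_operation_4`, and every query for uid=100 is sent to that table only.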

4. Large data volume, user operations not fixed
Storage: MySQL
Solution 1: This solution applies when user operations can be grouped into categories. Building on the two-table design from case 2, it adds two table splits: one by operation category and one by user. This case has two variables to deal with, the operation and the user, and the two splits correspond to them: split by operation category according to business rules, then split horizontally by user ID to reduce the amount of data in each table.
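
The two-level split can be sketched as a routing function that first picks a category and then a user shard. The category mapping, the shard count, and the table naming are hypothetical; in practice the categories come from business rules.

```python
# Hypothetical mapping from operation ID (OID) to a business category.
OPERATION_CATEGORY = {1: "promo", 2: "promo", 3: "survey"}
USER_SHARDS = 8  # hypothetical number of user shards per category


def route(uid: int, oid: int) -> str:
    """Pick the table for a (uid, oid) pair: category first, then user shard."""
    category = OPERATION_CATEGORY[oid]  # raises KeyError for unknown operations
    return f"Table_operation_{category}_{uid % USER_SHARDS}"
```

With these assumed values, `route(100, 1)` selects `Table_operation_promo_4`.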

Solution 2: This is a purely horizontal split: take the design from case 2 and split its tables horizontally by user.

5. Very large data volume
Storage: MySQL
Solution 1: Split the database. At this point a single database can no longer meet the requirements, so split the data across several databases using the rules from the earlier cases; depending on actual needs, the databases can be placed on different machines.
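
A sketch of routing a UID to both a database and a table inside it. The counts, the naming, and the modulo rule are assumptions; the text only says that the earlier splitting rules are reused across several databases.

```python
NUM_DATABASES = 4        # hypothetical number of databases
TABLES_PER_DATABASE = 8  # hypothetical number of split tables per database


def locate(uid: int) -> tuple[str, str]:
    """Return (database name, table name) for a UID."""
    slot = uid % (NUM_DATABASES * TABLES_PER_DATABASE)
    db_index, table_index = divmod(slot, TABLES_PER_DATABASE)
    return f"db_{db_index}", f"Table_operation_{table_index}"
```

Each database name can then be mapped to a connection on a different machine.
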
Solution 2: Store the 0/1 flags bitwise. Whether an operation has been performed is a yes/no state, so a single bit is enough to store it; a 64-bit integer can hold the flags for 64 operations.
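
A minimal sketch of the bitwise flags, assuming operation N is assigned bit N (numbered from 0). The helper names are illustrative. In MySQL such a value would fit in a BIGINT UNSIGNED column.

```python
MAX_OPERATIONS = 64  # one bit per operation in a 64-bit integer


def _check(op_index: int) -> None:
    if not 0 <= op_index < MAX_OPERATIONS:
        raise ValueError(f"op_index must be in 0..{MAX_OPERATIONS - 1}")


def set_operation(flags: int, op_index: int) -> int:
    """Return flags with the bit for op_index set to 1."""
    _check(op_index)
    return flags | (1 << op_index)


def has_operation(flags: int, op_index: int) -> bool:
    """Return True if the bit for op_index is 1."""
    _check(op_index)
    return (flags >> op_index) & 1 == 1
```

Python integers are unbounded, so the range check is what keeps the value within 64 bits here.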

Other storage
Key-value Database
Our requirements do not really need much relational database functionality. A simple key-value database can do the job and will also perform better; after all, doing less is faster.
Whether to use an in-memory or a disk-based store can be decided according to actual requirements (hot data can also be kept in memory and cold data on disk). The following assumes we have enough space to store the data.
Programme 1:
Use UID+OID as the key. The value can store a state, or simply whether the user participated (0 or 1). However, this produces too many keys: when the data volume is large, the number of keys is (number of UIDs) × (number of OIDs), which can reach a magnitude you cannot imagine.
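
A sketch of Programme 1, with a plain Python dict standing in for the key-value database (an assumption). The `uid:oid` key format and the helper names are illustrative.

```python
# A dict stands in for the key-value database (assumption).
store: dict[str, int] = {}


def make_key(uid: int, oid: int) -> str:
    """Build one key per (user, operation) pair."""
    return f"{uid}:{oid}"


def mark_participated(uid: int, oid: int) -> None:
    store[make_key(uid, oid)] = 1


def has_participated(uid: int, oid: int) -> bool:
    # A missing key means the user has not participated.
    return store.get(make_key(uid, oid), 0) == 1
```

Every participating (user, operation) pair costs one key, which is the growth problem described above.
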
Programme 2:
In general, the number of distinct user operations is far smaller than the number of users, and the set of operations is under our control. To reduce the number of keys, we can use OID + user partition index as the key. The user partition index divides users into ranges of a fixed size, and every user falls into exactly one range. For example, with 10,000 users per range, the users with UIDs 1 to 9999 fall into range 0. Whether each user has performed the operation is stored as 1 or 0, and the value for each key is initialized to 10,000 zeros. When the user with uid=100 performs the operation, the 100th zero is set to 1.
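
A sketch of Programme 2, with a dict standing in for the key-value database. Two assumptions differ slightly from the text: offsets are zero-based (uid=100 maps to offset 100 in partition 0), and each flag is held as one byte in a bytearray.

```python
PARTITION_SIZE = 10_000  # users per partition, as in the example above

# A dict stands in for the key-value database (assumption).
store: dict[str, bytearray] = {}


def locate(uid: int, oid: int) -> tuple[str, int]:
    """Return (key, offset): which value holds this user and where in it."""
    partition, offset = divmod(uid, PARTITION_SIZE)
    return f"{oid}:{partition}", offset


def mark_participated(uid: int, oid: int) -> None:
    key, offset = locate(uid, oid)
    # bytearray(n) creates n zero bytes, i.e. 10,000 zeros on first use.
    flags = store.setdefault(key, bytearray(PARTITION_SIZE))
    flags[offset] = 1


def has_participated(uid: int, oid: int) -> bool:
    key, offset = locate(uid, oid)
    flags = store.get(key)
    return flags is not None and flags[offset] == 1
```

All users in the same partition share one key, so the key count drops by a factor of up to 10,000 compared with one key per (user, operation) pair.
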
Programme 3:
Building on Programme 2, convert the 10,000 zeros into 10,000 binary digits (bits). Assuming each stored number holds 50 bits, only 200 numbers are needed in total.
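
The arithmetic is 10,000 / 50 = 200 stored numbers per partition. A minimal sketch of the packing, assuming zero-based offsets and a list of Python integers as the stored value (both are illustrative choices):

```python
PARTITION_SIZE = 10_000
BITS_PER_WORD = 50
WORDS_PER_PARTITION = PARTITION_SIZE // BITS_PER_WORD  # 200


def new_partition() -> list[int]:
    """Create the value for one key: 200 numbers, all bits cleared."""
    return [0] * WORDS_PER_PARTITION


def set_bit(words: list[int], offset: int) -> None:
    """Mark the user at this offset as having performed the operation."""
    word_index, bit_index = divmod(offset, BITS_PER_WORD)
    words[word_index] |= 1 << bit_index


def get_bit(words: list[int], offset: int) -> bool:
    word_index, bit_index = divmod(offset, BITS_PER_WORD)
    return (words[word_index] >> bit_index) & 1 == 1
```
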
Programme 4:
When the number of users is very large, most users may never have taken part in a given operation. In that case, building on Programme 3, we add a simple sparse-matrix style compression: give each stored number an index, and store a number only when its value is not 0.
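
A sketch of the sparse variant, assuming the value for one key is a mapping from word index to a 50-bit number, with absent indexes meaning 0. The dict representation and helper names are assumptions.

```python
BITS_PER_WORD = 50


def set_bit(sparse_words: dict[int, int], offset: int) -> None:
    """Set one user's bit, creating the indexed word only when needed."""
    word_index, bit_index = divmod(offset, BITS_PER_WORD)
    sparse_words[word_index] = sparse_words.get(word_index, 0) | (1 << bit_index)


def get_bit(sparse_words: dict[int, int], offset: int) -> bool:
    """A word that was never stored counts as all zeros."""
    word_index, bit_index = divmod(offset, BITS_PER_WORD)
    return (sparse_words.get(word_index, 0) >> bit_index) & 1 == 1
```

If only the user at offset 100 has participated, the stored value is `{2: 1}` instead of 200 numbers.
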
Programme 5:
I have not thought of one yet; I look forward to your suggestions.

Summary

• As the data volume grows, the general idea is divide and conquer: when one table is not enough, split the table; when one database is not enough, split the database; when one machine is not enough, add machines.
• Choosing among storage media means weighing cost against requirements; every choice is the result of a trade-off.
• To save space, store by bit.
• Do not optimize prematurely.
