In the design of the Discuz!NT Enterprise edition, handling very large data tables has always been a headache, especially tables such as the topic table (topic) and the user table (users). On a forum with heavy traffic and posting volume, after a few years of operation the row counts of these two tables can easily exceed ten million (note: because the post table already uses a table-partitioning mechanism, it is not covered here, although for performance reasons this article provides a similar solution for it as well). When considering the architecture design, there are two ways to solve this problem:
The first is a MySpace-like approach, in which the records of a large table are segmented by some key value (such as the UID of the user table). For example, the first 2 million users (i.e. uid <= 2,000,000) go into one table, users 2,000,001 through 4,000,000 go into another, and so on. These segment tables can of course sit in the same database, or be placed in other MSSQL databases or instances. However, this scheme has its problems: whenever the user table has to participate in a cross-table query (such as a LEFT JOIN), trouble arises. For example, our paged query on the post table needs to LEFT JOIN the user table, and once the user table is segmented or distributed, not only does the business logic have to change, but the stored procedures also require considerable modification, and query efficiency becomes questionable as well. Some suggest using data redundancy, e.g. duplicating the relevant user-information fields in the post table, but that plan likewise requires substantial code changes, and when user information changes you must update not only the user table but also the redundant fields in the corresponding post rows. If the two ever get out of sync, the displayed data will be inconsistent; and the extra storage cost at the database level must be paid as well.
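The range-segmentation idea above can be sketched in a few lines. This is a minimal illustration, not code from Discuz!NT: the segment size of 2,000,000 and the table-naming scheme `users_N` are assumptions made for the example.

```python
# Minimal sketch of MySpace-style range segmentation by UID.
# SEGMENT_SIZE and the users_N naming convention are hypothetical.

SEGMENT_SIZE = 2_000_000

def user_table_for(uid: int) -> str:
    """Map a UID to the segment table that stores its row."""
    if uid < 1:
        raise ValueError("uid must be positive")
    segment = (uid - 1) // SEGMENT_SIZE  # 0 for uid 1..2,000,000, etc.
    return f"users_{segment}"

def query_for(uid: int) -> str:
    """Build the parameterized SQL text against the right segment table."""
    return f"SELECT * FROM {user_table_for(uid)} WHERE uid = @uid"
```

The difficulty the article points out is visible even in this sketch: a single-row lookup routes cleanly, but a LEFT JOIN against "the user table" no longer has one table to join to.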
The second is to use third-party tools to handle large data tables, such as the TokyoTyrant and MongoDB discussed in this article. Since the advent of NoSQL, this kind of software has been used for massive data storage and access, and it is usually open source. In addition, from contact with our Enterprise-edition customers we found that although their server configurations are quite high, the number of machines is not large, so we have to consider how to maximize the reuse of existing machine resources; NoSQL software of this kind is often very "cost-effective", achieving unexpected results with modest resources (memory, CPU, etc.). Of course, I am still cautious about it: it will not immediately become the main data-storage tool, but rather an auxiliary to the MSSQL database. So in this article you will see that the role these two tools mostly play in the Enterprise edition is that of an advanced memcached. My thinking is simple: any tool or technology that is very new, or that you do not yet understand thoroughly, must go through an "evaluation period". Only if it passes assessment within that period will it be entrusted with important tasks; if it fails, the system platform will not be made to bear too much technical "risk".
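The "advanced memcached" role described above is essentially the cache-aside pattern: read from the NoSQL layer first, fall back to MSSQL on a miss, and populate the cache on the way out. Here is a minimal sketch under stated assumptions; `KVStore` is an in-memory stand-in for a TokyoTyrant/MongoDB client, and `load_user_from_db` is a hypothetical placeholder for the real MSSQL query.

```python
# Cache-aside sketch: NoSQL store as an auxiliary cache in front of the
# main database. KVStore and load_user_from_db are hypothetical names.
import json

class KVStore:
    """In-memory stand-in for a TokyoTyrant/MongoDB key-value client."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def put(self, key, value):
        self._data[key] = value

def load_user_from_db(uid: int) -> dict:
    # Placeholder for the real MSSQL query.
    return {"uid": uid, "name": f"user{uid}"}

def get_user(cache: KVStore, uid: int) -> dict:
    key = f"user:{uid}"
    raw = cache.get(key)
    if raw is not None:            # cache hit: served by the NoSQL layer
        return json.loads(raw)
    user = load_user_from_db(uid)  # cache miss: fall back to MSSQL
    cache.put(key, json.dumps(user))
    return user
```

Because the NoSQL store is only auxiliary here, losing it degrades performance but never correctness, which fits the cautious "evaluation period" stance described above.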
To sum up, I finally settled on TokyoTyrant and MongoDB. The choice of these two tools was mainly based on the following factors:
1. The massive-data solution should be able to run on both Linux and Windows platforms. Some will ask: since MongoDB runs on both platforms, why introduce TokyoTyrant at all? In fact, some special product circumstances have to be considered. For most of our users the data read/write ratio is about 4:1, i.e. out of every 5 SQL accesses, 4 are SELECT operations and 1 is a CUD (create/update/delete) operation, which makes the workload heavily read-biased. Although MongoDB is excellent and stable in both read and write performance, there is still some gap in concurrent reads relative to TokyoTyrant + TokyoCabinet (note: see the link for more details; this conclusion is limited to the results of stress tests in our product environment and is not universal, so please analyze your own specific situation).
2. Given that some user companies already have the corresponding technical reserves, offering both schemes also makes it easy for users to make their own technology selection (and, because access goes through an interface, users are fully able to introduce other third-party NoSQL tools on their own).
OK, enough preamble; let's start today's main content.
As mentioned, the solution is based on an interface, so let's first look at the corresponding interface declaration: