Let's analyze a seemingly simple feature.

Requirement: show the total number of posts in a forum.
Additional requirement: the figure must update in real time.

To many people this looks trivial to implement: isn't it just a single SELECT COUNT(*) query? And yes, one simple query really is all it takes to get the result. But suppose the posts table uses the InnoDB storage engine rather than MyISAM (which, for a COUNT(*) with no WHERE clause, can return its stored row count directly). With tens of millions of posts in the table, what does that query cost? Without very serious hardware, it is unlikely to finish within ten seconds, and if we add even a little traffic on top of that, does this still look like a simple feature?
Since that query won't do, the next idea is to build a dedicated table for this feature: one column, one row, holding the statistic. Every time a new post is created, the value is incremented by 1, so every read becomes a query against this tiny table, and read efficiency will certainly meet the requirement. Sure, query efficiency is no longer a problem. But if our system generates posts quickly, with dozens or even hundreds of new posts per second at peak, this table becomes a nightmare of its own: every writer must update the same single row, so we either get inaccurate totals because of concurrency problems, or a sharp drop in performance because of severe contention on that row's lock.
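The single-row counter approach can be sketched as follows. This is a minimal illustration with a hypothetical schema, using SQLite so it runs anywhere; in MySQL the same UPDATE would force all concurrent writers to queue on that one row's lock:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE post_stats (total INTEGER NOT NULL);
    INSERT INTO post_stats (total) VALUES (0);
""")

def create_post(title):
    # Both statements run in one transaction.  Every writer must update
    # the same single row in post_stats -- that row is the hotspot that
    # serializes all inserts under high write concurrency.
    with conn:
        conn.execute("INSERT INTO posts (title) VALUES (?)", (title,))
        conn.execute("UPDATE post_stats SET total = total + 1")

for i in range(3):
    create_post(f"post {i}")

total = conn.execute("SELECT total FROM post_stats").fetchone()[0]
print(total)  # 3
```

Reads are now cheap, but the cost has been shifted onto every write, which is exactly the problem described above.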
In fact, the crux of the problem here is not the technical details of implementing the feature, but its additional requirement: "real-time updates." When a forum already holds a very large number of posts, how many people actually care whether this statistic changes in real time? How many care if it is slightly inaccurate for a short period? Hardly anyone is going to stare at the number, post a message, refresh the page, and complain that the total did not grow by 1. Even if we told users plainly that the statistic is only refreshed every so often, would many of them really mind?
Once we drop the "real-time update" condition, the feature becomes easy to implement. As before, we create a statistics table, but refresh it from a scheduled task at some fixed interval. This gives us a double benefit: it solves the query-efficiency problem for the statistic, and it leaves the performance of creating new posts completely untouched.
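The scheduled-refresh variant can be sketched like this (again a runnable SQLite stand-in with hypothetical table names; in production the refresh would be a cron job or event running every few minutes against MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE post_stats (total INTEGER NOT NULL);
    INSERT INTO post_stats (total) VALUES (0);
""")

def refresh_stats():
    # Run by a scheduled job: one COUNT(*) per interval replaces one
    # counter update per insert, so writers never touch post_stats.
    with conn:
        conn.execute(
            "UPDATE post_stats SET total = (SELECT COUNT(*) FROM posts)")

def get_total():
    # Readers only ever query the tiny one-row table.
    return conn.execute("SELECT total FROM post_stats").fetchone()[0]

with conn:
    conn.executemany("INSERT INTO posts (title) VALUES (?)",
                     [("a",), ("b",), ("c",)])
refresh_stats()
print(get_total())  # 3
```

Between refreshes the figure is slightly stale, which is precisely the trade the relaxed requirement allows.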
There are many similar function points in our applications that can be optimized this way. For example, on list pages over tables that have grown to a certain size, it is perfectly acceptable not to display the exact total number of records or the exact number of pages; a rough estimate, or a figure computed some time earlier, is enough. This spares the paging code from having to count, in real time on every request, how many records match the criteria.
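One simple way to apply this to paging, sketched here with an invented helper and a made-up cap value, is to build the page summary from a cached count and bound how deep users can page:

```python
import math

def pagination_summary(cached_total, page_size, cap=10_000):
    # cached_total comes from a periodically refreshed statistics table,
    # not from a per-request COUNT(*).  Beyond `cap` we stop pretending
    # to be exact and just report "about cap+".
    shown = min(cached_total, cap)
    pages = math.ceil(shown / page_size)
    if cached_total > cap:
        return f"about {shown}+ results, {pages} pages"
    return f"{shown} results, {pages} pages"

print(pagination_summary(1234, 20))       # 1234 results, 62 pages
print(pagination_summary(5_000_000, 20))  # about 10000+ results, 500 pages
```

Capping the page depth also protects the database from the notoriously expensive deep-offset queries that exact paging would otherwise require.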
In many application systems, the performance cost of real-time versus near-real-time, or of exact versus approximately exact, can differ by orders of magnitude. When tuning system performance, we should try hard to identify the places that do not have to be real-time or perfectly accurate and make the corresponding adjustments; the result can be an unexpectedly large performance gain.
The accumulation of unused features makes the system overly complex and hurts overall performance
Very often, adding a feature to a system does not cost much; but once a feature has been running for a while, removing it from the system is genuinely difficult.

First, for the development team, it may mean combing through a great deal of code to find every place where the feature intersects with other function points, deleting the code that is no longer needed, and modifying the code that is shared.

Second, for the test team, the functional change requires regression-testing all related function points, and because the boundary of that regression is hard to draw, its scope may balloon, and the testing workload with it.

Finally, for everyone involved, decommissioning a feature brings no tangible benefit; on the contrary, it can only bring risk.
Because of these factors, few companies have a well-developed mechanism for retiring projects (or features), and few manage to take inappropriate features offline in a timely fashion. As a result, the application systems we face only ever grow more complex and larger. In the short term the added complexity may not be much of a problem, but as time accumulates, the systems become extremely bloated: not only hard to maintain, but steadily slower as well. In particular, some ill-conceived features cost little at design time or just after launch, while data volumes are still small; but as time passes and the amount of data in the database keeps growing, retrieval becomes harder and harder and the system's resource consumption keeps climbing.
Furthermore, as the system grows more complex, developing subsequent features becomes harder too: what could have been a simple function ends up needing a great deal of extra conditional logic purely because of the system's complexity, which increases the application's computational load and in turn hurts its own performance. And if those extra checks must also round-trip to the database over persisted data, the performance penalty is larger still, and the impact on the overall system even greater.
Is all the data we keep in the database actually suited to database storage?
For some developers, the database is the most convenient universal storage center, and they would happily put everything into it: data that genuinely needs to be persisted and transient intermediate data alike, plain-text character data and binary multimedia data alike. The database is attractive because, from the application servers' point of view, it is a centralized store; a dedicated DBA maintains it, so developers rarely have to do the kind of upkeep the application servers demand of them; and, crucially, database operations are simple and uniform, nothing like the complexity of file handling or other kinds of storage.
In fact, the following types of data are not suitable for storing in a database:
1. Binary multimedia data

Storing binary multimedia data in the database causes two problems: it consumes database storage space heavily, and serving the data burns CPU on the database host. This category mainly covers images, audio, video, and related binary files. Processing such data is not what a database is good at; if we insist on stuffing it in, the database will certainly pay a heavy price in processing resources.
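The usual alternative is to keep the binary files on a file system (or object store) and record only their location and a checksum in the database. A minimal sketch, with a hypothetical schema and a temp directory standing in for a real media store:

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical media root: a real deployment would point this at a
# dedicated file server or object store, not a temp directory.
MEDIA_ROOT = Path(tempfile.mkdtemp())

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE attachments (
                    id     INTEGER PRIMARY KEY,
                    sha256 TEXT NOT NULL,
                    path   TEXT NOT NULL)""")

def save_attachment(data: bytes) -> int:
    # The blob goes to the filesystem; the database keeps only the
    # checksum and the location, so queries never drag megabytes of
    # binary data through the database's caches.
    digest = hashlib.sha256(data).hexdigest()
    path = MEDIA_ROOT / digest[:2] / digest
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    cur = conn.execute(
        "INSERT INTO attachments (sha256, path) VALUES (?, ?)",
        (digest, str(path)))
    return cur.lastrowid

rid = save_attachment(b"fake image bytes")
stored_path = conn.execute(
    "SELECT path FROM attachments WHERE id = ?", (rid,)).fetchone()[0]
print(Path(stored_path).read_bytes() == b"fake image bytes")  # True
```

Hashing the content also deduplicates identical uploads for free, since they map to the same path.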
2. Transient queue data

As everyone knows, to guarantee transaction safety (for storage engines that support transactions) and recoverability, the database must log every change it makes. The very purpose of queue data means that the rows of the table holding it are constantly inserted, updated, and deleted, and each of those operations generates its own log records. In MySQL, if the table uses a transactional storage engine, the volume of log generated doubles again. Handling the same queue through some mature third-party queue software instead can multiply throughput many times over.
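The contrast is easy to see in miniature. A table used as a queue pays for an INSERT, a SELECT, and a DELETE per message, each generating log records; a purpose-built queue avoids all of that. Here Python's stdlib Queue stands in for a real queue server, purely as an illustration:

```python
import queue

# In-memory FIFO standing in for a dedicated message-queue product.
q = queue.Queue()

def enqueue(msg):
    q.put(msg)       # no transaction log, no index maintenance

def dequeue():
    return q.get()   # no DELETE, no dead rows to purge later

enqueue("job-1")
enqueue("job-2")
first = dequeue()
print(first)  # job-1
```

A real broker adds durability via an append-only log, which is still far cheaper than a B-tree table churned by constant insert/delete traffic.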
3. Very large text data

In MySQL versions before 5.0.3, a VARCHAR column could hold at most 255 bytes, so longer text had to go into a TEXT column (up to 64KB), or an even larger LONGTEXT column (up to 4GB), and TEXT-type columns are processed far less efficiently than VARCHAR columns. From version 5.0.3 on, the maximum VARCHAR length was raised to 64KB; a value of up to 255 bytes costs only a one-byte length prefix on top of the actual data, while a longer value needs a two-byte prefix.

Therefore, keeping very large text data in the database not only brings performance problems but also wastes storage space.
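When large text must stay in the database, one common mitigation is to keep it out of the hot table: store the bulky body in a side table so that scans of the frequently accessed table stay narrow. A sketch with invented table names, using SQLite as a runnable stand-in (in MySQL the body column would be TEXT or LONGTEXT):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The hot table holds only titles and metadata; the rarely needed full
# body lives in a side table, so listing queries never pull the large
# values through the database's caches.
conn.executescript("""
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,          -- VARCHAR-sized in MySQL
        published_at TEXT
    );
    CREATE TABLE article_bodies (
        article_id INTEGER PRIMARY KEY REFERENCES articles(id),
        body TEXT NOT NULL            -- TEXT/LONGTEXT in MySQL
    );
""")

conn.execute("INSERT INTO articles (id, title) VALUES (1, 'hello')")
conn.execute("INSERT INTO article_bodies VALUES (1, ?)",
             ("x" * 100_000,))

# A listing query touches only the narrow table:
titles = [r[0] for r in conn.execute("SELECT title FROM articles")]
print(titles)  # ['hello']
```

The body is fetched by primary key only when a single article is actually opened.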
MySQL Performance Tuning and Architecture Design, Chapter 1: The Impact of Business Requirements and System Architecture on Performance