System Optimization Summary

Source: Internet
Author: User

I have been at the company for a year and have worked on new business features without interruption. Doing new business well leaves little time for anything else, yet in the first half of the year about 30% of our time went to system maintenance: every day, alongside the new business work, tickets inevitably came in. Because we were new to the company, we also needed a lot of time to get familiar with the existing system, analyze its strengths and weaknesses, and track down its problems, so many things I wanted to do were not pushed forward. After nearly a year I had taken part in developing the code of almost every core module in our group, which gave me a fairly accurate picture of the problems. Six months in, we began restructuring and optimizing the system, which took nearly three months (the team did not stop to rebuild it; the time was squeezed out between new business tasks). These are some small results of that work, which I share here:

First, let me briefly introduce the modules of our main business and their problems (some are caused by large data volumes, others by system design):

  • Submit contract
    • Special product approval is slow, taking up to 2 minutes.
    • Batch import of items is very slow. In the worst case an import had not finished after an hour, and the user, assuming the program had crashed, closed the page and imported again.
    • Batch import of items may produce incorrect data, which hurts data consistency.
    • Order price calculation is sometimes wrong, which is fatal: the order price is core business data and affects the data analysis of downstream systems.
  • Review contract
    • Order loading is very slow, taking up to 15 minutes.
    • Review errors: if there are too many orders, an error is reported outright.
  • Order use process
    • Order usage data is calculated incorrectly, so users can consume more or less inventory than they should.
  • Refund
    • Finding an order to refund is like looking for a needle in a haystack: when there are many orders, one user's orders can only be found by paging through the list, which is very frustrating.
    • Refund errors: an error is reported outright when there are many orders.
    • The refund amount is sometimes wrong, which is as fatal as a wrong order price.


From the problems above we can conclude that once the data volume is large, even the basic functions of the system cannot be relied on to work. The solution priorities are:

  • First, make sure the main functions work correctly, without regard to performance.
  • Then, optimize performance without affecting the main functions.
  • Finally, streamline user operations to improve the user experience.

Functional failures:

Here is a story. On a peak ticket-submission day last month, online users reported that submissions failed and that the website was responding very slowly. We had never seen anything like it, so we instinctively assumed it was a server problem rather than a program problem, but the server load showed nothing wrong. The problem was still unsolved in the middle of the night; our lead, who had been getting ready for bed, pulled his trousers back on and came to the office to fight it out with us. We later learned that a group of new users had started using the system, and that they typically submitted a lot of order data at once (more than 2000 orders at a time). The system had never been tested with that many orders, so the problems had never surfaced. Once we knew this, it was clear that the program could not cope with large numbers of orders. The major issues we had to address were:

  • When querying by a set of values, the extended WhereIn method throws an error.

The following query requirement is common: fetch the rows whose value falls within a specified set of values. The extension method was written to make such calls convenient and easy to read.

The following describes the extension methods and problems.

I did not try to fix the expression built by the extension method; replacing it with Contains solved the problem.
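The author's code is C# and is not reproduced in this copy, so the following is only a rough analogue in Python with the standard `sqlite3` module. A Contains test against a local collection typically ends up as a SQL `IN (...)` clause, and the sketch shows that query shape with one bound parameter per value. The `orders` table, its columns, and the function name `find_orders_by_ids` are assumptions made for illustration.

```python
import sqlite3


def find_orders_by_ids(conn, order_ids):
    """Return (id, price) rows whose id is in order_ids (hypothetical schema)."""
    ids = list(order_ids)
    if not ids:
        return []  # an empty IN list is not portable SQL, so short-circuit
    # One "?" placeholder per value keeps the query parameterized.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, price FROM orders WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(i, i * 10.0) for i in range(1, 6)]
)

rows = find_orders_by_ids(conn, [2, 4, 99])  # id 99 does not exist
print(rows)  # [(2, 20.0), (4, 40.0)]
```

Databases limit how many parameters one statement may carry, so a very long id list would still need to be split into several queries.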

  • During review, an error is reported once the number of orders exceeds a certain threshold.

    When the number of orders reaches a certain level, updating the database generates a large number of SQL statements. Transmitting them all at once congests the network and eventually makes the database operation time out (with the 2000 orders submitted earlier, a single table needs 2000 SQL statements). The solution is to process the data in batches: for example, send 500 rows to the database per batch. Once large batches are split into small ones, the problem goes away.
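The batching idea can be sketched as follows. This is a minimal illustration in Python with `sqlite3`, not the author's implementation; the `orders` table, the `status` column, and the helper names `chunked` and `approve_orders` are assumptions. Only the batch size of 500 and the order count of 2000 come from the text.

```python
import sqlite3
from itertools import islice

BATCH_SIZE = 500  # batch size quoted in the text


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def approve_orders(conn, order_ids, batch_size=BATCH_SIZE):
    """Mark orders as approved, one transaction per batch; return batch count."""
    batches = 0
    for batch in chunked(order_ids, batch_size):
        with conn:  # commits the batch, or rolls it back on error
            conn.executemany(
                "UPDATE orders SET status = 'approved' WHERE id = ?",
                [(order_id,) for order_id in batch],
            )
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, 'pending')", [(i,) for i in range(2000)]
)
conn.commit()

batch_count = approve_orders(conn, range(2000))
approved = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'approved'"
).fetchone()[0]
print(batch_count, approved)  # 4 2000
```

Committing per batch keeps each round trip small, but a failure part-way through leaves the earlier batches applied, so the caller needs a way to resume or to detect the partial update.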

  • If the browser is closed during an import, the imported data may be incomplete, leaving incorrect data behind.

Original solution: a spinning animation is shown during the import. It is not a real progress bar, so the user cannot tell when the import will finish.

New solution:

    • Show the user a prompt that guides them through the normal process. Users are well-meaning: if the prompt is right, they will generally follow it and, unlike testers, will not go looking for bugs.
    • Implement a real progress bar, so that the user can see the near-real-time progress of the import and need not worry.
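The text does not say which mechanism backs the progress bar. One possible shape, sketched here in Python under that caveat, is a shared progress object that the import worker updates and that a status request can read at any time. The class `ImportProgress`, the function `run_import`, and the `save_row` callback are hypothetical names.

```python
import threading


class ImportProgress:
    """Thread-safe counter that a status endpoint could poll (assumed design)."""

    def __init__(self, total):
        self.total = total
        self.done = 0
        self._lock = threading.Lock()

    def advance(self, n=1):
        with self._lock:
            self.done = min(self.total, self.done + n)

    def snapshot(self):
        with self._lock:
            percent = 100 if self.total == 0 else self.done * 100 // self.total
            return {
                "done": self.done,
                "total": self.total,
                "percent": percent,
                "finished": self.done >= self.total,
            }


def run_import(rows, progress, save_row):
    """Persist each row and report progress after every row."""
    for row in rows:
        save_row(row)
        progress.advance()


rows = [{"sku": f"item-{i}"} for i in range(8)]
progress = ImportProgress(total=len(rows))
saved = []

# The import runs in the background; the page would poll snapshot() meanwhile.
worker = threading.Thread(target=run_import, args=(rows, progress, saved.append))
worker.start()
worker.join()

final = progress.snapshot()
print(final)  # {'done': 8, 'total': 8, 'percent': 100, 'finished': True}
```

Because the work runs on the server independently of the page, closing the browser no longer interrupts the import in this design; the page only reads the counter.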
  • The order amount is calculated incorrectly.

    The most serious cases involve amounts: because the system design was not rigorous, amount fields could end up negative. To close this hole completely, I imposed non-negotiable constraints on the entity model.

Therefore, when designing databases and entity models, we must keep the data under control and never let invalid data in.
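The author's entity-model constraints are not shown in this copy. As a minimal sketch of the same principle in Python, an entity can refuse to exist with a negative price or quantity, so that no later calculation can start from bad data. The class `OrderLine` and its fields are assumptions for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class OrderLine:
    """Hypothetical entity: invalid amounts are rejected at construction."""

    unit_price: Decimal
    quantity: int

    def __post_init__(self):
        if self.unit_price < 0:
            raise ValueError(f"unit_price must not be negative: {self.unit_price}")
        if self.quantity < 0:
            raise ValueError(f"quantity must not be negative: {self.quantity}")

    @property
    def amount(self):
        # Both factors are non-negative, so the amount cannot go below zero.
        return self.unit_price * self.quantity


line = OrderLine(Decimal("19.90"), 3)
print(line.amount)  # 59.70

try:
    OrderLine(Decimal("-1.00"), 1)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

A matching `CHECK` constraint on the database column would give a second line of defense for writes that bypass the entity model.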

  • The used quantity of an order is calculated incorrectly.

For example, suppose an order buys 100 units that can be used freely. Each use increments a used count recorded on the order, until the used count equals the quantity originally ordered. If the user requests a refund, the used count has to be rolled back. There are two main problems:

    • The check for whether unused units remain is not rigorous. It should be verified before the update, or the previously read used count should be made a condition of the update.
    • The logic for updating the used quantity is not rigorous. Because there is no server-side verification, a user can open the same order in several pages and perform refunds in all of them at the same time.

      I personally do not recommend recording core data like this in a single database field, because every update builds on an intermediate state: once one update goes wrong, every later one is wrong too. At the very least there should be data monitoring to make sure the resulting value is valid.
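Making the previously read used count a condition of the update can be sketched as below, in Python with `sqlite3`. The update only succeeds if the row still holds the value the page originally read and the new value stays within the order's limits. The table layout and the function name `set_used` are assumptions.

```python
import sqlite3


def set_used(conn, order_id, expected_used, new_used):
    """Write new_used only if the row still holds expected_used.

    Returns True when exactly one row was updated, False when another
    request changed the row first or new_used is outside 0..total.
    """
    with conn:
        cur = conn.execute(
            "UPDATE orders SET used = ? "
            "WHERE id = ? AND used = ? AND ? BETWEEN 0 AND total",
            (new_used, order_id, expected_used, new_used),
        )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER, used INTEGER)")
conn.execute("INSERT INTO orders VALUES (1, 100, 10)")
conn.commit()

# Two pages both loaded the order when used was 10, then each refunds one unit.
page_a = set_used(conn, 1, expected_used=10, new_used=9)
page_b = set_used(conn, 1, expected_used=10, new_used=9)  # stale read
over_limit = set_used(conn, 1, expected_used=9, new_used=101)

used_now = conn.execute("SELECT used FROM orders WHERE id = 1").fetchone()[0]
print(page_a, page_b, over_limit, used_now)  # True False False 9
```

The second page gets `False` and has to reload the order before retrying, which is what prevents the same refund from being applied twice.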

Data consistency problems:

  • After approval, the status of the orders does not change, or changes for some orders and fails for others.
  • Calls to some services throw exceptions.

Besides finding and fixing these problems in the system itself, we added a safety net: data monitoring and automatic data repair. Core business data is monitored, and when an anomaly is detected an alarm is sent to the relevant owner. For problems whose cause is confirmed, tools can repair the data automatically. For example, if a call to a service fails temporarily, the call is retried automatically until it succeeds.
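A minimal sketch of both ideas in Python follows. The text says failed calls are retried until they succeed; the sketch caps the number of attempts and backs off between them, which is a choice made here rather than something the source states. The names `call_with_retry`, `find_anomalies`, and the order fields are hypothetical.

```python
import time


def call_with_retry(operation, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Retry a call that fails with ConnectionError, doubling the delay each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and let monitoring raise the alarm
            sleep(base_delay * 2 ** (attempt - 1))


def find_anomalies(orders):
    """Monitoring check: ids of orders whose core values are out of range."""
    return [
        o["id"]
        for o in orders
        if o["amount"] < 0 or not 0 <= o["used"] <= o["total"]
    ]


attempts = {"n": 0}


def flaky_notify():
    """Stand-in for a remote service that fails twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retry(flaky_notify, sleep=lambda seconds: None)
print(result, attempts["n"])  # ok 3

anomalies = find_anomalies([
    {"id": 1, "amount": 50, "used": 3, "total": 10},
    {"id": 2, "amount": -5, "used": 0, "total": 10},
    {"id": 3, "amount": 20, "used": 11, "total": 10},
])
print(anomalies)  # [2, 3]
```

Retrying is only safe when the remote operation is idempotent; otherwise a call that succeeded but whose response was lost would be applied twice.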

Performance problems:

  • The order list loads slowly, taking up to 15 minutes, which hurts review efficiency.
    • How to locate performance bottlenecks

      You can use a profiling tool or write timing logs by hand. Here I used dotTrace and hand-written logs.

    • How to optimize
      • Use caching wherever possible, so that the same data is computed only once.
      • Remove functionality that has no business value.

        The list page contains batch orders, each made up of several sub-orders. To display them uniformly, the page also showed a price and a status for each batch order: the price is the sum of the prices of all its sub-orders, and the status is the lowest status among its sub-orders. This required a recursive query, which was very time-consuming. After repeated discussions with the business side, I confirmed that these two values, which cost so much to compute, were of no real use to them, and removed them decisively. The performance improvement was like going from hell to heaven.
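The two techniques named above, hand-written timing logs and computing the same data only once, can be sketched together in Python. This is an illustration, not the author's code; the `timed` helper, the `sub_order_price` function, and the simulated delay are assumptions.

```python
import time
from contextlib import contextmanager
from functools import lru_cache

timings = {}  # label -> accumulated seconds, a stand-in for a log file


@contextmanager
def timed(label):
    """Accumulate the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start


calls = {"price": 0}  # counts how often the slow lookup really runs


@lru_cache(maxsize=None)
def sub_order_price(order_id):
    """Hypothetical slow lookup; the cache makes repeated ids free."""
    calls["price"] += 1
    time.sleep(0.001)  # stands in for a slow query
    return order_id * 10


def render_list(order_ids):
    with timed("load_prices"):
        return [sub_order_price(order_id) for order_id in order_ids]


page = render_list([1, 2, 1, 2, 3])
print(page, calls["price"])  # [10, 20, 10, 20, 30] 3
print(sorted(timings))  # ['load_prices']
```

Timing each stage separately shows where the minutes go before anything is optimized. A cache like this one must be invalidated when the underlying order changes, otherwise the page shows stale prices.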

  • Batch import of orders is slow.
    • Problem
      • Importing 300 records takes more than 20 minutes.
      • Importing 500 records makes the system report an error.
      • Validation errors in the imported data are reported only after all the data has been processed. If data quality is poor, the user waits a long time only to be told to fix the file, and may end up importing the same 300 records three or more times. At 20 minutes per attempt that is more than an hour, so the user's "exaggerated" complaint of "still not imported after half a day" turns out to be real.
    • How to optimize
      • Validate the data up front so that users are not kept waiting. If key data is invalid, tell the user immediately.
      • The original solution inserted rows into the database one at a time; this was changed to batch insert. For details, see "Performance optimization experience of one batch insert of multi-table data with EF".
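Both optimizations can be sketched in a few lines of Python with `sqlite3`: every row is validated before anything is written, and valid data is then inserted in a single batch inside one transaction. The author's version uses EF and is described in the referenced article; the `items` table, its columns, and the validation rules here are assumptions.

```python
import sqlite3


def validate(rows):
    """Return (line_number, message) pairs for every invalid row."""
    errors = []
    for line_no, row in enumerate(rows, start=1):
        if not row.get("sku"):
            errors.append((line_no, "missing sku"))
        elif row.get("quantity", 0) <= 0:
            errors.append((line_no, "quantity must be positive"))
    return errors


def import_items(conn, rows):
    """Validate first; write nothing unless every row is valid."""
    errors = validate(rows)
    if errors:
        return errors  # reported immediately, before any database work
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT INTO items (sku, quantity) VALUES (:sku, :quantity)", rows
        )
    return []


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (sku TEXT NOT NULL, quantity INTEGER NOT NULL)")

bad = [{"sku": "A-1", "quantity": 2}, {"sku": "", "quantity": 1}]
bad_errors = import_items(conn, bad)
print(bad_errors)  # [(2, 'missing sku')]

good = [{"sku": f"A-{i}", "quantity": 1} for i in range(300)]
good_errors = import_items(conn, good)
count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(good_errors, count)  # [] 300
```

Because the rejected file writes nothing, the user can fix the reported lines and import again without leaving partial data behind.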

Summary: Our goal is to let users use the system happily, so we keep refactoring and optimizing it. Optimization is relative: we do not have to find the perfect solution, but a good user experience is the most basic requirement.
