On the Path of Code-Level Performance Optimization (Part 1)

Source: Internet
Author: User

I. Preface

Hello everyone. It has been a while since I last discussed technology with you, so today I would like to walk through how the performance of a project I am responsible for was improved.

Most articles on architectural change or evolution focus on the architecture itself; few describe performance optimization at the code level. It is like a building whose structure is well designed but whose construction is not done professionally, with many details overlooked. When the bricks and tiles are laid badly, the roof leaks and the walls crack. The building does not collapse, but it becomes dangerous. So today we will look at some code-level details. Criticism and suggestions are welcome.

II. Server environment

Server configuration: 4 servers, each with a 4-core CPU and 8 GB of memory
MQ: RabbitMQ
Database: DB2
SOA framework: Dubbo, wrapped in a company-internal package
Cache: Redis, Memcached
Unified configuration management: a system developed in-house

III. Problem description

1. A single server handles 40 TPS; scaling out to 4 servers only reaches 60 TPS, so there is almost no scalability.
2. In production, database deadlocks occur frequently and make the whole service unavailable.
3. Database transactions are used haphazardly, so individual transactions take too long.
4. In production, the servers often run out of memory and the CPU is often saturated.
5. During development, edge cases were not fully considered and fault tolerance is poor; a small bug often makes the service unavailable.
6. The program does not log key information, or what it logs is useless and has no reference value.
7. Configuration data and rarely changing data are still read from the database frequently, causing heavy database I/O.
8. The project is not fully split; one Tomcat instance hosts the WAR packages of several projects.
9. Bugs or functional defects in the underlying platform reduce the program's availability.
10. The interfaces have no rate-limiting policy, so many VIP merchants run load tests directly against our production environment, which directly affects the availability of the real service.
11. There is no degradation policy for failures; when something goes wrong it takes a long time to fix, or the project is crudely rolled back, which does not necessarily solve the problem.
12. There is no suitable monitoring system, so bottlenecks cannot be detected in near real time or in advance.

IV. Optimization solutions

1. Database deadlock optimization
Let's start with problem 2. A basic example shows how a database deadlock occurs:

In the above case, session B throws a deadlock exception. The deadlock occurs because sessions A and B are each waiting for the other.
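The mutual wait can also be reproduced outside a database. The following is a minimal Java sketch, not the author's original example: two threads stand in for sessions A and B, two ReentrantLock objects stand in for locked rows, and the class name DeadlockSketch and the 200 ms timeout are assumptions chosen for illustration. Each session locks one row and then asks for the other, so neither can proceed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class DeadlockSketch {
    /** Two "sessions" lock two "rows" in opposite order; returns how many gave up waiting. */
    static int run() {
        ReentrantLock row1 = new ReentrantLock();
        ReentrantLock row2 = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger timedOut = new AtomicInteger();

        Thread a = new Thread(() -> session(row1, row2, bothHoldFirst, bothTried, timedOut));
        Thread b = new Thread(() -> session(row2, row1, bothHoldFirst, bothTried, timedOut));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return timedOut.get();
    }

    private static void session(ReentrantLock first, ReentrantLock second,
                                CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                AtomicInteger timedOut) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();                 // both sessions now hold one row each
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                timedOut.incrementAndGet();        // the other session never lets go
            }
            bothTried.countDown();
            bothTried.await();                     // keep holding until both have tried
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("sessions that gave up waiting: " + run());
    }
}
```

A real database detects the cycle and aborts one session with a deadlock error instead of letting both time out. Acquiring locks in the same order in every session removes the cycle.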

Analysis: this problem arises because the project contains a large number of transactions combined with FOR UPDATE statements. Database locks come in the following three basic types:

When FOR UPDATE statements interact with gap locks and next-key locks, careless use easily leads to deadlock.

Why did we take so many locks? Business analysis showed that the purpose was duplicate prevention: the same payment may be submitted to the system more than once at the same time, and duplicates were prevented by locking a record.

For this problem there is no need to use pessimistic locking for duplicate prevention. It puts heavy pressure on the database and is also a major bottleneck for the project's scalability. We used three approaches instead:
* Use Redis for distributed locks. Redis is deployed as multiple shards, so if one Redis instance goes down, clients simply compete for the lock again.

* Use a primary-key constraint for duplicate prevention. A dedup table at the method entry intercepts all duplicate orders: a repeated insert makes the database report a duplicate-key error, and the program returns immediately.

* Use a version-number mechanism for duplicate prevention.

All three approaches must set an expiration time, so that when a lock on a resource times out, the resource is released and competition for it can start again.
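The "lock with an expiration time" idea can be sketched in a few lines. This is an in-memory stand-in written for illustration, not the project's Redis code: the class ExpiringLockSketch, its method names, and the explicit nowMillis parameter are all assumptions, and a ConcurrentHashMap plays the role of the shared store.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory stand-in for a "set if absent, with expiry" lock. */
final class ExpiringLockSketch {
    // resource key -> time (in milliseconds) at which the lock expires
    private final ConcurrentMap<String, Long> locks = new ConcurrentHashMap<>();

    /** Returns true if the caller acquired the lock on key for ttlMillis. */
    boolean tryLock(String key, long ttlMillis, long nowMillis) {
        boolean[] acquired = {false};
        // compute() runs atomically per key, so only one caller can win.
        locks.compute(key, (k, expiry) -> {
            if (expiry == null || expiry <= nowMillis) {
                acquired[0] = true;            // free or expired: take it
                return nowMillis + ttlMillis;
            }
            return expiry;                     // still held by someone else
        });
        return acquired[0];
    }

    /** Releases the lock early, before it expires. */
    void unlock(String key) {
        locks.remove(key);
    }

    public static void main(String[] args) {
        ExpiringLockSketch lock = new ExpiringLockSketch();
        long now = System.currentTimeMillis();
        System.out.println("first payment request:  " + lock.tryLock("order-1", 1000, now));
        System.out.println("duplicate request:      " + lock.tryLock("order-1", 1000, now + 10));
        System.out.println("after the lock expired: " + lock.tryLock("order-1", 1000, now + 2000));
    }
}
```

With Redis itself, the same idea is usually expressed with the SET command and its NX and PX options. A production lock would also store an owner token so that one client cannot release a lock held by another.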

2. Long-running database transactions
Pseudocode example:

    public void test() {
        Transaction.begin                        // begin the transaction
        try {
            // insert a row
            httpClient.queryRemoteResult()       // remote request
            // update a row
            Transaction.commit()                 // commit the transaction
        } catch (Exception e) {
            // roll back the transaction
        }
    }

The project contains many methods like this, which mix HttpClient calls, or other operations that may block for a long time, into transactional code. This not only lengthens transaction execution time but also severely reduces concurrency.

So when we use transactions, we follow the principle of "get in fast, get out fast": keep the transactional code as small as possible. For the pseudocode above, the HttpClient line should be split out so that it is not mixed with transactional code.
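One way to apply this to the pseudocode is to run the insert and the update in two short transactions and make the remote call between them. The sketch below is illustrative only: inTransaction is a hypothetical helper that records events instead of talking to a database, and the remote call is passed in as a Supplier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class ShortTransactionSketch {
    /** Records what happens, in order, so the transaction boundaries are visible. */
    private static final List<String> events = new ArrayList<>();

    // Hypothetical transaction helper: begin, run the work, then commit or roll back.
    static void inTransaction(Runnable work) {
        events.add("begin");
        try {
            work.run();
            events.add("commit");
        } catch (RuntimeException e) {
            events.add("rollback");
            throw e;
        }
    }

    static List<String> pay(Supplier<String> remoteCall) {
        events.clear();
        inTransaction(() -> events.add("insert record"));   // short transaction 1
        String result = remoteCall.get();                    // slow call, no transaction open
        events.add("remote call -> " + result);
        inTransaction(() -> events.add("update record"));   // short transaction 2
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        pay(() -> "OK").forEach(System.out::println);
    }
}
```

The trade-off is that the insert and the update are no longer atomic. If the remote call fails, the inserted record remains and has to be marked or cleaned up by the business logic.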

3. CPU saturation analysis
Let's start from a case I analyzed earlier. First look at the following figure:

While the project was under load test, CPU usage stayed high. The analysis found the following:

    • Impact of the database connection pool

We simulated the production environment and reproduced the problem in the test environment as closely as possible, using c3p0, our default database connection pool.

When the load test reached 20,000 batches with 100 concurrent users, throughput suddenly dropped to zero. The error was:
com.yeepay.g3.utils.common.exception.YeepayRuntimeException: Could not get JDBC Connection; nested exception is java.sql.SQLException: An attempt by a client to checkout a Connection has timed out.

We then traced this error through the c3p0 source code and searched for related material online:
http://blog.sina.com.cn/s/blog_53923f940100g6as.html
We found that c3p0 performs poorly under high concurrency.

    • Improper use of a thread pool

    private static final ExecutorService executorService = Executors.newCachedThreadPool();

    /**
     * Asynchronously executes short, frequent tasks.
     * @param task
     */
    public static void asynShortTask(Runnable task) {
        executorService.submit(task);
        // task.run();
    }

    CommonUtils.asynShortTask(new Runnable() {
        @Override
        public void run() {
            String sms = sr.getSmsContent();
            sms = sms.replaceAll(finalCode, AES.encryptToBase64(finalCode, ConstantUtils.getDB_AES_KEY()));
            sr.setSmsContent(sms);
            smsManageService.addSmsRecord(sr);
        }
    });

In this scenario, every concurrent request creates a thread. Analysis of the thread dump showed that the project had started more than 10,000 threads, every one of them extremely busy, and resources were completely exhausted.

So where is the problem? It is in this line:

    private static final ExecutorService executorService = Executors.newCachedThreadPool();

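Under concurrency, requesting thread resources without limit causes severe performance degradation; this is the culprit behind the parabolic shape in the chart. How many threads can be created this way? The answer: up to the maximum value of Integer. In the JDK source, Executors.newCachedThreadPool() constructs a ThreadPoolExecutor with a core pool size of 0, a maximum pool size of Integer.MAX_VALUE, and a SynchronousQueue, so every task that finds no idle thread gets a new one.

We then tried changing the code as follows: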

    private static final ExecutorService executorService = Executors.newFixedThreadPool(50);

After this change, throughput rose to more than 100 TPS. Under very high concurrency, however, the project's GC (garbage collection) performance dropped. The cause was again the Executors.newFixedThreadPool(50) line: it solves the unlimited-thread problem, but under very high concurrency a large number of objects pile up in the queue and cannot be consumed in time. In the JDK source, Executors.newFixedThreadPool(n) constructs a ThreadPoolExecutor backed by a LinkedBlockingQueue with no capacity limit.

In other words, it uses an unbounded queue: the queue can hold an unlimited number of pending tasks, so large numbers of objects cannot be released and collected.
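Both limits can be made explicit by constructing the ThreadPoolExecutor directly. The sketch below is not the author's final solution (that is described in the next part of this section); it only illustrates a pool whose thread count and backlog are both bounded. The class name BoundedPoolSketch and the sizes used are assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedPoolSketch {
    /** Fixed number of threads plus a bounded queue; the sizes are illustrative. */
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                            // core == max: thread count cannot grow
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),     // bounded backlog of pending tasks
                new ThreadPoolExecutor.CallerRunsPolicy());  // when full, the submitter runs the task
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newBoundedPool(4, 100);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 10_000; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("tasks completed: " + done.get());
    }
}
```

CallerRunsPolicy slows the submitting thread down when the queue is full, which acts as simple back-pressure. Other rejection policies drop or reject the task instead, and which one fits depends on whether the work may be lost.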

Final thread pool solution
Option one:

Note: the server CPUs have only 4 cores, and some servers only 2, so creating a large number of threads inside the application itself hurts performance. To address this, we removed all asynchronous tasks from the application project. Tasks are sent to a dedicated task processor, which calls back the application when processing completes. A back-end timer task periodically scans the task table and sends any unprocessed asynchronous tasks to the task processor.

Option two:
Use the Akka framework. Here is a simple load test I wrote earlier:
http://www.jianshu.com/p/6d62256e3327

4. Log printing problems
Let's look at the following logging code:

    QuataDTO quataDTO = null;
    try {
        quataDTO = getRiskLimit(payRequest.getQueryRiskInfo(), payRequest.getMerchantNo(),
                payRequest.getIndustryCatalog(), cardBinResDTO.getCardType(),
                cardBinResDTO.getBankCode(), bizName);
    } catch (Exception e) {
        logger.info("Exception while getting the risk-control limit", e);
    }

Code like this does not meet the standard, although every company has its own logging requirements.
* First, errors must be logged with logger.error or logger.warn.
* Log format: [system source] error description [key information]. Log messages should be readable and should give both cause and effect. Consider logging the input and output parameters of some methods as well.
* When logging an error, do not log the exception only as e.getMessage(); pass the exception itself so that the stack trace is preserved.

The proper log format is:

    logger.warn("[innersys]-[" + exceptionType.description + "] - [" + methodName + "] - " + "errorCode:[" + errorCode + "], " + "errorMsg:[" + errorMsg + "]", e);
    logger.info("[innersys]-[input parameters]-[" + methodName + "] - " + LogInfoEncryptUtil.getLogString(arguments));
    logger.info("[innersys]-[return result]-[" + methodName + "] - " + LogInfoEncryptUtil.getLogString(result));
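The format can be wrapped in a small helper so that every call site produces the same layout. The following is a sketch under assumptions: LogFormatSketch and errorLine are hypothetical names, and java.util.logging is used only so the example runs without extra dependencies (the project itself uses log4j).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LogFormatSketch {
    private static final Logger LOG = Logger.getLogger(LogFormatSketch.class.getName());

    /** Builds "[system]-[description] - [method] - errorCode:[..], errorMsg:[..]". */
    static String errorLine(String system, String description, String method,
                            String errorCode, String errorMsg) {
        return "[" + system + "]-[" + description + "] - [" + method + "] - "
                + "errorCode:[" + errorCode + "], errorMsg:[" + errorMsg + "]";
    }

    public static void main(String[] args) {
        try {
            throw new IllegalStateException("risk service timed out");
        } catch (IllegalStateException e) {
            // Pass the exception itself so the stack trace is logged, not just its message.
            LOG.log(Level.WARNING,
                    errorLine("innersys", "risk limit lookup failed", "getRiskLimit",
                            "R001", e.getMessage()),
                    e);
        }
    }
}
```

The error code "R001" and the description text are made-up values. The point is that the level is WARNING rather than INFO and that the Throwable travels with the message.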

Our programs print a large number of logs. They carry useful information for troubleshooting, but an excessive log volume not only affects disk I/O, it can also block threads, which has a large impact on performance.
With log4j 1.2.14, we used the following pattern:

%-5p %c:%L [%t] - %m%n

During the load test, many threads were blocked, as shown:

The load test chart looked like this:

The cause can be found in the log4j source code:

Note: the log4j source uses a synchronized lock and obtains the line number by printing a stack trace, so under high concurrency the situation above can occur.

The log4j configuration was then changed to:
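To see why a location-aware conversion such as %L is expensive, it helps to look at how a caller's line number can be found at run time. The sketch below shows the general mechanism only and is not a copy of the log4j code; the class and method names are assumptions.

```java
final class CallerLineSketch {
    /** Returns the line number of the code that called this method. */
    static int callerLine() {
        // Creating a Throwable captures the whole call stack, which is the expensive part.
        StackTraceElement[] stack = new Throwable().getStackTrace();
        // stack[0] is callerLine itself; stack[1] is its caller.
        return stack.length > 1 ? stack[1].getLineNumber() : -1;
    }

    /** Reports the line of the call to callerLine() inside this method. */
    static int example() {
        return callerLine();
    }

    public static void main(String[] args) {
        System.out.println("callerLine() was called from line " + example());
    }
}
```

Every log statement that needs a line number pays for a full stack capture like this one. When that work also happens inside a synchronized block, concurrent threads queue up behind each other, which matches the blocking seen in the load test.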

%-5p %c [%t] - %m%n

This solved the problem: thread blocking became very rare, which greatly improved the program's concurrency, as shown:

To be continued. Part 2 of this series is coming next, so stay tuned!
