The insider of the high-performance database connection pool

Last Update:2017-04-13 Source: Internet

Author: User

Tags cas connection pooling sql injection

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

He Tao

The only product platform architect

He Tao, currently serving in the Platform Architecture Department, is responsible for data access layer, gateway, database middleware, platform framework and other development and design work. In the database performance optimization, architecture design and other aspects have a lot of experience accumulated. Passionate about high-availability, high concurrency, and high-performance architecture research.

You may have this question: The connection pool is similar to the thread pool or object pools, it is a pond to put the connection, the use of the time from the inside to take one, the use of the return, the function is very simple, what can be said.

There may also be a question: high performance on such a large, a small connection pool, how to heel tall on top.

This topic will cover the principles of connection pooling, high performance design, optimization practices, bottlenecks and solutions for existing connection pooling. At the same time will also introduce the only product will be self-research database connection pool products (code: caelus)

Why you should have a connection pool

First look at where the connection pool is located:

The business implementation of the application framework typically accesses databases, caches, or HTTP services. Why add a connection pool to the places you visit?

In the following example, to access MySQL, execute an SQL command, if you do not use a connection pool, which processes need to go through.

1:TCP three-time handshake to establish a connection

2:MYSQL Certified three-time handshake

3: True SQL execution

4:mysql's off.

5:TCP four-time handshake off

As you can see, there is a lot more network interaction that we don't care about in order to execute a SQL.

Pros: Simple to implement.

Disadvantages:

1: More Network IO

2: High load on database

3: Longer response times and lower QPS

4: Apply frequent create connections and close connections, resulting in more temporary objects, GC frequent

5: After closing the connection, there will be a large number of time_wait TCP states (closed after 2 MSL)

Using the connection pooling process

The first time you visit, you need to establish a connection. However, subsequent accesses are reused for the previously created connection.

Advantages:

1: Less network overhead

2: The performance of the system will have a substantial improvement

3: The TIME_WAIT state without trouble

Of course, the reality is often cruel, when we solve a problem, and accompanied by another problem arises.

The biggest challenge with connection pooling: the performance of the connection pool

Number of connections and number of threads performance optimizations

Library DB Deployment structure:

Suppose there are 128 sub-Libraries: 32 servers with 4 schemas per server. With the design of 128 sub-Libraries, a new 128 stand-alone database connection pool is created.

Model of the database connection pool

Characteristics:

1:128 connection pools are completely independent, different schemas also correspond to different connection pools

2: First through the library, read and write strategies to select the corresponding connection pool, and then get a connection from the connection pool to operate

3: After the operation is finished, return the connection to the corresponding connection pool.

Advantages:

Simple structure, decentralized competition

The problems faced:

1: Excessive number of threads

Let's take a look at the number of new threads you need to create a new connection pool.

You can see that as the library increases, the number of threads increases linearly regardless of which connection pool is chosen. Too many threads will result in a large memory footprint: The default of 1 threads consumes 1M of space, and if it is 512 threads, it consumes the 1m*512=512m context switch overhead.

Tips: Because the stack and heap applications are virtual address spaces, they are not freed once used. (Threads do not necessarily occupy 1M of space)

2: Too many connections

Database connection resources are heavier, and as the connection increases, the performance of the database decreases significantly. DBAs typically limit the number of connections per db, such as a limit of 3K. Assume that the database is limited to a single 3k,32 table capacity of 3k*32=96k. If the maximum number of applications is 10, then each app total needs to 128*10=1.28k a connection. The number of applications that the database theoretically supports is 96k/1.28k= 80 units

3: Cannot connect multiplexing

Different schemas under the same physical machine are completely independent and the connection cannot be reused

Optimized database connection Pool model

Characteristics:

1: Only one connection pool, all nodes share threads (resolves an issue with too many threads)

2: Each physical machine corresponds to a host, the host maintains multiple Schema,schema storage connections.

3: Different schemas under the same host can be used for connection multiplexing (resolving problems with too many connections)

Get the connection process:

1: Get the connection need to take the Ip,port and schema information: For example, get Host31 schema1

2: An idle connection is obtained in the schema1 of Host31 first, but SCHEMA1 no idle connection, the idle connection is obtained from SCHEMA2.

3: The connection from SCHEMA2 executes USESCHEMA1, and the connection switches to SCHEMA1.

4: Perform the corresponding SQL operation, after execution, return the connection to the SCHEMA1 pool.

Advantages:

1: Connection multiplexing: Effectively reduce the number of connections.

2: Improve performance: Avoid frequent new connections. The cost of creating a new connection is large, but using the use schema is very expensive

3: Effectively reduce the number of threads. The current scenario requires only 4 threads. 512 Threads required before optimization

Disadvantages:

1: more complex management

2: Does not conform to the JDBC interface specification. The DataSource has only a simple getconnection () interface and no interface for obtaining a connection to the corresponding schema. You need to inherit datasouce to implement a specific interface.

Transaction Statement Performance optimization

Model for performing transactions before optimization

Gets the connection from the connection pool, which is automatically committed by default. In order to open a transaction, you need to perform a setautocommit=false operation, and then execute the specific SQL, when you return the connection, you also need to set the connection to autocommit (need to execute set autocommit=true). You can see the statement that opens the transaction and requires additional execution of two transactions.

Model for performing transactions after optimization

All connections in each schema are grouped according to Autocommit. It is divided into auto-commit (autocommit=true) and non-auto-commit (Autocommit=false). Getting connections takes precedence over the connections in the same autocommit group, and if no connection is available, gets the connection from another group, and then returns it to the corresponding group after the business operation is executed. This mechanism avoids the two transaction statements that enable the transaction to execute more than one execution.

Lock performance Optimization

Common features of connection pooling:

Connection pooling consists of five parts: Get connections, return connections, schedule tasks, maintain components and resource pools

Get the connection:

1: Get timeout: An exception is thrown if the connection is not acquired over the specified time

2: Validity check: When you get a resource from a resource pool, you need to check the validity of the resource, and if it fails, get the connection again. Avoid the error when doing business.

3: Create a connection: You can create it synchronously or asynchronously.

Return Connection:

1: Return the connection: for example, to check the maximum number of idle, to determine whether the physical shutdown or return to the connection pool

2: Destroy connection: Can be destroyed synchronously or can be destroyed asynchronously

Scheduled tasks:

1: Idle check: The main is to check the idle connection, the connection is idle for more than a certain time, the connection will be closed.

2: Minimum number of connections control: The minimum number of connections is generally set. Ensure the minimum number of connections within the current system. If not, a new connection is created.

Component maintenance:

1: Connection state control: idle, use, delete etc State control

2: Exception Handling: The exception for JDBC access is handled uniformly, and if the exception is related to the connection, the connection is destroyed.

3: Cache: Avoid repeated parsing of SQL, the preparestatement mechanism, the SQL resolved objects are cached.

4:JDBC Encapsulation: The implementation of JDBC, the real implementation is the underlying driver, such as Mysql-connector-java.

Resource pool:

1: The resource pool is the place where the connection is placed and the core of the connection pool.

2: All components are basically interacting with resource pools, and the competition for connecting resources is intense. The performance of this service will determine the performance of the entire connection pool.

3: The general resource pool is implemented using the Blockingqueue provided by the JDK. Then whether there is a scheme can be unlocked design, to avoid competition.

Resource pool lock-free design

Get the connection approximate process:

1: Gets the connection from the threadlocal, if there is no idle connection, is obtained from the Global Connection pool (copyonwritearraylist).

2: If there are no idle connections in the global connection pool, new connections are created asynchronously.

3: Determines whether the timeout is greater than the threshold value, and if it is less than the threshold, spin. Otherwise do park hibernation.

4: After the connection is established, the park's thread will wake up

There are four main aspects to achieve the lock-free design: threadlocal,copyonwritearraylist, asynchronous establishment of the connection and spin.

ThreadLocal

1: Each thread has a connection queue. The queue is a reference to the global queue.

2: Get the connection first from the threadlocal inside the connection, if the connection is idle, then use. Otherwise remove, then take one, until the connection is not reached.

3: Return the connection, only need to return to the threadlocal queue, and set the connection to idle state

4: If you use Blockqueue, call poll when you get a connection, call offer when you return the connection, there is a two-time lock competition. After optimization, the cost of two locks is avoided by CAS (when the connection is obtained, the CAS-connected is not idle, and the CAs-connection is used for the idle state when it is returned)

Copyonwritearraylist

1: The Queue usage scenario is: Large read, small write operations, and the data stored is relatively limited. The connection pool scenario is ideal for use with copyonwritearraylist.

2: When obtaining a connection or returning a connection, only the state of the connection is changed via CAs and no additions or deletions are made to the connection pool.

3: General Connection Pool Connection number is more controllable, copyonwritearraylist in the write operation, all connections are copied, the memory has little impact.

To establish a connection asynchronously

After getting to the connection, determine if there is concurrency waiting to get the connection, and if so, establish the connection asynchronously. Avoid waiting for the next connection. If the copyonwritearraylist does not have an idle connection, the connection is established asynchronously.

Spin

This spin is similar to the spin mechanism of the JDK to synchronized. If the timeout period is found to be greater than the set threshold (for example, 10 microseconds), the thread hangs. If it is less than the set threshold, regain the connection and make the choice to avoid the performance overhead of the thread's context switch.

Optimization Tips

Method Inline Optimization

1: Each call method, the thread will create a new stack frame, the new stack frame overhead is relatively large

2:jit is inline optimized at run time, and multiple methods use a stack frame to avoid creating too many new stacks

3:jit method Inline Optimization The default number of bytecode thresholds is 35 bytes, which is less than 35 bytes before being optimized. (Can be set by-xx:maxinlinesize=35)

By modifying the code above, the compiled bytecode is modified to 34 bytes, which satisfies the inline condition.

Heartbeat Statement Selection

Preparestatement mode selection

MySQL driver is the client mode by default, and if you need to turn on server mode, you need to set useserverprepstmts=true. Preparestatement The default client mode and statement are not different for the DB side. It is generally understood that the difference between preparestatement and statement is that preparestatement can avoid SQL injection. But how do you avoid SQL injection?

When you use Preparestatement to set parameters, such as calling setstring (int parameterindex, String x), the parameters that are set are escaped locally to avoid SQL injection.

When executing SQL, the SQL will be replaced with the escaped character and sent to the database execution.

Pscache

Mysqldriver is not turned on by default and can be turned on by setting Cacheprepstmts = True

QueryTimeout

Previously also encountered because of the opening of the QueryTimeout, causing the connection leakage problem.

For details, refer to:

MySQL Driver Bug Discovery Tour: http://blog.csdn.net/hetaohappy/article/details/52091005

Prior to the optimal configuration of the connection pool has been done, for reference:

http://blog.csdn.net/hetaohappy/article/details/51861015

Only the product will self-research connection pool: caelus

Caelus is a high-performance, distributed database connection pool that can be self-developed by the product.

High performance: Improve connection pool performance based on lock-free connection pool design model;
Reduce the number of threads in scenarios where there are more sub-libraries. If there are 128 sub-Libraries, the existing connection pool model requires 128 separate connection pools, each of which requires a thread (1-4 different connection pools) to process the task. The total cost of maintaining 128 to 128*4 threads is significant. The Caelus connection pool will significantly reduce the number of threads.
Connection multiplexing. For a MySQL instance above there are multiple schema scenarios. Connections to different schemas for existing connection pools are not reusable. Caelus can reuse different schema connections to improve performance.
Excessive transaction instructions. In the case of a transactional statement, when the connection is received from the connection pool, the transaction (Setautocommit=false) needs to be opened (set Autocommit=true) to be returned. Two additional transaction instructions are required for each connection to be used. Caelus can effectively reduce the transaction instructions.
The unification of configuration specifications. Combine MySQL settings to provide a uniform, optimal configuration.

This topic explains the performance design optimizations for database connection pooling. About caching (Memcache,redis)

and HTTP connection pooling optimization practices, there is a chance to communicate with you later.

For a pure asynchronous HTTP connection pool design for Netty, refer to:

http://blog.csdn.net/hetaohappy/article/details/51867059

E-commerce Technology Special Month | Mesozoic Technology

Https://toutiao.io/posts/z5vlan/preview

Inside of a high-performance database connection pool

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More