How We Scale the Database at GitLab

Source: Internet
Author: User
Tags: connection pooling, database, load balancing, postgresql, cpu usage, disk usage

For a long time gitlab.com used a single PostgreSQL database server and a single replica for disaster recovery. This worked well for the first few years of gitlab.com, but over time we ran into many problems with this setup. For example, the database was under constant pressure, with CPU usage hovering around 70% almost all the time.

To tackle these problems with PostgreSQL, we use the following four techniques:

1. Optimize application code so that queries are more efficient.

2. Use a connection pooler to reduce the number of database connections (and the resources they require).

3. Balance the load across multiple database servers.

4. Shard the database.

Connection pooling

In PostgreSQL, each connection is handled by a dedicated operating system process, which requires a fair amount of resources; more connections (and thus more processes) mean more resources used on your database server. PostgreSQL also enforces a hard maximum number of connections via the max_connections setting. Once this limit is reached, PostgreSQL rejects new connections. The following diagram illustrates this setup:

Here our clients connect directly to PostgreSQL, so every client requires its own PostgreSQL connection.

With connection pooling, multiple client-side connections can reuse a smaller set of PostgreSQL connections. For example, without a connection pool we would need 100 PostgreSQL connections to handle 100 client connections; with a pool we might only need 10, depending on how it is configured. This means our connection diagram instead looks like this:

Here we show an example where four clients connect to Pgbouncer, but instead of four PostgreSQL connections, only two are required.
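
To make the idea concrete, here is a minimal sketch of connection pooling inside an application, assuming the psycopg2 driver and placeholder connection details; Pgbouncer does the same job outside the application, so many client connections can be served by a small, fixed number of PostgreSQL connections.

    # Minimal sketch of connection pooling (psycopg2; connection details
    # and database name are placeholders, not GitLab's actual setup).
    from psycopg2.pool import SimpleConnectionPool

    # 100 sequential "client" requests will share at most 10 real
    # PostgreSQL connections instead of opening 100 of them.
    pool = SimpleConnectionPool(
        minconn=1,
        maxconn=10,
        host="localhost",
        dbname="app",
        user="app",
    )

    def handle_request(request_id):
        conn = pool.getconn()          # borrow a backend connection
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT %s", (request_id,))
                cur.fetchone()
            conn.commit()
        finally:
            pool.putconn(conn)         # return it so other requests can reuse it

    for i in range(100):
        handle_request(i)

    pool.closeall()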

The two most commonly used connection poolers for PostgreSQL are:

    • Pgbouncer
    • Pgpool-II

Pgpool is a bit special because it is more than just a connection pooler: it has a built-in query caching mechanism, can load balance across multiple databases, manage replication, and so on. Pgbouncer, on the other hand, is very simple: it is just a connection pooler.

Database load balancing

Database-level load balancing is typically achieved using PostgreSQL's hot standby feature. A hot standby is a PostgreSQL replica that allows you to run read-only SQL queries, in contrast to a regular standby, which does not allow any SQL queries to run at all. To balance the load, you set up one or more hot standby servers and, in some way, distribute read-only queries across these hosts while sending all other operations to the primary server. Scaling such a setup is easy: simply add more hot standbys as read-only traffic grows (if needed).

Another benefit of this approach is a more resilient database cluster. Even if there is a problem with the primary server, you can continue to serve web requests using only the secondaries; of course, you may still run into errors if those requests end up needing the primary.

However, this approach is tricky to implement in practice. For example, as soon as a transaction contains write operations, it obviously has to run on the primary server. Also, after a write completes we want to keep using the primary for a while, because with asynchronous replication the changes may not yet be present on the hot standby servers.
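
The routing rules described above can be sketched in a few lines of application code. The example below is only an illustration, assuming psycopg2 and made-up host names; it is not GitLab's actual load balancer, but it shows the core idea: writes go to the primary, reads go to a random hot standby, and for a short window after a write we keep reading from the primary to sidestep replication lag.

    # Sketch of read/write routing with a "stick to the primary after a
    # write" window (psycopg2; host names and timings are placeholders).
    import random
    import time

    import psycopg2

    PRIMARY = "db-primary.example.com"
    SECONDARIES = ["db-replica1.example.com", "db-replica2.example.com"]
    STICKY_SECONDS = 30  # how long to keep reading from the primary after a write

    class Router:
        def __init__(self):
            self.last_write = 0.0

        def _connect(self, host):
            return psycopg2.connect(host=host, dbname="app", user="app")

        def for_write(self):
            # Writes always go to the primary and start the sticky window.
            self.last_write = time.time()
            return self._connect(PRIMARY)

        def for_read(self):
            # Recent writers read from the primary, because an asynchronous
            # hot standby may not have replayed their changes yet.
            if time.time() - self.last_write < STICKY_SECONDS:
                return self._connect(PRIMARY)
            return self._connect(random.choice(SECONDARIES))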

Sharding

Sharding is the act of splitting your data horizontally. This means the data is stored on specific servers and retrieved using a shard key. For example, you could shard data by project and use the project ID as the shard key. Sharding a database is useful when your write load is high (apart from a multi-master setup, there is no other easy way to balance writes), or when you have so much data that you can no longer store it the traditional way (for example, it simply no longer fits on a single disk).
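
As a toy illustration of the shard-key idea, the sketch below maps a project ID to one of a fixed set of shard hosts; the host names are made up, and a real sharded setup (for example with Citus) involves far more than this.

    # Sketch: routing queries by shard key (hosts are placeholders).
    SHARDS = [
        "shard0.example.com",
        "shard1.example.com",
        "shard2.example.com",
    ]

    def shard_for(project_id: int) -> str:
        # A stable function of the shard key decides where the data lives,
        # so every query for the same project always hits the same shard.
        return SHARDS[project_id % len(SHARDS)]

    print(shard_for(1234))  # -> the shard that stores project 1234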

Unfortunately, setting up a sharded database is a very large undertaking, even when using software such as Citus. Not only do you need to set up the infrastructure (with varying complexity depending on whether you run your own datacenter or use a hosted solution), you also need to adjust a large portion of your application to support sharding.

Connection pooling for GitLab

For connection pooling we had two main requirements:

1. It must work well (this is obviously required).

2. It must be easy to ship in our Omnibus package, so that our users can also benefit from connection pooling.

We evaluated the two solutions (Pgpool and Pgbouncer) in two steps:

1. Perform various technical tests (does it work, is it easy to configure, and so on).

2. Find out about the experiences of other users of the solution: what problems did they run into, and how did they solve them?

Pgpool was the first solution we explored, mainly because many of the features it offered seemed attractive. Some of our test data can be found here.

Ultimately, based on a number of factors, we decided against Pgpool. For example, Pgpool does not support sticky connections. This becomes a problem when you perform a write and then (try to) display the result right away. Imagine creating an issue and being redirected to its page immediately, only to get an HTTP 404 because the server used for the read-only query has not yet received the data. One workaround is to use synchronous replication, but that brings its own set of problems to the table, and those are problems we wanted to avoid.
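
One way to reduce the risk of such stale reads without synchronous replication is to check how far a hot standby has replayed the primary's write-ahead log before using it for a read. The sketch below assumes PostgreSQL 10 or later and psycopg2; the WAL functions are standard PostgreSQL, but the surrounding logic is only an illustration, not what Pgpool (or GitLab) actually does.

    # Sketch: only read from a standby once it has replayed past the WAL
    # position of our last write (PostgreSQL 10+, psycopg2; open connections
    # to the primary and standby are assumed to exist).
    def wal_position_after_write(primary_conn):
        with primary_conn.cursor() as cur:
            cur.execute("SELECT pg_current_wal_insert_lsn()")
            return cur.fetchone()[0]

    def standby_caught_up(standby_conn, lsn):
        with standby_conn.cursor() as cur:
            cur.execute(
                "SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), %s::pg_lsn) >= 0",
                (lsn,),
            )
            return cur.fetchone()[0]

    # Usage: remember the LSN right after a write, then keep reading from the
    # primary until standby_caught_up(...) returns True for the chosen standby.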

Another problem is that Pgpool's load-balancing logic lives outside your application: it parses SQL queries and sends them to the right server. Because this happens outside your application, you have little control over where a query runs. This can actually be a good thing for some people, since it requires no extra application logic, but it also prevents you from adjusting the routing logic when you need to.

Configuring Pgpool is also difficult, due to its very large number of configuration options. Perhaps the deciding factor in our choice not to use it was the feedback we received from people who had used it in the past. Even though most of that feedback was not very detailed, it was often negative about Pgpool. Although most of the complaints related to earlier versions of Pgpool, it still made us wonder whether it was the right choice.

Combining the problems and feedback described above, we ultimately decided to use Pgbouncer instead of Pgpool. We ran similar tests with Pgbouncer and were very pleased with the results. It is easy to configure (and does not need much configuration to begin with), relatively easy to deploy, focuses solely on connection pooling (which it does really well), and has no noticeable overhead (if any). Perhaps the only complaint we have is that Pgbouncer's website can be a bit hard to navigate.

With Pgbouncer we were able to reduce the number of active PostgreSQL connections from several hundred to only 10-20 by using transaction pooling. We chose transaction pooling because Rails database connections are persistent; with session pooling we would not have been able to reduce the number of PostgreSQL connections, and would thus see little benefit (if any). By using transaction pooling we could lower PostgreSQL's max_connections setting from 3000 (we are not sure why it was ever set to this particular value) to 300. With Pgbouncer configured this way, even at peak we only need around 200 connections, which leaves us headroom for extra connections such as psql consoles and maintenance tasks.
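
Before lowering max_connections like this, it helps to measure how many backends are actually in use. The sketch below is a minimal way to do that with psycopg2 and the standard pg_stat_activity view; the connection details are placeholders.

    # Sketch: compare the configured connection limit with the number of
    # backends currently in use (psycopg2; connection details are placeholders).
    import psycopg2

    conn = psycopg2.connect(host="db-primary.example.com", dbname="postgres", user="monitor")

    with conn.cursor() as cur:
        cur.execute("SHOW max_connections")
        limit = int(cur.fetchone()[0])

        cur.execute("SELECT count(*) FROM pg_stat_activity")
        in_use = cur.fetchone()[0]

    print(f"{in_use} of {limit} connections in use")
    conn.close()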

A downside of using transaction pooling is that you cannot use prepared statements, because the PREPARE and EXECUTE commands may end up running in different connections, which can produce incorrect results. Fortunately, when we disabled prepared statements we did not measure any increase in response timings, but we did measure a reduction of several gigabytes in memory usage on our database servers.
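
The failure mode is easy to picture: a server-side prepared statement lives on one PostgreSQL backend, but with transaction pooling consecutive transactions from the same client may be served by different backends. The sketch below assumes a Pgbouncer running in transaction mode on localhost:6432 and a hypothetical users table; it only illustrates the mechanism.

    # Sketch of why server-side prepared statements break under transaction
    # pooling (assumes Pgbouncer in transaction mode on localhost:6432;
    # database, user and the "users" table are hypothetical).
    import psycopg2

    conn = psycopg2.connect(host="localhost", port=6432, dbname="app", user="app")

    with conn.cursor() as cur:
        cur.execute("PREPARE get_user AS SELECT * FROM users WHERE id = $1")
    conn.commit()  # the transaction ends here, so Pgbouncer is free to hand
                   # the backend that holds "get_user" to another client

    with conn.cursor() as cur:
        # This may run on a *different* backend where "get_user" was never
        # prepared, typically failing with "prepared statement does not exist".
        cur.execute("EXECUTE get_user(1)")
    conn.commit()
    conn.close()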

To make sure both web requests and background jobs have connections available, we set up two separate pools: a pool of 150 connections for background processing and a pool of 50 connections for web requests. For web requests we rarely need more than 20 connections, but for background processing we can easily spike to 100 connections, simply because of the large number of background processes running on gitlab.com.

Today, we ship Pgbouncer as part of the GitLab EE high-availability package. For more information, see "Omnibus GitLab PostgreSQL High Availability".

Combining connection pooling and database load balancing

Combining connection pooling and database load balancing allowed us to drastically reduce the resources required to run our database cluster, as well as distribute load across our hot standbys. For example, where our primary server's CPU usage used to hover around 70%, it now generally sits between 10% and 20%, while our two hot standby servers are mostly around 20%:

(Figure: CPU percentage)

Here, db3.cluster.gitlab.com is our primary server, while the other two are our secondaries.

Other load-related metrics, such as load average, disk usage, and memory usage, also improved significantly. For example, instead of hovering around 20 as before, the primary server's load average now rarely exceeds 10:

(Figure: load average)

During the busiest hours, our secondary servers process around 12000 transactions per second (roughly 740000 per minute), while the primary processes around 6000 transactions per second (roughly 340000 per minute):

(Figure: transactions per second)

Unfortunately, we do not have any data on the transaction rates from before we deployed Pgbouncer and our database load balancer.

A summary of our latest stats for PostgreSQL can be found on our public Grafana dashboard.

Some of our Pgbouncer settings are as follows:

Setting                 Value
default_pool_size       100
reserve_pool_size       5
reserve_pool_timeout    3
max_client_conn         2048
pool_mode               transaction
server_idle_timeout     30

Beyond the above, there is still work to be done, such as implementing service discovery (#2042), improving how we check the availability of secondary servers (#2866), and ignoring secondaries that lag too far behind the primary (#2197).

It is worth mentioning that, so far, we have no plans to package our load-balancing solution as a separate library that you can use outside of GitLab; instead, our focus is on providing a reliable load-balancing solution for GitLab EE.

If you are interested in this kind of work and enjoy working with databases, improving application performance, and adding database-related features to GitLab (such as service discovery), be sure to check out our job openings and the database specialist handbook for more information.

Reference URL: https://linux.cn/article-9048-1.html
