HAProxy, PHP, Redis, and MySQL Support 1 billion request architecture Solutions



As a company grows, server scalability plays an important role in expanding into new markets, which places real demands on architects. Octivi co-founder and software architect Antoni Orfin introduces a very simple architecture that uses HAProxy, PHP, Redis, and MySQL to support 1 billion requests per week, and also outlines the project's future horizontal scaling approaches and common patterns.

The following is a translation:

In this article, I will demonstrate a very simple architecture that uses HAProxy, PHP, Redis, and MySQL to support 1 billion requests per week. I will also show the project's future horizontal scaling methods and common patterns. Let's take a look at the details.

Status:

Servers:
  • 3 application nodes
  • 2 MySQL + 1 backup
  • 2 Redis

Data storage:
  1. Redis stores 160 million records, with a data volume of about 100 GB; it is our primary data store.
  2. MySQL stores 300 million records, with a data volume of about 300 GB; normally it serves as a third-level cache layer.

Application:
  1. The application handles 1 billion requests per week.
  2. A single Symfony2 instance peaks at 700 requests per second (about 550 requests per second on an average workday).
  3. Average response time: 30 ms.
  4. Varnish: more than 12,000 requests per second (measured during stress testing).
Platform:

  • Monitoring:
    1. Icinga
    2. Collectd
  • Applications:
    1. HAProxy + Keepalived
    2. Varnish
    3. PHP (PHP-FPM) + Symfony2 framework
  • Data storage:
    1. MySQL (master-master configuration), using HAProxy for load balancing
    2. Redis (master/slave configuration)

Background

About a year ago, a friend approached me with a demanding requirement: his fast-growing e-commerce startup was preparing to expand internationally. Still being a startup, they needed a cost-effective initial solution and could not spend heavily on servers. The legacy system used a standard LAMP stack, and they had a strong PHP development team. If new technologies had to be introduced, they had to be simple enough to avoid adding architectural complexity, so that the current technical team could maintain the application long term.

To meet their need to expand into the next market, the architecture had to be designed with scalability in mind. First, we reviewed their infrastructure:

The old system used a monolithic design: underneath were some PHP-based web applications. The startup had many so-called front-end websites, which mostly used independent databases and shared some common code implementing the business logic. Frankly, long-term maintenance of such applications is a nightmare: as the business grows, some code must be rewritten, and modifying one website then leads to inconsistent business logic, so the same change has to be made across all web applications.

Generally, this is a project-management problem: someone must be responsible for code that spans multiple codebases. Based on that observation, the first step of the overhaul was to extract the key business functions and split them into independent services (a key part of this article), i.e., a service-oriented architecture following the "separation of concerns" principle throughout the system. Each service is responsible for exactly one piece of business logic, and higher-level business functions were also identified; for example, one such service might be a search engine, another a sales system.

Front-end websites interact with the services through a REST API, with responses in JSON format. For simplicity, we skipped SOAP, a protocol developers dislike, because nobody wants to parse piles of XML.

It is worth mentioning what we deliberately did not extract into services: authentication and session management. These are needed at a high level, so the front-end websites are responsible for them; only those websites need to recognize users. Keeping the services session-unaware keeps them simple, which is a huge advantage when dealing with scaling and code-related issues: everything has its own clear responsibility.

Benefits:

  • Independent subsystems (services) can easily be developed by different teams; developers do not interfere with each other, and efficiency naturally increases.
  • Authentication and sessions are not managed by the services, so they do not cause scaling problems there.
  • The business logic is separated, so different front-end websites no longer have redundant functionality.
  • Service availability is significantly improved.

Accompanying disadvantages:

This increases the workload for system administrators: since the services each run on independent infrastructure, administrators have more to keep an eye on.

It is harder to maintain backward compatibility. After one year of maintenance, the API methods have undergone countless changes. The problem is that these changes must not break backward compatibility, since that would break the code of every website and require many technicians to modify all the websites at the same time. Even so, after one year, all methods still match the documentation created at the start of the project.

Application Layer

Looking at the request workflow, the first layer is the application layer, which contains the HAProxy load balancer, Varnish, and the Symfony2 applications. Requests from front-end websites first reach HAProxy, and the load balancer then distributes them across the application nodes.
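As a concrete illustration, a minimal HAProxy configuration for this layer might look like the sketch below; the server names, IP addresses, and health-check path are assumptions for illustration, not taken from the project:

```
frontend www
    bind *:80
    default_backend app_nodes

backend app_nodes
    balance roundrobin
    option httpchk GET /health
    # Three application nodes; the "backup" node also serves traffic (active-active)
    server app1 10.0.0.11:80 check
    server app2 10.0.0.12:80 check
    server app3 10.0.0.13:80 check
```

With `check` enabled, HAProxy drops an unhealthy node from rotation automatically, which is what makes adding the next application node the only scaling step needed at this layer.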

Application node configuration

  • Xeon E5-1620@3.60GHz, 64 GB RAM, SATA drives
  • Varnish
  • Apache2
  • PHP 5.4.X (PHP-FPM), using APC bytecode Cache

We purchased three such servers in an N+1 redundant, active-active configuration: the "backup" server also processes requests. Because performance is not the primary factor, we configure an independent Varnish for each node, which lowers the cache hit ratio but avoids a single point of failure (SPOF); in this project we care more about availability. Because Apache2 was already used on the front-end website servers, we kept that stack, so administrators are not troubled by too many new technologies.

Symfony2 Application

The application itself is based on Symfony2, a PHP full-stack framework that provides a large number of components to accelerate development. Building a typical REST service on a complex framework may be questioned by many people, so here is a detailed explanation:

  • Friendly to PHP/Symfony developers. The client's IT team is composed of PHP developers; adding new technologies would mean recruiting new developers, since the business system must be maintained long term.
  • Clear project structure. PHP/Symfony has never been a necessity, but it is the default choice for many projects, and it is very easy to bring in new developers because the code is friendly to them.
  • Many ready-made components. Following the DRY principle, no one wants to do repetitive work, and we are no exception. We use many Symfony2 components, especially the Console component; the framework is very helpful for CLI commands, application performance profiling (the debug toolbar), and logging.

Before selecting Symfony2, we ran many performance tests to ensure the application could support the planned traffic. We developed a proof of concept and exercised it with JMeter, achieving satisfactory results: 700 requests per second with response times held under 50 milliseconds. These tests gave us enough confidence that even a complex framework like Symfony2 can achieve the desired performance.

Application Analysis and Monitoring

We use the Symfony2 profiling tools to monitor the application; they do a very good job of collecting execution times for specific methods, especially those that interact with third-party network services. This lets us discover potential weak points in the architecture and find the most time-consuming parts of the application.

Verbose logging is also indispensable. We use the PHP Monolog library to produce well-formatted log lines for developers and administrators. Note that you should add as many details as possible; the more detailed, the better. We use different log levels:

  • Debug: things that simply happen, e.g., request information before a call to an external web service, and the response after the call returns.
  • Error: an error occurred but the request flow was not terminated, e.g., an error response from a third-party API.
  • Critical: the application is on the verge of crashing.

In production, you therefore see Error and Critical information clearly; in development/test environments, Debug information is also recorded. Logs are stored in different files, i.e., "channels" in Monolog terms. The system has a main log file, which records all application-level errors plus short log lines from each channel, while each channel's detailed log is recorded in a separate file.
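To make the channel-per-file idea concrete, here is a minimal sketch in Python (Monolog itself is PHP; the channel name, format, and messages below are illustrative, not the project's actual configuration):

```python
import logging
from io import StringIO

def make_channel_logger(channel, stream, level=logging.DEBUG):
    """Create a per-channel logger that writes formatted log lines to its own stream/file."""
    logger = logging.getLogger(channel)
    logger.setLevel(level)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(levelname)s] %(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep channel output out of the root logger
    return logger

# Each "channel" gets its own destination, mirroring Monolog's channel concept.
payments_stream = StringIO()  # stands in for a per-channel log file
payments = make_channel_logger("payments", payments_stream)

payments.debug("request sent to external web service")                   # dev/test detail
payments.error("third-party API returned an error; request flow continues")
payments.critical("application is about to crash")
```

In a real setup the `StringIO` would be a `FileHandler` per channel, plus one handler on a main logger collecting Error and above from every channel.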

Scalability

It is not difficult to scale the platform's application layer. HAProxy's capacity will not be exhausted any time soon; the only thing to consider is redundancy to avoid a single point of failure. All you need to do is add the next application node.

Data Layer

We use Redis and MySQL to store all the data. MySQL is used mostly as a third-level cache layer, while Redis is the system's main data store.

Redis

During system design, we choose databases that meet the planning requirements based on the following:

  • Performance is not affected even when storing a large amount of data, about 250 million records
  • Mostly simple GET requests for specific resources, with no searches or complicated SELECT operations
  • Fetch as many resources as possible in a single request to reduce latency

After some investigation, we decided to use Redis:

  • Most of the operations we perform have O(1) or O(N) complexity, where N is the number of keys to retrieve; this means keyspace size does not affect performance.
  • Generally, the MGET command is used to retrieve more than 100 keys at a time, avoiding network round-trip latency, rather than performing multiple GET operations in a loop.
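The batching idea can be sketched as follows. `FakeRedis` is an in-memory stand-in so the example runs without a server; its `mget` mirrors the interface a client such as redis-py exposes, and the batch size and key names are illustrative:

```python
def fetch_many(client, keys, batch_size=100):
    """Fetch values for many keys in MGET batches instead of per-key GETs.

    `client` is assumed to expose mget(keys) -> list of values, as
    redis-py's client does; one batch costs one network round trip.
    """
    values = []
    for i in range(0, len(keys), batch_size):
        values.extend(client.mget(keys[i:i + batch_size]))
    return values

class FakeRedis:
    """In-memory stand-in so the sketch runs without a Redis server."""
    def __init__(self, data):
        self.data = data
    def mget(self, keys):
        return [self.data.get(k) for k in keys]

store = FakeRedis({f"item:{i}": str(i) for i in range(250)})
vals = fetch_many(store, [f"item:{i}" for i in range(250)])  # 3 round trips, not 250
```

With a looped GET, the same 250 keys would cost 250 round trips; at even 1 ms of network latency each, that difference dominates the 30 ms average response time quoted above.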

We now have two Redis servers in master-slave replication mode. Both nodes have the same configuration: Xeon E5-2650v2@2.60GHz, 128 GB RAM, SSD. The memory limit is set to 100 GB, and usage is typically at 100%.

Since the application does not exhaust all the resources of a single Redis server, the slave node is used mainly as a backup for high availability: if the master goes down, we can quickly switch the application over to the slave. Replication is also handy during maintenance and server migration, when switching to another server is very easy.

You may wonder about the situation where Redis constantly runs at full memory. All the persistent keys account for about 90% of the keyspace, and the remaining resources are used for cache entries with TTL-based expiration. The keyspace is thus divided into two parts: a TTL'd set (the cache) and persistent data. Thanks to the "volatile-lru" maxmemory policy, the least recently used cache keys (only those with a TTL set) are evicted. In this way a single Redis instance can perform two roles at once: primary storage and general-purpose cache.
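Assuming a standard redis.conf, the setup described above roughly corresponds to the following fragment; the values are the ones quoted in this article, but the file itself is a sketch:

```
# Cap memory at 100 GB; under memory pressure, evict only keys that
# carry a TTL (the cache portion), leaving persistent records untouched.
maxmemory 100gb
maxmemory-policy volatile-lru
```

Under volatile-lru, a key written with a plain SET is never evicted, while a key written with SETEX (or given a TTL via EXPIRE) is an eviction candidate; that distinction is what lets one instance serve as both primary storage and cache.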

To use this mode, you must constantly monitor the number of expiring keys:

db.redis1:6379> info keyspace
# Keyspace
db0:keys=16XXXXXXX,expires=11XXXXXX,avg_ttl=0

The closer the number of expiring keys gets to 0, the more dangerous the situation. At that point the administrator needs to consider appropriate sharding or increasing memory.

How do we monitor this? An Icinga check is used, with a dashboard showing whether the number has reached the critical point. We also visualize the ratio of "missing" keys.
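A check along these lines only needs to parse the INFO keyspace line. The sketch below shows one hypothetical way to do it in Python; the threshold and the concrete sample numbers are illustrative, not the project's actual values:

```python
def parse_keyspace_line(line):
    """Parse one 'dbN:keys=...,expires=...,avg_ttl=...' line from INFO keyspace."""
    db, fields = line.split(":", 1)
    stats = dict(kv.split("=") for kv in fields.split(","))
    return db, {k: int(v) for k, v in stats.items()}

def expiring_keys_alert(line, warn_threshold=1_000_000):
    """Return True when the count of TTL'd keys drops below the threshold.

    Under volatile-lru only TTL'd keys can be evicted, so a shrinking
    'expires' count signals that cache space is nearly gone.
    """
    _, stats = parse_keyspace_line(line)
    return stats["expires"] < warn_threshold

# Illustrative sample in the shape shown above
sample = "db0:keys=160000000,expires=110000000,avg_ttl=0"
```

In practice this logic would sit behind an Icinga plugin that runs `INFO keyspace` against each Redis node and exits with a warning/critical status.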

One year later, we fell in love with Redis, which never disappointed us. This year, the system never experienced any downtime.

MySQL

In addition to Redis, we also use the traditional RDBMS, MySQL. But unlike most setups, we usually use it as a third-level cache layer: we store rarely used objects in MySQL to reduce Redis resource usage, since MySQL keeps them on hard disk. There is nothing special here; we just try to keep it as simple as possible. We use two MySQL servers, each configured with a Xeon E5-1620@3.60GHz, 64 GB RAM, SSD. The two servers use local, asynchronous master-master replication. In addition, we use a separate slave node for backup.

High Availability of MySQL

In applications, the database is always the hardest bottleneck. For now, there is no need to consider scale-out: we scale the Redis and MySQL servers vertically, and this strategy still has room to grow. Redis runs on servers with 128 GB of RAM, and moving to servers with more memory is straightforward. Such large instances do have disadvantages, though, such as snapshots, and a simple restart of the Redis server takes a long time.

Once vertical scaling is exhausted, horizontal scaling is next. Fortunately, at the start of the project we prepared a data structure that is easy to split:

In Redis, we use four "heavy" record types. Based on data type, the records can be split across four servers. Rather than hash-based sharding, we chose sharding by record type; this way we can still use MGET, which always executes on keys of a single type.
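Sharding by record type instead of by hash can be sketched as a simple routing table. The key scheme ("type:id"), type names, and shard addresses below are invented for illustration; the article does not name the actual four types:

```python
# Hypothetical map from record type to the Redis instance that owns it.
SHARDS = {
    "product":  "redis-a:6379",
    "category": "redis-b:6379",
    "price":    "redis-c:6379",
    "stock":    "redis-d:6379",
}

def shard_for(key):
    """Route a key to a shard by its record type, not by hashing.

    All keys of one type live on one server, so an MGET over
    same-type keys still hits a single instance.
    """
    record_type = key.split(":", 1)[0]
    return SHARDS[record_type]

def group_by_shard(keys):
    """Group keys so each shard receives exactly one MGET."""
    groups = {}
    for key in keys:
        groups.setdefault(shard_for(key), []).append(key)
    return groups
```

Hash sharding would spread same-type keys across all servers and turn every MGET into a scatter-gather; type sharding preserves the single-round-trip MGET at the cost of less even load distribution.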

In MySQL, structured tables are very easy to migrate to another server, likewise by record type (table). Of course, once record-type-based sharding stops working, we will move to hash-based sharding.

Lessons learned

  • Do not share your databases. Once, a front-end website wanted to switch its session storage to Redis; it exhausted Redis's cache space, and Redis began refusing to save new cache keys. All the cache traffic then shifted to the MySQL servers, causing significant overhead.
  • The more detailed the logs, the better. If there is not enough information in the log lines, it will be difficult to debug a problem quickly, and you may be stuck waiting until the root cause finally surfaces.
  • Using a complex framework in an architecture does not mean low performance. Many people are surprised that we use a full-stack framework to support this much traffic. The secret is to use the tools intelligently; even Node.js can be made slow. Choose technology that offers a good development environment; nobody wants to work with a pile of unfriendly tools, which would lower the development team's morale.
