Support 1 billion requests per week using HAProxy, PHP, Redis, and MySQL

Tags: fpm, varnish, haproxy, icinga, redis, server

In a company's development, server scalability plays an important role in expanding into new markets, and this places real demands on the architect. Octivi co-founder and software architect Antoni Orfin introduces a very simple architecture that can support 1 billion requests per week using HAProxy, PHP, Redis, and MySQL. Along the way, you will also see the project's future horizontal-scaling path and common patterns.

Stats
    • Servers:
      • 3 application nodes
      • 2 MySQL servers + 1 backup
      • 2 Redis servers
    • Application:
      • Handles 1 billion requests per week
      • Peak of 700 requests/sec on a single Symfony2 instance (roughly 550 requests/sec on an average working day)
      • Average response time: 30 ms
      • Varnish: more than 12,000 requests/sec (measured during stress testing)
    • Data storage:
      • Redis: 160 million records, about 100 GB of data; it is our primary data store
      • MySQL: 300 million records, about 300 GB of data; usually serves as a third-layer cache
Platform

    • Monitoring:
      • Icinga
      • Collectd
    • Application
      • HAProxy + keepalived
      • Varnish
      • PHP (PHP-FPM) + Symfony2 Framework
    • Data storage
      • MySQL (master-slave configuration), load-balanced with HAProxy
      • Redis (master-slave configuration)
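
As a sketch of how these pieces fit together, a minimal HAProxy configuration for this kind of stack might look as follows. The hostnames, ports, and health-check path are illustrative assumptions, not the project's actual settings:

```
# haproxy.cfg sketch; all addresses and checks are hypothetical
frontend http-in
    bind *:80
    default_backend app_nodes

# Round-robin across the three application nodes (Varnish + Apache2 + PHP-FPM)
backend app_nodes
    balance roundrobin
    option httpchk GET /health
    server app1 10.0.0.11:80 check
    server app2 10.0.0.12:80 check
    server app3 10.0.0.13:80 check

# MySQL behind HAProxy, as listed above; the slave acts as a backup
listen mysql
    bind *:3306
    mode tcp
    balance leastconn
    server db-master 10.0.0.21:3306 check
    server db-slave 10.0.0.22:3306 check backup
```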
Background

About a year ago, a friend came to me with a demanding request: they were a fast-growing e-commerce start-up ready to expand internationally. As they were still a start-up, the solution had to be cost-effective; there was no money to sink into servers. The legacy system used the standard LAMP stack, and they had a strong PHP development team. If new technologies had to be introduced, they had to be simple enough, without too much architectural complexity, so that the existing technical team could maintain the application for the long term.

To support their expansion into the next market, the architect had to design with scalability in mind. First, let's look at their infrastructure:

The old system used a monolithic design: some PHP-based web applications at the bottom. The start-up ran many so-called front-end sites, most of which used separate databases while sharing some common code that carried the business logic. Unsurprisingly, long-term maintenance of such an application was a nightmare: as the business evolved, some code had to be rewritten, and modifying one site inevitably led to inconsistent business logic, forcing the same change to be made across all the web applications.

Typically this is a project-management problem: administrators have to be responsible for code spread across multiple repositories. Given that, the first step of the overhaul was to extract the core business functions and split them into separate services (this is also a key part of this article): the so-called service-oriented architecture, following the "separation of concerns" principle throughout the system. Each service is responsible for a single piece of business logic and one higher-level business function. For example, the system might include a search engine, a sales system, and so on.

The front-end websites interact with the services through a REST API, with responses in JSON format. For simplicity's sake we did not choose SOAP, a protocol developers do not love, because nobody wants to parse a pile of XML.
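
As a language-neutral sketch of this REST/JSON contract (the application itself is PHP/Symfony2; the endpoint shape and field names below are purely hypothetical):

```python
import json

# A hypothetical JSON payload, as a product service might return it
raw = '{"id": 42, "name": "widget", "price_cents": 1999}'

def parse_product(body: str) -> dict:
    """Decode a service response; no SOAP-style XML parsing needed."""
    product = json.loads(body)
    # A basic shape check instead of a heavyweight schema
    for field in ("id", "name", "price_cents"):
        if field not in product:
            raise ValueError(f"missing field: {field}")
    return product
```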

We deliberately did not pull concerns such as authentication and session management into the services. They are handled at a higher level: the front-end websites are responsible for them, since only the front-ends can identify users. Keeping the services this simple is a huge advantage when dealing with scaling and code-related issues, and the approach has worked flawlessly.

Benefits:
    • Independent subsystems (services) can easily be developed by separate teams; developers don't interfere with one another, so efficiency naturally improves.
    • The services do not manage authentication and sessions, so the scaling problems those would cause simply disappear.
    • Business logic lives in one place; there is no redundant copy of it in the front-end sites.
    • Service availability improved significantly.
Drawbacks:

It brings more work for system administrators: since the services use separate infrastructure, there is more for administrators to keep an eye on.

Backward compatibility is hard to maintain. After a year of maintenance, the API methods went through countless changes. The problem is that such changes are bound to break backward compatibility, because each site's code can change and many developers may be modifying a site at the same time... Still, a year later, all methods matched the documentation created at the beginning of the project.

Application Layer

Focusing on the request workflow, the first layer is the application layer: the HAProxy load balancer, Varnish, and the Symfony2 applications all live here. Requests from the front-end websites first reach HAProxy, and the load balancer then assigns each one to one of the nodes.

Application node configuration
    • Xeon-based server, 64 GB RAM, SATA drives
    • Varnish
    • Apache2
    • PHP 5.4.x (PHP-FPM), with the APC byte-code cache

We bought three such servers: an N+1 redundant active-active configuration in which the backup server also handles requests. Because performance is not the primary factor, each node runs its own separate Varnish instance, which lowers the cache hit rate but also avoids a single point of failure (SPOF). In this project, we care more about availability. Since Apache 2 was already used in the front-end web servers, we kept it in the stack; that way, administrators are not burdened by too many new technologies.

Symfony2 applications

The application itself is built on Symfony2, a PHP full-stack framework that provides plenty of components to speed up development. Since building a typical REST service on a complex framework may be questioned by many, here are the details:

    • Friendly to PHP/Symfony developers. The client's IT team is made up of PHP developers, and adding new technology would mean recruiting new developers, because the business system has to be maintained for a long time.
    • A clear project structure. PHP/Symfony, though never a necessity, is the default choice for many projects, and onboarding new developers is easy because the code is familiar to them.
    • Many ready-made components. Following the DRY principle, nobody wants to waste effort on repetitive work, and we are no exception. We made heavy use of the Symfony2 Console component, which is very handy for CLI commands, as well as the application profiler (the debug toolbar), loggers, and so on.

Before choosing Symfony2, we did a lot of performance testing to make sure the application could support the planned traffic. We built a proof of concept and ran it under JMeter, with a satisfying result: at 700 requests per second, response times stayed under 50 milliseconds. These tests gave us enough confidence that even a complex framework like Symfony2 could deliver the desired performance.

Application Analysis and Monitoring

We use the Symfony2 tools to monitor the application; they do an excellent job of collecting execution times of specific methods, especially those interacting with third-party network services. This lets us spot potential weak points in the architecture and find the most time-consuming parts of the application.

Verbose logging is also an integral part of the system. We use the PHP Monolog library to turn logs into elegant log-lines that both developers and administrators can easily understand. The important thing is to add as much detail as possible; the more detail, the better. We use the following log levels:

    • Debug: things that might become relevant, e.g. the request payload before it is sent to an external web service, and the response after the API call returns.
    • Error: a fault occurred but the request flow was not interrupted, e.g. an error response from a third-party API.
    • Critical: the application is about to crash.

Thus, you can see error and critical messages at a glance; in the dev/test environments, debug messages are recorded as well. The logs are also split into different files, the "channels" of the Monolog library. The system has a master log file that records all application-level errors, plus a shorter log per channel, with records from individual channels written to separate files.
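
The channel-per-file idea with a stricter master log can be sketched language-neutrally with Python's logging module (Monolog's design is similar; the channel name and messages here are illustrative):

```python
import io
import logging

# One Monolog-style "channel" logger, plus a master error log.
master = io.StringIO()   # stands in for the master error-log file
channel = io.StringIO()  # stands in for this channel's own log file

master_handler = logging.StreamHandler(master)
master_handler.setLevel(logging.ERROR)   # master log: errors and worse only

channel_handler = logging.StreamHandler(channel)
channel_handler.setLevel(logging.DEBUG)  # channel log: everything (dev/test)

log = logging.getLogger("payments")      # hypothetical channel name
log.setLevel(logging.DEBUG)
log.propagate = False                    # keep records out of the root logger
log.addHandler(master_handler)
log.addHandler(channel_handler)

log.debug("request payload sent to external API")  # debug: context detail
log.error("third-party API returned HTTP 500")     # error: flow continues
log.critical("application is about to crash")      # critical
```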

Scalability

Scaling the application layer of this platform is not difficult: HAProxy's capacity will not be exhausted any time soon, and the only thing to consider is enough redundancy to avoid a single point of failure. So all you need to do is add another application node.

Data layer

We use Redis and MySQL to store all of our data. MySQL serves more as a third-layer cache, while Redis is the system's primary data store.

Redis

In designing the system, we selected a database that met the planned requirements based on the following points:

    • No performance degradation when storing large amounts of data (roughly 250 million records)
    • Mostly simple GET requests for specific resources; no lookups or complex SELECT operations
    • Fetching as many resources as possible in a single request, to reduce latency

After some investigation, we decided to use Redis:

    • Most of the operations we perform have O(1) or O(n) complexity, where n is the number of keys to retrieve; this means the keyspace size does not affect performance.
    • We usually retrieve more than 100 keys at a time with the MGET command, which avoids as many network round trips as possible, rather than issuing multiple GETs in a loop.
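
The MGET point can be illustrated with a toy round-trip counter (a stand-in for a real Redis client, purely for illustration):

```python
class FakeRedis:
    """Counts network round trips to show why MGET beats GETs in a loop."""
    def __init__(self, data):
        self.data = data
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1          # one network round trip per GET
        return self.data.get(key)

    def mget(self, keys):
        self.round_trips += 1          # one round trip for the whole batch
        return [self.data.get(k) for k in keys]

store = FakeRedis({f"user:{i}": f"payload{i}" for i in range(100)})
keys = [f"user:{i}" for i in range(100)]

looped = [store.get(k) for k in keys]   # 100 round trips
trips_loop = store.round_trips

store.round_trips = 0
batched = store.mget(keys)              # 1 round trip
trips_mget = store.round_trips
```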

We now run two Redis servers in master-slave replication mode. Both nodes have the same configuration: a Xeon-based server with 128 GB RAM and SSDs. The memory limit is set to 100 GB, and usage is typically at 100%.

Since the application does not exhaust all the resources of a single Redis server, the slave is used mainly as a backup to ensure high availability. If the master goes down, we can quickly switch the application over to the slave. Replication also helps during maintenance and server migrations: switching servers is straightforward.

You might wonder what happens when Redis resources run out. All persistent-type keys occupy about 90% of the keyspace; the remaining resources are devoted entirely to the TTL-expiring cache. The keyspace is thus split into two parts: a TTL'd set (the cache) and the persistent data. Thanks to the "volatile-lru" max-memory policy, the least recently used cache keys (and only keys with a TTL set) are evicted. This way, the system can keep a single Redis instance doing two jobs at once: primary storage and a general-purpose cache.
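
The eviction setup described above corresponds to this kind of redis.conf fragment (the values mirror the article; treat it as a sketch, not the project's actual file):

```
# redis.conf sketch
maxmemory 100gb
# Evict only keys that carry a TTL (the cache part of the keyspace);
# persistent keys without a TTL are never evicted.
maxmemory-policy volatile-lru
```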

With this pattern, you always have to monitor the number of "expiring" keys:

    db.redis1:6379> INFO keyspace
    # Keyspace
    db0:keys=16XXXXXXX,expires=11XXXXXX,avg_ttl=0

The closer the number of expiring keys gets to 0, the more dangerous the situation, and the sooner an administrator needs to consider sharding or adding memory.
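
An Icinga-style check along these lines could parse the INFO keyspace line and alert when the expiring share shrinks (the thresholds below are invented for illustration):

```python
def expires_ratio(keyspace_line: str) -> float:
    """Parse a 'db0:keys=...,expires=...,avg_ttl=...' line from INFO keyspace."""
    _, stats = keyspace_line.split(":", 1)
    fields = dict(part.split("=") for part in stats.split(","))
    keys, expires = int(fields["keys"]), int(fields["expires"])
    return expires / keys if keys else 0.0

def check(keyspace_line: str, warn=0.05, crit=0.01) -> str:
    """Fewer expiring keys means the cache part is being squeezed out."""
    ratio = expires_ratio(keyspace_line)
    if ratio < crit:
        return "CRITICAL"
    if ratio < warn:
        return "WARNING"
    return "OK"

line = "db0:keys=160000000,expires=110000000,avg_ttl=0"
```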

How do we monitor it? We use an Icinga check: a dashboard shows whether the number is approaching the critical point. We also use Redis statistics to visualize the ratio of evicted ("lost") keys.

A year on, we have fallen in love with Redis; it has never let us down, and the system has not had any downtime this year.

MySQL

Besides Redis, we also use the traditional RDBMS, MySQL. Unlike most systems, though, we usually treat it as a third-layer cache. We use MySQL to store objects that are not used often, reducing Redis's resource usage by keeping them on disk. Nothing fancy here; we keep it as simple as possible. We use two MySQL servers, each a Xeon-based machine with 64 GB RAM and SSDs. The two servers use local, asynchronous master-master replication. In addition, we run a separate slave node as a backup.
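
The Redis-first, MySQL-as-fallback lookup described here can be sketched like so (the two stores are stand-in dicts; in the real system they would be a Redis client and a MySQL connection):

```python
redis_store = {"product:1": "hot payload"}         # primary store / cache
mysql_store = {"product:1": "hot payload",          # rarely-used objects on disk
               "product:2": "cold payload"}

def fetch(key: str):
    """Try Redis first; fall back to MySQL for cold objects and warm the cache."""
    value = redis_store.get(key)
    if value is not None:
        return value
    value = mysql_store.get(key)   # third-layer cache: slower, on disk
    if value is not None:
        redis_store[key] = value   # promote the cold object into Redis
    return value

cold = fetch("product:2")
```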

High availability of MySQL

In any application, the database is always the hardest bottleneck. Currently, there is no need for scale-out; we mostly scale the Redis and MySQL servers up. This strategy still has headroom: Redis runs on a server with 126 GB of memory, and scaling to 256 GB would not be difficult. Of course, such big instances also have downsides, such as snapshotting, or simply startup: a Redis server takes a long time to start.

Scale-out becomes necessary only once vertical scaling fails, and fortunately, at the start of the project we gave the data a structure that is easy to shard:

In Redis, we use 4 "heavy" record types. Based on data type, the records can be sharded onto 4 servers. Instead of hash sharding, we shard by record type. In this setup we can still run MGET, since it always executes against keys of a single type.
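
Sharding by record type rather than by hash can be sketched as a simple prefix-to-server map, so an MGET over same-type keys still hits a single shard (the server names and record types below are hypothetical):

```python
# Each "heavy" record type lives wholly on one shard, so a batched
# MGET over same-type keys never crosses servers.
SHARD_BY_TYPE = {
    "user": "redis-1",
    "order": "redis-2",
    "product": "redis-3",
    "session": "redis-4",
}

def shard_for(key: str) -> str:
    record_type = key.split(":", 1)[0]
    return SHARD_BY_TYPE[record_type]

def mget_shards(keys):
    """Set of shards an MGET over these keys would touch."""
    return {shard_for(k) for k in keys}

same_type = [f"order:{i}" for i in range(50)]
```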

In MySQL, the structured tables are very easy to migrate to another server, again by record type (table). Of course, once sharding by record type no longer suffices, we will move to hash-based sharding.

Lessons Learned
    • Don't share your database with other services. Once, a front-end website wanted to switch its session handling to Redis and connected to ours directly. It ate up the Redis cache space, and our application could no longer save its next cache keys. Caches should only hold data that is expensive to produce, such as costly MySQL result sets.
    • The more detailed the logs, the better. If there is not enough information in the log-lines, quickly pinpointing a problem becomes difficult, and you end up waiting for the issue to happen again before you can find the root cause.
    • Using a complex framework in an architecture does not by itself imply low performance. Many people are amazed that we use a full-stack framework to support this much traffic; the secret is the smarter use of tools, and even Node.js can be made slow. Choose a technology that offers a good development environment; no one wants to struggle with a pile of unfriendly tools that lower the team's morale.
Original: http://highscalability.com/blog/2014/8/11/the-easy-way-of-building-a-growing-startup-architecture-usin.html (Antoni Orfin). Translation: http://www.csdn.net/article/2014-08-14/2821203 (translator: Dongyang).

