HAProxy Basic Knowledge Summary

Source: Internet
Author: User
Tags haproxy

Contents

1. HAProxy introduction

2. HAProxy features

3. HAProxy application scenarios

4. Introduction to HAProxy's scheduling algorithms

1. HAProxy introduction

HAProxy is a free, fast, and reliable solution that provides high availability, load balancing, and proxying for TCP-based applications (which means it can, for example, act as a reverse proxy for MySQL) and for HTTP-based applications, and it supports virtual hosts. HAProxy is especially well suited to heavily loaded web sites (on the order of 10,000 to 20,000 concurrent connections) that often need session persistence or Layer 7 processing. Running on current hardware, HAProxy can fully support tens of thousands of concurrent connections, and its mode of operation makes it easy and safe to integrate into an existing architecture while keeping your web servers from being exposed directly to the network.

HAProxy implements an event-driven, single-process model, which is robust and supports a very large number of concurrent connections. Multi-process and multi-threaded models are constrained by memory limits, system scheduler limits, and ubiquitous locking, so they can rarely handle thousands of concurrent connections. The event-driven model does not have these problems because it performs all of these tasks in user space, where resources and time can be managed more precisely. The disadvantage of this model is that such programs usually scale poorly on multi-core systems. That is why they must be optimized so that each CPU cycle does more work.
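
The event-driven, single-process idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not HAProxy's code: the functions `serve_ready` and `demo` are hypothetical names, local socket pairs stand in for client connections, and upper-casing the bytes stands in for real proxy work.

```python
import selectors
import socket


def serve_ready(sel: selectors.BaseSelector, timeout: float = 1.0) -> int:
    """Handle every socket that is readable right now; return how many were served."""
    served = 0
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data.upper())  # placeholder for the real work
        else:
            sel.unregister(conn)  # peer closed the connection
            conn.close()
        served += 1
    return served


def demo() -> list:
    """One thread, one loop, several connections handled without blocking on any of them."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(3)]
    for _client, server in pairs:
        server.setblocking(False)
        sel.register(server, selectors.EVENT_READ)

    for i, (client, _server) in enumerate(pairs):
        client.sendall(f"req{i}".encode())

    served = 0
    while served < len(pairs):
        served += serve_ready(sel)

    replies = [client.recv(4096) for client, _server in pairs]
    for client, server in pairs:
        sel.unregister(server)
        client.close()
        server.close()
    sel.close()
    return replies
```

Calling `demo()` drives three connections through a single loop; no thread or process is created per connection, which is the property the paragraph above describes.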

2. HAProxy features

1. The author developed a unique elastic binary tree data structure that brings the complexity of the data structure down to O(1), meaning that lookup speed does not drop as the number of entries grows;

2. Supports client-side keep-alive, which avoids the resources wasted on repeated three-way handshakes between the client and HAProxy by letting multiple requests complete over a single TCP connection;

3. Supports TCP acceleration and zero-copy forwarding, similar to the mmap mechanism;

4. Supports response buffering (a response pool);

5. Supports the RDP protocol;

6. Source-based stickiness, similar to Nginx's ip_hash: requests from the same client are always dispatched to the same upstream server for a certain period of time;

7. A better statistics interface: a web interface shows receive, send, reject, error, and other statistics for each server in the backend cluster;

8. Detailed health checks: the web interface shows the health-check status of the upstream servers and provides some management functions;

9. Traffic-based health assessment;

10. HTTP-based authentication;

11. A command-line management interface;

12. ACL-based access control;

13. A log analyzer for analyzing its logs.

3. HAProxy application scenarios

HAProxy fits a wide variety of scenarios. Because it can proxy both TCP and HTTP, it can load-balance in front of web servers as well as in front of dynamic application servers such as PHP or Tomcat. It can also load-balance a cluster of data stores such as MySQL, though only for read requests: HAProxy does not understand MySQL read/write splitting, so that has to be done by the front-end application or by a dedicated read/write splitter, which hands the read requests to HAProxy to be balanced across the read servers.

4. Introduction to HAProxy's scheduling algorithms

In short, HAProxy's scheduling algorithms can be divided into three categories: static, dynamic, and hybrid (note: this classification is the author's own summary and may not be accurate). Because HAProxy is a load balancer for both the transport layer and the application layer, its scheduling algorithms can likewise operate at either of these two layers.

So how do you tell whether an algorithm is dynamic or static? With a dynamic algorithm, the weight of an upstream server can be adjusted while HAProxy is running, without restarting HAProxy; only the configuration file needs to be reloaded. Dynamic algorithms also support slow start for upstream servers: when a server that was taken offline because of a failure or for maintenance changes state from down to up, HAProxy does not immediately give it a full share of traffic under the current algorithm, but lets it take on traffic gradually, giving it a warm-up period.
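
As a rough illustration of slow start, the sketch below ramps a recovered server's weight up over a warm-up window. The linear ramp, the floor of 1, and the function name `effective_weight` are assumptions made for this example, not a statement of how HAProxy computes it internally.

```python
def effective_weight(configured: int, seconds_up: float, slowstart: float) -> int:
    """Weight to use for a server that came back up `seconds_up` seconds ago.

    Assumption for illustration: the weight grows linearly from a minimum of 1
    to the configured weight over `slowstart` seconds.
    """
    if slowstart <= 0 or seconds_up >= slowstart:
        return configured  # no warm-up configured, or warm-up finished
    ramped = int(configured * seconds_up / slowstart)
    return max(1, ramped)  # never drop to zero, so the server gets some traffic
```

With a configured weight of 100 and a 60-second warm-up, the server would carry about half its normal share 30 seconds after returning and its full share after 60 seconds.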

Static scheduling algorithms:

static-rr

Similar to roundrobin: servers are polled in turn according to their weights. Because it is a static algorithm, changing a server's weight at runtime has no effect. There is no limit on the number of backend servers. When a server comes up it joins the cluster immediately and the load of the whole cluster is redistributed, so slow start is not supported.


first

With this algorithm, all requests are forwarded to the first server in the backend; requests are forwarded to the next server only once the number of connections on the current one reaches its maxconn value. This algorithm is not commonly used.
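
A minimal sketch of this selection rule, assuming each server is described by a hypothetical `(name, active_connections, maxconn)` tuple and that list order stands for the server order in the backend:

```python
from typing import List, Optional, Tuple


def pick_first(servers: List[Tuple[str, int, int]]) -> Optional[str]:
    """Return the first server that still has a free connection slot.

    Each entry is (name, active_connections, maxconn); the list order is
    assumed to be the order of the servers in the backend.
    """
    for name, active, maxconn in servers:
        if active < maxconn:
            return name
    return None  # every server is full


print(pick_first([("s1", 10, 10), ("s2", 3, 10)]))
```

Here `s1` has reached its limit, so the request spills over to `s2`; as soon as `s1` frees a slot, it is preferred again.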


Dynamic scheduling algorithms:

roundrobin

Despite the name, roundrobin is actually weighted round-robin. It can be used when sessions do not need to be maintained; it is the smoothest and fairest algorithm and is commonly used. Backend server weights can be adjusted dynamically: after changing a server's weight, reloading the haproxy.cfg configuration file makes it take effect. Backend servers also support slow start: when a server comes back online after maintenance or a failure, HAProxy dispatches requests to it gradually without disturbing the existing schedule, giving the newly returned server a warm-up period. A backend can have at most 4095 servers, which exceeds the size of practically any real production environment, so in effect the number of backend servers is unlimited.
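
To make "round-robin with weights" concrete, here is a small sketch. It uses the well-known "smooth weighted round-robin" technique as a stand-in; it is not taken from HAProxy's source, and the function name `smooth_wrr` is hypothetical.

```python
from typing import Dict, List


def smooth_wrr(weights: Dict[str, int], picks: int) -> List[str]:
    """Return the order in which servers are chosen for `picks` requests.

    On every request each server's running score grows by its weight, the
    highest score wins, and the winner's score is reduced by the total weight.
    Over one full cycle each server is chosen exactly `weight` times, and the
    heavier servers are interleaved with the lighter ones.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    order = []
    for _ in range(picks):
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        order.append(chosen)
    return order


print(smooth_wrr({"a": 2, "b": 1}, 3))
```

Because the weights are read on every pick, changing an entry in `weights` between calls changes the distribution immediately, which mirrors the "dynamic" property described above.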


leastconn

A weighted least-active-connections algorithm. It is a dynamic algorithm that prefers the server with the fewest connections. It is not well suited to short-session protocols such as web (HTTP) services and is recommended for services with long sessions, such as LDAP, SQL, TSE, and SSH.
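
A minimal sketch of weighted least-connections selection. Dividing the active connection count by the weight is an assumption chosen for illustration, and `pick_leastconn` is a hypothetical name:

```python
from typing import Dict, Tuple


def pick_leastconn(servers: Dict[str, Tuple[int, int]]) -> str:
    """Pick the server with the lowest active-connections-to-weight ratio.

    `servers` maps a name to (active_connections, weight); weights are
    assumed to be positive.
    """
    return min(servers, key=lambda name: servers[name][0] / servers[name][1])


print(pick_leastconn({"a": (10, 2), "b": (6, 1)}))
```

In this example `a` has more connections in absolute terms, but relative to its weight it is less loaded (5 per weight unit against 6), so it is chosen.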


Hybrid algorithms:

source

Selects the backend server based on the source IP address. This algorithm can be static or dynamic, as specified by hash-type. With source-IP hashing there are usually two ways to pick the backend server: taking a remainder (modulo), or using a consistent hash.

With "hash-type map-based", source is a static algorithm: the hash of the source IP address is divided by the number of backend servers, and the remainder identifies the selected server. If the number of backend servers changes, scheduling is affected globally, so the impact is wide.

With "hash-type consistent", source is a dynamic algorithm: the backend servers are placed on a hash ring (each possibly as many virtual nodes), so that each pair of adjacent nodes bounds a range of hash values. The hash of the source IP address is placed on this ring, and the first server found in the clockwise direction is selected. If the number of backend servers changes, only the clients that hash next to the changed server are rescheduled, so the impact is small.
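
The modulo approach and its sensitivity to changes in the server count can be sketched as follows. CRC32 is only a convenient stand-in hash, all servers are assumed to have equal weight, and the names `pick_map_based` and `moved_fraction` are hypothetical.

```python
import zlib
from typing import List


def pick_map_based(client_ip: str, servers: List[str]) -> str:
    """Hash the source IP and take the remainder modulo the number of servers."""
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]


def moved_fraction(ips: List[str], before: List[str], after: List[str]) -> float:
    """Fraction of clients that land on a different server after the list changes."""
    moved = sum(1 for ip in ips if pick_map_based(ip, before) != pick_map_based(ip, after))
    return moved / len(ips)


ips = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]
servers = ["s1", "s2", "s3", "s4"]
print(moved_fraction(ips, servers, servers + ["s5"]))
```

Going from four servers to five changes the divisor for every client, so most clients end up on a different server, which is the "global" impact described above.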
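
A hash ring with virtual nodes can be sketched like this. It illustrates the general consistent-hashing technique rather than HAProxy's implementation; the class name `HashRing`, the use of SHA-256, and the figure of 100 virtual nodes per server are assumptions for the example.

```python
import bisect
import hashlib
from typing import List


def _hash(key: str) -> int:
    """Map a string to a point on the ring (SHA-256 is just a stand-in hash)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers: List[str], vnodes: int = 100) -> None:
        # Every server appears on the ring `vnodes` times as virtual nodes.
        self._points = sorted(
            (_hash(f"{server}#{i}"), server) for server in servers for i in range(vnodes)
        )
        self._keys = [point for point, _server in self._points]

    def pick(self, client_ip: str) -> str:
        # First virtual node clockwise from the client's hash, wrapping around.
        index = bisect.bisect_right(self._keys, _hash(client_ip)) % len(self._keys)
        return self._points[index][1]


ips = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]
ring4 = HashRing(["s1", "s2", "s3", "s4"])
ring5 = HashRing(["s1", "s2", "s3", "s4", "s5"])
moved = [ip for ip in ips if ring4.pick(ip) != ring5.pick(ip)]
print(len(moved) / len(ips))
```

When `s5` is added, the only clients that move are those whose nearest clockwise node is now one of the new server's virtual nodes; everyone else keeps their server.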

uri

Let's take a look at the format of a URL: <scheme>://<user>:<password>@

This algorithm hashes either part of the URI (the part of the URL after <port> and before the question mark, that is, /<path>;<params>) or the whole URI (when the "whole" parameter is given). The hash value is divided by the total weight of the servers, and the result selects the backend server; as long as the backend server farm stays healthy, the same URI is always dispatched to the same server. This algorithm is commonly used when the backends are cache servers, to maximize the cache hit ratio. Whether it is static or dynamic depends on the value of hash-type. The algorithm also accepts two parameters, "len" and "depth", each followed by a space and a positive integer, which limit the hash to a given number of characters or directory levels of the URI.
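
The effect of "whole", "len", and "depth" on what gets hashed can be sketched as below. The helper names `uri_hash_key` and `pick_by_uri` are hypothetical, CRC32 stands in for the real hash, servers are assumed to have equal weight, and the depth rule (one level per slash) is a simplification.

```python
import zlib
from typing import List, Optional
from urllib.parse import urlsplit


def uri_hash_key(url: str, whole: bool = False,
                 length: Optional[int] = None, depth: Optional[int] = None) -> str:
    """Return the part of the URL that would be fed to the hash function."""
    parts = urlsplit(url)
    key = parts.path or "/"
    if whole and parts.query:
        key += "?" + parts.query  # "whole": keep what follows the question mark
    if depth is not None:
        # Keep at most `depth` directory levels, counting one level per slash.
        key = "/".join(key.split("/")[: depth + 1])
    if length is not None:
        key = key[:length]  # "len": only the first `length` characters
    return key


def pick_by_uri(url: str, servers: List[str], **options) -> str:
    key = uri_hash_key(url, **options)
    return servers[zlib.crc32(key.encode()) % len(servers)]


url = "http://example.com/img/2024/cat.png?size=large"
print(uri_hash_key(url))
print(uri_hash_key(url, whole=True))
print(uri_hash_key(url, depth=1))
```

With the default settings, two requests for the same path with different query strings hash to the same key and therefore reach the same cache server.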

url_param:

Schedules according to a parameter carried in the URL; whether the algorithm is static or dynamic depends on the hash-type value. A typical use is a site that serves different classes of users, for example VIP and non-VIP users: the parameter can identify the class of user so that requests are dispatched to different server groups offering different services.
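
A minimal sketch of parameter-based scheduling, assuming the parameter is read from the query string and that requests without it fall back to plain round-robin. The function name `pick_by_url_param` and the parameter name `userid` are made up for the example, and CRC32 is a stand-in hash.

```python
import zlib
from itertools import count
from typing import Iterator, List
from urllib.parse import parse_qs, urlsplit


def pick_by_url_param(url: str, name: str, servers: List[str],
                      fallback: Iterator[int]) -> str:
    """Hash the value of URL parameter `name`; round-robin if it is missing."""
    values = parse_qs(urlsplit(url).query).get(name)
    if not values:
        return servers[next(fallback) % len(servers)]
    return servers[zlib.crc32(values[0].encode()) % len(servers)]


servers = ["s1", "s2", "s3"]
rr = count()
print(pick_by_url_param("http://example.com/cart?userid=42", "userid", servers, rr))
print(pick_by_url_param("http://example.com/pay?userid=42&step=2", "userid", servers, rr))
```

Both requests carry the same `userid`, so they are sent to the same server regardless of the path.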

hdr(<name>):

<name> is replaced with the name of a request header; for example, "hdr(host)" schedules on the Host header of the request. HAProxy looks up that HTTP header field in each request and schedules on its value, just like the ACL function "hdr()". The header name in parentheses is not case-sensitive. If the header is absent or has no value, the roundrobin algorithm is used instead.
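
Header-based scheduling with its round-robin fallback can be sketched as follows. `pick_by_header` is a hypothetical helper, CRC32 is a stand-in hash, and a plain counter plays the role of the round-robin state.

```python
import zlib
from itertools import count
from typing import Dict, Iterator, List


def pick_by_header(headers: Dict[str, str], name: str, servers: List[str],
                   fallback: Iterator[int]) -> str:
    """Hash the value of request header `name` (matched case-insensitively)."""
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get(name.lower(), "")
    if not value:
        # Header missing or empty: fall back to round-robin.
        return servers[next(fallback) % len(servers)]
    return servers[zlib.crc32(value.encode()) % len(servers)]


servers = ["s1", "s2", "s3"]
rr = count()
print(pick_by_header({"Host": "shop.example.com"}, "host", servers, rr))
```

Requests for the same Host value always map to the same server, while requests without the header are spread across the servers in turn.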

rdp-cookie

rdp-cookie(<name>)

This algorithm schedules based on a cookie. It is not commonly used, and I do not know its details.


This article is from the "focus on operations, and Linux Dances" blog, please be sure to keep this source http://zhaochj.blog.51cto.com/368705/1659452
