HAProxy is a powerful, flexible, and easy-to-use reverse proxy that provides high availability, load balancing, and proxying for back-end servers. Its layer-7 load balancing is particularly strong (cookie tracking, header rewriting, and so on). It supports active/standby (hot standby) pairs and virtual hosts, and it has very good server health checking: when a proxied back-end server fails, HAProxy automatically removes it from the pool, and adds it back automatically once it recovers. It also provides an intuitive monitoring page on which the health of the service cluster can be followed in real time.
--------------------------------------------------------------------------------
Software that implements load balancing at layer 4 (TCP):
LVS------> heavyweight
Nginx------> lightweight, with a cache function; its regular expressions are more flexible
HAProxy------> simulates layer-4 forwarding; more flexible
Software that implements reverse proxying at layer 7 (HTTP):
HAProxy------> its native strength; full layer-7 proxy support, including session persistence, marking, and path-based switching;
Nginx------> works well only for the HTTP and mail protocols; performance is similar to HAProxy;
Apache------> limited functionality
--------------------------------------------------------------------------------
Working model diagram of HAProxy
Once the number of concurrent requests reaches a certain level, using HAProxy for load balancing has clear advantages. HAProxy can also use the client's cookie, together with the scheduling algorithm, to keep directing a user to the back-end server they visited previously. To speed up site access, cache servers are generally configured behind HAProxy; they can cache static page content or dynamic web content. In a production environment it is also necessary to add a cache for MySQL.
When a user accesses the site's domain name, DNS resolves it to the external interface of the HAProxy server. HAProxy either forwards the request directly to a back-end server (TCP), or parses the user's request and then, acting as a client, sends the same request to the back-end server (HTTP). After it obtains the content returned by the back-end server, it re-encapsulates it and responds to the client; in this role HAProxy acts as an intermediary.
HAProxy configuration
The HAProxy configuration file is divided into a global section and four kinds of proxy section:
Global configuration:
global: process-wide configuration section
Proxy configuration:
defaults: default configuration; settings shared by the frontend, backend, and listen sections can be defined here;
frontend: front-end configuration; defines the listening sockets that accept client requests;
backend: back-end configuration; defines the dispatching rules and the back-end servers that HAProxy talks to;
listen: combined configuration; ties a listening front end directly to a specific set of back-end servers in one section;
Unless there is a special need, there is no reason to tune the options in the configuration file by hand: most default values meet common requirements, and the official documentation recommends keeping the defaults for many options.
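As a rough illustration of how these sections fit together, here is a minimal sketch of a complete configuration file. The section names (main, app, stats_page), the addresses, ports, and timeout values are all hypothetical and only show the layout.

```haproxy
# Minimal layout sketch; every name, address, and value here is an assumption.
global
    log 127.0.0.1 local2        # global syslog target
    maxconn 4000
    daemon

defaults
    mode http                   # inherited by the proxies below
    log global
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend main
    bind *:80                   # accepts client requests
    default_backend app

backend app
    balance roundrobin
    server app1 192.168.0.11:8080 check
    server app2 192.168.0.12:8080 check

listen stats_page               # front end and back end in a single section
    bind *:8081
    stats enable
    stats uri /haproxy?stats
```

Settings placed in defaults (here the mode, logging, and timeouts) apply to every frontend, backend, and listen section that follows unless a section overrides them.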
Configuration parameters and their meanings
Global configuration
* Process management and security-related parameters
- chroot <jail dir>: Changes HAProxy's working directory to the specified directory and performs a chroot() there before dropping privileges, which raises HAProxy's security level. Make sure the specified directory is empty and that no user has write permission on it;
- daemon: Makes HAProxy run in the background as a daemon; this is equivalent to the "-D" command-line option, and it can be disabled on the command line with the "-db" option;
- gid <number>: Runs HAProxy with the specified GID; a GID dedicated to HAProxy is recommended, to avoid the risks that come with permission problems;
- group <group name>: Same as gid, but specified as a group name;
- log <address> <facility> [max level]: Defines a global syslog server; at most two may be defined;
- log-send-hostname [<string>]: Adds a host name to the header of each syslog message: the name given as "string" if specified, otherwise the current host name;
- nbproc <number>: Specifies the number of HAProxy processes to start; it applies only to HAProxy in daemon mode. By default only one process is started, and for several reasons, debugging difficulty among them, multi-process mode is generally used only where a single process can open only a few file descriptors;
- pidfile:
- uid: Runs the HAProxy process with the specified UID;
- ulimit-n: Sets the maximum number of file descriptors each process may open; it is calculated automatically by default, so changing this option is not recommended;
- user: Same as uid, but specified as a user name;
- stats:
- node: Defines the name of the current node; used in HA scenarios where multiple HAProxy processes share the same IP address;
- description: Descriptive text for the current instance;
* Parameters related to performance tuning
- maxconn <number>: Sets the maximum number of concurrent connections accepted by each HAProxy process; equivalent to the "-n" command-line option. The automatically calculated "ulimit-n" value is derived from this parameter;
- maxpipes <number>: HAProxy uses pipes to perform kernel-based TCP splicing; this sets the maximum number of pipes allowed per process. Each pipe opens two file descriptors, so the automatically calculated "ulimit-n" value is adjusted accordingly; the default is maxconn/4, which usually turns out to be too large;
- noepoll: Disables the epoll mechanism on Linux systems;
- nokqueue: Disables the kqueue mechanism on BSD systems;
- nopoll: Disables the poll mechanism;
- nosepoll: Disables the speculative epoll mechanism on Linux;
- nosplice: Disables kernel TCP splicing on Linux sockets. This leads to more recv/send system calls, but the TCP splicing feature is buggy in the Linux 2.6.25-28 kernel series;
- spread-checks <0..50, in percent>: With many servers behind HAProxy, health-checking all of them at exactly the same interval can cause unexpected problems; this option randomly lengthens or shortens the check interval by up to the given percentage;
- tune.bufsize <number>: Sets the buffer size. With the same amount of memory, a smaller value lets HAProxy accept more concurrent connections, while a larger value lets some applications use larger cookies. The default is 16384 and can be changed at compile time, but keeping the default is strongly recommended;
- tune.chksize <number>: Sets the size of the check buffer, in bytes. A larger value helps with string or pattern matching in larger pages, but also consumes more system resources;
- tune.maxaccept <number>: Sets the number of connections an HAProxy process may accept each time it is scheduled to run. A larger value can give higher throughput; the default is 100 in single-process mode and 8 in multi-process mode. Setting it to -1 removes the limit, which is generally not recommended;
- tune.maxpollevents <number>: Sets the maximum number of events handled in one call to the polling system; the default depends on the OS. A value below 200 tends to reduce latency slightly at the cost of network bandwidth, while a value above 200 trades slightly higher latency for better bandwidth use;
- tune.maxrewrite <number>: Sets the buffer space reserved for header rewriting or appending. A size of about 1024 is recommended; HAProxy adjusts the value automatically when necessary;
- tune.rcvbuf.client <number>:
- tune.rcvbuf.server <number>: Sets the size, in bytes, of the kernel socket receive buffer on the client side or the server side; keeping the default is strongly recommended;
- tune.sndbuf.client:
- tune.sndbuf.server:
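The global parameters above might be combined as in the following sketch. The paths, the haproxy user and group, the node name, and the numeric values are assumptions chosen for illustration, not recommendations.

```haproxy
# Sketch of a global section; paths, account names, and numbers are hypothetical.
global
    chroot /var/lib/haproxy          # empty directory, not writable by any user
    pidfile /var/run/haproxy.pid
    user haproxy                     # dedicated unprivileged account
    group haproxy
    daemon                           # run in the background
    log 127.0.0.1 local2 notice      # at most two global log targets
    maxconn 20000                    # per-process concurrent connection limit
    spread-checks 5                  # randomize health-check intervals by up to 5%
    node lb1
    description Primary load balancer
```

The tune.* and ulimit-n settings are left out on purpose, in line with the advice above to keep their defaults.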
* Debug-related parameters
* balance algorithm
balance <algorithm> [<arguments>]
balance url_param <param> [check_post [<max_wait>]]
Defines the load balancing algorithm; it can be used in the "defaults", "listen", and "backend" sections. <algorithm> selects a server in a load-balancing scenario, and applies only when no persistence information is available or when a connection has to be redispatched to another server. The supported algorithms are:
- roundrobin: Weighted round robin. This is the smoothest and fairest algorithm when the servers' processing time stays evenly distributed. The algorithm is dynamic, meaning server weights can be adjusted at run time; by design, however, each backend is limited to 4128 active servers;
- static-rr: Weighted round robin, similar to roundrobin but static: changing server weights at run time has no effect. On the other hand, it places no limit on the number of back-end servers;
- leastconn: New connections are sent to the back-end server with the fewest connections. It is recommended for long sessions such as LDAP or SQL, and is not well suited to application-layer protocols with short sessions such as HTTP. The algorithm is dynamic, and server weights can be adjusted at run time;
- source: Hashes the source address of the request and divides the result by the total weight of the back-end servers to pick a matching server. Requests from the same client IP therefore always go to the same server; however, when the total server weight changes, for example because a server goes down or a new one is added, many clients' requests may be sent to a different server than before. It is often used to load-balance TCP-based protocols that have no cookie support. It is static by default, but this can be changed with hash-type (map-based: static hashing; consistent: dynamic, consistent hashing);
- uri: Hashes the left part of the URI (the part before the question mark) or the entire URI, and divides the result by the total server weight to pick a matching server. Requests for the same URI therefore always go to the same server unless the total server weight changes. This algorithm is often used with proxy caches or anti-virus proxies to raise the cache hit ratio. Note that it applies only to HTTP back ends. It is static by default, but this can be changed with hash-type;
- url_param: The URL parameter named by <param> is looked up in each HTTP GET request. If the parameter is found and is assigned a value with an equals sign "=", that value is hashed and divided by the total server weight to pick a matching server. By tracking a user ID in the request, this algorithm can make sure requests carrying the same user ID go to the same server unless the total server weight changes. If the parameter is missing from a request or has no valid value, the request is dispatched with the round-robin algorithm. It is static by default, but this can be changed with hash-type;
- hdr(<name>): The HTTP header named by <name> is looked up in each HTTP request; if the header is absent or has no valid value, the request is dispatched with the round-robin algorithm. With the optional "use_domain_only" argument, only the domain part is hashed when the header is of the Host kind (for example, for www.linuxidc.com only the string "linuxidc" is hashed), which reduces the hashing work. It is static by default, but this can be changed with hash-type;
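To make the choice of algorithm more concrete, the following sketch shows one backend per use case described above. The backend names, server names, addresses, and the userid parameter are hypothetical.

```haproxy
# Sketch only; all names and addresses are assumptions.
backend web_servers
    mode http
    balance roundrobin                      # short HTTP sessions, weighted rotation
    server web1 192.168.0.21:80 weight 2 check
    server web2 192.168.0.22:80 weight 1 check

backend sql_servers
    mode tcp
    balance leastconn                       # long-lived sessions such as SQL
    server db1 192.168.0.31:3306 check
    server db2 192.168.0.32:3306 check

backend by_client_ip
    mode tcp
    balance source                          # same client IP goes to the same server
    server s1 192.168.0.41:5000 check
    server s2 192.168.0.42:5000 check

backend by_user
    mode http
    balance url_param userid                # hash the value of ?userid=...
    server app1 192.168.0.51:8080 check
    server app2 192.168.0.52:8080 check

backend by_host
    mode http
    balance hdr(Host) use_domain_only       # hash only the domain part of Host
    server vh1 192.168.0.61:80 check
    server vh2 192.168.0.62:80 check
```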
--------------------------------------------------------------------------------
hash-type
hash-type <method>
Defines the method used to map hash codes to back-end servers; it cannot be used in the frontend section. The available methods are map-based and consistent; the default map-based method is recommended in most scenarios.
map-based: The hash table is a static array containing all online servers. Its hash distribution is very smooth and takes weights into account, but it is a static method: changing the weight of an online server at run time has no effect, which means it does not support slow start. In addition, a server is chosen by its position in the array, so when a server goes down or a new one is added, most connections are redispatched to a different server than before; this makes the method unsuitable for cache servers.
consistent: The hash table is a tree populated with each server; the hash key is looked up in the tree and the nearest server is chosen. This method is dynamic and supports changing server weights at run time, so it is compatible with slow start. Adding a new server affects only a small share of requests, which makes it especially useful when the back-end servers are caches. However, the distribution is not very smooth, and requests may not be spread as evenly as one would like, so server weights may need adjusting to get a better balance.
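For a cache tier, the uri algorithm and the consistent method described above could be combined roughly as follows. The backend name, server names, and addresses are hypothetical.

```haproxy
# Sketch: URI hashing with consistent hashing in front of cache servers.
backend cache_servers
    mode http
    balance uri                 # same URI goes to the same cache
    hash-type consistent        # adding or losing a server remaps only a few URIs
    server cache1 192.168.0.71:3128 check
    server cache2 192.168.0.72:3128 check
    server cache3 192.168.0.73:3128 check
```

With the default map-based method, removing cache2 would redistribute most URIs and empty the effective cache; with consistent, mostly the URIs that were on cache2 move.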
--------------------------------------------------------------------------------
bind
bind [<address>]:<port_range> [, ...]
bind [<address>]:<port_range> [, ...] interface <interface>
This directive can only be used in frontend and listen sections; it defines one or more listening sockets.
<address>: Optional; it can be a host name, an IPv4 address, an IPv6 address, or *. If it is omitted, or given as * or 0.0.0.0, HAProxy listens on all IPv4 addresses of the system;
<port_range>: Either a single TCP port or a port range (such as 5005-5010); the proxy accepts client requests on the specified ports. Note that each listening socket <address:port> can be used only once within the same instance, and that ports below 1024 require a user with the appropriate privileges, which may have to be set with the uid parameter;
<interface>: Specifies the name of a physical interface. It can only be used on Linux, and only a physical interface name is accepted, not an interface alias. Only users with sufficient privileges can bind to a specific physical interface;
For example:
frontend main
    bind *:80
    bind *:8080
--------------------------------------------------------------------------------
mode
mode { tcp|http|health }
Sets the running mode or protocol of the instance. When content switching is used, the front end and the back end must run in the same mode (generally HTTP mode), otherwise the instance will not start.
tcp: The instance runs in pure TCP mode; a full-duplex connection is established between client and server and no layer-7 inspection is performed. This is the default mode, typically used for SSL, SSH, SMTP, and similar applications;
http: The instance runs in HTTP mode; client requests are analyzed in depth before being forwarded to a back-end server, and any request that does not comply with the RFC format is rejected;
health: The instance runs in health mode; it answers every inbound request with "OK" only and closes the connection, without logging anything. This mode is meant for answering health checks from external components. It is now deprecated, because the monitor keyword in TCP or HTTP mode provides a similar function;
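The difference between the two main modes can be sketched with a pair of proxies, one in each mode. All section names, ports, and addresses below are hypothetical; note that each front end and the back end it uses run in the same mode.

```haproxy
# Sketch: one TCP-mode proxy and one HTTP-mode proxy; names are assumptions.
frontend ssh_in
    mode tcp                    # bytes are relayed without layer-7 parsing
    bind *:2222
    default_backend ssh_servers

backend ssh_servers
    mode tcp
    server ssh1 192.168.0.81:22 check

frontend web_in
    mode http                   # requests are parsed before being forwarded
    bind *:80
    default_backend web_pool

backend web_pool
    mode http
    server web1 192.168.0.91:80 check
```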
--------------------------------------------------------------------------------
log
log global
log <address> <facility> [<level> [<minlevel>]]
Enables event and traffic logging for an instance, and can therefore be used in all sections. Up to two log parameters can be specified per instance; if "log global" is used and the "global" section already has two log parameters, any additional log parameter is ignored.
global: Use this form when the instance's logging parameters are the ones defined in the "global" section; each instance can contain only one "log global" statement, and it takes no additional parameters;
<address>: Defines where logs are sent. One format is <IPv4_address:port>, where the port is a UDP port and defaults to 514; the other is the path of a UNIX socket file, in which case pay attention to chroot and to the user's read and write permissions;
<facility>: One of the standard syslog facilities;
<level>: Defines the log level, which acts as an output filter. By default all messages are sent; when a level is specified, all messages at or above that level are sent;
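A logging setup along these lines might look like the sketch below. The syslog addresses, facilities, and levels are assumptions for illustration.

```haproxy
# Sketch: two global log targets, reused by a front end through "log global".
global
    log 127.0.0.1:514 local2 info       # local syslog over UDP
    log 192.168.0.9 local3 warning      # hypothetical remote server, port 514 by default

frontend main
    mode http
    bind *:80
    log global                          # inherit both targets from the global section
    option httplog
```

The local syslog daemon also has to be configured to accept UDP messages on the chosen port; that part lies outside the HAProxy configuration.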
--------------------------------------------------------------------------------
maxconn
maxconn <conns>
Sets the maximum number of concurrent connections for a front end, so it cannot be used in backend sections. For large sites, this value can be raised as far as practical so that HAProxy manages the connection queue itself, which avoids leaving user requests unanswered. This maximum cannot, of course, exceed the value defined in the "global" section. Also note that HAProxy maintains two buffers per connection, 8 KB each; together with other data, each connection takes approximately 17 KB of RAM. This means that, when properly tuned, 1 GB of available RAM can sustain 40000-50000 concurrent connections.
If <conns> is set too high, in extreme cases HAProxy may end up needing more memory than the host has available, with unpredictable results, so it is wise to choose a value the host can sustain. The default is 2000.
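The memory estimate above can be turned into a sizing sketch. The connection limits, section names, and address below are hypothetical; the 17 KB per-connection figure is the one quoted in the text.

```haproxy
# Sketch: sizing maxconn from available memory.
# 40000 connections x ~17 KB each is about 680000 KB, i.e. well under 1 GB.
global
    maxconn 40000               # process-wide ceiling

frontend main
    bind *:80
    maxconn 20000               # per-frontend limit, below the global value
    default_backend app

backend app
    server app1 192.168.0.11:8080 check
```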
--------------------------------------------------------------------------------
default_backend
default_backend <backend>
Specifies the backend to use for the instance when no "use_backend" rule matches, so it cannot be used in backend sections. When content is switched between a "frontend" and a "backend", "use_backend" rules usually define the matching conditions, and requests that match no rule are handled by the backend named here.
<backend>: The name of the backend to use;
Use case:
use_backend dynamic if url_dyn
use_backend static if url_css url_img extension_img
default_backend dynamic
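The use case above refers to ACLs that are defined elsewhere. As a self-contained sketch of the same idea, the following fragment defines its own ACL; the ACL name is_static, the file extensions, and the server addresses are hypothetical.

```haproxy
# Sketch: content switching with one explicit ACL; names are assumptions.
frontend main
    mode http
    bind *:80
    acl is_static path_end .css .js .png .jpg   # match on the end of the request path
    use_backend static if is_static
    default_backend dynamic                     # everything that matched no rule

backend static
    mode http
    server static1 192.168.0.101:80 check

backend dynamic
    mode http
    server app1 192.168.0.111:8080 check
```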
--------------------------------------------------------------------------------
server: defining back-end servers
------------------------------------------------------------------------------------
server <name> <address>[:port] [param*]
Declares a server in a backend, so it cannot be used in defaults and frontend sections.
<name>: The internal name given to this server; it appears in logs and warning messages, and if "http-send-server-name" is set it is also added to the request headers sent to this server;
<address>: The IPv4 address of this server. A resolvable host name is also accepted, but it must resolve to the corresponding IPv4 address at startup;
[:port]: Optional; the destination port used for connections sent to this server. When it is not set, the same port as the client request is used;
[param*]: A list of parameters for this server. Many parameters are available; see the official documentation for details. Only a few common ones are described below;
---------------------------------------------------
Server or default-server parameters:
disabled: Marks this server as disabled;
backup: Marks this server as a backup; it is used only when the other servers in the load-balancing pool are unavailable;
check: Enables health checks on this server; additional parameters allow finer control, such as:
inter <delay>: Sets the interval between health checks, in milliseconds; the default is 2000. fastinter and downinter can also be used to adapt this delay to the server's state;
rise <count>: Sets the number of successful health checks an offline server needs before it is considered operational again;
fall <count>: Sets the number of failed health checks needed before a server goes from operational to unavailable;
cookie <value>: Sets the cookie value for this server. The value is checked on inbound requests, and the server first selected for this value is selected again for subsequent requests, which implements persistence;
maxconn <maxconn>: Specifies the maximum number of concurrent connections this server accepts; connections beyond this limit are placed in the request queue until other connections are released;
maxqueue <maxqueue>: Sets the maximum length of the request queue; 0 means no limit;
observe <mode>: Determines the server's health by observing its traffic; disabled by default. The supported types are "layer4" and "layer7"; "layer7" can only be used in HTTP proxy scenarios;
redir <prefix>: Enables redirection; all GET and HEAD requests destined for this server are answered with a 302 status code. Note that the prefix must not end with / and must not be a relative address, so as to avoid loops; for example:
server srv1 172.16.100.6:80 redir http://imageserver.linuxidc.com check
weight <weight>: The server's weight; the default is 1 and the maximum is 256. A weight of 0 means the server does not take part in load balancing;
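Several of these parameters are typically combined on one line. The sketch below is one possible arrangement; the backend name, server names, addresses, and all numeric values are hypothetical.

```haproxy
# Sketch: common server parameters; values are assumptions, not recommendations.
backend app
    mode http
    balance roundrobin
    # checked every 2000 ms; 3 failures mark it down, 2 successes bring it back
    server app1 192.168.0.11:8080 weight 10 maxconn 200 maxqueue 50 check inter 2000 rise 2 fall 3
    server app2 192.168.0.12:8080 weight 5 maxconn 100 check inter 2000 rise 2 fall 3
    # receives traffic only when app1 and app2 are both unavailable
    server spare 192.168.0.13:8080 backup check
    # kept in the configuration but taken out of service
    server old1 192.168.0.14:8080 disabled
```

With weights 10 and 5, app1 receives about twice as many requests as app2 under roundrobin.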
Check method:
option httpchk
option httpchk <uri>
option httpchk <method> <uri>
option httpchk <method> <uri> <version>: cannot be used in frontend sections, for example:
backend https_relay
    mode tcp
    option httpchk OPTIONS * HTTP/1.1\r\nHost:\ www.linuxidc.com
    server apache1 192.168.1.1:443 check port 80
Use case:
server first 172.16.13.13:1080 cookie first check inter 1000
server second 172.16.13.14:1080 cookie second check inter 1000
-------------------------------------------------------------------------------------
--------------------------------------------------------------------------------
capture request header
capture request header <name> len <length>
Captures and logs the first value of the last occurrence of the specified request header; it can only be used in "frontend" and "listen" sections. The captured value is added to the log between curly braces {}. If several headers are captured, they appear in the log in the order in which they were declared, separated by a vertical bar "|". A header that is not present is logged as an empty string. The most commonly captured headers are "Host" in virtual-host environments, "Content-Length" for uploads, "User-Agent" to quickly tell real users from robots, and "X-Forwarded-For" to record the real origin of a request in proxied environments.
<name>: The name of the header to capture. It is not case-sensitive, but it is recommended to write it the way it appears in requests, with capitalized initials. Note that the log records the header's value, not its name.
<length>: The exact length recorded for the header value; anything beyond it is ignored.
There is no limit on the number of request headers that can be captured, but each capture can record at most 64 characters. To keep the log format consistent within a frontend, header captures can only be declared in a frontend.
capture response header
capture response header <name> len <length>
Captures and logs a response header; its format and key points are the same as for request headers.
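A front end that captures the headers mentioned above could be sketched like this. The section names, address, and lengths are hypothetical; each length stays within the 64-character limit.

```haproxy
# Sketch: capturing request and response headers into the HTTP log.
frontend main
    mode http
    bind *:80
    log global
    option httplog
    capture request header Host len 40
    capture request header User-Agent len 64
    capture request header X-Forwarded-For len 15
    capture response header Content-Length len 10
    default_backend app

backend app
    mode http
    server app1 192.168.0.11:8080 check
```

In the resulting log line, the three request values appear in one pair of braces in the order declared, separated by "|", and the response value appears in a second pair of braces.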
--------------------------------------------------------------------------------
stats enable
Enables statistics reporting with the default settings chosen when the program was compiled; it cannot be used in the frontend section. Unless other settings are given, the following configuration is used:
- stats uri: /haproxy?stats (the URL of the page)
- stats realm: "HAProxy Statistics" (the text shown when authentication is requested)
- stats auth: no authentication
- stats scope: no restriction
Although "stats enable" alone enables statistics reporting, it is recommended to set all the other parameters explicitly, so that nothing unexpected follows from relying on the defaults. The following is a configuration example.
backend public_www
    server websrv1 172.16.100.11:80
    stats enable
    stats hide-version
    stats scope .
    stats uri /haproxyadmin?stats
    stats realm Haproxy\ Statistics
    stats auth statsadmin:password
    stats auth statsmaster:password
-------------------------------------------------------------------------------------
stats hide-version
Enables statistics reporting and hides the HAProxy version; it cannot be used in the "frontend" section. By default the statistics page shows some useful information, including the HAProxy version number. Exposing the exact version to everyone is risky, however, because it helps a malicious user quickly find that version's flaws and vulnerabilities. Although "stats hide-version" alone enables statistics reporting, it is recommended to set all the other parameters explicitly, so that nothing unexpected follows from relying on the defaults. See the "stats enable" section for details.
-------------------------------------------------------------------------------------
stats realm
stats realm <realm>
Enables statistics reporting and sets the authentication realm; it cannot be used in the "frontend" section. HAProxy reads the realm as a single word, so any whitespace inside it must be escaped with a backslash. This parameter is meaningful only together with "stats auth".
<realm>: The realm name shown in the browser's prompt for a user name and password when HTTP Basic authentication is performed.
Although "stats realm" alone enables statistics reporting, it is recommended to set all the other parameters explicitly, so that nothing unexpected follows from relying on the defaults. See the "stats enable" section for details.
-------------------------------------------------------------------------------------
stats scope
stats scope { <name> | "." }
Enables statistics reporting and restricts the sections shown in the report; it cannot be used in the "frontend" section. When this statement is present, the report shows only the listed sections and hides all others. The statement can be repeated to show several sections. Note that section names are checked only by string comparison; HAProxy does not verify that the named section actually exists.
<name>: The name of a "listen", "frontend", or "backend" section; "." stands for the section in which the stats scope statement appears.
Although "stats scope" alone enables statistics reporting, it is recommended to set all the other parameters explicitly, so that nothing unexpected follows from relying on the defaults. The following is a configuration example.
backend private_monitoring
    stats enable
    stats uri /haproxyadmin?stats
    stats refresh 10s
----------------------------------------------------------------------------------------
stats auth
stats auth <user>:<passwd>
Enables statistics reporting with authentication and authorizes one user account; it cannot be used in the "frontend" section.
<user>: The user name granted access;
<passwd>: This user's password, in clear text;
This statement enables statistics reporting with the default settings and allows access only to the users it defines; it can be repeated to authorize several accounts. It can be combined with "stats realm" to describe the realm when the user is prompted to authenticate. An unauthorized user who tries to access the statistics receives a "401 Unauthorized" page. Authentication uses HTTP Basic, so the password travels over the network in clear text; it is also stored in clear text in the configuration file, so it is not confidential and must not be the same as the password of any other important account.
Although "stats auth" alone enables statistics reporting, it is recommended to set all the other parameters explicitly, so that nothing unexpected follows from relying on the defaults.
---------------------------------------------------------------------------------------
stats admin
stats admin { if | unless } <cond>
Enables the admin level of the statistics page when the given condition is met; this allows servers to be enabled or disabled through the web interface. From a security point of view, however, the statistics page should stay read-only whenever possible. In addition, enabling the admin level can cause unexpected behavior when HAProxy runs in multi-process mode.
For now, a POST request is limited to the buffer size minus the reserved space, so the server list cannot be too long, otherwise the request will not be processed correctly. It is therefore recommended to change only a few servers at a time. Two examples follow: the first enables the admin level only when the report page is opened from the local host, and the second allows only authenticated users to use the admin level.
backend stats_localhost
    stats enable
    stats admin if LOCALHOST
backend stats_auth
    stats enable
    stats auth haproxyadmin:password
    stats admin if TRUE
--------------------------------------------------------------------------------
option logasap
no option logasap
Enables or disables early logging of HTTP requests; it cannot be used in the "backend" section.
By default, an HTTP request is logged when it ends, so that the total transfer time and byte count can be recorded; when a large object is transferred, the log entry may therefore be delayed. With "option logasap", the entry is written as soon as the server has sent its complete headers, but the total transfer time and byte count are not recorded. In that case, capturing the "Content-Length" response header is a good way to record the number of bytes transferred. Here is an example.
listen http_proxy 0.0.0.0:80
    mode http
    option httplog
    option logasap
    log 172.16.13.9 local2
--------------------------------------------------------------------------------
option forwardfor
option forwardfor [except <network>] [header <name>] [if-none]
Allows HAProxy to insert an "X-Forwarded-For" header into requests sent to the server.
<network>: Optional; when specified, the feature is disabled for requests whose source address matches this network.
<name>: Optional; a custom header name, such as "X-Client", can be used in place of "X-Forwarded-For". Some particular web servers do require a specific header.
if-none: The header is added to the request only when it is not already present.
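The three optional arguments can be sketched as follows. The network 127.0.0.0/8, the header name X-Client, and the backend and server names are hypothetical.

```haproxy
# Sketch: default X-Forwarded-For insertion plus a per-backend variant.
defaults
    mode http
    # add X-Forwarded-For, except for requests coming from the local host
    option forwardfor except 127.0.0.0/8

backend legacy_app
    # this application is assumed to read a custom header instead;
    # the header is added only if the request does not already carry it
    option forwardfor header X-Client if-none
    server legacy1 192.168.0.121:8080 check
```

The back-end web server still has to be configured to log or use the header; HAProxy only inserts it.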
Haproxy three different types of configuration options