1 Preface
An earlier article in this series described how HTTPS affects user access speed.
This article introduces protocol-level and configuration-level optimizations for HTTPS in three areas: access speed, computational performance, and security.
This article was first published on the official blog of the Baidu Operations and Maintenance Department.
2 HTTPS access speed optimization
2.1 TCP Fast Open
Both HTTPS and HTTP run over TCP, so a TCP connection must first be established with a three-way handshake. Spending a full RTT just to deliver a SYN packet seems wasteful: could application-layer data be sent along with the SYN? It can, and that is the idea behind TCP Fast Open (TFO). The details are specified in RFC 7413.
Unfortunately, TFO needs a recent kernel. Linux has supported TFO since 3.7, but current Windows systems do not support it, so for now it can only be used between servers inside the company.
2.2 HSTS
As mentioned earlier, a user's HTTP request is redirected to HTTPS with a 302, which has two drawbacks:
1. It is insecure. The 302 redirect exposes the site the user is visiting and is easily hijacked by a man-in-the-middle.
2. It slows access. The 302 redirect costs an extra RTT, and the browser also needs time to perform the redirect.
Because the 302 redirect is actually carried out by the browser, the server cannot fully control it. This need led to HSTS:
HSTS (HTTP Strict Transport Security). The server returns an HSTS HTTP header. Once the browser has received that header, for a period of time it internally rewrites every request to https://www.baidu.com, whether the user types www.baidu.com or http://www.baidu.com.
Chrome, Firefox, and IE all support HSTS (http://caniuse.com/#feat=stricttransportsecurity).
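The HSTS behavior described above can be sketched in a few lines. This is a minimal illustration, not browser or server code: the function names are hypothetical, the set of known HSTS hosts stands in for the browser's internal list, and expiry of the max-age period and non-default ports are deliberately ignored.

```python
from urllib.parse import urlsplit, urlunsplit


def build_hsts_header(max_age_seconds: int, include_subdomains: bool = False):
    """Server side: return the (name, value) pair of an HSTS response header."""
    if max_age_seconds < 0:
        raise ValueError("max-age must be non-negative")
    value = f"max-age={max_age_seconds}"
    if include_subdomains:
        value += "; includeSubDomains"
    return ("Strict-Transport-Security", value)


def upgrade_url(url: str, hsts_hosts: set) -> str:
    """Browser side (simplified): rewrite http:// to https:// for known HSTS hosts.

    hsts_hosts is a hypothetical stand-in for the hosts whose HSTS header the
    browser has already seen and not yet expired.
    """
    if "://" not in url:
        url = "http://" + url  # a bare host name typed by the user defaults to HTTP
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in hsts_hosts:
        # No network round trip happens here: the rewrite is purely local.
        return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))
    return url
```

For example, with www.baidu.com in the known-host set, both `upgrade_url("www.baidu.com", hosts)` and `upgrade_url("http://www.baidu.com", hosts)` yield an https:// URL without a 302 round trip.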
2.3 Session Resume
Session resume, as the name implies, reuses an existing session so that an abbreviated handshake can be used. Reusing a session has two benefits:
1. Lower CPU consumption, because the asymmetric key exchange computation is skipped.
2. Faster access, because the second phase of the full handshake is skipped, which saves one RTT plus the computation time.
TLS currently provides two mechanisms for session resume, described separately below.
2.3.1 Session Cache
The session cache works as follows: the server uses the session ID carried in the ClientHello to look up its session cache. If a matching entry exists, the server completes the handshake directly from the stored session information. This is known as the abbreviated (simplified) handshake.
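The lookup described above can be sketched as a small in-memory cache. This is a toy model under stated assumptions: the class name, the TTL, and the idea of storing only a master secret are illustrative choices, not how any particular TLS stack lays out its cache.

```python
import os
import time


class SessionCache:
    """Toy single-process session cache keyed by TLS session ID."""

    def __init__(self, ttl_seconds: float = 300.0):
        self._ttl = ttl_seconds
        self._entries = {}  # session_id -> (expiry time, master secret)

    def store(self, master_secret: bytes) -> bytes:
        """Save a finished session and return the session ID given to the client."""
        session_id = os.urandom(32)
        self._entries[session_id] = (time.monotonic() + self._ttl, master_secret)
        return session_id

    def lookup(self, session_id: bytes):
        """Return the stored master secret, or None if unknown or expired."""
        entry = self._entries.get(session_id)
        if entry is None:
            return None
        expires_at, master_secret = entry
        if time.monotonic() >= expires_at:
            del self._entries[session_id]  # drop stale sessions lazily
            return None
        return master_secret


def choose_handshake(cache: SessionCache, client_session_id: bytes) -> str:
    """Decide which handshake to run for the session ID found in the ClientHello."""
    if client_session_id and cache.lookup(client_session_id) is not None:
        return "abbreviated"
    return "full"
```

Because the dictionary lives in one process, a client that reconnects to a different machine always gets a cache miss, which is the limitation the drawbacks below refer to.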
The session cache has two drawbacks:
1. It consumes server memory to store the session contents.
2. Current open source software, including Nginx and Apache, only supports a cache shared among multiple processes on a single machine, not a cache distributed across machines. For Baidu and other large Internet companies, a single-machine session cache therefore has little effect.
The session cache also has one major advantage:
1. The session ID is a standard field of the TLS protocol, so all browsers on the market support the session cache.
By optimizing the TLS handshake protocol and the server-side implementation, Baidu now supports a global session cache, which significantly improves user access speed and saves server computing resources.
2.3.2 Session Ticket
The previous section described two drawbacks of the session cache; session tickets make up for them.
The session ticket mechanism is specified in RFC 4507. In brief:
The server encrypts the session information into a ticket and sends it to the browser. On a later handshake the browser sends the ticket back, and if the server can decrypt and validate it, the abbreviated handshake completes.
The obvious advantage of session tickets is that the server does not need to spend a large amount of resources storing session contents.
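The stateless idea behind tickets can be illustrated with a toy sealing scheme. This sketch is an assumption-laden illustration only: it is not the ticket format from the RFC, the HMAC-based keystream is a stand-in chosen so the example runs on the standard library alone (real servers use a vetted cipher such as AES), and all function names are hypothetical.

```python
import hashlib
import hmac
import json
import os

NONCE_LEN = 16
TAG_LEN = 32


def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from HMAC-SHA256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(enc_key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def seal_ticket(enc_key: bytes, mac_key: bytes, session_state: dict) -> bytes:
    """Encrypt then authenticate the session state; the server stores nothing."""
    plaintext = json.dumps(session_state, sort_keys=True).encode()
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_ticket(enc_key: bytes, mac_key: bytes, ticket: bytes):
    """Return the session state, or None if the ticket fails authentication."""
    if len(ticket) < NONCE_LEN + TAG_LEN:
        return None
    nonce = ticket[:NONCE_LEN]
    ciphertext = ticket[NONCE_LEN:-TAG_LEN]
    tag = ticket[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # forged or corrupted ticket: fall back to a full handshake
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return json.loads(bytes(c ^ k for c, k in zip(ciphertext, stream)))
```

Any server holding the same two keys can open a ticket sealed by another, which is why the keys have to be distributed and protected across the whole fleet.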
Session tickets have two drawbacks:
1. The session ticket is only a TLS extension, and client support is not yet widespread, at only about 60%.
2. Session tickets need a globally shared key for encryption and decryption, so key security and deployment efficiency must be considered.
Overall, session tickets are functionally superior to the session cache, and client implementations should preferably support them.
2.4 OCSP stapling
OCSP stands for Online Certificate Status Protocol (RFC 6960). It is used to ask the CA for a certificate's status, for example whether it has been revoked. Typically the browser sends an OCSP query, the CA returns the certificate status, and the browser then decides whether the certificate can be trusted.
This process is time-consuming: the CA's site may well be overseas, where the network is unstable and the RTT is large. Is there a way to avoid querying the CA directly? OCSP stapling provides exactly that.
See section 8 of RFC 6066 for details. In brief, the browser includes a certificate status request extension in its ClientHello; when the server sees this extension, it returns the OCSP response directly to the browser, which completes the certificate status check.
Because the browser no longer needs to query the CA for the certificate status, this feature improves access speed significantly.
Nginx already supports OCSP stapling from a file; enabling it only requires the stapling directives:
ssl_stapling on;
ssl_stapling_file ocsp.staple;
2.5 False Start
Normally, application-layer data cannot be sent until the full handshake has completed. That wastes time: as with TFO, could application data be sent during the second phase of the full handshake? Google proposed False Start to do exactly this; see https://tools.ietf.org/html/draft-bmoeller-tls-falsestart-00 for a detailed description.
In short, False Start sends application-layer data together with the client_key_exchange message, which saves one RTT.
False Start depends on PFS (perfect forward secrecy), and PFS in turn depends on the DHE family of key exchange algorithms (DHE_RSA, ECDHE_RSA, DHE_DSS, ECDHE_ECDSA). Supporting ECDHE key exchange is therefore the way to enable False Start.
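The RTT savings discussed in sections 2.1, 2.3, and 2.5 can be tallied with a simple model. The numbers below are assumptions of the sketch: a TLS 1.2-era handshake costing two round trips in full and one when abbreviated, one round trip for the TCP handshake, and one for the HTTP request and response. DNS lookups, redirects, and OCSP queries are left out, and the function name is hypothetical.

```python
def round_trips_to_first_response(tfo: bool = False,
                                  session_resumed: bool = False,
                                  false_start: bool = False) -> int:
    """Count round trips from connect to the first HTTP response (simplified)."""
    # With TFO the ClientHello rides in the SYN, so TCP setup costs no extra RTT.
    tcp = 0 if tfo else 1
    # A full handshake takes 2 RTTs; an abbreviated handshake or False Start
    # each bring that down to 1. In this model the two savings do not stack.
    tls = 1 if (session_resumed or false_start) else 2
    # The HTTP request and its response always cost one round trip.
    request = 1
    return tcp + tls + request
```

Under these assumptions a cold HTTPS request costs 4 round trips, False Start or session resume brings it to 3, and adding TFO brings it to 2.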
2.6 Using SPDY or HTTP/2
SPDY is a protocol for improving HTTP transfer efficiency (https://www.chromium.org/spdy). It largely keeps the semantics of HTTP, but its framing layer enables a number of features that significantly improve transfer efficiency.
The biggest feature of SPDY is multiplexing: multiple HTTP requests can be in flight at once on the same connection, whereas plain HTTP can only send requests one after another. HTTP pipelining does allow several requests to be sent together, but the responses must still come back in order, so it does not fundamentally solve the concurrency problem.
HTTP/2 is the next-generation HTTP protocol that the IETF approved in February 2015. It is based on SPDY and was finalized after more than two years of discussion and refinement.
This article does not go into the benefits of SPDY and HTTP/2 in detail, but two points are worth noting:
1. Current implementations of SPDY and HTTP/2 use HTTPS by default.
2. Both SPDY and HTTP/2 preserve existing HTTP semantics and APIs, so they are almost transparent to web applications.
Google has announced that Chrome will drop SPDY in 2016 in favor of full HTTP/2 support. Some browser vendors in China, however, are moving very slowly: they support neither HTTP/2 nor even SPDY.
Baidu's servers and the Baidu mobile browser already support SPDY 3.1.
3 HTTPS computational performance optimization
3.1 Prefer ECC
Elliptic curve cryptography (ECC) is much faster than ordinary discrete-logarithm cryptography at the same security level. NIST publishes a table of recommended key lengths for comparison.
For RSA, a key of at least 2048 bits is currently needed to guarantee security, whereas ECC reaches the security strength of 2048-bit RSA with only a 224-bit key. With operands that much shorter, the corresponding modular arithmetic is considerably faster.
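The key length comparison can be written down as a small lookup. The figures are the comparable-strength values as recalled from NIST SP 800-57 Part 1 and should be checked against the current revision; the constant and function names are hypothetical.

```python
# (security strength in bits, RSA modulus bits, minimum ECC key bits)
# Values recalled from NIST SP 800-57 Part 1; verify before relying on them.
NIST_COMPARABLE_STRENGTHS = [
    (80, 1024, 160),
    (112, 2048, 224),
    (128, 3072, 256),
    (192, 7680, 384),
    (256, 15360, 512),
]


def ecc_bits_matching_rsa(rsa_bits: int) -> int:
    """Return the smallest ECC key size listed as comparable to an RSA size."""
    for _strength, rsa, ecc in NIST_COMPARABLE_STRENGTHS:
        if rsa == rsa_bits:
            return ecc
    raise KeyError(f"no table entry for RSA-{rsa_bits}")
```

The gap widens quickly: moving from 112-bit to 128-bit strength adds 32 bits to an ECC key but 1024 bits to an RSA modulus.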
3.2 Using the latest version of OpenSSL
In general, a new OpenSSL release improves on the previous one in both computation speed and security. For example, OpenSSL 1.0.2 incorporates Intel's latest optimizations, which make the P-256 elliptic curve about 4 times faster (https://eprint.iacr.org/2013/816.pdf).
OpenSSL was updated 5 times in 2014, mostly to fix bugs in the implementation or flaws in the algorithms. Use the latest version whenever possible to avoid security risks.
3.3 Hardware acceleration scenarios
Two kinds of TLS hardware acceleration are in common use today:
1. Dedicated SSL accelerator cards.
2. GPU-based SSL acceleration.
In both cases, hardware is plugged into the server's PCI slot and takes over the most computation-intensive operations. This approach has the following drawbacks:
1. Limited algorithm support. For example, ECC and GCM may not be supported.
2. High upgrade cost.
a) When a new cryptographic algorithm or protocol appears, the hardware cannot be upgraded in time.
b) When a major security vulnerability appears, some hardware solutions cannot be fixed quickly. The Heartbleed vulnerability disclosed in 2014 is an example.
3. The hardware's performance cannot be fully used. Hardware acceleration generally runs in kernel mode, so passing results up to the application layer incurs I/O and memory-copy overhead. Even if the hardware itself computes very fast, synchronous waiting and I/O overhead in the upper layers keep overall performance below expectations, and the accelerator card's computing power goes underused.
4. Poor maintainability. Hardware drivers and application-layer APIs are mostly supplied by the security vendor, so any problem requires the vendor to follow up. Users do not control the core code and are left in a passive position, unlike with open source OpenSSL, where users can master both the algorithms and the protocol.
3.4 TLS remote proxy computation
For the reasons above, Baidu built a dedicated SSL hardware acceleration cluster. The basic ideas are:
1. Optimize the TLS protocol stack by stripping out the most CPU-intensive computations, mainly:
a) RSA encryption and decryption.
b) Public key generation in the ECC algorithms.
c) Shared key generation in the ECC algorithms.
2. Optimize the hardware computation. The hardware handles no protocol or state interaction; it only performs big-number arithmetic.
3. Tasks flow asynchronously between the web server and the TLS compute cluster. After the web server sends a computation to the acceleration cluster, it can go on processing other requests; the whole process is asynchronous and non-blocking.
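The asynchronous offload in point 3 can be illustrated with a small simulation. Everything here is an assumption of the sketch rather than Baidu's implementation: the remote cluster is faked with a short sleep, the key is the textbook RSA toy key (p=61, q=53), and the function names are hypothetical.

```python
import asyncio

# Textbook RSA toy key (p=61, q=53); far too small for any real use.
N, E, D = 3233, 17, 2753


async def remote_rsa_private_op(ciphertext: int) -> int:
    """Stand-in for a request to the compute cluster: pure big-number math."""
    await asyncio.sleep(0.01)  # simulated network and hardware latency
    return pow(ciphertext, D, N)


async def handle_connection(conn_id: int, premaster: int, log: list) -> int:
    """One TLS handshake on the web server, with the RSA step offloaded."""
    ciphertext = pow(premaster, E, N)  # what the client would have sent
    log.append(f"conn {conn_id}: offloaded")
    # While this await is pending, the event loop serves other connections.
    recovered = await remote_rsa_private_op(ciphertext)
    log.append(f"conn {conn_id}: done")
    return recovered


async def serve(premasters):
    """Handle several handshakes concurrently on a single thread."""
    log = []
    tasks = (handle_connection(i, m, log) for i, m in enumerate(premasters))
    results = await asyncio.gather(*tasks)
    return list(results), log
```

Running `asyncio.run(serve([65, 42, 1234]))` shows all three connections being offloaded before any of them completes, which is the non-blocking behavior the list describes.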
4 HTTPS Security Configuration
4.1 Protocol version Selection
SSL 2.0 was shown to be insecure long ago, and statistics show that clients no longer use SSL 2.0, so it can safely be disabled on the server.
The POODLE attack of 2014 showed that SSL 3.0 is insecure as well. However, statistics show that 0.5% of traffic still supports only SSL 3.0, so SSL 3.0 can only be supported selectively.
No security vulnerabilities have been found in TLS 1.1 and 1.2 so far, so supporting them with priority is recommended.
4.2 Cipher suite selection
A cipher suite consists of four parts:
1. Asymmetric key exchange algorithm. Prefer ECDHE, disable DHE, and use RSA as the second choice.
2. Certificate signature algorithm. Because some browsers and operating systems do not support ECDSA signatures, RSA signatures are currently the default. SHA-1 signatures are no longer secure, and Chrome and Microsoft will stop supporting SHA-1 signed certificates starting in 2016 (http://googleonlinesecurity.blogspot.jp/2014/09/gradually-sunsetting-sha-1.html).
3. Symmetric encryption algorithm. Prefer AES-GCM, and disable RC4 (RFC 7465) for protocol versions above 1.0.
4. Content integrity check algorithm. MD5 and SHA-1 are no longer secure; a secure hash function from the SHA-2 family or stronger is recommended.
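The preferences above can be expressed as an OpenSSL cipher string. The sketch below uses Python's standard ssl module to do so; the exact cipher string and the TLS 1.2 floor are assumptions of this example (the article itself still tolerates older protocol versions for legacy clients), and the function name is hypothetical.

```python
import ssl


def build_server_context() -> ssl.SSLContext:
    """Server context preferring ECDHE + AES-GCM, with RSA key exchange second."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Assumption: a TLS 1.2 floor, stricter than the article's own policy.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # ECDHE with AES-GCM first, then plain RSA key exchange with AES-GCM.
    # DHE and RC4 suites are simply never selected by this string.
    ctx.set_ciphers("ECDHE+AESGCM:kRSA+AESGCM")
    return ctx
```

Calling `build_server_context().get_ciphers()` lists what the local OpenSSL build actually enabled; TLS 1.3 suites, where available, are configured separately by the library and will also appear.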
4.3 HTTPS attack prevention
4.3.1 Preventing protocol downgrade attacks
Downgrade attacks generally come in two types: cipher suite downgrade (cipher suite rollback) and protocol downgrade (version rollback). In a downgrade attack, the attacker forges or modifies the ClientHello message so that the client and server end up communicating with a weaker cipher suite or protocol version.
To counter downgrade attacks, servers and browsers now implement SCSV; the mechanism is described in https://tools.ietf.org/html/draft-ietf-tls-downgrade-scsv-00.
In short, a client that falls back to a lower version must send the TLS_SCSV signal (named TLS_FALLBACK_SCSV in the draft). A server that sees this signal will not accept a protocol version lower than the highest version it supports.
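The server-side check can be sketched as follows. The cipher suite value 0x5600 and alert number 86 are the ones assigned to TLS_FALLBACK_SCSV and inappropriate_fallback when the draft was published as RFC 7507; the function name and the tuple encoding of protocol versions are assumptions of this example.

```python
TLS_FALLBACK_SCSV = 0x5600          # signaling cipher suite value
ALERT_INAPPROPRIATE_FALLBACK = 86   # fatal alert sent on a detected downgrade

# Protocol versions as (major, minor), so tuple comparison orders them.
SSL30, TLS10, TLS11, TLS12 = (3, 0), (3, 1), (3, 2), (3, 3)


def check_fallback(client_version, cipher_suites, server_max_version):
    """Return None to continue the handshake, or the alert code to send.

    client_version and cipher_suites come from the ClientHello;
    server_max_version is the highest version this server supports.
    """
    is_fallback = TLS_FALLBACK_SCSV in cipher_suites
    if is_fallback and client_version < server_max_version:
        # The client retried with a lower version although both sides could
        # do better, so something on the path forced the downgrade.
        return ALERT_INAPPROPRIATE_FALLBACK
    return None
```

A normal first attempt never carries the signal, so `check_fallback(TLS10, [0xC02F], TLS12)` returns None and genuinely old clients are unaffected.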
4.3.2 Preventing renegotiation attacks
Renegotiation (TLS renegotiation) comes in two types: cipher suite renegotiation and protocol renegotiation.
Renegotiation has two risks:
1. A weak algorithm may be negotiated, so that the transmitted content is easily leaked.
2. An attacker can keep initiating full handshakes through renegotiation, triggering heavy computation on the server and causing a denial of service.
The most direct defense is to forbid client-initiated renegotiation. Server-initiated renegotiation should still be allowed, since some special scenarios require it.
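As one hedged illustration, Python's standard ssl module exposes a switch for this. Note the difference from the policy above: `ssl.OP_NO_RENEGOTIATION` is a coarse option that turns off all renegotiation in TLS 1.2 and earlier, including server-initiated renegotiation, so it fits only deployments that do not need the special scenarios mentioned. The function name is hypothetical.

```python
import ssl


def build_context_without_renegotiation() -> ssl.SSLContext:
    """Server context that refuses renegotiation on TLS 1.2 and earlier."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Stricter than the article's policy: this also stops the server itself
    # from initiating renegotiation, not only the client.
    ctx.options |= ssl.OP_NO_RENEGOTIATION
    return ctx
```

With this option set, renegotiation requests arriving in a ClientHello are ignored, which removes the repeated-handshake denial-of-service vector.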
5 Concluding remarks
HTTPS practice and optimization touch on a great many topics, and for reasons of space this article only introduces each optimization strategy briefly. To understand the principles behind them, read up on the TLS protocol and PKI in detail. For a large site that wants to push optimization to the limit, HTTPS deployment has to be considered together with the product and infrastructure architecture, and more effort goes into the product and operations side than into enabling and tuning HTTPS access itself. The next article in this series will cover this further.