Objective
Data sent over TCP, UDP, and similar protocols can be intercepted by a third party and parsed to recover the information it carries, which poses a serious challenge to information security. The SSL protocol was originally proposed by Netscape. It does not affect upper-layer protocols (such as HTTP or e-mail), but it secures their communication. If SSL is used correctly, a third party can only infer the connection endpoints, the type of encryption, and the frequency and approximate amount of data sent; it cannot read or modify any of the actual data. When the IETF later standardized the SSL protocol, it renamed it TLS. Many people use the names SSL and TLS interchangeably, but strictly speaking they refer to different versions of the protocol (TLS 1.0 is the upgrade to SSL 3.0). This article focuses on the concepts and principles of TLS and on optimizing it for the network.
1. Encryption, authentication and integrity
The goal of the TLS protocol is to provide three basic guarantees for transmitted information: encryption, authentication, and data integrity. Not all three services are mandatory; they can be selected according to the application scenario.
Encryption: A mechanism for obscuring data so that third parties cannot read it.
Authentication: A mechanism for verifying the validity of an identity.
Integrity: A mechanism for detecting whether a message has been tampered with or forged.
2. TLS handshake
Before exchanging data over TLS, the client and server must negotiate an encrypted channel. The negotiation covers the TLS version, the cipher suite, and, if necessary, certificate verification. Each step of the negotiation requires a round trip between the client and the server. The approximate process is as follows:
0 ms: TLS runs on top of TCP, which means we must first complete the TCP three-way handshake, which takes one full round trip (RTT).
Once the TCP connection is established, the client sends its negotiation parameters, such as the TLS protocol version, the list of supported cipher suites, and other TLS options.
The server picks the TLS protocol version, selects a cipher suite from the client's list, attaches its own certificate, and returns the response to the client. Optionally, the server can also send a request for the client's certificate and parameters for other TLS extensions.
Assuming that both sides have agreed on a common TLS version and cipher suite, the client takes the public key from the certificate provided by the server, generates a new symmetric key, encrypts it with that public key, and tells the server to switch to encrypted communication. Up to this point, all data has been exchanged in plaintext, except for the new symmetric key, which is encrypted with the server's public key.
The server decrypts the symmetric key sent by the client with its private key, verifies the integrity of the message by checking the MAC, and returns an encrypted "Finished" message to the client.
168 ms: The client decrypts the message with the symmetric key and verifies the MAC. If everything checks out, the encrypted tunnel is established and application data can be sent.
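The timeline above amounts to three full round trips before application data can flow: one for TCP and two for TLS. A minimal Python sketch of that arithmetic follows. The function name is hypothetical, and the 56 ms round-trip time is an assumption obtained by dividing the 168 ms total by three round trips.

```python
# Round trips needed before the first byte of application data can be sent.
TCP_HANDSHAKE_RTTS = 1        # SYN, SYN-ACK; the client may send once it has ACKed
TLS_FULL_HANDSHAKE_RTTS = 2   # hello exchange, then key exchange and Finished


def time_to_first_byte_ms(rtt_ms: float,
                          tls_rtts: int = TLS_FULL_HANDSHAKE_RTTS) -> float:
    """Return the setup delay, in milliseconds, of a new TLS-over-TCP connection."""
    return (TCP_HANDSHAKE_RTTS + tls_rtts) * rtt_ms


# Assumed 56 ms round trip, which reproduces the 168 ms figure in the timeline.
print(time_to_first_byte_ms(56))
```

The `tls_rtts` parameter is exposed so that the same function can be reused for a handshake that needs fewer round trips.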
- Application Layer Protocol Negotiation
In theory, two network peers may want to use a custom application protocol to communicate with each other. One way to make this work is to decide on the protocol up front, assign it a well-known port (for example, port 80 for HTTP and port 443 for TLS), and configure all clients and servers to use it. In practice, however, this is a slow and impractical process: each port assignment must be approved and, worse, firewalls and other intermediaries often allow traffic only on ports 80 and 443. To simplify the deployment of custom protocols, we therefore need to reuse port 80 or 443 and negotiate the protocol through an additional mechanism. Port 80 is reserved for HTTP, and the HTTP specification provides a special Upgrade flow for this purpose. However, using Upgrade can add an extra network round trip, and in practice it is often unreliable because of the many intermediaries on the path.
Since port 80 is not well suited to protocol negotiation, the alternative is port 443, which is reserved for secure HTTPS sessions. The end-to-end encrypted tunnel obscures the data from intermediate devices, so it becomes a fast and reliable way to deploy arbitrary application protocols. However, while TLS solves the reliability problem, we still need a way to negotiate the application protocol. An HTTPS session could, of course, reuse the HTTP Upgrade mechanism, but that would cost an additional full round trip (RTT). Is it possible to negotiate the protocol as part of the TLS handshake itself?
Application-Layer Protocol Negotiation (ALPN) is a TLS extension that supports protocol negotiation during the TLS handshake, eliminating the extra round trip required by the HTTP Upgrade mechanism. The process is as follows:
- The client adds a new ProtocolNameList field to the ClientHello message, containing the list of application protocols it supports.
- The server examines the ProtocolNameList field and returns a ProtocolName field in the ServerHello message, indicating the protocol it has selected.
The server may respond with only one protocol, and it may abort the connection if it does not support any of the protocols the client requested. As a result, once the TLS handshake is complete, the secure tunnel is established and the client and server have also agreed on the application protocol, so they can begin communicating immediately.
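The selection step can be sketched in a few lines of Python. The `select_protocol` function is a hypothetical helper, not part of any TLS library, and it assumes the server picks by its own preference order, which is a common policy rather than something stated above. The client part uses the standard `ssl` module, whose `SSLContext.set_alpn_protocols` method supplies the list sent in the ClientHello; after a real handshake, `SSLSocket.selected_alpn_protocol()` reports the server's choice.

```python
import ssl
from typing import Optional, Sequence


def select_protocol(client_offered: Sequence[str],
                    server_preferred: Sequence[str]) -> Optional[str]:
    """Hypothetical server-side choice: first server preference the client offered.

    Returns None when there is no overlap, in which case the server
    may choose to abort the connection.
    """
    offered = set(client_offered)
    for protocol in server_preferred:
        if protocol in offered:
            return protocol
    return None


# Client side: advertise the supported protocols in the ClientHello.
client_context = ssl.create_default_context()
client_context.set_alpn_protocols(["h2", "http/1.1"])

print(select_protocol(["h2", "http/1.1"], ["http/1.1"]))   # one protocol in common
print(select_protocol(["spdy/3"], ["h2", "http/1.1"]))     # nothing in common
```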
- Server Name Indication (SNI)
An encrypted TLS tunnel can be established between any two TCP peers: the client only needs to know the IP address of the peer to open the connection and perform the TLS handshake. But what if the server hosts multiple independent sites, each with its own TLS certificate, all on the same IP address? To address this problem, the SNI (Server Name Indication) extension was added to the TLS protocol. It allows the client to indicate, at the start of the handshake, the hostname it wants to connect to. The server checks the SNI hostname, selects the appropriate certificate, and continues the handshake.
Note: The TLS + SNI workflow mirrors the Host header in HTTP, where the client indicates the host it is requesting. Many different domains may be hosted on the same IP address, and both SNI and Host are used to distinguish between them.
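On the server, the SNI lookup amounts to mapping the requested hostname to a certificate. The sketch below is a simplified illustration: the hostnames, the file paths, and the `choose_certificate` helper are all hypothetical, and a real server would hook this logic into its TLS library (in Python, for example, through `ssl.SSLContext.sni_callback`).

```python
from typing import Dict, Optional

# Hypothetical mapping from hostname to certificate file, all on one IP address.
CERTIFICATES: Dict[str, str] = {
    "www.example.com": "/etc/tls/www.example.com.pem",
    "shop.example.com": "/etc/tls/shop.example.com.pem",
}
DEFAULT_CERTIFICATE = "/etc/tls/default.pem"


def choose_certificate(sni_hostname: Optional[str]) -> str:
    """Pick the certificate for the hostname sent in the ClientHello.

    Clients that send no SNI extension (hostname is None) or ask for an
    unknown host get the default certificate.
    """
    if sni_hostname is None:
        return DEFAULT_CERTIFICATE
    return CERTIFICATES.get(sni_hostname.lower(), DEFAULT_CERTIFICATE)


print(choose_certificate("shop.example.com"))
print(choose_certificate(None))
```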
3. TLS Session Resumption
The full TLS handshake adds latency and computation, a significant performance cost for every application that requires secure communication. To help mitigate some of that cost, TLS provides a mechanism to resume a session, sharing the same negotiated key data across multiple connections.
Session identifiers
The session identifier resumption mechanism (RFC 5246) was first introduced in SSL 2.0. It allows the server to create a 32-byte session identifier and send it as part of its ServerHello message. The server keeps the session ID together with the corresponding negotiated parameters. The client stores the session ID as well, and in a subsequent session it can include the ID in its ClientHello message, telling the server that it still remembers the keys and cipher suite associated with that session ID and is able to reuse them. If both the client and the server can find the shared session ID parameters in their respective caches, an abbreviated handshake can take place. Otherwise, a full new session negotiation is required, which generates a new session ID.
With session identifiers, we save one full round trip as well as the cost of the public key cryptography used to negotiate the shared key. This lets us establish a secure connection quickly and with no loss of security. However, one limitation of the session identifier mechanism is that the server must create and maintain a session cache entry for every client. This causes several problems for servers that see tens of thousands or even millions of unique connections each day: the memory needed to cache the sessions is large, and the cache needs a session ID eviction policy. Deployment is also not trivial for the busiest sites with many servers, which should ideally use a shared TLS session cache for best performance. None of these problems is impossible to solve, and many high-traffic sites use session identifiers successfully. But for any multi-server deployment, the session identifier scheme requires careful thought and a good system architecture to ensure that the session cache works well.
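A minimal sketch of the server-side session cache described above, assuming an in-memory store with a fixed lifetime and oldest-first eviction. The class name, the limits, and the eviction policy are illustrative assumptions; the text only says that the ID is 32 bytes and that a cache and a cleanup policy are needed.

```python
import secrets
import time
from collections import OrderedDict
from typing import Optional


class SessionCache:
    """Toy server-side cache: session ID -> negotiated parameters."""

    def __init__(self, max_entries: int = 10_000, lifetime_s: float = 300.0):
        self.max_entries = max_entries
        self.lifetime_s = lifetime_s
        self._entries: "OrderedDict[bytes, tuple]" = OrderedDict()

    def store(self, params: dict, now: Optional[float] = None) -> bytes:
        """Save the negotiated parameters and return a new 32-byte session ID."""
        now = time.monotonic() if now is None else now
        session_id = secrets.token_bytes(32)
        self._entries[session_id] = (now + self.lifetime_s, params)
        # Eviction policy: drop the oldest entries once the cache is full.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
        return session_id

    def resume(self, session_id: bytes, now: Optional[float] = None) -> Optional[dict]:
        """Return the cached parameters, or None if a full handshake is needed."""
        now = time.monotonic() if now is None else now
        entry = self._entries.get(session_id)
        if entry is None:
            return None
        expires_at, params = entry
        if now >= expires_at:
            del self._entries[session_id]
            return None
        return params


cache = SessionCache(max_entries=2, lifetime_s=300.0)
session_id = cache.store({"cipher_suite": "example-suite"}, now=0.0)
print(cache.resume(session_id, now=10.0))    # hit: abbreviated handshake
print(cache.resume(session_id, now=400.0))   # expired: full handshake needed
```

Because every server that might receive the resumed connection needs to see the same entries, a multi-server deployment would replace the in-process dictionary with a shared store.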
Session tickets
Because caching session information is a heavy burden for servers with a large number of clients, the "Session Ticket" mechanism was introduced. It removes the need for the server to keep per-client session state. If the client indicates that it supports session tickets, then in the last exchange of the full TLS handshake the server includes a New Session Ticket message, which contains all the session data needed to resume encrypted communication, encrypted with a key known only to the server. The client stores this session ticket and can include it in the SessionTicket extension of the ClientHello message in a subsequent session. All session data is therefore stored only on the client, yet the ticket remains secure because it is encrypted with a key that only the server knows.
Session identifiers and session tickets are commonly referred to as the "session caching" and "stateless resumption" mechanisms, respectively. The main improvement of stateless resumption is that it removes the server-side session cache, which simplifies deployment: the client provides the session ticket at the start of every new session until the ticket expires.
Note: In practice, deploying session tickets across a set of load-balanced servers also requires careful thought: all servers must use the same ticket key, and an additional mechanism may be needed to rotate the shared key periodically across all servers.
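The ticket idea and the key-rotation concern in the note can be illustrated with a toy key ring. This is only a sketch under loud assumptions: the `TicketKeys` class and the ticket layout are hypothetical, and because the Python standard library has no block cipher, the encryption here is a makeshift HMAC-SHA256 keystream that stands in for a real cipher. It is not suitable for production use.

```python
import hashlib
import hmac
import json
import secrets
from typing import Dict, Optional


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (HMAC-SHA256 in counter mode); stands in for a real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, b"enc" + nonce + counter.to_bytes(4, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


class TicketKeys:
    """Key ring that every server in the load-balanced pool would have to share."""

    def __init__(self) -> None:
        self._keys: Dict[bytes, bytes] = {}   # key name -> secret key
        self._current = b""
        self.rotate()

    def rotate(self) -> None:
        """Add a new current key; older keys remain usable for decryption."""
        name = secrets.token_bytes(4)
        self._keys[name] = secrets.token_bytes(32)
        self._current = name

    def seal(self, session_state: dict) -> bytes:
        """Ticket layout: key name (4) | nonce (16) | ciphertext | MAC (32)."""
        key = self._keys[self._current]
        nonce = secrets.token_bytes(16)
        plaintext = json.dumps(session_state).encode()
        body = self._current + nonce + _xor(plaintext, _keystream(key, nonce, len(plaintext)))
        return body + hmac.new(key, b"mac" + body, hashlib.sha256).digest()

    def unseal(self, ticket: bytes) -> Optional[dict]:
        """Return the session state, or None to fall back to a full handshake."""
        if len(ticket) < 4 + 16 + 32:
            return None
        body, tag = ticket[:-32], ticket[-32:]
        key = self._keys.get(body[:4])
        if key is None:
            return None   # sealed with a key this server does not have
        if not hmac.compare_digest(tag, hmac.new(key, b"mac" + body, hashlib.sha256).digest()):
            return None   # modified or forged ticket
        nonce, ciphertext = body[4:20], body[20:]
        return json.loads(_xor(ciphertext, _keystream(key, nonce, len(ciphertext))))
```

A ticket sealed by one key ring is rejected by a server holding a different one, which is why all servers behind the load balancer need the same keys, while keeping the previous key after `rotate()` lets tickets that are still outstanding continue to work.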
4. Certificate issuance and Revocation
Authentication is an important part of establishing every TLS connection. After all, an encrypted tunnel can be set up with any peer, including an attacker; unless we can be sure that the party we are talking to is trustworthy, all the cryptographic work is for nothing. How do we establish that a host is trustworthy? This requires a certificate: only a host with a valid certificate is trusted. Where do trusted certificates come from?
- Manually specified certificates: Every browser and operating system provides a mechanism to manually import any certificate that you trust. How you obtain the certificate and verify its integrity is entirely up to you.
- Certificate authorities: A certificate authority (CA) is a trusted third party that is trusted by both the owner of a certificate and the party relying on it.
- Browsers and operating systems: Every operating system and most browsers ship with a list of well-known certificate authorities. You therefore also trust the vendors of this software to provide and maintain the list of trusted parties.
In practice, it is impractical to verify the certificate of every website manually (although you can, if you are so inclined). The most common solution is therefore to rely on certificate authorities (CAs): the browser specifies which CAs to trust (the root CA certificates), and each CA is responsible for verifying every site whose certificate it signs and for auditing that those certificates are not misused or compromised. If a site violates the security requirements of the CA's certificate, it is the CA's responsibility to revoke that certificate.
Occasionally a certificate authority needs to revoke or invalidate a certificate, for example because the private key of the certificate was compromised, because the certificate authority itself was compromised, or for more benign reasons such as a superseding certificate or a change of certificate authority. To address this, the certificate itself contains instructions on how to check whether it has been revoked. To ensure that the chain of trust has not been compromised, each peer can therefore check the status of every certificate along with its signature.
Certificate Revocation List (CRL): Each certificate authority maintains and periodically publishes a list of the serial numbers of revoked certificates. To check the status of a certificate, the verifier downloads the CRL and looks up the serial number in it.
The CRL file itself can be published periodically or on every update, and it can be delivered over HTTP or any other file transfer protocol. The list is also signed by the CA, and it is usually allowed to be cached for a specified interval. In practice, this process works quite well, but there are scenarios in which the CRL mechanism falls short:
- The growing number of revocations means that the CRL only gets longer, and each client must retrieve the entire list of serial numbers.
- There is no mechanism for instant notification of a revocation: if a certificate is revoked while the client is using a cached CRL, the client will consider the certificate valid until the cache expires.
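The second limitation is easy to reproduce in a small simulation. The sketch below is an illustration only: the `Crl`, `FakeCa`, and `CrlChecker` names are hypothetical, times are plain numbers in seconds, and the verification of the CA's signature on the list is left out.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass(frozen=True)
class Crl:
    revoked_serials: FrozenSet[int]
    next_update: float   # time after which the cached copy must be refreshed


class FakeCa:
    """Stand-in for the CA's CRL download endpoint."""

    def __init__(self) -> None:
        self.current = Crl(frozenset(), next_update=3600.0)

    def publish(self) -> Crl:
        return self.current


class CrlChecker:
    """Checks serial numbers against a cached CRL (signature checks omitted)."""

    def __init__(self, fetch: Callable[[], Crl]):
        self._fetch = fetch
        self._cached: Optional[Crl] = None

    def is_revoked(self, serial: int, now: float) -> bool:
        if self._cached is None or now >= self._cached.next_update:
            self._cached = self._fetch()   # the whole list is downloaded again
        return serial in self._cached.revoked_serials


ca = FakeCa()
checker = CrlChecker(ca.publish)
print(checker.is_revoked(2002, now=0.0))       # False: not revoked yet
ca.current = Crl(frozenset({2002}), next_update=7200.0)   # the CA revokes 2002
print(checker.is_revoked(2002, now=1800.0))    # False: stale cached list
print(checker.is_revoked(2002, now=3600.0))    # True: cache expired, list refetched
```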
Online Certificate Status Protocol (OCSP): Provides a mechanism to check the status of a certificate in real time. The verifier queries the certificate database directly for the serial number in question to confirm that the certificate is valid.
OCSP consumes less bandwidth and supports real-time validation, but it also introduces some problems of its own:
- The CA must be able to handle the load of real-time queries.
- The CA must ensure that the service is available worldwide at all times.
- The client must wait for the OCSP response before proceeding with the negotiation.
- Because the CA learns which websites the client visits, real-time OCSP requests may expose the client's privacy.
5. TLS Record Protocol
The TLS record protocol is primarily responsible for identifying the type of each message in TLS (the "Content Type" field marks it as handshake, alert, or data) and for protecting and verifying the integrity of each message. The typical process for delivering application data is as follows:
- The record protocol receives the application data;
- The data is divided into blocks of at most 2^14 bytes (16 KB) each;
- The data is compressed (optional);
- A message authentication code (MAC) or HMAC is added (to verify the integrity and authenticity of the message);
- The data is encrypted using the negotiated cipher.
Once the above steps are complete, the encrypted data is passed down to the TCP layer for transmission. The receiving end applies the same workflow in reverse: it decrypts the data with the negotiated cipher, verifies the MAC, and delivers the extracted application data to the application layer. The good news is that all of this processing is handled by the TLS layer itself and is completely transparent to most applications.
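The fragmenting and MAC steps can be sketched in Python as follows. This is a framing illustration under stated assumptions, not a TLS implementation: the `build_records` name is hypothetical, HMAC-SHA256 is chosen as the MAC, the MAC input follows the TLS 1.2 layout (sequence number, type, version, length, fragment), and compression and encryption are left out because the standard library offers no cipher.

```python
import hashlib
import hmac
import struct
from typing import Iterator

MAX_FRAGMENT = 2 ** 14                # at most 16 KB of payload per record
CONTENT_TYPE_APPLICATION_DATA = 23    # "Content Type" value for application data
TLS_1_2 = (3, 3)                      # version bytes used on the wire for TLS 1.2


def build_records(data: bytes, mac_key: bytes, first_seq: int = 0) -> Iterator[bytes]:
    """Split data into records: 5-byte header + fragment + HMAC-SHA256.

    Compression and encryption are omitted, so this shows the framing only.
    """
    starts = range(0, len(data), MAX_FRAGMENT)
    for seq, start in enumerate(starts, start=first_seq):
        fragment = data[start:start + MAX_FRAGMENT]
        # The MAC covers the sequence number, type, version, length, and fragment.
        mac_input = struct.pack("!QBBBH", seq, CONTENT_TYPE_APPLICATION_DATA,
                                *TLS_1_2, len(fragment)) + fragment
        mac = hmac.new(mac_key, mac_input, hashlib.sha256).digest()
        # Header: content type (1 byte), version (2 bytes), length (2 bytes).
        header = struct.pack("!BBBH", CONTENT_TYPE_APPLICATION_DATA,
                             *TLS_1_2, len(fragment) + len(mac))
        yield header + fragment + mac


records = list(build_records(b"x" * 40_000, mac_key=b"k" * 32))
print(len(records), [len(r) for r in records])
```

Including the sequence number in the MAC input means that a reordered or replayed record fails verification even though its bytes are otherwise intact.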
Of course, the TLS record protocol also introduces some important limitations:
- The maximum size of a TLS record is 16 KB;
- Each record contains a 5-byte header, a MAC (up to 20 bytes for SSLv3, TLS 1.0, and TLS 1.1, and up to 32 bytes for TLS 1.2), and padding if a block cipher is used;
- To decrypt and verify a record, the entire record must have been received.
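The per-record framing cost follows directly from these numbers. A small sketch of the arithmetic, with a hypothetical function name; the payload sizes in the loop are arbitrary examples and padding defaults to zero:

```python
HEADER_BYTES = 5


def record_overhead(payload_bytes: int, mac_bytes: int = 32,
                    padding_bytes: int = 0) -> float:
    """Fraction of the bytes on the wire that is framing rather than payload."""
    framing = HEADER_BYTES + mac_bytes + padding_bytes
    return framing / (payload_bytes + framing)


for payload in (100, 1400, 16 * 1024):
    print(f"{payload:>6} B payload: {record_overhead(payload):.1%} overhead")
```

Small records pay proportionally more for the header and MAC, while large records keep the overhead low but must arrive in full before they can be decrypted and verified.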
6. TLS optimization
- Computational cost
- Early termination (handshake)
- Session caching and stateless resumption
- TLS record size
- TLS compression
- Certificate chain length
- OCSP stapling
- HTTP Strict Transport Security