1 HTTPS Protocol Overview
HTTPS can be thought of as HTTP + TLS. HTTP is familiar to everyone: most Web applications and websites today transmit their data over HTTP.
TLS is the Transport Layer Security protocol. Its predecessor, SSL, was first released by Netscape in 1995; in 1999, after discussion and standardization by the IETF, it was renamed TLS. Unless otherwise stated, SSL and TLS refer to the same protocol in this article.
The position of HTTP and TLS in the protocol stack and the composition of the TLS protocol are shown below:
Figure 1 TLS Protocol format
The TLS protocol has five main parts: the application data protocol, the handshake protocol, the alert protocol, the change cipher spec protocol, and the heartbeat protocol.
TLS messages themselves are carried by the record protocol, whose format is shown on the far right of the figure.
The commonly used version of HTTP is HTTP1.1, and the common versions of TLS are TLS1.2, TLS1.1, TLS1.0 and SSL3.0. Of these, SSL3.0 has been proven insecure by the POODLE attack, yet statistics show that slightly under 1% of browsers still use it. TLS1.0 also has known weaknesses, such as the RC4 and BEAST attacks.
TLS1.2 and TLS1.1 have no known security vulnerabilities for now. They are more secure and also offer many extensions that improve speed and performance, so they are the recommended versions.
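As a rough illustration of the recommendation above, the following Python sketch uses the standard `ssl` module to build a client context that refuses the older protocol versions. It is deliberately a little stricter than the text and accepts only TLS1.2 and newer. The variable name `ctx` is only illustrative, and a real site would set the equivalent option in its web server configuration.

```python
import ssl

# Client-side context with the library's default certificate checks.
ctx = ssl.create_default_context()

# Refuse SSL3.0, TLS1.0 and TLS1.1 during version negotiation.
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

print(ctx.minimum_version.name)
```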
One thing worth watching is TLS1.3, which will be a very significant overhaul of the TLS protocol and should bring a qualitative improvement in both security and user access speed. However, there is no definite release date yet.
Meanwhile, HTTP2 has been formally finalized. It evolved from the SPDY protocol and is a very significant change compared with HTTP1.1, one that can noticeably improve the efficiency of application-layer data transfer.
2 HTTPS Feature Introduction
Baidu uses the HTTPS protocol primarily to protect user privacy and prevent traffic hijacking.
HTTP itself is transmitted in plaintext, without any security processing. For example, when a user searches Baidu for a keyword such as "Apple phone", an intermediary can see this information in full and may even call to harass the user. Some users have also complained that, when using Baidu, a very long and large advertisement floats over the home page or the results page; this is certainly advertising content injected into the page by an intermediary. If the hijacking is done clumsily, users may not be able to access Baidu at all.
The intermediary mentioned here mainly refers to the network nodes that user data must pass through on its way between the browser and the Baidu servers, such as WiFi hotspots, routers, firewalls, reverse proxies, and cache servers.
Under HTTP, an intermediary can sniff the user's search content at will, steal private data, and even tamper with web pages. HTTPS is the bane of these hijacking behaviors and defends against them effectively.
In general, the HTTPS protocol provides three powerful features to combat the above hijacking behavior:
1. Content encryption. Content between the browser and the Baidu servers is transmitted in encrypted form, so an intermediary cannot directly view the original content.
2. Identity authentication. This ensures that the user is really accessing the Baidu service. Even if DNS is hijacked and the user is directed to a third-party site, the browser warns the user that the site is not Baidu and that the connection may have been hijacked.
3. Data integrity. This prevents content from being forged or tampered with by a third party.
How does HTTPS achieve these three things? The principles are briefly introduced below.
3 HTTPS Principle Introduction
3.1 Content Encryption
Encryption algorithms are generally divided into two types, symmetric and asymmetric. Symmetric encryption (also known as secret-key encryption) means that encryption and decryption use the same key. Asymmetric encryption (also known as public-key encryption) means that encryption and decryption use different keys.
Figure 2 Symmetric encryption
Figure 3 Asymmetric Encryption
Symmetric content encryption is very strong and generally cannot be cracked. Its big problem is that there is no safe way to generate and keep the keys. If every session between the client software and the server used the same fixed key for encryption and decryption, this would be a serious security risk: anyone who obtained the symmetric key from the client side could read all the content, and managing a huge number of client-side keys is also complicated.
Asymmetric encryption is mainly used for key exchange (also called key negotiation), which solves this problem well. For each new session, the browser and the server use an asymmetric key exchange algorithm to negotiate symmetric keys, and then use these symmetric keys to encrypt, decrypt, and verify application data. The keys for a session are generated and held only in memory, and each session uses different symmetric keys (unless the session is reused), so an intermediary cannot steal them.
Asymmetric key exchange is secure, but it is also the "culprit" behind the performance and speed cost of HTTPS. To know why HTTPS affects speed and why it consumes resources, you must understand the entire process of asymmetric key exchange.
The following focuses on the mathematical principles of asymmetric key exchange and its application in the TLS handshake.
3.1.1 Asymmetric key exchange
Before asymmetric key exchange algorithms appeared, a big problem with symmetric encryption was that there was no safe way to generate and store the key. Asymmetric key exchange exists mainly to solve this problem, making the generation and use of symmetric keys more secure.
Key exchange algorithms are themselves quite complex. The key exchange process involves random number generation, modular exponentiation, padding, encryption, signing, and other operations.
Common key exchange algorithms include RSA, ECDHE, DH, and DHE. Their characteristics are as follows:
- RSA: The algorithm is simple to implement. It was born in 1977, has a long history, and has withstood a long period of cryptanalysis, so its security is high. The disadvantage is that it needs a relatively large modulus (2048 bits is common at present) to guarantee its security strength, which is expensive in CPU resources. RSA is currently the only algorithm that can be used for both key exchange and certificate signing.
- DH: The Diffie-Hellman key exchange algorithm. It is also an early algorithm, first published in 1976. The disadvantage is that it consumes a lot of CPU.
- ECDHE: The DH algorithm using elliptic curves (ECC). Its advantage is that it achieves the same security level as RSA with a much smaller key (256 bits). The disadvantage is that the algorithm is complex to implement and its history in key exchange is short, so it has not yet been tested by a long period of attacks.
- ECDH: Does not support PFS (perfect forward secrecy), so its security is lower, and False Start cannot be used.
- DHE: Does not support ECC and is very CPU-intensive.
It is recommended to support the RSA and ECDHE_RSA key exchange algorithms first. The reasons are:
1. ECDHE supports ECC acceleration, so it computes faster. It supports PFS, so it is more secure. It supports False Start, so user access is faster.
2. At least 20% of clients currently do not support ECDHE. For these we recommend RSA rather than DH or DHE, because the DH family is very CPU-intensive (roughly equivalent to two RSA computations).
Note that "ECDHE key exchange" usually means ECDHE_RSA: ECDHE generates the public/private key pair required by the DH algorithm, RSA signs the exchanged parameters, and the symmetric key is then computed from the result.
Asymmetric encryption is more secure than symmetric encryption, but it has two obvious drawbacks:
1. It consumes a lot of CPU. In a full TLS handshake, the asymmetric decryption in the key exchange accounts for more than 90% of the computation of the whole handshake. The cost of symmetric encryption is only about 0.1% of that of asymmetric encryption, so if application-layer data were also encrypted and decrypted asymmetrically, the performance overhead would be unbearable.
2. Asymmetric encryption algorithms limit the length of the content to be encrypted, which cannot exceed the length of the public key. For example, the common public key length today is 2048 bits, which means the content to be encrypted cannot exceed 256 bytes.
Therefore, public-key encryption is only suitable for key exchange or content signing, not for encrypting and decrypting the content transmitted at the application layer.
Asymmetric key exchange is the cornerstone of HTTPS security, and fully understanding it is the key to understanding the HTTPS protocol and its features.
The following briefly introduces how RSA and ECDHE are applied in the key exchange process.
3.1.1.1 RSA Key Negotiation
3.1.1.1.1 RSA Algorithm Introduction
The security of the RSA algorithm rests on the fact that multiplication is hard to reverse: factoring the product of two large primes is difficult. The derivation and implementation of RSA involve Euler's totient function, Fermat's theorem, and modular inverses; interested readers can look these up on their own.
RSA is one of the most important algorithms in the world, and at present it is also the single most important algorithm in the HTTPS system.
The calculation steps for RSA are as follows:
1. Randomly select two prime numbers p and q. Assume p = 13, q = 19. Then n = p * q = 13 * 19 = 247.
2. φ(n) denotes the number of positive integers up to n that are coprime with n. If n is the product of two primes, then φ(n) = (p-1)(q-1) = 12 * 18 = 216. Select a number e such that 1 < e < φ(n) and e is coprime with φ(n). Assume e = 17.
3. Calculate the modular inverse d of e with respect to φ(n), that is, ed ≡ 1 mod φ(n). From e = 17 and φ(n) = 216 we get d = 89.
4. With e and d found, assume the plaintext is M = 135 and the ciphertext is C. Encryption and decryption are then calculated as follows:
C ≡ M^e mod n = 135^17 mod 247 = 200
M ≡ C^d mod n = 200^89 mod 247 = 135
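The four steps above can be checked with a few lines of Python. This is only a sketch of textbook RSA with the toy numbers from the example (no padding, tiny primes), not something to use for real encryption; the variable names are illustrative.

```python
from math import gcd

# Step 1: two (toy) primes and their product.
p, q = 13, 19
n = p * q                      # 247

# Step 2: Euler's totient and a public exponent coprime with it.
phi = (p - 1) * (q - 1)        # 216
e = 17
assert gcd(e, phi) == 1

# Step 3: private exponent d, the modular inverse of e (Python 3.8+).
d = pow(e, -1, phi)            # 89

# Step 4: encrypt with (n, e), decrypt with (n, d).
M = 135
C = pow(M, e, n)               # 200
recovered = pow(C, d, n)       # 135

print(n, phi, d, C, recovered)  # 247 216 89 200 135
```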
In practice, (n, e) forms the public key and (n, d) forms the private key, where n and d are large numbers close to 2^2048. Even with today's high-performance CPUs, computing M ≡ C^d mod n consumes considerable computing resources and time.
The public key (n, e) is normally embedded in the certificate, where anyone can view it directly. In the public key of the Baidu certificate (see the figure below), for example, the last 6 hexadecimal digits (010001) are 65537 in decimal, which is the e of the public key. Choosing a small value for e has two advantages:
1. Since C ≡ M^e mod n, a smaller e means less CPU work on the client side.
2. It makes cracking the server side harder. When e is small, the d in the private key is bound to be very large, so the space of possible d values is very large, which increases the difficulty of cracking.
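The conversion of the trailing hexadecimal digits is easy to confirm. The short sketch below also shows why this particular e is cheap for the client: it has only two bits set, so square-and-multiply exponentiation needs very few multiplications. The variable names are illustrative.

```python
# Last six hex digits of the public key field in the certificate.
e = int("010001", 16)
print(e)                        # 65537

# 65537 = 2**16 + 1, so its binary form has just two 1 bits.
set_bits = bin(e).count("1")
print(set_bits)                 # 2
```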
Why, then, can (n, e) be published as the public key, visible to anyone directly from the certificate, and still be safe? The analysis is as follows:
Since ed ≡ 1 mod φ(n), anyone who knows e and n and wants the private key d must know φ(n). And since φ(n) = (p-1) * (q-1), p and q must be found before d can be determined. But when n is large enough (for example, close to 2^2048), even today's fastest CPUs cannot carry out this factorization, that is, cannot find which two numbers p and q multiply to give n. So even though the public key is known, the whole encryption and decryption process remains very safe.
Figure 5 Baidu HTTPS certificate public key
3.1.1.1.2 RSA key negotiation during the handshake
Having introduced the principle of RSA, how is the symmetric key required for the session finally generated, and how does it relate to RSA?
Taking TLS1.2 as an example and omitting handshake messages unrelated to the key exchange, the process is as follows:
1. The browser sends client_hello, which contains a random number random1.
2. The server replies with server_hello, which contains a random number random2, and also replies with the certificate message, which carries the certificate public key P.
3. After receiving random2, the browser can generate the premaster_secret and the master_secret. The premaster_secret is 48 bytes long: the first 2 bytes are the protocol version number and the remaining 46 bytes are filled with random data. The structure is as follows:
struct { byte version[2]; byte random[46]; }
The generation of the master secret is summarized as follows:
master_secret = PRF(premaster_secret, "master secret", random1 + random2)
where PRF is a pseudorandom function. In TLS1.0 and TLS1.1 it is defined as:
PRF(secret, label, seed) = P_MD5(S1, label + seed) XOR P_SHA-1(S2, label + seed)
In TLS1.2 the MD5/SHA-1 combination is replaced by a single hash, SHA-256 by default: PRF(secret, label, seed) = P_SHA256(secret, label + seed).
As can be seen from the above, with the premaster_secret as the secret, "master secret" as the label, and the two random numbers from the browser and the server as the seed, a 48-byte master secret can be derived.
From the master secret, six pieces of key material are derived: the keys for verifying content integrity (MAC keys), the keys for symmetric content encryption and decryption, and the initialization vectors (for CBC mode), each with one copy for the client and one for the server.
At this point, key negotiation is complete on the browser side.
4. The browser encrypts the premaster_secret with the certificate public key P and sends it to the server.
5. The server decrypts it with the private key to obtain the premaster_secret. Because the server already received random1 earlier, it applies the same generation algorithm to the same inputs and obtains the same master secret.
The RSA key negotiation handshake process is illustrated below:
Figure 6 RSA Key negotiation process
As you can see, the key negotiation requires 2 RTTs, which is an important reason why HTTPS is slow. The core role RSA plays is to encrypt and decrypt the premaster_secret. Since an intermediary cannot break RSA, it cannot learn the premaster_secret, which guarantees the security of the key negotiation.
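The master secret derivation in step 3 can be sketched in Python with the standard `hmac` and `hashlib` modules. The sketch assumes the TLS1.2 form of the PRF (HMAC-SHA256 expansion as described in RFC 5246); the function names `p_sha256`, `prf`, and `derive_master_secret` are illustrative, and the random values are generated locally rather than taken from a real handshake.

```python
import hashlib
import hmac
import os

def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash expansion: A(0) = seed, A(i) = HMAC(secret, A(i-1))."""
    out = b""
    a = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]

def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """TLS1.2 PRF: P_SHA256(secret, label + seed)."""
    return p_sha256(secret, label + seed, length)

def derive_master_secret(premaster: bytes, random1: bytes, random2: bytes) -> bytes:
    """48-byte master secret from the premaster secret and both randoms."""
    return prf(premaster, b"master secret", random1 + random2, 48)

# 2 version bytes (0x0303 is TLS1.2) followed by 46 random bytes.
premaster_secret = b"\x03\x03" + os.urandom(46)
random1 = os.urandom(32)   # from client_hello
random2 = os.urandom(32)   # from server_hello

browser_master = derive_master_secret(premaster_secret, random1, random2)
server_master = derive_master_secret(premaster_secret, random1, random2)
print(len(browser_master), browser_master == server_master)
```

Both sides run the same deterministic function on the same three inputs, which is why only the premaster_secret has to be protected in transit.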
3.1.1.2 ECDHE Key Negotiation
3.1.1.2.1 DH and ECC algorithm principles
The ECDHE algorithm is considerably more complex to implement. It has two main parts: the Diffie-Hellman algorithm (DH) and ECC (elliptic curve cryptography). The security of both rests on the difficulty of computing discrete logarithms.
To introduce the DH algorithm briefly, two basic concepts come first:
- Primitive root: If the integer a is a primitive root of the prime p, then a, a^2, ..., a^(p-1) are all distinct mod p.
- Discrete logarithm: For any integer b not divisible by p and a primitive root a of the prime p, there is a unique exponent i such that:
b ≡ a^i mod p (0 ≤ i ≤ p-2)
Then i is called the discrete logarithm of b to the base a, modulo p.
With these two concepts understood, the DH algorithm is very simple. An example:
Assume the client and the server need to negotiate a key, with p = 2579 and primitive root a = 2.
1. The client chooses the random number Kc = 123 as its private key, calculates Yc = a^Kc mod p = 2^123 mod 2579 = 2400, and sends Yc to the server as its public key.
2. The server chooses the random number Ks = 293 as its private key, calculates Ys = a^Ks mod p = 2^293 mod 2579 = 968, and sends Ys to the client as its public key.
3. The client computes the shared key: secret = Ys^Kc mod p = 968^123 mod 2579 = 434
4. The server computes the shared key: secret = Yc^Ks mod p = 2400^293 mod 2579 = 434
In the formulas above, Ys, Yc, p, and a are all public information that an intermediary can see. Only the private keys Ks and Kc are kept secret. When the private keys are small, the shared key can be found by exhaustive search, but when they are very large, exhaustive search is not feasible.
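The DH example above can be reproduced directly with Python's built-in modular exponentiation. This is a sketch with the toy parameters from the text; the variable names are illustrative, and real deployments use primes of 2048 bits or more.

```python
# Public parameters: prime modulus and primitive root.
p, a = 2579, 2

# Private keys, never sent over the network.
kc, ks = 123, 293

# Public keys exchanged in the clear.
yc = pow(a, kc, p)               # 2400
ys = pow(a, ks, p)               # 968

# Each side combines its own private key with the peer's public key.
client_secret = pow(ys, kc, p)   # 434
server_secret = pow(yc, ks, p)   # 434

print(yc, ys, client_secret, server_secret)  # 2400 968 434 434
```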
A big drawback of the DH algorithm is that it needs very large private keys to be secure, which makes it expensive in CPU. ECC (elliptic curve cryptography) solves this problem well: a 224-bit key reaches the security strength of 2048-bit RSA.
The curves used by ECC are not actually ellipses. They are called elliptic curves only because their equations resemble those that arise when computing the circumference of an ellipse. ECC involves many concepts from modern algebra, such as finite fields and groups, which are not covered in detail here.
The security of ECC relies on the following fact:
P = kQ. Given k and Q, computing P is relatively easy; given P and Q, finding k is very difficult.
The formula above looks very simple, but it comes with the following constraints:
1. k is a very large integer, and P and Q are discrete points on an elliptic curve over a finite field.
2. The finite field defines its own rules for addition and multiplication, so even computing kQ is a fairly involved operation.
ECC is applied to the Diffie-Hellman key exchange as follows:
1. Define an elliptic curve over a finite field, that is, select p, a, and b to satisfy the following equation:
y^2 mod p = (x^3 + ax + b) mod p
2. Select a base point G = (x, y) whose order is n, where n is the smallest positive integer satisfying nG = O (the point at infinity).
3. The client selects a private key Kc (0 < Kc < n) and generates the public key Yc = Kc * G.
4. The server selects a private key Ks (0 < Ks < n) and generates the public key Ys = Ks * G.
5. The client computes the shared key K = Kc * Ys and the server computes the shared key K = Ks * Yc. The two results are the same because:
Kc * Ys = Kc * (Ks * G) = Ks * (Kc * G) = Ks * Yc
As described above, fixing p, a, and b determines an elliptic curve over a finite field. Not every elliptic curve is suitable for encryption, so p, a, and b must be chosen with great care, because they directly affect the security and the computation speed of the curve.
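The five ECDH steps above can be tried out on a toy curve. The sketch below assumes the small textbook curve y^2 = x^3 + 2x + 2 over GF(17) with base point G = (5, 1) of order 19; this curve, the private keys 7 and 11, and the helper names `add` and `mul` are illustrative choices that do not come from the article, and the curve is far too small to be secure.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]   # None stands for the point at infinity O

P_MOD, A, B = 17, 2, 2              # toy curve: y^2 = x^3 + 2x + 2 over GF(17)
G: Point = (5, 1)                   # base point, order n = 19 on this curve
N = 19

def add(p1: Point, p2: Point) -> Point:
    """Add two curve points using the chord-and-tangent rule."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                 # P + (-P) = O
    if p1 == p2:
        s = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD)            # tangent slope
    else:
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)         # chord slope
    s %= P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    y3 = (s * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def mul(k: int, pt: Point) -> Point:
    """Compute k * pt by double-and-add."""
    result: Point = None
    addend = pt
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result

kc, ks = 7, 11                      # private keys, 0 < k < N
yc, ys = mul(kc, G), mul(ks, G)     # public keys
client_shared = mul(kc, ys)
server_shared = mul(ks, yc)
print(client_shared == server_shared)
```

Even on this tiny curve, computing k * G takes several point additions and modular inversions, which hints at why the operation is costly at 256 bits.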
The parameters of the FIPS-recommended elliptic curve over a 256-bit prime field (P-256), as defined in OpenSSL, are as follows:
Prime p = 115792089210356248762697446949407573530086143415290314195533631308867097853951
Order n = 115792089210356248762697446949407573529996955224135760342422259061068512044369
SEED = c49d3608 86e70493 6a6678e1 139d26b7 819f7e90
c = 7efba166 2985be94 03cb055c 75d4f7e0 ce8d84a9 c5114abc af317768 0104fa0d
Curve coefficient a = -3 (that is, p - 3)
Curve coefficient b = 5ac635d8 aa3a93e7 b3ebbd55 769886bc 651d06b0 cc53b0f6 3bce3c3e 27d2604b
Base point G x = 6b17d1f2 e12c4247 f8bce6e5 63a440f2 77037d81 2deb33a0 f4a13945 d898c296
Base point G y = 4fe342e2 fe1a7f9b 8ee7eb4a 7c0f9e16 2bce3357 6b315ece cbb64068 37bf51f5
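A quick sanity check on parameters like these is to confirm that the base point satisfies the curve equation y^2 = x^3 + ax + b mod p. The sketch below assumes the standard P-256 values: the prime written in its closed form 2^256 - 2^224 + 2^192 + 2^96 - 1, the coefficient a = -3, and the b, G x, and G y values listed above. The helper `from_hex` is illustrative.

```python
def from_hex(words: str) -> int:
    """Parse a hex string that is grouped into space-separated words."""
    return int(words.replace(" ", ""), 16)

# Field prime of P-256 in closed form (assumed equal to the decimal p above).
p = 2**256 - 2**224 + 2**192 + 2**96 - 1
a = p - 3                           # a = -3 mod p
b = from_hex("5ac635d8 aa3a93e7 b3ebbd55 769886bc 651d06b0 cc53b0f6 3bce3c3e 27d2604b")
gx = from_hex("6b17d1f2 e12c4247 f8bce6e5 63a440f2 77037d81 2deb33a0 f4a13945 d898c296")
gy = from_hex("4fe342e2 fe1a7f9b 8ee7eb4a 7c0f9e16 2bce3357 6b315ece cbb64068 37bf51f5")

# The base point must lie on the curve.
on_curve = (gy * gy - (gx**3 + a * gx + b)) % p == 0
print(on_curve)
```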
3.1.1.2.2 ECDHE key negotiation during the handshake
With the mathematical principles of ECC and DH briefly covered, let us look at how ECDHE is applied in the TLS handshake.
Compared with RSA, ECDHE needs one additional handshake message, server_key_exchange, to complete the key negotiation.
Again taking TLS1.2 as an example, the process is briefly as follows:
1. The browser sends client_hello, which contains a random number random1 and must carry 2 extensions:
a) elliptic_curves: the curve types and finite field parameters supported by the client. The most widely used today is the 256-bit prime field, whose parameters are given in the previous section.
b) ec_point_formats: the supported curve point formats. The default is uncompressed.
2. The server replies with server_hello, which contains a random number random2 and the ECC extensions.
3. The server replies with the certificate message, which carries the certificate public key.
4. The server generates an ECDH temporary public key and replies with server_key_exchange, which contains three important parts:
a) The ECC-related parameters.
b) The ECDH temporary public key.
c) A signature over the ECC parameters and the public key, for the client to verify.
5. After receiving server_key_exchange, the browser verifies the signature with the certificate public key, obtains the server's ECDH temporary public key, and generates the shared key needed for the session.
At this point, key negotiation is complete on the browser side.
6. The browser generates its own ECDH temporary public key and sends it in the client_key_exchange message. Unlike in RSA key negotiation, this message does not need to be encrypted.
7. The server processes the client_key_exchange message and obtains the client's ECDH temporary public key.
8. The server generates the shared key needed for the session.
9. Key negotiation is complete on the server side.
This is illustrated below:
Figure 7 ECDHE Key negotiation process
3.1.2 Symmetric Content Encryption
At the end of the asymmetric key exchange, the symmetric keys required for the session have been obtained. Symmetric encryption comes in two kinds: stream ciphers and block ciphers. The stream cipher in common use has been RC4, but RC4 is no longer secure, and Microsoft also recommends that websites avoid it.
A newer stream cipher that replaces RC4 is ChaCha20, a faster and more secure encryption algorithm promoted by Google. It has been adopted by Android and Chrome and compiled into BoringSSL, Google's open-source fork of OpenSSL, and Nginx 1.7.4 also supports building with BoringSSL.
The previously common block cipher mode was AES-CBC, but CBC has been shown to be susceptible to the BEAST and LUCKY13 attacks. The currently recommended block cipher mode is AES-GCM. Its disadvantage is that it is computationally expensive, with high performance and power costs, which makes it less suitable for mobile phones and tablets.
3.2 Identity Verification
Identity authentication mainly involves PKI and digital certificates. Generally speaking, a PKI (Public Key Infrastructure) consists of the following parts:
- End entity: The end entity, which can be a hardware terminal or a website.
- CA: The certificate authority, which issues certificates.
- RA: The registration authority, which handles certificate registration and review, for example verifying the authenticity of the applying website or company.
- CRL issuer: Responsible for publishing and maintaining the certificate revocation list.
- Repository: Responsible for storing and distributing digital certificates and CRLs.
Applying for a trusted digital certificate usually involves the following steps:
1. The end entity generates a public/private key pair and a certificate request.
2. The RA checks the legitimacy of the entity. This step is not required for individuals or small websites.
3. The CA issues the certificate and sends it to the applicant.
4. The certificate is published to the repository. The end entity subsequently updates the certificate from the repository, queries the certificate status, and so on.
The certificate format currently used by Baidu is X.509v3, which consists of the following three parts:
1. tbsCertificate (the "to be signed" certificate content). This part contains 10 elements: version number, serial number, signature algorithm identifier, issuer name, validity period, subject name, subject public key information, issuer unique identifier, subject unique identifier, and extensions.
2. signatureAlgorithm, the signature algorithm identifier, which specifies the algorithm used to sign the tbsCertificate.
3. signatureValue (the signature value), the result of signing the tbsCertificate with signatureAlgorithm.
Digital certificates have two functions:
1. Identity authentication. They ensure that the website the browser accesses is a trusted site verified by a CA.
2. Public key distribution. Each digital certificate contains the public key generated by the registrant, which is sent to the client in the certificate message during the SSL handshake. The RSA certificate public key encryption and the ECDHE signature mentioned earlier, for example, both use this public key.
The applicant obtains the certificate from the CA and deploys it on the website's servers. When the browser starts a handshake and receives the certificate, how does it confirm that the certificate was really issued by the CA? How is a third party prevented from forging the certificate?
The answer is the digital signature. A digital signature is the anti-forgery label of a certificate. The most widely used scheme today is SHA-RSA, and its signatures are created and verified as follows:
1. Issuing the digital signature. First, a hash function computes a secure hash of the content to be signed, producing a message digest. The message digest is then encrypted with the CA's own private key.
2. Verifying the digital signature. The signature is decrypted with the CA's public key to recover the digest. The same hash function is then applied to the signed certificate content, and the result is compared with the digest recovered from the signature. If they match, verification succeeds.
Figure 8 Digital signature generation and verification
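The sign-then-verify flow can be sketched in Python by reusing the toy RSA key from section 3.1.1.1.1 (n = 247, e = 17, d = 89) as a stand-in for the CA key pair. This is only an illustration under stated assumptions: the digest is reduced modulo the tiny n, whereas real SHA-RSA signatures pad the full digest and use 2048-bit keys, and the names `digest`, `sign`, `verify`, and the sample bytes are hypothetical.

```python
import hashlib

# Toy CA key pair: (n, e) is public, (n, d) is private.
n, e, d = 247, 17, 89

def digest(content: bytes) -> int:
    """SHA-256 digest reduced below the toy modulus."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % n

def sign(content: bytes) -> int:
    """CA side: hash the content, then apply the private key."""
    return pow(digest(content), d, n)

def verify(content: bytes, signature: int) -> bool:
    """Browser side: apply the public key and compare with a fresh hash."""
    return pow(signature, e, n) == digest(content)

tbs_certificate = b"example tbsCertificate bytes"
signature_value = sign(tbs_certificate)

print(verify(tbs_certificate, signature_value))             # True
print(verify(tbs_certificate, (signature_value + 1) % n))   # False
```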
Here are a few things to note:
- The key pair used to create and verify the digital signature is the CA's own public/private key pair and has nothing to do with the public key submitted by the certificate applicant.
- Digital signing is the reverse of public key encryption: the private key encrypts and the public key decrypts.
- Large CAs now all use certificate chains. The first benefit of a certificate chain is security, since the root CA's private key can be kept offline. The second benefit is ease of deployment and revocation: if a certificate has a problem, only the certificate at the corresponding level needs to be revoked, and the root certificate remains safe.
- The root CA certificate is self-signed, that is, its signature is created and verified with its own public and private keys. The signature of every other certificate in the chain is created and verified with the key pair of the certificate one level above it.
- How are the public keys of root CAs and intermediate CAs obtained, and can they be trusted? They can, because these vendors cooperate with browser and operating system makers, and their public keys are installed by default in the browser or operating system. Firefox, for example, maintains its own list of trusted CAs, while Chrome and IE use the operating system's CA list.
3.3 Data integrity
This part is easier to understand. It is similar to the MD5 signatures people commonly use, except that the security requirements are much higher. OpenSSL currently uses two kinds of integrity check algorithms: MD5 and SHA. Since collisions in MD5 are comparatively likely in practice, avoid using MD5 to verify content integrity. Within SHA, SHA0 and SHA1 should not be used either: in 2005, Professor Wang Xiaoyun of Shandong University in China announced a successful attack on the full version of SHA-1.
Both Microsoft and Google have announced that they will stop supporting SHA1-signed certificates after 2016 and 2017.
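As a minimal sketch of how an integrity check with a SHA-2 hash works, the example below computes an HMAC-SHA256 tag over a record and checks it on receipt. The key, the record contents, and the function name `is_intact` are hypothetical; in TLS the MAC key is one of the keys derived from the master secret.

```python
import hashlib
import hmac

mac_key = b"hypothetical session MAC key"
record = b"GET /s?wd=apple HTTP/1.1"

# Sender: attach a keyed digest to the record.
tag = hmac.new(mac_key, record, hashlib.sha256).digest()

def is_intact(key: bytes, data: bytes, received_tag: bytes) -> bool:
    """Receiver: recompute the tag and compare in constant time."""
    expected = hmac.new(key, data, hashlib.sha256).digest()
    return hmac.compare_digest(expected, received_tag)

print(is_intact(mac_key, record, tag))                  # True
print(is_intact(mac_key, record + b" tampered", tag))   # False
```

Because the tag depends on a secret key, an intermediary who modifies the record cannot produce a matching tag.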
4 HTTPS usage costs
The main problem with HTTPS at present is that it has not yet been deployed on a large scale and has received relatively little attention and research. As for the cost of use and the extra overhead, there is no need to worry too much.
Generally speaking, people tend to be concerned about the following issues before adopting HTTPS:
- Certificate fees and renewal maintenance. People think that applying for a certificate is troublesome and that certificates are expensive, but in fact certificates are not expensive: cheap ones cost a few dozen yuan a year, and ordinary ones a few hundred yuan at most. There are now also free certificate authorities, such as the well-known Mozilla-sponsored project Let's Encrypt (https://letsencrypt.org/), which supports free certificate installation and automatic renewal. The project is expected to go into formal operation in the middle of this year.
The cost of digital certificates is in fact not high. Small and medium-sized websites can use cheap or even free certificate services (which may carry some security risk), while certificates from well-known companies such as VeriSign generally cost several thousand to tens of thousands of yuan a year. Of course, a company with a large demand for certificates and high customization requirements can set up its own CA, as Google does, and issue its own certificates for free.
- HTTPS reduces user access speed. HTTPS does slow things down somewhat, but with reasonable optimization and deployment its impact on speed is entirely acceptable. In many scenarios HTTPS is just as fast as HTTP, and with SPDY, HTTPS can even be faster than HTTP.
Everyone is now using Baidu's HTTPS secure search. Does it feel slow?
- HTTPS consumes CPU resources and requires many more machines. As described earlier, asymmetric key exchange is a heavy consumer of CPU, and symmetric encryption and decryption also need CPU time.
Likewise, with reasonable optimization, the machine cost of HTTPS will not increase noticeably. For small and medium-sized websites, performance requirements can be met without adding machines at all.