Disclaimer: This series of articles (about four in total) is reposted from the cool network, with my own changes and comments added in places.
Objective
Baidu has recently launched full-site HTTPS secure search, and by default redirects HTTP requests to HTTPS. This article focuses on the HTTPS protocol and briefly describes the significance of deploying HTTPS across the whole site.
HTTPS Protocol Overview
HTTPS can be thought of as HTTP + TLS. The HTTP protocol is familiar to everyone; most web applications and websites today transmit their data over HTTP.
TLS is the Transport Layer Security protocol. Its predecessor is the SSL protocol, first released by Netscape in 1995. In 1999, after discussion and standardization by the IETF, it was renamed TLS. Unless specifically stated otherwise, SSL and TLS refer to the same protocol in this article.
The positions of HTTP and TLS in the protocol stack, and the composition of the TLS protocol, are as follows:
The TLS protocol has five main parts: the application data protocol, the handshake protocol, the alert protocol, the change cipher spec protocol, and the heartbeat protocol.
The TLS protocol itself is carried by the record protocol, and the record format is shown at the far right of the figure.
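Since the figure with the record format is not reproduced here, the following is a minimal sketch of how a record header can be read. It assumes the 5-byte header layout used by TLS 1.0 to 1.2 (1-byte content type, 2-byte version, 2-byte length); the function name `parse_record` and the sample bytes are made up for illustration.

```python
import struct

# Content types of the five sub-protocols carried by the record protocol.
CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
    24: "heartbeat",
}


def parse_record(data: bytes):
    """Split one TLS record into (type name, (major, minor), payload)."""
    # Header: 1-byte content type, 2-byte version, 2-byte payload length.
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    payload = data[5:5 + length]
    if len(payload) != length:
        raise ValueError("truncated record")
    return CONTENT_TYPES.get(content_type, "unknown"), (major, minor), payload


# A hand-made record: handshake type, version 3.3 (TLS1.2), 5-byte payload.
sample = b"\x16\x03\x03\x00\x05hello"
print(parse_record(sample))
```

The payload here is just placeholder bytes; in a real connection it would be a handshake message or encrypted application data.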
The commonly used HTTP version is HTTP1.1. The common TLS protocol versions are TLS1.2, TLS1.1, TLS1.0 and SSL3.0. SSL3.0 has been proven insecure by the POODLE attack, yet statistics show that slightly under 1% of browsers still use SSL3.0. TLS1.0 also has some security weaknesses, such as the RC4 and BEAST attacks.
TLS1.2 and TLS1.1 have no known security vulnerabilities for now and are relatively secure. They also come with many extensions that improve speed and performance, so they are the recommended versions.
One thing worth watching is TLS1.3, which will be a very significant reform of the TLS protocol. Both security and user access speed will improve qualitatively. However, there is no definite release date yet.
HTTP2 has also been formally finalized. The protocol evolved from SPDY and is a very significant change compared with HTTP1.1; it can significantly improve the efficiency of application-layer data transfer.
HTTPS Feature Introduction
Baidu uses the HTTPS protocol mainly to protect user privacy and to prevent traffic hijacking.
HTTP is transmitted in plaintext, without any security processing. For example, when a user searches Baidu for a keyword such as "Apple phone", an intermediary can see this information in full and may then phone the user with harassing calls. Some users have also complained that, while using Baidu, a very long and large advertisement floated over the home page or the results page; this was certainly inserted into the page by an intermediary. If the hijacking technique is crude, users may not even be able to reach Baidu at all.
The intermediaries mentioned here are mainly network nodes that user data must pass through on its way between the browser and the Baidu server, such as WiFi hotspots, routers, firewalls, reverse proxies, cache servers and so on.
Under HTTP, an intermediary can sniff the user's search content, steal private information and even tamper with web pages. HTTPS is the bane of these hijacking behaviours and can defend against them effectively.
Overall, the HTTPS protocol provides three powerful features to combat the hijacking described above:
Content encryption. Content between the browser and the Baidu server is transmitted in encrypted form, so an intermediary cannot directly view the original content.
Identity authentication. This ensures that the user is really accessing the Baidu service. Even if DNS is hijacked to a third-party site, the user is warned that the site is not the Baidu service and that the connection may have been hijacked.
Data integrity. This prevents content from being impersonated or tampered with by a third party.
How does HTTPS achieve the three points above? Here is a brief introduction to the principles.
HTTPS Principle Introduction
Content encryption
Encryption algorithms are generally divided into two types, symmetric and asymmetric encryption. Symmetric encryption (also known as key encryption) means that encryption and decryption use the same key. Asymmetric encryption (also known as public-key encryption) means that encryption and decryption use different keys.
Symmetric and Asymmetric encryption
Symmetric content encryption is very strong and generally cannot be cracked. Its big problem is that the key cannot be generated and stored safely. If every session between the client software and the server used the same fixed key for encryption and decryption, there would be a serious security risk. If someone obtained the symmetric key from the client side, all content would be exposed, and managing a huge number of client-side keys is also a complex matter.
Asymmetric encryption is mainly used for key exchange (also called key negotiation), which solves this problem well. For each new session, the browser and the server use an asymmetric key exchange algorithm to negotiate symmetric keys, and then use those symmetric keys to encrypt, decrypt and verify the application data. The session keys are generated and held only in memory, and each session has different symmetric keys (unless the session is reused), so an intermediary cannot steal them.
Asymmetric key exchange is secure, but it is also the "culprit" behind the severe loss of HTTPS performance and speed. To understand why HTTPS affects speed and why it consumes resources, you must understand the whole asymmetric key exchange process.
The following focuses on the mathematical principles of asymmetric key exchange and its application in the TLS handshake.
Asymmetric key exchange
Before asymmetric key exchange algorithms appeared, a big problem with symmetric encryption was that there was no safe way to generate and store the key. The asymmetric key exchange process exists mainly to solve this problem, making the generation and use of symmetric keys more secure.
Key exchange algorithms themselves are quite complex; the key exchange process involves random number generation, modular exponentiation, padding, encryption, signing and other operations.
Common key exchange algorithms include RSA, ECDHE, DH and DHE. Their characteristics are as follows:
RSA: The algorithm is simple to implement, was born in 1977, has a long history, and has withstood a long period of cryptanalysis, so its security is high. The disadvantage is that it needs relatively large primes (a 2048-bit key is common today) to guarantee security strength, which consumes a lot of CPU. RSA is currently the only algorithm that can be used for both key exchange and certificate signing.
DH: The Diffie-Hellman key exchange algorithm. It appeared early (the original paper was published in 1976). The disadvantage is that it is relatively CPU-intensive.
ECDHE: A DH algorithm that uses elliptic curves (ECC). Its advantage is that it can reach the same security level as RSA with much smaller numbers (256 bits). The disadvantages are that the algorithm is complex to implement, that its history in key exchange is short, and that it has not yet withstood long-term security attack testing.
ECDH: Does not support PFS, so its security is lower, and it cannot be used with false start.
DHE: Does not support ECC. Very CPU-intensive.
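To make the DH family in the list above more concrete, here is a toy sketch of a finite-field Diffie-Hellman exchange. The parameters `P = 23` and `G = 5` and the helper names are assumptions chosen for readability; real deployments use far larger groups or elliptic curves. Generating a fresh private value for every session is what the "E" (ephemeral) in DHE and ECDHE refers to, and it is the basis of PFS.

```python
import secrets

# Toy public parameters (hypothetical, far too small for real use).
P = 23  # prime modulus
G = 5   # generator


def dh_keypair(private=None):
    """Return (private, public). A fresh random private value models DHE."""
    if private is None:
        private = secrets.randbelow(P - 3) + 2  # value in 2 .. P-2
    return private, pow(G, private, P)


def dh_shared(my_private, peer_public):
    """Combine our private value with the peer's public value."""
    return pow(peer_public, my_private, P)


# Fixed private values so the numbers can be followed by hand.
a, A = dh_keypair(6)    # browser side: A = 5^6 mod 23 = 8
b, B = dh_keypair(15)   # server side:  B = 5^15 mod 23 = 19

# Only A and B cross the network, yet both sides reach the same secret.
assert dh_shared(a, B) == dh_shared(b, A) == 2
```

An intermediary sees only `P`, `G`, `A` and `B`; recovering the shared secret from those is the discrete logarithm problem, which is only easy here because the numbers are tiny.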
We recommend giving priority to the RSA and ECDHE_RSA key exchange algorithms. The reasons are:
1, ECDHE supports ECC acceleration and computes faster. It supports PFS and is more secure. It supports false start, so user access is faster.
2, at least 20% of clients do not support ECDHE. For them we recommend RSA rather than DH or DHE, because the DH family of algorithms is very CPU-intensive (equivalent to performing two RSA computations).
Note that ECDHE key exchange usually means ECDHE_RSA by default: ECDHE is used to generate the public and private keys required by the DH algorithm, the RSA algorithm is then used to sign them, and finally the symmetric key is computed.
Asymmetric encryption is more secure than symmetric encryption, but it has two obvious drawbacks:
1, CPU consumption is very large. In a full TLS handshake, the asymmetric decryption in the key exchange accounts for more than 90% of the time of the whole handshake. The cost of symmetric encryption is only about 0.1% of that of asymmetric encryption, so if application-layer data also used asymmetric encryption and decryption, the performance overhead would be unbearable.
2, the asymmetric encryption algorithm limits the length of the content to be encrypted, which cannot exceed the public key length. For example, the common public key length today is 2048 bits, which means the content to be encrypted cannot exceed 256 bytes.
Therefore, public key encryption can only be used for key exchange or content signing, and is not suitable for encrypting and decrypting the content transmitted by the application layer.
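The length limit in the second point is simple arithmetic, sketched below. The padding overheads (11 bytes for PKCS #1 v1.5 and 2*hLen + 2 bytes for OAEP) are not from this article; they are taken from the PKCS #1 specification, and the function name is hypothetical.

```python
def max_plaintext_bytes(modulus_bits: int, padding: str = "none") -> int:
    """Upper bound on the message size for a single RSA encryption."""
    k = modulus_bits // 8          # modulus length in bytes
    if padding == "none":          # textbook RSA; the value must also be < n
        return k
    if padding == "pkcs1v15":      # PKCS #1 v1.5 uses at least 11 bytes of padding
        return k - 11
    if padding == "oaep-sha1":     # OAEP overhead is 2 * hLen + 2, with hLen = 20
        return k - 2 * 20 - 2
    raise ValueError(f"unknown padding: {padding}")


print(max_plaintext_bytes(2048))              # 256
print(max_plaintext_bytes(2048, "pkcs1v15"))  # 245
print(max_plaintext_bytes(2048, "oaep-sha1")) # 214
```

Either way the budget is a couple of hundred bytes, which is enough for a 48-byte premaster secret but not for page content.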
The asymmetric key exchange algorithm is the cornerstone of HTTPS security as a whole, and fully understanding it is the key to understanding the HTTPS protocol and its features.
The following briefly introduces RSA and ECDHE and their application in the key exchange process.
RSA key negotiation
RSA algorithm introduction
The security of the RSA algorithm rests on the fact that multiplication is hard to reverse, that is, large numbers are difficult to factor. The derivation and implementation of RSA involve Euler's function, Fermat's theorem and the concept of the modular inverse; interested readers can search Baidu for them.
The RSA algorithm is one of the most important algorithms ruling the world, and as things stand, RSA is also the most important algorithm in the HTTPS system, bar none.
The RSA calculation steps are as follows:
1, randomly pick two primes p and q; suppose p = 13 and q = 19. Then n = p * q = 13 * 19 = 247;
2, φ(n) denotes the number of positive integers up to n that are coprime to n. If n is the product of two primes, then φ(n)=(p-1)(q-1). Pick a number e that satisfies 1< e <φ(n) and is coprime to φ(n); suppose e = 17;
3, compute d, the modular inverse of e with respect to φ(n), that is, ed≡1 mod φ(n). From e = 17 and φ(n) = 216 we obtain d = 89;
4, with e and d found, suppose the plaintext is m = 135 and the ciphertext is denoted by c. Then encryption and decryption are calculated as follows:
c ≡ m^e mod(n) = 135^17 mod 247 = 200
m ≡ c^d mod(n) = 200^89 mod 247 = 135
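The steps above can be replayed directly with Python's built-in modular arithmetic. This is only a sketch using the same toy numbers (p = 13, q = 19, e = 17, m = 135); the three-argument `pow` with a negative exponent needs Python 3.8 or later.

```python
from math import gcd

# Step 1: two (toy) primes and their product.
p, q = 13, 19
n = p * q                      # 247

# Step 2: Euler's function of n, and a public exponent coprime to it.
phi = (p - 1) * (q - 1)        # 216
e = 17
assert gcd(e, phi) == 1

# Step 3: d is the modular inverse of e with respect to phi(n).
d = pow(e, -1, phi)            # 89, since 17 * 89 = 1513 = 7 * 216 + 1

# Step 4: encrypt with the public pair (n, e), decrypt with (n, d).
m = 135
c = pow(m, e, n)               # ciphertext
recovered = pow(c, d, n)       # back to the plaintext

print(n, phi, d, c, recovered)  # 247 216 89 200 135
```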
In practice, (n,e) forms the public key pair and (n,d) forms the private key pair, where n and d are large numbers close to 2^2048. Even with today's very powerful CPUs, computing m≡c^d mod(n) still consumes considerable computational resources and time.
The public key pair (n, e) is generally recorded in the certificate, where anyone can view it directly. In the public key of Baidu's certificate, for example, the last 6 digits (010001) converted to decimal are 65537, which is e in the public key pair. Choosing a relatively small value for e has two advantages:
1, from c=m^e mod(n) it follows that the smaller e is, the fewer computing resources the client CPU consumes.
2, it makes the server side harder to crack. Because e is relatively small, d in the private key pair must be very large. The space of possible values of d is therefore very large, which increases the difficulty of cracking.
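The first advantage can be illustrated by counting operations. The sketch below assumes plain left-to-right square-and-multiply exponentiation (real libraries use more refined methods), and `modexp_cost` is a hypothetical helper name.

```python
# The hexadecimal digits 010001 from the certificate are the exponent e.
e = int("010001", 16)
assert e == 65537 == 2**16 + 1


def modexp_cost(exponent: int):
    """Count (squarings, multiplications) for square-and-multiply."""
    bits = bin(exponent)[2:]
    squarings = len(bits) - 1              # one per bit after the leading bit
    multiplications = bits.count("1") - 1  # one per additional set bit
    return squarings, multiplications


print(modexp_cost(e))            # (16, 1)
print(modexp_cost(2**2048 - 1))  # (2047, 2047): worst case for a 2048-bit exponent
```

With e = 65537 the client needs 17 modular operations, while the server's exponent d is as long as n itself and costs thousands, which is why the private-key side dominates handshake CPU time.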
Why is it safe to disclose (n,e) as the public key, so that anyone can read it directly from the certificate? The analysis is as follows:
Because ed≡1 mod φ(n), to derive the private key d from e and n you must know φ(n). And since φ(n)=(p-1)*(q-1), you must find p and q before the private key d can be determined. But when n is large enough (for example, close to 2^2048), even the fastest CPUs today cannot perform this factorization, that is, they cannot work out which p and q were multiplied to give n. So even though the public key is known, the whole encryption and decryption process is very secure.
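The same argument can be seen from the attacker's side. With the toy modulus n = 247 used earlier, trial division finds p and q at once and the private exponent falls out. The helper names below are hypothetical, and the loop is the naive method, shown only to make the dependence on factoring visible.

```python
def factor_semiprime(n: int):
    """Naive trial division; only feasible because n is tiny."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    raise ValueError("n has no non-trivial factor")


def recover_private_exponent(n: int, e: int) -> int:
    """Derive d from the public pair (n, e) by factoring n first."""
    p, q = factor_semiprime(n)
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)  # Python 3.8+


print(factor_semiprime(247))              # (13, 19)
print(recover_private_exponent(247, 17))  # 89
```

For a 2048-bit n the loop would need on the order of 2^1024 iterations, and no known classical factoring method brings that within reach.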
RSA key negotiation in the handshake process
How is the symmetric key required for the final session generated during the handshake? And what role does RSA play in it?
Taking TLS1.2 as an example, here is a brief description that omits handshake messages unrelated to key exchange. The process is as follows:
1, the browser sends client_hello, which contains a random number random1.
2, the server replies with server_hello, which contains a random number random2, and also replies with certificate, carrying the certificate public key P.
3, after receiving random2, the browser can generate premaster_secret and master_secret. premaster_secret is 48 bytes long: the first 2 bytes are the protocol version number and the remaining 46 bytes are filled with a random number. The structure is as follows:
Struct {byte Version[2]; byte random[46];}
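As a small illustration of this layout, the sketch below builds such a value. The function name is hypothetical, and the default version bytes (3, 3), which denote TLS1.2, are an assumption for the example.

```python
import os


def make_premaster_secret(version=(3, 3)) -> bytes:
    """2 bytes of protocol version followed by 46 random bytes."""
    return bytes(version) + os.urandom(46)


premaster_secret = make_premaster_secret()
print(len(premaster_secret), premaster_secret[:2].hex())  # 48 0303
```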
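The master_secret generation algorithm is summarized as follows:
Master_key = PRF(premaster_secret, "master secret", random1 + random2)
where PRF is a pseudorandom function defined as:
PRF(secret, label, seed) = P_MD5(S1, label + seed) XOR P_SHA-1(S2, label + seed)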
As can be seen from the above, with premaster_secret assigned to secret, "master secret" assigned to label, and the two random numbers from the browser and the server used as the seed, a 48-byte random value can be derived.
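The derivation just described can be sketched with the standard `hmac` module. This follows the MD5 XOR SHA-1 form in which the formula is commonly written, which is the PRF of TLS 1.0 and 1.1; TLS1.2 itself replaces it with a single P_SHA256 expansion. The splitting of the secret into halves S1 and S2 and the chained HMAC expansion follow the TLS specification rather than this article, and the function names are made up for the example.

```python
import hmac
import os


def p_hash(hash_name: str, secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash expansion: chain HMACs until `length` bytes are available."""
    out = b""
    a = seed                                         # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hash_name).digest()  # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + seed, hash_name).digest()
    return out[:length]


def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """P_MD5(S1, label + seed) XOR P_SHA-1(S2, label + seed)."""
    half = (len(secret) + 1) // 2
    s1, s2 = secret[:half], secret[-half:]           # first and last halves
    md5_part = p_hash("md5", s1, label + seed, length)
    sha_part = p_hash("sha1", s2, label + seed, length)
    return bytes(x ^ y for x, y in zip(md5_part, sha_part))


def master_secret(premaster: bytes, random1: bytes, random2: bytes) -> bytes:
    """Derive the 48-byte master secret from the three handshake inputs."""
    return prf(premaster, b"master secret", random1 + random2, 48)


premaster = bytes([3, 3]) + os.urandom(46)
random1, random2 = os.urandom(32), os.urandom(32)
print(len(master_secret(premaster, random1, random2)))  # 48
```

Because the function is deterministic, any party that holds the same three inputs obtains the same 48 bytes, which is the property the handshake relies on.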
The master_secret yields six parts: the keys for verifying content consistency, the keys for symmetric content encryption and decryption, and the initialization vectors (for CBC mode), with one of each for the client and one for the server.
At this point, the browser-side key has been negotiated.
4, the browser encrypts premaster_secret with the certificate public key P and sends it to the server.
5, the server decrypts it with the private key to obtain premaster_secret. Because the server already received random1 earlier, it runs the same generation algorithm on the same inputs and obtains the same master_secret.
The RSA key negotiation handshake process is illustrated as follows:
As you can see, the key negotiation process requires 2 RTTs, which is also an important reason why HTTPS is slow. The core role of RSA is to encrypt and decrypt premaster_secret. An intermediary cannot break the RSA algorithm and so cannot learn premaster_secret, which guarantees the security of the key negotiation process.
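The five steps can be simulated end to end in a few lines. Everything below is a simplified sketch under stated assumptions: the server key is textbook RSA (no padding) built from two known Mersenne primes purely so that the 48-byte secret fits, and `derive` is a stand-in for the real PRF, not the TLS function. None of this is suitable for real traffic.

```python
import hashlib
import hmac
import os

# Toy server key pair (hypothetical): two known Mersenne primes, textbook RSA.
PRIME_P, PRIME_Q = 2**521 - 1, 2**607 - 1
N = PRIME_P * PRIME_Q                           # public modulus
E = 65537                                       # public exponent
D = pow(E, -1, (PRIME_P - 1) * (PRIME_Q - 1))   # private exponent (Python 3.8+)


def derive(premaster: bytes, random1: bytes, random2: bytes) -> bytes:
    """Stand-in for the TLS PRF (NOT the real one): one HMAC-SHA256 call."""
    return hmac.new(premaster, b"master secret" + random1 + random2,
                    hashlib.sha256).digest()


# Steps 1 and 2: client_hello carries random1, server_hello carries random2.
random1, random2 = os.urandom(32), os.urandom(32)

# Step 3: the browser builds premaster_secret and derives its master secret.
premaster = bytes([3, 3]) + os.urandom(46)
client_master = derive(premaster, random1, random2)

# Step 4: the browser encrypts premaster_secret with the public key (N, E).
ciphertext = pow(int.from_bytes(premaster, "big"), E, N)

# Step 5: the server decrypts with D and derives the same master secret.
server_premaster = pow(ciphertext, D, N).to_bytes(48, "big")
server_master = derive(server_premaster, random1, random2)

assert server_master == client_master
```

Only `random1`, `random2` and `ciphertext` travel over the network, so an observer who cannot invert the RSA operation never learns `premaster`.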
ECDHE key negotiation process
Symmetric content Encryption
The asymmetric key exchange ends by producing the symmetric keys needed for this session. Symmetric encryption comes in two modes: stream encryption and block encryption. The stream cipher in common use is RC4, but RC4 is no longer secure, and Microsoft also recommends that websites avoid RC4 as far as possible.
A new stream cipher that replaces RC4 is ChaCha20, a faster and more secure encryption algorithm promoted by Google. It has been adopted by Android and Chrome and has been built into BoringSSL, Google's open-source fork of OpenSSL; Nginx 1.7.4 also supports compiling against BoringSSL.
The previously common block encryption mode was AES-CBC, but CBC has been shown to be vulnerable to the BEAST and LUCKY13 attacks. The currently recommended block encryption mode is AES-GCM, but its drawback is the heavy computation, with high performance and power costs, which makes it a poor fit for mobile phones and tablets.
Identity verification
Identity authentication mainly involves PKI and digital certificates. A PKI (Public Key Infrastructure) typically contains the following parts:
End entity: the end entity, which can be a terminal device or a website.
CA: the certificate issuing authority.
RA: the certificate registration and auditing authority, which for example reviews the authenticity of the applying website or company.
CRL issuer: responsible for publishing and maintaining the certificate revocation list.
Repository: responsible for storing and distributing digital certificates and CRL content.
Applying for a trusted digital certificate usually involves the following process:
1, the end entity generates a public/private key pair and a certificate request.
2, the RA checks the legitimacy of the entity. This step is not required for individuals or small websites.
3, the CA issues the certificate and sends it to the applicant.
4, the certificate is published to the repository; the end entity subsequently updates the certificate from the repository, queries the certificate status, and so on.
The certificate currently used by Baidu is in X.509 v3 format and consists of the following three parts:
1, tbsCertificate (the "to be signed" certificate content). This part contains 10 elements: version number, serial number, signature algorithm identifier, issuer name, validity period, subject name, subject public key information, issuer unique identifier, subject unique identifier, and extensions.
2, signatureAlgorithm, the signature algorithm identifier, which specifies the algorithm used to sign tbsCertificate.
3, signatureValue (the signature value), computed over tbsCertificate using signatureAlgorithm.
Digital certificates have two functions:
1, identity authorization. This makes sure that the website the browser is accessing is a trusted site verified by the CA.
2, public key distribution. Each digital certificate contains the public key generated by the registrant, which is sent to the client in the certificate message during the SSL handshake. The RSA certificate public key encryption and the ECDHE signature mentioned earlier both use this public key.
The applicant obtains the certificate from the CA and deploys it on the website's servers. When the browser starts a handshake and receives the certificate, how does it confirm that the certificate was issued by the CA? How can a third party be prevented from forging the certificate?
The answer is the digital signature. A digital signature is the anti-forgery label of a certificate. The most widely used kind is the SHA-RSA digital signature, which is produced and verified as follows:
1, issuing the digital signature. First a hash function is applied to the content to be signed, producing a message digest, and then the message digest is encrypted with the CA's own private key.
2, verifying the digital signature. The CA's public key is used to decrypt the signature, the same hash function is then applied to the certificate content that was signed, and the result is compared with the digest recovered from the signature. If the two are the same, verification succeeds.
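The issue-and-verify flow can be sketched as follows. The CA key here is a hypothetical toy: textbook RSA without padding, built from two known Mersenne primes, with SHA-256 as the hash. Real certificate signatures use a standardized padding scheme and properly generated keys, so this shows only the shape of the procedure.

```python
import hashlib

# Toy CA key pair (hypothetical, illustration only).
PRIME_P, PRIME_Q = 2**521 - 1, 2**607 - 1
CA_N, CA_E = PRIME_P * PRIME_Q, 65537
CA_D = pow(CA_E, -1, (PRIME_P - 1) * (PRIME_Q - 1))  # Python 3.8+


def digest_as_int(content: bytes) -> int:
    """Message digest of the content, as an integer smaller than CA_N."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big")


def sign(content: bytes) -> int:
    """Issue: hash the content, then apply the CA private key to the digest."""
    return pow(digest_as_int(content), CA_D, CA_N)


def verify(content: bytes, signature: int) -> bool:
    """Verify: recover the digest with the CA public key and compare."""
    return pow(signature, CA_E, CA_N) == digest_as_int(content)


tbs = b"subject=www.example.com; public key=..."  # placeholder certificate body
signature = sign(tbs)
print(verify(tbs, signature))                               # True
print(verify(tbs.replace(b"example", b"evil"), signature))  # False
```

A forger who changes the certificate body cannot produce a matching signature without `CA_D`, which is why the browser only needs the CA's public key to detect the forgery.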
Data integrity
This part is easier to understand: it is similar to the familiar MD5 signature, but the security requirements are much higher. OpenSSL currently uses two kinds of integrity check algorithms: MD5 and SHA. Since MD5 is prone to collisions in practice, avoid using MD5 to verify content consistency. Within the SHA family, SHA0 and SHA1 should not be used either; in 2005 Professor Wang Xiaoyun of Shandong University in China announced that the full version of the SHA-1 algorithm had been broken.
Both Microsoft and Google have announced that they will no longer support SHA1-signed certificates after 2016 and 2017.
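As a minimal sketch of such a keyed integrity check, the example below tags a message with HMAC-SHA256 from the Python standard library. The key and messages are placeholders; in TLS the key would be one of the consistency-check keys negotiated during the handshake.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Keyed digest of the message (HMAC with SHA-256)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def check(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), received_tag)


key = b"session-mac-key (placeholder)"
message = b"search results page"
tag = make_tag(key, message)

print(check(key, message, tag))                    # True
print(check(key, message + b"<ad>injected", tag))  # False
```

Unlike a bare MD5 or SHA digest, the tag depends on a secret key, so an intermediary who alters the content cannot simply recompute it.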
HTTPS usage Costs
The only real problem with HTTPS at the moment is that it has not yet been deployed on a large scale, so it has received little attention and research. As for the cost of use and the extra overhead, there is no need to worry too much.
Generally speaking, you may be very concerned about the following issues before using HTTPS:
Certificate fees and renewal maintenance. Many people feel that applying for a certificate is troublesome and that certificates are expensive, but in fact certificates are not expensive: cheap ones cost a few dozen yuan a year, and at most a few hundred. There are now also free certificate authorities, such as the well-known Mozilla-sponsored free certificate project Let's Encrypt (https://letsencrypt.org/), which supports free certificate installation and automatic renewal. The project is due to go into official service in the middle of this year.
The cost of digital certificates is actually not high. Small and medium-sized websites can use cheap or even free digital certificate services (which may carry security risks), whereas certificates from a well-known company such as VeriSign generally cost thousands to tens of thousands of yuan a year. Of course, if a company needs a large number of certificates with heavy customization, it can set up its own CA, as Google has done, and issue its own certificates for free.
HTTPS reduces user access speed. HTTPS does slow things down somewhat, but as long as it is properly optimized and deployed, its impact on speed is entirely acceptable. In many scenarios HTTPS is no slower than HTTP, and with SPDY, HTTPS can even be faster than HTTP.
Everyone is now using Baidu's HTTPS secure search; does it feel slow?
HTTPS consumes CPU resources and requires many more machines. The asymmetric key exchange described earlier is a major consumer of CPU, and symmetric encryption and decryption also need CPU time.
Similarly, with reasonable optimization, the machine cost of HTTPS does not increase significantly. Small and medium-sized websites can meet their performance requirements without adding any machines.
Postscript
Many large foreign Internet companies have already enabled full-site HTTPS, and this is the future trend of the Internet. Large domestic Internet companies have not deployed HTTPS site-wide; they enable HTTPS only on some sub-pages or sub-requests involving accounts or transactions. Baidu Search is the first to deploy HTTPS across the whole site, which will greatly promote the move to full-site HTTPS across the domestic Internet.
At present there is relatively little Chinese-language material about HTTPS on the Internet. This article has focused on the important points of the HTTPS protocol and the blind spots that are usually hard to understand, in the hope of helping readers understand the protocol. Baidu's HTTPS performance optimization covers a great deal of ground, with a lot of work done on front-end pages, back-end architecture, protocol features, encryption algorithms, traffic scheduling, operations, security and other aspects. The articles in this series will describe these in turn.
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
HTTPS practices for large Web sites (i)--HTTPS protocols and principles