Topic One: Why HTTPS is secure
1. Why is HTTP not secure?
HTTP is a plaintext protocol: neither the exchange itself nor the data it carries is encrypted, and the two parties never authenticate each other. The communication is therefore easy to hijack, eavesdrop on, and tamper with. In severe cases this leads to malicious traffic hijacking, and even to serious security incidents such as the disclosure of personal information (for example, leaked bank card numbers and passwords).
HTTP communication can be likened to sending a letter. When A writes to B, the letter passes through the hands of many postmen on the way, and any of them can open it and read the contents (because HTTP is transmitted in plaintext). Anything in A's letter, including accounts and passwords of every kind, can easily be stolen. A postman can also forge or alter the letter, so that what B receives is false.
For example, during HTTP communication a "middleman" can embed advertisement links in the HTTP response that the server sends to the user, so that unwanted links appear all over the user's page. The middleman can also modify the URL in the user's request, so that the request is hijacked to another website and never reaches the real server. Either way the user does not get the right service and may even suffer heavy losses.
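The ad-injection case can be sketched in a few lines of Python. This is a toy model with no real network involved; the response content, the `ads.example` link, and the function names are all made up for illustration. The point is only that a plaintext response can be read and rewritten by anyone on the path.

```python
# Toy model of a middleman on a plaintext HTTP path. No real network is involved.

def server_response() -> bytes:
    """The response the real server sends (hypothetical content)."""
    body = b"<html><body><p>Your balance: 100</p></body></html>"
    headers = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/html\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n\r\n"
    )
    return headers + body


def middleman(response: bytes) -> bytes:
    """Read the plaintext body and inject an ad link before </body>."""
    head, _, body = response.partition(b"\r\n\r\n")
    ad = b'<a href="http://ads.example/">Click me</a>'
    tampered = body.replace(b"</body>", ad + b"</body>")
    # Fix up Content-Length so the client still sees a well-formed response.
    lines = [
        b"Content-Length: " + str(len(tampered)).encode()
        if line.lower().startswith(b"content-length:")
        else line
        for line in head.split(b"\r\n")
    ]
    return b"\r\n".join(lines) + b"\r\n\r\n" + tampered


original = server_response()
received = middleman(original)
print(received.decode())
```

The client has no way to tell `received` from a genuine response, because nothing in plain HTTP binds the content to the server that produced it.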
2. How does HTTPS ensure security?
To solve the problems caused by HTTP, encryption and authentication mechanisms are introduced.
If the messages between the server and the client are ciphertext that only the server and the client can read, the confidentiality of the data is guaranteed. If, in addition, each side verifies the other's identity before any data is exchanged, both parties know who they are talking to. So the question is: after the server encrypts the data, how can the client read it? The server has to tell the client the encryption key (a symmetric key, explained later) so that the client can use it to decrypt the ciphertext. But if the server sends the symmetric key in plaintext, a middleman can intercept it and learn the key, and the communication is still not confidential. And if the server sends the symmetric key as ciphertext, how is the client supposed to decrypt that ciphertext and obtain the key?
At this point you may be a little confused: one moment we say "key", the next "symmetric key". To clarify in advance: here "key" refers to the asymmetric encryption key, which is used during the TLS handshake; "symmetric key" refers to the symmetric encryption key, which is used to encrypt and decrypt the data transmitted afterwards. Both are explained in detail below.
This is where asymmetric encryption comes in. In an asymmetric algorithm, data encrypted with the public key can only be decrypted with the matching private key. So the server only needs to send its public key to the client, and the client can use that public key to encrypt the symmetric key that will protect the data. Even if a middleman intercepts the encrypted symmetric key on its way to the server, he cannot decrypt it, because the private key is deployed only on the server and nobody else has it. The server decrypts the client's message with its private key, obtains the symmetric key, and uses it to encrypt and decrypt the subsequent communication data. Asymmetric encryption also makes symmetric keys easy to manage: every session can use a different symmetric key, so even if a virus on the client pulls cached communication data, it cannot steal the content of normal communication.
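The exchange described above can be sketched with a toy RSA key pair. This is only an illustration under loud assumptions: the primes are tiny, there is no padding, and `xor_cipher` is a stand-in for a real symmetric cipher such as AES. None of the names below come from a real TLS library.

```python
import secrets

# Toy RSA key pair for the server. The primes are tiny and there is no padding,
# so this is for illustration only and offers no real security.
P, Q = 61, 53
N = P * Q                               # modulus, part of the public key
E = 17                                  # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent, kept on the server only


def rsa_encrypt(m: int, e: int, n: int) -> int:
    return pow(m, e, n)


def rsa_decrypt(c: int, d: int, n: int) -> int:
    return pow(c, d, n)


def xor_cipher(data: bytes, key: int) -> bytes:
    """Stand-in for a real symmetric cipher: XOR with a repeated 2-byte key."""
    key_bytes = key.to_bytes(2, "big")
    return bytes(b ^ key_bytes[i % 2] for i, b in enumerate(data))


# Client: pick a fresh symmetric key and send it encrypted with the server's public key.
symmetric_key = secrets.randbelow(N - 2) + 2
wrapped_key = rsa_encrypt(symmetric_key, E, N)

# Server: recover the symmetric key with the private key.
recovered_key = rsa_decrypt(wrapped_key, D, N)

# Both sides now protect application data with the shared symmetric key.
ciphertext = xor_cipher(b"GET /account HTTP/1.1", symmetric_key)
plaintext = xor_cipher(ciphertext, recovered_key)
print(recovered_key == symmetric_key, plaintext)
```

Only `wrapped_key` and `ciphertext` ever travel over the network; the private exponent `D` never leaves the server.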
The communication process above can be drawn as the following interaction diagram:
But this is still not enough. If the client's request is hijacked by a middleman, whether during the TCP three-way handshake or while the client is sending its HTTP request, the middleman can pose as a fake client when talking to the server and as a fake server when talking to the client. Here is how the middleman obtains the symmetric key:
When the middleman receives the public key that the server sends to the client (the "correct public key"), he does not pass it on. Instead he sends the client his own public key (the middleman has his own public/private key pair; we call this the "forged public key"). The client encrypts the symmetric key with the forged public key, so the middleman can decrypt it with his own private key and obtain the symmetric key. He then re-encrypts the symmetric key with the correct public key and sends it on to the server. Now the client, the middleman, and the server all hold the same symmetric key, and the middleman can decrypt everything the client and server encrypt from then on.
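The key swap can be walked through step by step with two toy RSA key pairs, one for the server and one for the middleman. As before this is a sketch under stated assumptions (tiny primes, no padding, made-up names), not how a real attack tool or TLS stack is written.

```python
# Toy RSA key pairs (tiny primes, no padding): illustration only.
def make_keypair(p: int, q: int, e: int = 17):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, n), (d, n)              # (public key, private key)


def encrypt(m: int, public) -> int:
    e, n = public
    return pow(m, e, n)


def decrypt(c: int, private) -> int:
    d, n = private
    return pow(c, d, n)


server_public, server_private = make_keypair(61, 53)
mitm_public, mitm_private = make_keypair(47, 59)

# 1. The server sends its public key; the middleman swaps in his own.
key_seen_by_client = mitm_public       # the "forged public key"

# 2. The client encrypts the symmetric key with whatever public key it received.
symmetric_key = 1234
from_client = encrypt(symmetric_key, key_seen_by_client)

# 3. The middleman decrypts with his private key, then re-encrypts for the server.
stolen_key = decrypt(from_client, mitm_private)
to_server = encrypt(stolen_key, server_public)

# 4. The server decrypts as usual and notices nothing.
server_key = decrypt(to_server, server_private)

print(stolen_key, server_key)
```

All three parties end up holding the same symmetric key, and neither the client nor the server has seen anything unusual.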
The process by which the middleman obtains the symmetric key is as follows:
To solve this problem, digital certificates are introduced. The server first generates a public/private key pair and submits the public key to a certificate authority (CA). The CA puts the public key into a digital certificate and issues the certificate to the server. From then on the server no longer sends the client a bare public key but a digital certificate. The certificate carries a digital signature, which ensures that the certificate really is the one issued to that server. A fake certificate sent by a middleman cannot pass CA verification, so the client can tell that the communication has been hijacked. The SSL session with CA digital signature authentication added is as follows:
To sum up the three points above: asymmetric encryption (public and private keys) to exchange the symmetric key + digital certificates to verify identity (that is, to verify that the public key is not forged) + symmetric-key encryption of the subsequently transmitted data = security
3. Introduction to the HTTPS protocol
Why only a brief introduction? Because HTTPS involves a great deal, and the encryption and decryption algorithms in particular are very complex. The author has not studied these algorithms thoroughly and understands only the basics. This part introduces the most fundamental principles of HTTPS, to lay a theoretical foundation for the later analysis of how an HTTPS connection is established, HTTPS optimization, and related topics.
3.1 Symmetric encryption algorithm
Symmetric encryption means that encryption and decryption use the same key. The sender and receiver must agree on a symmetric key before they can communicate securely. The security of a symmetric algorithm depends entirely on the key: anyone who obtains the key can decrypt the messages the two parties send and receive, so keeping the key confidential is critical.
3.1.1 Symmetric encryption has two modes: stream encryption and block encryption
Stream encryption treats the message as a stream of bytes and applies a mathematical function to each bit or byte in turn. With stream encryption, the same plaintext bits are turned into different ciphertext bits each time they are encrypted. A stream cipher uses a keystream generator; the keystream it produces is combined with the plaintext stream (typically by XOR) to produce the ciphertext.
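A minimal sketch of the idea, assuming a keystream built from SHA-256 over a key, a per-message nonce, and a counter. This generator and the nonce are assumptions made for illustration (they are not taken from any cipher named in this article), and the construction should not be used for real data.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream generator: SHA-256 over key, nonce and a block counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def stream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR the data with the keystream (XOR is its own inverse)."""
    ks = keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


key = os.urandom(16)
message = b"attack at dawn"

# A fresh nonce per message gives a different keystream, hence different ciphertext.
nonce1, nonce2 = os.urandom(8), os.urandom(8)
c1 = stream_xor(key, nonce1, message)
c2 = stream_xor(key, nonce2, message)

print(c1.hex(), c2.hex())
print(stream_xor(key, nonce1, c1))
```

Encrypting the same message twice yields two different ciphertexts, and applying the same keystream again recovers the plaintext.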
Block encryption splits the message into fixed-size blocks, which are then processed by mathematical functions one block at a time. Suppose a 64-bit block cipher is used: a 640-bit message is divided into 10 blocks of 64 bits (if the last block is shorter than 64 bits, it is padded, for example with zeros, up to 64 bits). Each block is processed by a series of mathematical operations, yielding 10 blocks of ciphertext, which are then sent to the peer. The peer must have the same block cipher and key, and applies the algorithm in reverse to decrypt the 10 ciphertext blocks and recover the plaintext message. Commonly used block ciphers are DES, 3DES, and AES. DES is an older algorithm that is now known to be insecure. 3DES is a transitional algorithm that applies DES three times to improve security, but it is still DES in essence. AES is the replacement for DES and is one of the most secure symmetric encryption algorithms available today.
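The splitting step from the 640-bit example can be sketched as follows. Only the grouping and zero padding are shown; the per-block encryption itself is left out, and the function name is made up for this sketch.

```python
from typing import List

BLOCK_BITS = 64
BLOCK_BYTES = BLOCK_BITS // 8


def split_into_blocks(message: bytes, block_size: int = BLOCK_BYTES) -> List[bytes]:
    """Split a message into fixed-size blocks, zero-padding the last one."""
    blocks = []
    for start in range(0, len(message), block_size):
        block = message[start:start + block_size]
        blocks.append(block.ljust(block_size, b"\x00"))
    return blocks


message = bytes(range(80))                   # 80 bytes = 640 bits
blocks = split_into_blocks(message)
print(len(blocks), [len(b) * 8 for b in blocks])

short = split_into_blocks(b"hello world")    # 11 bytes: 2 blocks, the second one padded
print(short)
```

Zero padding is ambiguous when the message itself ends in zero bytes, which is why practical systems tend to use a padding scheme such as PKCS#7 that records how many bytes were added.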
3.1.2 Advantages and disadvantages of symmetric encryption algorithms:
Advantages: low computational cost, fast encryption, high efficiency.
Disadvantages:
(1) Both parties use the same key, so if either side leaks it, security is no longer guaranteed;
(2) Each pair of communicating parties needs its own unique key that nobody else knows, so the number of keys each party has to hold grows rapidly as the number of peers increases, and key management becomes a burden.
3.2 Asymmetric encryption algorithm
Before asymmetric key exchange algorithms existed, the main flaw of symmetric encryption was that there was no way to transfer the symmetric key between the two parties without a middleman being able to steal it. With asymmetric key exchange, the symmetric key itself is encrypted and decrypted in transit, which makes exchanging it far more secure.
Asymmetric key exchange algorithms are very complex in themselves. The key exchange involves random number generation, modular exponentiation, padding, encryption, signing, and a series of other intricate steps, which the author has not studied thoroughly. Common key exchange algorithms include RSA, ECDHE, DH, and DHE, all built on hard mathematical problems. Of these, the most classic and most commonly used is RSA.
RSA: published in 1977 and tested by a long history of attempted attacks, the algorithm is considered highly secure and, importantly, is simple to implement. Its drawback is that it needs a large key (currently 2048 bits, built from large primes) to provide adequate security, which consumes CPU resources. Of the algorithms listed here, RSA is the only one that can be used for both key exchange and certificate signing, and it is the most classic and most common asymmetric algorithm.
3.2.1 Asymmetric encryption is more secure than symmetric encryption, but it has two fatal drawbacks:
(1) It consumes a great deal of CPU. In a full TLS handshake, the asymmetric decryption in the key exchange accounts for more than 90% of the computation of the entire handshake, while symmetric encryption costs only about 0.1% as much as asymmetric encryption. If the application-layer data transfer also used asymmetric encryption, the CPU overhead would be more than the server could bear. Experimental data from Symantec show that, for the same amount of data, an asymmetric algorithm consumes more than 1000 times the CPU resources of a symmetric algorithm.
(2) An asymmetric algorithm limits the length of the content it can encrypt, which cannot exceed the public key length. For example, with the currently common 2048-bit public key, the content to be encrypted cannot exceed 256 bytes.
Therefore asymmetric encryption, with its extreme CPU cost, is only used for symmetric key exchange and CA signatures, and is not suitable for encrypting application-layer content.
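The length limit in point (2) is simple arithmetic, sketched below. The 256-byte figure is the raw bound; once a padding scheme is applied the usable size is smaller. The two padded figures are not from the text above: they come from the standard overheads of PKCS#1 v1.5 encryption padding (11 bytes) and OAEP with a 32-byte hash. The function name is made up for this sketch.

```python
def rsa_max_plaintext_bytes(key_bits: int, padding: str = "none") -> int:
    """Upper bound on the plaintext size for a single RSA encryption."""
    k = key_bits // 8                     # modulus length in bytes
    if padding == "none":
        return k                          # the raw bound quoted in the text
    if padding == "pkcs1v15":
        return k - 11                     # PKCS#1 v1.5 encryption padding overhead
    if padding == "oaep-sha256":
        return k - 2 * 32 - 2             # OAEP with a 32-byte hash
    raise ValueError(f"unknown padding: {padding}")


for scheme in ("none", "pkcs1v15", "oaep-sha256"):
    print(scheme, rsa_max_plaintext_bytes(2048, scheme))
```

Either way the limit is far too small for application data, but comfortably large enough for a symmetric key of 16 or 32 bytes.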
3.3 Identity verification
Identity authentication in HTTPS is done with CA digital certificates. A certificate consists of a public key, the certificate body, a digital signature, and so on. After the client initiates an SSL request, the server sends its digital certificate to the client, and the client verifies the certificate (that is, checks whether the certificate, and hence the public key, is forged). If the certificate is genuine, the client takes the public key from it and uses it as the asymmetric key for the symmetric key exchange.
3.3.1 Digital certificates have three functions:
1. Identity authorization. Ensures that the website the browser is visiting is a trusted site verified by a CA.
2. Public key distribution. Each digital certificate contains the public key generated by the registrant (the certificate lets the client verify that the public key is legitimate and not forged). During the SSL handshake it is delivered to the client in the Certificate message.
3. Certificate validity verification. After the client receives a digital certificate, it validates the certificate. Subsequent communication proceeds only if the certificate passes validation.
3.3.2 Applying for a trusted CA digital certificate usually goes as follows:
(1) The company (entity) generates a public/private key pair on its server, together with a CA digital certificate request.
(2) The RA (the certificate registration and auditing authority) checks the legitimacy of the entity (for example, that it is a formally registered company).
(3) The CA (the certificate issuing authority) issues the certificate and sends it to the applicant.
(4) The certificate is published to the Repository (which stores and distributes digital certificates and CRLs); entity terminals update certificates from the Repository, query certificate status, and so on.
3.4 Digital certificate verification
After the applicant obtains the certificate from the CA and deploys it on the web server, how does the browser confirm, once it starts the handshake and receives the certificate, that the certificate really was issued by the CA? How is a third party prevented from forging it? The answer is the digital signature. The digital signature is the certificate's anti-forgery label; the most widely used scheme at present is SHA-RSA (SHA as the hash algorithm, RSA as the asymmetric algorithm). Digital signatures are created and verified as follows:
1. Signing. A hash function is first applied to the content to be signed to produce a message digest, and the digest is then encrypted with the CA's own private key.
2. Verification. The signature is decrypted with the CA's public key to recover the digest. The same hash function is then applied to the content of the certificate being verified, and the result is compared with the decrypted digest. If the two match, verification succeeds.
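The two steps can be sketched with SHA-256 and a toy RSA key standing in for the CA's key pair. The assumptions are loud here: the key is built from two small Mersenne primes, the digest is reduced modulo the toy modulus instead of being padded, and the certificate content is a made-up string. A real SHA-RSA signature uses a key of 2048 bits or more and a standard padding scheme.

```python
import hashlib

# Toy RSA key pair standing in for the CA's keys. The two primes are the
# Mersenne primes 2**61 - 1 and 2**89 - 1; real CAs use far larger keys.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))


def digest_as_int(content: bytes) -> int:
    """SHA-256 digest reduced modulo N so that it fits the toy key."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % N


def sign(content: bytes) -> int:
    """CA side: hash the content, then apply the private key to the digest."""
    return pow(digest_as_int(content), D, N)


def verify(content: bytes, signature: int) -> bool:
    """Client side: apply the public key to the signature and compare digests."""
    return pow(signature, E, N) == digest_as_int(content)


certificate_body = b"subject=www.example.com; public_key=0xABCD"
signature = sign(certificate_body)

print(verify(certificate_body, signature))
print(verify(b"subject=evil.example; public_key=0xABCD", signature))
```

Changing even one byte of the signed content changes the digest, so the signature no longer verifies.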
It is important to note that:
(1) The key pair used to issue and verify the digital signature is the CA's own public and private key. It has nothing to do with the public key submitted by the certificate applicant (the company that submitted the certificate request).
(2) Signing runs in the opposite direction to public-key encryption: the private key encrypts and the public key decrypts. (For a given key pair, content encrypted with the public key can only be decrypted with the private key; conversely, content encrypted with the private key can only be decrypted with the public key.)
(3) Large CAs now use certificate chains, which bring two benefits. The first is security: the root CA's private key can be kept offline. The second is ease of deployment and revocation. Why revocation? Because if a certificate in the chain has a problem (it has been tampered with or compromised), only the certificate at that level needs to be revoked, and the root certificate remains secure.
(4) The root CA certificate is self-signed, that is, its signature is created and verified with its own private and public keys. Every other certificate in the chain is signed and verified with the key pair of the certificate one level above it.
(5) How do root CAs and intermediate CAs get their key pairs? And since root certificates are self-signed and self-certified, are they secure and trustworthy? The answer is yes: these vendors work with browser and operating system makers, and their root public keys are installed by default in the browser or operating system.
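Points (3) to (5) can be put together in a small chain-validation sketch. Everything here is a simplified model built for illustration: the `Cert` class, the toy RSA keys made from Mersenne primes, and the `trusted_roots` dictionary standing in for the root store of a browser or operating system. Real certificates are X.509 structures and real validation checks much more than signatures.

```python
import hashlib
from dataclasses import dataclass

E = 65537


def make_key(p: int, q: int):
    """Toy RSA key from two known primes: returns (modulus, private exponent)."""
    return p * q, pow(E, -1, (p - 1) * (q - 1))


def digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


@dataclass
class Cert:
    subject: str
    issuer: str
    public_n: int          # the subject's public modulus
    signature: int = 0     # the issuer's signature over body()

    def body(self) -> bytes:
        return f"{self.subject}|{self.issuer}|{self.public_n}".encode()


def issue(subject: str, subject_n: int, issuer: str, issuer_n: int, issuer_d: int) -> Cert:
    cert = Cert(subject, issuer, subject_n)
    cert.signature = pow(digest(cert.body(), issuer_n), issuer_d, issuer_n)
    return cert


def verify_chain(chain, trusted_roots) -> bool:
    """chain[0] is the site certificate, chain[-1] the self-signed root."""
    for cert, issuer_cert in zip(chain, chain[1:] + [chain[-1]]):
        n = issuer_cert.public_n
        if cert.issuer != issuer_cert.subject:
            return False
        if pow(cert.signature, E, n) != digest(cert.body(), n):
            return False
    root = chain[-1]
    return trusted_roots.get(root.subject) == root.public_n


# Mersenne primes are used only because they are easy to write down.
root_n, root_d = make_key(2**61 - 1, 2**89 - 1)
mid_n, mid_d = make_key(2**107 - 1, 2**127 - 1)
site_n, _ = make_key(2**521 - 1, 2**607 - 1)

root = issue("Root CA", root_n, "Root CA", root_n, root_d)          # self-signed
mid = issue("Intermediate CA", mid_n, "Root CA", root_n, root_d)
site = issue("www.example.com", site_n, "Intermediate CA", mid_n, mid_d)

trusted_roots = {"Root CA": root_n}     # preinstalled in the browser or OS
print(verify_chain([site, mid, root], trusted_roots))
```

The self-signed root proves nothing by itself; the chain is accepted only because the root's public key is already present in `trusted_roots`.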
3.5 Data integrity verification
The integrity of data in transit is ensured with a MAC algorithm. To prevent data from being tampered with or corrupted on the network, SSL uses a MAC algorithm based on MD5 or SHA to guarantee message integrity (MD5 is more prone to collisions in real-world use, so avoid using MD5 to verify content consistency). A MAC algorithm is a keyed digest algorithm: it turns a key plus data of any length into a fixed-length value. The sender uses the key and the MAC algorithm to compute the MAC value of the message and appends it to the message before sending it to the receiver. The receiver computes the MAC value of the message with the same key and MAC algorithm and compares it with the MAC value it received. If the two are equal, the message has not changed; otherwise the message was modified or corrupted in transit and the receiver discards it. Among the SHA family, SHA-0 and SHA-1 should not be used either. In 2005 Professor Wang Xiaoyun of Shandong University (a remarkable researcher; look up her work online if you are interested) announced a collision attack on the full SHA-1 algorithm, which was recognized by experts in the field. Both Microsoft and Google have announced that they will stop supporting certificates signed with SHA-1 after 2016 and 2017.
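The sender and receiver steps can be sketched with HMAC-SHA256 from Python's standard library. The message content and the function names `protect` and `check` are made up for this sketch, and the framing (MAC simply appended to the message) is a simplification of what an SSL/TLS record actually looks like.

```python
import hashlib
import hmac
import os

MAC_LEN = hashlib.sha256().digest_size    # 32 bytes


def protect(key: bytes, message: bytes) -> bytes:
    """Sender: append HMAC-SHA256(key, message) to the message."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return message + tag


def check(key: bytes, packet: bytes) -> bytes:
    """Receiver: recompute the MAC and compare; raise if it does not match."""
    message, tag = packet[:-MAC_LEN], packet[-MAC_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC mismatch: message modified or corrupted, discarding")
    return message


key = os.urandom(32)
packet = protect(key, b"transfer 100 to alice")
print(check(key, packet))

# A middleman changes the amount but cannot recompute the MAC without the key.
tampered = b"transfer 900 to alice" + packet[-MAC_LEN:]
try:
    check(key, tampered)
except ValueError as err:
    print(err)
```

`hmac.compare_digest` is used for the comparison because it runs in constant time, which avoids leaking how many leading bytes of the MAC matched.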
Topic Two: Actual packet capture analysis
For this article, two packet captures were made of a Baidu search. Before the first capture, all of the browser's cache was cleared; the second capture was taken half a minute after the first.
In 2015 Baidu completed the move of Baidu search to site-wide HTTPS, a significant milestone for HTTPS adoption in China (of the three BAT companies, Baidu, Alibaba, and Tencent, only Baidu currently claims to have completed site-wide HTTPS). That is why this article uses www.baidu.com as the example.
The author uses the Chrome browser. Chrome supports SNI (Server Name Indication), a feature that is useful for HTTPS performance optimization.
Note: SNI is an SSL/TLS extension designed for servers that host multiple domain names and certificates. In a nutshell, the client sends the domain name (hostname) it wants to visit before the SSL connection is established, so that the server can return the certificate that matches that domain name. Most operating systems and browsers now support the SNI extension well; OpenSSL 0.9.8 already has it built in, and recent versions of Nginx and Apache support it too.
URL visited for this capture: http://www.baidu.com/
(If you visit https://www.baidu.com/ instead, the results below will be different!)
Packet capture results:
As you can see, Baidu adopts the following strategies:
(1) For newer browsers that support HTTPS with a protocol version above TLS 1.0, all HTTP requests are redirected to HTTPS
(2) For HTTPS requests, the same is true.
"Detailed analysis"
1. TCP three-way handshake
As you can see, my computer is accessing http://www.baidu.com/, and in the initial three-way handshake the client connects to port 8080. (The network exit of my residential compound goes through a proxy, so the client actually performs the three-way handshake with the proxy, and the proxy connects to the Baidu server on the client's behalf.)
2. Tunnel establishment
Because the compound's gateway is configured for proxy access, the client has to set up an "HTTPS CONNECT tunnel" to the proxy when it accesses an HTTPS site. (You can think of it this way: strictly speaking, it is the proxy that carries the subsequent HTTPS traffic to and from the Baidu server, including the public/private key exchange, the symmetric key exchange, and the data communication; but once the tunnel is established, the client can be regarded as communicating directly with the Baidu server.)
Fiddler capture results:
3. Client Hello
3.1 Random number
The Client Hello contains four bytes that record the client's Coordinated Universal Time (UTC) in UNIX time format, that is, the number of seconds that have elapsed since January 1, 1970. In this capture the value is 0x2516b84b. It is followed by 28 random bytes (random_c), which we will use later in the process.
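The layout of this field (4 bytes of UNIX time followed by 28 random bytes) can be sketched as follows. The function names are made up for illustration, and the 28 zero bytes in the second example are placeholders, since the random bytes from the capture are not reproduced in the text.

```python
import os
import struct
import time


def build_client_random() -> bytes:
    """4-byte big-endian UNIX time followed by 28 random bytes (32 bytes total)."""
    return struct.pack(">I", int(time.time())) + os.urandom(28)


def parse_random(random_field: bytes):
    """Split the 32-byte field into (unix_time, random_bytes)."""
    (unix_time,) = struct.unpack(">I", random_field[:4])
    return unix_time, random_field[4:]


random_c = build_client_random()
print(len(random_c), parse_random(random_c)[0])

# The timestamp bytes from the capture above, followed by placeholder random bytes.
captured = bytes.fromhex("2516b84b") + bytes(28)
print(hex(parse_random(captured)[0]))
```

The Server Hello carries a field with the same layout, so `parse_random` applies to it as well.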
3.2 SID (Session ID)
If the session is interrupted for some reason, a new handshake is required. To avoid the inefficiency of a full re-handshake, the session ID was introduced. The idea is simple: every session has a number (the session ID). If the session is interrupted, then on the next connection the client only has to present that number; if the server still has a record of it, the two sides can reuse the existing "symmetric key" instead of generating a new one.
Because this capture was our first visit to https://www.baidu.com in several hours, there is no session ID here. (The second capture, taken half a minute later, does have a session ID, as we will see.)
Session ID is supported by all current browsers, but its drawback is that a session ID is usually stored on only one server. If the client's request lands on a different server, the session cannot be resumed (this is quite likely: for a busy domain there are often dozens of RS machines serving requests in the back end). Session tickets were created to solve this problem and are currently supported only by Firefox and Chrome.
3.3 Cipher Suites
RFC 2246 recommends a number of combinations, generally written as "key exchange algorithm - symmetric encryption algorithm - hash algorithm". Take "TLS_RSA_WITH_AES_256_CBC_SHA" as an example:
(a) TLS is the protocol, RSA is the key exchange algorithm;
(b) AES_256_CBC is the symmetric encryption algorithm (256 is the key length and CBC is the block mode);
(c) SHA is the hash algorithm.
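The naming convention can be made concrete with a small parser. This is a sketch that assumes names of the form TLS_<key exchange>_WITH_<cipher>_<hash>, which covers the suites discussed in this article; the `CipherSuite` type and function name are made up for illustration.

```python
from typing import NamedTuple


class CipherSuite(NamedTuple):
    protocol: str
    key_exchange: str
    cipher: str
    hash_alg: str


def parse_cipher_suite(name: str) -> CipherSuite:
    """Split names of the form TLS_<key exchange>_WITH_<cipher>_<hash>."""
    left, _, right = name.partition("_WITH_")
    protocol, _, key_exchange = left.partition("_")
    cipher, _, hash_alg = right.rpartition("_")
    return CipherSuite(protocol, key_exchange, cipher, hash_alg)


print(parse_cipher_suite("TLS_RSA_WITH_AES_256_CBC_SHA"))
print(parse_cipher_suite("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"))
```

For the second name the key exchange part comes out as ECDHE_RSA: ECDHE performs the key exchange and RSA authenticates it.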
Browsers generally support many cipher suites, and the server picks the one that suits it from the client's list based on its own business needs (weighing security, speed, performance, and so on).
3.4 server_name extension (most browsers also support the SNI extension)
When we visit a site, DNS first resolves the site's IP address, and we then reach the site through that IP address. One IP address is often shared by many sites, so without the server_name field the server cannot tell which digital certificate to send to the client. The server_name extension lets the server return the certificate that corresponds to the browser's request.
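On the client side, Python's standard `ssl` module sends the server_name extension when `server_hostname` is passed to `wrap_socket`. The helper below is a minimal sketch; its name is made up, and calling it requires direct network access to the target host (it would not work unchanged behind the proxy described earlier).

```python
import socket
import ssl


def fetch_certificate_subject(hostname: str, port: int = 443, timeout: float = 5.0):
    """Connect with SNI set to `hostname` and return the subject of the certificate served."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        # server_hostname is sent in the Client Hello as the server_name extension,
        # and is also used to check the certificate against the expected host name.
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["subject"]


# Example call (needs network access):
#     fetch_certificate_subject("www.baidu.com")
print(ssl.HAS_SNI)
```

`ssl.HAS_SNI` reports whether the underlying OpenSSL build supports the extension at all.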
Server reply
(including Server Hello, Certificate, and Certificate Status)
After receiving the Client Hello, the server replies with three packets, which we look at one by one below:
4. Server Hello
4.1 Here we get the server's UTC time recorded in UNIX time format and a 28-byte random number (random_s).
4.2 Session ID. The server generally has three options for the session ID (the second capture, taken half a minute later, does contain a session ID, as we will see):
(1) Resumed session ID: as mentioned for the Client Hello, if the session ID in the Client Hello is cached on the server, the server will try to resume that session;
(2) New session ID: there are two cases. In the first, the session ID in the Client Hello is empty, and the server gives the client a new session ID. In the second, the server finds no cached entry for the session ID in the Client Hello, and it likewise returns a new session ID to the client;
(3) NULL: the server does not want this session to be resumable, so the session ID is empty.
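The three options can be sketched as a single decision function. The in-memory `session_cache` dictionary, the `allow_resumption` flag, and the function name are assumptions made for this sketch; a real server keeps much more state per session.

```python
import os

session_cache = {}      # session_id -> cached session state (e.g. the symmetric key)


def choose_session_id(client_session_id: bytes, allow_resumption: bool = True):
    """Return (session_id_for_server_hello, resumed)."""
    if not allow_resumption:
        return b"", False                              # (3) NULL: not resumable
    if client_session_id and client_session_id in session_cache:
        return client_session_id, True                 # (1) resumed session ID
    new_id = os.urandom(32)                            # (2) empty or unknown ID
    session_cache[new_id] = {"master_secret": None}    # filled in after the handshake
    return new_id, False


first_id, resumed = choose_session_id(b"")             # first visit: no session ID
print(len(first_id), resumed)
second_id, resumed = choose_session_id(first_id)       # half a minute later
print(second_id == first_id, resumed)
```

Because `session_cache` lives in one process, a request that reaches a different server finds nothing to resume, which is the limitation that session tickets address.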
4.3 Recall that in the Client Hello the client offered a list of cipher suites. From that list the server selected "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256":
(a) TLS is the protocol, ECDHE_RSA is the key exchange algorithm;
(b) AES_128_GCM is the symmetric encryption algorithm (128 is the key length and GCM is the block mode);
(c) SHA256 is the hash algorithm.
This means that the server uses the ECDHE-RSA algorithm for key exchange, encrypts data with the AES_128_GCM symmetric algorithm, and uses the SHA256 hash algorithm to ensure data integrity.
5. Certificate
From the earlier discussion of HTTPS theory we know that, to deliver its public key to the client securely, the server puts the public key into a digital certificate and sends the certificate to the client. (A digital certificate can be self-signed, but for security it is normally issued by a dedicated CA.) This message therefore carries the digital certificate; 4097 bytes is the length of the certificate.
Opening the certificate shows its details. They are not very readable in the packet capture, but they can be viewed directly in the browser (click the green button at the top left of the Chrome window).
6. Server Hello Done
The packet we captured merges Server Hello Done and Server Key Exchange:
7. The client verifies the authenticity of the certificate
The client verifies the validity of the certificate. If validation passes, communication continues; otherwise the client shows a prompt and acts according to the error. The checks include:
(1) Trust of the certificate chain (the trusted certificate path), as described above;
(2) Revocation: whether the certificate has been revoked. There are two methods, offline CRL and online OCSP, and different clients behave differently;
(3) Expiry date: whether the current time falls within the certificate's validity period;
(4) Domain name: whether the domain name in the certificate matches the domain being visited. The matching rules are analyzed later;
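Checks (3) and (4) are easy to sketch in isolation. The dates below are hypothetical, and the hostname rule is a deliberate simplification (exact match, or a `*` that stands for the whole leftmost label); real clients apply more detailed rules.

```python
from datetime import datetime, timezone


def within_validity(not_before: datetime, not_after: datetime, now: datetime) -> bool:
    """Check (3): the current time must fall inside the validity period."""
    return not_before <= now <= not_after


def hostname_matches(hostname: str, pattern: str) -> bool:
    """Check (4), simplified: exact match, or '*' as the whole leftmost label."""
    host_labels = hostname.lower().split(".")
    pattern_labels = pattern.lower().split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


now = datetime(2016, 6, 1, tzinfo=timezone.utc)
print(within_validity(datetime(2016, 1, 1, tzinfo=timezone.utc),
                      datetime(2017, 1, 1, tzinfo=timezone.utc), now))
print(hostname_matches("www.baidu.com", "*.baidu.com"))
print(hostname_matches("baidu.com", "*.baidu.com"))
```

Under this rule a wildcard covers exactly one label, so `*.baidu.com` matches `www.baidu.com` but not the bare `baidu.com`.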
8. Key exchange
This process is very complex; in outline:
(1) First, the client authenticates the server with the CA digital certificate and negotiates the symmetric key using asymmetric encryption.
(2) The client sends a "Pubkey" random number to the server. On receiving it, the server generates another "Pubkey" random number with a specific algorithm, and the client uses these two "Pubkey" random numbers to generate the Pre-master random number.
(3) The client takes its own random number random_c (sent in the Client Hello), the random number random_s received in the Server Hello, and the Pre-master random number, and feeds them to the symmetric key generation algorithm to produce the symmetric key enc_key: enc_key = Func(random_c, random_s, Pre-master)
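Steps (2) and (3) can be sketched end to end. Two substitutions are made here and should be read as assumptions: the "Pubkey" exchange is modeled with toy finite-field Diffie-Hellman rather than the elliptic-curve ECDHE seen in the capture (which needs a third-party library), and `Func` is filled in with the TLS 1.2 master-secret derivation from RFC 5246, which the text above does not spell out.

```python
import hashlib
import hmac
import os
import secrets

# Toy Diffie-Hellman parameters, assumed for illustration only.
PRIME = 2**127 - 1
BASE = 3


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash expansion from TLS 1.2 (RFC 5246, section 5) with HMAC-SHA256."""
    out = b""
    a = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def derive_master_secret(pre_master: bytes, random_c: bytes, random_s: bytes) -> bytes:
    """enc_key = Func(random_c, random_s, Pre-master), here the TLS 1.2 master secret."""
    return p_sha256(pre_master, b"master secret" + random_c + random_s, 48)


# (2) Each side picks a private value and sends the matching public value ("Pubkey").
client_private = secrets.randbelow(PRIME - 2) + 1
server_private = secrets.randbelow(PRIME - 2) + 1
client_pubkey = pow(BASE, client_private, PRIME)
server_pubkey = pow(BASE, server_private, PRIME)

# Both sides compute the same Pre-master value from their own private value
# and the other side's public value.
client_pre_master = pow(server_pubkey, client_private, PRIME).to_bytes(16, "big")
server_pre_master = pow(client_pubkey, server_private, PRIME).to_bytes(16, "big")

# (3) Mix in the two hello randoms to get the symmetric key material.
random_c, random_s = os.urandom(32), os.urandom(32)
client_key = derive_master_secret(client_pre_master, random_c, random_s)
server_key = derive_master_secret(server_pre_master, random_c, random_s)
print(client_key == server_key, len(client_key))
```

Only the two public values and the two hello randoms cross the network; the Pre-master value and the derived key are computed independently on each side.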
9. Generating the Session Ticket
If the session is interrupted for some reason, a new handshake is required. To avoid the inefficiency of a full re-handshake, the session ID was introduced. The idea behind the session ID (and the session ticket) is simple: every session has a number (the session ID). If the session is interrupted, then on the next connection the client only has to present that number; if the server still has a record of it, the two sides can reuse the existing "session key" instead of generating a new one.
Because this capture was our first visit to the https://www.baidu.com home page in several hours, there is no session ID here. (The second capture, taken half a minute later, does have a session ID, as we will see.)
Session ID is supported by all current browsers, but its drawback is that a session ID is usually stored on only one server. If the client's request is sent to a different server, the session cannot be resumed. Session tickets were created to solve this problem and are currently supported only by Firefox and Chrome.
When a later HTTPS session is established, the session ID or session ticket can be used to reuse the symmetric key. This skips the HTTPS public/private key exchange, CA authentication, and so on, and greatly shortens the time needed to set up the HTTPS connection.
10. Transmitting data with the symmetric key
Topic Three: Visiting Baidu again half a minute later
The main differences are:
Because the server and the browser have cached the session ID and session ticket, there is no need to send the public key certificate, perform CA authentication, or generate a new symmetric key. The symmetric key from half a minute earlier is used directly to encrypt and decrypt the session data.
1. Client Hello
2. Server Hello