HTTPS-related content
Why HTTPS?
HTTP is transmitted in plaintext, which means that any node between the sender and the receiver can read what you transmit. These nodes may be routers or proxies.
The most common example is user login. If you enter your account and password over HTTP, whoever controls a proxy server on the path only needs to tamper with it to capture your password.
User login --> proxy server (tampered with) --> actual authorization server
What about encrypting the password on the sender's side? It doesn't help. Although others cannot recover your original password, they can still log in by replaying the encrypted account and password they intercepted.
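A minimal sketch of why sender-side encryption alone does not help. It assumes a hypothetical server that stores and compares a SHA-256 hash of the password; the names `STORED`, `client_login_payload`, and `server_accepts` are made up for illustration, and hashing stands in for whatever transformation the sender applies.

```python
import hashlib

# Hypothetical server-side store: username -> hash of the password.
STORED = {"james": hashlib.sha256(b"s3cret").hexdigest()}

def client_login_payload(user: str, password: str) -> dict:
    """What the browser sends if it transforms the password before sending."""
    return {"user": user, "pw_hash": hashlib.sha256(password.encode()).hexdigest()}

def server_accepts(payload: dict) -> bool:
    """The server only ever sees the hash, so the hash itself is the credential."""
    return STORED.get(payload["user"]) == payload["pw_hash"]

# James logs in; a proxy on the path copies the payload as it passes by.
sent = client_login_payload("james", "s3cret")
captured = dict(sent)

# The attacker never learns "s3cret", yet replaying the capture still logs in.
replay_ok = server_accepts(captured)
```

The captured payload works just as well as the original one, which is why the channel itself has to be protected.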
How does HTTPS ensure security?
HTTPS literally means secure HTTP, that is, a security upgrade of HTTP. Anyone who knows a little about networking knows that HTTP is an application-layer protocol and that the transport protocol TCP sits beneath it. TCP is responsible for transmission, while HTTP defines how the data is packaged.
HTTP --> TCP (plaintext transmission)
What is the difference between HTTPS and HTTP? In essence, an encryption layer, TLS/SSL, is added between HTTP and TCP.
What is TLS/SSL?
Broadly speaking, TLS and SSL are similar. SSL is an encryption layer that encrypts HTTP data, and TLS is the upgraded successor to SSL. In HTTPS today, the encryption layer is basically always TLS.
Transmission encryption process
Previously, the application layer handed data directly to TCP. Now, the application layer hands the data to TLS/SSL, which encrypts it and then passes it on to TCP.
That's all there is to it. Data is encrypted before transmission, rather than being sent as-is across a complex and dangerous network. This protects the data to a large extent: even if it is intercepted by an intermediate node, the bad guys cannot understand it.
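The layering can be sketched with Python's standard `socket` and `ssl` modules. This is only an illustration: `build_request` and `https_get` are hypothetical helper names, and the function needs network access to actually run.

```python
import socket
import ssl

def build_request(host: str) -> bytes:
    """Plain HTTP request bytes; HTTP itself is unchanged by HTTPS."""
    return f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode("ascii")

def https_get(host: str, port: int = 443) -> bytes:
    """Send an HTTP request through TLS, which in turn runs over TCP."""
    context = ssl.create_default_context()  # verifies certificates by default
    with socket.create_connection((host, port), timeout=10) as tcp:  # TCP layer
        with context.wrap_socket(tcp, server_hostname=host) as tls:  # TLS layer on top
            tls.sendall(build_request(host))  # encrypted before it reaches TCP
            chunks = []
            while True:
                data = tls.recv(4096)  # decrypted after it leaves TCP
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

The HTTP bytes are exactly what would be sent over plain HTTP; the only difference is that they pass through the TLS socket instead of going straight to the TCP socket.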
How HTTPS encrypts data
If you know a little about security or cryptography, you will know the common encryption methods. In general, encryption is divided into symmetric encryption and asymmetric encryption (also called public key encryption).
Symmetric encryption
Symmetric encryption means that the key used to encrypt data is the same as the key used to decrypt it.
The advantage of symmetric encryption is that encryption and decryption are usually relatively efficient. The disadvantage is that the sender and the receiver need to negotiate and share the same key, and ensure that the key is not disclosed to anyone else. In addition, when many parties need to exchange data, a separate key has to be allocated and maintained for every pair of them. That cost is basically unacceptable.
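A toy sketch of both points: the same key encrypts and decrypts, and the number of pairwise keys grows quickly. The XOR cipher below is an assumption made for illustration (it is not a real cipher such as AES and must not be used for anything serious); `keystream`, `xor_cipher`, and `pairwise_keys` are hypothetical names.

```python
import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypts and decrypts: XOR with the same keystream undoes itself."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def pairwise_keys(parties: int) -> int:
    """Number of distinct keys if every pair of parties shares its own key."""
    return parties * (parties - 1) // 2

message = b"account=james&password=s3cret"
shared_key = secrets.token_bytes(16)             # both sides must hold this key
ciphertext = xor_cipher(shared_key, message)     # sender encrypts
plaintext = xor_cipher(shared_key, ciphertext)   # receiver decrypts with the same key
```

With 4 parties, `pairwise_keys(4)` gives 6 keys; with 1000 parties it is already 499500, which is the maintenance cost the paragraph above refers to.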
Asymmetric encryption
Asymmetric encryption means that the key used to encrypt data (the public key) is different from the key used to decrypt it (the private key).
What is a public key? As the name suggests, it is a key that is public and can be looked up by anyone. That is why asymmetric encryption is also called public key encryption.
Correspondingly, the private key is a key that is not public; it is generally held by the website administrator.
What is the relationship between the public key and the private key?
Put simply, data encrypted with the public key can only be decrypted with the private key, and data encrypted with the private key can only be decrypted with the public key.
Many people know that the private key can decrypt data encrypted with the public key. Fewer realize that the public key can also decrypt data encrypted with the private key. This is critical to understanding the entire HTTPS encryption and authorization system.
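Both directions can be seen with textbook RSA and tiny primes. This is an illustrative sketch only: the primes, the exponent, and the helper name `apply_key` are chosen for the example, there is no padding, and keys this small are trivially breakable.

```python
# Textbook RSA with tiny primes, for illustration only.
p, q = 61, 53
n = p * q                      # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent  -> public key is (n, e)
d = pow(e, -1, phi)            # private exponent -> private key is (n, d); Python 3.8+

def apply_key(value: int, exponent: int) -> int:
    """RSA's core operation; the same function serves both keys."""
    return pow(value, exponent, n)

message = 65

# Direction 1: encrypt with the public key, decrypt with the private key.
c1 = apply_key(message, e)
m1 = apply_key(c1, d)

# Direction 2: encrypt with the private key, decrypt with the public key.
c2 = apply_key(message, d)
m2 = apply_key(c2, e)
```

In both directions the original message comes back, which is the property the rest of the discussion relies on.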
Here is an example of using asymmetric encryption for user login.
Login user: James
Authorizing website: a well-known social networking website (hereinafter referred to as XX)
James is a user of the well-known social networking website XX. For security reasons, XX uses asymmetric encryption for login. James types his account and password on the login page and clicks "log in". The browser then encrypts James's account and password with the public key and sends a login request to XX. XX's login authorization program decrypts the account and password with the private key, and verification passes. XX then encrypts James's personal information (including private data) with the private key and sends it back to the browser. The browser decrypts the data with the public key and displays it to James.
Step 1: James enters his account and password --> the browser encrypts them with the public key --> the request is sent to XX.
Step 2: XX decrypts them with the private key, and verification passes --> XX fetches James's social data and encrypts it with the private key --> the browser decrypts the data with the public key and displays it.
Can asymmetric encryption alone solve the problem of data transmission security? As mentioned above, data encrypted with the private key can be decrypted with the public key, and the public key is public. That is to say, asymmetric encryption can only secure data transmission in one direction.
In addition, there is the question of how to distribute and obtain public keys. Both issues are discussed further below.
Public key encryption: two obvious problems
The previous example of James logging in to the social networking website XX shows two obvious problems with using public key encryption alone:
How to obtain the public key
Data transmission is secure in only one direction
Question 1: How to obtain the public key
How does the browser obtain XX's public key? Of course, James could look it up online himself, and XX could also post the public key on its home page. However, for a social network with tens of millions of users, that would cause great inconvenience. After all, most users do not even know what a "public key" is.
Question 2: Only one-way security for data transmission
As mentioned above, data encrypted with the public key can only be decrypted with the private key. Therefore, James's account and password are secure and can be intercepted without harm.
Then comes a big problem: data encrypted with the private key can be decrypted with the public key. And since the public key is public, James's private data is effectively streaking across the Internet. (An intermediate proxy server that has obtained the public key can decrypt James's data without any effort.)
The following are the answers to these two questions.
Question 1: How to obtain the Public Key
Two very important concepts are involved here: the certificate and the CA (Certificate Authority).
Certificate
For now, you can think of it as the website's ID card. This ID card contains a lot of information, including the public key mentioned above.
That is to say, when users such as Xiao Ming, Xiao Wang, and Xiao Guang access XX, they no longer need to hunt all over for XX's public key. When they access XX, XX sends its certificate to the browser and says: hey, use the public key in here to encrypt your data.
This raises a question: where does the so-called "certificate" come from? That is the CA's job.
CA (Certificate Authority)
Two points to emphasize:
There are many CAs that can issue certificates (both at home and abroad). Only a few CAs are considered authoritative and impartial, and the certificates they issue are trusted by browsers. VeriSign is one example. (That said, it is not unheard of for a CA to issue a bogus certificate ...)
The details of certificate issuance are not covered here. You can simply think of it this way: the website submits an application to the CA; after the CA approves it, the CA issues a certificate to the website; when a user accesses the website, the website delivers the certificate to the user.
The contents of the certificate are described later.
Question 2: Only one-way security for data transmission
As mentioned above, data encrypted with the private key can be decrypted and restored with the public key. So does this mean that the data sent from the website to users is insecure?
The answer is: yes!!! (Three exclamation marks, because important things are worth saying three times.)
At this point you may be thinking: if data is still streaking even with HTTPS, it is so unreliable that we might as well use plain HTTP and save the effort.
But then why are HTTPS-based websites becoming more and more popular in the industry? This clearly contradicts the intuition we have just built up.
The reason is that although HTTPS uses public key encryption, it also uses other methods, such as symmetric encryption, to ensure that authorization and encrypted transmission are both efficient and secure.
In summary, the entire simplified encrypted communication process is:
1. James visits XX, and XX delivers its certificate to James (in fact, it is sent to the browser, and James does not notice).
2. The browser obtains XX's public key A from the certificate.
3. The browser generates a symmetric key B that only it knows, encrypts it with public key A, and passes it to XX (in fact, there is a negotiation process; this is simplified for ease of understanding).
4. XX decrypts it with its private key and obtains the symmetric key B.
5. From then on, all data communication between the browser and XX is encrypted with key B.
Note: For every user accessing XX, the generated symmetric key B is theoretically different. For example, Xiao Ming, Xiao Wang, and Xiao Guang may generate B1, B2, and B3.
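The simplified flow above can be sketched end to end with toy primitives. Everything here is an assumption made for illustration: the RSA key pair is built from two well-known Mersenne primes (so it is not secure), the symmetric cipher is a toy XOR stream, and names such as `xor_cipher` and `key_b` are hypothetical. Real TLS negotiates keys differently.

```python
import hashlib
import secrets

# XX's toy RSA key pair, built from well-known Mersenne primes (NOT secure).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                              # public key A is (N, E)
D = pow(E, -1, (P - 1) * (Q - 1))      # private key, held only by XX; Python 3.8+

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256-derived keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# The browser reads the public key out of the certificate XX delivered.
public_key = (N, E)

# The browser generates symmetric key B and encrypts it with the public key.
key_b = secrets.token_bytes(16)
encrypted_key = pow(int.from_bytes(key_b, "big"), public_key[1], public_key[0])

# XX recovers key B with its private key.
server_key_b = pow(encrypted_key, D, N).to_bytes(16, "big")

# From here on, both directions are protected by key B.
request = xor_cipher(key_b, b"GET /profile")
response = xor_cipher(server_key_b, b"James's private data")
```

Because key B only ever travels encrypted under the public key, a proxy holding the public key cannot recover it, and so cannot read either `request` or `response`.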
What problems does the certificate have?
Now that we understand the HTTPS encrypted communication process, the worry about data streaking should be gone. However, careful readers may have another question: how can we be sure that the certificate is genuine and valid?
There are two ways a certificate can be illegitimate:
The certificate is forged: it was not issued by a CA at all.
The certificate has been tampered with: for example, the public key of the XX website has been replaced.
For example:
We know that there is such a thing in this world as a proxy. So when James logs in to the XX website, his login request may first go to a proxy server, which then forwards the request to the authorization server.
James --> evil proxy server --> login authorization server
James <-- evil proxy server <-- login authorization server
Unfortunately, there are too many bad guys in the world. One day, the proxy server turns malicious (or is hacked) and intercepts James's request. At the same time, it returns an illegitimate certificate.
James --> evil proxy server -- x --> login authorization server
James <-- evil proxy server <-- x -- login authorization server
If kind-hearted James trusts this certificate, his data is streaking again. Of course, this must not be allowed to happen. So what mechanisms can be used to prevent it?
Next, let's take a look at the content of the "Certificate", and then we can roughly guess how to prevent it.
Certificate Overview
Before formally introducing the certificate format, here is a brief interlude to explain digital signatures and digests, followed by a not-so-deep introduction to the certificate.
Why? Because digital signatures and digests are the key weapons for certificate anti-counterfeiting.
Digital signature and digest
Simply put, a "digest" is a fixed-length string calculated from the transmitted content using a hash algorithm (does that remind you of an article's abstract?). The digest is then encrypted with the CA's private key, and the encrypted result is the "digital signature". (Note that this is where the CA's private key comes in.)
Plaintext --> hash operation --> digest --> private key encryption --> digital signature
Based on the above, we know that this digital signature can only be decrypted with the CA's public key.
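The pipeline above can be sketched as follows. This is a toy under stated assumptions: the CA key pair is textbook RSA built from well-known Mersenne primes (not secure), the SHA-256 digest is truncated so that it fits the toy modulus, and `digest`, `sign`, and `verify` are hypothetical names.

```python
import hashlib

# Toy CA key pair from well-known Mersenne primes (NOT secure).
P, Q = 2**61 - 1, 2**89 - 1
CA_N = P * Q
CA_E = 65537                                # CA public key: (CA_N, CA_E)
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))     # CA private key; Python 3.8+

def digest(content: bytes) -> int:
    """Hash the content to a fixed-length value (truncated to fit the toy modulus)."""
    return int.from_bytes(hashlib.sha256(content).digest()[:16], "big")

def sign(content: bytes) -> int:
    """Plaintext -> hash -> digest -> private key encryption -> digital signature."""
    return pow(digest(content), CA_D, CA_N)

def verify(content: bytes, signature: int) -> bool:
    """Decrypt the signature with the CA public key and compare digests."""
    return pow(signature, CA_E, CA_N) == digest(content)

signature = sign(b"certificate content")
```

Anyone holding the CA public key can run `verify`, but only the holder of the CA private key can produce a signature that passes it.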
Next, let's take a look at what the mysterious "certificate" contains and roughly guess how to prevent illegal certificates.
For more information about digital signatures and digests, see this article.
Certificate Format
The Certificate Format comes from this good article "OpenSSL and SSL digital certificate concept Post".
There is a lot of content. Here we need to pay attention to the following points. The certificate contains:
The name of the issuing authority (the CA)
The digital signature of the certificate content (encrypted with the CA's private key)
The certificate holder's public key
The hash algorithm used for the certificate signature
In addition, one thing to add is:
The CA itself has its own certificate, commonly known as the "root certificate". This root certificate is used to prove the identity of the CA and is essentially an ordinary digital certificate. Browsers usually ship with built-in root certificates from most mainstream, authoritative CAs.
Certificate format
1. Version: indicates the format version of the X.509 certificate. The current values are:
   1) 0: v1
   2) 1: v2
   3) 2: v3
   Values are also reserved for future versions.
2. Serial Number: the unique "digital identifier" that the CA assigns to the certificate. When a certificate is revoked, it is this serial number that is placed in the CRL issued by the CA, which is why the serial number must be unique.
3. Signature Algorithm: identifies the signature algorithm used by the CA when it issued the certificate. The algorithm identifier specifies:
   1) the public key algorithm
   2) the hash algorithm
   Example: sha256WithRSAEncryption. The algorithm must be registered with an internationally recognized standards organization (such as ISO).
4. Issuer: the X.500 distinguished name (DN) of the CA that issued the certificate, including:
   1) Country (C)
   2) State or province (ST)
   3) Locality (L)
   4) Organization (O)
   5) Organizational unit (OU)
   6) Common name (CN)
   7) Email address
5. Validity: the validity period of the certificate, including:
   1) the date and time when the certificate takes effect
   2) the date and time when the certificate expires
   Each time the certificate is used, it must be checked against the validity period.
6. Subject: the unique X.500 name of the certificate holder, including:
   1) Country (C)
   2) State or province (ST)
   3) Locality (L)
   4) Organization (O)
   5) Organizational unit (OU)
   6) Common name (CN)
   7) Email address
7. Subject Public Key Info: contains two important pieces of information:
   1) the value of the certificate holder's public key
   2) the algorithm identifier of the public key, which names the public key algorithm
8. Extensions: X.509 v3 certificates add, on top of v2, extension fields in standard or generic form so that additional information can be attached to the certificate. Standard extensions are those defined by X.509 v3 that extend v2 and have broad applicability. Anyone can register other extensions with an authority such as ISO; if these become widely used, they may become standard extensions in the future.
9. Issuer Unique Identifier: added to the certificate definition in version 2. When the same X.500 name is used by multiple certification authorities, this field uses a bit string to uniquely identify the issuer's X.500 name. Optional.
10. Subject Unique Identifier: added to the X.509 certificate definition in version 2. When the same X.500 name is used by multiple certificate holders, this field uses a bit string to uniquely identify the certificate holder's X.500 name. Optional.
11. Signature Algorithm: the signature algorithm. Example: sha256WithRSAEncryption
12. Signature value (Issuer's Signature): the certificate issuer's signature over the above content of the certificate.
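To make a few of these fields concrete, here is a simplified model of them. It is not a certificate parser: `CertificateFields` and its methods are hypothetical names, and the sample values ("Example CA", "xx.example", the dates) are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class CertificateFields:
    """A hypothetical, simplified view of a few X.509 fields."""
    version: int                # encoded value: 0 = v1, 1 = v2, 2 = v3
    serial_number: int          # unique per issuing CA
    signature_algorithm: str    # e.g. "sha256WithRSAEncryption"
    issuer_cn: str              # common name of the issuing CA
    not_before: datetime        # start of the validity period
    not_after: datetime         # end of the validity period
    subject_cn: str             # common name of the certificate holder

    def version_label(self) -> str:
        """Map the encoded version value to its human-readable name."""
        return f"v{self.version + 1}"

    def is_within_validity(self, now: datetime) -> bool:
        """The check that has to be made each time the certificate is used."""
        return self.not_before <= now <= self.not_after

cert = CertificateFields(
    version=2,
    serial_number=0x1A2B,
    signature_algorithm="sha256WithRSAEncryption",
    issuer_cn="Example CA",
    not_before=datetime(2024, 1, 1, tzinfo=timezone.utc),
    not_after=datetime(2025, 1, 1, tzinfo=timezone.utc),
    subject_cn="xx.example",
)
```

Note the off-by-one in the version field: the encoded value 2 means v3.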
How to identify illegal certificates
As mentioned above, the XX certificate contains the following content:
The name of the issuing authority (the CA)
The digital signature of the certificate content (encrypted with the CA's private key)
The certificate holder's public key
The hash algorithm used for the certificate signature
The CA root certificate built in the browser contains the following key content:
CA Public Key (very important !!!)
Now, let's explain how to identify the two illegal certificates mentioned earlier.
Completely forged certificate
This situation is relatively simple. Check the certificate:
The issuing authority is forged: the browser does not recognize it and directly treats the certificate as dangerous.
The issuing authority does exist: the browser finds the corresponding built-in CA root certificate and CA public key according to the CA name, and uses the CA public key to decrypt the signature of the forged certificate. The check fails, so the certificate is treated as dangerous.
Tampered certificate
Assume that the proxy obtains XX's certificate in some way, secretly replaces the public key in the certificate with its own, and then assumes that the user will take the bait. That is too naive:
1. Check the certificate and, based on the CA name, find the corresponding CA root certificate and the CA public key.
2. Use the CA's public key to decrypt the certificate's digital signature and obtain the certificate digest AA.
3. Using the hash algorithm named for the certificate signature, calculate the digest BB of the current certificate.
4. Compare AA with BB; an inconsistency is found --> dangerous certificate.
HTTPS handshake process
We have covered a lot of ground so far. The mechanisms by which HTTPS secures encrypted data transmission have basically all been touched on, with the technical details deliberately skipped.
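As a recap of the certificate checks from the previous section, here is a toy sketch of how a browser might reject a certificate from an unknown issuer and one whose public key was swapped. All of it is illustrative: the CA key pair is textbook RSA from well-known Mersenne primes (not secure), certificates are plain dictionaries, and `TRUSTED_ROOTS`, `issue`, and `check` are hypothetical names.

```python
import hashlib

# Toy CA key pair from well-known Mersenne primes (NOT secure).
P, Q = 2**61 - 1, 2**89 - 1
CA_PRIVATE = (P * Q, pow(65537, -1, (P - 1) * (Q - 1)))   # Python 3.8+

# Stand-in for the root certificates built into the browser: CA name -> CA public key.
TRUSTED_ROOTS = {"Example CA": (P * Q, 65537)}

def digest(issuer: str, subject: str, public_key: str) -> int:
    """Digest of the certificate content (truncated to fit the toy modulus)."""
    body = f"{issuer}|{subject}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(body).digest()[:16], "big")

def issue(issuer: str, subject: str, public_key: str) -> dict:
    """The CA signs the digest of the certificate content with its private key."""
    n, d = CA_PRIVATE
    signature = pow(digest(issuer, subject, public_key), d, n)
    return {"issuer": issuer, "subject": subject,
            "public_key": public_key, "signature": signature}

def check(cert: dict) -> str:
    """Browser-side check against the built-in root certificates."""
    root = TRUSTED_ROOTS.get(cert["issuer"])
    if root is None:
        return "dangerous: unknown issuer"
    n, e = root
    aa = pow(cert["signature"], e, n)                                   # digest AA from the signature
    bb = digest(cert["issuer"], cert["subject"], cert["public_key"])    # digest BB recomputed
    return "ok" if aa == bb else "dangerous: digest mismatch"

genuine = issue("Example CA", "xx.example", "XX-public-key")
tampered = {**genuine, "public_key": "proxy-public-key"}   # the proxy swaps in its own key
forged = {**genuine, "issuer": "Unknown CA"}               # issuer the browser does not know
```

The proxy cannot fix up the signature after swapping the public key, because producing a matching signature requires the CA's private key.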
Two final questions remain:
How does a website send its certificate to the user (browser)?
How is the symmetric key mentioned above negotiated?
Both are handled in the HTTPS handshake phase. On the whole, the HTTPS data transmission process is similar to that of HTTP, and it also contains two phases: handshake and data transmission.
Handshake: certificate delivery and key negotiation (this stage is in plaintext)
Data transmission: this stage is encrypted, using the symmetric key negotiated in the handshake stage.
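The result of a handshake can be inspected with Python's standard `ssl` module. This is a sketch, not a description of the protocol internals: `summarize` and `inspect_handshake` are hypothetical helpers, and `inspect_handshake` needs network access to run.

```python
import socket
import ssl

def summarize(version, cipher, cert) -> dict:
    """Flatten the values returned by SSLSocket.version(), .cipher() and .getpeercert()."""
    def common_name(rdns):
        # The subject and issuer are tuples of RDNs, each a tuple of (key, value) pairs.
        for rdn in rdns:
            for key, value in rdn:
                if key == "commonName":
                    return value
        return None
    return {
        "protocol": version,
        "cipher_suite": cipher[0],
        "subject": common_name(cert.get("subject", ())),
        "issuer": common_name(cert.get("issuer", ())),
        "expires": cert.get("notAfter"),
    }

def inspect_handshake(host: str, port: int = 443) -> dict:
    """Connect, let the handshake run, and report what was negotiated."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as tcp:
        # wrap_socket performs the handshake: certificate delivery and key negotiation.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            # Anything sent on `tls` from here on belongs to the data transmission phase.
            return summarize(tls.version(), tls.cipher(), tls.getpeercert())
```

By the time `wrap_socket` returns, the certificate has already been delivered and checked and the symmetric key has been agreed, so the two phases map onto "before" and "after" that call.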
Ruan Yifeng's article on this is very good and easy to understand. If you are interested, you can read it.
Appendix: SSL/TLS Protocol operating mechanism Overview: http://www.ruanyifeng.com/blog/2014/02/ssl_tls.html
Postscript
As a popular-science article, some of the content here is not rigorous enough. If there are any mistakes, please point them out :)