Understanding HTTPS principles for web crawlers

Source: Internet
Author: User
Tags: ack, decrypt, sha1, asymmetric encryption

Crawling is a form of reverse engineering: other people develop the site, and all we get is the content that comes back through an interface such as a URL. That returned content is all we have to work with and all we can learn from.

    • 0. How HTTPS and HTTP work (http://www.cnblogs.com/ttltry-air/archive/2012/08/20/2647898.html)
    • How HTTPS works

HTTPS requires a handshake between the client (browser) and the server (website) before any data is transmitted. During the handshake, the two sides establish the key material used to encrypt the transmitted data. TLS/SSL is not only a set of encrypted transport protocols but also a carefully crafted piece of design: it combines asymmetric encryption, symmetric encryption, and hash algorithms. In simplified form, the handshake process is as follows:


1. The browser sends the website the set of encryption rules (cipher suites) that it supports.
2. The website selects a set of encryption and hash algorithms from that list and sends its identity back to the browser in the form of a certificate. The certificate contains information such as the website address, the public key used for encryption, and the issuing authority.
3. After obtaining the website certificate, the browser does the following:
A) Verifies the validity of the certificate (whether the issuing authority is legitimate, whether the website address in the certificate matches the address being accessed, and so on). If the certificate is trusted, the browser's address bar shows a small lock; otherwise the browser warns that the certificate is not trusted.
B) If the certificate is trusted, or the user accepts an untrusted certificate, the browser generates a random number to serve as the secret key and encrypts it with the public key provided in the certificate.
C) Computes a hash of the handshake messages using the agreed hash algorithm, encrypts that hash with the generated random key, and sends everything generated so far to the website.
4. After receiving the data from the browser, the website does the following:
A) Uses its own private key to decrypt the message and extract the random key, uses that key to decrypt the browser's handshake message, and verifies that the hash matches the one the browser sent.
B) Encrypts a handshake message with the random key and sends it to the browser.
5. The browser decrypts the message and computes the hash of the handshake messages. If it matches the hash sent by the server, the handshake ends, and all subsequent communication is encrypted with the random key generated earlier by the browser, using the symmetric encryption algorithm.
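
The steps above can be sketched as a toy simulation. This is only an illustration under loud assumptions, not real TLS: it uses a textbook RSA key pair with tiny primes, a made-up XOR keystream standing in for the symmetric cipher, and hypothetical names (`simulate_handshake`, `keystream_xor`) that do not come from the original article.

```python
import hashlib
import secrets

# Toy RSA key pair built from tiny primes. Illustration only, never secure.
P, Q = 61, 53
N = P * Q   # public modulus (3233)
E = 17      # public exponent, published in the "certificate"
D = 2753    # private exponent, kept by the server


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256 based keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def simulate_handshake() -> bool:
    """Run steps 3B to 5 and report whether both sides agree on the key."""
    # Stand-in for the messages exchanged in steps 1 and 2.
    transcript_hash = hashlib.sha256(b"hello|algorithms|certificate").digest()

    # Step 3B: the browser picks a random secret and encrypts it with the public key.
    secret = secrets.randbelow(N - 2) + 2
    encrypted_secret = pow(secret, E, N)
    browser_key = secret.to_bytes(2, "big")

    # Step 3C: the browser encrypts the handshake hash with the random key.
    browser_msg = keystream_xor(browser_key, transcript_hash)

    # Step 4A: the server recovers the secret with its private key and checks the hash.
    server_key = pow(encrypted_secret, D, N).to_bytes(2, "big")
    if keystream_xor(server_key, browser_msg) != transcript_hash:
        return False

    # Step 4B: the server sends its own encrypted handshake message.
    server_msg = keystream_xor(server_key, transcript_hash)

    # Step 5: the browser decrypts it and compares hashes.
    return keystream_xor(browser_key, server_msg) == transcript_hash
```

Calling `simulate_handshake()` returns `True` when both sides end up with the same key, which is the "test run" role the encrypted handshake messages play.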

The browser and the website exchange and verify encrypted handshake messages to make sure both sides hold the same key and can encrypt and decrypt data correctly; it is a test run before the real data is transmitted. The encryption and hash algorithms commonly used with HTTPS are as follows:


Asymmetric encryption algorithms: RSA, DSA/DSS
Symmetric encryption algorithms: AES, RC4, 3DES
Hash algorithms: MD5, SHA-1, SHA-256
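
As a small illustration of the hash algorithms listed above, the following sketch uses Python's standard `hashlib` module to compare their digest lengths. The helper name `digest_bits` is hypothetical and only serves this example.

```python
import hashlib


def digest_bits(message: bytes) -> dict:
    """Return the digest length in bits for each hash algorithm named above."""
    sizes = {}
    for name in ("md5", "sha1", "sha256"):
        digest = hashlib.new(name, message).digest()
        sizes[name] = len(digest) * 8
    return sizes


print(digest_bits(b"handshake message"))
# {'md5': 128, 'sha1': 160, 'sha256': 256}
```

The digest length depends only on the algorithm, not on the length of the input message.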

[Figure: communication sequence diagram for HTTPS]

Differences between the HTTPS protocol and the HTTP protocol (see reference 2 for an introduction to the HTTP protocol itself):
HTTPS requires applying to a CA for a certificate; free certificates are scarce, so a fee is usually involved.
HTTP is the Hypertext Transfer Protocol and transmits information in plaintext; HTTPS is a secure transport protocol encrypted with SSL.
HTTP and HTTPS use completely different connection methods and different default ports: 80 for the former, 443 for the latter.
An HTTP connection is simple and stateless.
HTTPS is a network protocol built from SSL plus HTTP that supports encrypted transmission and identity authentication, and is more secure than HTTP.
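
For a crawler, the port difference shows up when a URL carries no explicit port. The following is a minimal sketch, assuming a hypothetical helper `effective_port`; it relies on the `HTTP_PORT` and `HTTPS_PORT` constants from Python's standard `http.client` module.

```python
import http.client
from urllib.parse import urlsplit


def effective_port(url: str) -> int:
    """Return the port a client would connect to for an HTTP or HTTPS URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port               # an explicit port in the URL wins
    if parts.scheme == "https":
        return http.client.HTTPS_PORT   # 443
    if parts.scheme == "http":
        return http.client.HTTP_PORT    # 80
    raise ValueError(f"unsupported scheme: {parts.scheme!r}")
```

For example, `effective_port("https://example.com/")` gives 443, while `effective_port("http://example.com:8080/")` gives 8080.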

    • TCP three-way handshake and four-way wave (connection termination)
    • 1. Connection establishment protocol (three-way handshake)

(1) The client sends the server a TCP segment with the SYN flag set. This is segment 1 of the three-way handshake.
(2) The server responds to the client with segment 2 of the handshake, which carries both the ACK flag and the SYN flag. It acknowledges the client's SYN and, at the same time, sends the server's own SYN, asking whether the client is ready for data communication.
(3) The client must respond to the server with an ACK segment, which is segment 3.
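
The three segments can be sketched with their sequence and acknowledgment numbers. This is a simplified model, not a packet capture: the `Segment` type, the function name, and the initial sequence numbers passed in are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    sender: str
    flags: str
    seq: int
    ack: Optional[int]


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return segments 1 to 3 for the given initial sequence numbers."""
    return [
        # (1) The client opens with SYN; there is nothing to acknowledge yet.
        Segment("client", "SYN", client_isn, None),
        # (2) The server acknowledges the SYN and sends its own SYN.
        Segment("server", "SYN+ACK", server_isn, client_isn + 1),
        # (3) The client acknowledges the server's SYN.
        Segment("client", "ACK", client_isn + 1, server_isn + 1),
    ]
```

Each acknowledgment number is the peer's sequence number plus 1, because a SYN consumes one sequence number.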

Why is a three-way handshake needed?

The fourth edition of "Computer Networks" states that the purpose of the three-way handshake is "to prevent an invalidated connection request segment from suddenly reaching the server and causing an error". Another classic computer networking textbook says the purpose is to solve the problem of "delayed duplicate packets in the network". The two statements are worded differently but describe the same problem.
The example in Xie Xiren's "Computer Networks" runs as follows. An "invalidated connection request segment" arises when the client's first connection request is not lost but is held up at some network node for a long time, and reaches the server only after the connection has already been released. By then the segment is stale. The server, however, mistakes it for a new connection request from the client and sends back an acknowledgment segment agreeing to establish the connection. If there were no three-way handshake, the new connection would be established as soon as the server sent its acknowledgment. Since the client has not actually requested a connection, it ignores the server's acknowledgment and sends no data. The server, though, believes a new transport connection exists and keeps waiting for the client's data, so many of its resources are wasted. The three-way handshake prevents this: in the scenario above the client does not acknowledge the server's acknowledgment, and the server, receiving no acknowledgment, knows that the client did not request a connection. The main purpose, then, is to keep the server from waiting in vain and wasting resources.
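
The stale-SYN scenario can be condensed into a very small model. The function below is a hypothetical sketch that only contrasts the two outcomes described above; it does not model timers or retransmissions.

```python
def server_state_after_stale_syn(three_way: bool) -> str:
    """Server state after a stale SYN arrives and the client ignores the reply."""
    # In both schemes the server answers the stale SYN with an acknowledgment.
    client_sends_final_ack = False  # the client never asked for this connection
    if not three_way:
        # Two-step scheme: the server treats the connection as open at once
        # and keeps waiting for data that never comes.
        return "ESTABLISHED"
    # Three-way scheme: the server waits for the final ACK before committing.
    # Without it, the server gives up and releases the resources.
    return "ESTABLISHED" if client_sends_final_ack else "CLOSED"
```

With `three_way=False` the server is left holding a useless connection; with `three_way=True` it ends up back in the closed state.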

    • 2. Connection termination protocol (four-way wave)

Because a TCP connection is full-duplex, each direction must be closed separately. When one side has finished sending its data, it sends a FIN to terminate the connection in that direction. Receiving a FIN only means that no more data will flow in that direction; the side that received the FIN can still send data. The side that closes first performs an active close, and the other side performs a passive close.
(1) The TCP client sends a FIN to close the client-to-server data flow (segment 4).
(2) The server receives the FIN and sends back an ACK whose acknowledgment number is the received sequence number plus 1 (segment 5). Like a SYN, a FIN consumes one sequence number.
(3) The server closes its side of the connection and sends a FIN to the client (segment 6).
(4) The client sends back an ACK whose acknowledgment number is the received sequence number plus 1 (segment 7).
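
Segments 4 to 7 can be sketched in the same simplified style. The `Segment` type and function name are assumptions for illustration, and the model assumes the server sends no further data between its ACK and its FIN.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    sender: str
    flags: str
    seq: int
    ack: Optional[int]


def four_way_close(client_seq: int, server_seq: int) -> List[Segment]:
    """Return segments 4 to 7 for a client-initiated close."""
    return [
        # (1) The client closes its sending direction.
        Segment("client", "FIN", client_seq, None),
        # (2) The server acknowledges the FIN, which consumed one sequence number.
        Segment("server", "ACK", server_seq, client_seq + 1),
        # (3) The server closes its own sending direction.
        Segment("server", "FIN", server_seq, client_seq + 1),
        # (4) The client acknowledges the server's FIN.
        Segment("client", "ACK", client_seq + 1, server_seq + 1),
    ]
```

The ACK in step (2) and the FIN in step (3) are separate segments because the server may still be sending data in between.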

Why are four waves needed?
One might ask why the ACK is sent together with the SYN during the handshake, but not together with the FIN during termination. The reason is that TCP is full-duplex: receiving a FIN only means the peer will send no more data, while the receiving side may still have data of its own to send. It therefore acknowledges the FIN immediately and sends its own FIN later, once it has finished sending.

State descriptions for the handshake and wave processes (see Wiki: TCP)

Three-way handshake states:
LISTEN: easy to understand; a server-side socket is in the listening state and can accept connections.
SYN_SENT: when the client socket calls connect, it first sends a SYN segment, then enters the SYN_SENT state and waits for the server to send segment 2 of the three-way handshake. SYN_SENT means the client has sent its SYN. (sending side)
SYN_RCVD: the counterpart of SYN_SENT; it means a SYN has been received. Normally this is a short-lived intermediate state of the server-side socket during the three-way handshake, and it is hard to catch with netstat unless you deliberately write a test client that withholds the final ACK of the handshake. Once the client's ACK arrives, the socket enters the ESTABLISHED state. (server side)
ESTABLISHED: easy to understand; the connection has been established.

Four-way wave states:
FIN_WAIT_1: both FIN_WAIT_1 and FIN_WAIT_2 mean waiting for the other side's FIN. The difference is this: a socket in the ESTABLISHED state that wants to close the connection actively sends a FIN to the other side and enters FIN_WAIT_1. When the other side responds with an ACK, the socket moves to FIN_WAIT_2. Under normal conditions the other side responds with an ACK immediately, so FIN_WAIT_1 is hard to observe, whereas FIN_WAIT_2 can sometimes be seen with netstat. (active side)
FIN_WAIT_2: explained above. A socket in FIN_WAIT_2 is on a half-closed connection: this side has asked to close, but the other side has indicated that it still has some data to send and will close the connection later. (active side)
TIME_WAIT: the other side's FIN has been received and an ACK has been sent; the socket waits 2MSL before returning to the CLOSED state. If a socket in FIN_WAIT_1 receives a segment carrying both the FIN and ACK flags, it can go directly to TIME_WAIT without passing through FIN_WAIT_2. (active side)
CLOSING (relatively rare): a special state that should seldom occur in practice. Normally, after you send a FIN you should receive the other side's ACK first (or at the same time) and its FIN afterwards. CLOSING means that after sending your FIN you received the other side's FIN without having received its ACK. When does this happen? When both sides close the socket at almost the same time, so both send a FIN simultaneously; CLOSING indicates that both sides are closing the connection.
CLOSE_WAIT: waiting to close. When the other side closes its socket and sends you a FIN, your system responds with an ACK and enters CLOSE_WAIT. What you then need to consider is whether you still have data to send to the other side; if not, you can close the socket and send your own FIN. So in CLOSE_WAIT the connection is waiting for you to close it. (passive side)
LAST_ACK: easy to understand; the passively closing side has sent its FIN and is waiting for the other side's final ACK. When the ACK arrives, the socket enters the CLOSED state. (passive side)

CLOSED: the connection is closed.
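
The state descriptions above can be collected into a small transition table. This is a sketch of only the transitions mentioned in the text; the event names (`send_syn`, `recv_fin_ack`, and so on) are hypothetical labels chosen for this example, and error paths such as resets are left out.

```python
# (current state, event) -> next state, following the descriptions above.
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "send_syn"): "SYN_SENT",
    ("LISTEN", "recv_syn"): "SYN_RCVD",
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "send_fin"): "FIN_WAIT_1",   # active close
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",   # passive close
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv_fin"): "CLOSING",       # simultaneous close
    ("FIN_WAIT_1", "recv_fin_ack"): "TIME_WAIT",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",
    ("CLOSING", "recv_ack"): "TIME_WAIT",
    ("CLOSE_WAIT", "send_fin"): "LAST_ACK",
    ("LAST_ACK", "recv_ack"): "CLOSED",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
}


def run(state: str, events: list) -> str:
    """Apply events in order and return the final state."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state


# Active side: open a connection, close it, and wait out 2MSL.
active = ["send_syn", "recv_syn_ack", "send_fin", "recv_ack", "recv_fin", "timeout_2msl"]
print(run("CLOSED", active))  # CLOSED
```

An event that is not valid in the current state raises `KeyError`, which makes missing transitions easy to spot when experimenting.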

[Figure: TCP state diagram]

    • One: About HTTPS
(http://blog.csdn.net/clh604/article/details/22179907) About HTTPS:

HTTPS actually consists of two parts, HTTP plus SSL/TLS: a module that handles encryption is layered onto HTTP. All information exchanged between the server and the client passes through TLS, so the transmitted data is encrypted.

SSL sits between the application layer and the TCP layer; TLS/SSL is a set of encrypted transport protocols. Application-layer data is no longer handed directly to the transport layer. It is handed to the SSL layer instead, which encrypts it and adds its own SSL header.
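
This layering is visible in Python's standard `ssl` module, where a TLS socket wraps an ordinary TCP socket. The sketch below only builds the wrapped socket and does not connect anywhere; the helper name `make_tls_client_socket` and the host name used in the usage note are assumptions for illustration.

```python
import socket
import ssl


def make_tls_client_socket(hostname: str) -> ssl.SSLSocket:
    """Wrap a plain TCP socket in a TLS layer without connecting yet."""
    # The default context requires a valid certificate and checks that
    # the certificate matches the host name being accessed.
    context = ssl.create_default_context()
    raw_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The application reads and writes through the returned object;
    # encryption happens between it and the underlying TCP socket.
    return context.wrap_socket(raw_sock, server_hostname=hostname)
```

A crawler would then call `connect((hostname, 443))` on the result, for example on `make_tls_client_socket("example.com")`, at which point the TLS handshake runs on top of the freshly established TCP connection.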

TLS/SSL uses asymmetric encryption, symmetric encryption, and hash algorithms. The handshake follows the same five steps described under "How HTTPS works" above.

To summarize:

The server generates a public/private key pair with RSA.

It puts the public key in the certificate sent to the client and keeps the private key to itself.

The client first checks the validity of the certificate against a trusted authority. If the certificate is valid, the client generates a random number to act as the communication key, which we call the symmetric key, encrypts it with the public key, and sends it to the server.

The server uses its private key to decrypt the symmetric key, and from then on both sides encrypt and decrypt their communication with the symmetric key.

    • Two: Handshake process of the SSL protocol

Before encrypted communication begins, the client and server must first establish a connection and exchange parameters, a process called the handshake.

http://www.ruanyifeng.com/blog/2014/09/illustration-ssl.html
