Exploring the principle and programming implementation of the SHA-1 algorithm in Redis source code

Guide

SHA-1 is the first generation of the "Secure Hash Algorithm" family and is, in essence, a hash algorithm. The SHA standards are mainly used in digital signatures to generate message digests, and were once regarded as the successor to the MD5 algorithm. The SHA family now has five algorithms. Redis uses SHA-1, which converts a message of fewer than 2^64 bits into a 160-bit message digest; it was designed so that finding two different messages with the same digest would be computationally infeasible. Although SHA-1 has since been broken (practical collisions have been demonstrated), as the first generation of the SHA family it still has great learning value and guiding significance for us.

The sha1.c file in Redis implements this algorithm, but the source code of this file actually comes from the **Valgrind** project's **/Tests/sha1_test.c** file (here we can see the power of open source: from the people, to the people). It contains four functions:

SHA1Init

SHA1Update

SHA1Transform

SHA1Final

SHA-1 algorithm flow overview

The SHA-1 algorithm is roughly divided into five steps:

1. Append padding bits
2. Append length
3. Initialize the hash buffer
4. Compute the message digest
5. Output/return the digest

Theoretical basis of appending padding bits and length

**Append padding bits.** The message is padded so that its length is congruent to 448 modulo 512 (M % 512 = 448). Padding is always performed: even if the length already satisfies this condition, a full 512 bits are added. The padding consists of a single 1 bit followed by 0 bits until the condition is met, so at least 1 bit and at most 512 bits are added. Because we store messages in bytes, the message length in bits is always a multiple of 8, and padding is therefore added 8 bits at a time: it is impossible to add a single bit, only whole bytes. In practice, then, the padding is at least 1 byte and at most 64 bytes (64 × 8 = 512).
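The padding rule can be turned into a small calculation. The following is a minimal sketch, not code from sha1.c: the helper name `sha1_padding_bytes` is hypothetical, and it assumes the message length is given in whole bytes. It returns how many padding bytes (one 0x80 byte followed by zero bytes) are needed before the 64-bit length field.

```c
#include <stdint.h>

/* Hypothetical helper: number of padding bytes (0x80 followed by zeros)
 * needed so that the byte length becomes congruent to 56 modulo 64,
 * i.e. the bit length becomes congruent to 448 modulo 512.
 * The 8-byte length field is not included in the result. */
static unsigned sha1_padding_bytes(uint64_t msg_len_bytes)
{
    unsigned used = (unsigned)(msg_len_bytes % 64); /* bytes in last block */

    /* If fewer than 56 bytes are used, pad up to 56 in this block;
     * otherwise pad through the end of this block and up to 56 in the next. */
    return (used < 56) ? (56 - used) : (120 - used);
}
```

For a 3-byte message this gives 53 padding bytes, and for a 56-byte message (already congruent to 448 bits modulo 512) it gives the maximum of 64.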

**Append length.** After the padding bits are appended, the length of the original message is stored as a 64-bit value. Once padding is complete the message length (in bits) is congruent to 448 modulo 512, so appending these 64 bits makes it an integer multiple of 512. Later, when the message digest is computed, the message is processed in groups of 512 bits. The SHA1_CTX structure is defined in the header file sha1.h:

typedef struct {
    u_int32_t state[5];
    u_int32_t count[2];
    unsigned char buffer[64];
} SHA1_CTX;

It has three members, which have the following meanings:

| Member | Type | Description |
| --- | --- | --- |
| buffer | unsigned char[64] | 512-bit (64 × 8) message block (obtained after processing the original message) |
| state | u_int32_t[5] | 160-bit (5 × 32) message digest (the result of the SHA-1 algorithm) |
| count | u_int32_t[2] | Length of the stored message (unit: bit) |

SHA1Final

SHA1Final() is the entry and exit point of the whole algorithm: it completes the SHA-1 computation by calling the other functions in the file. Its declaration is as follows:

void SHA1Final(unsigned char digest[20], SHA1_CTX* context);

First, three variables are declared:

unsigned i; unsigned char finalcount[8]; unsigned char c;

Next comes a conditionally compiled block. Because the first branch is under #if 0, we only need to look at the #else branch:

for (i = 0; i < 8; i++) { finalcount[i] = (unsigned char)((context->count[(i >= 4 ? 0 : 1)] >> ((3-(i & 3)) * 8) ) & 255); /* Endian independent */ }

First, notice the comment "Endian independent": the result of this statement does not depend on whether the machine is big-endian or little-endian. I believe many people get confused here precisely because they have learned about endianness; if you knew nothing about it, the statement would be easy to understand. What we need to keep in mind is this: given unsigned int a = 0x12345678; unsigned int b = (a >> 24) & 255; the value of b is 0x12 (0x00000012) on both big-endian and little-endian machines. Endianness has no effect on the result of a shift: a >> 24 always means dividing a by 2 to the power of 24. finalcount is an array of unsigned char, and context->count is an array of 32-bit integers; this statement splits the integers into single bytes for storage.

**The result stored in finalcount can be understood as one very large integer in big-endian order.** For example, if context->count[0] holds 0x11223344 and context->count[1] holds 0x55667788, then finalcount[0] ~ finalcount[7] hold, in order: 0x55, 0x66, 0x77, 0x88, 0x11, 0x22, 0x33, 0x44. That is the 64-bit value 0x5566778811223344 with its most significant byte first.
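To check the resulting byte order directly, the loop can be lifted into a standalone function. This is only a sketch: the wrapper name `encode_count` is hypothetical, and `uint32_t` from `<stdint.h>` stands in for the `u_int32_t` used in sha1.c. The loop body itself is the one from the source.

```c
#include <stdint.h>

/* Hypothetical wrapper around the loop from SHA1Final(): splits the two
 * 32-bit counters into 8 bytes, most significant byte first.
 * count[1] is the high word, count[0] the low word. */
static void encode_count(const uint32_t count[2], unsigned char finalcount[8])
{
    unsigned i;

    for (i = 0; i < 8; i++) {
        finalcount[i] = (unsigned char)
            ((count[(i >= 4 ? 0 : 1)] >> ((3 - (i & 3)) * 8)) & 255);
    }
}
```

With count[0] = 0x11223344 and count[1] = 0x55667788, finalcount starts with 0x55 and ends with 0x44, regardless of the byte order of the machine the code runs on.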

c = 0200; SHA1Update(context, &c, 1);

0200 is an octal literal; in binary, c is 10000000 (0x80). As explained above, padding is done in whole bytes, at least 1 byte and at most 64 bytes, and it must begin with a 1 bit followed by 0 bits. So whatever the message is, the first thing to append is the byte 10000000.

**SHA1Update()** is the function that actually performs the append operation. Its details are covered later; here we focus on the overall structure first.

while ((context->count[0] & 504) != 448) {c = 0000; SHA1Update(context, &c, 1); }

It is easy to see what this code does: it loops, testing whether the message length modulo 512 equals 448, and appends one all-zero byte each time the condition is not yet met. If you are careful, you may think the condition here is wrong:

while ((context->count[0] & 504) != 448) // you may think it should be: while ((context->count[0] & 511) != 448)

In theory, your reasoning is correct. But the source code is fine too. We can use bc to look at the binary representation of the two numbers:

111111000 // 504
111111111 // 511

We can see that they differ only in the last three bits: the last three bits of 504 are all 0, and those of 511 are all 1. context->count stores the message length in bits. As mentioned above, our data is stored in bytes, so the value in context->count[] must be a multiple of 8, which means its last three bits must be 000. Therefore, whether those bits are ANDed with 000 or with 111, the result is 0.
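The claim that the two masks agree on every multiple of 8 can also be checked by brute force. The following is a minimal sketch; the function name `masks_agree` is hypothetical and is not part of sha1.c.

```c
#include <stdint.h>

/* Hypothetical check: returns 1 if (bits & 504) == (bits & 511) for every
 * multiple of 8 from 0 up to max_bits, and 0 otherwise.
 * max_bits should stay well below UINT32_MAX so the loop counter cannot wrap. */
static int masks_agree(uint32_t max_bits)
{
    uint32_t bits;

    for (bits = 0; bits <= max_bits; bits += 8) {
        if ((bits & 504) != (bits & 511))
            return 0;
    }
    return 1;
}
```

The two masks only disagree for values that are not multiples of 8, for example 12: 12 & 504 is 8, while 12 & 511 is 12. A bit count produced from whole bytes never takes such a value.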

Although & 504 and & 511 have the same effect here, & 504 is less readable. My guess, which I have not verified, is that code this hard to read was written for efficiency: perhaps ANDing a bit with 0 lets the result be set to 0 directly, whereas ANDing with 1 requires inspecting the corresponding bit of the operand. On common hardware, however, a bitwise AND is a single instruction whose cost does not depend on the mask value, so this guess probably does not hold. Corrections are welcome.

SHA1Update(context, finalcount, 8); /* Should cause a SHA1Transform() */

Clearly, this statement appends the length. According to the comment, it triggers a call to SHA1Transform(), the function that performs the computation and produces the 160-bit message digest stored in context->state[]. It is the core of the entire SHA-1 algorithm; for the details of its implementation, see the discussion of **computing the message digest** below.

for (i = 0; i < 20; i++) { digest[i] = (unsigned char) ((context->state[i>>2] >> ((3-(i & 3)) * 8) ) & 255); }

This step converts the message digest into a sequence of single bytes. The code takes the 20-byte (5 × 4 bytes) message digest stored in context->state[] and stores it in the 20-element byte array digest, in big-endian order (the same way context->count[] was converted into finalcount). This 160-bit (20-byte) value is the final result of the SHA-1 algorithm.

SHA1Update

As mentioned earlier, SHA1Update() appends new data (original data, padding bits, and the length) to context->buffer.

void SHA1Update(SHA1_CTX* context, const unsigned char* data, u_int32_t len);

data is the data to be appended, and len is the length of data (in bytes).

j = context->count[0]; if ((context->count[0] += len << 3) < j) context->count[1]++; context->count[1] += (len>>29);

context->count[] stores the message length in bits; whatever exceeds the range of context->count[0] is stored in context->count[1]. len << 3 means len * 8: len is in bytes while context->count[] counts bits, so it is multiplied by 8. if ((context->count[0] += len << 3) < j) means that if adding len * 8 makes context->count[0] overflow (it wraps around to a value smaller than the old one), a carry is needed: context->count[1]++. len << 3 keeps only the low 32 bits of the bit count; len >> 29 (that is, (len << 3) >> 32 if no bits were lost) is the part that must be stored in context->count[1].
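The carry logic is easier to follow with concrete numbers. The sketch below isolates the three statements in a hypothetical helper named `add_length` (not a function in sha1.c) and uses `uint32_t` from `<stdint.h>` in place of `u_int32_t`.

```c
#include <stdint.h>

/* Hypothetical helper: adds len bytes (as len * 8 bits) to a 64-bit bit
 * counter kept in two 32-bit words, count[0] = low word, count[1] = high word. */
static void add_length(uint32_t count[2], uint32_t len)
{
    uint32_t old = count[0];

    count[0] += len << 3;      /* low 32 bits of len * 8 */
    if (count[0] < old)        /* unsigned wrap-around means a carry */
        count[1]++;
    count[1] += len >> 29;     /* the top 3 bits of len belong to the high word */
}
```

Starting from count = {0xFFFFFFF8, 0}, adding a single byte wraps the low word to 0 and carries 1 into the high word. Starting from {0, 0}, adding 0x20000000 bytes (2^32 bits) leaves the low word at 0 and sets the high word to 1 through the len >> 29 term.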

j = (j >> 3) & 63;

j >> 3 converts the bit count into a byte count. (j >> 3) & 63 keeps its low 6 bits, that is, the byte count modulo 64. This is the number of bytes already waiting in the 64-byte (512-bit) buffer, because each digest computation processes 512 bits of message data.

if ((j + len) > 63) {
    memcpy(&context->buffer[j], data, (i = 64-j));
    SHA1Transform(context->state, context->buffer);
    for ( ; i + 63 < len; i += 64) {
        SHA1Transform(context->state, &data[i]);
    }
    j = 0;
}
else i = 0;
memcpy(&context->buffer[j], &data[i], len - i);

The general meaning of this code is: if j + len is greater than 63 bytes, the data is processed in pieces, 64 bytes at a time, repeating until fewer than 64 bytes remain; otherwise the data is simply appended to the end of the buffer. Let us analyze it piece by piece:

memcpy(&context->buffer[j], data, (i = 64-j)); SHA1Transform(context->state, context->buffer);

i = 64 - j; then i bytes are copied from data into context->buffer starting at index j. In other words, the buffer is topped up to a full 64 bytes, and SHA1Transform() is executed to run one message digest computation.

for ( ; i + 63 < len; i += 64) { SHA1Transform(context->state, &data[i]); } j = 0;

Then the loop processes the data 64 bytes at a time. Some may wonder why the condition is i + 63 < len rather than i + 64 < len. The reason is simple: subscripts count from 0, so the block data[i] ~ data[i + 63] is complete exactly when i + 63 < len. All these details are easy to understand. Finally, j = 0 resets the offset into buffer[] to the beginning, because the data it held has already been folded into the digest and is no longer needed.

else i = 0; memcpy(&context->buffer[j], &data[i], len - i);

If the preceding if condition is false, the existing data in context->buffer plus the new data does not reach 64 bytes, so the data is simply appended; the call is then equivalent to memcpy(&context->buffer[j], &data[0], len). If the preceding if condition is true, j is 0 and i is the offset of the first byte of data that has not yet been processed; the remaining len - i bytes (fewer than 64) are copied to the start of the buffer.
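The buffering behaviour can be observed without doing any hashing. The sketch below is an illustration under stated assumptions, not the Redis code: `demo_ctx`, `demo_transform`, and `demo_update` are hypothetical names, the transform is replaced by a call counter, and the number of buffered bytes is stored directly instead of being derived from a bit counter.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical context: tracks only what is needed to watch the buffering. */
typedef struct {
    unsigned char buffer[64];
    uint32_t buffered;     /* bytes currently waiting in buffer (the role of j) */
    unsigned transforms;   /* number of 64-byte blocks "processed" so far */
} demo_ctx;

/* Hypothetical stand-in for SHA1Transform(): it only counts calls. */
static void demo_transform(demo_ctx *ctx, const unsigned char *block)
{
    (void)block;
    ctx->transforms++;
}

/* Same control flow as the block-splitting code in SHA1Update(). */
static void demo_update(demo_ctx *ctx, const unsigned char *data, uint32_t len)
{
    uint32_t i, j = ctx->buffered;

    if (j + len > 63) {
        i = 64 - j;
        memcpy(&ctx->buffer[j], data, i);      /* top the buffer up to 64 bytes */
        demo_transform(ctx, ctx->buffer);
        for (; i + 63 < len; i += 64)          /* whole blocks straight from data */
            demo_transform(ctx, &data[i]);
        j = 0;
    } else {
        i = 0;
    }
    memcpy(&ctx->buffer[j], &data[i], len - i); /* keep the tail for later */
    ctx->buffered = j + (len - i);
}
```

Feeding 10 bytes into an empty context processes no block and leaves 10 bytes buffered. Feeding 100 more bytes completes one block (10 + 54 bytes) and leaves the remaining 46 bytes in the buffer.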

Initializing the hash buffer

The SHA-1 algorithm uses two buffers of five 32-bit words each. In the RFC document the first five-word buffer is labeled A, B, C, D, and E, and the second is labeled H0 ~ H4. For each message block, H0 ~ H4 are first assigned to A ~ E in turn, and after the block has been processed the values of H0 ~ H4 are updated. The initial values of H0 ~ H4 are:

H0 = 0x67452301
H1 = 0xEFCDAB89
H2 = 0x98BADCFE
H3 = 0x10325476
H4 = 0xC3D2E1F0

Before computing the message digest, this five-word buffer must be initialized, that is, assigned the values above. This step is carried out by the **SHA1Init()** function.

void SHA1Init(SHA1_CTX* context)
{
    /* SHA1 initialization constants */
    context->state[0] = 0x67452301;
    context->state[1] = 0xEFCDAB89;
    context->state[2] = 0x98BADCFE;
    context->state[3] = 0x10325476;
    context->state[4] = 0xC3D2E1F0;
    context->count[0] = context->count[1] = 0;
}

Theoretical basis of message digest calculation

The padded message is split into blocks of 512 bits (M1, M2, ..., Mn). Processing each message block Mi requires running 80 rounds of an operation. Each round takes the 160-bit buffer value ABCDE as input and updates the buffer. Each round also needs a nonlinear function f(t). The following is taken from the official RFC document. Note that the t in parentheses is not an input argument; it can be understood as the subscript of f. f is defined separately for four ranges of t:

f(t;B,C,D) = (B AND C) OR ((NOT B) AND D)         ( 0 <= t <= 19)
f(t;B,C,D) = B XOR C XOR D                        (20 <= t <= 39)
f(t;B,C,D) = (B AND C) OR (B AND D) OR (C AND D)  (40 <= t <= 59)
f(t;B,C,D) = B XOR C XOR D                        (60 <= t <= 79)

Each round also uses an additional constant K(t):

K(t) = 0x5A827999 ( 0 <= t <= 19)
K(t) = 0x6ED9EBA1 (20 <= t <= 39)
K(t) = 0x8F1BBCDC (40 <= t <= 59)
K(t) = 0xCA62C1D6 (60 <= t <= 79)

The computation has five steps:

1. W(t) = M(t) (0 <= t <= 15), where M(t) is the t-th 32-bit word of the message block.
2. W(t) = S^1(W(t-3) XOR W(t-8) XOR W(t-14) XOR W(t-16)) (16 <= t <= 79; S^n() denotes a circular left shift by n bits).
3. A = H0, B = H1, C = H2, D = H3, E = H4.
4. For 0 <= t <= 79, execute the 80 rounds: TEMP = S^5(A) + f(t;B,C,D) + E + W(t) + K(t); E = D; D = C; C = S^30(B); B = A; A = TEMP;
5. H0 = H0 + A, H1 = H1 + B, H2 = H2 + C, H3 = H3 + D, H4 = H4 + E.
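The five steps can be written down almost literally in C. The following is a straightforward, unoptimized sketch of the RFC-style procedure for a single 64-byte block. It is not the Redis implementation, which uses unrolled macros and a 16-word ring instead of an 80-word array, and the names `rotl32`, `f_t`, `k_t`, and `sha1_rfc_block` are hypothetical.

```c
#include <stdint.h>

/* S^n(x): circular left shift of a 32-bit word by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* f(t;B,C,D) for the four ranges of t. */
static uint32_t f_t(unsigned t, uint32_t b, uint32_t c, uint32_t d)
{
    if (t < 20) return (b & c) | (~b & d);
    if (t < 40) return b ^ c ^ d;
    if (t < 60) return (b & c) | (b & d) | (c & d);
    return b ^ c ^ d;
}

/* K(t) for the four ranges of t. */
static uint32_t k_t(unsigned t)
{
    if (t < 20) return 0x5A827999u;
    if (t < 40) return 0x6ED9EBA1u;
    if (t < 60) return 0x8F1BBCDCu;
    return 0xCA62C1D6u;
}

/* Process one 64-byte block m, updating h[0..4] (H0 ~ H4) in place. */
static void sha1_rfc_block(uint32_t h[5], const unsigned char m[64])
{
    uint32_t w[80], a, b, c, d, e, temp;
    unsigned t;

    /* Step 1: the block's 16 big-endian words become W(0) ~ W(15). */
    for (t = 0; t < 16; t++)
        w[t] = ((uint32_t)m[4 * t] << 24) | ((uint32_t)m[4 * t + 1] << 16) |
               ((uint32_t)m[4 * t + 2] << 8) | (uint32_t)m[4 * t + 3];

    /* Step 2: expand to W(16) ~ W(79). */
    for (t = 16; t < 80; t++)
        w[t] = rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);

    /* Step 3: load the working variables. */
    a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];

    /* Step 4: the 80 rounds. */
    for (t = 0; t < 80; t++) {
        temp = rotl32(a, 5) + f_t(t, b, c, d) + e + w[t] + k_t(t);
        e = d; d = c; c = rotl32(b, 30); b = a; a = temp;
    }

    /* Step 5: fold the result back into H0 ~ H4. */
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}
```

As a usage example, the three-byte message "abc" fits in one block: the bytes 'a', 'b', 'c', then 0x80, then zeros, with the last byte set to 24 (the length in bits). Starting from the initial H0 ~ H4 values, one call yields the well-known digest a9993e36 4706816a ba3e2571 7850c26c 9cd0d89d.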

The mathematical expressions above are adapted from the RFC document. **After all message blocks have gone through their 80 rounds, H0 ~ H4 hold the 160-bit message digest that the SHA-1 algorithm produces.** In the macros of the Redis source code, v stands for A, and w, x, and y stand for B, C, and D above, while z holds E and receives the sum that the RFC calls TEMP. The first two of the five steps convert the message block M(i) into W(i); the purpose is to expand 16 words (32 bits each) into 80 words.

Basic macro of the implementation: rol

#define rol(value, bits) (((value) << (bits)) | ((value) >> (32 - (bits))))

This rotates the 32-bit integer value left by bits bits. A rotate left (circular left shift) re-inserts the bits shifted out on the left at the right end of the value. Assembly language has a ROL instruction that does exactly this.

Union variable: block

block is used in the next two macros. It is a variable declared inside **SHA1Transform()** (because macros are simply expanded textually before compilation, a macro can refer to it before it is declared).

typedef union {
    unsigned char c[64];
    u_int32_t l[16];
} CHAR64LONG16;
#ifdef SHA1HANDSOFF
CHAR64LONG16 block[1]; /* use array to appear as a pointer */

As we can see, although block is an array, it has only one element; the comment also says the array type is used so that it behaves like a pointer. Its size is 64 bytes (512 bits). The SHA-1 algorithm has the concept of a **"word" (W)**: a word is 32 bits, so block is exactly 16 words in size. From here on I will **use W to denote block->l[]:**
W(i) = block->l[i&15] // 16 <= i <= 79

MACRO: blk0

#if BYTE_ORDER == LITTLE_ENDIAN
#define blk0(i) (block->l[i] = (rol(block->l[i],24)&0xFF00FF00) \
    |(rol(block->l[i],8)&0x00FF00FF))
#elif BYTE_ORDER == BIG_ENDIAN
#define blk0(i) block->l[i]

The job of blk0 is to convert byte order. On a little-endian machine, block->l[i] is converted to big-endian order (the first definition in the code above); on a big-endian machine nothing needs to be done and the macro is simply block->l[i].
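The two-rotation trick used for the little-endian case reverses the four bytes of a word. The sketch below applies the same expression to a plain value so the effect can be seen in isolation; the function name `swap_bytes` is hypothetical, and `uint32_t` stands in for `u_int32_t`.

```c
#include <stdint.h>

#define rol(value, bits) (((value) << (bits)) | ((value) >> (32 - (bits))))

/* Hypothetical helper: the byte-reversal expression from blk0, applied to x.
 * rol(x, 24) puts bytes 0 and 2 of x into their mirrored positions,
 * rol(x, 8) does the same for bytes 1 and 3; the masks pick each pair out. */
static uint32_t swap_bytes(uint32_t x)
{
    return (rol(x, 24) & 0xFF00FF00u) | (rol(x, 8) & 0x00FF00FFu);
}
```

For example, 0x44332211 becomes 0x11223344, and applying the function twice returns the original value.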

**When blk0(i) is actually called, its argument i ranges over 0 ~ 15.**

MACRO: blk

#define blk(i) (block->l[i&15] = rol(block->l[(i+13)&15]^block->l[(i+8)&15] \
    ^block->l[(i+2)&15]^block->l[i&15],1))

**When blk(i) is actually called, its argument i ranges over 16 ~ 79.** What it computes is the following expression, using W(i) to denote block->l[i]:

W(i) = S^1(W(i-3) XOR W(i-8) XOR W(i-14) XOR W(i-16)) // S^1() denotes a circular left shift by 1 bit

As the expression shows, computing W(i) only looks back at the previous 16 words, so 16 words of storage are enough. The implementation is therefore equivalent to:

W(i % 16) = S^1(W((i-3) % 16) XOR W((i-8) % 16) XOR W((i-14) % 16) XOR W((i-16) % 16))

Below is a simple proof:

Because a & 15 is equivalent to a % 16 for non-negative a, the operation performed by blk(i) in the source code is equivalent to:

W(i % 16) = S^1(W((i+13) % 16) XOR W((i+8) % 16) XOR W((i+2) % 16) XOR W(i % 16))

Let m + n = 16. Then:

(i + n) % 16 = (i + 16 - m) % 16 = ((i - m) % 16 + 16 % 16) % 16 = (i - m) % 16

When n = 13, 8, 2, 0, m equals 3, 8, 14, 16, so

W(i % 16) = S^1(W((i+13) % 16) XOR W((i+8) % 16) XOR W((i+2) % 16) XOR W(i % 16))

and

W(i % 16) = S^1(W((i-3) % 16) XOR W((i-8) % 16) XOR W((i-14) % 16) XOR W((i-16) % 16))

are equivalent.
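The equivalence can also be checked numerically by running both schedules side by side. This is a minimal sketch under stated assumptions: the names `rotl1` and `schedules_match` are hypothetical, the full schedule uses an 80-word array, and the ring version uses the same index expressions as blk(i).

```c
#include <stdint.h>

/* S^1(x): circular left shift by one bit. */
static uint32_t rotl1(uint32_t x)
{
    return (x << 1) | (x >> 31);
}

/* Hypothetical check: returns 1 if the 16-word ring schedule (indices as in
 * blk(i)) produces the same W(16) ~ W(79) as the full 80-word schedule
 * for the given 16 starting words, and 0 otherwise. */
static int schedules_match(const uint32_t m[16])
{
    uint32_t full[80], ring[16];
    unsigned i;

    for (i = 0; i < 16; i++)
        full[i] = ring[i] = m[i];

    for (i = 16; i < 80; i++) {
        full[i] = rotl1(full[i - 3] ^ full[i - 8] ^ full[i - 14] ^ full[i - 16]);
        ring[i & 15] = rotl1(ring[(i + 13) & 15] ^ ring[(i + 8) & 15] ^
                             ring[(i + 2) & 15] ^ ring[i & 15]);
        if (ring[i & 15] != full[i])
            return 0;
    }
    return 1;
}
```

The ring version overwrites W(i-16) with W(i), which is safe because W(i-16) is never needed again once W(i) has been computed.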

Computing the message digest

In the source code of sha1.c we can find several macros:

/* (R0+R1), R2, R3, R4 are the different operations used in SHA1 */
#define R0(v,w,x,y,z,i) z+=((w&(x^y))^y)+blk0(i)+0x5A827999+rol(v,5);w=rol(w,30);
#define R1(v,w,x,y,z,i) z+=((w&(x^y))^y)+blk(i)+0x5A827999+rol(v,5);w=rol(w,30);
#define R2(v,w,x,y,z,i) z+=(w^x^y)+blk(i)+0x6ED9EBA1+rol(v,5);w=rol(w,30);
#define R3(v,w,x,y,z,i) z+=(((w|x)&y)|(w&x))+blk(i)+0x8F1BBCDC+rol(v,5);w=rol(w,30);
#define R4(v,w,x,y,z,i) z+=(w^x^y)+blk(i)+0xCA62C1D6+rol(v,5);w=rol(w,30);

In this code, the three macros R2, R3, and R4 handle the rounds where t *(t is the round number; there are 80 rounds in total)* lies in [20, 39], [40, 59], and [60, 79], respectively. They correspond to the step in the RFC document that computes TEMP and updates A ~ E. For t in [0, 19], however, two macros are used, R0 and R1. The difference between them is that the value added to z (TEMP) is blk0(i) in R0 but blk(i) in R1 (as in R2, R3, and R4). The reason lies in the first two of the five steps above: for t in [0, 15], W(t) is simply M(t), but for t > 15 (here t in [16, 19]) W(t) must be computed, namely:

W(t) = S^1(W(t-3) XOR W(t-8) XOR W(t-14) XOR W(t-16))

We have already shown that blk(i) computes exactly this formula. If you are careful, you will notice that the f function used in R0 and R1 is **((w & (x ^ y)) ^ y)**, whereas the RFC document gives (B AND C) OR ((NOT B) AND D), which with B, C, D replaced by w, x, y is **(w & x) | (~w & y)**. So are **((w & (x ^ y)) ^ y)** and **(w & x) | (~w & y)** equivalent? The answer is yes. Proving logical identities is not my strength, but since only three variables are involved we can simply enumerate all the cases and compare the two expressions, so I wrote a small program to do that:

#include <stdio.h>

int main(void)
{
    /* Enumerate every combination of the three one-bit inputs. */
    for (int w = 0; w < 2; w++) {
        for (int x = 0; x < 2; x++) {
            for (int y = 0; y < 2; y++) {
                printf("-------------\n"); /* separator line, easier to read */
                printf("%d %d %d: %d\n", w, x, y, (w & (x ^ y)) ^ y);
                printf("%d %d %d: %d\n", w, x, y, (w & x) | (~w & y));
            }
        }
    }
    return 0;
}

**With all that covered, it is time to introduce the core function of sha1.c: SHA1Transform()**

SHA1Transform

void SHA1Transform(u_int32_t state[5], const unsigned char buffer[64])

This function has many lines of code, so to save space I will not list them all here. It begins by declaring five variables of type u_int32_t: a, b, c, d, and e. Next it defines the CHAR64LONG16 union type and declares a variable block of that type that is used like a pointer (it is actually implemented as a one-element array). Then:

memcpy(block, buffer, 64);

Copy the bytes in the buffer parameter to the block.

a = state[0]; b = state[1]; c = state[2]; d = state[3]; e = state[4];

What this does is the RFC step in which H0 ~ H4 are assigned to A ~ E. Next comes the code for the 80 rounds, in four groups of 20 rounds each.

R0(a,b,c,d,e, 0); R0(e,a,b,c,d, 1); R0(d,e,a,b,c, 2); R0(c,d,e,a,b, 3);
R0(b,c,d,e,a, 4); R0(a,b,c,d,e, 5); R0(e,a,b,c,d, 6); R0(d,e,a,b,c, 7);
R0(c,d,e,a,b, 8); R0(b,c,d,e,a, 9); R0(a,b,c,d,e,10); R0(e,a,b,c,d,11);
R0(d,e,a,b,c,12); R0(c,d,e,a,b,13); R0(b,c,d,e,a,14); R0(a,b,c,d,e,15);
R1(e,a,b,c,d,16); R1(d,e,a,b,c,17); R1(c,d,e,a,b,18); R1(b,c,d,e,a,19);
...

The first group is special in that it uses both the R0 and R1 macros. The reason was described earlier: rounds 0 ~ 15 and rounds 16 ~ 79 obtain the word W(i) from the message block M(i) differently. The following rounds 20 ~ 39, 40 ~ 59, and 60 ~ 79 use R2, R3, and R4 in turn. They are easy to understand and are not shown here. Next:

state[0] += a; state[1] += b; state[2] += c; state[3] += d; state[4] += e; /* Wipe variables */ a = b = c = d = e = 0;

All this does is update the contents of the buffer H0 ~ H4. Then a ~ e are cleared to 0. (In my view this step is not really necessary: these are five local variables on the stack, which are released when the function returns. The source comment labels it as wiping the variables.)

**Finally, state[0] ~ state[4] hold the message digest generated by the SHA-1 algorithm.**