What determines the int type range in C?
Section 2.2 of K&R's The C Programming Language describes the int type as follows:
An integer, typically reflecting the natural size of integers on the host machine
In other words, int reflects the natural size of an integer on the machine. But what does "natural size" mean?
Later, when discussing the relationship between short, int, and long, the book says that the compiler is free to choose appropriate sizes for the machine, but short and int must be at least 16 bits, and long must be at least 32 bits.
This raises the question:
What determines the size the compiler chooses for each type?
K&R goes on to say that all of these limits can be looked up in a standard header. I checked /usr/include/limits.h on Ubuntu, which does contain:
/* Minimum and maximum values a `signed int' can hold. */
# define INT_MIN (-INT_MAX - 1)
# define INT_MAX 2147483647
However, this is still only a definition; it does not say why. What I want to know now is:
Why?
So I thought of a book that had been gathering dust for years: "Deep Understanding of Computer Systems", whose English title is Computer Systems: A Programmer's Perspective!
As mentioned at the beginning of section 2.1, the byte is the smallest addressable unit of memory, and most computers use blocks of eight bits. What is a bit? A bit is a unit of storage that can hold only 0 or 1.
Section 2.1.2 says:
Each computer has a word size that specifies the nominal size of the integer and pointer data.
What is a pointer? A pointer is an address in memory. If the word size is w bits, there are 2^w possible addresses. How much memory does one address represent?
As mentioned above, the byte is the smallest addressable unit, so one address represents one byte. With a word size of w bits there are 2^w addresses, for a total of 2^w bytes of memory space.
If the word size is 32 bits, that is, the legendary 32-bit computer, then the memory space can be 2^32 bytes. This is the legendary 4 GB!
Today we use a 32-bit word size, that is, 32-bit integers, and the memory space is 4 GB. So I wonder:
Was it decided from the start that the memory space should be 4 GB, which fixed the word size at 32 bits, which made the machine's natural size 32 bits, which in turn led the compiler to make int 32 bits in C?
But I have no evidence!
Let me try to reason about it anyway.
We know that 32-bit machines evolved from 16-bit machines. Why did the word size need to grow? One reason has just been explained: without it, the machine's maximum address space stays small, and even if you are given a large memory you cannot use all of it. This may also be why we are moving from 32-bit to 64-bit today.
So now we understand: because we wanted a larger memory address space, the word size was increased from 16 bits to 32 bits, and since the word size stands for the size of both pointers and integers, the integer type ended up being 32 bits.
However, many questions remain.
The word size is only an abstract concept that lets us describe some attributes of the machine conveniently.
First, pointers. Where does the concept of a pointer exist for the machine? Pointers belong to the C language; after compilation there is no such concept. But a pointer represents a memory address, so which parts of the machine is a memory address related to?
Second, integers. Does the concept of an integer still exist at the assembly level? It should correspond to the arithmetic instructions in assembly, so which components of the machine are those instructions related to?
Finally, a pointer represents a memory address. With more memory, it is understandable that addresses need more bits. But why integers too? If my memory addresses are 32 bits, can't I still use 16-bit integers?
In fact, the general question is:
Which components of the machine is the word size related to?
To answer this question, we arrive at Chapter 4, "Processor Architecture", of Computer Systems: A Programmer's Perspective.
This chapter introduces processor architecture through a processor called Y86. It starts with registers, which are storage components. What do they store, and what is the stored information used for? Computation. Even a simple addition written in C needs registers at the processor level to carry it out. Let's compile a simple C program into assembly.
/* test_add.c */
int main(void)
{
    int a = 1;
    int b = 2;
    int c = a + b;  /* the addition we want to see in assembly */
    return 0;
}
Compile it with GCC:
gcc -S test_add.c -o test_add.s
Look at the main part of the generated assembly:
movl $1, -12(%ebp)
movl $2, -8(%ebp)
movl -8(%ebp), %eax
movl -12(%ebp), %edx
addl %edx, %eax
movl %eax, -4(%ebp)
Here ebp, eax, and edx are registers.
We can see that the data is first stored on the stack, then loaded from the stack into registers, then added, and finally the result is written from the register back to the stack.
The following figure shows the abstract view of a processor in the book:
What is a stack? The stack is an abstract concept; here it simply refers to a region of memory.
As the book mentions, on a 32-bit computer these registers are 32 bits wide. So we can see that:
The word size is the same as the register size.
In addition, whenever a computation is needed, the movl instruction moves data from memory into a register. Because memory and registers are separate components, something has to carry the data between them; that component is called the data bus.
A register holds one word, so it is natural for the data bus to carry one word at a time. In this simplified picture:
The word size is the same as the data bus width.
Furthermore, to fetch data from memory you always have to tell the memory which address to read. So the address itself is also data that must be sent to memory from somewhere, and anything that is sent needs a component to carry it. This component is called the address bus, and it carries addresses. An address is the same size as a word, so:
The word size is the same as the address bus width.
Our analysis is now nearly complete. To sum up:
Starting from the int type in C, we arrived at the concept of word size, and from word size we found the attributes of several machine components related to it. So far these are:
- int type
- Pointer (memory address)
- Register
- Data Bus
- Address Bus
The Word (computer architecture) entry on Wikipedia lists how the attributes related to word size have changed across a series of computer architectures since 1837.
Let's ask once more: why make all of these components the same size? I think it may be because a computer is very complicated, and its components have to work closely together to accomplish complex tasks. In particular, data has to be passed between components. If their sizes were not uniform, the machine would become even more complex. So we try to unify the sizes of these components as much as possible, and then introduce the concept of word size to describe this important property of a computer.
Looking back, the word size is tied to so many components that it cannot have been determined by memory size alone. For example, the word size is also the size of a register, which bears directly on what the machine can compute. When the word size grows, the values that can take part in a single calculation grow with it. In the past, adding two large numbers might have required several registers. Now that registers are wider, two are enough.
So the choice of word size is a decision that weighs many factors together, and a larger word size represents an across-the-board improvement in a computer's computing and storage capabilities.
The article ends here, but the thinking never stops.