-- What is a pointer?
This article explains the most important topic in C++, the pointer type, along with two useful concepts: static and dynamic.
Array
As described earlier, C++ accesses memory through variables, and so far a variable is the only way to operate on memory: before a block of memory can be used, its first address must be bound to a variable name. This is very inconvenient. Suppose 100 blocks of memory record the wages of 100 workers, and every wage is to be raised by 5%. To get the new wage of the first worker, define a variable float a1; to hold that worker's wage and execute a1 += a1 * 0.05f;, after which a1 holds the raised wage. With 100 workers there must be 100 variables recording 100 wages, so the assignment statement has to be written 100 times, each time with a different variable name.
The definition float a1; also has to be repeated by hand 100 times, changing the variable name each time, which is needless work. It would be better to request 100 * 4 = 400 bytes of contiguous memory from the operating system in one go. Then, to change the wage of worker i, it is enough to add 4 * i bytes to the first address (because a float occupies 4 bytes).
To provide this, C++ offers the array type. An array is a group of numbers, each called an element of the array. Every element must have the same size, because elements are located by a fixed offset. In other words, an array represents a group of numbers of the same type stored contiguously in memory. To declare that a variable is of an array type, follow the variable name with square brackets containing the number of elements, and end with a semicolon. The 100 wage variables above can therefore be defined as one array variable:
float a[100];
This defines a variable a, allocates 100 * 4 = 400 bytes of contiguous memory (a float element occupies 4 bytes), and binds the first address to the name a. The type of a is called "array of 100 float elements", written float[100]. A type says how to interpret the contents of memory, and this type says: the memory identified by the address of a is the start of a contiguous block large enough to hold 100 float numbers.
Accordingly, the earlier float b; can be regarded as defining a float array with a single element. To access an element, put a number in square brackets after the variable name. The number must not be a floating-point number; that is, it must be an integer, stored in binary as a sign-magnitude or two's-complement value. For example, a[5 + 3] += 32; increases the value of element number 5 + 3 = 8 of array a by 32. Also:
long c = 23; float b = a[(c - 3) / 5] + 10, d = a[c - 23];
Here b is the value of element number 4 of array a plus 10, and d is the value of element number 0. Elements of a C++ array are numbered starting from 0: a[0] is the value of the first element, and the 0 means that adding 0 * 4 to the address of a gives the address of the first element.
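The wage example and the index expressions can be tried out with a minimal sketch like the one below. The function names and the sample wage of 1000 + k are made up for illustration, and the loops use a statement form that this series only introduces later.

```cpp
#include <cassert>

// Fills a 100-element wage array with sample values, raises each by 5%,
// and returns the raised wage of worker i (numbered from 0).
float raised_wage(int i) {
    assert(i >= 0 && i < 100);
    float a[100];                                   // 100 * sizeof(float) contiguous bytes
    for (int k = 0; k < 100; ++k) a[k] = 1000.0f + k;   // hypothetical wages
    for (int k = 0; k < 100; ++k) a[k] += a[k] * 0.05f; // one statement instead of 100
    return a[i];
}

// Evaluates the two index expressions from the text.
int index_demo() {
    long c = 23;
    float a[100];
    for (int k = 0; k < 100; ++k) a[k] = static_cast<float>(k);
    float b = a[(c - 3) / 5] + 10;   // element number 4, plus 10
    float d = a[c - 23];             // element number 0
    return static_cast<int>(b + d);
}
```

Calling index_demo() adds element 4 (stored as 4) plus 10 to element 0 (stored as 0), which shows that numbering starts at 0.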
Note that long a[0]; is not allowed: an array of 0 elements is meaningless, and the compiler reports an error. It can, however, be written as a member of a structure, class, or union under certain rules. This technique dates from the C era and is used to implement variable-length structures; it is explained in "C++ from scratch (9)".
Note also that a variable cannot appear in the square brackets when defining an array. That is, long b = 10; float a[b]; is an error, because the value of b is not known when the code is compiled, so the memory cannot be allocated. But b = 10 has already been written, so why is the value of b unknown? Because the address of b is unknown. During compilation the compiler binds b only to an offset, not to a real address. For example, b may correspond to Base - 54, where Base is the end address of a large block of memory requested dynamically from the operating system when the program starts. Base can change from run to run, so the real address of b cannot be known. (On Windows, which uses virtual address spaces, the virtual address can be determined, but it is still not the physical address.) The value of a variable therefore cannot be known during compilation.
But can't the compiler still work out from the earlier long b = 10; that the value at Base - 54 is 10? The point is that when the compiler sees long b = 10; it only generates an instruction that stores 10 into the memory at Base - 54. It tracks nothing else, and has no need to. So even though long b = 10; has been written, the compiler does not treat the value of b as known.
Saying that an array is a type is not quite accurate. Strictly, an array is a type modifier, which defines a way of modifying a type. Type modifiers are detailed later in this article.
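For contrast, the rule a standard C++ compiler applies is that an array size must be a constant expression. A minimal sketch follows; kCount and array_bytes are illustrative names, not part of the original text.

```cpp
#include <cassert>
#include <cstddef>

const long kCount = 10;   // a constant initialized with a literal: usable as an array size

// long n = 10; float bad[n];   // rejected by standard C++ (some compilers accept it as an extension)

// Returns the number of bytes occupied by an array of kCount floats.
std::size_t array_bytes() {
    float a[kCount];
    assert(sizeof(a) / sizeof(a[0]) == static_cast<std::size_t>(kCount));
    return sizeof(a);     // kCount * sizeof(float)
}
```

The difference from long b = 10; is the const: it tells the compiler that the value can never change, so the compiler may use it directly instead of reading memory at run time.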
String
As mentioned in "C++ from scratch (2)", the ASCII code of a character is written by putting single quotation marks around the character; for example, 'A' is equivalent to 65. Several characters are written with double quotation marks, for example "ABC". Recording a character means recording its ASCII code. ASCII codes lie between 0 and 127, which fits within the char range of -128 to 127, so one char variable can record one ASCII code, and an array of char is the natural way to record "ABC":
char a = 'A'; char b[10]; b[0] = 'A'; b[1] = 'B'; b[2] = 'C';
Here a is 65, b[0] is 65, b[1] is 66, and b[2] is 67. Because b has 10 elements but records a string of only 3 characters, how can code that receives the address of b know how many leading elements are valid characters? Since b[4] was never assigned, how can it know that b[4] should not be interpreted as a character? The convention is to examine each char element in turn, starting from the first, until one has the value 0 (code 0 has no printable character in the ASCII table). All elements before it are interpreted as characters by the ASCII table. So b[3] = 0; must also be written to mark the end of the string.
This rule is used everywhere. All string operations in the C runtime library interpret strings this way (for the C runtime library, see "C++ from scratch (19)"). Recording a string like this is tedious, though: it takes one assignment per character, plus one to set the final element to 0, and forgetting that last one causes serious problems. C++ therefore provides the following shorthand:
char b[10] = "ABC";
This is equivalent to all the work done above. "ABC" is an address-type number (here it is an initialization expression, explained in "C++ from scratch (9)"). Its type is char[4] (const char[4] in standard C++), an array of four char elements; the extra element at the end holds 0 to mark the end of the string. Because b is char[10] and "ABC" is char[4], the types do not match and one might expect an implicit type conversion. In fact no conversion takes place. Instead a series of assignments is performed, just like the work done by hand above. This behavior is built into C++, is called initialization, and is valid only when an array is being defined. The following is an error:
char b[10]; b = "ABC";
Even char b[4]; b = "ABC"; is an error. The type of b is an array, which represents multiple elements, and assignment to multiple elements is not defined. Likewise float d[4]; float dd[4] = d; is an error: it is not defined whether the elements of d should go into the corresponding elements of dd in order or in reverse. Therefore a variable of array type cannot be assigned to.
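The terminating-0 rule and the ban on array assignment can be illustrated with a small sketch. The helper names below are hypothetical (the C runtime library has its own functions for the same jobs), and the parameters use pointers, which are introduced later in this article.

```cpp
#include <cassert>
#include <cstddef>

// Counts the elements before the terminating 0, following the rule in the text.
std::size_t count_chars(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// Copies element by element, because an array cannot be assigned as a whole.
// dst must have room for every character of src plus the terminating 0.
void copy_chars(char* dst, const char* src) {
    std::size_t i = 0;
    while ((dst[i] = src[i]) != 0) ++i;
}

// Initializes one array from a literal, copies it into a second, and measures it.
std::size_t string_demo() {
    char b[10] = "ABC";   // b[0..2] hold 'A', 'B', 'C'; the remaining elements are 0
    char c[10];
    copy_chars(c, b);     // writing c = b; here would not compile
    return count_chars(c);
}
```

The copy loop is the "series of assignments" that initialization performs automatically; outside of initialization it has to be written out or delegated to a library function.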
As the number of characters to be represented grew (originally only English letters, now also Chinese, Japanese, and others), char became too small: it can represent at most 255 characters, since 0 marks the end of a string. Multi-byte strings (multibyte) were introduced as a result, and text files recorded this way are said to be in MBCS format. Strings that use one char per character are called single-byte strings (singlebyte), and text files recorded that way are said to be in ANSI format.
Because char can hold negative values, the rule when extracting characters from a string is: if the value of an element is negative, combine it with the next char element to form a short number, then interpret that short according to a double-byte encoding table (for Chinese on Windows, the GB2312/GBK code page), which plays the same role as the ASCII table, to obtain the character.
The above "ABC" returns a string in Multi-byte format. Because there are no Chinese characters or special characters, it seems to be represented in a single byte format. However, if: char B [10] = "AB C";, B [2] is-70, B [5] is 0, instead of imagining that B [4] is 0 because of the four characters, because the "Han" character occupies two bytes.
The disadvantage of the above multi-byte format is that the length of each character is not fixed. If you want to take the value of 3rd characters in a string, the value of each element must be checked from the beginning, instead of a fixed length multiplied by 3, reducing the processing speed of the string, when displaying strings, it is more efficient to check whether the value of the current character is less than zero, so a third character format is introduced: wide byte string (widechar ), text files recorded in this representation are called in unicode format. The difference from multi-byte is that whether the character can be expressed in ASCII or not, it is expressed by a short number, that is, the length of each character is fixed to 2 bytes, c ++ supports this.
Short B [10] = l "AB Han C ";
Putting L (it must be uppercase) before the double quotation marks says that the characters inside are to be encoded as wide characters, so the array b above records the string in Unicode. The element type for wide strings is wchar_t, which is 2 bytes on Windows, the same size as short. Likewise, after short c = L'A'; c is 65.
Do not worry if this is not entirely clear. The examples in later articles gradually show how to use strings.
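A minimal sketch of wide strings follows, using only ASCII characters so that it does not depend on the encoding of the source file. The function names are illustrative. Note that wchar_t is 2 bytes on Windows but 4 bytes on most Unix-like systems, so the sketch uses sizeof(wchar_t) rather than assuming 2.

```cpp
#include <cassert>
#include <cstddef>

// Counts wide characters before the terminating 0. Each element has the same
// size, so character number n is simply s[n].
std::size_t wide_length(const wchar_t* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// Checks the facts stated in the text for an L"..." string and an L'...' character.
bool wide_demo() {
    wchar_t b[10] = L"ABC";   // remaining elements are set to 0
    wchar_t c = L'A';
    return c == 65
        && b[3] == 0
        && sizeof(b) == 10 * sizeof(wchar_t);
}
```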
Static and Dynamic
None of this solves the fundamental problem: C++ can still access memory only through the mapping established by a variable. Before a block of memory can be accessed, a mapping must be set up for it, that is, a variable must be defined. What is wrong with that? First, consider what static and dynamic mean.
A cashier writes invoices by hand on preprinted invoice forms. Each form has only four rows for product names, so when a customer buys more than four kinds of products, two or more invoices must be written. The number of rows on the form is static: whichever customer buys whatever goods at whatever time, the form always has four rows.
A supermarket cashier enters the product names and quantities into a computer, which immediately prints an invoice for the customer. The printed invoices vary in length, since some customers buy more and some less. The invoice length is dynamic: it may differ for different customers at different times.
If a program requests memory of the same fixed size no matter how many times it runs, that memory is said to be statically allocated. The memory the compiler allocates on the stack when a variable is defined is static. If a running program may request memory of different sizes, for example depending on user input, that memory is said to be dynamically allocated. Allocation from the heap is dynamic.
Clearly the dynamic approach is more efficient than the static one (invoice paper is better utilized), but it demands more: a computer and a printer are needed, and the cashier must be skilled enough to operate them. The static approach demands little: only preprinted forms and a cashier who can write.
Likewise, statically allocated memory is used less efficiently and less flexibly, but the code is easy to write and runs fast. Dynamically allocated memory is used efficiently, but the code is harder to write: the program must manage the memory itself (allocate and release it), and this management makes the program slower and the code longer.
Static and dynamic mean more than this and have many deeper applications. Hard coding versus soft coding and tight coupling versus loose coupling are both instances of static versus dynamic.
Address
Earlier it was said that "an address is a number that uniquely identifies a specific memory unit", and later that "an address, like a long integer or a single-precision floating-point number, is a type of number". So is an address a number or a type of number? Is that not a contradiction? Consider:
A floating-point number is a number (one with a fractional part) and is also a type of number. The same holds for an address. The first statement describes what an address is actually used for. The second arises because the computer recognizes only states, and a type is needed to say how a state is to be handled: the address type tells the compiler to treat the corresponding state as the identifier of a memory unit.
Pointer
We now know the difference between dynamic and static memory allocation. Suppose we want to record order data entered by the user. The number of orders entered at a time is not fixed, so we choose to allocate memory on the heap. Assume that, based on the user's input, 1 MB of memory must be requested to hold the data temporarily. To operate on this 1 MB of contiguous memory, its first address must be recorded. But because the memory is dynamically allocated, by program code at run time rather than by the compiler, no variable can be created that maps to this first address. The program must record the first address itself.
Because every address is a 4-byte binary number (on a 32-bit operating system), 4 bytes of memory are allocated statically to record the first address. With what has been covered so far, the first address can be stored in a variable a of type unsigned long. To read the 4 bytes located 4 bytes into the 1 MB block, add 4 to the value of a to get the corresponding address, then fetch the 4 bytes of memory that follow it. But how is the code that fetches the memory contents for an address written? As explained earlier, whenever a number of address type is returned, the corresponding contents are fetched automatically, because that is what the address type means. But if a + 4 is written directly, a is unsigned long, so a + 4 is unsigned long too, not an address type. What can be done?
C++ provides an operator for this, written * and called the content-of operator (the name is not entirely accurate). It looks the same as the multiplication operator but takes a number only on its right, as in *(a + 4). This expression takes the unsigned long number obtained by adding 4 to the value of a and converts it to a number of address type. One question remains: how should the memory at a + 4 be interpreted? Should one byte be fetched, or two? In what format should the fetched contents be interpreted? When writing assembly code by hand this is not a problem, but here the compiler writes the assembly code for us, so the compiler must be told how to interpret the memory at a given address.
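The idea of keeping a first address in an integer and reading memory at an offset from it can be sketched as follows. This is an illustration under assumptions, not how the rest of this article proceeds: it uses std::uintptr_t instead of unsigned long so that it also works where addresses are wider than 4 bytes, a 16-byte buffer stands in for the 1 MB block, and the function names are made up. Converting an adjusted integer back to an address is implementation-defined, though it behaves as expected on mainstream platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads the 4 bytes located `offset` bytes past the integer address `base`.
// The caller states how many bytes to read and how to interpret them by
// choosing this function, which is exactly the information a bare integer lacks.
std::uint32_t read_u32_at(std::uintptr_t base, std::size_t offset) {
    assert(base != 0);
    std::uint32_t value;
    // Turn the integer back into an address, then copy 4 bytes out of it.
    std::memcpy(&value, reinterpret_cast<const void*>(base + offset), sizeof value);
    return value;
}

// Stores a block's first address as a plain number and reads at offset 4.
std::uint32_t raw_address_demo() {
    std::uint32_t block[4] = {11, 22, 33, 44};
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(block);  // first address as a number
    return read_u32_at(a, 4);   // the 4 bytes that start 4 bytes into the block
}
```

A pointer type packages the same two pieces of information, the address and how to interpret what is stored there, so that the compiler can generate this code itself.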
C++ introduces the pointer for this purpose. Like the array above, it is a type modifier. When defining a variable, putting * before the variable name says that the variable is of a pointer type (just as [] after the variable name says that the variable is of an array type). Its size is fixed at 4 bytes in 32-bit code. For example:
unsigned long *pA;
Here pA is a pointer variable. Its size is 4 bytes because this is code for a 32-bit operating system. When *pA; is written, the value of pA is calculated first, which returns the 4 bytes of content starting at the address bound to pA. Then * is calculated: it converts that content into an address-type number of type unsigned long. Next, this address-type number is calculated, which returns an unsigned long number interpreted in plain binary (true-form) format. Finally that unsigned long number is calculated, returning the binary number itself.
In other words, when the type bound to an address is a pointer type, the contents of the memory at that address are to be interpreted by the compiler as an address.
Because a variable is a mapping to an address, every variable has a corresponding address. C++ therefore provides an operator for obtaining the address of a variable, written & and called the address-of operator. It looks the same as the bitwise AND operator but always takes a single number on its right, rather than a number on each side.
The operand to the right of & must be a number of address type. Calculating (evaluating) & converts that address-type number into a pointer-type number and returns it, exactly the opposite of the * operator.
All of this would normally leave anyone dizzy. An example should help:
unsigned long a = 10, b, *pA; pA = &a; b = *pA; (*pA)++;
The first statement defines a pointer-type variable pA through *pA: the compiler allocates 4 bytes of memory on the stack and binds their first address to pA (forming a mapping). In &a, a is a variable and therefore equivalent to an address, so calculating &a returns a number of type unsigned long * (pointer to unsigned long).
Although the number returned is of pointer type, its value is the same as the address of a. So why not simply call it an address-type number of unsigned long and do without the pointer type? Because a pointer-type number yields its own binary value, whereas an address-type number yields the memory contents located at its binary value. So if the address of variable a is 2000, a; returns 10 and &a; returns 2000.
Now consider what a pointer-type variable returns. When pA; is written, the address bound to pA is returned first (under the assumption above it would be 2008). Calculating the value at this address returns the number 2000 (because of pA = &a;), whose type is unsigned long *. Calculating this pointer-type number then returns the binary number for 2000 itself, not the contents of memory at 2000 (this is the distinction drawn in the previous paragraph).
Now look again at the content-of operator *. Its right-hand operand must be of pointer type or array type, and calculating it simply converts the pointer-type number into an address-type number (the two have the same value; only the calculation rules differ). Therefore:
b = *pA;
This first returns the address bound to pA; calculating the value at this address returns the number 2000 of type unsigned long *; then *pA returns the address-type number 2000 of type unsigned long; calculating that address-type number returns 10, which is then simply assigned to b. Similarly, for (*pA)++ (the parentheses are needed because * has lower precedence than the postfix ++), *pA is calculated first and returns the address-type number 2000 of type unsigned long, and then ++ is applied to the memory at that address, so a becomes 11.
If the difference between the address type and the pointer type is still unclear, this may help: address-type numbers are used by the compiler during compilation, while pointer-type numbers are used by the code at run time. If that does not help either, the section on type modifiers below may.
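The four statements of the example can be run as a small sketch to confirm the values traced above. The wrapper function pointer_demo is an illustrative name; the real addresses will of course not be 2000 and 2008.

```cpp
#include <cassert>

// Runs the example statements and returns a + b so both results can be checked.
unsigned long pointer_demo() {
    unsigned long a = 10, b, *pA;
    pA = &a;          // pA now holds the address of a
    b = *pA;          // reads the memory pA points to: b becomes 10
    (*pA)++;          // increments the memory pA points to: a becomes 11
    assert(pA == &a); // the pointer itself was never changed
    return a + b;     // 11 + 10
}
```

Writing *pA++ without the parentheses would instead advance the pointer pA and leave a untouched, which is why the parentheses matter.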
Allocate memory on the heap
As described above, heap allocation means requesting memory from the operating system at run time. Different operating systems provide different interfaces for this, so the way to request memory differs, mainly in the prototypes of the functions to be called (for function prototypes, see "C++ from scratch (7)"). C++ is a language and should not be tied to an operating system, so it provides a uniform interface for requesting memory: the new operator. For example:
unsigned long *pA = new unsigned long; *pA = 10;
unsigned long *pB = new unsigned long[*pA];
Two blocks of memory are requested here. The memory pA points to (the memory at the address held in pA) is 4 bytes, and the memory pB points to is 4 * 10 = 40 bytes. Since new is an operator, it has the form new <type name>[<integer>] and returns a pointer-type number. <type name> determines the pointer type returned, and the square brackets, as in an array definition, give the number of elements; the result, however, is of pointer type, not array type.
Note that new requests memory from the operating system rather than handing it out unconditionally, so the request can fail, for example when memory is insufficient. In standard C++ a plain new reports failure by throwing std::bad_alloc; older compilers, and the new (std::nothrow) form declared in the header <new>, instead return a pointer-type number with the value 0. In that case, check whether the allocation succeeded:
unsigned long *pA = new (std::nothrow) unsigned long[10000];
if (!pA)
{
    // Allocation failed: do the corresponding work
}
The if statement above is a conditional statement, introduced in the next article. If pA is 0, the logical negation !pA is non-zero, which is logically true, so the corresponding work is executed.
Memory that has been allocated should also be released. This is not strictly required, but it is a good habit for a programmer, since resources are limited. Memory is released with the delete operator:
delete pA; delete[] pB;
Note that delete returns no number, yet it is still called an operator. Calling it a statement might seem more fitting, but so that it can remain an operator, C++ provides a special type, void, which means "nothing"; it is covered in "C++ from scratch (7)". So the delete operation does return a number, but the type of that number is void.
Note that pA and pB are released differently, because pA was returned by new unsigned long while pB was returned by new unsigned long[*pA]. When releasing pB, [] must be written after delete to say that an array is being released. In VC, delete without [] also happens to release pB's memory correctly. Programs built with VC on Windows allocate memory through the Windows operating system, which records the length of each block internally and so does not need to be told the length when releasing it. (This is not quite accurate: the C runtime library actually does this work, relying in turn on the operating system, so there are two layers of memory management; the details are omitted here.) The C++ standard nevertheless treats such a mismatch as undefined behavior, so always pair new[] with delete[].
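Allocation, the failure check, and the two forms of delete can be put together in one short sketch. It assumes the std::nothrow form of new, so that failure shows up as a 0 pointer as described in the text; heap_sum and the limit of 1000 elements are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>   // std::nothrow

// Allocates one counter and an array of `count` elements on the heap,
// fills the array with 0, 1, 2, ... and returns the sum of its elements.
// Returns -1 if either allocation fails.
long heap_sum(std::size_t count) {
    assert(count <= 1000);                       // keep the sketch small
    unsigned long* pA = new (std::nothrow) unsigned long;
    if (!pA) return -1;
    *pA = static_cast<unsigned long>(count);

    unsigned long* pB = new (std::nothrow) unsigned long[*pA];
    if (!pB) {
        delete pA;                               // release what was already obtained
        return -1;
    }

    long sum = 0;
    for (unsigned long i = 0; i < *pA; ++i) {
        pB[i] = i;                               // a pointer can be indexed like an array
        sum += static_cast<long>(pB[i]);
    }

    delete pA;      // allocated with new: release with delete
    delete[] pB;    // allocated with new[]: release with delete[]
    return sum;
}
```

The size of the second block depends on a value known only at run time, which is exactly what a definition such as float a[b]; cannot express.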
Type modifier (type-specifier)
A type modifier modifies a type. It is used when defining a variable to further specify how the memory bound to the variable is to be operated on. Some ways of operating are common to every type, so they are factored out to make code easier to write. Fruit is a good analogy: eat the apple's flesh, eat the pear's flesh, do not eat the apple's skin, do not eat the pear's skin. Apple and pear are kinds of fruit and correspond to types; "the flesh of" and "the skin of" modify apple or pear to produce new types such as apple flesh or pear skin, and correspond to type modifiers.
The array and the pointer described in this article are both type modifiers, and so is the & of the reference variables mentioned earlier. "C++ from scratch (7)" introduces several more type modifiers, describes declarations and definitions (two important concepts), and introduces the declaration specifier (decl-specifier).
Type modifiers take effect only when variables are defined, as in unsigned long a, b[10], *pA = &a, &rA = a;. Three type modifiers are used here: [], *, and &. The unsigned long is called the original type, meaning the type before any type modifier is applied. The three modifiers are described in turn below.
Array modifier []: always written after the variable name, with an integer c in the brackets giving the number of array elements. It says that the current type is c contiguous elements of the original type, so its length is the length of the original type multiplied by c. Thus long a[10]; says that the type of a is 10 contiguously stored long elements, with a length of 10 * 4 = 40 bytes. long a[10][4]; says that a is 10 contiguously stored elements of type long[4], with a length of 10 * 16 = 160 bytes.
Because several [] can be chained, the order in which they are read matters. Why 10 elements of long[4] rather than 4 elements of long[10]? The array modifiers are read from left to right starting at the variable name: a is first an array of 10 elements, and each element is in turn an array of 4 long. When * and [] appear together, the * on the left modifies the original type first. So short *a[10]; is 10 contiguously stored elements of type short *, with a length of 10 * 4 = 40 bytes.
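How the compiler reads chained array modifiers can be checked with sizeof. The sketch below uses illustrative function names and avoids hard-coding 4 bytes for long or for a pointer, since both sizes vary between platforms.

```cpp
#include <cassert>
#include <cstddef>

// Number of elements in the outer dimension of long a[10][4].
std::size_t outer_count() {
    long a[10][4];
    assert(sizeof(a) == 40 * sizeof(long));   // 10 * 4 long values in total
    return sizeof(a) / sizeof(a[0]);          // a[0] is one element of a
}

// Number of long values inside each element of a.
std::size_t inner_count() {
    long a[10][4];
    return sizeof(a[0]) / sizeof(a[0][0]);    // a[0] has type long[4]
}

// short *p[10] is an array of 10 pointers, not a pointer to an array.
std::size_t pointer_array_count() {
    short* p[10];
    return sizeof(p) / sizeof(p[0]);          // each element is a short *
}
```

The first two functions show that a[0] has type long[4], so the bracket nearest the variable name gives the number of outermost elements.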
Pointer modifier *: always written before the variable name. It says that the current type is a pointer to the original type. Therefore:
short a = 10, *pA = &a, **ppA = &pA;
Here ppA is called a multi-level pointer: its type is pointer to pointer to short, that is, short **. In short **ppA = &pA;, pA is equivalent to an address-type number of type short *; the & operator converts it into a pointer-type number of type short **, which is then assigned to the variable ppA.
If this is dizzying, there is no need to think it through; just make sure the types match. A brief walk-through: suppose the address of a is 2000, the address of pA is 2002, and the address of ppA is 2006.
For pA = &a;, &a is calculated first. Because a is equivalent to an address, & converts the address-type number of a into a number of type short * and returns it. This is assigned to pA, so the value of pA is 2000.
For ppA = &pA;, &pA is calculated first. Because pA is equivalent to an address, & converts the address-type number of pA directly into a number of type short ** (since pA is already of type short *) and returns it. This is assigned to ppA, so the value of ppA is 2002.
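The two levels of indirection can be followed in a short sketch. The function name is illustrative, and the actual addresses depend on the platform rather than being 2000, 2002, and 2006.

```cpp
#include <cassert>

// Follows ppA -> pA -> a and writes to a through both levels.
short multi_level_demo() {
    short a = 10, *pA = &a, **ppA = &pA;
    assert(*ppA == pA);    // one * on ppA yields the value of pA, the address of a
    assert(**ppA == 10);   // two * reach the value of a
    **ppA = 20;            // writes to a through both pointers
    return a;
}
```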
Reference modifier &: always written before the variable name. It says that no memory needs to be allocated for the variable to bind to. It is not really part of the type, as explained below. Because no memory is allocated to form the mapping, it differs from the two modifiers above and cannot be written repeatedly, since that would be meaningless. It must also appear to the right of any * modifier: long **&a = ppA; is allowed, but long *&*a; and long &**a; are not. Reading the modifiers from left to right, long *&* is a pointer to a reference to a pointer to long, and long &** is a pointer to a pointer to a reference to long. A reference only tells the compiler not to allocate memory on the stack for the variable and has nothing to do with the type, so a pointer to a reference is meaningless. Similarly, long &a[40]; is an error, because it asks for memory holding 40 contiguous elements of type reference to long. A reference is only a way of giving the compiler some type-independent information; it cannot be instantiated as a type. (For instantiation, see "C++ from scratch (10)".)
Note that a reference is not a type (although for convenience long & is often spoken of as one). long **&rppA = &pA; is an error: the statement allocates no memory for rppA and uses the address to the right of = directly as its address, but &pA returns a pointer-type number, not an address-type number, so the compiler reports a type mismatch. long **&rppA = pA; fails as well, because long * and long ** are different types. With matching types, however, the following is accepted (rpA2 is confusing and is explained in "C++ from scratch (7)"):
long a = 10, *pA = &a, **ppA = &pA, *&rpA1 = *ppA, *&rpA2 = *(ppA + 1);
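That a reference is only another name for existing memory can be checked by comparing addresses. The sketch below is a minimal illustration; the variable other and the function name are made up, and rpA2 from the text is left out because it refers to memory past pA.

```cpp
#include <cassert>

// Binds references to a and to pA, then changes both through the references.
long reference_demo() {
    long a = 10, other = 100;
    long *pA = &a, **ppA = &pA;
    long &rA = a;          // rA: another name for a, no new memory
    long *&rpA1 = *ppA;    // rpA1: another name for pA
    assert(&rA == &a);     // same address, so same memory
    assert(&rpA1 == &pA);

    rA = 11;               // a becomes 11
    *rpA1 += 1;            // a becomes 12, reached through pA
    rpA1 = &other;         // changes pA itself: it now points to other
    return a + *pA;        // 12 + 100
}
```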
A type modifier combined with the original type forms a new type, such as long *& or short *[34]. Since the <type name> in the new operator only needs to be the name of a type, a type such as long * can be written there as well:
long **ppA = new long *[45];
This dynamically allocates 4 * 45 = 180 bytes of contiguous memory and returns its first address to ppA. The following are also possible:
long ***pppA = new long **[2];
long *(*pA)[10] = new long *[20][10];
This may look strange. The type of pA is long *(*)[10], a pointer to an array of 10 long * elements, and the allocated memory is (4 * 10) * 20 = 800 bytes long. Because the array modifier [] can only follow the variable name and type modifiers are always applied from left to right, a pointer to an array of 10 long elements could not otherwise be written: a * on the left is always applied before a [] on the right. C++ therefore provides the syntax above: enclosing the variable name in parentheses makes the modifiers inside the parentheses the last ones applied to the type. So long *(a)[10]; is equivalent to long *a[10];, and long *(&aa)[10] = a; is valid, whereas by the rules above long *&aa[10] = a; is an error (for the reason given earlier). long *(*pA)[10] = &a; correctly expresses the type we need. Likewise, long *(*&rpA)[10] = pA; and long *(**ppA)[10] = &pA; are possible.
Due to space limitations, some further discussion of pointers is left to "C++ from scratch (7)". If this article has been dizzying, the examples in later articles try to illustrate the purpose and usage of pointers as fully as possible, which should help.
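The parenthesized declarations can be exercised in a short sketch. The names x, rows, and pointer_to_array_demo are made up for illustration, and the trailing () on the new expression, which zero-initializes the allocated pointers, is an addition not used in the text.

```cpp
#include <cassert>
#include <cstddef>

// Uses a pointer to an array, a reference to an array, and the matching new form.
std::size_t pointer_to_array_demo() {
    long x = 7;
    long* a[10] = {};              // array of 10 long *, all initialized to 0
    long* (*pA)[10] = &a;          // pointer to an array of 10 long *
    long* (&rA)[10] = a;           // reference to the same array
    (*pA)[3] = &x;                 // writes element 3 of a through the pointer
    assert(rA[3] == &x);           // the reference sees the same element

    long* (*rows)[10] = new long*[20][10]();   // 20 arrays of 10 long * each
    rows[19][9] = &x;              // last element of the last array
    long v = *rows[19][9];         // follows the stored pointer back to x
    delete[] rows;

    // Element count of the array pA points to, plus the value read back.
    return sizeof(*pA) / sizeof(long*) + static_cast<std::size_t>(v);
}
```

The expression sizeof(*pA) covers all 10 elements, confirming that pA points to a whole array rather than to a single long *.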