String operations in Delphi look simple, but what happens behind the scenes is fairly complex. Pascal's traditional way of handling strings differs from the one used by Windows, which borrows it from the C language. The 32-bit versions of Delphi added the long string type, which is the default string type in Delphi.
String type
In Borland's Turbo Pascal and in 16-bit Delphi, the traditional string type is a sequence of characters preceded by a length byte, which holds the current length of the string. Because a single byte stores the length, a string cannot exceed 255 characters. This limit makes string handling inconvenient, and each string has a fixed size (255 characters by default), although you can declare a shorter string to save storage space.
A string type is similar to an array type. In fact, a string is almost an array of characters, as shown by the fact that you can access an individual character of a string with the [] notation.
To overcome the limitations of traditional Pascal strings, 32-bit Delphi has added support for long strings. There are three types of strings:
- The ShortString type is the traditional Pascal string type described above. These strings hold at most 255 characters, just like strings in 16-bit Delphi. Each character of a short string is of type AnsiChar (the standard character type).
- The AnsiString type is the new variable-length long string type. These strings are allocated dynamically, are reference counted, and use a copy-on-write technique. Their length is practically unlimited (they can store up to two billion characters!). Their characters are also of type AnsiChar.
- The WideString type is similar to AnsiString, but it is based on the WideChar type, which stores 16-bit Unicode characters.
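The layout of a short string can be checked with a small console program. This is only a sketch: the names ShortStrLayout, LengthByte, and Title are invented for the illustration, and the {$IFDEF FPC} line is an assumption added so that the code also compiles with Free Pascal.

```pascal
program ShortStrLayout;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Returns the hidden length byte that a short string keeps at index 0.
function LengthByte(const S: ShortString): Integer;
begin
  Result := Ord(S[0]);
end;

var
  Title: string[20];   // short string limited to 20 characters
begin
  Title := 'Delphi';
  WriteLn(LengthByte(Title));    // 6, the same value Length(Title) returns
  WriteLn(SizeOf(Title));        // 21: one length byte plus 20 characters
  WriteLn(SizeOf(ShortString));  // 256: one length byte plus 255 characters
  WriteLn(Title[1]);             // D (characters are indexed from 1)
end.
```

The sizes printed do not depend on the text stored, which is the fixed-size behavior described above.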
Using long strings
If you declare a variable simply as string, it is either a short string or an ANSI long string, depending on the $H compiler directive. $H+ (the default) stands for long strings (the AnsiString type). Long strings are what the components of the Delphi library use.
Delphi long strings are reference counted: Delphi keeps track of how many string variables refer to the same string in memory. When the string is no longer used, that is, when the reference count drops to zero, its memory is released.
If you want to increase the size of a string and there is no free memory adjacent to it, the string cannot grow in place, so a full copy must be made in another location. When this happens, Delphi's run-time support reallocates the string for you in a completely transparent way. To make this allocation efficient, you can set the maximum size of the string in advance with the SetLength procedure:
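The effect of the $H directive can be observed by comparing the size of the string type under each setting. The following is a minimal sketch; the type names TStringHOff and TStringHOn are made up for the example, and the {$IFDEF FPC} line is an assumption for Free Pascal users.

```pascal
program HSwitchSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

{$H-}
type
  TStringHOff = string;   // declared under $H-: a short string
{$H+}
type
  TStringHOn = string;    // declared under $H+: a long string

begin
  WriteLn(SizeOf(TStringHOff));                   // 256
  WriteLn(SizeOf(TStringHOn) = SizeOf(Pointer));  // TRUE
end.
```

A long string variable has the size of a pointer because it only refers to text stored elsewhere in memory.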
SetLength (String1, 200);
The SetLength procedure performs a memory request, not an actual memory allocation: it reserves the memory needed in the future without actually using it. This technique is based on a feature of the Windows operating system and is used by Delphi for dynamic memory allocation. For example, when you request a very large array, its memory is reserved but not yet allocated.
You rarely need to set the length of a string. The one common case is when you pass a long string (after a typecast) as a parameter to an API function: you must first reserve memory for it with SetLength, as explained later.
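As an illustration of sizing a string once and then filling it, here is a minimal sketch. The function name Ruler is hypothetical, and the {$IFDEF FPC} line is an assumption for Free Pascal users.

```pascal
program SetLengthSketch;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$APPTYPE CONSOLE}

// Builds a string of Count digits by sizing it once with SetLength
// and then writing each character by index.
function Ruler(Count: Integer): string;
var
  I: Integer;
begin
  SetLength(Result, Count);
  for I := 1 to Count do
    Result[I] := Chr(Ord('0') + I mod 10);
end;

begin
  WriteLn(Ruler(12));           // 123456789012
  WriteLn(Length(Ruler(200)));  // 200
end.
```

Sizing the string up front avoids growing it one character at a time, which could force repeated reallocations.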
Looking at strings in memory
To help you understand how string memory is managed, I wrote a simple example, StrRef. The program declares two global strings, Str1 and Str2. When the first button is pressed, the program assigns a string constant to the first variable and then assigns the first variable to the second:
Str1 := 'Hello';
Str2 := Str1;
In addition to string operations, the program also uses the following StringStatus function to display the internal status of the string in a list box:
function StringStatus (const Str: string): string;
begin
  Result := 'Address: ' + IntToStr (Integer (Str)) +
    ', Length: ' + IntToStr (Length (Str)) +
    ', References: ' + IntToStr (PInteger (Integer (Str) - 8)^) +
    ', Value: ' + Str;
end;
In the StringStatus function it is vital to pass the string as a const parameter. Passing it by copy (a value parameter) has a side effect: an extra reference to the string exists while the function executes. Passing by reference (var) or as a constant (const) does not. Because the function should not modify the string, a const parameter is the right choice here.
To obtain the memory address of the string (which helps identify its actual content and shows whether two string variables refer to the same memory area), I typecast the string to an Integer. A string is actually a reference, that is, a pointer: the string variable holds the memory address of the string's data.
To extract the reference count, I relied on a little-known fact: the length and the reference count are stored in the string itself, just before the actual text, at negative offsets from the address held in the string variable. The offset is -4 for the length (a value you can obtain more easily with the Length function) and -8 for the reference count.
Keep in mind, however, that these internal offsets might change in future versions of Delphi; since they are not part of the official Delphi documentation, there is no guarantee that they will stay the same.
Running this example, you get two strings with the same content, the same memory location, and a reference count of 2, as shown in the upper part of the list box in Figure 7.1. If you now change the value of one of the two strings, the memory address of the updated string changes. This is the effect of the copy-on-write technique.
The OnClick event handler of the second button (Change) is listed below; its output appears in the second part of the list box in Figure 7.1:
procedure TFormStrRef.BtnChangeClick(Sender: TObject);
begin
  Str1 [2] := 'a';
  ListBox1.Items.Add ('Str1 [2] := ''a''');
  ListBox1.Items.Add ('Str1 - ' + StringStatus (Str1));
  ListBox1.Items.Add ('Str2 - ' + StringStatus (Str2));
end;
Note: BtnChangeClick can be executed only after BtnAssignClick. For this reason the second button is disabled when the program starts (its Enabled property is set to False) and is enabled at the end of the first method. Feel free to extend this example and use the StringStatus function to explore the behavior of long strings in other situations.
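The copy-on-write behavior can also be observed without reading the undocumented offsets, simply by comparing the addresses held in two string variables. The following console sketch is not the StrRef program: SameBuffer and WriteMakesCopy are names invented here, the UniqueString call is added only so that the text is guaranteed to live in a heap buffer, and the {$IFDEF FPC} line is an assumption for Free Pascal users.

```pascal
program CopyOnWriteSketch;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$APPTYPE CONSOLE}

// True when both string variables point at the same character buffer.
function SameBuffer(const A, B: string): Boolean;
begin
  Result := Pointer(A) = Pointer(B);
end;

// Repeats the steps of the two buttons and reports whether writing to
// Str1 gave it a private copy while leaving Str2 untouched.
function WriteMakesCopy: Boolean;
var
  Str1, Str2: string;
  SharedBefore: Boolean;
begin
  Str1 := 'Hello';
  UniqueString(Str1);   // make sure the text lives in a heap buffer
  Str2 := Str1;         // only the reference is copied
  SharedBefore := SameBuffer(Str1, Str2);
  Str1[2] := 'a';       // the write triggers the copy
  Result := SharedBefore and not SameBuffer(Str1, Str2) and
    (Str1 = 'Hallo') and (Str2 = 'Hello');
end;

begin
  WriteLn(WriteMakesCopy);   // TRUE
end.
```

Comparing pointers shows only whether two variables share memory; it says nothing about the reference count itself.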
Delphi strings and Windows PChar strings
A long string is zero-terminated, which makes it fully compatible with the C-language zero-terminated strings used by Windows and makes long strings easy to use with the API. A zero-terminated string is a sequence of characters followed by a byte set to zero (a null character). In Delphi it can be represented as a zero-based array of characters, which is the data type C uses to implement strings. This is why zero-terminated character arrays are so common in Windows API functions, which are based on the C language. Because Pascal long strings are fully compatible with C zero-terminated strings, you can simply typecast a long string to PChar when you need to pass it to a Windows API function.
The following example copies the caption of a form into a PChar string (using the GetWindowText API function) and then copies it into the Caption of a button. The code is as follows:
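To see the zero terminator at work without calling any API, a long string can be handed to a routine that walks a character pointer the way C code would. This is a minimal sketch; CLength is a hypothetical helper written for the illustration, and the {$IFDEF FPC} line is an assumption for Free Pascal users.

```pascal
program ZeroTerminatedSketch;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$APPTYPE CONSOLE}

// Counts characters the way C code would: advance until the #0 terminator.
function CLength(P: PChar): Integer;
begin
  Result := 0;
  while P^ <> #0 do
  begin
    Inc(Result);
    Inc(P);
  end;
end;

var
  S: string;
begin
  S := 'Delphi';
  WriteLn(CLength(PChar(S)));              // 6
  WriteLn(CLength(PChar(S)) = Length(S));  // TRUE
end.
```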
procedure TForm1.Button1Click (Sender: TObject);
var
  S1: String;
begin
  SetLength (S1, 100);
  GetWindowText (Handle, PChar (S1), Length (S1));
  Button1.Caption := S1;
end;
You can find this code in the LongStr example. Note that the code uses SetLength to allocate memory for the string: if you fail to allocate it, the program will probably crash. If you use a PChar only to pass a value (rather than to receive one, as in the code above), the code is much simpler, because there is no need to define a temporary string and initialize it. The following line passes the Caption property of a Label as a parameter to an API function, simply by typecasting it to PChar:
SetWindowText (Handle, PChar (Label1.Caption));
To typecast a WideString to a Windows-compatible type, use PWideChar instead of PChar. Wide strings are commonly used in OLE and COM programs.
I have shown the good side of long strings; now for the pitfalls. Problems can arise when you typecast a long string to PChar. The underlying issue is that after the conversion you become responsible for the string and its contents, and Delphi no longer helps you. Consider this modified version of the Button1Click code above:
procedure TForm1.Button2Click(Sender: TObject);
var
  S1: String;
begin
  SetLength (S1, 100);
  GetWindowText (Handle, PChar (S1), Length (S1));
  S1 := S1 + ' is the title'; // this won't work
  Button1.Caption := S1;
end;
The program compiles, but the result is surprising: the button caption shows only the original window title, without the constant string appended to it. The reason is that when Windows writes to the string (inside the GetWindowText call), it does not set the length of the Pascal long string properly. Delphi can still output this string, using the zero terminator to determine where it ends. However, any characters you append are placed after the zero terminator, so they are ignored.
How can you fix this? The solution is to tell the system to convert the string returned by the GetWindowText API function back into a Pascal string. However, if you write the following code:
S1 := String (S1);
Delphi ignores this statement, because converting a type to itself is a useless operation. To obtain a proper Pascal long string, you need to typecast the string to PChar and then let Delphi convert it back to a string:
S1 := String (PChar (S1));
In fact, you can skip the string conversion and write S1 := PChar (S1); because in Delphi a PChar is converted to a string automatically. The final code is as follows:
procedure TForm1.Button3Click(Sender: TObject);
var
  S1: String;
begin
  SetLength (S1, 100);
  GetWindowText (Handle, PChar (S1), Length (S1));
  S1 := String (PChar (S1));
  S1 := S1 + ' is the title';
  Button3.Caption := S1;
end;
Another way is to use the length of the PChar string to reset the length of the Delphi string, which can be written as follows:
SetLength (S1, StrLen (PChar (S1)));
In the LongStr example you can see the results of the three approaches, each triggered by its own button. If all you want is the caption of the form, just use the Caption property of the form object; there is no need for this confusing code, which is here only to illustrate string conversion problems. When you call Windows API functions, however, you have to deal with this complexity.
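The same length problem and its fix can be reproduced in a console program without a form. In the sketch below, FakeGetText is a hypothetical stand-in for an API such as GetWindowText (it just writes a fixed zero-terminated text into the buffer), TitleWithSuffix is likewise an invented name, and the {$IFDEF FPC} line is an assumption for Free Pascal users.

```pascal
program PCharLengthSketch;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical stand-in for an API that fills a caller-supplied buffer
// with a zero-terminated string and returns the number of characters.
function FakeGetText(Buffer: PChar; MaxCount: Integer): Integer;
begin
  Result := 0;
  if MaxCount <= 0 then
    Exit;
  StrLCopy(Buffer, 'Form1', MaxCount - 1);  // leaves room for the #0
  Result := StrLen(Buffer);
end;

function TitleWithSuffix: string;
begin
  SetLength(Result, 100);                      // reserve the buffer
  FakeGetText(PChar(Result), Length(Result));
  // Length(Result) is still 100 here; trim it to the real text length.
  SetLength(Result, StrLen(PChar(Result)));
  Result := Result + ' is the title';
end;

begin
  WriteLn(TitleWithSuffix);   // Form1 is the title
end.
```

When the filling routine reports how many characters it wrote, as this stand-in does, that return value could be passed to SetLength directly instead of calling StrLen.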
Formatting strings
Using the plus (+) operator and conversion functions such as IntToStr, you can indeed build complex strings out of existing values. However, there is a different approach to formatting numbers, currency values, and other strings: the powerful Format function and its family.
The Format function takes a string with the basic text and some placeholders (usually marked by the % symbol), followed by an array of values, one for each placeholder. For example, the code to format two numbers into a string is:
Format ('First %d, Second %d', [n1, n2]);
Here n1 and n2 are two integer values. The first placeholder is replaced by the first value, the second placeholder by the second value, and so on. If the output type of a placeholder (indicated by the letter after the % symbol) does not match the type of the corresponding parameter, a run-time error occurs. The lack of compile-time type checking is the main drawback of the Format function.
Besides %d, the Format function defines many other placeholders, listed in Table 7.1. Each placeholder defines the default output for its data type, and you can refine that output with further format specifiers. For instance, a width specifier sets the number of characters in the output, while a precision specifier sets the number of decimal digits. For example:
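A short console sketch shows both the normal case and the run-time error. The function names Pair and MismatchRaises are invented for this illustration, and the {$IFDEF FPC} line is an assumption for Free Pascal users; the exception class caught, EConvertError, is the one Format raises for a mismatched argument.

```pascal
program FormatBasicsSketch;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

function Pair(N1, N2: Integer): string;
begin
  Result := Format('First %d, Second %d', [N1, N2]);
end;

// Returns True when Format rejects a mismatched argument at run time.
function MismatchRaises: Boolean;
var
  S: string;
begin
  Result := False;
  try
    S := Format('%d', ['text']);   // string passed to an integer placeholder
  except
    on EConvertError do
      Result := True;
  end;
end;

begin
  WriteLn(Pair(1, 2));       // First 1, Second 2
  WriteLn(MismatchRaises);   // TRUE
end.
```

The mismatched call compiles without complaint, which is exactly why the error surfaces only when the program runs.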
Format ('%8d', [n1]);
This statement converts the number n1 into an eight-character string, padding it with spaces and right-aligning the text; use a minus sign (-) after the % symbol for left alignment.
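The width and alignment specifiers are easier to judge when the result is bracketed. In this minimal sketch the function name Columns is hypothetical, and the {$IFDEF FPC} line is an assumption for Free Pascal users; the third placeholder uses a precision specifier, which for integers sets a minimum number of digits.

```pascal
program FormatWidthSketch;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Formats N three ways: right-aligned in 8 characters, left-aligned
// in 8 characters, and zero-padded to at least 4 digits.
function Columns(N: Integer): string;
begin
  Result := Format('[%8d] [%-8d] [%.4d]', [N, N, N]);
end;

begin
  WriteLn(Columns(42));   // [      42] [42      ] [0042]
end.
```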
Table 7.1: Placeholders of the Format function
| Placeholder | Description |
| --- | --- |
| d (decimal) | Converts an integer value to a string of decimal digits. |
| x (hexadecimal) | Converts an integer value to a string of hexadecimal digits. |
| p (pointer) | Converts a pointer value to a string of hexadecimal digits. |
| s (string) | Copies a string, character, or PChar value to the output string. |
| e (exponential) | Converts a floating-point value to a string in exponential notation. |
| f (floating point) | Converts a floating-point value to a string in fixed-point notation. |
| g (general) | Converts a floating-point value to the shortest possible decimal string, using either fixed-point or exponential notation. |
| n (number) | Converts a floating-point value to a floating-point string that includes thousands separators. |
| m (money) | Converts a floating-point value to a string representing a currency amount. The result depends on the regional settings. For details, see the "Currency and date/time formatting variables" topic in the Delphi help file. |
The best way to understand all this is to experiment with format strings yourself. To make this easier, I wrote the FmtTest program, which converts integers and floating-point numbers into formatted strings. As Figure 7.2 shows, the form is divided into two parts: integers on the left and floating-point numbers on the right.
The first edit box of each part holds the value to format. Below it is a button that performs the formatting and shows the result in a message box. Next comes a second edit box, where you can type a format string. Alternatively, you can click a line of the ListBox control to select a predefined format string. Every time you type a new format string, it is added to the corresponding list box (note that these additions are lost when the program is closed).
The code of this example simply uses the text of the various controls to produce its output. The OnClick event handler of one of the Show buttons is listed below:
procedure TFormFmtTest.BtnIntClick(Sender: TObject);
begin
  ShowMessage (Format (EditFmtInt.Text,
    [StrToInt (EditInt.Text)]));
  // if the item is not there, add it
  if ListBoxInt.Items.IndexOf (EditFmtInt.Text) < 0 then
    ListBoxInt.Items.Add (EditFmtInt.Text);
end;
This code performs the formatting using the text of the EditFmtInt edit box and the value of the EditInt control. If the format string is not yet in the list box, it is added to it. If you click an item in the list box instead, the code copies the clicked string to the edit box:
procedure TFormFmtTest.ListBoxIntClick(Sender: TObject);
begin
  EditFmtInt.Text := ListBoxInt.Items [
    ListBoxInt.ItemIndex];
end;