Strings in Delphi

Source: Internet
Author: User

1. Strings before Delphi 2009 (the default string type is not Unicode):

Before Delphi 2009 there are three string types: ShortString, AnsiString and WideString.



"ShortString"

ShortString is the older Pascal string format and can hold at most 255 bytes of character data. When we declare a variable of type ShortString, the compiler reserves 256 bytes for it: the first byte holds the length of the string and the following 255 bytes hold its contents. If the string is shorter than 255 bytes, only as many bytes as there are characters are meaningful; the remaining bytes are unused and are not guaranteed to be zero (a global variable starts out zero-filled, a local variable does not).

var
  SStr: ShortString;

The declaration above gives SStr 256 bytes of memory:

SizeOf(SStr); {= 256}

The memory belongs to the variable itself, so it is released automatically when SStr goes out of scope.

We can also declare a ShortString variable in the following way:

var
  SStr: string[16];

This declares a ShortString that can hold only 16 bytes of content. Together with the byte that holds the length, SStr occupies 17 bytes in total:

SizeOf(SStr); {= 17}

We can use a ShortString like a byte array (array of Byte). For example, we can access individual characters with subscripts, and we can use the High and Low functions to get the upper and lower bounds of the ShortString. Because the first byte of the string holds its length, SStr[0] holds the length of the string. For example:

var
  SStr: string[16];
begin
  SStr := 'ABC';
  {At this point:
   Ord(SStr[0]) = 3     the string length is 3
   SStr[1] = 'A'        the first character is A
   SStr[2] = 'B'        the second character is B
   SStr[3] = 'C'        the third character is C
   SStr[4]..SStr[16]    unused; the contents are not guaranteed to be #0
   High(SStr) = 16      the upper bound is 16
   Low(SStr) = 0        the lower bound is 0}
end;
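
The layout described above can be checked with a small console program. This is a minimal sketch, assuming a Delphi desktop compiler with ShortString support; the program and variable names are only illustrative.

```pascal
program ShortStringLayout;
{$APPTYPE CONSOLE}

var
  S: string[16];   // ShortString with room for 16 characters
begin
  S := 'ABC';
  Writeln(SizeOf(S));             // 17: one length byte plus 16 character bytes
  Writeln(Ord(S[0]));             // 3: the length byte
  Writeln(Length(S));             // 3: Length reads the same byte
  Writeln(Low(S), ' ', High(S));  // 0 16

  // Writing the length byte shortens the string in place.
  S[0] := #2;
  Writeln(S);                     // AB
end.
```

Writing to S[0] directly is shown only to make the role of the length byte visible; in ordinary code SetLength is the clearer way to change the length.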

Next, let's look at the addresses involved in SStr.

----------

{Place a TMemo and a TButton on a blank form}
procedure TForm1.Button1Click(Sender: TObject);
var
  SStr: ShortString;
  pS: Pointer;
  pS1: Pointer;
begin
  SStr := 'ABC';

  pS := Addr(SStr);      {Address of the string variable SStr}
  pS1 := Addr(SStr[0]);  {Address of the first byte of the string data}

  Memo1.Clear;
  Memo1.Lines.Add(IntToStr(NativeInt(pS)));   {Displays 1242240 on my computer}
  Memo1.Lines.Add(IntToStr(NativeInt(pS1)));  {Displays 1242240 on my computer}
end;

----------

The code above shows that the address of the variable SStr is the address of the memory block in which the string is stored. This is different from AnsiString and WideString, which are described below.



"AnsiString"

AnsiString is a dynamically allocated string. When we declare an AnsiString, no memory is allocated for string contents yet, for example:

var
  AStr: AnsiString;

At this point the length of the string is 0 and it occupies no memory for contents:

Length(AStr); {= 0}

Why use Length(AStr) here and not SizeOf(AStr)? Because AStr is different from a ShortString: the address of the variable AStr is not the address of the memory block that holds the string. The variable AStr is simply a pointer to that memory block, and we find the block through the variable. Take a look at the following code:

----------

{Place a TMemo and a TButton on a blank form}
procedure TForm1.Button1Click(Sender: TObject);
var
  AStr: AnsiString;
  pA: Pointer;
  pA1: Pointer;
begin
  AStr := 'ABC';

  pA := Addr(AStr);      {Address of the string variable AStr}
  pA1 := Addr(AStr[1]);  {Address of the memory block that holds the string; Delphi does not allow access to AStr[0]}

  Memo1.Clear;
  Memo1.Lines.Add(IntToStr(NativeInt(pA)));   {Displays 1242504 on my computer}
  Memo1.Lines.Add(IntToStr(NativeInt(pA1)));  {Displays 17539780 on my computer}
end;

----------

As the code above shows, the address of the string variable AStr is not the address of the memory block that holds the string. SizeOf(AStr) therefore cannot return the size of the string; it returns the size of a pointer, which is always SizeOf(Pointer).

As mentioned, no memory block is allocated when an AnsiString variable is declared, so AStr[1] cannot be used yet: AStr[1] refers to the first character in the allocated block, and when no block exists there is no first character. Before using AStr[1], check whether Length(AStr) is 0; if it is, no memory block has been allocated and AStr[1] must not be used.
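
The check described above can be wrapped in a small helper. The following is a minimal sketch; FirstCharOr is a hypothetical function written for this example, not part of the Delphi RTL.

```pascal
program AnsiStringGuard;
{$APPTYPE CONSOLE}

// FirstCharOr is a hypothetical helper, not an RTL function.
function FirstCharOr(const S: AnsiString; Fallback: AnsiChar): AnsiChar;
begin
  if Length(S) > 0 then
    Result := S[1]         // safe: a memory block exists
  else
    Result := Fallback;    // empty string: the variable holds nil
end;

var
  A: AnsiString;
begin
  Writeln(SizeOf(A) = SizeOf(Pointer));  // TRUE: the variable is only a pointer
  Writeln(Pointer(A) = nil);             // TRUE: nothing has been allocated yet
  Writeln(FirstCharOr(A, '?'));          // ?
  A := 'ABC';
  Writeln(FirstCharOr(A, '?'));          // A
end.
```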

When does AStr get its memory? When we assign a value to AStr, for example:

----------
var
  AStr: AnsiString;      {No memory block exists for AStr yet}
begin
  AStr := 'ABC';         {AStr now points to a memory block that holds 'ABC'}
  ShowMessage(AStr[1]);  {AStr[1] can be used now}
  AStr := 'ABCDEF';      {AStr now points to a larger block that holds the longer string}
  AStr := '';            {AStr gives up its memory block and becomes empty}
  ShowMessage(AStr[1]);  {Error: no memory block exists now, so AStr[1] is invalid}
end;

----------

What is the relationship between the variable AStr and the memory block that holds the string? The variable AStr is actually a pointer to that memory block. The following code makes this clear:

----------

{Place a TMemo and a TButton on a blank form}
procedure TForm1.Button1Click(Sender: TObject);
var
  AStr: AnsiString;
  pA: Pointer;   {Address of the variable AStr}
  pA1: Pointer;  {Address of the first character in the string (that is, the address of the memory block)}
  pAP: Pointer;  {The pointer stored in the variable AStr}
begin
  AStr := 'ABC';

  pA := Addr(AStr);      {Get the address of the variable AStr}
  pA1 := Addr(AStr[1]);  {Get the address of the memory block}
  pAP := Pointer(AStr);  {Get the pointer stored in the variable AStr}

  Memo1.Clear;
  Memo1.Lines.Add(IntToStr(NativeInt(pA)));   {Address of the variable AStr}
  Memo1.Lines.Add(IntToStr(NativeInt(pA1)));  {Address of the memory block}
  Memo1.Lines.Add(IntToStr(NativeInt(pAP)));  {The pointer stored in the variable AStr}
end;

----------

{Run result}
1242504   {Address of the variable AStr}
17539780  {Address of the memory block}
17539780  {The pointer stored in the variable AStr}

----------

This shows that AStr is just a pointer to the memory block that holds the string, and that Pointer(AStr) gives us the address of that block.

There is one odd thing about the code above: if the two lines pA1 := Addr(AStr[1]); and pAP := Pointer(AStr); are swapped, the results are different (test environment: Delphi XE2). This is the result after the swap:

----------

{Run result after the swap}
1242504   {Address of the variable AStr}
17539780  {Address of the memory block}
5327792   {The pointer stored in the variable AStr}

----------

This is most likely a result of Delphi's copy-on-write handling of strings. As long as we only read the string, it stays at the address it received when the initial value was assigned (5327792). When we do something that could modify it, Delphi first copies the string to a new location (17539780) and AStr continues to use the copy. Taking the address with pA1 := Addr(AStr[1]) counts as such an operation, because the compiler assumes that we are going to modify the string through the pointer. If the string at the old address is still referenced by other variables, for example after AStr := 'ABC' followed by AStr2 := AStr, the old block stays in place for AStr2; if no other variable uses it any more, it is released.

To check this, we test again with the following code. This time AStr is declared as a constant, which cannot be modified:

----------

{Place a TMemo and a TButton on a blank form}
procedure TForm1.Button1Click(Sender: TObject);
const
  AStr: AnsiString = 'ABC';
var
  pA: Pointer;   {Address of AStr}
  pA1: Pointer;  {Address of the first character in the string (that is, the address of the memory block)}
  pAP: Pointer;  {The pointer stored in AStr}
begin
  pA := Addr(AStr);      {Get the address of AStr}
  pAP := Pointer(AStr);  {Get the pointer stored in AStr (placed first)}
  pA1 := Addr(AStr[1]);  {Get the address of the memory block (placed second)}

  Memo1.Clear;
  Memo1.Lines.Add(IntToStr(NativeInt(pA)));   {Address of AStr}
  Memo1.Lines.Add(IntToStr(NativeInt(pA1)));  {Address of the memory block}
  Memo1.Lines.Add(IntToStr(NativeInt(pAP)));  {The pointer stored in AStr}
end;

----------

{Run result this time}

5379472  {Address of AStr}
5327460  {Address of the memory block}
5327460  {The pointer stored in AStr}

----------

The address of the string did not change, which supports the copy-on-write explanation above. It also explains the common advice to declare a string as const (for example a string parameter of a function) whenever its contents will not be modified: a const string never triggers the copy, which makes the code more efficient.
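
Copy-on-write can also be observed by comparing the pointers held by two string variables. The following is a minimal sketch; the variables are local to a procedure, and UniqueString is called first so that A owns a heap block instead of pointing at the string literal.

```pascal
program CopyOnWriteDemo;
{$APPTYPE CONSOLE}

procedure Demo;
var
  A, B: AnsiString;
begin
  A := 'ABC';
  UniqueString(A);   // make sure A owns a heap block rather than the literal
  B := A;            // no character data is copied here
  Writeln(Pointer(A) = Pointer(B));   // TRUE: both variables share one block

  B[1] := 'X';       // first write: B receives its own copy
  Writeln(Pointer(A) = Pointer(B));   // FALSE: the blocks are now separate
  Writeln(A, ' ', B);                 // ABC XBC
end;

begin
  Demo;
end.
```

The assignment B := A is cheap because only the pointer is copied; the character data is duplicated only when one of the two variables is about to be written.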

Next, the Length function needs some explanation. For ShortString and AnsiString, Length returns the number of bytes in the string, not the number of characters. See the following example:

----------

{Place a TMemo and a TButton on a blank form}
procedure TForm1.Button1Click(Sender: TObject);
var
  SStr: ShortString;
  AStr: AnsiString;
begin
  Memo1.Clear;

  SStr := '123你好吗';  {6 characters in total}
  AStr := '123你好吗';  {6 characters in total}

  Memo1.Lines.Add(IntToStr(Length(SStr)));  {Result: 9}
  Memo1.Lines.Add(IntToStr(Length(AStr)));  {Result: 9}
end;

----------

Because an English character occupies 1 byte and a Chinese character occupies 2 bytes (in a double-byte ANSI code page such as GBK), both SStr and AStr need 9 bytes to hold the string. If you use AStr[4] to get the character "你", you get only half of it, because AStr[4] is an AnsiChar (a single byte). Handling Chinese text with AnsiString is therefore troublesome, and we usually use WideString for strings that contain Chinese characters.

As for Delphi not allowing access to AStr[0]: AnsiString actually has a structure similar to ShortString. In front of the string contents there is a data area that stores the length of the AnsiString in bytes. It occupies the 4 bytes immediately before the first character, and we can reach it through a pointer. The 4 bytes before that hold the reference count of the string, which records how many string variables refer to it. The structure is defined in the System unit of Delphi 7:

StrRec = packed record
  refCnt: Longint;  {Reference count}
  length: Longint;  {String length}
end;

Take a look at the following code:

----------

{Place a TMemo and a TButton on a blank form}
procedure TForm1.Button1Click(Sender: TObject);
var
  AStr: AnsiString;
  P: PCardinal;  {SizeOf(Cardinal) = 4 bytes}
begin
  Memo1.Clear;

  SetLength(AStr, 65530);
  P := PCardinal(AStr);

  Dec(P);  {Move back 4 bytes}
  Memo1.Lines.Add(IntToStr(P^));  {Result: 65530, the string length}
  Dec(P);  {Move back another 4 bytes}
  Memo1.Lines.Add(IntToStr(P^));  {Result: 1, the reference count}
end;

----------

As the results show, the length of the string is 65530 and the reference count is 1.

As for what an AnsiString can store: it is sometimes said that an AnsiString holds a #0-terminated string, like a string in C. That is not accurate. Delphi does place a #0 after the data, but the length comes from the header, so an AnsiString can contain any number of #0 characters and a #0 does not necessarily mark the end of the string. For example:

----------

{Place a TMemo and a TButton on a blank form}
procedure TForm1.Button1Click(Sender: TObject);
var
  AStr, AStr2: AnsiString;
  I: Integer;
begin
  Memo1.Clear;

  SetLength(AStr, 10);  {Store 10 #0 characters in AStr}
  for I := 1 to 10 do
    AStr[I] := #0;

  AStr2 := AStr;  {See whether any characters are lost in the assignment}
  AStr2[5] := 'A';
  Memo1.Lines.Add(IntToStr(Length(AStr2)));  {Result: 10, no characters were lost}
end;

----------

In other words, an AnsiString can be used directly as a block of memory. It can store not only characters but anything at all; you could even put the data of a picture into the memory block of an AnsiString. The advantage of using an AnsiString as a memory buffer is that Delphi manages the memory for you: when the block is no longer needed, Delphi releases it automatically.
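
As an illustration of using an AnsiString as a raw buffer, the following minimal sketch copies an array of integers into a string and back with Move. The variable names are made up for this example.

```pascal
program AnsiStringAsBuffer;
{$APPTYPE CONSOLE}

var
  Buf: AnsiString;
  Src, Dst: array[0..3] of Integer;
  I: Integer;
begin
  for I := 0 to 3 do
    Src[I] := (I + 1) * 1000;

  // Allocate exactly as many bytes as the array occupies.
  SetLength(Buf, SizeOf(Src));
  Move(Src, Buf[1], SizeOf(Src));   // raw bytes in, zero bytes included

  Writeln(Length(Buf));             // 16: four integers of 4 bytes each

  Move(Buf[1], Dst, Length(Buf));   // raw bytes back out
  for I := 0 to 3 do
    Writeln(Dst[I]);                // prints 1000, 2000, 3000, 4000 on separate lines
end.
```

No explicit release is needed: the buffer is freed when Buf goes out of scope or is assigned a new value.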

However, a string that contains #0 characters cannot be handled through the PAnsiChar type, because a PAnsiChar is a true #0-terminated string. (As we said, an AnsiString variable is really a pointer, and a PAnsiChar is a pointer too, so the two are used in many of the same ways.) When you assign the string above to a PAnsiChar variable, code that reads through the PAnsiChar stops at the first #0 and ignores everything after it. Take a look at the following code:

----------

{Place a TMemo and a TButton on a blank form}
procedure TForm1.Button1Click(Sender: TObject);
var
  AStr: AnsiString;
  PAStr: PAnsiChar;
  I: Integer;
begin
  Memo1.Clear;

  SetLength(AStr, 10);
  for I := 1 to 10 do
    AStr[I] := #0;

  PAStr := 'ABC';
  Memo1.Lines.Add(IntToStr(Length(PAStr)));  {Result: 3}

  PAStr := PAnsiChar(AStr);
  Memo1.Lines.Add(IntToStr(Length(PAStr)));  {Result: 0}

  PAStr := #0#0#0;
  Memo1.Lines.Add(IntToStr(Length(PAStr)));  {Result: 0}
end;

----------
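
The same effect shows up when only part of the data is lost. The following minimal sketch assigns a string with an embedded #0 to another string through PAnsiChar; only the bytes before the first #0 survive.

```pascal
program NullTruncation;
{$APPTYPE CONSOLE}

var
  A, B: AnsiString;
begin
  A := 'AB'#0'CD';        // five bytes, one of them #0
  Writeln(Length(A));     // 5: the length comes from the string header

  // Going through PAnsiChar rescans the data for a terminator.
  B := PAnsiChar(A);
  Writeln(Length(B));     // 2: everything from the first #0 onward is dropped
end.
```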

To sum up AnsiString:

An AnsiString variable is just a pointer to the memory block that holds the actual string. The block is allocated dynamically; if the AnsiString holds no string, the variable is nil.

Although an AnsiString variable is only a pointer and not the memory block itself, we can still access the characters in the block with subscripts. Subscripts must start at 1; Delphi does not allow subscript 0 for AnsiString or WideString.

We can get the address of the memory block that AStr points to in the following ways:

@AStr[1]
Pointer(AStr)

We can access the first, second and third characters in the memory block that AStr points to as follows:

AStr[1], AStr[2], AStr[3]
PAnsiChar(AStr)^, (PAnsiChar(AStr) + 1)^, (PAnsiChar(AStr) + 2)^

PAnsiChar and AnsiString can easily be converted to each other, although the compiler may give a warning (not an error) when compiling. For example:

----------

{Place a TMemo and a TButton on a blank form}
procedure TForm1.Button1Click(Sender: TObject);
var
  AStr: AnsiString;
  PAStr: PAnsiChar;
begin
  Memo1.Clear;

  AStr := 'ABC';
  PAStr := PAnsiChar(AStr);
  Memo1.Lines.Add(AStr);
  Memo1.Lines.Add(PAStr);

  Memo1.Lines.Add('');

  PAStr := '123';

  AStr := PAStr;
  Memo1.Lines.Add(AStr);
  Memo1.Lines.Add(PAStr);
end;

----------

For ShortString and AnsiString, the Length function returns the number of bytes in the string, not the number of characters.

The length of the string (the total number of bytes) is stored in the 4 bytes before the AnsiString contents, and the reference count is stored in the 4 bytes before that.

AStr[N] can return only half of a Chinese character, so handling Chinese text with AnsiString is troublesome; we generally use WideString for strings that contain Chinese characters.

Delphi uses copy-on-write for AnsiString (a WideString has no reference count and is copied on every assignment), so when we do not intend to modify the contents of a string, it is best to declare it as const to improve the efficiency of the program.

An AnsiString in Delphi can be used directly as a memory buffer, and Delphi manages the memory for you. Allocating memory is convenient: just call SetLength. When you are done with it, Delphi releases it. However, a buffer that contains #0 bytes cannot be read through PAnsiChar. To get the address of the memory, use Pointer(AStr).



"WideString"

WideString is used in basically the same way as AnsiString, and it is also dynamically allocated. The difference is that every character stored in a WideString (English or Chinese) uses 2 bytes, and that a WideString has no reference count. WideString exists mainly to make COM easier to use: it is the BSTR string type. Because a BSTR has no reference count, the data is copied on every assignment, which makes it less efficient.

The character type that corresponds to WideString is WideChar, which is used like AnsiChar, and the pointer type is PWideChar, which is used like PAnsiChar.

For WideString, the Length function returns the number of characters, not the number of bytes. To get the number of bytes in a WideString, multiply the return value of Length by 2.
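
Both properties of WideString mentioned above, the missing reference count and the byte size, can be seen in a short program. This is a minimal sketch that assumes a Windows target, where WideString is a BSTR.

```pascal
program WideStringDemo;
{$APPTYPE CONSOLE}

procedure Demo;
var
  A, B: WideString;
begin
  A := 'ABC';
  B := A;   // a BSTR has no reference count, so the data is duplicated
  Writeln(Pointer(A) = Pointer(B));        // FALSE on Windows: two separate blocks
  Writeln(Length(A));                      // 3 characters
  Writeln(Length(A) * SizeOf(WideChar));   // 6 bytes
end;

begin
  Demo;
end.
```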

For the following code:

----------

var
  WStr: WideString;
begin
  WStr := '123你好吗';
end;

----------

WStr[1] is the character "1" and WStr[4] is the character "你"; both are of type WideChar (two bytes).



2. Strings in Delphi 2009 and later (full Unicode support):

Delphi 2009 added a new string type, UnicodeString, and changed the string structure by adding the codePage and elemSize fields. The structure is defined in the System unit as follows:

PStrRec = ^StrRec;
StrRec = packed record
  codePage: Word;   {Code page: for example UTF-16, UTF-8 or GB2312}
  elemSize: Word;   {Element size: number of bytes per character element}
  refCnt: Longint;  {Reference count: how many string variables use the string}
  length: Longint;  {String length: number of elements (bytes for AnsiString, WideChars for UnicodeString)}
end;

This structure is used only for AnsiString and UnicodeString. The 4 bytes before the contents of an AnsiString or UnicodeString hold the string length, the 4 bytes before that hold the reference count, the 2 bytes before that hold the element size, and the 2 bytes before that hold the code page. WideString is unchanged. From now on we can use UnicodeString to handle Chinese text.
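
The header fields can be read back through a pointer, in the same way as the length and reference count were read earlier. The following is a minimal sketch for Delphi 2009 or later; TStrHeader and HeaderOf are hypothetical names that mirror the fields listed above, and reading the header this way relies on an internal layout that may change between compiler versions.

```pascal
program StrRecPeek;
{$APPTYPE CONSOLE}

type
  // TStrHeader is a hypothetical local mirror of the fields listed above.
  PStrHeader = ^TStrHeader;
  TStrHeader = packed record
    CodePage: Word;
    ElemSize: Word;
    RefCnt: Longint;
    Length: Longint;
  end;

// HeaderOf is a hypothetical helper: the header ends exactly where
// the character data begins, so we step back by the header size.
function HeaderOf(P: Pointer): PStrHeader;
begin
  Result := PStrHeader(PAnsiChar(P) - SizeOf(TStrHeader));
end;

var
  U: UnicodeString;
  A: UTF8String;
  H: PStrHeader;
begin
  U := 'ABC';
  UniqueString(U);       // make sure U owns a heap block of its own
  A := UTF8String(U);

  H := HeaderOf(Pointer(U));
  Writeln(H^.CodePage, ' ', H^.ElemSize, ' ', H^.Length);   // 1200 2 3
  Writeln(H^.RefCnt);                                       // 1

  H := HeaderOf(Pointer(A));
  Writeln(H^.CodePage, ' ', H^.ElemSize, ' ', H^.Length);   // 65001 1 3
end.
```

For production code, the System unit functions StringCodePage, StringElementSize and StringRefCount report the same values without depending on the record layout.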

The System unit also defines the string types UTF8String and UCS4String, as follows:

UTF8String = type AnsiString(65001);
UCS4String = array of UCS4Char;  {UCS4Char = type LongWord;}
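
The code page in the type declaration determines how many bytes the same text occupies. The following minimal sketch converts one UnicodeString to GBK and to UTF-8. TGbkString is a hypothetical alias declared in the same style as UTF8String, and the example assumes that code page 936 is available on the system.

```pascal
program CodePageLengths;
{$APPTYPE CONSOLE}

type
  // TGbkString is a hypothetical alias declared the same way as UTF8String.
  TGbkString = type AnsiString(936);

var
  U: UnicodeString;
  G: TGbkString;
  U8: UTF8String;
begin
  // '123' followed by U+4F60 U+597D U+5417, written as code points so the
  // result does not depend on the encoding of the source file.
  U := '123'#$4F60#$597D#$5417;

  G := TGbkString(U);     // converted to code page 936 (GBK)
  U8 := UTF8String(U);    // converted to UTF-8

  Writeln(Length(U));     // 6 UTF-16 code units
  Writeln(Length(G));     // 9 bytes: 3 + 3 * 2
  Writeln(Length(U8));    // 12 bytes: 3 + 3 * 3
end.
```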

In addition, Delphi defines the string type RawByteString, as follows:

RawByteString = type AnsiString($FFFF);

About the RawByteString type: when you assign an AnsiString to a UTF8String, Delphi converts the contents automatically, and the same applies to the other code pages. If a function parameter needs to accept strings of several of these types, this is a problem, because Delphi converts the argument to the declared type when the parameter is passed. Delphi therefore defines RawByteString: a variable of this type accepts a string in any of these formats and keeps the memory format of the source string without changing anything.
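
A parameter declared as RawByteString can be sketched as follows. Describe is a hypothetical procedure written for this example; StringCodePage is the System unit function that returns the code page stored in the string header.

```pascal
program RawByteParam;
{$APPTYPE CONSOLE}

// Describe is a hypothetical helper. Because the parameter is a
// RawByteString, the argument arrives with its original code page.
procedure Describe(const S: RawByteString);
begin
  Writeln('code page ', StringCodePage(S), ', ', Length(S), ' bytes');
end;

var
  U8: UTF8String;
  A: AnsiString;
begin
  U8 := UTF8String('A'#$4F60);   // U+4F60 needs three bytes in UTF-8
  Describe(U8);                  // code page 65001, 4 bytes

  A := 'ABC';
  Describe(A);                   // reports the ANSI code page used on this system
end.
```

If the parameter were declared as AnsiString instead, the UTF8String argument would be converted to the ANSI code page before the procedure sees it.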



"String"

The type we use most often is string; we seldom use AnsiString or UnicodeString directly. So what type is string?

The meaning of string depends on the Delphi version: in the ANSI versions (before Delphi 2009) string means AnsiString, and in the Unicode versions (Delphi 2009 and later) it means UnicodeString.

In the same way, Char means AnsiChar in the ANSI versions and WideChar in the Unicode versions, and PChar means PAnsiChar in the ANSI versions and PWideChar in the Unicode versions.
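
Which meaning applies can be checked at compile time. The following minimal sketch relies on the UNICODE conditional symbol, which the compiler defines in Delphi 2009 and later.

```pascal
program StringAlias;
{$APPTYPE CONSOLE}

begin
  {$IFDEF UNICODE}
  Writeln('string = UnicodeString');
  {$ELSE}
  Writeln('string = AnsiString');
  {$ENDIF}
  Writeln(SizeOf(Char));   // 2 in Delphi 2009 and later, 1 in earlier versions
end.
```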

The Unicode versions of Delphi add a new function, ByteLength, which returns the number of bytes in a string. It is meaningful only for variables of type string; its source code shows why:

----------

function ByteLength(const S: string): Integer;
begin
  Result := Length(S) * SizeOf(Char);
end;

----------

The parameter is a string and the result depends on SizeOf(Char), so the function is correct only for the string type. If you pass a ShortString or an AnsiString, it is converted to string first, and the result is not the number of bytes that the variable actually holds. Take a look at the following code:

----------

{Place a TMemo and a TButton on a blank form}
procedure TForm1.Button1Click(Sender: TObject);
var
  SStr: ShortString;
  AStr: AnsiString;
  UStr: UnicodeString;
  WStr: WideString;
  Str: string;
begin
  SStr := '123你好吗';
  AStr := '123你好吗';
  UStr := '123你好吗';
  WStr := '123你好吗';
  Str := '123你好吗';

  Memo1.Clear;
  Memo1.Lines.Add(IntToStr(Length(SStr)));  {Result: 9}
  Memo1.Lines.Add(IntToStr(Length(AStr)));  {Result: 9}
  Memo1.Lines.Add(IntToStr(Length(UStr)));  {Result: 6}
  Memo1.Lines.Add(IntToStr(Length(WStr)));  {Result: 6}
  Memo1.Lines.Add(IntToStr(Length(Str)));   {Result: 6}

  Memo1.Lines.Add(IntToStr(ByteLength(SStr)));  {Result: 12, although SStr holds only 9 bytes}
  Memo1.Lines.Add(IntToStr(ByteLength(AStr)));  {Result: 12, although AStr holds only 9 bytes}
  Memo1.Lines.Add(IntToStr(ByteLength(UStr)));  {Result: 12}
  Memo1.Lines.Add(IntToStr(ByteLength(WStr)));  {Result: 12}
  Memo1.Lines.Add(IntToStr(ByteLength(Str)));   {Result: 12}
end;

----------
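
To get a byte count that matches the declared type of each variable, one option is to multiply Length by the size of that type's own character element instead of calling ByteLength. The following is a minimal sketch with illustrative variable names.

```pascal
program ByteCounts;
{$APPTYPE CONSOLE}

var
  A: AnsiString;
  U: UnicodeString;
  W: WideString;
begin
  A := 'ABC';
  U := 'ABC';
  W := 'ABC';

  // Multiply by the element type of the variable actually declared,
  // instead of relying on the meaning of Char under the current compiler.
  Writeln(Length(A) * SizeOf(AnsiChar));   // 3
  Writeln(Length(U) * SizeOf(WideChar));   // 6
  Writeln(Length(W) * SizeOf(WideChar));   // 6
end.
```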
