Delphi strings explained in detail

Source: Internet
Author: User

Technical exchange, explained by DH.

I wrote about this once before, and now I want to rewrite it from scratch, starting from the basics.

I remember that in one of my job interviews, the first question was:
What is the difference between AnsiString and WideString?
Think about it for a moment. By the time I finish, you should know the answer.

First, the classification:
1. ShortString: can hold up to 255 characters; kept mainly for compatibility with earlier versions.
2. AnsiString: can hold about 2 to the power of 31 characters; the default string type before D2009.
3. UnicodeString: can hold about 2 to the power of 30 characters; the default string type in D2009 and later.
4. WideString: can hold about 2 to the power of 30 characters; mainly used with COM.

Let's go through them one by one.
ShortString
We saw that it can hold at most 255 characters, but the space it occupies is the declared length plus 1 byte. Why?
Let's look at an example:

 
procedure TForm3.btn1Click(Sender: TObject);
var
  S: String[15];
begin
  S := 'huangjackyaaaaa';
  ShowMessageFmt('%d', [Length(S)]); // 15 characters
  ShowMessageFmt('%d', [SizeOf(S)]); // the space occupied is 16 bytes
end;

From the code above we can see that a ShortString variable is not a pointer but a block of memory; otherwise SizeOf would return 4. Having noticed this, we should look at its memory.

The first byte stores the length of the string, which is why it can hold at most 255 characters ($FF).
Let's modify the code above:

 
procedure TForm3.btn1Click(Sender: TObject);
var
  S: String[15];
begin
  S := 'huangjackyaaaaa';
  ShowMessageFmt('%d', [Ord(S[0])]); // also 15: S[0] is the length byte
  ShowMessageFmt('%d', [SizeOf(S)]);
end;

Okay, so we have mastered the first string.
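To make the role of the length byte a bit more concrete, here is a minimal console sketch. The program name and the shorter test value are my own choices, not part of the code above, and it assumes a Delphi compiler that supports short strings such as String[15]. It overwrites the length byte directly and shows that the reported string shrinks, even though the characters behind it are untouched.

```pascal
program ShortStringSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

var
  S: string[15];
begin
  S := 'huang';
  Writeln(SizeOf(S));  // 16: one length byte plus 15 character slots
  Writeln(Length(S));  // 5
  Writeln(Ord(S[0]));  // 5: the length byte itself
  S[0] := #3;          // overwrite the length byte directly
  Writeln(S);          // hua
end.
```

In real code SetLength is the safer way to change the length; writing to S[0] is shown here only to expose the layout.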

AnsiString
Let's test with the same code:

procedure TForm3.btn1Click(Sender: TObject);
var
  S: AnsiString;
begin
  S := 'huangjackyaaaaa';
  ShowMessageFmt('%d', [Length(S)]); // 15
  ShowMessageFmt('%d', [SizeOf(S)]); // 4
end;

The variable S is just a pointer.

Let's take a look at this address:

Well, how is the memory for storing the actual string organized?

Offset    -12         -10            -8                -4              0 .. Length-1    Length
Content   Code page   Element size   Reference count   String length   Actual content   #0

What is the code page? It identifies the encoding, for example UTF-8 or GBK.
The string data lives on the heap, but we never have to free it ourselves: the compiler-generated code releases it automatically when the reference count drops to 0.
We said above that an AnsiString can hold about 2 to the power of 31 characters. That is because the length field occupies 4 bytes.
Let's write some code to test it:
Let's write code to test it:

procedure TForm3.btn1Click(Sender: TObject);
var
  S: AnsiString;
  I: Integer;
begin
  S := 'huangjackyaaaaa';
  I := Integer(S); // address of the first character (32-bit pointer)
  ShowMessage(IntToHex(PWord(I - 12)^, 4));    // $03A8, the code page
  ShowMessage(IntToHex(PWord(I - 10)^, 4));    // $0001, an element in AnsiString is one byte
  ShowMessage(IntToHex(PCardinal(I - 8)^, 8)); // $FFFFFFFF, the reference count (-1 marks a string constant)
  ShowMessage(IntToHex(PCardinal(I - 4)^, 8)); // $0000000F, the length
  ShowMessageFmt('%d', [Length(S)]); // 15
  ShowMessageFmt('%d', [SizeOf(S)]); // 4
end;

The memory view shows exactly the layout described above.

Now for the code page value: $03A8 is 936 in decimal, and MSDN lists code page 936 as GB2312.
UnicodeString has the same memory layout. So my feeling is that the fields at -12 and -10 exist mainly to serve UnicodeString: Unicode comes in several encodings, and the size of each element varies with the encoding.
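The same header fields can also be read without pointer arithmetic. The sketch below assumes Delphi 2009 or later, where the System unit provides StringRefCount, StringElementSize and StringCodePage; the program and procedure names are made up for illustration. The strings are local variables on purpose: assigning a literal to a global string variable makes the RTL copy it, so the reference count of -1 would not be visible there.

```pascal
program StringHeaderSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

procedure ShowHeaders;
var
  A, B: AnsiString;
  U: UnicodeString;
begin
  A := 'huangjackyaaaaa';
  Writeln(StringRefCount(A));    // -1: the data is a constant compiled into the program
  UniqueString(A);               // force a private copy on the heap
  Writeln(StringRefCount(A));    // 1
  B := A;                        // a second variable now shares the same data
  Writeln(StringRefCount(A));    // 2
  Writeln(Length(B));            // 15
  Writeln(StringElementSize(A)); // 1
  Writeln(StringCodePage(A));    // depends on the system code page, e.g. 936

  U := 'huangjackyaaaaa';
  Writeln(StringElementSize(U)); // 2
end;

begin
  ShowHeaders;
end.
```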
Before entering widestring, let's see how the length function is implemented in Delphi.

unit3.pas.40: I := Length(S);
004b33cd 8b45fc      mov eax,[ebp-$04]
004b33d0 85c0        test eax,eax
004b33d2 7418        jz $004b33ec
004b33d4 8bd0        mov edx,eax
004b33d6 83ea0a      sub edx,$0a
004b33d9 66833a01    cmp word ptr [edx],$01
004b33dd 740d        jz $004b33ec
004b33df 8d45fc      lea eax,[ebp-$04]
004b33e2 33c9        xor ecx,ecx
004b33e4 8b55fc      mov edx,[ebp-$04]
004b33e7             call @InternalLStrFromUStr // we are not using Unicode here, so this is skipped
004b33ec 85c0        test eax,eax
004b33ee 7405        jz $004b33f5
004b33f0 83e804      sub eax,$04                // offset -4, as described above
004b33f3 8b00        mov eax,[eax]              // fetch the value stored there: the length
004b33f5 8bd8        mov ebx,eax

The disassembly is a lot longer than what Delphi 7 produced...
While browsing the Delphi source code (System.pas), I found this record:

 
StrRec = packed record
  codePage: Word;
  elemSize: Word;
  refCnt: Longint;
  length: Longint;
end;
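As a rough sketch of how this record lines up with the offsets -12, -10, -8 and -4, the program below overlays a local copy of it on the memory just before the first character. THeaderSketch, PHeaderSketch and Dump are names I made up; the sketch assumes a 32-bit Delphi 2009 or later target, where the header is exactly these 12 bytes.

```pascal
program StrRecSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

type
  // Local copy of the record shown above; assumes the 32-bit layout (12 bytes).
  THeaderSketch = packed record
    codePage: Word;
    elemSize: Word;
    refCnt: Longint;
    length: Longint;
  end;
  PHeaderSketch = ^THeaderSketch;

procedure Dump;
var
  S: AnsiString;
  H: PHeaderSketch;
begin
  S := 'huangjackyaaaaa';
  // The header sits immediately before the first character.
  H := PHeaderSketch(PAnsiChar(Pointer(S)) - SizeOf(THeaderSketch));
  Writeln(H^.codePage); // depends on the system, 936 in the test above
  Writeln(H^.elemSize); // 1
  Writeln(H^.refCnt);   // -1 for a string constant
  Writeln(H^.length);   // 15
end;

begin
  Dump;
end.
```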
 
 

WideString
As the name suggests, each character takes 2 bytes. Let's look at an example:

procedure TForm3.btn1Click(Sender: TObject);
var
  S: WideString;
begin
  S := 'huangjackyaaaaa';
  ShowMessageFmt('%d', [Length(S)]);    // 15
  ShowMessageFmt('%d', [SizeOf(S[1])]); // 2
  ShowMessageFmt('%d', [SizeOf(S)]);    // 4
end;

Right: each character is 2 bytes, and the variable still just stores a pointer. As with AnsiString, let's look at how the actual data is organized in memory:

Offset    -4                  0 .. Length-1           Last
Content   Length (in bytes)   Actual string content   $00 $00

It also uses 4 bytes to store the length, but each element is 2 bytes in size, so it can hold only about 2 to the power of 30 characters.

Want to see its memory?

The length field says $1E? Yes: that value has to be divided by 2, because the field stores the size of the memory block in bytes, not the number of characters.
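Here is a small sketch of that division. It assumes Delphi on Windows, where a WideString is a COM BSTR with a 4-byte byte count stored in front of the characters; the program and procedure names are made up for illustration.

```pascal
program WideLengthSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

procedure ShowLengths;
var
  W: WideString;
  ByteLen: Integer;
begin
  W := 'huangjackyaaaaa';
  // Read the 4 bytes stored just before the first character.
  ByteLen := PInteger(PAnsiChar(Pointer(W)) - 4)^;
  Writeln(ByteLen);                      // 30, i.e. $1E
  Writeln(ByteLen div SizeOf(WideChar)); // 15
  Writeln(Length(W));                    // 15
end;

begin
  ShowLengths;
end.
```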

One more thing to cover:
PAnsiChar, PWideChar
These correspond to char* (and wchar_t*) in C++, that is, a pointer to characters terminated by a 0. This one is simple: there are no extra bytes storing the length or a reference count. That is why PChar is used in many places, because the string types really do cost more resources.
C++ folks, don't laugh: CString has the same problem.
Let's look at an example:

 
procedure TForm3.btn1Click(Sender: TObject);
var
  S: PAnsiChar;
begin
  S := 'huangjackyaaaaa';
  ShowMessageFmt('%d', [StrLen(S)]);  // 15
  ShowMessageFmt('%d', [SizeOf(S^)]); // 1
end;

Let's check the memory to prove that I am not fooling anyone.

There is nothing stored in front of the characters, and the data ends with a 0.
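Because nothing stores the length, the only way to find it is to walk the characters until the 0 is reached. The function below is a minimal sketch of that idea; CountToNull is a name I made up and is not the RTL's StrLen implementation.

```pascal
program NullTerminatedSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

// Counts characters up to, but not including, the terminating #0.
function CountToNull(P: PAnsiChar): Integer;
begin
  Result := 0;
  if P = nil then
    Exit;
  while P^ <> #0 do
  begin
    Inc(Result);
    Inc(P); // step to the next character
  end;
end;

begin
  Writeln(CountToNull('huangjackyaaaaa')); // 15
  Writeln(CountToNull(''));                // 0
end.
```

This also shows the trade-off: a PAnsiChar carries no header, but finding its length costs a scan over the whole string, while Length on an AnsiString just reads the 4 bytes at offset -4.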

Now we also know the answer to the question from the beginning. Strings look simple in everyday use, but in fact we often don't know enough about them.

From the above we can also see that the index of the first element of a string is 1, not 0. Other things to watch out for will be covered in the next article.
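A tiny sketch of the indexing difference, with made-up names and a made-up test value: a string is indexed from 1, while a PAnsiChar pointing at the same data is indexed from 0.

```pascal
program IndexSketch; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

var
  S: AnsiString;
  P: PAnsiChar;
begin
  S := 'DH';
  P := PAnsiChar(S);
  Writeln(S[1]); // D: strings are indexed from 1
  Writeln(P[0]); // D: pointers are indexed from 0
  Writeln(P[1]); // H
end.
```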

I am DH, and it looks like it's time for lunch.
