Delphi-String explanation

Source: Internet
Author: User

Technical Exchange, DH commentary.

I wrote about this once before, and now I am rewriting it from scratch. It is fairly basic, so experts, please don't mock.

I remember once seeing a set of interview questions that someone had posted on a Delphi forum. The first question was:
What is the difference between AnsiString and WideString?
Think about it for a moment first. By the time I finish, everybody should know the answer.

First, the categories:
1 ShortString, can hold up to 255 characters, kept mainly for compatibility with old versions
2 AnsiString, can hold about 2^31 characters, the default string type before D2009
3 UnicodeString, can hold about 2^30 characters, the default string type in D2009 and later
4 WideString, can hold about 2^30 characters, used mainly with COM.

Now, one at a time:
ShortString
We saw above that it can hold 255 characters, but the space it occupies is the declared character length plus 1. Why?
Let's look at an example:

    procedure TForm3.Btn1Click(Sender: TObject);
    var
      S: String[15];
    begin
      S := 'HuangJackyAAAAA';
      ShowMessageFmt('%d', [Length(S)]); // 15 characters
      ShowMessageFmt('%d', [SizeOf(S)]); // occupies 16 bytes
    end;

From the code above we can see that a ShortString variable is not a pointer but the memory block itself; otherwise SizeOf would return 4 (the size of a 32-bit pointer). Having noticed this, let's look at its memory.

We find that its first byte holds the length of the string, which is why it can hold at most 255 ($FF) characters.
Now change the code above:

    procedure TForm3.Btn1Click(Sender: TObject);
    var
      S: String[15];
    begin
      S := 'HuangJackyAAAAA';
      ShowMessageFmt('%d', [Ord(S[0])]); // also 15: S[0] is the length byte
      ShowMessageFmt('%d', [SizeOf(S)]); // 16
    end;

OK, so we have mastered the first kind of string.
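
As a small follow-up sketch, the console program below shows what the fixed capacity means in practice: assigning a longer string to a String[15] keeps only the first 15 characters. The program name and the variable names Source and Dest are made up for illustration, and the comments assume a Delphi compiler that still supports ShortString and AnsiString (the desktop Windows compilers do).

```pascal
program ShortStringDemo;

{$APPTYPE CONSOLE}

var
  Source: AnsiString;
  Dest: String[15];   // 1 length byte followed by 15 character bytes
begin
  Source := 'HuangJackyAAAAA-and-more';
  Dest := Source;     // only the first 15 characters fit; the rest is dropped

  Writeln(Dest);            // HuangJackyAAAAA
  Writeln(Length(Dest));    // 15
  Writeln(Ord(Dest[0]));    // 15, read straight from the length byte
  Writeln(SizeOf(Dest));    // 16
end.
```

Because the length lives in a single byte at index 0, Length never has to scan the characters, and the declared size can never exceed 255.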

AnsiString
Let's test with the same code:

    procedure TForm3.Btn1Click(Sender: TObject);
    var
      S: AnsiString;
    begin
      S := 'HuangJackyAAAAA';
      ShowMessageFmt('%d', [Length(S)]); // 15
      ShowMessageFmt('%d', [SizeOf(S)]); // 4
    end;

This shows that the variable S is just a pointer.

OK, let's look at the address it points to.

So how is the actual string data laid out in memory?

    Offset           Content
    -12              Code page
    -10              Size of each character (element size)
    -8               Reference count
    -4               String length
    0 to Length-1    Actual content
    Last             0

What is the code page? It is the encoding, such as UTF-8 or GBK.
The string data is a heap-allocated, reference-counted block, but we do not need to worry about releasing it: the compiler frees it automatically when the reference count drops to 0.
We said above that an AnsiString can hold about 2^31 characters. That is because its length field takes 4 bytes.
OK, let's write some code to test it:

    procedure TForm3.Btn1Click(Sender: TObject);
    var
      S: AnsiString;
      I: Integer;
    begin
      S := 'HuangJackyAAAAA';
      I := Integer(S); // address of the first character (32-bit only)
      ShowMessage(IntToHex(PWord(I - 12)^, 4));    // $03A8: code page
      ShowMessage(IntToHex(PWord(I - 10)^, 4));    // $0001: each AnsiString element is one byte
      ShowMessage(IntToHex(PCardinal(I - 8)^, 8)); // $FFFFFFFF: reference count (-1 for a string constant)
      ShowMessage(IntToHex(PCardinal(I - 4)^, 8)); // $0000000F: length
      ShowMessageFmt('%d', [Length(S)]);           // 15
      ShowMessageFmt('%d', [SizeOf(S)]);           // 4
    end;

The values match the layout described above.
Back to the code page value and which encoding it corresponds to: $03A8 is 936, and according to MSDN code page 936 is GB2312.
The memory layout of UnicodeString is the same. So I suspect the fields at offsets -12 and -10 were added mainly to support UnicodeString: Unicode has several encodings, and the size of each element differs depending on the encoding.
Before moving on to WideString, let's look at how the Length function is implemented in Delphi.

    Unit3.pas.40: I:=Length(S);
    004B33CD 8B45FC           mov eax,[ebp-$04]
    004B33D0 85C0             test eax,eax
    004B33D2 7418             jz $004b33ec
    004B33D4 8BD0             mov edx,eax
    004B33D6 83EA0A           sub edx,$0a
    004B33D9 66833A01         cmp word ptr [edx],$01
    004B33DD 740D             jz $004b33ec
    004B33DF 8D45FC           lea eax,[ebp-$04]
    004B33E2 33C9             xor ecx,ecx
    004B33E4 8B55FC           mov edx,[ebp-$04]
    004B33E7 E8782EF5FF       call @InternalLStrFromUStr // skipped here, because we are not using Unicode
    004B33EC 85C0             test eax,eax
    004B33EE 7405             jz $004b33f5
    004B33F0 83E804           sub eax,$04 // yes, exactly: offset -4
    004B33F3 8B00             mov eax,[eax] // load the length stored there
    004B33F5 8BD8             mov ebx,eax

The disassembly is much longer than what Delphi 7 produces...
While looking through the Delphi source just now, I found a record like this, which is exactly the layout we just described:

    StrRec = packed record
      codePage: Word;
      elemSize: Word;
      refCnt: Longint;
      length: Longint;
    end;
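
The same header fields can also be read without pointer arithmetic. The sketch below uses the System unit helpers StringCodePage, StringElementSize and StringRefCount, which exist in D2009 and later; the program and variable names are made up for illustration. It builds the AnsiString at run time, on the assumption that a string constant would report a reference count of -1 as seen earlier. The code page printed for A depends on the system ANSI code page, so no value is given for it.

```pascal
program StringHeaderDemo;

{$APPTYPE CONSOLE}

var
  A, B: AnsiString;
  U: UnicodeString;
begin
  // Build the string at run time so it lives on the heap with a real
  // reference count (a string constant would report -1 instead).
  SetLength(A, 15);
  FillChar(A[1], Length(A), 'A');

  Writeln(StringCodePage(A));     // system ANSI code page, machine dependent
  Writeln(StringElementSize(A));  // 1
  Writeln(StringRefCount(A));     // 1

  B := A;                         // B now shares A's memory block
  Writeln(StringRefCount(A));     // 2
  B := '';                        // releasing B drops the count again
  Writeln(StringRefCount(A));     // 1

  U := 'HuangJackyAAAAA';
  Writeln(StringCodePage(U));     // 1200 (UTF-16)
  Writeln(StringElementSize(U));  // 2
end.
```

Going through these helpers is safer than hard-coded offsets such as I - 12, since the offsets in the earlier test are only valid for the 32-bit layout shown in StrRec.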

WideString
As the name suggests, each character must be 2 bytes. Look at the example:

    procedure TForm3.Btn1Click(Sender: TObject);
    var
      S: WideString;
    begin
      S := 'HuangJackyAAAAA';
      ShowMessageFmt('%d', [Length(S)]);    // 15
      ShowMessageFmt('%d', [SizeOf(S[1])]); // 2
      ShowMessageFmt('%d', [SizeOf(S)]);    // 4
    end;

Yes, each character is 2 bytes, and the variable still just stores a pointer, as with AnsiString. The actual data is laid out in memory like this:

    Offset           Content
    -4               String length
    0 to Length-1    Actual string contents
    Last             $00 $00

It also uses 4 bytes to store the length, but each element is 2 bytes, so it can store at most about 2^30 characters.

Shall we run it and look at its memory?

The length is $1E? The stored value has to be divided by 2, because it is the length of the memory block in bytes, not the number of characters.
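
A minimal sketch of that check in code, assuming Windows, where a WideString is a COM BSTR with a 4-byte byte count stored just before the characters. The program name and the ByteLen variable are made up for illustration.

```pascal
program WideStringLengthDemo;

{$APPTYPE CONSOLE}

var
  S: WideString;
  ByteLen: Cardinal;
begin
  S := 'HuangJackyAAAAA';

  // Windows only: read the 4-byte prefix stored just before the characters.
  ByteLen := PCardinal(NativeUInt(Pointer(S)) - SizeOf(Cardinal))^;

  Writeln(ByteLen);        // 30, i.e. $1E: the size of the data in bytes
  Writeln(ByteLen div 2);  // 15: the number of 2-byte characters
  Writeln(Length(S));      // 15: Length already does the division for us
end.
```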

Finally, one more thing:
PAnsiChar, PWideChar
These are the equivalent of char* in C++, that is, strings terminated by a 0. They are relatively simple: there are no special bytes storing the length or the reference count. So in many places, wherever we can use PChar, we use PChar, because the string types are comparatively resource-intensive.
C++ people, don't laugh: CString has the same problem.
Look at an example:

    procedure TForm3.Btn1Click(Sender: TObject);
    var
      S: PAnsiChar;
    begin
      S := 'HuangJackyAAAAA';
      ShowMessageFmt('%d', [StrLen(S)]);  // 15
      ShowMessageFmt('%d', [SizeOf(S^)]); // 1
    end;

Look at the memory; it proves I'm not bluffing.

Nothing in front of the characters belongs to the string, and it ends with a 0.
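
Since nothing stores the length, finding it means walking the characters until the terminating 0. The sketch below illustrates the idea with a hand-written loop; ScanLength is a hypothetical helper written for this example, not an RTL function (StrLen is the real one).

```pascal
program PAnsiCharDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper: counts characters up to the terminating #0.
function ScanLength(P: PAnsiChar): Integer;
begin
  Result := 0;
  if P = nil then
    Exit;
  while P^ <> #0 do
  begin
    Inc(P);
    Inc(Result);
  end;
end;

var
  S: AnsiString;
  P: PAnsiChar;
begin
  S := 'HuangJackyAAAAA';
  P := PAnsiChar(S);        // points at the string's own characters, no copy

  Writeln(ScanLength(P));   // 15, found by scanning
  Writeln(Ord(P[15]));      // 0, the terminator after the last character
  Writeln(S[1] = P[0]);     // TRUE: string indexes start at 1, pointer indexes at 0
end.
```

The scan costs time proportional to the string length on every call, which is the price paid for not storing a length field.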

Now we also know the answer to the question raised at the beginning. Strings look simple in everyday use, but in fact we do not know them well enough.

As you can see from the above, the index of the first element of a string starts at 1 instead of 0. Other things to watch out for will be covered in the next article.

I am DH, it seems time to have lunch.
