Technical exchange, explained by DH.
I wrote about this once before, and now I am rewriting the whole thing. It is fairly basic, so experts, please don't laugh.
I remember once seeing someone post a set of interview questions on the box. The first question was:
What is the difference between AnsiString and WideString?
OK, think about it yourself first. By the time I have finished, everybody should know the answer. Heh.
First, the categories:
1 ShortString: holds up to 255 characters, kept mainly for compatibility with old versions
2 AnsiString: holds up to about 2^31 characters, the default string type before Delphi 2009
3 UnicodeString: holds up to about 2^30 characters, the default string type in Delphi 2009 and later
4 WideString: holds up to about 2^30 characters, used mostly with COM
Good, now one by one:
ShortString
We saw above that it can hold 255 characters, but the space it occupies is the declared length plus 1. Why?
Let's look at an example:
    procedure TForm3.Btn1Click(Sender: TObject);
    var
      S: string[15];
    begin
      S := 'HuangJackyAAAAA';
      ShowMessageFmt('%d', [Length(S)]); // 15 characters
      ShowMessageFmt('%d', [SizeOf(S)]); // occupies 16 bytes
    end;
From the code above we can see that a ShortString variable is not a pointer but the memory block itself; otherwise SizeOf would return 4 (the size of a pointer on a 32-bit target). Having noticed this, we should look at its memory.
There we find that the first byte holds the length of the string, which is why it can only hold 255 characters ($FF), right?
Now change the code above:
    procedure TForm3.Btn1Click(Sender: TObject);
    var
      S: string[15];
    begin
      S := 'HuangJackyAAAAA';
      ShowMessageFmt('%d', [Ord(S[0])]); // also 15, haha: element 0 is the length byte
      ShowMessageFmt('%d', [SizeOf(S)]);
    end;
OK, so we have mastered the first kind of string.
AnsiString
Test with the same code:
    procedure TForm3.Btn1Click(Sender: TObject);
    var
      S: AnsiString;
    begin
      S := 'HuangJackyAAAAA';
      ShowMessageFmt('%d', [Length(S)]); // 15
      ShowMessageFmt('%d', [SizeOf(S)]); // 4
    end;
This shows that the variable S is just a pointer.
OK, let's look at the address it points to.
So how is the memory of the actual string organized?
Offset  | -12       | -10          | -8              | -4            | 0 .. Length-1  | Last
Content | Code page | Element size | Reference count | String length | Actual content | 0
What is the code page? It is the encoding, such as UTF-8 or GBK.
A string is effectively a heap-allocated object, but we do not need to worry about freeing it: the compiler releases it automatically when its reference count drops to 0.
We said above that AnsiString can hold about 2^31 characters. That is because its length is stored in 4 bytes.
OK, let's write the code to test it:
    procedure TForm3.Btn1Click(Sender: TObject);
    var
      S: AnsiString;
      I: Integer;
    begin
      S := 'HuangJackyAAAAA';
      I := Integer(S); // address of the first character (32-bit target only)
      ShowMessage(IntToHex(PWord(I - 12)^, 4));    // $03A8
      ShowMessage(IntToHex(PWord(I - 10)^, 4));    // $0001: one AnsiString element is one byte
      ShowMessage(IntToHex(PCardinal(I - 8)^, 8)); // $FFFFFFFF
      ShowMessage(IntToHex(PCardinal(I - 4)^, 8)); // $0000000F
      ShowMessageFmt('%d', [Length(S)]);           // 15
      ShowMessageFmt('%d', [SizeOf(S)]);           // 4
    end;
A look at the memory shows the same layout we described above.
Back to the code page: which encoding does it correspond to? $03A8 is 936; check MSDN and you will find that 936 is GB2312. Haha.
The memory layout of UnicodeString is the same. So I feel that the fields at -12 and -10 were added mainly to serve UnicodeString: Unicode has many encodings, and the size of each element differs depending on the encoding.
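The same header fields can also be read without pointer arithmetic. What follows is only a minimal sketch, assuming Delphi 2009 or later, where the System unit offers StringCodePage, StringElementSize and StringRefCount. The program and variable names are made up for illustration.

```pascal
program StrHeaderSketch; // hypothetical name, illustration only
{$APPTYPE CONSOLE}
var
  A: AnsiString;
  U, U2: UnicodeString;
begin
  A := 'HuangJackyAAAAA';
  U := StringOfChar('A', 15);     // built at run time, so it lives on the heap

  // elemSize (offset -10): bytes per element
  Writeln(StringElementSize(A));  // 1
  Writeln(StringElementSize(U));  // 2

  // codePage (offset -12): for AnsiString it depends on the system
  Writeln(StringCodePage(A));     // e.g. 936 on a Simplified Chinese system
  Writeln(StringCodePage(U));     // 1200 (UTF-16LE)

  // refCnt (offset -8): a second variable sharing the block raises the count
  Writeln(StringRefCount(U));
  U2 := U;
  Writeln(StringRefCount(U));     // one higher than the line above
end.
```

Note that the string literal in the earlier test shows a reference count of $FFFFFFFF (that is, -1) because it is a constant; this sketch uses a run-time string so that the count actually changes.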
Before moving on to WideString, let's look at how the Length function is implemented in Delphi.
    Unit3.pas.40: I:=Length(S);
    004B33CD 8B45FC       mov eax,[ebp-$04]
    004B33D0 85C0         test eax,eax
    004B33D2 7418         jz $004b33ec
    004B33D4 8BD0         mov edx,eax
    004B33D6 83EA0A       sub edx,$0a
    004B33D9 66833A01     cmp word ptr [edx],$01
    004B33DD 740D         jz $004b33ec
    004B33DF 8D45FC       lea eax,[ebp-$04]
    004B33E2 33C9         xor ecx,ecx
    004B33E4 8B55FC       mov edx,[ebp-$04]
    004B33E7 E8782EF5FF   call @InternalLStrFromUStr
    // skipped here, because we are not using Unicode (the element size at -10 is 1)
    004B33EC 85C0         test eax,eax
    004B33EE 7405         jz $004b33f5
    004B33F0 83E804       sub eax,$04 // yes, exactly: offset -4
    004B33F3 8B00         mov eax,[eax]
    // fetch the value stored there, right?
    004B33F5 8BD8         mov ebx,eax
The disassembly is a lot longer than what Delphi 7 produced ...
Just now, while looking at the VCL source, I found a record like this. It is exactly what we just described.
    StrRec = packed record
      codePage: Word;
      elemSize: Word;
      refCnt: Longint;
      length: Longint;
    end;
WideString
From the name alone we can tell that a character must be 2 bytes. Haha, look at the example:
    procedure TForm3.Btn1Click(Sender: TObject);
    var
      S: WideString;
    begin
      S := 'HuangJackyAAAAA';
      ShowMessageFmt('%d', [Length(S)]);    // 15
      ShowMessageFmt('%d', [SizeOf(S[1])]); // 2
      ShowMessageFmt('%d', [SizeOf(S)]);    // 4
    end;
Yes, each character is 2 bytes, and the variable still just stores a pointer. As with AnsiString, let's look at how the actual data is organized in memory:
Offset  | -4            | 0 .. Length-1          | Last
Content | String length | Actual string contents | $00 $00
It also uses 4 bytes to store the length, but each element is 2 bytes, so it can store at most about 2^30 characters.
Shall we run it and look at its memory?
The length reads $1E?? Haha, that value has to be divided by 2, because this field is the length of the memory block in bytes, not the number of characters.
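The byte count can be checked in code as well. This is only a minimal sketch, assuming a Windows target, where WideString is a COM BSTR whose 4-byte prefix holds the size in bytes. The program and variable names are illustrative.

```pascal
program WideLenSketch; // hypothetical name, illustration only
{$APPTYPE CONSOLE}
var
  S: WideString;
  ByteLen: Cardinal;
begin
  S := 'HuangJackyAAAAA';
  // The 4 bytes just before the first character hold the size in bytes.
  ByteLen := PCardinal(NativeUInt(Pointer(S)) - 4)^;
  Writeln(ByteLen);                      // 30, i.e. $1E
  Writeln(ByteLen div SizeOf(WideChar)); // 15 characters
  Writeln(Length(S));                    // 15, Length already divides by 2
end.
```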
Finally, one more:
PAnsiChar, PWideChar
This is the C++ char*, the kind of string that ends with a 0. It is fairly simple: there are no special bytes storing the length or the reference count. So wherever we can use PChar, we should use PChar, because the string type really is resource-hungry.
C++ people, don't laugh: CString has the same problem.
Look at an example:
    procedure TForm3.Btn1Click(Sender: TObject);
    var
      S: PAnsiChar;
    begin
      S := 'HuangJackyAAAAA';
      ShowMessageFmt('%d', [StrLen(S)]);  // 15
      ShowMessageFmt('%d', [SizeOf(S^)]); // 1
    end;
Look at the memory and you will see I'm not bluffing.
Nothing in front of the characters belongs to the string, and it ends with a 0 at the tail.
Now we also know the answer to the question raised at the beginning. Don't assume strings are simple; in fact, we often do not understand them well enough.
As you can see from the above, the first element of a string is at index 1, not 0. I plan to cover other usage precautions in the next article.
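A minimal sketch of the 1-based indexing, assuming a desktop Delphi compiler (for a while, some mobile compilers defaulted to zero-based strings). The names are illustrative.

```pascal
program IndexSketch; // hypothetical name, illustration only
{$APPTYPE CONSOLE}
var
  S: string;
  I: Integer;
begin
  S := 'Huang';
  Writeln(S[1]);               // H, the first character
  for I := 1 to Length(S) do   // valid indices run from 1 to Length(S)
    Write(S[I], ' ');
  Writeln;
  Writeln(S[Length(S)]);       // g, the last character
end.
```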
I am DH, and it looks like it is time for lunch.
Delphi-String explanation