.NET, have you forgotten? (6) -- talking about string

Last Update:2018-12-07 Source: Internet

Author: User

Tags string methods


I. Introduction

Before getting into this article, let me explain why I am writing it. Yesterday's article, <Reviewing design patterns (I) -- the Flyweight pattern>, mentioned string interning (the string resident mechanism). In the comments, Yang questioned my string-related points and wrote his objections up in a document. Unfortunately, I can no longer find the link.

So I want to revisit this old topic here.

How well do we really understand the string type we use every day?

II. Starting from C

C was the first programming language I came into contact with. I still remember that my C teacher was actually a professional Java SOA instructor.

As a result, she often drew comparisons with Java while teaching C. We had no idea what Java was at the time; we only knew that the word often showed up in mobile games.

I still remember her classic line: keep in mind that C has no concept of a string (in fact, we did not even know what a string was back then). A so-called string in C is represented as a character array.

Let's review the representation of "string" in C:

char s[] = "ABC";

Next, we can use s to call various "string" functions.

So in C a "string" is really just a character array, and what those functions receive is the address of its first element. What about .NET?

III. String vs. string

At school I was asked this question countless times, especially by classmates coming from Java.

string is actually an alias for System.String. In the compiled code there is no difference between the two, just as int is an alias for System.Int32.

The only differences are:

1. string is a primitive type of the C# language and looks more like C#.

2. System.String is the corresponding type in the FCL.
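The alias relationship is easy to check for yourself. The following is only a minimal sketch; the class name AliasDemo and the helper SameType are made up for this illustration.

```csharp
using System;

// Hypothetical demo class, not part of the original article.
static class AliasDemo
{
    // True when the C# keyword and the FCL type name denote the same runtime type.
    public static bool SameType()
    {
        return typeof(string) == typeof(System.String);
    }

    static void Main()
    {
        string a = "abc";   // C# keyword form
        String b = a;       // FCL type name; needs "using System;"

        Console.WriteLine(SameType());                     // True
        Console.WriteLine(typeof(int) == typeof(Int32));   // True
        Console.WriteLine(b.GetType().FullName);           // System.String
    }
}
```

Both spellings compile to the same type, so the choice between them is purely a matter of style.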

My own habits are as follows:

1. If language interoperability is involved, I use System.String without question; I will not repeat the reasons here. If you have any questions, please refer to <.NET, have you forgotten? (1) -- following the CLS>.

2. If I am just declaring a string, I use string, which reads more naturally, much as we write int i = 3 and rarely see System.Int32 i = 3.

3. If I am calling a static method of the type, I usually write System.String, because String looks more like a class.

IV. The immutable string pattern

Let's take a look at the example here:
    static void Main(string[] args)
    {
        string s1 = "Hello";
        string s2 = "Hello";
        Console.WriteLine(Object.ReferenceEquals(s1, s2));
        s2 = "Hello World";
        Console.WriteLine(Object.ReferenceEquals(s1, s2));
    }

What does it print?

The first comparison prints True and the second prints False. So why do s1 and s2 clearly refer to the same object at first, but to different objects after s2 is changed?

Let's reveal the secret.

V. String interning in depth

First, we need to understand how a .NET program runs. If you are not familiar with that, please refer to my <Parsing the .NET running process>.

After the CLR is loaded, a hash table is initialized in the managed heap associated with the SystemDomain. Its purpose is to keep track of the strings that have been created.

In this hash table, the key is the content of the string and the value is the memory address of the corresponding string object.

Let's analyze the first example. The process is as follows:

    string s1 = "Hello";
    string s2 = "Hello";

At this point s1 and s2 point to the same address. Then we change the value of s2, say s2 = "World". Now s2 refers to a different string object, while s1 still refers to "Hello".

So of course s1 and s2 no longer point to the same address.

Now let's go through this process systematically.

When a string literal is initialized, the system first looks it up in the hash table that was set up at startup.

If the string is not there, memory is allocated on the managed heap to store it, and a key-value pair is added to the hash table: the string content is the key and the allocated address is the value.

If it is there, the reference is simply pointed at the existing address. The following code confirms this:
    static void Main(string[] args)
    {
        string s1 = "Hello";
        string s2 = "Hello";
        string s3 = "Hello World";
        Console.WriteLine("Compare s1 and s2: " + Object.ReferenceEquals(s1, s2));
        s2 = "Hello World";
        Console.WriteLine("Compare s1 and s2: " + Object.ReferenceEquals(s1, s2));
        Console.WriteLine("Compare s2 and s3: " + Object.ReferenceEquals(s2, s3));
    }

So what actually happens when the value of a string is changed? That is the essence of the immutable pattern, also known as string immutability (constancy).
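The lookup-or-add behaviour described above can be modelled in a few lines. This is only a toy sketch: ToyInternTable and GetOrAdd are hypothetical names, and a Dictionary stands in for the CLR's internal table, whose real implementation is not shown in this article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model of an intern table; NOT the CLR's actual data structure.
sealed class ToyInternTable
{
    // Key: string content. Value: the one shared instance holding that content.
    private readonly Dictionary<string, string> table = new Dictionary<string, string>();

    public int Count { get { return table.Count; } }

    public string GetOrAdd(string value)
    {
        if (value == null) throw new ArgumentNullException("value");

        string existing;
        if (table.TryGetValue(value, out existing))
        {
            return existing;        // content already known: reuse the stored reference
        }
        table.Add(value, value);    // first time seen: remember this instance
        return value;
    }
}

static class ToyInternDemo
{
    static void Main()
    {
        var pool = new ToyInternTable();

        // Two distinct objects with equal content, built at run time.
        string a = new string(new[] { 'H', 'i' });
        string b = new string(new[] { 'H', 'i' });
        Console.WriteLine(Object.ReferenceEquals(a, b));    // False

        string ra = pool.GetOrAdd(a);
        string rb = pool.GetOrAdd(b);
        Console.WriteLine(Object.ReferenceEquals(ra, rb));  // True
        Console.WriteLine(pool.Count);                      // 1
    }
}
```

The second call finds the existing key and hands back the first instance, which is exactly why equal strings end up sharing one address.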

VI. String immutability in depth

A string cannot be changed once it has been created. In other words, when we "change" the value of a string, new memory is allocated on the managed heap for the new value, and the value stored at the original address is not affected.

In fact, this is why string is a special reference type: it has characteristics of both value types and reference types.
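A small sketch makes this visible. The class and method names here are invented for the illustration.

```csharp
using System;

// Hypothetical demo class, not part of the original article.
static class ImmutabilityDemo
{
    // Returns true if appending produced a new object and left the original intact.
    public static bool AppendCreatesNewObject()
    {
        string original = "Hello";
        string alias = original;    // both variables refer to the same object
        alias += " World";          // builds a NEW string; "Hello" itself is untouched
        return !Object.ReferenceEquals(original, alias) && original == "Hello";
    }

    static void Main()
    {
        Console.WriteLine(AppendCreatesNewObject());    // True

        string s = "abc";
        string upper = s.ToUpperInvariant();            // returns a new string
        Console.WriteLine(s);                           // abc
        Console.WriteLine(upper);                       // ABC
    }
}
```

Every "modifying" operation on a string returns a new object, which is what gives string its value-type feel even though it is a reference type.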

I think Microsoft designed it this way for two reasons:

1. Because a string never changes, no thread-synchronization problems arise when multiple threads share the same string object.

2. If strings were not immutable, string interning would be practically impossible.

VII. Tracing the design back to its source

Why did Microsoft design string this way?

I think this is also the root of Yang's objection to my view.

Of course, the essential reason is to save memory. How much memory can be saved? The following example gives an idea.

Suppose we build a web application that thousands of users access on the same server. Strings are without doubt among the most frequently used types, so the same string value may be created once for each of those thousands of users. If the number of users grows into the millions, how much server memory would all those copies occupy?

With string interning, two strings with equal values point to the same object. Doesn't that save a lot of memory?

As for Yang's other question: does the hash table exist for the whole lifetime of the program, and does it keep every entry forever?

I think this is easy to answer. The hash table obviously exists for the whole run. As for the entries, I personally see two possibilities:

1. Useless key-value pairs are cleared on a regular basis.

2. As with garbage collection, entries are reclaimed only when the stored strings exceed the capacity of the hash table, or when key lookup efficiency starts to suffer.

VIII. The maverick string at the IL level

We all know that when a reference-type object is constructed, the corresponding IL instruction is newobj.

However, .NET provides a special instruction for string literals: ldstr. This instruction constructs a String object from a literal stored in the metadata.

Why a literal obtained from the metadata? Because C# treats string as a primitive type, the compiler stores these literals in the metadata of the managed module.

In any case, this shows that string is a special type, and that the CLR has prepared a more efficient, maverick path for it.
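The difference between a literal (loaded with ldstr) and a string built through a constructor can be observed from C#. This is a rough sketch with made-up names; the last line is left without an expected result because whether a literal is found in the intern pool can depend on how the assembly was compiled.

```csharp
using System;

// Hypothetical demo class, not part of the original article.
static class LiteralVsRuntimeDemo
{
    // Builds "abc" at run time through the string constructor (newobj in IL).
    public static string BuildAtRuntime()
    {
        return new string(new[] { 'a', 'b', 'c' });
    }

    static void Main()
    {
        string literal = "abc";             // compiled to an ldstr instruction
        string built = BuildAtRuntime();    // a separate heap object

        Console.WriteLine(literal == built);                        // True  (same content)
        Console.WriteLine(Object.ReferenceEquals(literal, built));  // False (different objects)

        // String.IsInterned returns the pooled instance with this content, or null.
        Console.WriteLine(String.IsInterned(built) != null);
    }
}
```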

IX. Is string really a flyweight?

String maintains a cache for itself in a hash table. Each time a string is initialized, it first searches the cache for a reusable object, and then either reuses that object or creates a new instance.

The Flyweight pattern uses sharing to cope with an explosion of large numbers of fine-grained objects.

Aren't they the same idea?
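For comparison, here is a minimal Flyweight sketch. Glyph and GlyphFactory are hypothetical types invented for this illustration; they are not taken from the earlier design-pattern article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical flyweight: one immutable, shareable object per character.
sealed class Glyph
{
    private readonly char symbol;
    public Glyph(char symbol) { this.symbol = symbol; }
    public char Symbol { get { return symbol; } }
}

// Hypothetical flyweight factory: looks in its cache first, creates only on a miss.
sealed class GlyphFactory
{
    private readonly Dictionary<char, Glyph> cache = new Dictionary<char, Glyph>();

    public int Count { get { return cache.Count; } }

    public Glyph Get(char symbol)
    {
        Glyph glyph;
        if (!cache.TryGetValue(symbol, out glyph))
        {
            glyph = new Glyph(symbol);
            cache[symbol] = glyph;
        }
        return glyph;
    }
}

static class FlyweightDemo
{
    static void Main()
    {
        var factory = new GlyphFactory();
        foreach (char c in "hello")
        {
            factory.Get(c);
        }

        Console.WriteLine(factory.Count);   // 4 (h, e, l, o)
        Console.WriteLine(Object.ReferenceEquals(factory.Get('l'), factory.Get('l')));  // True
    }
}
```

The factory's cache-then-create logic has the same shape as the string cache described above, which is the point of the comparison.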

X. The Intern method

Let's first look at MSDN's explanation of Intern:

Intern: retrieves the system's reference to the specified string. If such a string exists, its reference is returned. Otherwise, a new string is created and its reference is returned.

Why is this my favorite among so many string methods? Let's take a look at this code:
    static void Main(string[] args)
    {
        string s1 = "Hello";
        string s2 = "HelloWorld";
        string s3 = s1 + "World";
        Console.WriteLine(Object.ReferenceEquals(s2, s3));
    }

This prints False, which was also one of the grounds on which Yang refuted me.

The reason is that s3 is a dynamically generated string, which is not added to the cache hash table.

This is where Intern comes in.

Let's rewrite the code above:
    static void Main(string[] args)
    {
        string s1 = "Hello";
        string s2 = "HelloWorld";
        string s3 = s1 + "World";
        Console.WriteLine(Object.ReferenceEquals(s2, s3));
        s3 = String.Intern(s3);
        Console.WriteLine(Object.ReferenceEquals(s2, s3));
    }

Intern searches the hash table for an existing reference to a string with this value. Having found one, it returns that reference, which we assign to s3. At that point, of course, s2 and s3 refer to the same object.

The implementation of the method shows the same thing:

    public static string Intern(string str)
    {
        if (str == null)
        {
            throw new ArgumentNullException("str");
        }
        return Thread.GetDomain().GetOrInternString(str);
    }

What is this method actually good for? Read on.

XI. Putting it to use

In real applications, string comparison is used everywhere, typically via String.Compare().

In essence, this method walks through the strings and compares them character by character. It is the approach commonly seen in written-test questions, where the usual advice is to treat a string as a character array!

However, comparing character by character costs time in proportion to the length of the strings. So we can look for another approach based on string interning.

Above, I used Object.ReferenceEquals() a lot. As the name suggests, it checks whether two references point to the same memory address. Thanks to string interning, two equal interned strings point to the same address.

Therefore, we can use Object.ReferenceEquals() to speed up string comparison, provided both strings are interned.

Of course, a string may well have been generated dynamically, in which case Intern is needed first. Intern itself has a cost, though, so it is not worth using if we only need to perform a few comparisons.
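One cautious way to apply this idea is to use the reference check only as a fast path and fall back to a content comparison, so that strings that were never interned still compare correctly. The helper AreEqual below is a hypothetical name for this sketch.

```csharp
using System;

// Hypothetical demo class, not part of the original article.
static class FastCompareDemo
{
    // Reference check first (cheap); ordinal content comparison as the fallback.
    public static bool AreEqual(string a, string b)
    {
        if (Object.ReferenceEquals(a, b))
        {
            return true;
        }
        return String.Equals(a, b, StringComparison.Ordinal);
    }

    static void Main()
    {
        string literal = "HelloWorld";
        string hello = "Hello";
        string built = hello + "World";     // concatenated at run time, not interned

        Console.WriteLine(Object.ReferenceEquals(literal, built));  // False
        Console.WriteLine(AreEqual(literal, built));                // True

        // Interning the same content twice yields the same instance both times.
        string first = String.Intern(built);
        string second = String.Intern(hello + "World");
        Console.WriteLine(Object.ReferenceEquals(first, second));   // True
    }
}
```

Relying on ReferenceEquals alone is only safe when you know that both sides have been interned; the fallback keeps the comparison correct in every other case.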

In addition, because strings are immutable, concatenation produces a large amount of garbage in memory. So if you need to concatenate a large number of strings, use the StringBuilder class instead. Most people know this already, so I will not elaborate.

XII. Summary

string is a special reference type.

The keys to understanding it are string interning and string immutability.

It's getting late, and I have to get up for work after some sleep. Thank you for reading. If you have a different view, please leave a comment so we can discuss it. I have always believed that technology is something to be discussed, not something to brood over alone at home!
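A short sketch of the StringBuilder approach, with a made-up helper name JoinNumbers:

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of the original article.
static class ConcatDemo
{
    // Joins 0..count-1 with commas using a single growable buffer,
    // instead of allocating a new string on every "+=".
    public static string JoinNumbers(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0)
            {
                sb.Append(',');
            }
            sb.Append(i);
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(JoinNumbers(5));  // 0,1,2,3,4
    }
}
```

Only the final ToString() call creates the result string; the intermediate steps write into the builder's buffer.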

