"Dacai" (大菜, "big dish"): having only just stepped into the chaotic world of programming, the author feels like no ordinary rookie ("dish" is slang for a novice), hence the name. Offered as mutual encouragement.
Extended reading: Deep understanding of value types and reference types
Basic concepts
The string type (strictly speaking, System.String) is one of the most frequently used types in everyday coding. So what is a string? ^~^
A string is an immutable sequence of 16-bit Unicode code units. The type derives directly from System.Object.
There is also a less commonly used secure string type, System.Security.SecureString, which is allocated in unmanaged memory to keep it out of the GC's reach. It is used mainly in security-sensitive scenarios and is not discussed here; see MSDN for details.
Characteristics
- Because the string type derives directly from Object (not from ValueType), it is a reference type, which means string instances always live on the heap.
- A string is immutable: once initialized, its value never changes.
- The string type is sealed; in other words, none of your own types can inherit from it.
- The C# keyword string is simply an alias for the System.String type.
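The characteristics above can be checked with a few lines of reflection. This is only a minimal sketch; the class name StringFacts and its method names are made up for illustration.

```csharp
using System;

public static class StringFacts
{
    // The C# keyword `string` and System.String are the same type.
    public static bool KeywordIsAlias() => typeof(string) == typeof(System.String);

    // System.String is sealed, so no type can inherit from it.
    public static bool IsSealed() => typeof(string).IsSealed;

    // string is a class (reference type), not a value type.
    public static bool IsReferenceType() => !typeof(string).IsValueType;

    // "Modifying" methods return a new instance; the original is untouched.
    public static string OriginalAndUpper()
    {
        string original = "hello";
        string upper = original.ToUpperInvariant(); // a new string
        return original + "|" + upper;              // original is still "hello"
    }
}
```

Calling StringFacts.OriginalAndUpper() shows that ToUpperInvariant left the original string as it was.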
Precautions
- Carriage returns and line feeds: you may be tempted to hard-code "\r\n" in a string, but this is not recommended, because the program can misbehave once it is ported to another platform. Instead, use the NewLine property of the System.Environment class, which produces a line terminator that works across platforms.
- Concatenation of constant strings and of non-constant strings behaves differently in the CLR. See the Performance section for details.
- Prefixing a string literal with the @ symbol changes the compiler's behavior: escape characters in the string are treated as ordinary characters, so what you type is what you get. This is mainly used for file paths and directory strings. The two strings below produce exactly the same output.
    static void Main(string[] args)
    {
        string a = "c:\\temp\\1"; // escaped backslashes
        string b = @"c:\temp\1";  // verbatim literal, no escaping needed
        Console.WriteLine(a);
        Console.WriteLine(b);
        Console.Read();
    }
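As a small illustration of the first and third precautions, the sketch below joins lines with Environment.NewLine and compares a verbatim literal with its escaped form. The class LineAndPathDemo and its methods are hypothetical names chosen for this example.

```csharp
using System;

public static class LineAndPathDemo
{
    // Joins lines with the platform's own line terminator
    // instead of a hard-coded "\r\n".
    public static string JoinLines(params string[] lines)
        => string.Join(Environment.NewLine, lines);

    // A verbatim (@) literal and its escaped equivalent
    // contain exactly the same characters.
    public static bool VerbatimEqualsEscaped()
        => @"c:\temp\1" == "c:\\temp\\1";
}
```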
Performance
- The C# compiler supports the string type directly and stores constant strings in the module's metadata at compile time. They are then loaded directly at run time. This also shows that string constants receive special treatment at run time.
- Because strings are immutable, there are no thread-safety problems when multiple threads work with the same string at the same time. This is useful when designing shared configuration.
- If a program frequently compares strings that repeat often, performance can suffer, because comparing strings takes several steps. For this reason the CLR offers a string reuse technique, formally called "string interning". The principle: at initialization the CLR creates an internal hash table whose keys are strings and whose values are references to the interned string objects on the managed heap.
The string type provides two static methods for working with this hash table:
String.Intern
String.IsInterned
For details, see MSDN (msdn.microsoft.com/zh-cn/library/system.string.isinterned(v=vs.110).aspx)
However, the C# compiler does not opt into string interning by default (it marks assemblies with CompilationRelaxations.NoStringInterning), because overall application performance can suffer if a program interns a large number of strings. (Microsoft is torn on this one, and programmers even more so.)
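To make the two methods concrete, here is a minimal sketch of how String.Intern and String.IsInterned behave. The class InternDemo and the helper BuildAtRuntime are hypothetical; the strings are built at run time on purpose so that they are not compile-time literals.

```csharp
using System;

public static class InternDemo
{
    // Builds a string at run time, so it is a fresh heap object
    // rather than a literal from the assembly metadata.
    private static string BuildAtRuntime(string prefix, int n)
        => prefix + n.ToString();

    // Two equal run-time strings are distinct objects until interned.
    public static bool DistinctBeforeIntern()
    {
        string a = BuildAtRuntime("key-", 42);
        string b = BuildAtRuntime("key-", 42);
        return a == b && !object.ReferenceEquals(a, b);
    }

    // String.Intern returns the single table reference for a given value.
    public static bool SameAfterIntern()
    {
        string a = String.Intern(BuildAtRuntime("key-", 42));
        string b = String.Intern(BuildAtRuntime("key-", 42));
        return object.ReferenceEquals(a, b);
    }

    // String.IsInterned returns null when the value is not in the table.
    public static bool FreshValueIsNotInterned()
    {
        string fresh = Guid.NewGuid().ToString();
        return String.IsInterned(fresh) == null;
    }
}
```

Note that interned strings stay in the table for the lifetime of the process, which is one reason to intern selectively.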
- If a program contains many constant strings with identical values, the C# compiler merges them into one at compile time, writes that single string to the module's metadata, and updates all code that references it. This is another string reuse technique, formally called "string pooling". What does it mean? All constant strings with the same value are actually references to the same instance at the same memory address, which can noticeably improve performance and save a lot of memory when the same value occurs many times.
    string s1 = "hello 大菜";
    string s2 = "hello 大菜";
    unsafe
    {
        // Cast to long so the address is not truncated in a 64-bit process.
        fixed (char* p = s1) { Console.WriteLine("String address = 0x{0:x}", (long)p); }
        fixed (char* p = s2) { Console.WriteLine("String address = 0x{0:x}", (long)p); }
    }
Output:
    String address = 0x80002d84
    String address = 0x80002d84
As you can see, only one instance is allocated. Note, however, that pooling applies only to strings whose value can be determined at compile time, that is, constant strings. Suppose the program is changed to:
    args = new string[] { "dfasfdsa" };
    string s1 = "hello 大菜" + args[0]; // value known only at run time
    string s2 = "hello 大菜" + args[0];
    unsafe
    {
        fixed (char* p = s1) { Console.WriteLine("String address = 0x{0:x}", (long)p); }
        fixed (char* p = s2) { Console.WriteLine("String address = 0x{0:x}", (long)p); }
    }
Output:
    String address = 0x2e3c
    String address = 0x2e7c
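The same observation can be made without unsafe code by comparing references. This is a sketch under the assumption that the literals live in the same assembly; PoolingDemo and its method names are invented for the example.

```csharp
public static class PoolingDemo
{
    // Two identical literals in the same assembly refer to one instance.
    public static bool LiteralsShareInstance()
    {
        string s1 = "hello";
        string s2 = "hello";
        return object.ReferenceEquals(s1, s2);
    }

    // Strings concatenated at run time are separate objects even when
    // their values are equal. Pass a non-empty suffix: concatenating
    // with an empty string can return the original instance unchanged.
    public static bool RuntimeStringsShareInstance(string suffix)
    {
        string s1 = "hello " + suffix;
        string s2 = "hello " + suffix;
        return object.ReferenceEquals(s1, s2);
    }
}
```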
- In everyday code, avoid repeated string concatenation. Building a string with '+' in a frequently executed path has a considerable impact on overall performance and on the GC, so .NET provides the StringBuilder type to optimize concatenation. In contrast to the immutable string, StringBuilder behaves like a mutable string; its underlying storage is an array of char. It also exposes properties such as Capacity (default 16) and MaxCapacity (default int.MaxValue). The advantage of StringBuilder is that as long as the total number of characters does not exceed the capacity, the underlying array is not reallocated, which is the biggest difference from string, where every change allocates a new instance. If the total exceeds the capacity, StringBuilder automatically doubles its capacity and uses a new array to hold the existing content, leaving the old array to be collected by the GC. Frequent growth can therefore hurt StringBuilder's performance too, though usually far less than with string. Setting a sensible initial capacity helps a great deal. The test is as follows:
    int count = 100000;
    Stopwatch sw = new Stopwatch();
    sw.Start();
    string s = "";
    for (int i = 0; i < count; i++)
    {
        s += i.ToString(); // allocates a new string on every iteration
    }
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
Result (ms):
    14221
GC activity during this run (the original screenshot is not available): the GC runs extremely often, so the poor performance is no surprise. Now look at StringBuilder.
    int count = 100000;
    Stopwatch sw = new Stopwatch();
    sw.Start();
    StringBuilder sb = new StringBuilder(); // I hear every programmer names their StringBuilder this way
    for (int i = 0; i < count; i++)
    {
        sb.Append(i.ToString());
    }
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
Result (ms):
    12
GC status (the original screenshot is not available): there are very few collections (the threshold for triggering a GC may not even have been reached). If the StringBuilder capacity is initialized sensibly, the gap will be even larger in a production environment. Hehe ^~^
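Setting the initial capacity could look like the sketch below. BuilderCapacityDemo and its methods are hypothetical names, and no timing is claimed here; the point is only how the capacity is passed to the constructor.

```csharp
using System.Text;

public static class BuilderCapacityDemo
{
    // Appends the numbers 0..count-1 using a builder that is
    // pre-sized to hold `initialCapacity` characters.
    public static string Build(int count, int initialCapacity)
    {
        var sb = new StringBuilder(initialCapacity);
        for (int i = 0; i < count; i++)
        {
            sb.Append(i);
        }
        return sb.ToString();
    }

    // Reports the capacity a builder starts with when one is requested.
    public static int CapacityOf(int initialCapacity)
        => new StringBuilder(initialCapacity).Capacity;
}
```

For count = 100000 the result is 488,890 characters long (10 + 180 + 2,700 + 36,000 + 450,000), so Build(100000, 488890) never needs to grow the builder.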
More on string interning and string pooling
- When an assembly is loaded, the CLR by default interns all literal strings described in the assembly's metadata. Because the extra hash table lookups can degrade performance, this feature can now be turned off (an assembly opts out through CompilationRelaxationsAttribute with CompilationRelaxations.NoStringInterning).
- In code we often compare two strings for equality. What does that process involve?
  - First, check whether the two strings have the same number of characters.
  - Then the CLR compares the characters one by one to determine whether they are equal.
This is the scenario that interning suits. Two interned strings with the same value are the same object, so the two steps above are no longer needed: comparing the references obtained from the hash table is enough.
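The comparison steps can be sketched as follows. This is an illustration of the idea only, not the CLR's actual implementation; EqualitySketch and OrdinalEquals are hypothetical names.

```csharp
public static class EqualitySketch
{
    // Illustrative ordinal comparison following the steps described above.
    public static bool OrdinalEquals(string a, string b)
    {
        // Fast path: the same object, e.g. two interned strings.
        if (object.ReferenceEquals(a, b)) return true;
        if (a is null || b is null) return false;

        // Step 1: different lengths can never be equal.
        if (a.Length != b.Length) return false;

        // Step 2: compare character by character.
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}
```

When both arguments come from String.Intern, the first check already decides the result and the loop is never reached.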
About string concatenation performance
Given everything above, is concatenating with StringBuilder always faster than using '+'? The answer is no.
    static void Main(string[] args)
    {
        int count = 10000000;
        Stopwatch sw = new Stopwatch();
        sw.Start();
        string str1 = "str1", str2 = "str2", str3 = "str3";
        for (int i = 0; i < count; i++)
        {
            string s = str1 + str2 + str3;
        }
        sw.Stop();
        Console.WriteLine($"+ elapsed: {sw.ElapsedMilliseconds}");
        sw.Reset();
        sw.Start();
        for (int i = 0; i < count; i++)
        {
            StringBuilder sb = new StringBuilder(); // I hear every programmer names their StringBuilder this way
            sb.Append(str1).Append(str2).Append(str3);
        }
        sw.Stop();
        Console.WriteLine($"StringBuilder.Append elapsed: {sw.ElapsedMilliseconds}");
        Console.Read();
    }
Result (ms):
    + elapsed: 553
    StringBuilder.Append elapsed: 975
The '+' operator is ultimately compiled into a call to String.Concat. When several strings are joined in one expression, memory is not allocated for each intermediate join; instead, the strings are passed together as arguments to String.Concat, which allocates the result only once. So when only a few strings are being joined, String.Concat performs somewhat better than StringBuilder.Append. String.Format ultimately uses a StringBuilder internally; that is not discussed here, so please consult other documentation.
So nothing is absolute! Every technique has its own scenario, and we each need to explore them ourselves. (Being a programmer is exhausting.)
The results above come from tests in a non-production environment. If anything is wrong, please correct me.
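The three approaches compared above can be put side by side as in this minimal sketch. ConcatDemo and its method names are made up for illustration, and the sketch shows only that the results are identical, not how fast each one is.

```csharp
using System.Text;

public static class ConcatDemo
{
    // A single '+' expression over three strings.
    public static string WithPlus(string a, string b, string c) => a + b + c;

    // The explicit String.Concat call with all three operands at once.
    public static string WithConcat(string a, string b, string c)
        => string.Concat(a, b, c);

    // The StringBuilder equivalent, which creates an extra builder object.
    public static string WithBuilder(string a, string b, string c)
        => new StringBuilder().Append(a).Append(b).Append(c).ToString();
}
```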