How do I correctly manipulate strings in C#?


The string is one of the most frequently used basic data types in any programming language. Used carelessly, string operations carry an extra performance cost. This article looks at two ways to avoid that cost:
1. Keep boxing to a minimum.
2. Avoid allocating additional memory.

Part one: keep boxing to a minimum

Boxing and unboxing should be familiar: converting a value type to a reference type is boxing, and converting a reference type back to a value type is unboxing. In your own code, avoid unnecessary boxing wherever possible. Boxing costs performance because it has to complete the following three steps:
• First, memory for the value is allocated on the managed heap. In addition to the memory needed by the value itself, the allocation includes the space taken by the type object pointer and the sync block index.
• Then, the value is copied into the newly allocated heap memory.
• Finally, the address of the new object, which is now a reference type, is returned.
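The copy made in the second step can be observed directly. The following is a minimal sketch, not code from the original article; the class name BoxingDemo and the method BoxThenChange are hypothetical names chosen for illustration. It boxes a value, changes the original variable afterwards, and shows that the boxed object still holds the old value.

```csharp
using System;

public static class BoxingDemo
{
    // Hypothetical helper: boxes a copy of the argument, then changes the
    // local variable, and returns the boxed copy.
    public static object BoxThenChange(int number)
    {
        object boxed = number; // boxing: heap allocation plus a copy of the value
        number = number + 1;   // changes only the local copy, not the boxed one
        return boxed;          // still holds the value that was copied
    }

    public static void Main()
    {
        object boxed = BoxThenChange(1);
        Console.WriteLine(boxed);                // prints 1
        Console.WriteLine(boxed.GetType().Name); // prints Int32
    }
}
```

Because the box is an independent copy, later changes to the original variable never show up in it.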

Here is the simplest possible line of boxing code:

    object obj = 1;

This statement assigns the integer constant 1 to the variable obj of type object. The constant 1 is a value type, and a local value type is stored on the stack, whereas object is a reference type whose data is stored on the heap. A boxing operation is therefore required to place the value on the heap.

The IL for this statement is shown below; the comments explain each instruction:

    .locals init ([0] object obj)          // declares a local variable obj of type object
    IL_0000: nop
    IL_0001: ldc.i4.1                      // pushes the integer 1 onto the top of the stack
    IL_0002: box [mscorlib]System.Int32    // executes the IL box instruction, requesting the heap space needed for a System.Int32
    IL_0007: stloc.0                       // pops the value off the stack and stores it in the local variable at index 0

This is what boxing has to do: it unavoidably requests memory on the heap and copies the value-type data from the stack into that memory, which consumes both memory and CPU time. Now let's see what happens during unboxing.

Take a look at the following C# code:

    object objValue = 4;
    int value = (int)objValue;

The first line performs a boxing operation, boxing the numeric constant 4 into the reference-type variable objValue. The second line performs an unboxing operation, copying the value held by the heap object that objValue refers to into the local value-type variable value.

Again, let's look at the IL code:

    .locals init ([0] object objValue, [1] int32 'value')  // declares two local variables: objValue of type object and value of type int32
    IL_0000: nop
    IL_0001: ldc.i4.4                          // pushes the integer 4 onto the stack
    IL_0002: box [mscorlib]System.Int32        // executes the IL box instruction, requesting the heap space needed for a System.Int32
    IL_0007: stloc.0                           // pops the value off the stack and stores it in the local variable at index 0
    IL_0008: ldloc.0                           // pushes the local variable at index 0 (objValue) onto the stack
    IL_0009: unbox.any [mscorlib]System.Int32  // executes the IL unboxing instruction unbox.any, converting the object reference to System.Int32
    IL_000e: stloc.1                           // stores the value on the stack in the local variable at index 1 (value)

Unboxing is the reverse of boxing: it takes the value stored in the heap object and copies it back into a value-type variable.
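One practical consequence of unboxing is that the cast must name the exact value type that was boxed. The sketch below is an illustration only, with hypothetical names (UnboxingDemo, CanUnboxAsLong); it shows that a boxed int cannot be unboxed directly as a long, even though an int converts to a long implicitly.

```csharp
using System;

public static class UnboxingDemo
{
    // Hypothetical helper: reports whether the boxed value can be unboxed as long.
    public static bool CanUnboxAsLong(object boxed)
    {
        try
        {
            _ = (long)boxed; // unboxing succeeds only if the box holds a long
            return true;
        }
        catch (InvalidCastException)
        {
            return false;
        }
    }

    public static void Main()
    {
        object objValue = 4;           // boxes a System.Int32
        int value = (int)objValue;     // unboxing with the matching type works
        Console.WriteLine(value);      // prints 4

        Console.WriteLine(CanUnboxAsLong(objValue)); // prints False

        long widened = (int)objValue;  // unbox as int first, then widen to long
        Console.WriteLine(widened);    // prints 4
    }
}
```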

Boxing and unboxing both consume additional CPU and memory resources. So how can they be avoided? The following methods are available:
1. Replace ArrayList with a generic collection.
2. Convert value types to reference types using the conversion methods that C# provides.

Let's compare the boxing and unboxing that occurs with and without generics.
1. Boxing and unboxing caused by a non-generic collection

Look at the following code:

    var array = new ArrayList();
    array.Add(1);
    array.Add(2);

    foreach (int value in array)
    {
        Console.WriteLine("value is {0}", value);
    }

The code declares an ArrayList object, adds the numbers 1 and 2 to it, and then uses foreach to print its elements to the console.

In this process, boxing occurs twice when the int elements are added to the ArrayList, and unboxing occurs twice when foreach enumerates the elements and converts each one from object to int. Console.WriteLine then boxes value on each call, which adds two more boxing operations. In total the code performs 6 boxing and unboxing operations, and the more elements the ArrayList holds, the more such operations are performed.

You can see the boxing and unboxing for yourself by using a tool such as ILSpy to look for the box and unbox instructions in the IL code.

2. Using a generic collection

Take a look at the following code:

    var list = new List<int>();
    list.Add(1);
    list.Add(2);

    foreach (int value in list)
    {
        Console.WriteLine("value is {0}", value);
    }

The only difference from the code in 1 is that the collection is the generic List<int> rather than ArrayList. Looking at the IL code again, this version performs only the 2 boxing operations in the Console.WriteLine() calls and needs no unboxing at all.

Generics therefore avoid the unnecessary performance cost of boxing and unboxing. Their benefits go further than that: generics also improve readability and make code easier to reuse.
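The difference between the two collections can be sketched side by side. This is a minimal illustration under assumed names (CollectionDemo, SumArrayList, SumList are hypothetical); both methods compute the same result, but only the ArrayList version has to unbox each element.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CollectionDemo
{
    // ArrayList stores its elements as object, so each int was boxed on Add
    // and is unboxed again by the foreach cast.
    public static int SumArrayList(ArrayList items)
    {
        int sum = 0;
        foreach (int value in items)
        {
            sum += value;
        }
        return sum;
    }

    // List<int> stores the ints directly, so no boxing or unboxing is needed.
    public static int SumList(List<int> items)
    {
        int sum = 0;
        foreach (int value in items)
        {
            sum += value;
        }
        return sum;
    }

    public static void Main()
    {
        var array = new ArrayList { 1, 2 };
        var list = new List<int> { 1, 2 };
        Console.WriteLine(SumArrayList(array)); // prints 3
        Console.WriteLine(SumList(list));       // prints 3
    }
}
```

The generic version also catches type errors at compile time: adding a string to the List<int> does not compile, whereas the ArrayList accepts it and fails only when the foreach cast runs.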

Note, however, that even with a generic collection the Console.WriteLine() calls still perform 2 boxing operations. Can these be optimized away as well? This is where the second method comes in: converting the value type to a reference type with a conversion method that C# itself provides, as follows:

    var list = new List<int>();
    list.Add(1);
    list.Add(2);

    foreach (int value in list)
    {
        Console.WriteLine(string.Format("value is {0}", value.ToString()));
    }

If you look at the IL code, you can see that the boxing operations have been eliminated completely. The code now calls the ToString method of the integer type. In the .NET Framework, the ToString method of Int32 is implemented as:

    public override string ToString()
    {
        return Number.FormatInt32(m_value, null, NumberFormatInfo.CurrentInfo);
    }

It converts the int to a string by working on memory directly, which is much more efficient than boxing. Therefore, when you convert other value types to strings and concatenate them, avoid applying the "+" operator directly to the value; call the ToString method that the value type provides instead.
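As a small sketch of this advice (the names FormatDemo, WithBoxing and WithoutBoxing are hypothetical), the two methods below produce the same text. The first passes the int to an object parameter, which boxes it; the second passes a string, which is already a reference type.

```csharp
using System;

public static class FormatDemo
{
    // The int argument is converted to object, so it is boxed.
    public static string WithBoxing(int value)
    {
        return string.Format("value is {0}", value);
    }

    // ToString() produces a string first; passing a string involves no boxing.
    public static string WithoutBoxing(int value)
    {
        return string.Format("value is {0}", value.ToString());
    }

    public static void Main()
    {
        Console.WriteLine(WithBoxing(42));    // prints: value is 42
        Console.WriteLine(WithoutBoxing(42)); // prints: value is 42
    }
}
```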

Part two: avoid allocating additional memory

For the CLR, a string is a very special object: once it has been created, it can never be changed. Any operation that appears to modify a string at run time, such as "+" concatenation or a System.String method that returns a changed string, actually creates a new string object in memory, which means allocating new memory for it. Assigning to a string variable with "=" does not change the existing object either; the variable simply refers to a different string afterwards. Code like the following adds extra overhead at run time.
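The immutability described here can be checked with object.ReferenceEquals. The following is a minimal sketch with hypothetical names (ImmutabilityDemo, ConcatCreatesNewObject); it keeps a second reference to the original string and shows that concatenation leaves that string untouched and produces a separate object.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Hypothetical helper: true when concatenation yields an object
    // different from the original string.
    public static bool ConcatCreatesNewObject(string original, string suffix)
    {
        string combined = original + suffix;
        return !object.ReferenceEquals(original, combined);
    }

    public static void Main()
    {
        string s1 = "ABC";
        string alias = s1;               // second reference to the same object
        s1 = "123" + s1 + "456";         // builds a new string; "ABC" is unchanged

        Console.WriteLine(alias);        // prints ABC
        Console.WriteLine(s1);           // prints 123ABC456
        Console.WriteLine(object.ReferenceEquals(alias, s1)); // prints False
    }
}
```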

    private static void Test6()
    {
        string s1 = "ABC";
        s1 = "123" + s1 + "456"; // the two lines above create 3 string objects and call String.Concat once
        string s2 = 9 + "456";   // this line boxes once and calls String.Concat once
    }

    private static void Test7()
    {
        string s1 = "123" + "ABC" + "456"; // equivalent to string s1 = "123ABC456"
    }

Because using the string class can cause a significant performance loss in some situations, Microsoft also provides the StringBuilder type to make up for this weakness.

StringBuilder does not create a new string object on every change. It is efficient because it allocates its memory in advance. If no capacity is specified, the default capacity is 16 characters, and StringBuilder does not reallocate memory while its content is 16 characters or fewer. When the content grows beyond 16 characters but stays under 32, StringBuilder reallocates memory so that the capacity is again a multiple of 16. If you can predict that the final string will be longer than 16 characters, you can give the StringBuilder a more suitable initial capacity.

Microsoft also provides another way to simplify this kind of operation: the string.Format method, which uses StringBuilder internally to format the string.

    private static void Test9()
    {
        string a = "t";
        string b = "e";
        string c = "s";
        string d = "t";
        StringBuilder sb = new StringBuilder();
        sb.Append(a);
        sb.Append(b);
        sb.Append(c);
        sb.Append(d);
        Console.WriteLine(sb.ToString());
    }

    private static void Test10()
    {
        string a = "t";
        string b = "e";
        string c = "s";
        string d = "t";
        Console.WriteLine(string.Format("{0}{1}{2}{3}", a, b, c, d));
    }
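The capacity behavior discussed earlier can be inspected through the StringBuilder.Capacity property. The sketch below is illustrative only; CapacityDemo and BuildWithCapacity are hypothetical names, and the value 64 is an arbitrary capacity picked for the example. It reads the default capacity and builds a string with a capacity chosen up front, so that no reallocation is needed while appending.

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    // Hypothetical helper: joins the parts using a StringBuilder that is
    // created with the given initial capacity.
    public static string BuildWithCapacity(int capacity, params string[] parts)
    {
        var sb = new StringBuilder(capacity);
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(new StringBuilder().Capacity);   // prints 16, the default
        Console.WriteLine(new StringBuilder(64).Capacity); // prints 64
        Console.WriteLine(BuildWithCapacity(64, "t", "e", "s", "t")); // prints test
    }
}
```

How the capacity grows once it is exceeded is an implementation detail that has differed between runtime versions, so the sketch deliberately prints only the initial values.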

Summary: how to manipulate strings correctly

1. Keep boxing and unboxing to a minimum: use generics, and use ToString() to convert value types to strings.
2. Avoid allocating additional memory: do not use the += and + operators to join multiple strings; use StringBuilder or String.Format() instead.

Reference list:

http://www.cnblogs.com/yukaizhao/archive/2011/10/18/csharp_box_unbox_1.html
http://www.cnblogs.com/yukaizhao/archive/2011/10/19/csharp_box_unbox_2.html
"Writing High-Quality Code: 157 Recommendations for Improving C# Programs"
