An easily overlooked memory problem with Encoding.GetByteCount in C#

Last Update:2018-12-07 Source: Internet

Author: User


If you want to tell whether a character is full-width or half-width in C#, the usual approach is to call Encoding.Default.GetByteCount. However, calling GetByteCount many times (tens of thousands of times; the exact number varies from machine to machine) triggers garbage collection, which means a large number of temporary objects are created along the way.
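For reference, the usual approach described above might look like the following minimal sketch. The WidthCheck class and IsMultiByte helper are hypothetical names, not from the original article. The sketch assumes that "full-width" means "needs more than one byte in the chosen encoding", which holds for a double-byte ANSI code page such as GBK but not for UTF-8; what Encoding.Default returns depends on the runtime and operating system.

```csharp
using System;
using System.Text;

public static class WidthCheck
{
    // Hypothetical helper: true when the character needs more than one byte
    // in the given encoding. Under a double-byte code page this corresponds
    // to a full-width character; under UTF-8 it does not.
    public static bool IsMultiByte(Encoding encoding, char c)
    {
        char[] buffer = { c };
        return encoding.GetByteCount(buffer, 0, 1) > 1;
    }

    public static void Main()
    {
        // Assumption: Encoding.Default is the encoding you care about.
        // On .NET Framework it is the system ANSI code page; on newer
        // runtimes it is UTF-8.
        Encoding encoding = Encoding.Default;

        // 'A', U+4E2D (a CJK ideograph), U+FF21 (full-width 'A').
        foreach (char c in "A\u4E2D\uFF21")
        {
            string kind = IsMultiByte(encoding, c) ? "multi-byte" : "single-byte";
            Console.WriteLine("U+{0:X4}: {1}", (int)c, kind);
        }
    }
}
```

Each call here goes through Encoding.GetByteCount, which is exactly the pattern the rest of the article measures.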

The following test code counts the bytes for a char array with a total length of 60,000, and repeats each test 10 times. Test 1 takes 1 character at a time and calls GetByteCount 60,000 times per iteration; Test 2 takes 2 characters at a time and calls it 30,000 times; Test 3 takes 5 characters at a time and calls it 12,000 times; and so on up to Test 6, which takes all 60,000 characters in a single call. The CodeTimer class is a performance counter written by Lao Zhao.

// Fill a 60,000-character array with random characters.
char[] charArr = new char[60000];
for (int i = 0; i < 60000; i++)
{
    charArr[i] = (char)RandomExt.Next(char.MaxValue);
}
GC.Collect();
// Each test covers the whole array, in chunks of 1, 2, 5, 10, 100 and 60,000 characters.
CodeTimer.Time("TestGetByteCount 1", 10, () =>
{
    for (int i = 0; i < 60000; i++)
    {
        Encoding.Default.GetByteCount(charArr, i, 1);
    }
});
CodeTimer.Time("TestGetByteCount 2", 10, () =>
{
    for (int i = 0; i < 60000 / 2; i++)
        Encoding.Default.GetByteCount(charArr, i * 2, 2);
});
CodeTimer.Time("TestGetByteCount 5", 10, () =>
{
    for (int i = 0; i < 60000 / 5; i++)
        Encoding.Default.GetByteCount(charArr, i * 5, 5);
});
CodeTimer.Time("TestGetByteCount 10", 10, () =>
{
    for (int i = 0; i < 60000 / 10; i++)
        Encoding.Default.GetByteCount(charArr, i * 10, 10);
});
CodeTimer.Time("TestGetByteCount 100", 10, () =>
{
    for (int i = 0; i < 60000 / 100; i++)
        Encoding.Default.GetByteCount(charArr, i * 100, 100);
});
CodeTimer.Time("TestGetByteCount 65536", 10, () =>
{
    Encoding.Default.GetByteCount(charArr, 0, 60000);
});

You can guess without looking at the results that the earlier tests are less efficient than the later ones, but that is not the point. The results are below; pay attention to the Gen 0 column (the number of generation 0 garbage collections).

Test                      Time Elapsed   CPU Cycles     Gen 0   Gen 1   Gen 2
TestGetByteCount 1        52ms           113,265,292    8       0       0
TestGetByteCount 2        41ms           90,435,216     5       0       0
TestGetByteCount 5        35ms           77,586,978     2       0       0
TestGetByteCount 10       32ms           71,327,412     1       0       0
TestGetByteCount 100      32ms           65,847,702     0       0       0
TestGetByteCount 65536    34ms           72,340,460     0       0       0
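The Gen 0 figures in the results are garbage collection counts. CodeTimer's implementation is not shown in this article, so the following is only a sketch of one way such a count can be taken, using GC.CollectionCount before and after the measured code. The Gen0Counter class and CountGen0Collections method are hypothetical names, and the number printed will differ between machines and runtimes.

```csharp
using System;
using System.Text;

public static class Gen0Counter
{
    // Hypothetical helper: runs the action and returns how many
    // generation 0 collections happened while it ran.
    public static int CountGen0Collections(Action action)
    {
        GC.Collect(); // start from a clean heap, as the test code does
        int before = GC.CollectionCount(0);
        action();
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        // Random characters stand in for the article's RandomExt helper.
        char[] chars = new char[60000];
        Random random = new Random(42);
        for (int i = 0; i < chars.Length; i++)
        {
            chars[i] = (char)random.Next(char.MaxValue);
        }

        // Same shape as Test 1: one character per call, 10 rounds.
        int gen0 = CountGen0Collections(() =>
        {
            for (int round = 0; round < 10; round++)
            {
                for (int i = 0; i < chars.Length; i++)
                {
                    Encoding.Default.GetByteCount(chars, i, 1);
                }
            }
        });

        Console.WriteLine("Gen 0 collections: " + gen0);
    }
}
```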

Listed on their own, the garbage collection counts are 8, 5, 2, 1, 0, 0. Surprising, isn't it? The test code clearly creates no temporary objects, yet memory was collected several times. Analyzing the run with the performance profiler that ships with Visual Studio gives the following figure:

Figure 1: Functions that allocate the most memory

Okay, now we know that System.Text.EncodingNLS.GetByteCount(char[], Int32, Int32) is to blame... However, this is a function that ships with the framework, so we should first rule out a problem on our own side. Next, look at the allocation view:

Figure 2: Allocation view

Look at the first item, which is far ahead in allocations: System.Text.InternalEncoderBestFitFallbackBuffer. So the culprit turns out to be an EncoderFallbackBuffer, which provides a buffer that lets a fallback handler return an alternate string to the encoder when a character cannot be encoded. A fallback may occur whenever Encoding.GetByteCount is called, so the encoding creates a buffer internally to handle it. Moreover, a new buffer is created on every call and discarded once the call finishes, so a large number of temporary buffers are created and collected, which increases memory pressure.
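To make the fallback mechanism concrete, here is a small sketch of what a fallback does when a character cannot be encoded. It uses ASCII with the replacement and exception fallbacks rather than the best-fit fallback seen in the profiler, because those two are easy to observe from outside; the FallbackDemo class and its method names are hypothetical.

```csharp
using System;
using System.Text;

public static class FallbackDemo
{
    // Encoding.ASCII uses a replacement fallback: a character it cannot
    // encode is replaced with "?", so it still counts as one byte.
    public static int CountWithReplacement(string text)
    {
        return Encoding.ASCII.GetByteCount(text);
    }

    // With an exception fallback, the same character makes the call throw
    // instead of being replaced.
    public static bool ThrowsWithExceptionFallback(string text)
    {
        Encoding strict = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetByteCount(text);
            return false;
        }
        catch (EncoderFallbackException)
        {
            return true;
        }
    }

    public static void Main()
    {
        string text = "\u4E2D"; // a CJK ideograph, not representable in ASCII
        Console.WriteLine("Bytes with replacement fallback: " + CountWithReplacement(text));
        Console.WriteLine("Throws with exception fallback: " + ThrowsWithExceptionFallback(text));
    }
}
```

In both cases the encoding has to consult a fallback buffer to decide what to do with the unencodable character, which is the object the allocation view points at.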

The problem is not obvious: it takes more than 60,000 or 70,000 calls (on my computer) before it shows up. Still, a problem is a problem, and it should be fixed.

Here is a simple fix: call Encoding.Default.GetEncoder() to obtain the encoder for the default encoding, then call the GetByteCount method of that encoder. Note that Encoder.GetByteCount takes one more parameter than the Encoding method, flush, which indicates whether to simulate clearing the encoder's internal state after the calculation.

The changed code is as follows:

char[] charArr = new char[60000];
for (int i = 0; i < 60000; i++)
{
    charArr[i] = (char)RandomExt.Next(char.MaxValue);
}
// Get the encoder once and reuse it for every call.
Encoder encoder = Encoding.Default.GetEncoder();
CodeTimer.Time("TestGetByteCount 1", 10, () =>
{
    for (int i = 0; i < 60000; i++)
    {
        encoder.GetByteCount(charArr, i, 1, true);
    }
});
CodeTimer.Time("TestGetByteCount 2", 10, () =>
{
    for (int i = 0; i < 60000 / 2; i++)
        encoder.GetByteCount(charArr, i * 2, 2, true);
});
CodeTimer.Time("TestGetByteCount 5", 10, () =>
{
    for (int i = 0; i < 60000 / 5; i++)
        encoder.GetByteCount(charArr, i * 5, 5, true);
});
CodeTimer.Time("TestGetByteCount 10", 10, () =>
{
    for (int i = 0; i < 60000 / 10; i++)
        encoder.GetByteCount(charArr, i * 10, 10, true);
});
CodeTimer.Time("TestGetByteCount 100", 10, () =>
{
    for (int i = 0; i < 60000 / 100; i++)
        encoder.GetByteCount(charArr, i * 100, 100, true);
});
CodeTimer.Time("TestGetByteCount 65536", 10, () =>
{
    encoder.GetByteCount(charArr, 0, 60000, true);
});

Test results:

Test                      Time Elapsed   CPU Cycles     Gen 0   Gen 1   Gen 2
TestGetByteCount 1        45ms           98,742,656     0       0       0
TestGetByteCount 2        38ms           83,395,672     0       0       0
TestGetByteCount 5        34ms           74,867,809     0       0       0
TestGetByteCount 10       31ms           70,190,804     0       0       0
TestGetByteCount 100      31ms           68,862,872     0       0       0
TestGetByteCount 65536    30ms           65,830,539     0       0       0

Clearly, the memory problem is completely solved, and the speed improves slightly as well. If you need to call GetByteCount many times, it is better to call the Encoder method.
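Applied to the full-width check mentioned at the start of the article, this advice might look like the sketch below. The WidthChecker class is a hypothetical wrapper, not part of the original article. It assumes, as before, that "full-width" means "more than one byte in the chosen encoding", and because it reuses one Encoder and one buffer it is not safe to share between threads.

```csharp
using System;
using System.Text;

public sealed class WidthChecker
{
    private readonly Encoder encoder;
    private readonly char[] buffer = new char[1]; // reused across calls

    public WidthChecker(Encoding encoding)
    {
        // Obtain the encoder once instead of going through Encoding each time.
        encoder = encoding.GetEncoder();
    }

    // True when the character needs more than one byte in the encoding.
    public bool IsMultiByte(char c)
    {
        buffer[0] = c;
        // flush: true simulates clearing the encoder state after each call,
        // so every character is counted independently.
        return encoder.GetByteCount(buffer, 0, 1, true) > 1;
    }

    public static void Main()
    {
        // Assumption: Encoding.Default is the encoding you care about.
        WidthChecker checker = new WidthChecker(Encoding.Default);
        foreach (char c in "A\u4E2D\uFF21")
        {
            string kind = checker.IsMultiByte(c) ? "multi-byte" : "single-byte";
            Console.WriteLine("U+{0:X4}: {1}", (int)c, kind);
        }
    }
}
```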

This article is an English version of an article originally published in Chinese on aliyun.com.
