Easily overlooked memory problems with Encoding.GetByteCount in C#

Source: Internet
Author: User

If you want to determine whether a character is full-width or half-width in C#, the usual approach is to call Encoding.Default.GetByteCount. However, calling GetByteCount many times (tens of thousands of times; the exact number varies by computer) triggers garbage collection, which means that a large number of temporary objects are created along the way.

The following test code calculates the byte count of a char array with a total length of 60,000 and runs each test 10 times. Test 1 takes 1 character at a time and calls GetByteCount 60,000 times per run; Test 2 takes 2 characters at a time and calls it 30,000 times per run; Test 3 takes 5 characters at a time and calls it 12,000 times per run; and so on, until Test 6 takes all 60,000 characters at once and calls it once per run. The CodeTimer class is a performance counter from Lao Zhao.

    // Requires System and System.Text; CodeTimer and RandomExt are external helpers not shown here.
    char[] charArr = new char[60000];
    for (int i = 0; i < 60000; i++)
    {
        charArr[i] = (char)RandomExt.Next(char.MaxValue);
    }
    GC.Collect();

    CodeTimer.Time("TestGetByteCount 1", 10, () =>
    {
        for (int i = 0; i < 60000; i++)
        {
            Encoding.Default.GetByteCount(charArr, i, 1);
        }
    });
    CodeTimer.Time("TestGetByteCount 2", 10, () =>
    {
        for (int i = 0; i < 60000 / 2; i++)
        {
            Encoding.Default.GetByteCount(charArr, i * 2, 2);
        }
    });
    CodeTimer.Time("TestGetByteCount 5", 10, () =>
    {
        for (int i = 0; i < 60000 / 5; i++)
        {
            Encoding.Default.GetByteCount(charArr, i * 5, 5);
        }
    });
    CodeTimer.Time("TestGetByteCount 10", 10, () =>
    {
        for (int i = 0; i < 60000 / 10; i++)
        {
            Encoding.Default.GetByteCount(charArr, i * 10, 10);
        }
    });
    CodeTimer.Time("TestGetByteCount 100", 10, () =>
    {
        for (int i = 0; i < 60000 / 100; i++)
        {
            Encoding.Default.GetByteCount(charArr, i * 100, 100);
        }
    });
    // The label says 65536, but the call covers all 60000 characters at once.
    CodeTimer.Time("TestGetByteCount 65536", 10, () =>
    {
        Encoding.Default.GetByteCount(charArr, 0, 60000);
    });

You do not need to look at the timings closely: the earlier tests are bound to be slower than the later ones, and that is not the point. The test results follow; pay attention to Gen 0 (the number of generation 0 garbage collections).

    TestGetByteCount 1
        Time Elapsed:   52ms
        CPU Cycles:     113,265,292
        Gen 0:          8
        Gen 1:          0
        Gen 2:          0

    TestGetByteCount 2
        Time Elapsed:   41ms
        CPU Cycles:     90,435,216
        Gen 0:          5
        Gen 1:          0
        Gen 2:          0

    TestGetByteCount 5
        Time Elapsed:   35ms
        CPU Cycles:     77,586,978
        Gen 0:          2
        Gen 1:          0
        Gen 2:          0

    TestGetByteCount 10
        Time Elapsed:   32ms
        CPU Cycles:     71,327,412
        Gen 0:          1
        Gen 1:          0
        Gen 2:          0

    TestGetByteCount 100
        Time Elapsed:   32ms
        CPU Cycles:     65,847,702
        Gen 0:          0
        Gen 1:          0
        Gen 2:          0

    TestGetByteCount 65536
        Time Elapsed:   34ms
        CPU Cycles:     72,340,460
        Gen 0:          0
        Gen 1:          0
        Gen 2:          0
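If you do not have CodeTimer at hand, the Gen 0 figure can be approximated with GC.CollectionCount. The following is only a minimal sketch, not Lao Zhao's implementation; the GcProbe class and its method name are made up for illustration.

```csharp
using System;

// Hypothetical stand-in for the Gen 0 column of CodeTimer; not the real CodeTimer.
public static class GcProbe
{
    // Runs the action the given number of times and returns how many
    // generation 0 collections happened while it ran.
    public static int CountGen0Collections(int iterations, Action action)
    {
        GC.Collect();                       // start from a clean heap, as the test code does
        int before = GC.CollectionCount(0); // generation 0 collections so far
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        return GC.CollectionCount(0) - before;
    }
}
```

For example, GcProbe.CountGen0Collections(10, () => { /* loop over GetByteCount here */ }) returns a number comparable to the Gen 0 line above. The exact value depends on the runtime and the machine.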

Listed on their own, the garbage collection counts are 8, 5, 2, 1, 0, 0. Surprising, isn't it? The code clearly creates no temporary objects, yet memory is collected several times. Analyzing the program with the performance profiler that ships with Visual Studio gives the following figure:

Figure 1 Functions with the most memory allocated

Okay, now we know that System.Text.EncodingNLS.GetByteCount(char[], Int32, Int32) is to blame. However, this is a function that ships with the framework, so it is better to look for the problem on our own side first. Next, look at the allocation view:

Figure 2 Allocation View

Look at the first item, which is far ahead in allocations: System.Text.InternalEncoderBestFitFallbackBuffer. So it turns out to be an EncoderFallbackBuffer problem. An EncoderFallbackBuffer provides a buffer that lets a fallback handler return an alternate string to the encoder when an input character cannot be encoded. A fallback may occur whenever Encoding.GetByteCount is called, so the encoding creates a buffer internally to handle it. Moreover, a new buffer is created on every call. With many calls, a large number of temporary buffers are created and then collected, which increases memory pressure.
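To make the fallback itself more concrete, here is a small sketch of what happens when a character cannot be encoded. It uses ASCII instead of my default code page, because the internal best-fit fallback is not public; the FallbackDemo class and its method names are made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical demo class; it shows the fallback mechanism, not the internal best-fit one.
public static class FallbackDemo
{
    // Encoding.ASCII substitutes "?" for characters it cannot encode,
    // so every unencodable character still counts as one byte.
    public static int CountWithReplacement(string text)
    {
        return Encoding.ASCII.GetByteCount(text);
    }

    // The same count with a fallback that throws instead of substituting.
    public static bool ThrowsWithExceptionFallback(string text)
    {
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            strictAscii.GetByteCount(text);
            return false;
        }
        catch (EncoderFallbackException)
        {
            return true;
        }
    }
}
```

Even GetByteCount has to run the fallback to know how many bytes the substitute needs, which is why a fallback buffer is involved although no bytes are actually written.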

This problem is not obvious: it only shows up after more than 60,000 or 70,000 calls (on my computer). Still, a problem is a problem, and it should be solved.

Here is a simple fix: call Encoding.Default.GetEncoder() to obtain the Encoder of the default encoding, and then call the GetByteCount method of that encoder. Note that Encoder.GetByteCount takes one more parameter than the Encoding version, flush, which indicates whether to simulate clearing the internal state of the encoder after the calculation.

The changed code is as follows:

    // Requires System and System.Text; CodeTimer and RandomExt are external helpers not shown here.
    char[] charArr = new char[60000];
    for (int i = 0; i < 60000; i++)
    {
        charArr[i] = (char)RandomExt.Next(char.MaxValue);
    }
    // Create the encoder once and reuse it for every call.
    Encoder encoder = Encoding.Default.GetEncoder();

    CodeTimer.Time("TestGetByteCount 1", 10, () =>
    {
        for (int i = 0; i < 60000; i++)
        {
            encoder.GetByteCount(charArr, i, 1, true);
        }
    });
    CodeTimer.Time("TestGetByteCount 2", 10, () =>
    {
        for (int i = 0; i < 60000 / 2; i++)
        {
            encoder.GetByteCount(charArr, i * 2, 2, true);
        }
    });
    CodeTimer.Time("TestGetByteCount 5", 10, () =>
    {
        for (int i = 0; i < 60000 / 5; i++)
        {
            encoder.GetByteCount(charArr, i * 5, 5, true);
        }
    });
    CodeTimer.Time("TestGetByteCount 10", 10, () =>
    {
        for (int i = 0; i < 60000 / 10; i++)
        {
            encoder.GetByteCount(charArr, i * 10, 10, true);
        }
    });
    CodeTimer.Time("TestGetByteCount 100", 10, () =>
    {
        for (int i = 0; i < 60000 / 100; i++)
        {
            encoder.GetByteCount(charArr, i * 100, 100, true);
        }
    });
    // The label says 65536, but the call covers all 60000 characters at once.
    CodeTimer.Time("TestGetByteCount 65536", 10, () =>
    {
        encoder.GetByteCount(charArr, 0, 60000, true);
    });

Test results:

    TestGetByteCount 1
        Time Elapsed:   45ms
        CPU Cycles:     98,742,656
        Gen 0:          0
        Gen 1:          0
        Gen 2:          0

    TestGetByteCount 2
        Time Elapsed:   38ms
        CPU Cycles:     83,395,672
        Gen 0:          0
        Gen 1:          0
        Gen 2:          0

    TestGetByteCount 5
        Time Elapsed:   34ms
        CPU Cycles:     74,867,809
        Gen 0:          0
        Gen 1:          0
        Gen 2:          0

    TestGetByteCount 10
        Time Elapsed:   31ms
        CPU Cycles:     70,190,804
        Gen 0:          0
        Gen 1:          0
        Gen 2:          0

    TestGetByteCount 100
        Time Elapsed:   31ms
        CPU Cycles:     68,862,872
        Gen 0:          0
        Gen 1:          0
        Gen 2:          0

    TestGetByteCount 65536
        Time Elapsed:   30ms
        CPU Cycles:     65,830,539
        Gen 0:          0
        Gen 1:          0
        Gen 2:          0

Clearly, the memory problem is completely solved, and the speed improves slightly as well. If you need to call GetByteCount many times, it is better to call the Encoder's method.
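For the original full-width or half-width check, the same idea could be wrapped in a small helper. This is only a sketch under assumptions: the CharByteCounter class and its members are made up for illustration, and the "more than one byte means full-width" rule only holds for a double-byte code page such as the one Encoding.Default returns on my system.

```csharp
using System;
using System.Text;

// Hypothetical helper; the class and member names are not from any library.
public sealed class CharByteCounter
{
    private readonly Encoder _encoder;
    private readonly char[] _buffer = new char[1]; // reused, so no array is created per call

    public CharByteCounter(Encoding encoding)
    {
        _encoder = encoding.GetEncoder(); // created once and reused for every call
    }

    // Bytes needed for one character. flush: true makes the encoder
    // simulate clearing its internal state after each calculation.
    public int ByteCount(char c)
    {
        _buffer[0] = c;
        return _encoder.GetByteCount(_buffer, 0, 1, true);
    }

    // Heuristic: assumes a double-byte code page where full-width characters take 2 bytes.
    public bool IsFullWidth(char c)
    {
        return ByteCount(c) > 1;
    }
}
```

Usage would look like new CharByteCounter(Encoding.Default).IsFullWidth(ch). Because an Encoder keeps state and this helper reuses one buffer, an instance should not be shared between threads without synchronization.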
