Practices for Reducing the Memory Usage of .NET Applications

Source: Internet
Author: User

I have been busy for the past week, mainly working on a feature called the "keyboard wizard". In short, it loads a large amount of data into memory, searches it quickly, finds the 10 records that best match the user's input, and displays them. It works much like the keyboard wizard in two existing stock-trading programs.

The data is stored as text in a file, and the volume is large: nearly 200,000 entries, each with several delimiter-separated fields. At the time we were testing with 60,000 records, and the text file was nearly 10 MB. Once this module had loaded and cached the data, it occupied nearly 70-80 MB of memory. When I took over the module, my main tasks were to reduce memory consumption and improve matching efficiency.

I, Avoid creating unnecessary objects

After getting the code, I first read the design document and then stepped through the code with breakpoints. Once I understood the logic, I found some problems with the approach. The original processing flow was as follows:

  1. Read the file into memory and instantiate the records.
  2. Search the records against the conditions and store the hits in result set 1.
  3. Calculate the matching degree of each record in result set 1 and store it in result set 2.
  4. Sort result set 2 by matching degree, take the 10 best-matching records, and return them.

This flow works, but it has problems. The biggest one is that temporary variables hold too many intermediate results, and these objects are discarded as soon as a query completes, so the sheer number of short-lived temporary objects puts heavy pressure on the GC. For example, if the user types 1 in the input box and matching uses Contains, roughly 40,000 of the 60,000-plus records may contain 1. Those 40,000 records are stored in temporary variables, their matching degrees are calculated, and the results are stored in a KeyValuePair-like collection keyed by matching degree; that collection is then sorted by key to get the 10 best records. As you can see, a large number of temporary objects are created along the way, which makes memory usage spike, and because they are all discarded right away, the GC is under great pressure.

The design document only requires the 10 best-matching records to be returned, a point the previous solution seems to have overlooked. So my first step after taking over was to streamline the process as follows:

  1. Read the file into memory and instantiate the records.
  2. Test each object against the conditions. If it matches:
    • Calculate its matching degree.
    • Store it in a SortedList with a capacity of only 11, using the matching degree as the key.
    • If the SortedList then holds more than 10 records, remove the last element, so that the 10 best-matching records are always kept.
  3. After the traversal completes, return the collection.
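The streamlined flow fits in a single pass over the data. The sketch below is a C++ analogue of it, not the author's C# code: std::multimap stands in for SortedList. The Record type and the matchDegree scoring rule are hypothetical, because the real matching algorithm is not published.

```cpp
#include <cstddef>
#include <initializer_list>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical record holding only the four searchable fields.
struct Record {
    std::string code, name, pinyin, market;
};

// Hypothetical matching degree (the real rule is not published): the earliest
// position at which the query occurs in the code or pinyin. Smaller is better;
// -1 means no match.
int matchDegree(const Record& r, const std::string& query) {
    int best = -1;
    for (const std::string* field : {&r.code, &r.pinyin}) {
        std::size_t pos = field->find(query);
        if (pos != std::string::npos && (best < 0 || static_cast<int>(pos) < best))
            best = static_cast<int>(pos);
    }
    return best;
}

// One pass over the data: no intermediate result sets, only an ordered map that
// never holds more than limit + 1 entries.
std::vector<const Record*> topMatches(const std::vector<Record>& data,
                                      const std::string& query,
                                      std::size_t limit = 10) {
    std::multimap<int, const Record*> best;  // ascending key order: best match first
    for (const Record& r : data) {
        int degree = matchDegree(r, query);
        if (degree < 0) continue;                // not a match
        best.emplace(degree, &r);                // equal keys go after existing ones
        if (best.size() > limit)
            best.erase(std::prev(best.end()));   // drop the worst match
    }
    std::vector<const Record*> result;
    for (const auto& entry : best) result.push_back(entry.second);
    return result;
}
```

std::multimap is used because two records can share a matching degree. .NET's SortedList<TKey, TValue> throws on a duplicate key, so a C# version of this step would need a tie-breaker folded into the key.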

This change eliminates most of the memory used by temporary data. Throughout the process, only one SortedList with a capacity of 11 holds intermediate results. Each time an element is inserted, the SortedList keeps the entries in order, and then the worst match, which is the last element, is removed (the list is sorted in ascending order, and a better match has a smaller value). The cost here lies mainly in SortedList insertion, its internal ordering, and removal. At this point I hesitated between SortedList and SortedDictionary, so I looked into them. SortedDictionary is implemented internally as a red-black tree, while SortedList uses sorted arrays. Both look up keys in O(log n), but SortedDictionary inserts and removes elements in O(log n), whereas SortedList needs O(n) because it has to shift array elements; on the other hand, SortedDictionary uses more memory than SortedList. Basically, it is a trade-off between speed and memory. Because only 11 objects are stored here, the difference between the two is negligible. In fact, even without such a structure you could implement one yourself: it is nothing more than a collection to which you add one element at a time, keep it sorted, and remove the largest. .NET is convenient precisely because it has so many powerful built-in data structures.
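As noted, such a structure is easy to write yourself. Below is a minimal C++ sketch of a fixed-capacity sorted array; the class name BoundedTopK is our own and does not come from the original code. Each new entry is placed at its sorted position by binary search, and once the array exceeds its capacity the largest key is dropped. Like SortedList, it accepts O(n) insertion in exchange for compact, contiguous storage, and at a capacity of 10 that cost is negligible.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Keeps the `capacity` smallest keys seen so far, in ascending order
// (smaller key = better match).
template <typename Key, typename Value>
class BoundedTopK {
public:
    explicit BoundedTopK(std::size_t capacity) : capacity_(capacity) {
        items_.reserve(capacity + 1);  // one allocation up front; never grows past this
    }

    void add(const Key& key, const Value& value) {
        if (capacity_ == 0) return;
        // Already full and no better than the worst kept key: nothing to do.
        if (items_.size() == capacity_ && !(key < items_.back().first)) return;
        // Insert after any equal keys so that earlier entries win ties.
        auto pos = std::upper_bound(items_.begin(), items_.end(), key,
            [](const Key& k, const std::pair<Key, Value>& item) { return k < item.first; });
        items_.insert(pos, std::pair<Key, Value>(key, value));
        if (items_.size() > capacity_) items_.pop_back();  // remove the largest key
    }

    const std::vector<std::pair<Key, Value>>& items() const { return items_; }

private:
    std::size_t capacity_;
    std::vector<std::pair<Key, Value>> items_;
};
```

The early return for a full collection means most non-competitive records cost a single comparison, which matters when tens of thousands of records match a short input.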

After this small change, memory usage was roughly halved, from 70-80 MB to 30-40 MB. This is in fact the most basic principle of reducing memory overhead: avoid creating unnecessary objects.

II, Optimize data types and algorithms

From here on, reducing memory gets harder and harder. Reading the code carefully, I found other problems besides the one above. For example, a large number of objects were instantiated into memory at startup and kept for the lifetime of the program. Each record contains a lot of information, but only the following four fields are used for searching and matching. Instantiating whole records also deserializes all the unused fields, so a lot of memory is taken up by data that is never used.

Stock code   Chinese name              Pinyin   Market type               ...
600000       Pudong Development Bank   pfyh     Shanghai Stock Exchange   ...

So the first step was to keep only these four searchable fields in memory. Each record is stored as a string[] rather than as a class or another structure. I also tried a struct, but with four fields the struct is fairly large and has to be passed as a parameter along the way, so it ended up costing more than a class. In the end I simply used an array.
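Here is a sketch of keeping only the four key fields, written in C++ for consistency with the other examples. In it, std::array<std::string, 4> plays the role of the original C# string[]. Two details are hypothetical: the '|' delimiter, and the assumption that the key fields are the first four columns. The article only says that the fields are delimiter-separated.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Only the four searchable fields are kept; any further columns are ignored.
using KeyFields = std::array<std::string, 4>;  // code, name, pinyin, market

// Assumes (hypothetically) '|' as the delimiter and the four key fields as the
// first four columns. Returns std::nullopt for lines with fewer than four fields.
std::optional<KeyFields> parseKeyFields(std::string_view line, char delim = '|') {
    KeyFields fields;
    std::size_t start = 0;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        std::size_t end = line.find(delim, start);
        if (end == std::string_view::npos) {
            if (i + 1 != fields.size()) return std::nullopt;  // too few fields
            end = line.size();                               // last field ends the line
        }
        fields[i] = std::string(line.substr(start, end - start));
        start = end + 1;
    }
    return fields;
}
```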

In addition, to speed up searching, the data was split into blocks keyed by leading character (0-9 and A-Z) and cached. When the user types 0, for example, the data is read directly from the block keyed by 0. This is faster, but such a large cache also increases memory consumption, since the cached data is about as large as the raw data already loaded into memory. The search itself was also a complete search: with four fields and 170,000 records, every query required 170,000 × 4 traversals and comparisons to find the 10 best-matching records.
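The leading-character blocks can be sketched as 36 buckets, one for each of 0-9 and A-Z. In this sketch each bucket stores record indices rather than copies of the records. That is our own variation, not what the original code did, and it avoids the duplication of raw data described above. The Record type and the choice to key on the code and pinyin fields are assumptions.

```cpp
#include <array>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record holding the four searchable fields.
struct Record {
    std::string code, name, pinyin, market;
};

// Maps '0'-'9' to buckets 0-9 and 'A'-'Z' (either case) to buckets 10-35; -1 otherwise.
int bucketOf(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'Z') return 10 + (c - 'A');
    if (c >= 'a' && c <= 'z') return 10 + (c - 'a');
    return -1;
}

// 36 buckets of record indices. Storing 4-byte indices instead of copied records
// keeps the index much smaller than the data it points into.
using BucketIndex = std::array<std::vector<std::uint32_t>, 36>;

// Files each record under the leading character of its code and of its pinyin.
BucketIndex buildIndex(const std::vector<Record>& data) {
    BucketIndex index;
    for (std::uint32_t i = 0; i < data.size(); ++i) {
        int byCode = data[i].code.empty() ? -1 : bucketOf(data[i].code[0]);
        int byPinyin = data[i].pinyin.empty() ? -1 : bucketOf(data[i].pinyin[0]);
        if (byCode >= 0) index[byCode].push_back(i);
        if (byPinyin >= 0 && byPinyin != byCode) index[byPinyin].push_back(i);  // no duplicates
    }
    return index;
}

// Only the bucket for the query's first character needs to be searched.
const std::vector<std::uint32_t>& candidates(const BucketIndex& index, const std::string& query) {
    static const std::vector<std::uint32_t> none;
    int b = query.empty() ? -1 : bucketOf(query[0]);
    return b < 0 ? none : index[b];
}
```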

The alternative is an incomplete search. Each type of security (stocks, funds, bonds, and so on) is sorted by security code in advance. Following the search priority set by the user, each type is searched in turn, and as soon as 10 matching records are found, the search returns. Because the data is already sorted by security type and code, any match found later cannot rank higher than the ones already found. This change directly improved search efficiency. Searching ordered data is generally more efficient than searching unordered data, and some common search algorithms, such as binary search, require the data to be sorted.
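Here is a sketch of the incomplete search. It assumes, hypothetically, that a match means the security code starts with the user's input. Because each type's codes are pre-sorted, std::lower_bound finds the first candidate by binary search. The scan stops at the first code without the prefix, and the whole search returns once 10 results are collected. The type names and the map layout are illustrative only.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical layout: for each security type, its codes sorted ascending in advance.
using SortedByType = std::map<std::string, std::vector<std::string>>;

// Searches types in the user's priority order and stops as soon as `limit` codes
// starting with `prefix` have been found. All codes sharing a prefix are contiguous
// in a sorted list, so the scan can end at the first code without the prefix.
std::vector<std::string> prefixSearch(const SortedByType& data,
                                      const std::vector<std::string>& priority,
                                      const std::string& prefix,
                                      std::size_t limit = 10) {
    std::vector<std::string> found;
    if (limit == 0) return found;
    for (const std::string& type : priority) {
        auto it = data.find(type);
        if (it == data.end()) continue;  // unknown type: skip it
        const std::vector<std::string>& codes = it->second;
        for (auto pos = std::lower_bound(codes.begin(), codes.end(), prefix);
             pos != codes.end() && pos->compare(0, prefix.size(), prefix) == 0; ++pos) {
            found.push_back(*pos);
            if (found.size() == limit) return found;  // early exit: later hits rank lower
        }
    }
    return found;
}
```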

III, Write data-processing logic or modules in unmanaged code

Although these two changes cut memory usage by nearly 50-60%, they still did not meet management's requirements. I then tried loading the data with different structures and compared the memory usage: reading the file directly into strings, and into arrays, structs, and classes by type. Memory usage was lowest when the file was read directly as strings, yet even then a 10 MB data file occupied 20-30 MB in memory, not counting the temporary variables created during processing. Checking with tools such as dotTrace and CLR Profiler showed that most of the memory was indeed taken by the raw data. I then searched the web for "How to reduce the memory usage of .NET applications" to find ways to reduce .NET memory usage, and came across this answer on Stack Overflow:

The answerer pointed out that .NET applications use a lot of memory compared with programs written in native code, so if memory overhead is a major concern, .NET may not be the best choice. The memory footprint of a .NET application is also affected to some extent by garbage collection. The answer further noted that some data structures, such as List, allocate extra space, and it suggested other ways to reduce memory usage, such as using value types instead of reference types and avoiding large objects to prevent memory fragmentation.

Even after all this, memory usage still did not meet the requirements, so we started looking at calling unmanaged code, which allows more flexible control over allocating and freeing memory. However, the whole program was written in .NET, and rewriting all of it in C or C++ was unrealistic. That left two options: use unsafe code, or write the data loading and retrieval modules in C or C++ and call them from .NET via P/Invoke.

At first I wanted to use unsafe code and do the data loading and retrieval directly in it. Later I felt the code became messy: the different coding styles did not mix well, and the loading and retrieval logic was itself fairly complex. So I went straight to the second option: write the data loading and retrieval logic in C++ and call it from .NET.

I did some evaluation before starting. When the same 10 MB of data is loaded into memory as strings, .NET occupies 20-30 MB, while C++ uses only about 9-10 MB, and that figure barely fluctuates. This was the expected result, partly because .NET strings are UTF-16, so every ASCII character takes two bytes, while a C++ std::string can hold the file's bytes as they are.
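The article does not show its C++ loading code, so the following is only a sketch of one way to keep the footprint close to the file size. It reads the whole file into a single std::string and indexes each line with a std::string_view, so each record costs only a pointer and a length on top of its bytes. The names LoadedData and loadFile are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <string_view>
#vector>
```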

Since I was not very familiar with C++, I quickly read the chapters on strings and the STL in C++ Primer Plus, asked other development teams for help, and defined the basic interfaces. For demonstration, I created two projects: a C++ Win32 DLL project named secudata, and a C# WinForms program named secudatatest that calls the library.

I defined four methods in C++: one to initialize and load the data, one to set the search priority, one to search and match, and one to unload the data. For work reasons I cannot post the actual algorithm, so here is a simple example. The method names and project structure are as follows:
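The screenshot with the actual method names is not reproduced, so the names below (SecuLoad, SecuSetPriority, SecuUnload) and their signatures are hypothetical. This sketch shows the exported C interface and the data lifecycle for three of the four methods; the search method is left out for brevity. extern "C" keeps the names unmangled so that P/Invoke can find them, and __declspec(dllexport) exports them from a Windows DLL.

```cpp
#include <fstream>
#include <string>
#include <vector>

#if defined(_WIN32)
#define SECU_API extern "C" __declspec(dllexport)
#else
#define SECU_API extern "C"
#endif

namespace {
std::vector<std::string> g_records;   // one line per security, loaded once
std::vector<std::string> g_priority;  // e.g. {"stock", "fund", "bond"}
}

// Loads the data file; returns 0 on success, -1 if the file cannot be opened.
// On failure the previously loaded data is left untouched.
SECU_API int SecuLoad(const char* path) {
    if (!path) return -1;
    std::ifstream in(path);
    if (!in) return -1;
    std::vector<std::string> records;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // Windows line endings
        if (!line.empty()) records.push_back(line);
    }
    records.shrink_to_fit();  // release spare capacity left over from growth
    g_records.swap(records);
    return 0;
}

// Sets the search priority from a comma-separated list such as "stock,fund,bond".
SECU_API int SecuSetPriority(const char* priority) {
    if (!priority) return -1;
    g_priority.clear();
    std::string item;
    for (const char* p = priority;; ++p) {
        if (*p == ',' || *p == '\0') {
            if (!item.empty()) g_priority.push_back(item);
            item.clear();
            if (*p == '\0') break;
        } else {
            item += *p;
        }
    }
    return 0;
}

// Releases all loaded data; swapping with empty vectors actually frees the memory.
SECU_API int SecuUnload() {
    std::vector<std::string>().swap(g_records);
    std::vector<std::string>().swap(g_priority);
    return 0;
}
```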

Then, in .NET, use P/Invoke to import the methods defined in the C++ DLL.

The methods can then be called from .NET. Note that the method's input parameter is a string, the second parameter, a StringBuilder, carries the method's actual result, and the method's int return value indicates whether it executed successfully. When calling the search method, the StringBuilder passed as the second parameter must be initialized with a capacity large enough for the largest possible query result, because the C++ code writes the result into it; if it is not initialized, or its capacity is too small, an exception is thrown. You could also return a struct directly, but that requires extra definitions. Here everything is returned as a string and parsed on the .NET side.
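Here is a sketch of the C++ side of the search call. When .NET marshals a StringBuilder to native code, the native function receives a fixed-size character buffer. It cannot grow that buffer, which is why the caller must size it in advance. Several details are our assumptions, not the original interface: the capacity parameter, the ';' separator, the prefix-matching rule, the return codes, and the stand-in data. Passing the capacity lets the function report a buffer that is too small instead of writing past its end. On the .NET side, the caller would pass the StringBuilder together with its Capacity.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

#if defined(_WIN32)
#define SECU_API extern "C" __declspec(dllexport)
#else
#define SECU_API extern "C"
#endif

namespace {
// Stand-in for the loaded data; in the real module this is filled at load time.
std::vector<std::string> g_codes = {"600000", "600036", "601318", "000001"};
}

// Hypothetical return codes: 0 = success, 1 = buffer too small, -1 = bad argument.
// `result` is the buffer .NET marshals from a StringBuilder; `capacity` is its size
// in chars, including the terminating '\0'. Up to 10 matches are joined with ';'.
SECU_API int SecuSearch(const char* input, char* result, int capacity) {
    if (!input || !result || capacity <= 0) return -1;
    const std::size_t n = std::strlen(input);
    std::string joined;
    int count = 0;
    for (const std::string& code : g_codes) {
        if (code.compare(0, n, input) != 0) continue;  // assumed rule: prefix match
        if (!joined.empty()) joined += ';';
        joined += code;
        if (++count == 10) break;
    }
    if (joined.size() + 1 > static_cast<std::size_t>(capacity)) {
        result[0] = '\0';
        return 1;  // report the problem instead of writing past the caller's buffer
    }
    std::memcpy(result, joined.c_str(), joined.size() + 1);
    return 0;
}
```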

With this setup, you can set breakpoints in the C++ project and start the .NET WinForms program. When a P/Invoke call hits a breakpoint, you can step through the C++ code.

For release builds, it is best to switch the runtime library from the default dynamic (DLL) setting to the static one. Visual Studio then links the C++ runtime that the DLL depends on into the generated DLL, so no problems occur when it is deployed on a customer's machine. In Visual Studio this is typically the C/C++ > Code Generation > Runtime Library option, changed from /MD to /MT. The property settings of the secudata class library project are as follows:

After switching to P/Invoke, loading the 10 MB of data takes only about 10 MB of memory, far less than the 30-40 MB under pure .NET. Memory usage also fluctuates relatively little, which meets the memory requirements.

This "mix and match" approach has several advantages. It combines the fast development of .NET with the flexible memory allocation and release of C++, and it also offers some protection for the code. In many cases, processing logic that is memory-sensitive and handles large volumes of data can be moved into C++, where flexible manual memory management reduces memory usage. Writing core data structures and algorithms in C++ also makes the program harder to decompile, which better protects the code.

IV, Conclusion

.NET applications must load the CLR and some common class libraries and rely on garbage collection, so their footprint is large compared with native languages such as C and C++. A simple WinForms application in .NET may already take nearly 10 MB of memory, and memory usage grows considerably as development progresses. Often, though, the cause is developers' unfamiliarity with .NET's underlying mechanisms. For example, reference types are used where value types would do; large numbers of short-lived temporary objects are created; too many static variables and members hold memory for a long time so that it cannot be reclaimed; and some internal .NET mechanisms go unnoticed, such as collections allocating extra space in advance. Much of the time, relying on the .NET GC means we stop paying attention to object lifetimes, create new objects very "generously", and use heavyweight built-in objects, all of which leads to excessive memory usage. By addressing these problems, we can eliminate a large part of the unnecessary memory usage of .NET applications.

Beyond understanding the internal mechanisms of the .NET Framework, sound design and efficient data structures and algorithms can simplify the problem and reduce memory overhead.

Finally, where memory requirements are strict, you can write the relevant modules in C/C++, whose manual memory management is more flexible, and call them from .NET via P/Invoke to reduce memory usage further.

That is my practice and summary of reducing the memory usage of .NET applications. I hope it helps.
