When processing a large amount of data, Xiao Dingdong encountered a memory leak.

Recently we have been testing how well the word segmenter works when applied to the WebLucene search engine.

The source data is an XML file of about 1.2 GB.

The resulting index files compare as follows:

Source file: 1.2 GB
Index generated with word segmentation: 2217 MB
Index generated with bigram word segmentation: 2618 MB (401 MB larger)

For a more detailed comparison, see the comprehensive comparison of Chinese word segmentation and bigram word segmentation.
The per-file comparison of the two indexes below shows that the difference lies mainly in the term-related files (.frq, .prx, .tii and .tis).

Index file    Word segmentation (121 MB)    Bigram segmentation (146 MB)    Contents

deletable     4 B                           4 B
_fg4.f10      19 KB                         19 KB                           Norms (normalization factors)
_fg4.f11      19 KB                         19 KB                           Norms (normalization factors)
_fg4.f12      19 KB                         19 KB                           Norms (normalization factors)
_fg4.f13      19 KB                         19 KB                           Norms (normalization factors)
_fg4.f19      19 KB                         19 KB                           Norms (normalization factors)
_fg4.fdt      80 MB                         80 MB                           Stored field values
_fg4.fdx      156 KB                        156 KB                          Stored field index
_fg4.fnm      135 B                         135 B                           Field names
_fg4.frq      12 MB                         23 MB                           Term frequencies
_fg4.prx      26 MB                         36 MB                           Term positions
_fg4.tii      15 KB                         74 KB                           Term info index
_fg4.tis      1.1 MB                        5.8 MB                          Term infos
segments      17 B                          17 B
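You can check the claim that term information accounts for the gap by tallying the per-file differences from the table. This small sketch uses sizes transcribed from the table. It assumes 1 MB = 1024 KB and leaves out files whose sizes are identical in both indexes. The class name IndexSizeDiff is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class IndexSizeDiff {
    // Sizes in KB from the table: {word segmentation, bigram segmentation}.
    // Assumption: 1 MB = 1024 KB; files with identical sizes are omitted.
    static Map<String, double[]> table() {
        Map<String, double[]> files = new LinkedHashMap<>();
        files.put("_fg4.frq", new double[] {12 * 1024, 23 * 1024});
        files.put("_fg4.prx", new double[] {26 * 1024, 36 * 1024});
        files.put("_fg4.tii", new double[] {15, 74});
        files.put("_fg4.tis", new double[] {1.1 * 1024, 5.8 * 1024});
        return files;
    }

    // Growth of one file when switching from word segmentation to bigram segmentation.
    static double deltaKb(double[] sizes) {
        return sizes[1] - sizes[0];
    }

    // Sum of the per-file growth, in KB.
    static double totalDeltaKb(Map<String, double[]> files) {
        double total = 0;
        for (double[] sizes : files.values()) {
            total += deltaKb(sizes);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, double[]> files = table();
        for (Map.Entry<String, double[]> e : files.entrySet()) {
            System.out.printf(Locale.ROOT, "%s: +%.0f KB%n", e.getKey(), deltaKb(e.getValue()));
        }
        System.out.printf(Locale.ROOT, "total: +%.1f MB%n", totalDeltaKb(files) / 1024);
    }
}
```

When run as written, the program prints these increases:

- .frq: +11264 KB
- .prx: +10240 KB
- .tii: +59 KB
- .tis: +4813 KB

The total is about +25.8 MB. This matches the 146 MB - 121 MB gap to within the table's rounding.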


Two problems came up during the test:
1. Memory leak
   The memory problem showed up in two ways:
   1) Memory usage grows steadily over time (a memory leak?), which can be seen with the top command on Linux.
   2) After the program had been running for about half an hour, memory usage suddenly jumped, and CPU usage rose with it.
2. High CPU usage
   CPU usage tracks memory usage: when memory usage climbs to around 99.9%, CPU usage spikes.
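Both symptoms can also be watched from inside the JVM rather than with top. We assume, without confirmation from the measurements above, that the CPU spike near full memory comes from the garbage collector running almost continuously. If so, the GC share in this log should climb along with heap usage.

This is a minimal sketch that assumes Java 8 or later. The class name HeapWatcher, the sampling interval and the simulated leak in main are hypothetical. The GC share is approximate, because collectors report elapsed collection time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class HeapWatcher {
    /** Used heap as a fraction of the maximum heap, or -1 if the maximum is undefined. */
    static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max < 0 ? -1 : (double) heap.getUsed() / max;
    }

    /** Total milliseconds spent in garbage collection since JVM start, over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long ms = gc.getCollectionTime(); // -1 if this collector does not report it
            if (ms > 0) {
                total += ms;
            }
        }
        return total;
    }

    /** Starts a daemon thread that logs heap usage and GC share every intervalMs milliseconds. */
    static Thread startMonitor(long intervalMs) {
        Thread t = new Thread(() -> {
            long lastGc = totalGcMillis();
            while (true) {
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    return; // stop monitoring when interrupted
                }
                long gc = totalGcMillis();
                // Approximate fraction of the last interval spent collecting garbage
                double gcShare = (gc - lastGc) / (double) intervalMs;
                lastGc = gc;
                System.out.printf(Locale.ROOT, "heap used %.1f%%, GC time %.1f%% of last interval%n",
                        heapUsedFraction() * 100, gcShare * 100);
            }
        }, "heap-watcher");
        t.setDaemon(true); // do not keep the JVM alive just for monitoring
        t.start();
        return t;
    }

    // Hypothetical stand-in for a leak: references are added and never removed.
    private static final List<byte[]> RETAINED = new ArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        startMonitor(500);
        for (int i = 0; i < 20; i++) {
            RETAINED.add(new byte[1 << 20]); // retain 1 MB per step, so the retained heap only grows
            Thread.sleep(100);
        }
    }
}
```

In a real indexer you would call startMonitor once at startup and leave the indexing code unchanged. A steadily rising "heap used" figure matches the first scenario. A jump in "GC time" that coincides with the CPU spike would support the garbage-collection explanation for the second.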

So the problem to solve is the growing memory usage.

Lhelper also recommended several tools:

http://www.samspublishing.com/articles/article.asp?p=23618&seqnum=7&rl=1

http://tech.ccidnet.com/pub/article/c1112_a265199_p1.html

Java memory leak checking tips

Memory leaks, be gone
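Take one heap dump early in the run and another after memory has grown. Loading both into a heap analyzer shows which objects are piling up.

This minimal sketch assumes a HotSpot-based JVM (Java 7 or later), where com.sun.management.HotSpotDiagnosticMXBean is available. The class name HeapDumper and the file names are hypothetical. For a process that is already running, the JDK's jmap tool does the same from the command line:

    jmap -dump:live,format=b,file=heap.hprof <pid>

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    /**
     * Writes a heap dump of this JVM to path. The file must not exist yet and
     * must end in ".hprof". With live = true only objects that are still
     * reachable are written, which is what matters when looking for a leak.
     */
    static File dumpHeap(String path, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, live);
        return new File(path);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical naming scheme: a timestamp keeps successive dumps from colliding.
        String path = args.length > 0 ? args[0] : "indexer-" + System.currentTimeMillis() + ".hprof";
        File dump = dumpHeap(path, true);
        System.out.println("wrote " + dump.getAbsolutePath() + " (" + dump.length() + " bytes)");
    }
}
```

Comparing two live-object dumps taken some time apart shows which classes keep gaining instances. The cause of the leak is usually among them.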

Does anyone have experience with this that they could share?

Related links:

[Sandbox] Two Test Modules for Lucene Chinese Word Segmentation
» Grassland development Diary

BEA releases JRockit 5.0 update and Memory Leak Detector tool

