Editor's note: although this is an early technical article, it has held up better than many classics. It should be of great reference value to readers who are learning or working in Java development, and we thank the original author for sharing their experience.
The development of CERT has come to an end for now. It took one person less than a month: 16,902 lines of underlying Java code and 27,685 lines of JSP, 44,587 lines in total. Many problems came up along the way, but all were eventually solved. Below I list every problem I ran into during development, with its solution, for your reference.
System architecture: RedHat AS4 / Apache 2.0.59 / Resin 2.1.17 / JDK 6.0u2 / Hibernate 3.0 / Lucene 2.2 / URLRewrite 3.0.4. The database is MySQL 4.1.15, and the database cache is built on top of Hibernate in a single Java class of only 794 lines. That one class handles database object caching, list caching, update caching, and automatic deletion of stale list caches, and it also provides all database query, update, and insert operations, which saved me more than half of my development time: fetching a list with five query conditions takes fewer than 10 lines of code. [Added: I have since made this database tool distributed, so Java application servers can be added for load balancing, with cache synchronization between servers. Even at a million users per day, this architecture can carry the load. I describe the architecture in detail below, for reference only. Many people have asked me to open-source it, but the system has not yet been proven by a large user base, so it is not the time to think about open source. My code does not use many advanced algorithms or design patterns (mainly the template pattern), but all of it was written with care.]
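A hypothetical sketch of the call pattern described above. The tool itself has not been released, so TManager, getList, and the condition-key format here are purely illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;

public class QueryDemo {
    // Stand-in for the 794-line database-operation class described above.
    static class TManager {
        private static final TManager INSTANCE = new TManager();
        static TManager getInstance() { return INSTANCE; }

        List<Object> getList(String conditionKey, int offset, int limit) {
            // Real version: consult the list cache under conditionKey first,
            // and only fall through to MySQL (via Hibernate) on a miss.
            return new ArrayList<Object>();
        }
    }

    public static void main(String[] args) {
        // Five query conditions encoded in one key (see Problem 1 below).
        String key = "a=1#b=2#c>0#d>=10#e=5";
        List<Object> rows = TManager.getInstance().getList(key, 0, 20);
        System.out.println(rows.size());
    }
}
```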
Problem 1: problems encountered during database caching.
HashMap throws ConcurrentModificationException under concurrent access, even if you wrap the map with Collections.synchronizedMap. The exception itself is simple, and so are the solutions. The first is to traverse a snapshot instead of the live map, e.g. call toArray on the set returned by Map.keySet(); this costs some performance and memory, but it is much better than marking whole methods synchronized. The second is to use ConcurrentHashMap, available since JDK 5.0.
[Correction: after testing and verification, the first method does not work. When a map is modified concurrently and also needs to be traversed, only ConcurrentHashMap will do. My thanks to the experts who wrote ConcurrentHashMap.]
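A minimal sketch of the ConcurrentHashMap approach, assuming a simple object cache like the one described (class and method names here are mine, not from the article):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// ConcurrentHashMap lets one thread traverse while others put/remove:
// its iterators are weakly consistent rather than fail-fast, so no
// ConcurrentModificationException is thrown.
public class ObjectCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<K, V>();

    public V get(K key) { return cache.get(key); }

    public void put(K key, V value) { cache.put(key, value); }

    // Safe to run while other threads are writing to the map.
    public void sweep() {
        for (Map.Entry<K, V> e : cache.entrySet()) {
            if (e.getValue() == null) {
                cache.remove(e.getKey()); // removal during iteration is allowed
            }
        }
    }
}
```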
[Added: there are many ways to cache database data. Caching a single object is easy and a HashMap is enough, but list caching is complicated. The memcachedb + memcache_engine or BerkeleyDB setups discussed online cannot invalidate list caches automatically, because inserting one record can affect the ordering of many lists, and deciding which cached lists to delete is a headache. My solution is to encode the query conditions into the cache key of each list. For example, a table t has fields a, b, c, and the corresponding T.java has fields a, b, c. The list for the condition a = 1 and b = 2 and c > 0 is cached under the key a=1#b=2#c>0. Now suppose an object t is inserted with t.a = 1, t.b = 3, t.c = 0. The key above contains the condition b = 2, while the new object has b = 3, so the new object cannot possibly belong to that list, and the cached list does not need to be deleted. Only when the inserted t satisfies t.a = 1, t.b = 2 and t.c > 0 does the list need to be re-fetched from the database. Updates and deletes of a t object are handled the same way; see the sketch below.]
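A minimal sketch of this list-cache idea, under my own naming. The matches logic is hard-coded to the a=1#b=2#c>0 example above; a real version would parse arbitrary condition keys:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ListCache {
    // One entry per condition key, e.g. "a=1#b=2#c>0" -> cached result list.
    private final Map<String, List<T>> lists = new ConcurrentHashMap<String, List<T>>();

    public List<T> get(String conditionKey) { return lists.get(conditionKey); }

    public void put(String conditionKey, List<T> list) { lists.put(conditionKey, list); }

    // Called after an INSERT: a cached list must be dropped (and re-queried
    // later) only if the new object satisfies every condition in its key.
    public void onInsert(T t) {
        for (String key : lists.keySet()) {
            if (matches(key, t)) {
                lists.remove(key);
            }
        }
    }

    // Sketch only: test "a=1#b=2#c>0" against the fields of t.
    private boolean matches(String conditionKey, T t) {
        for (String cond : conditionKey.split("#")) {
            if (cond.equals("a=1") && t.a != 1) return false;
            if (cond.equals("b=2") && t.b != 2) return false;
            if (cond.equals("c>0") && t.c <= 0) return false;
        }
        return true;
    }

    // Stand-in for the mapped class T.java from the example above.
    public static class T { public int a, b, c; }
}
```

Updates and deletes work the same way: re-check each cached list's condition key against the object's old and new field values.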
[Added: for the distributed setup, I use memcached as the remote cache and UDP packets to synchronize the caches between servers. To get an object, for example: first look in the local cache; if it is not there, look in the remote memcached server; if it is not there either, read it from the database and put it into both the memcached server and the local cache. This way, no matter how many servers are load-balancing, an object T is read from the database only once and shared by all servers afterwards, so there is almost no query pressure on the database. Of course, if inserts and updates reached several thousand per second I might bring in MySQL Proxy, but at hundreds of thousands of users per day that is not needed; a good database server is enough.]
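A minimal sketch of that get-object path, assuming a thin wrapper around whatever memcached client is in use. RemoteCache, Loader, and all names here are illustrative, and the UDP synchronization between local caches is omitted:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TwoLevelCache<V> {
    // Assumed wrapper around the memcached client; not a real client API.
    public interface RemoteCache<V> {
        V get(String key);
        void set(String key, V value);
    }

    // Assumed database-access callback.
    public interface Loader<V> {
        V loadFromDatabase(String key);
    }

    private final Map<String, V> local = new ConcurrentHashMap<String, V>();
    private final RemoteCache<V> remote;
    private final Loader<V> loader;

    public TwoLevelCache(RemoteCache<V> remote, Loader<V> loader) {
        this.remote = remote;
        this.loader = loader;
    }

    public V get(String key) {
        V v = local.get(key);                  // 1. local in-process cache
        if (v != null) return v;
        v = remote.get(key);                   // 2. shared memcached server
        if (v == null) {
            v = loader.loadFromDatabase(key);  // 3. database, hit only once
            if (v != null) remote.set(key, v); //    publish for other servers
        }
        if (v != null) local.put(key, v);
        return v;
    }
}
```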
[Not to brag, but my distributed cache solution is powerful and carefully written, even though it is only 2,874 lines of code. Example: to fetch the user with ID 6789, one line is enough: User u = UserManager.getInstance().getById("6789"); all the cache logic is hidden inside.]
Problem 2: JFreeChart cannot display Chinese characters on Linux
This problem did not take long to solve; a quick search online turns up the answer. The solution is as follows:
Download a TTF font for Linux from the Internet; this example uses zysong.ttf.
1. Confirm that zysong.ttf exists in the $JAVA_HOME/jre/lib/fonts directory.
2. In the $JAVA_HOME/jre/lib/fonts directory, run "ttmkfdir -o fonts.dir" to regenerate the fonts.dir file.
3. Confirm that the /usr/share/fonts/zh_CN/TrueType directory exists; if not, create it with mkdir.
4. Confirm that zysong.ttf exists in the /usr/share/fonts/zh_CN/TrueType directory.
5. In the $JAVA_HOME/jre/lib directory, run cp fontconfig.RedHat.3.properties.src fontconfig.properties
6. Restart Resin, and you're done.
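If the chart still renders Chinese as empty squares, a belt-and-braces option is to set a CJK-capable font on the chart objects explicitly. A sketch using JFreeChart's standard font setters and the zysong font installed above (the family name your system actually registers may differ):

```java
import java.awt.Font;
import org.jfree.chart.ChartFactory;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.CategoryPlot;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.data.category.DefaultCategoryDataset;

public class ChartFontDemo {
    public static void main(String[] args) {
        DefaultCategoryDataset data = new DefaultCategoryDataset();
        data.addValue(42, "系列", "类别");
        JFreeChart chart = ChartFactory.createBarChart(
                "标题", "类别", "数值", data,
                PlotOrientation.VERTICAL, true, false, false);

        // Use the installed CJK font everywhere Chinese text appears.
        Font cjk = new Font("zysong", Font.PLAIN, 12);
        chart.getTitle().setFont(cjk.deriveFont(Font.BOLD, 16f));
        CategoryPlot plot = chart.getCategoryPlot();
        plot.getDomainAxis().setLabelFont(cjk);
        plot.getDomainAxis().setTickLabelFont(cjk);
        plot.getRangeAxis().setLabelFont(cjk);
        plot.getRangeAxis().setTickLabelFont(cjk);
    }
}
```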
Problem 3: "Too many open files" error in Linux
This problem is serious. The default open-file limit on AS4 is 1024; once it is exceeded, Resin simply goes down, which is infuriating. The solution is as follows:
echo 65536 > /proc/sys/fs/file-max
Edit /etc/sysctl.conf and add (or update) the line fs.file-max = 65536.
Edit /etc/security/limits.conf and add the line * - nofile 65536.
Run ulimit -a to check; if you see the line open files (-n) 65536, it worked.
Problem 4: Memory leakage
The symptom: CPU pinned at 99.9%, and memory climbing from 10% to 50% within a few hours. Java developers know how painful this problem is; there is no good general solution, because you can rarely see the leak in the code directly. I kept assuming the problem was in my cache, although it uses Apache Commons Collections' LRUMap to avoid leaks, and I had set a maximum length on every LRUMap. Eventually I found the culprit was SmartUpload, and replaced it with the Apache Commons FileUpload subproject. I also added the JVM options -Xmx2048m -Xms2048m -Xmn768m -Xss512k -XX:+UseParallelGC -XX:ParallelGCThreads=4 -XX:+UseParallelOldGC -XX:+UseAdaptiveSizePolicy so that the old generation gets collected. Memory usage is now fairly stable, generally not rising past about 27%. These parameters are not optimal and still need exploring. I also ran the JProbe profiler to hunt for leaks, and it confirmed that the rest of the code has no obvious leaks. [Correction: the parameter configuration above turned out to be useless, and memory kept rising anyway. I tried other ways of forcing collection, with poor results. It works best to just use the JVM's default collector, i.e. configure only two parameters, -Xmx768m -Xms768m; with that, Java really does reclaim several hundred megabytes at a time.]
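For reference, the replacement upload path looks roughly like this with Commons FileUpload. A sketch only: the paths, limits, and the generic parseRequest signature assume FileUpload 1.3+, not anything stated in the article:

```java
import java.io.File;
import java.util.List;
import javax.servlet.http.HttpServletRequest;
import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

public class UploadHandler {
    public void handle(HttpServletRequest request) throws Exception {
        DiskFileItemFactory factory = new DiskFileItemFactory();
        factory.setSizeThreshold(256 * 1024);     // keep only small files in memory
        factory.setRepository(new File("/tmp"));  // spill big uploads to disk
                                                  // instead of holding them on the heap

        ServletFileUpload upload = new ServletFileUpload(factory);
        upload.setSizeMax(10 * 1024 * 1024);      // hard cap per request

        List<FileItem> items = upload.parseRequest(request);
        for (FileItem item : items) {
            if (!item.isFormField()) {
                item.write(new File("/data/upload", item.getName()));
                item.delete();                    // free temp storage promptly
            }
        }
    }
}
```

One likely reason the switch helped: large uploads are streamed to disk rather than accumulating on the heap.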
Problem 5: Search word segmentation.
A user reported that searching for "酒" (liquor) could not find "啤酒" (beer) or "茅台酒" (Moutai liquor). The reason is simple: "啤酒" and "茅台酒" are segmented as whole words, and Lucene does not split them further at index time, so only a search for the whole word "啤酒" or "茅台酒" finds them. Technically this is reasonable, but users think it is unreasonable. So I improved the search algorithm: I added more than 30,000 entries to the dictionary, and I use different segmentation algorithms for indexing and for searching. For example, at index time the sentence "我喜欢喝啤酒" ("I like drinking beer") is segmented into the overlapping tokens "我+喜欢+喝+啤酒+喜+欢+啤+酒", while at search time the same sentence is segmented only into "我+喜欢+喝+啤酒". This way, searching for either "啤酒" (beer) or "酒" (liquor) finds the document; and for another sentence that mentions drinking liquor but never the word "啤酒", a search for "酒" finds it while a search for "啤酒" does not, which seems about right. Still, this kind of segmentation has a small problem: the search results are not as friendly as they could be. (My Chinese dictionary, idioms included, has more than 500,000 entries in total, far richer than the 100,000-200,000-entry dictionaries circulating on the Internet, though the extra richness turned out to be of little use.)
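A minimal sketch of this two-analyzer scheme against the Lucene 2.x API of that era. FineGrainAnalyzer and CoarseGrainAnalyzer stand in for the article's custom segmenters and are assumptions, not real classes:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SegmentedSearch {

    // Index time: fine-grained, overlapping tokens, e.g.
    // "我喜欢喝啤酒" -> 我 + 喜欢 + 喝 + 啤酒 + 喜 + 欢 + 啤 + 酒
    public static void index(String indexDir, Analyzer fineGrain, String text)
            throws Exception {
        IndexWriter writer = new IndexWriter(indexDir, fineGrain, true);
        Document doc = new Document();
        doc.add(new Field("content", text, Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.close();
    }

    // Search time: coarse tokens only, e.g. "我喜欢喝啤酒" -> 我 + 喜欢 + 喝 + 啤酒,
    // so a query for 啤酒, or for the single character 酒, still hits the index above.
    public static Hits search(String indexDir, Analyzer coarseGrain, String input)
            throws Exception {
        QueryParser parser = new QueryParser("content", coarseGrain);
        Query query = parser.parse(input);
        return new IndexSearcher(indexDir).search(query);
    }
}
```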