A HashMap-like class that uses 1/6 of the memory of the original implementation

Last Update:2018-12-07 Source: Internet

Author: User

Java's Map is convenient and practical, but it also wastes a great deal of memory. Once a map holds 10 million entries or more, it needs several GB of memory. The map discussed here is a HashMap<String, Byte>. Keys average about 20 characters and never exceed 200 characters.

In practice, about 5/6 of that memory goes to overhead unrelated to the stored data. For data that is written once and read many times, there is no need to waste so many resources. The following is a description of a simple implementation.

Algorithm Description:

Put each entry into a queue chosen by the byte length of its key, and sort every queue by key. For convenience, the entries can be sorted before they are loaded into memory.

This produces many queues. To look up a key, select the queue that matches the key's length and run a binary search within it.
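The grouping and sorting steps described above can be sketched as follows. This is a minimal illustration, not the article's own preprocessing code: the class name `LengthBuckets`, the method `build`, and the choice of UTF-8 are assumptions made for the example. Each record is the key bytes followed by one value byte, and records are ordered by signed byte comparison so that a binary search using the same comparison will find them.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: groups records by key length and sorts each group. */
final class LengthBuckets {

    /** Lexicographic order on signed bytes; the lookup must use the same order. */
    static final Comparator<byte[]> RECORD_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return a[i] - b[i];
            }
        }
        return a.length - b.length;
    };

    /** Returns key length -> sorted list of records (key bytes + 1 value byte). */
    static Map<Integer, List<byte[]>> build(Map<String, Byte> source) {
        Map<Integer, List<byte[]>> buckets = new HashMap<>();
        for (Map.Entry<String, Byte> e : source.entrySet()) {
            // Assumption: keys are encoded as UTF-8; lookups must encode the same way.
            byte[] key = e.getKey().getBytes(StandardCharsets.UTF_8);
            byte[] record = new byte[key.length + 1];
            System.arraycopy(key, 0, record, 0, key.length);
            record[key.length] = e.getValue();
            buckets.computeIfAbsent(key.length, k -> new ArrayList<>()).add(record);
        }
        // Keys in one bucket are distinct, so the trailing value byte never decides the order.
        for (List<byte[]> bucket : buckets.values()) {
            bucket.sort(RECORD_ORDER);
        }
        return buckets;
    }
}
```

Each sorted bucket can then be copied into one packed byte array, which is where the memory saving comes from: the per-entry objects disappear once the records are laid out back to back.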

Algorithm applicability:

It suits scenarios where key lengths fall within a limited range, the values are of a simple type (such as byte, int, long, float, or double), the data set exceeds one million records, and memory is insufficient.
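The applicability note mentions value types wider than a byte. One way this could look for int values is sketched below; the class `IntValueSortedMap` is hypothetical and is not part of the article's code. It widens each entry to the key bytes plus four value bytes and returns `null` for a missing key, which avoids reserving -1 as a "not found" marker.

```java
import java.nio.ByteBuffer;

/** Hypothetical variant: fixed-length byte keys mapped to 4-byte int values. */
final class IntValueSortedMap {
    private final byte[] entries;   // packed entries: key bytes, then a big-endian int
    private final int keyLength;
    private final int entryLength;
    private int count = 0;

    IntValueSortedMap(int capacity, int keyLength) {
        this.keyLength = keyLength;
        this.entryLength = keyLength + Integer.BYTES;
        this.entries = new byte[capacity * entryLength];
    }

    /** Appends an entry; callers must add keys in ascending signed-byte order. */
    void add(byte[] key, int value) {
        int offset = count * entryLength;
        System.arraycopy(key, 0, entries, offset, keyLength);
        ByteBuffer.wrap(entries, offset + keyLength, Integer.BYTES).putInt(value);
        count++;
    }

    /** Returns the stored value, or null when the key is absent. */
    Integer get(byte[] key) {
        if (key.length != keyLength) {
            return null;
        }
        int lo = 0;
        int hi = count - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int offset = mid * entryLength;
            int cmp = 0;
            for (int i = 0; i < keyLength && cmp == 0; i++) {
                cmp = entries[offset + i] - key[i];
            }
            if (cmp == 0) {
                return ByteBuffer.wrap(entries, offset + keyLength, Integer.BYTES).getInt();
            } else if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return null;
    }
}
```

The same layout extends to long, float, or double by changing the value width and the `ByteBuffer` accessor. The trade-off is that every entry in a queue pays the full value width, so the saving relative to a HashMap shrinks as values grow.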

import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * A map from fixed-length byte keys to a single byte value.
 * Entries are packed back to back in one byte array and found by binary search.
 */
public class BSortMap {

    // One BSortMap per key length (in bytes).
    private static final Map<Integer, BSortMap> bulk = new HashMap<Integer, BSortMap>();

    private final byte[] entrys;   // entry array: key bytes followed by one value byte
    private int count = 0;
    private final int keyLength;   // the size of the key in bytes
    private final int entryLength; // number of bytes for each entry

    public BSortMap(int capacity, int keyLen) {
        this.keyLength = keyLen;
        this.entryLength = keyLen + 1;
        this.entrys = new byte[capacity * entryLength];
    }

    public int size() {
        return count;
    }

    /**
     * Add a record entry. Entries must be added in sorted key order.
     * The src format is <key, value>.
     * @param src the key bytes followed by the value byte
     */
    public final void add(byte[] src) {
        System.arraycopy(src, 0, entrys, count * entryLength, entryLength);
        count++;
    }

    /** Compares the key stored at offset begin with b, byte by byte (signed). */
    private int compare(final int begin, final byte[] b) {
        for (int i = 0; i < keyLength; i++) {
            int diff = entrys[begin + i] - b[i];
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    /**
     * Obtain the value associated with the key.
     * @param key the key bytes; the length must equal keyLength
     * @return the value, or -1 if no value is associated with the key
     */
    public final byte get(final byte[] key) {
        int i = 0;
        int j = count - 1;
        while (i <= j) {
            int mid = (i + j) >>> 1;
            final int ret = compare(mid * entryLength, key);
            if (ret == 0) {
                return entrys[mid * entryLength + keyLength]; // return the result
            } else if (ret < 0) {
                i = mid + 1;
            } else {
                j = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File("D:/workspace/partion_keyword/sort");
        File[] files = dir.listFiles();

        for (File f : files) {
            /*
             * f is a file of entries sorted by key.
             * The file format is:
             *   entrysCount: an int, the total number of entries.
             *   keyLength:   an int, the byte length of each key.
             *   <key, value> list
             */
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(f)))) {
                final int entrysCount = in.readInt();
                final int keyLength = in.readInt();
                byte[] buffer = new byte[keyLength + 1];
                BSortMap bst = new BSortMap(entrysCount, keyLength);
                int i = 0;
                while (in.available() > 0) {
                    // readFully fills the whole buffer or throws EOFException;
                    // read() may return fewer bytes than requested.
                    in.readFully(buffer);
                    bst.add(buffer);
                    i++;
                }
                bulk.put(keyLength, bst);
                if (entrysCount != i) {
                    System.err.println(f.getName() + ":" + entrysCount + "," + i);
                }
            }
        }

        BufferedReader read = new BufferedReader(
                new FileReader("D:/Eclipse/workspace/CONF/wiki_kws.data"));
        String line;
        int count = 0;
        long start = System.currentTimeMillis();
        while ((line = read.readLine()) != null) {
            byte[] key = line.trim().getBytes();
            BSortMap bt = bulk.get(key.length);
            if (bt != null && bt.get(key) != -1) {
                count++;
            }
        }
        long end = System.currentTimeMillis();
        System.out.println(end - start);
        System.out.println("count:" + count);

        int totalCount = 0;
        for (BSortMap s : bulk.values()) {
            totalCount += s.size();
        }
        System.out.println("Total count:" + totalCount);
        read.close();

        /* while (true) {
            Scanner in = new Scanner(System.in);
            String key = in.next().trim();
            if (key.equalsIgnoreCase("exit"))
                break;
            int len = key.trim().getBytes().length;
            BSortMap bt = bulk.get(len);
            byte v = bt.get(key.getBytes());
            System.err.println(key + "=" + v);
        } */

    }
}
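The loader in `main` expects one pre-sorted file per key length, in the format given in its comment: an int entry count, an int key length, then the packed <key, value> records. The article does not show how those files were produced, so the following writer is only a sketch under that stated format. The class name `SortedFileWriter` and its `write` method are assumptions for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;

/** Hypothetical writer for the file format that the loader reads. */
final class SortedFileWriter {

    /**
     * Writes records that are already sorted by key.
     * Each record must hold keyLength key bytes followed by one value byte.
     */
    static void write(File target, int keyLength, List<byte[]> sortedRecords)
            throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(target)))) {
            out.writeInt(sortedRecords.size()); // entry count
            out.writeInt(keyLength);            // byte length of every key
            for (byte[] record : sortedRecords) {
                if (record.length != keyLength + 1) {
                    throw new IllegalArgumentException(
                            "record length " + record.length + " != " + (keyLength + 1));
                }
                out.write(record);
            }
        }
    }
}
```

`DataOutputStream.writeInt` and `DataInputStream.readInt` both use big-endian byte order, so a file written this way matches what the loader reads. The resulting file size is 8 bytes of header plus `(keyLength + 1)` bytes per record.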
Test results:

The test data consists of 7,138,595 entries distributed across 250 queues.

Result of HashMap<String, Byte>:

Time: 3780 ms, 3814 ms

Memory: 892 MB

Result of this method:

Time: 5792 ms, 5758 ms

Memory: 158 MB
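To put the reported figures in per-entry terms, the small calculation below divides each memory total by the entry count. It is only arithmetic on the numbers above. The class name `MemoryPerEntry` is made up for the example, and treating 1 MB as 1024 * 1024 bytes is an assumption, since the article does not say which unit was used.

```java
/** Hypothetical helper: converts the reported totals into bytes per entry. */
final class MemoryPerEntry {

    /** Assumes 1 MB = 1024 * 1024 bytes. */
    static double bytesPerEntry(long megabytes, long entries) {
        return megabytes * 1024.0 * 1024.0 / entries;
    }

    public static void main(String[] args) {
        long entries = 7_138_595L; // entry count reported in the test results
        System.out.printf("HashMap:     %.1f bytes/entry%n", bytesPerEntry(892, entries));
        System.out.printf("This method: %.1f bytes/entry%n", bytesPerEntry(158, entries));
        System.out.printf("Ratio:       %.2f%n", 892.0 / 158.0);
    }
}
```

Under that assumption the totals work out to roughly 131 bytes per entry for the HashMap and roughly 23 bytes per entry for this method, a ratio of about 5.6, which is in line with the "1/6" in the title. The second figure is close to what the packed layout predicts for keys averaging about 20 bytes plus one value byte.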

This article is from Development Institute (http://edu.codepub.com). Source: http://edu.codepub.com/2009/1028/16973_2.php

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
