How to save storage space by saving a hexadecimal string as a binary array

Source: Internet
Author: User

Consider this scenario: the MD5 values of all files on the client machines must be stored in a database on the server side, and the server periodically checks each client's files for illegal ones. (Note: using MD5 for this check does not imply that every file's MD5 is unique; see the blog post "Different files can also have the same MD5 checksum value".) To speed up the check, we want to load every MD5 value from the server database into a cache. The problem is that the cache machine does not have enough memory to hold all the MD5 strings: only about 75% of the data fits. For other reasons, neither the cache memory nor the number of machines can be increased, so this single machine is all we have.

The usual approach in this situation is to load part of the data into the cache and, when a client sends an MD5 string that is not in the cache, to look it up in the database. This does work: at least 75% of the data can be served from the cache, and the other 25% is queried from the database. But the server stores a very large number of MD5 values, say tens of millions. Even if only 25% of the lookups go to the database, each one queries a table with tens of millions of rows, which costs a lot of time even with an index. Some readers may think of splitting the table or adding more cache machines, but as stated above there is only this one cache machine. Is the problem unsolvable?
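The cache-then-database lookup described above can be sketched roughly as follows. This is only an illustration: the class name `PartialCacheLookup`, the `databaseLookup` predicate (standing in for the indexed database query), and the query counter are all hypothetical and do not come from the original article.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch: all names here are illustrative assumptions.
final class PartialCacheLookup {
    private final Set<String> cachedMd5 = new HashSet<>();
    private final Predicate<String> databaseLookup; // stands in for the indexed DB query
    private int databaseQueries = 0;

    PartialCacheLookup(Set<String> cachedSubset, Predicate<String> databaseLookup) {
        this.cachedMd5.addAll(cachedSubset);
        this.databaseLookup = databaseLookup;
    }

    /** Returns true if the MD5 string is known, checking the cache first. */
    boolean isKnown(String md5Hex) {
        if (cachedMd5.contains(md5Hex)) {
            return true;                    // cache hit: no database round trip
        }
        databaseQueries++;                  // cache miss: fall back to the slow path
        return databaseLookup.test(md5Hex);
    }

    /** Number of lookups that had to go to the database so far. */
    int databaseQueries() {
        return databaseQueries;
    }
}
```

Note that in this sketch every value that is absent from the cache, including values that are not in the database at all, costs one database query. That is the cost the rest of the article tries to avoid.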

There is a solution. First we need to understand what an MD5 value is made of: its text form uses only the characters 0-f, that is {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'}, which are exactly the digits of the hexadecimal system. Each hexadecimal digit needs 4 bits, and a byte in Java has 8 bits, so one byte can hold two hexadecimal digits and a 32-character MD5 string fits in 16 bytes. This saves about half of the storage space (the actual saving is less than half and the comparison is not quite this simple, but this is the intuitive picture; a comparison follows later), because storing the 32-character MD5 directly as a string takes at least 32 bytes. The following Java code converts a hex string into a byte array and a byte array back into a hex string (note: the implementation is excerpted from the blog http://franksinger.iteye.com/blog/614540):

/** Convert byte[] to a hex string, formatting each byte with Integer.toHexString(int). */
public static String bytesToHexString(byte[] src) {
    if (src == null || src.length <= 0) {
        return null;
    }
    StringBuilder stringBuilder = new StringBuilder();
    for (int i = 0; i < src.length; i++) {
        int v = src[i] & 0xFF;              // mask to 0..255 to avoid sign extension
        String hv = Integer.toHexString(v);
        if (hv.length() < 2) {
            stringBuilder.append(0);        // left-pad single-digit values
        }
        stringBuilder.append(hv);
    }
    return stringBuilder.toString();
}

/** Convert a hex string to byte[]. */
public static byte[] hexStringToBytes(String hexString) {
    if (hexString == null || hexString.equals("")) {
        return null;
    }
    hexString = hexString.toUpperCase();
    int length = hexString.length() / 2;
    char[] hexChars = hexString.toCharArray();
    byte[] d = new byte[length];
    for (int i = 0; i < length; i++) {
        int pos = i * 2;
        d[i] = (byte) (charToByte(hexChars[pos]) << 4 | charToByte(hexChars[pos + 1]));
    }
    return d;
}

/** Convert one hex character (0-9, A-F) to its 4-bit value. */
private static byte charToByte(char c) {
    return (byte) "0123456789ABCDEF".indexOf(c);
}
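To make the 32-characters-versus-16-bytes point concrete, here is a minimal sketch that computes an MD5 digest with the JDK's `java.security.MessageDigest` and round-trips it through a hex string. The class `Md5Compact` and its helper methods are illustrative names, not part of the excerpted code, and the helpers use `Character.forDigit` and `Character.digit` instead of the lookup-string approach shown above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch: class and method names are assumptions.
final class Md5Compact {
    /** Returns the raw 16-byte MD5 digest of the given text (UTF-8 encoded). */
    static byte[] md5Bytes(String text) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Hex-encodes bytes, two lowercase hex digits per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16)); // high 4 bits
            sb.append(Character.forDigit(b & 0xF, 16));        // low 4 bits
        }
        return sb.toString();
    }

    /** Decodes a well-formed, even-length hex string back into bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = md5Bytes("hello");
        String hex = toHex(raw);
        System.out.println(hex + ": " + hex.length() + " chars as text, "
                + raw.length + " bytes as binary");
    }
}
```

The sketch does no input validation: `fromHex` assumes its argument contains only hex digits and has even length.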

OK, the capacity problem is now solved: all the data in the database can be read into memory. Applause! To respond quickly to client queries, we put the MD5 values read from the database (now byte arrays) into a HashSet or HashMap. But the following code does not behave as expected:

Set<byte[]> hashSet = new HashSet<byte[]>();
byte[] k1 = new byte[] {1, 2, 3};
byte[] k2 = new byte[] {1, 2, 3};
hashSet.add(k1);
Assert.assertTrue(hashSet.contains(k2)); // fails: k2 is a different object than k1


The assertTrue() call fails with an AssertionError, which means the HashSet does not contain a key equal to k2. Arrays are compared by reference, not by value, and k1 and k2 are two different objects, so the HashSet, which holds only k1, cannot find k2. The same problem has been raised on Stack Overflow; if interested, see http://stackoverflow.com/questions/1058149/using-a-byte-array-as-hashmap-key-java.
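The difference between reference comparison and content comparison can be seen in isolation with a small sketch. The class `ArrayEqualityDemo` and its method names are hypothetical; `java.util.Arrays.equals` and `java.util.Arrays.hashCode` are the standard JDK helpers for content-based comparison and hashing.

```java
import java.util.Arrays;

// Illustrative sketch: class and method names are assumptions.
final class ArrayEqualityDemo {
    /** What HashSet effectively uses for byte[]: Object.equals, i.e. identity. */
    static boolean equalByReference(byte[] a, byte[] b) {
        return a.equals(b);
    }

    /** Content-based comparison, together with a content-based hash. */
    static boolean equalByContent(byte[] a, byte[] b) {
        return Arrays.equals(a, b) && Arrays.hashCode(a) == Arrays.hashCode(b);
    }

    public static void main(String[] args) {
        byte[] k1 = {1, 2, 3};
        byte[] k2 = {1, 2, 3};
        System.out.println("by reference: " + equalByReference(k1, k2));
        System.out.println("by content:   " + equalByContent(k1, k2));
    }
}
```

A hash-based collection needs both `equals` and `hashCode` to be content-based, and an array provides neither, which is why a wrapper is needed.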

To use a byte array as a key, we need to wrap it. There are two wrapping schemes: the first uses the java.nio.ByteBuffer class that Java provides, and the second uses a wrapper class of our own. Both implementations are described, followed by a comparison of which one saves more memory.

1. Wrapping with java.nio.ByteBuffer

The following is a HashSet subclass that wraps the add, remove, and contains methods:

import java.nio.ByteBuffer;
import java.util.HashSet;

public class ByteKeyHashSet extends HashSet<ByteBuffer> {
    private static final long serialVersionUID = -2702041216392736060L;

    public boolean add(byte[] key) {
        return super.add(ByteBuffer.wrap(key));
    }

    public boolean add(String key) {
        return super.add(ByteBuffer.wrap(key.getBytes()));
    }

    public boolean remove(byte[] key) {
        return super.remove(ByteBuffer.wrap(key));
    }

    public boolean remove(String key) {
        return super.remove(ByteBuffer.wrap(key.getBytes()));
    }

    public boolean contains(byte[] key) {
        return super.contains(ByteBuffer.wrap(key));
    }

    public boolean contains(String key) {
        return super.contains(ByteBuffer.wrap(key.getBytes()));
    }
}
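Wrapping works because ByteBuffer defines equals and hashCode over the buffer's remaining content rather than over object identity. The sketch below illustrates that, along with one caveat worth keeping in mind: ByteBuffer.wrap does not copy the array, so changing the array after it has been added changes the stored key. The class `ByteBufferKeyDemo` and its methods are hypothetical names used only for this illustration.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: class and method names are assumptions.
final class ByteBufferKeyDemo {
    /** True if two arrays, once wrapped, are equal and hash identically. */
    static boolean wrappedEqual(byte[] a, byte[] b) {
        ByteBuffer x = ByteBuffer.wrap(a);
        ByteBuffer y = ByteBuffer.wrap(b);
        return x.equals(y) && x.hashCode() == y.hashCode();
    }

    /** Shows that mutating the backing array after add() breaks the lookup. */
    static boolean foundAfterMutation() {
        byte[] key = {1, 2, 3};
        Set<ByteBuffer> set = new HashSet<>();
        set.add(ByteBuffer.wrap(key));   // the buffer shares the array, no copy is made
        key[0] = 9;                      // the stored key's content is now {9, 2, 3}
        return set.contains(ByteBuffer.wrap(new byte[] {1, 2, 3}));
    }
}
```

In practice this means an array should be treated as read-only once it has been added to the set.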

Let's run the following unit test:

import junit.framework.TestCase;
import org.junit.Assert;
import org.junit.Test;

public class ByteKeyHashMapTest extends TestCase {
    @Test
    public void test() {
        ByteKeyHashSet byteKeyHashSet = new ByteKeyHashSet();
        byte[] k1 = new byte[] {1, 2, 3};
        byte[] k2 = new byte[] {1, 2, 3};
        byteKeyHashSet.add(k1);
        Assert.assertTrue(byteKeyHashSet.contains(k2));
        byteKeyHashSet.remove(k1);
        Assert.assertFalse(byteKeyHashSet.contains(k2));
    }
}

OK, no problem: a byte array can now be stored as a hash key.
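The article's own implementation of the second scheme (a self-written wrapper class) is not shown in this section. Purely as a sketch of what such a wrapper might look like, and not as the author's version, the following hypothetical `ByteArrayKey` class delegates equals and hashCode to `java.util.Arrays`. It copies the array on construction, which is a design assumption made here for safety and costs one extra allocation per key.

```java
import java.util.Arrays;

// Hypothetical wrapper: not the article's implementation, only an illustration.
final class ByteArrayKey {
    private final byte[] data;

    ByteArrayKey(byte[] data) {
        this.data = data.clone();   // defensive copy so later changes cannot alter the key
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ByteArrayKey
                && Arrays.equals(data, ((ByteArrayKey) other).data);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(data);   // content-based, consistent with equals
    }
}
```

A `HashSet<ByteArrayKey>` can then be used in the same way as the ByteBuffer-based set, with `new ByteArrayKey(bytes)` taking the place of `ByteBuffer.wrap(bytes)`.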
