The Big Data Bitmap Method in Java (Sorting Without Duplicates, Sorting With Duplicates, Deduplication, Data Compression)

Source: Internet
Author: User
Tags: bitset, comparable, repetition


Introduction to the bitmap method

The basic idea of a bitmap is to use a single bit to mark the state of one data item, typically whether it is present or absent. Because each item costs only one bit, this saves a lot of space. For example, an int in Java takes up 32 bits; if a number can instead be represented by one bit, the storage needed can shrink by up to a factor of 32. This approach is generally called the bitmap method.

The bitmap method suits existence checks, where each element has only a few possible states and the number of elements is large. How does it work? Put simply: given 250 million integers, maintain a bit string long enough to cover the largest integer value, and set the bit at the position of each integer that is present to 1. For example, for the integers {2, 4, 5, 6, 67, 5}, maintain a bit string 00...0000 covering positions 0 through 67 (68 bits) and set bits 2, 4, 5, 6, and 67; the repeated 5 sets a bit that is already 1. If the maximum value is not known in advance, the bit string needs 2^32 bits, because an int is 4 bytes (32 bits) and can therefore take 2^32 distinct values. That is 512 MB of memory: memory is counted in bytes, so the size in megabytes is the number of bits divided by (8 * 2^20). That is the essence of the bitmap method.
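The marking step and the memory arithmetic above can be sketched in a few lines. This is only a minimal illustration, not the article's own code: the class name SimpleBitmap and its methods are hypothetical, and it assumes all values are non-negative and no larger than the maximum passed to the constructor.

```java
// Hypothetical minimal bitmap: one bit per possible value, packed into longs.
class SimpleBitmap {
    private final long[] words; // each long holds 64 presence flags

    /** Creates a bitmap able to mark values 0 .. maxValue inclusive. */
    SimpleBitmap(int maxValue) {
        words = new long[(maxValue >>> 6) + 1];
    }

    /** Sets the bit at position value (value / 64 picks the word, value % 64 the bit). */
    void set(int value) {
        words[value >>> 6] |= 1L << (value & 63);
    }

    /** Returns true if the bit at position value is set. */
    boolean contains(int value) {
        return (words[value >>> 6] & (1L << (value & 63))) != 0;
    }

    /** Bytes needed for one bit per value when there are valueCount possible values. */
    static long bytesNeeded(long valueCount) {
        return (valueCount + 7) / 8;
    }

    public static void main(String[] args) {
        SimpleBitmap bitmap = new SimpleBitmap(67);
        for (int v : new int[] {2, 4, 5, 6, 67, 5}) {
            bitmap.set(v); // the second 5 sets a bit that is already 1
        }
        System.out.println(bitmap.contains(67)); // true
        System.out.println(bitmap.contains(3));  // false
        // 2^32 bits expressed in megabytes
        System.out.println(bytesNeeded(1L << 32) / (1L << 20)); // 512
    }
}
```

Packing the flags into a long array mirrors how bit vectors are usually laid out; a byte or int array would work the same way with different shift amounts.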

BitSet

Because bitmaps are so space-efficient, many languages support them directly. The C++ standard library has a bitset container, and Java has the BitSet class in the java.util package. This class implements a bit vector that grows on demand. Each bit of a BitSet has a boolean value. The bits are indexed by non-negative integers, and each indexed bit can be tested, set, or cleared. One BitSet can be used to modify the contents of another BitSet through the logical AND, logical OR, and logical XOR operations.
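The operations just described map onto java.util.BitSet methods as in the short sketch below. The class name BitSetBasics and the helper of(...) are hypothetical and exist only for this illustration; the BitSet calls themselves (set, get, clear, and, or, xor, clone) are standard.

```java
import java.util.BitSet;

class BitSetBasics {
    /** Hypothetical helper: builds a BitSet with the given bit indexes set. */
    static BitSet of(int... indexes) {
        BitSet bits = new BitSet();
        for (int i : indexes) {
            bits.set(i);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet a = of(1, 3, 5);
        BitSet b = of(3, 4, 5);

        System.out.println(a.get(3)); // true: bit 3 is set
        a.clear(1);                   // clear bit 1; a is now {3, 5}
        System.out.println(a.get(1)); // false

        // and/or/xor modify the receiver in place, so work on copies of a.
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        BitSet xor = (BitSet) a.clone();
        xor.xor(b);

        System.out.println(and); // {3, 5}
        System.out.println(or);  // {3, 4, 5}
        System.out.println(xor); // {4}
    }
}
```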

Note that BitSet stores its data in a long array, so its capacity grows in units of one long, that is, 64 bits at a time. Unless you have extreme storage requirements and are very confident in your fundamentals, writing your own BitSet-like class is not advisable: the JDK classes are already streamlined and reasonably optimized, and the BitSet source itself is fairly long.
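The 64-bit growth unit can be observed through size(), which reports the bits of space currently allocated, as opposed to length(), which reports the highest set bit plus one. The sketch below is illustrative only: the class and method names are hypothetical, and the exact capacity after growth depends on the JDK implementation, so only the multiple-of-64 property is relied on.

```java
import java.util.BitSet;

class BitSetGrowth {
    /** Hypothetical helper: allocated capacity in bits after setting one bit. */
    static int capacityAfterSetting(int bitIndex) {
        BitSet bits = new BitSet(1); // ask for room for a single bit
        bits.set(bitIndex);          // the backing long array grows if needed
        return bits.size();
    }

    public static void main(String[] args) {
        int capacity = capacityAfterSetting(200);
        System.out.println(capacity % 64 == 0); // true: whole longs only
        System.out.println(capacity >= 201);    // true: bit 200 must fit

        BitSet bits = new BitSet();
        bits.set(200);
        System.out.println(bits.length());      // 201
    }
}
```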

Sorting without duplicates

The sorting algorithms used by the JDK container classes are mainly insertion sort and merge sort. Implementations may differ between versions; the key code is as follows:

    // SIMPLE_LENGTH and find(...) are members of the same class and are
    // not shown in this excerpt.

    /**
     * Performs a sort on the section of the array between the given indices
     * using a mergesort with exponential search algorithm (in which the merge
     * is performed by exponential search). n*log(n) performance is guaranteed
     * and in the average case it will be faster than any mergesort in which the
     * merge is performed by linear search.
     *
     * @param in
     *            the array for sorting.
     * @param out
     *            the result, sorted array.
     * @param start
     *            the start index
     * @param end
     *            the end index + 1
     */
    @SuppressWarnings("unchecked")
    private static void mergeSort(Object[] in, Object[] out, int start,
            int end) {
        int len = end - start;
        // use insertion sort for small arrays
        if (len <= SIMPLE_LENGTH) {
            for (int i = start + 1; i < end; i++) {
                Comparable<Object> current = (Comparable<Object>) out[i];
                Object prev = out[i - 1];
                if (current.compareTo(prev) < 0) {
                    int j = i;
                    do {
                        out[j--] = prev;
                    } while (j > start
                            && current.compareTo(prev = out[j - 1]) < 0);
                    out[j] = current;
                }
            }
            return;
        }
        int med = (end + start) >>> 1;
        mergeSort(out, in, start, med);
        mergeSort(out, in, med, end);

        // merging

        // if arrays are already sorted - no merge
        if (((Comparable<Object>) in[med - 1]).compareTo(in[med]) <= 0) {
            System.arraycopy(in, start, out, start, len);
            return;
        }
        int r = med, i = start;

        // use merging with exponential search
        do {
            Comparable<Object> fromVal = (Comparable<Object>) in[start];
            Comparable<Object> rVal = (Comparable<Object>) in[r];
            if (fromVal.compareTo(rVal) <= 0) {
                int l_1 = find(in, rVal, -1, start + 1, med - 1);
                int toCopy = l_1 - start + 1;
                System.arraycopy(in, start, out, i, toCopy);
                i += toCopy;
                out[i++] = rVal;
                r++;
                start = l_1 + 1;
            } else {
                int r_1 = find(in, fromVal, 0, r + 1, end - 1);
                int toCopy = r_1 - r + 1;
                System.arraycopy(in, r, out, i, toCopy);
                i += toCopy;
                out[i++] = fromVal;
                start++;
                r = r_1 + 1;
            }
        } while ((end - r) > 0 && (med - start) > 0);

        // copy rest of array
        if ((end - r) <= 0) {
            System.arraycopy(in, start, out, i, med - start);
        } else {
            System.arraycopy(in, r, out, i, end - r);
        }
    }

Now for how the bitmap method sorts. The idea was already outlined at the beginning; to make it easier to follow, it is best shown with an example. Suppose we have a sequence of integers in which no value repeats.
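The idea can be sketched as follows: set one bit per input value, then read the set bits back in index order, which yields the values in ascending order. This is a minimal sketch rather than the article's own implementation. The class name BitmapSort, the method sortDistinct, and the sample input are all assumptions made for illustration, and the input is assumed to contain only non-negative integers.

```java
import java.util.Arrays;
import java.util.BitSet;

class BitmapSort {
    /**
     * Sorts distinct non-negative ints by marking one bit per value and
     * then walking the set bits from lowest to highest index.
     */
    static int[] sortDistinct(int[] values) {
        BitSet bits = new BitSet();
        for (int v : values) {
            bits.set(v);
        }
        int[] sorted = new int[bits.cardinality()];
        int k = 0;
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            sorted[k++] = i;
            if (i == Integer.MAX_VALUE) {
                break; // avoid overflow of i + 1
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        int[] input = {6, 2, 67, 4, 5}; // hypothetical sample data
        System.out.println(Arrays.toString(sortDistinct(input)));
        // [2, 4, 5, 6, 67]
    }
}
```

The running time is proportional to the number of inputs plus the size of the value range, and no comparisons are made. If the input did contain repeated values, they would collapse into a single bit, so this sketch only preserves the data when the values are distinct.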
