The easiest way to find the median is to sort the sequence and take the middle element. However, loading all 0.5 billion numbers into memory takes nearly 2 GB (0.5 billion values at 4 bytes each).
One idea is to use external sorting and count the elements during the merge to find the median. First, hash() % 100 is used to split the data into 100 files. Each file is then sorted in memory. Finally, the 100 sorted files are merged, and the median is found during the merge by counting elements until the middle position is reached. The time complexity is O(n log n).
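The three passes above (scatter by hash, sort each file, merge while counting) can be sketched roughly as follows. This is only an illustration under assumptions: the function name external_median and the parameter num_files are made up for the example, the numbers are stored one per line in temporary text files, and for an even count the lower of the two middle elements is returned.

```python
import heapq
import os
import tempfile


def external_median(numbers, num_files=4):
    """Return the lower median of an iterable of ints via sorted part files."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part_{i}.txt") for i in range(num_files)]

        # Pass 1: scatter the numbers into num_files files by hash value.
        files = [open(p, "w") for p in paths]
        total = 0
        for x in numbers:
            files[hash(x) % num_files].write(f"{x}\n")
            total += 1
        for f in files:
            f.close()
        if total == 0:
            raise ValueError("median of empty input")

        # Pass 2: sort each small file in memory and write it back.
        for p in paths:
            with open(p) as f:
                chunk = sorted(int(line) for line in f)
            with open(p, "w") as f:
                f.writelines(f"{x}\n" for x in chunk)

        # Pass 3: k-way merge of the sorted files, counting as we go.
        target = (total - 1) // 2  # 0-based rank of the lower median
        streams = [open(p) for p in paths]
        try:
            merged = heapq.merge(*((int(line) for line in s) for s in streams))
            for rank, value in enumerate(merged):
                if rank == target:
                    return value
        finally:
            for s in streams:
                s.close()
```

The merge can stop as soon as the middle rank is reached, so only about half of the merged output is ever produced. With the real data set, num_files would be 100 as described above.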
Another method is to divide the data by value range into about 50 parts (0-9999999, 10000000-19999999, ...), save each part to a small file, and count the number of elements in each small file. Because the files are ordered relative to one another, the counts make it easy to determine which file contains the median and what rank the median has within that file. That small file is then processed in the same way. Once a file is small enough, the median can be found directly in memory. Finding the k-th smallest of n random numbers takes O(n) time, so the total time complexity is O(n).
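A minimal sketch of this counting idea follows, with some simplifications that are assumptions of the example rather than part of the method described: instead of writing 50 small files it reads the input twice (read_numbers is a hypothetical zero-argument callable that returns a fresh iterator each time), the values are assumed to be non-negative integers, and the selected bucket is simply sorted in memory where a linear-time selection or another round of bucketing could be used.

```python
def bucket_median(read_numbers, bucket_width=10_000_000):
    """Return the lower median using per-range counts and a second pass."""
    # Pass 1: count how many values fall into each range bucket.
    counts = {}
    total = 0
    for x in read_numbers():
        b = x // bucket_width
        counts[b] = counts.get(b, 0) + 1
        total += 1
    if total == 0:
        raise ValueError("median of empty input")

    # Walk the buckets in increasing order to find the one holding the median.
    target = (total - 1) // 2  # 0-based rank of the lower median
    seen = 0                   # number of values in all earlier buckets
    for b in sorted(counts):
        if seen + counts[b] > target:
            break
        seen += counts[b]

    # Pass 2: keep only the values of that bucket and pick the right rank.
    bucket = sorted(x for x in read_numbers() if x // bucket_width == b)
    return bucket[target - seen]
```

For example, bucket_median(lambda: iter([5, 3, 1, 4, 2]), bucket_width=2) counts 1, 2 and 2 values in the three buckets, finds that rank 2 falls in the middle bucket, and returns 3.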
Finding the elements that do not exist among 0.5 billion numbers
The idea is to divide the 0.5 billion numbers into 50 parts by value range (0-9999999, 10000000-19999999, ...) and store each part in its own file. Each file can then be handled independently: you only need to find the elements that do not exist in that file.
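One possible way to handle each part is a small bitmap covering that part's value range. The sketch below assumes the goal is to list every integer in [0, upper) that does not occur in the data. The names missing_values, upper and bucket_width are hypothetical, the values are assumed to lie in [0, upper), and in-memory lists stand in for the small files.

```python
def missing_values(numbers, upper, bucket_width=10_000_000):
    """Yield every integer in [0, upper) that does not occur in numbers."""
    # Step 1: split the data into range buckets (lists stand in for files).
    num_buckets = (upper + bucket_width - 1) // bucket_width
    buckets = [[] for _ in range(num_buckets)]
    for x in numbers:
        buckets[x // bucket_width].append(x)

    # Step 2: handle each bucket on its own with a bitmap over its range.
    for i, bucket in enumerate(buckets):
        lo = i * bucket_width
        hi = min(lo + bucket_width, upper)
        present = bytearray((hi - lo + 7) // 8)  # one bit per possible value
        for x in bucket:
            offset = x - lo
            present[offset >> 3] |= 1 << (offset & 7)
        for offset in range(hi - lo):
            if not present[offset >> 3] & (1 << (offset & 7)):
                yield lo + offset
```

With a bucket width of 10 million, each bitmap needs about 1.25 MB, so only one small part has to be held in memory at a time.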