Given 256 MB of memory, how would you sort a 10 GB file (one number per line)? How would you search within a 10 GB file? How would you count how many times each keyword appears in a 10 GB file?
Reply content:
Trade time for space.
Concretely, load the file in batches and process each batch in turn.
Java? Use NIO and borrow the idea of MapReduce.
I don't know PHP, but this question looks familiar.
Let's talk about ideas.
1. Sorting
This is a typical single-machine external sorting problem. The approach is to first split the file into chunks and sort each chunk, then do a multi-way merge of the sorted chunks into the output file.
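Here is a rough Python sketch of that chunk-then-merge idea (Python only because it is compact; the thread is not tied to one language). The names `external_sort` and `chunk_lines` are made up for illustration, and the chunk size is a guess that would need tuning to stay under 256 MB. It assumes every line holds exactly one integer and there are no blank lines.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(src_path, dst_path, chunk_lines=1_000_000):
    """Sort a file of one integer per line using bounded memory.

    chunk_lines is a hypothetical tuning knob: how many numbers are
    held in memory at once while building the sorted runs.
    """
    run_paths = []
    try:
        # Phase 1: read the input in chunks, sort each chunk in memory,
        # and spill it to its own temporary "run" file.
        with open(src_path) as src:
            while True:
                chunk = [int(line) for line in islice(src, chunk_lines)]
                if not chunk:
                    break
                chunk.sort()
                fd, path = tempfile.mkstemp(suffix=".run", text=True)
                with os.fdopen(fd, "w") as run:
                    run.writelines(f"{n}\n" for n in chunk)
                run_paths.append(path)

        # Phase 2: multi-way merge. heapq.merge is lazy, so only one
        # pending value per run is held in memory at a time.
        runs = [open(p) for p in run_paths]
        try:
            with open(dst_path, "w") as dst:
                merged = heapq.merge(*(map(int, r) for r in runs))
                dst.writelines(f"{n}\n" for n in merged)
        finally:
            for r in runs:
                r.close()
    finally:
        # Remove the temporary run files whether or not the merge succeeded.
        for p in run_paths:
            os.remove(p)
```

Note that this opens every run file at once during the merge. With a very large number of runs that can hit the operating system's open-file limit, in which case the merge would have to be done in several passes.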
2. Find
If the file cannot be preprocessed, the only option is to scan it from start to finish.
If the file can be preprocessed, then once it has been sorted as described above, you can use binary search.
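A sketch of what binary search over a sorted file might look like in Python, working on byte offsets so the file never has to be loaded into memory. The helper names (`first_line_from`, `lower_bound`, `contains`) are hypothetical, and the code assumes the file is sorted ascending with one integer per line and no blank lines.

```python
import os


def first_line_from(f, pos):
    """Return (offset, line) for the first line that starts at byte offset >= pos."""
    if pos > 0:
        f.seek(pos - 1)
        f.readline()  # consume the rest of the line containing byte pos - 1
    else:
        f.seek(0)
    start = f.tell()
    return start, f.readline()  # line is b"" at end of file


def lower_bound(f, size, target):
    """Return (offset, line) of the first line whose value is >= target."""
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        start, line = first_line_from(f, mid)
        if line and int(line) < target:
            lo = start + len(line)  # the answer lies after this line
        else:
            hi = mid                # this line (or an earlier one) is the answer
    return first_line_from(f, lo)


def contains(path, target):
    """True if target occurs in the sorted file at path."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:  # binary mode so seek/tell are plain byte offsets
        _, line = lower_bound(f, size, target)
        return bool(line) and int(line) == target
```

Each probe costs one seek and at most two short reads, so a lookup touches only on the order of log2(file size) positions instead of the whole 10 GB.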
3. Statistics
If the file cannot be preprocessed, there is no shortcut: you have to scan it once from start to finish.
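For the single-pass case, something like the following Python sketch could work. It assumes keywords are separated by whitespace and, importantly, that the set of distinct keywords is small enough to fit in memory even though the file itself is not. `count_keywords` is a made-up name.

```python
from collections import Counter


def count_keywords(path):
    """Count every whitespace-separated keyword in one streaming pass."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:  # the file is read line by line, never all at once
            counts.update(line.split())
    return counts
```

If the distinct keywords themselves do not fit in memory, this sketch is not enough; the file would first have to be split into smaller pieces, which is where the batching idea mentioned earlier in the thread comes in.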
If the file is already sorted, you can binary search for the value directly, then scan outward in both directions from the position found to count the occurrences.
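To illustrate counting in sorted data, here is a small Python sketch that uses an in-memory list as a stand-in for the sorted file. Instead of scanning outward from a hit, it binary searches for both ends of the run of equal values, which gives the same count. `count_in_sorted` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right


def count_in_sorted(sorted_values, target):
    """Number of occurrences of target in an ascending sorted sequence."""
    left = bisect_left(sorted_values, target)    # first index with value >= target
    right = bisect_right(sorted_values, target)  # first index with value > target
    return right - left
```

Searching for both boundaries keeps the cost logarithmic even when the value occurs a huge number of times, whereas scanning outward is linear in the number of occurrences.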
You might look at the book "Programming Pearls", which I believe covers this problem.