The biggest difficulty in large-scale data processing is that the data cannot all be computed in memory.
Because the data set is large, it has to be processed from disk, but disk-based computing is very inefficient. The processing algorithm must therefore be chosen carefully.
Addressing
Memory works electronically, so lookup speed does not depend on where the data is physically located. Addressing takes about a microsecond or less.
Addressing on a disk takes two steps: (1) moving the head to the target track and (2) rotating the platter until the target data is under the head. Because the disk's rotation speed is limited, addressing takes milliseconds.
* The operating system stores contiguous data together in blocks (usually 4 KB on Windows), so that one revolution of the disk reads more data, which improves efficiency.
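The addressing cost described above can be estimated with simple arithmetic. The sketch below is only an illustration: the 7200 RPM rotation speed and the 9 ms head-movement time are assumed example values, not figures from this text, and the function names are hypothetical.

```python
# Assumed example drive (illustrative values, not from the text above).
ASSUMED_RPM = 7200        # platter rotation speed
ASSUMED_SEEK_MS = 9.0     # average time to move the head


def rotational_latency_ms(rpm: float) -> float:
    """Average wait for the data to rotate under the head.

    On average the platter has to turn half a revolution.
    """
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def disk_addressing_ms(seek_ms: float, rpm: float) -> float:
    """Addressing cost = head movement + average rotational latency."""
    return seek_ms + rotational_latency_ms(rpm)


if __name__ == "__main__":
    latency = rotational_latency_ms(ASSUMED_RPM)
    total = disk_addressing_ms(ASSUMED_SEEK_MS, ASSUMED_RPM)
    print(f"rotational latency: {latency:.2f} ms")
    print(f"total addressing:   {total:.2f} ms")
```

At 7200 RPM one revolution takes about 8.33 ms, so the average rotational wait alone is about 4.17 ms. This is why disk addressing lands in the millisecond range no matter how fast the electronics are.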
Transmission speed
Both memory and hard disk data are read into the CPU cache, but the transfer speed from memory to cache is very different from that from hard disk to cache.
The speed from memory to cache is about 7-8 GB/second, while the speed from disk to cache is about 60 MB/second.
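Therefore, once both addressing and transfer are taken into account, the speed difference between memory computing and disk computing can span several orders of magnitude, reaching a million times or more in the worst case.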
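To see how addressing and transfer costs combine, here is a rough model. The transfer rates come from the figures above (7.5 GB/s is taken as the midpoint of 7-8 GB/s). The addressing costs of 1 microsecond and 10 milliseconds are assumed round numbers, and the function names are hypothetical. The resulting ratios depend heavily on these assumptions, so treat them as an illustration rather than a measurement.

```python
# Transfer rates taken from the figures in the text.
MEMORY_BYTES_PER_S = 7.5e9   # midpoint of 7-8 GB/s
DISK_BYTES_PER_S = 60e6      # 60 MB/s

# Assumed addressing costs (illustrative round numbers, not measured).
ASSUMED_MEMORY_ADDRESSING_S = 1e-6   # "microsecond level"
ASSUMED_DISK_ADDRESSING_S = 10e-3    # "milliseconds"


def sequential_read_s(n_bytes: float, bytes_per_s: float) -> float:
    """Cost of one long contiguous read: pure transfer time."""
    return n_bytes / bytes_per_s


def random_read_s(n_blocks: int, block_bytes: int,
                  addressing_s: float, bytes_per_s: float) -> float:
    """Every block pays one addressing cost plus its own transfer time."""
    return n_blocks * (addressing_s + block_bytes / bytes_per_s)


if __name__ == "__main__":
    one_gb = 10**9
    seq_ratio = (sequential_read_s(one_gb, DISK_BYTES_PER_S)
                 / sequential_read_s(one_gb, MEMORY_BYTES_PER_S))

    blocks, block_size = 100_000, 4096   # scattered 4 KB reads
    rnd_ratio = (
        random_read_s(blocks, block_size,
                      ASSUMED_DISK_ADDRESSING_S, DISK_BYTES_PER_S)
        / random_read_s(blocks, block_size,
                        ASSUMED_MEMORY_ADDRESSING_S, MEMORY_BYTES_PER_S)
    )
    print(f"sequential: disk is about {seq_ratio:.0f}x slower")
    print(f"random 4 KB: disk is about {rnd_ratio:.0f}x slower")
```

In this model the gap for sequential reads is set by transfer rate alone. For scattered small reads the addressing cost dominates and the gap becomes much larger. This is one reason disk-oriented algorithms try to read data in long contiguous runs.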