Hadoop provides a reliable shared storage and analysis system: storage is provided by HDFS and analysis by MapReduce.
MapReduce is a distributed data processing model and execution environment.
HDFS is a distributed file system.
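To make the "processing model" concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, and a real job would run these phases in parallel across many machines over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, which the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling word_count(["a b a", "b c"]) counts each word across both lines. The point of the split is that map_phase and reduce_phase only ever see one line or one key at a time, so the work can be spread across many nodes.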
Why use MapReduce instead of a database with more disks?
1. Disk drive trends: seek time is improving much more slowly than transfer rate. Seeking is bounded by the latency of a disk operation, while the transfer rate corresponds to the disk's bandwidth. If the data access pattern is dominated by seeks, reading or writing a large part of the data set takes far longer than streaming through it at the transfer rate.
2. When only a small proportion of the records in a database is updated, a traditional B-tree works well, although it is limited by the rate at which it can perform seeks. When most of the database is updated, a B-tree is less efficient than MapReduce, which rebuilds the database using sort/merge.
3. MapReduce suits problems that need to analyze the entire data set in batch mode, and applications where data is written once and read many times.
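The trade-off in points 1 and 2 can be sketched with some back-of-the-envelope arithmetic. The figures below (10 ms per seek, 100 MB/s transfer rate, a 1 TB data set of 100-byte records) are illustrative assumptions rather than measurements, and the model is deliberately simplified: each updated record costs exactly one seek, and the streaming case counts a single sequential pass.

```python
def seek_bound_seconds(num_records, seek_time_s=0.010):
    """Time when every updated record costs one disk seek (assumed 10 ms each)."""
    return num_records * seek_time_s


def streaming_seconds(dataset_bytes, transfer_rate_bps=100e6):
    """Time for one sequential pass over the data set (assumed 100 MB/s)."""
    return dataset_bytes / transfer_rate_bps


def break_even_fraction(dataset_bytes, record_bytes,
                        seek_time_s=0.010, transfer_rate_bps=100e6):
    """Fraction of records above which seeking per record is slower
    than streaming through the entire data set once."""
    total_records = dataset_bytes / record_bytes
    all_seeks = seek_bound_seconds(total_records, seek_time_s)
    return streaming_seconds(dataset_bytes, transfer_rate_bps) / all_seeks


if __name__ == "__main__":
    one_tb = 1e12  # bytes, assumed data set size
    print("one streaming pass (s):", streaming_seconds(one_tb))
    print("break-even fraction:", break_even_fraction(one_tb, record_bytes=100))
```

Under these assumptions the break-even point is roughly 0.01% of the records: updating more than that by seeking to each record takes longer than streaming through the whole data set once, which is why a sort/merge rebuild wins for bulk updates.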