There are three main reasons to choose Hadoop: 1, it solves the problem; 2, its cost is low; 3, it has a mature ecosystem.
First, Hadoop solves the problem
Large companies, at home and abroad, have an insatiable appetite for data and will do everything they can to collect it,
because information asymmetry can be turned into profit again and again, and a great deal of information can be extracted through data analysis.
The data comes from many sources, its formats are increasingly complex, and its volume keeps growing over time.
As a result, traditional databases quickly become the bottleneck for both storing data and computing on it.
Hadoop was created to solve exactly these problems. Its underlying distributed file system is highly scalable, uses data redundancy to ensure that data is not lost and to improve computational efficiency, and can store data in a variety of formats.
It also supports a variety of computing frameworks, for both offline and online computation.
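The redundancy idea described above can be illustrated with a small toy model. This is only a sketch, not how HDFS actually places blocks: the node names, the round-robin placement, and the helper functions `place_blocks` and `readable_blocks` are all assumptions made for illustration (the real placement policy also takes racks into account). The replication factor of 3 matches the usual HDFS default.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Toy placement only; real HDFS placement is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds node count")
    placement = {}
    for block in range(num_blocks):
        # Consecutive indices modulo len(nodes) are distinct
        # as long as replication <= len(nodes).
        placement[block] = [
            nodes[(block + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    ]


# Hypothetical four-node cluster storing six blocks.
nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(6, nodes, replication=3)

two_down = readable_blocks(placement, failed={"node1", "node2"})
three_down = readable_blocks(placement, failed={"node1", "node2", "node3"})
print(len(two_down), "blocks readable with two nodes down")
print(len(three_down), "blocks readable with three nodes down")
```

In this toy cluster, every block survives the loss of any two nodes, because each block lives on three different machines. Data only starts to disappear when all three holders of the same block fail together.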
Second, why the cost can be kept low
Once we are sure that Hadoop can solve our problem, we must consider the cost.
1, hardware cost
Hadoop is designed to run on inexpensive commodity servers and does not require very expensive hardware.
2, software cost
Hadoop is an open source product: it is free to use and, under its open source license, free to modify, which makes it more controllable.
3, development costs
Because the work is secondary development on top of an existing platform, and because the community is very active, the demands on developers' skills are relatively low and the learning cost for engineers is not high.
4, maintenance costs
When the cluster is very large, development and maintenance costs become significant. But they are still much lower than for a system developed in-house.
One company's in-house equivalent has consumed nearly 4 years of work by hundreds of engineers and billions of dollars, and it still has not replaced Hadoop.
5, other costs
Examples include system security, and the hidden costs introduced when the community version is upgraded frequently but the production deployment cannot be upgraded in step.
Third, what are the benefits of a mature ecosystem?
A mature ecosystem represents the future direction of development, good market prospects, and better-paying jobs (OK, the "Three Represents").
See the figure (source: Hadoop ecosystem map, mynosql)
Partial classification of the systems:
Deployment, configuration, and monitoring: Ambari, Whirr
Monitoring and management tools: Hue, Karmasphere, Eclipse plugin, Cacti, Ganglia
Data serialization and task scheduling: Avro, ZooKeeper
Data collection: FUSE, WebDAV, Chukwa, Flume, Scribe, Nutch
Data storage: HDFS
SQL-like query data warehouse: Hive
Data flow processing: Pig
Parallel computing frameworks: MapReduce, Tez
Data mining and machine learning: Mahout
Column-oriented online database: HBase
Metadata center: HCatalog (can be used together with Pig, Hive, MapReduce, etc.)
Workflow control: Oozie, Cascading
Data import and export with relational databases: Sqoop, Flume, HIHO
Data visualization: Drilldown, Intellicus
Many companies are already using it.
(citation: A New Version of the Hadoop ecosystem Map)