The Hadoop Family: A Project Map
Introduction to each sub-project
(1) Pig
A Hadoop client that addresses the difficulty non-Java programmers have in using Hadoop
Uses Pig Latin, a SQL-like, dataflow-oriented language
Pig Latin can express sorting, filtering, summing, grouping, joining, and other operations, and supports user-defined functions
Pig automatically compiles Pig Latin into MapReduce jobs that run on the cluster, sparing users the frustration of writing Java programs
Three modes of operation: the Grunt shell, script mode, and embedded mode
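To make the dataflow style concrete, here is a plain-Python analogue of a small, hypothetical Pig Latin script (the relation names and data are invented for illustration); Pig would compile each step into parts of one or more MapReduce jobs:

```python
# Analogue of a hypothetical Pig Latin dataflow:
#   records = LOAD 'sales' AS (region, amount);
#   big     = FILTER records BY amount > 100;
#   grouped = GROUP big BY region;
#   totals  = FOREACH grouped GENERATE group, SUM(big.amount);
from collections import defaultdict

records = [("east", 120), ("west", 80), ("east", 300), ("west", 150)]

# FILTER: keep only rows with amount > 100
big = [(region, amount) for region, amount in records if amount > 100]

# GROUP ... GENERATE SUM: aggregate amounts per region
totals = defaultdict(int)
for region, amount in big:
    totals[region] += amount

print(dict(totals))  # {'east': 420, 'west': 150}
```

Each line names a new relation derived from the previous one, which is what makes the language "data flow-oriented" rather than declarative like SQL.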
(2) Hbase
Open source implementation of Google BigTable
A column-oriented database designed for fast response times and high I/O throughput
Can be deployed as a cluster
Accessible in a variety of ways, such as the shell, a web interface, and APIs
Well suited to scenarios with heavy reads and writes (inserts)
Offers the HQL query language
The typical representative of NoSQL products
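A minimal sketch of HBase's data model (hypothetical class, not the real HBase API): a table maps a row key to column families, each holding qualifier/value pairs. Real HBase additionally versions every cell with a timestamp.

```python
# Toy column-family store illustrating the HBase data model:
# row key -> column family -> qualifier -> value.
class SimpleColumnStore:
    def __init__(self, families):
        self.families = set(families)
        self.rows = {}  # row_key -> {family: {qualifier: value}}

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:  # families are fixed at table creation
            raise KeyError(f"unknown column family: {family}")
        row = self.rows.setdefault(row_key, {f: {} for f in self.families})
        row[family][qualifier] = value

    def get(self, row_key, family, qualifier):
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier)

table = SimpleColumnStore(families=["info", "stats"])
table.put("user1", "info", "name", "alice")
table.put("user1", "stats", "visits", 3)
print(table.get("user1", "info", "name"))  # alice
```

Note the contrast with a relational table: column qualifiers are created on the fly per row, only the families are fixed in advance.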
(3) Hive
A data warehousing tool that maps raw structured data stored in Hadoop onto Hive tables
Supports HiveQL, a language almost identical to SQL; apart from updates, indexes, and transactions, it supports nearly all SQL features
Can be seen as a mapper from SQL to MapReduce
Provides Shell, JDBC/ODBC, Thrift, and web interfaces
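The SQL-to-MapReduce mapping can be illustrated with a hypothetical HiveQL query and a simplified Python model of the two phases (the real planner produces full MapReduce jobs; the sort below stands in for the shuffle step):

```python
# Conceptual model of how a query like
#   SELECT word, COUNT(*) FROM docs GROUP BY word;
# maps onto MapReduce: map emits (word, 1), the framework
# groups by key, and reduce sums the counts.
from itertools import groupby

def map_phase(rows):
    for line in rows:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    pairs = sorted(pairs)  # stands in for the shuffle/sort between phases
    return {key: sum(v for _, v in group)
            for key, group in groupby(pairs, key=lambda kv: kv[0])}

docs = ["hive maps sql", "sql to mapreduce", "hive on hadoop"]
counts = reduce_phase(map_phase(docs))
print(counts)
```

The GROUP BY clause becomes the shuffle key, and the aggregate function becomes the reducer, which is why aggregation queries translate so naturally.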
(4) Zookeeper
Open source implementation of Google Chubby
Coordinates the various services in a distributed system, for example verifying that messages arrive correctly, preventing single points of failure, and handling load balancing
Application scenario: HBase, where it implements automatic NameNode failover
How it works: a leader, followers, and an election process
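A common ZooKeeper election recipe can be simulated to show the idea (this is a toy model, not the real ZooKeeper client API): each candidate creates a sequential znode, and whoever holds the lowest sequence number is the leader; if that znode disappears, leadership passes to the next lowest.

```python
# Simulated ZooKeeper-style leader election via sequential znodes.
class FakeZooKeeper:
    def __init__(self):
        self.seq = 0
        self.znodes = {}  # path -> owner

    def create_sequential(self, prefix, owner):
        path = f"{prefix}{self.seq:010d}"  # monotonically increasing suffix
        self.seq += 1
        self.znodes[path] = owner
        return path

    def leader(self, prefix):
        candidates = sorted(p for p in self.znodes if p.startswith(prefix))
        return self.znodes[candidates[0]] if candidates else None

zk = FakeZooKeeper()
for node in ["nodeA", "nodeB", "nodeC"]:
    zk.create_sequential("/election/n-", node)
print(zk.leader("/election/n-"))  # nodeA

# If the leader's znode goes away (session loss), the next lowest wins.
del zk.znodes["/election/n-0000000000"]
print(zk.leader("/election/n-"))  # nodeB
```

In the real system these znodes are ephemeral, so a crashed leader's znode vanishes automatically and failover (as used for the NameNode) happens without manual intervention.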
(5) Sqoop
Used for exchanging data between Hadoop and relational databases
Connects to relational databases through the JDBC interface
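The shape of a Sqoop import can be sketched in Python (a hypothetical analogue: sqlite3 stands in for a JDBC source, and an in-memory buffer stands in for the delimited text file Sqoop would land on HDFS):

```python
# Toy "import": read rows from a relational source and write
# them out as delimited text, as a Sqoop import job would.
import sqlite3
import io

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

out = io.StringIO()  # stands in for an output file on HDFS
for row in conn.execute("SELECT id, name FROM users ORDER BY id"):
    out.write(",".join(str(col) for col in row) + "\n")

print(out.getvalue())  # 1,alice / 2,bob
```

Real Sqoop parallelizes this by splitting the table on a key column and running one map task per split.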
(6) Avro
Data serialization tool developed by Doug Cutting, founder of Hadoop
Designed for applications that exchange data at large scale; supports binary serialization so large volumes of data can be processed quickly and compactly
Friendly to dynamic languages: Avro provides mechanisms that let dynamic languages process Avro data easily
Thrift interface
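The key idea behind Avro-style binary serialization can be sketched as follows. This is a simplified, hypothetical format, not the real Avro wire format (which uses variable-length zigzag integers): the point it illustrates is that the schema, not the data stream, carries the field names, which keeps the encoded bytes compact.

```python
# Schema-driven binary encode/decode: field names live in the
# schema, so the wire format contains only raw values.
import struct

SCHEMA = [("id", "int"), ("name", "string")]

def encode(record, schema=SCHEMA):
    buf = b""
    for field, ftype in schema:
        if ftype == "int":
            buf += struct.pack(">i", record[field])
        else:  # string: 4-byte length prefix + UTF-8 bytes
            data = record[field].encode("utf-8")
            buf += struct.pack(">i", len(data)) + data
    return buf

def decode(buf, schema=SCHEMA):
    record, offset = {}, 0
    for field, ftype in schema:
        if ftype == "int":
            record[field], = struct.unpack_from(">i", buf, offset)
            offset += 4
        else:
            length, = struct.unpack_from(">i", buf, offset)
            offset += 4
            record[field] = buf[offset:offset + length].decode("utf-8")
            offset += length
    return record

msg = encode({"id": 7, "name": "avro"})
print(decode(msg))  # {'id': 7, 'name': 'avro'}
```

Because decoding is driven entirely by the schema, a dynamic language can read any Avro data at runtime given the schema, without generated classes.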
(7) Chukwa
Framework for data acquisition and analysis on top of Hadoop
Primarily log collection and analysis
Agents installed on the collection nodes capture the raw log data
Agents send the data to a collector
The collector periodically writes the data to the Hadoop cluster
MapReduce jobs started on a schedule then process and analyze the data
The Hadoop Infrastructure Care Center (HICC) presents the final results
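The agent-to-collector pipeline above can be modeled in a few lines (a hypothetical simplification: the batch list stands in for the collector's periodic writes of sink files to HDFS):

```python
# Toy Chukwa-style pipeline: agents forward log lines to a
# collector, which buffers them and flushes batches to storage.
class Collector:
    def __init__(self, batch_size=3):
        self.buffer, self.batch_size, self.flushed = [], batch_size, []

    def receive(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flushed.append(list(self.buffer))  # one "HDFS file" per batch
            self.buffer.clear()

def agent(host, lines, collector):
    # an agent tags raw log lines with their origin and forwards them
    for line in lines:
        collector.receive(f"{host}: {line}")

c = Collector(batch_size=2)
agent("web01", ["GET /", "GET /a"], c)
agent("web02", ["POST /b"], c)
c.flush()  # stands in for the periodic timer-driven flush
print(c.flushed)
```

Batching at the collector is what keeps HDFS happy: it turns many small per-host log streams into a few large sequential files.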
(8) Cassandra
A NoSQL, distributed key-value database contributed by Facebook
Like HBase, it follows the design ideas of Google BigTable
Designed around sequential writes only, with no random writes, to meet performance requirements under high load
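Why sequential-only writes help under load can be shown with a minimal sketch of this style of write path (hypothetical and greatly simplified relative to Cassandra's actual commit log, memtables, and SSTables):

```python
# Append-only write path: every write is appended sequentially to a
# commit log and applied to an in-memory table; nothing on "disk"
# (the log) is ever updated in place.
class AppendOnlyStore:
    def __init__(self):
        self.commit_log = []  # sequential, append-only (a disk file in reality)
        self.memtable = {}    # in-memory key -> latest value

    def put(self, key, value):
        self.commit_log.append((key, value))  # sequential write only
        self.memtable[key] = value

    def get(self, key):
        return self.memtable.get(key)

store = AppendOnlyStore()
store.put("k1", "v1")
store.put("k1", "v2")  # an "update" is just another appended entry
print(store.get("k1"))        # v2
print(len(store.commit_log))  # 2
```

Sequential appends avoid disk seeks entirely, which is the source of the high write throughput this design targets.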