1. Flume: log-collection software. The central concept is the agent, which consists of a source, a channel, and a sink; sources and sinks can be HDFS, JDBC, and so on. A simple scenario is to use Flume to monitor a folder Fdir for data changes: Fdir is the source, and the changes are delivered to Hdfs_path, the sink.
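The Fdir-to-HDFS scenario above can be sketched as a Flume agent properties file; the agent name `a1`, the directory `/data/Fdir`, and the HDFS URL are illustrative placeholders:

```properties
# one agent (a1) with one source, one channel, one sink
a1.sources = src1
a1.channels = ch1
a1.sinks = snk1

# source: watch a spooling directory (the "Fdir" of the example)
a1.sources.src1.type = spooldir
a1.sources.src1.spoolDir = /data/Fdir
a1.sources.src1.channels = ch1

# channel: buffer events in memory between source and sink
a1.channels.ch1.type = memory

# sink: write events to a path on HDFS (the "Hdfs_path" of the example)
a1.sinks.snk1.type = hdfs
a1.sinks.snk1.hdfs.path = hdfs://namenode:8020/Hdfs_path
a1.sinks.snk1.channel = ch1
```

The channel sitting between source and sink is what lets Flume absorb bursts: the source keeps appending events while the sink drains them at its own pace.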
2. Sqoop: used primarily to import/export data between Hadoop storage (HDFS/Hive/HBase) and structured (relational) databases, e.g. Hive -> MySQL or MySQL -> HBase.
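Both directions can be sketched with the Sqoop command line; the hostname `dbhost`, database `shop`, and table names are illustrative placeholders:

```
# import a MySQL table into Hive (-P prompts for the password)
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username etl -P \
  --table orders \
  --hive-import

# export an HDFS directory back into a MySQL table
sqoop export \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username etl -P \
  --table orders_summary \
  --export-dir /user/hive/warehouse/orders_summary
```

Under the hood Sqoop generates MapReduce jobs that read or write the database over JDBC in parallel.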
3. ZooKeeper: most data stores today run as server clusters, and ZooKeeper coordinates consistency across such clusters. I haven't read up on it yet.
4. Hive: a data warehouse suited to full-table query workloads. Hive itself does not store data; it relies on HDFS and MapReduce, mapping a structured file on HDFS to a logical data table.
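That file-to-table mapping can be sketched in HiveQL with an external table; the path `/data/logs/` and the column names are illustrative placeholders:

```sql
-- map a comma-delimited file already sitting on HDFS to a logical table;
-- EXTERNAL means Hive only records the schema, the data stays where it is
CREATE EXTERNAL TABLE logs (
  ts STRING,
  level STRING,
  message STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/logs/';

-- a full-table query; Hive compiles this into a MapReduce job
SELECT level, COUNT(*) FROM logs GROUP BY level;
```

Dropping an external table removes only the metadata, not the underlying HDFS file, which is why this pattern is used for data Hive merely "maps" rather than owns.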
5. HBase: a distributed database built on HDFS that supports indexed access, i.e. random, real-time reads and writes by row key.
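The row-key access pattern can be sketched in the HBase shell; the table name `users` and the column family `info` are illustrative placeholders:

```
# create a table with one column family
create 'users', 'info'

# write a cell, addressed by row key + column family:qualifier
put 'users', 'row1', 'info:name', 'alice'

# read a single row directly by its key (the "indexed" access)
get 'users', 'row1'

# or scan the whole table
scan 'users'
```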
6. Pig: a data-flow programming language that offers richer operations than raw MapReduce, such as join.
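A join, which would take noticeable boilerplate in raw MapReduce, is one line in Pig Latin; the file paths and schemas below are illustrative placeholders:

```
-- load two comma-delimited datasets from HDFS with declared schemas
users  = LOAD '/data/users'  USING PigStorage(',') AS (uid:int, name:chararray);
orders = LOAD '/data/orders' USING PigStorage(',') AS (uid:int, amount:double);

-- join them on the shared key; Pig plans the MapReduce job for us
joined = JOIN users BY uid, orders BY uid;

DUMP joined;
```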
Big Data ecosystem open-source tools