| Sqoop | Flume | HDFS |
| --- | --- | --- |
| Sqoop is used to import data from structured data sources, such as an RDBMS. | Flume is used to move bulk streaming data into HDFS. | HDFS is the distributed file system used by the Hadoop ecosystem to store data. |
| Sqoop has a connector-based architecture: a connector knows how to connect to the respective data source and fetch the data. | Flume has an agent-based architecture: code called an "agent" is written, which takes care of fetching the data. | HDFS has a distributed architecture in which data is spread across multiple DataNodes. |
| With Sqoop, HDFS is the destination for imported data. | With Flume, data flows to HDFS through zero or more channels. | HDFS is the final destination for data storage. |
| Sqoop data loads are not event-driven. | Flume data loads can be event-driven. | HDFS simply stores whatever data is delivered to it, by any means. |
| To import data from a structured data source, use Sqoop, because its connectors know how to interact with structured data sources and fetch data from them. | To load streaming data, such as tweets generated on Twitter or the log files of a web server, use Flume; Flume agents are built for fetching streaming data. | HDFS has built-in shell commands to store data into it; it cannot by itself import structured or streaming data. |
Sqoop vs. Flume vs. HDFS comparison

To make the differences above concrete, a short usage sketch for each tool follows.
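First, the connector-based import that Sqoop performs. This is a minimal sketch: the JDBC URL, database, credentials, table name, and target directory below are hypothetical placeholders, but the flags themselves are standard Sqoop options.

```bash
# Import the (hypothetical) "employees" table from a MySQL database into HDFS.
# Sqoop's JDBC connector handles the connection and fetches the rows.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/payroll \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table employees \
  --target-dir /user/etl/employees \
  --num-mappers 4
```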
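Next, the agent-based, channel-driven flow in Flume. The sketch below assumes a web-server access log as the streaming source; the agent, source, channel, and sink names, the log path, and the HDFS URL are all illustrative assumptions, while the property keys follow the standard Flume 1.x configuration format.

```bash
# Minimal Flume agent: tail a (hypothetical) web-server log and sink it to HDFS.
cat > weblog-agent.conf <<'EOF'
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

# Source: follow the access log as new lines arrive
agent1.sources.src1.type     = exec
agent1.sources.src1.command  = tail -F /var/log/httpd/access_log
agent1.sources.src1.channels = ch1

# Channel: buffer events in memory between source and sink
agent1.channels.ch1.type     = memory
agent1.channels.ch1.capacity = 10000

# Sink: write the buffered events into HDFS
agent1.sinks.sink1.type          = hdfs
agent1.sinks.sink1.hdfs.path     = hdfs://namenode:8020/flume/weblogs
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.channel       = ch1
EOF

# Start the agent (names and paths are placeholders)
flume-ng agent --name agent1 --conf-file weblog-agent.conf
```

Note how the source and sink never talk to each other directly: events always pass through the channel, which is what makes the flow event-driven and lets it absorb bursts.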
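Finally, storing data directly in HDFS relies on its built-in shell commands; the local file and HDFS paths below are placeholders.

```bash
# Create a directory in HDFS, copy a local file into it, and verify the upload.
hdfs dfs -mkdir -p /user/etl/raw
hdfs dfs -put access_log.txt /user/etl/raw/
hdfs dfs -ls /user/etl/raw
```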