I have recently been reading "Hadoop: The Definitive Guide", and found its description of streaming data access to the distributed file system HDFS hard to grasp. Stream-based reading and writing sounded too abstract: what does "stream-based" mean, and what exactly is a stream? Hadoop is written in Java, so to understand Hadoop's streaming data access, it helps to start with Java's stream mechanism. Streams are an important mechanism in both Java and C++; through them we can freely manipulate data in files, memory, I/O devices, and so on.
First of all, what is a stream?
A stream is an abstract concept: an abstraction of input and output. In a Java program, data input/output is performed by way of "streams". The device at the other end of a stream can be a file, the network, memory, and so on.
Streams have direction. Whether a stream is an input stream or an output stream is relative, usually taking the program as the reference point: if data flows from the program toward the device, we call it an output stream; if data flows the other way, we call it an input stream.
A stream can be pictured as a pipe: data flows through the pipe, so the notion of direction arises naturally.
When a program needs to read data from a data source, it opens an input stream; the data source can be a file, memory, the network, and so on. Conversely, when a program needs to write data to a destination, it opens an output stream; the destination can likewise be a file, memory, the network, and so on.
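As a minimal sketch of this direction convention, the "device" below is simply a memory buffer: an output stream carries bytes from the program to the buffer, and an input stream carries them back into the program (the class name `DirectionDemo` is illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class DirectionDemo {
    public static void main(String[] args) throws IOException {
        // Output stream: data flows from the program out to the device (here, memory)
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] {72, 105}); // the bytes for "Hi"

        // Input stream: data flows from the device back into the program
        ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
        System.out.println((char) in.read()); // H
        System.out.println((char) in.read()); // i
    }
}
```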
How are streams classified?
Streams can be classified from several angles:
1. By the unit of data processed: byte streams and character streams
2. By the direction of data flow: input streams and output streams
3. By function: node streams and processing streams
Classifications 1 and 2 are easy to understand. Classification by function can be understood as follows:
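To illustrate classification 1: byte streams (subclasses of InputStream/OutputStream) move raw bytes, while character streams (subclasses of Reader/Writer) decode bytes into whole characters. A small sketch, using an in-memory buffer so no file is needed (class name is illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ByteVsCharDemo {
    public static void main(String[] args) throws IOException {
        // The Chinese character below occupies 3 bytes in UTF-8
        byte[] data = "中".getBytes(StandardCharsets.UTF_8);
        System.out.println(data.length); // 3

        // Byte stream: sees 3 separate bytes, oblivious to character boundaries
        ByteArrayInputStream byteStream = new ByteArrayInputStream(data);
        System.out.println(byteStream.available()); // 3

        // Character stream: decodes the 3 bytes into a single character
        InputStreamReader charStream = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.println((char) charStream.read()); // 中
    }
}
```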
Node stream: a node stream reads and writes data from a specific data source. That is, a node stream operates directly on a file, the network, etc. For example, FileInputStream and FileOutputStream read a byte stream directly from a file, or write one directly to a file.
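A node stream in isolation might look like the sketch below: FileOutputStream and FileInputStream talk to the file directly, one byte at a time, with no wrapping involved (the file name "node-demo.txt" and class name are assumptions for illustration):

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class NodeStreamDemo {
    public static void main(String[] args) throws IOException {
        // Node output stream: writes a single raw byte straight to the file
        FileOutputStream out = new FileOutputStream("node-demo.txt");
        out.write(65); // the byte for 'A'
        out.close();

        // Node input stream: reads that byte straight back from the file
        FileInputStream in = new FileInputStream("node-demo.txt");
        System.out.println((char) in.read()); // A
        in.close();
    }
}
```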
Processing stream: by "connecting" to an existing stream (a node stream or another processing stream) and processing its data, a processing stream provides the program with more powerful read and write capabilities. A filter stream is created from an existing input or output stream; it is, in effect, a series of wrappers around a node stream. For example, BufferedInputStream and BufferedOutputStream are constructed from existing streams and provide buffered reads and writes, improving I/O efficiency; DataInputStream and DataOutputStream are likewise constructed from existing streams and add the ability to read and write Java's primitive data types. All of these are filter streams.
A simple example:
import java.io.*;

public class StreamDemo { // class name and wrapper added so the example compiles
    public static void main(String[] args) throws IOException {
        // Node stream: FileOutputStream operates directly on A.txt as the data destination
        FileOutputStream fileOutputStream = new FileOutputStream("A.txt");
        // Filter stream: BufferedOutputStream wraps the node stream, providing buffered writes
        BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(
                fileOutputStream);
        // Filter stream: DataOutputStream wraps the filter stream, adding writes of primitive types
        DataOutputStream out = new DataOutputStream(bufferedOutputStream);
        out.writeInt(3);
        out.writeBoolean(true);
        out.flush();
        out.close();
        // The input node stream and filter streams here mirror the output side above exactly
        DataInputStream in = new DataInputStream(new BufferedInputStream(
                new FileInputStream("A.txt")));
        System.out.println(in.readInt());
        System.out.println(in.readBoolean());
        in.close();
    }
}
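The example above uses byte streams throughout, but character streams follow the same node/filter pattern. A sketch with FileWriter as the node stream and BufferedReader wrapping the node stream FileReader (the file name "B.txt" and class name are illustrative assumptions):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class CharStreamDemo {
    public static void main(String[] args) throws IOException {
        // Node character stream: writes text directly to the file
        FileWriter writer = new FileWriter("B.txt");
        writer.write("hello stream");
        writer.close();

        // Filter character stream: BufferedReader wraps the node stream FileReader
        BufferedReader reader = new BufferedReader(new FileReader("B.txt"));
        System.out.println(reader.readLine()); // hello stream
        reader.close();
    }
}
```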