The difference between character streams and byte streams in Java

Source: Internet
Author: User

This article explains Java character streams and byte streams and how they differ, for your reference. The details are as follows.

1. What is a stream

A stream in Java is an abstraction of a sequence of bytes. Picture a water pipe, except that a sequence of bytes rather than water flows through it. Like flowing water, a Java stream has a direction: an object from which a sequence of bytes can be read is called an input stream, and an object to which a sequence of bytes can be written is called an output stream.
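To make the pipe picture concrete, here is a minimal sketch that moves bytes from an input stream to an output stream. It uses in-memory streams (ByteArrayInputStream and ByteArrayOutputStream) so that it runs without any files; the class name PipeDemo and the method copy are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

class PipeDemo {
    // Moves every byte from the input stream to the output stream
    // and returns how many bytes were copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        long copied = 0;
        int b;
        while ((b = in.read()) != -1) {   // -1 signals the end of the input stream
            out.write(b);
            copied++;
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3});
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(in, out));                        // 3
        System.out.println(Arrays.toString(out.toByteArray()));   // [1, 2, 3]
    }
}
```

The input stream is the end of the pipe we read from, and the output stream is the end we write to.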

2. Byte streams

The basic unit that a byte stream processes in Java is a single byte, so byte streams are typically used for binary data. The two most basic byte stream classes in Java are InputStream and OutputStream, which represent the basic input byte stream and output byte stream respectively. Both are abstract classes, and in practice we usually use one of their subclasses from the Java class library. Let's take InputStream as an example to introduce byte streams in Java.

The InputStream class defines read, the basic method for reading a byte from a byte stream. It is declared as follows:

public abstract int read() throws IOException;

This is an abstract method, so every input stream class derived from InputStream must implement it. It reads one byte from the stream and returns it as an int in the range 0 to 255, or returns -1 if the end of the stream has been reached. Note that the method blocks until a byte is available, the end of the stream is detected, or an exception is thrown. In addition, byte streams do not buffer by default: for a stream backed by a file, each call to read asks the operating system for a byte, which often means a disk I/O operation and is therefore inefficient. Some readers may think that the overloaded read method in InputStream that takes a byte array can read many bytes at once and so avoid frequent disk I/O. Is that really the case? Let's look at the source code of this method:

public int read(byte b[]) throws IOException {
    return read(b, 0, b.length);
}

It delegates to another overload of read. Let's look at that one:

public int read(byte b[], int off, int len) throws IOException {
    if (b == null) {
        throw new NullPointerException();
    } else if (off < 0 || len < 0 || len > b.length - off) {
        throw new IndexOutOfBoundsException();
    } else if (len == 0) {
        return 0;
    }

    int c = read();
    if (c == -1) {
        return -1;
    }
    b[off] = (byte) c;

    int i = 1;
    try {
        for (; i < len; i++) {
            c = read();
            if (c == -1) {
                break;
            }
            b[off + i] = (byte) c;
        }
    } catch (IOException ee) {
    }
    return i;
}

As the code above shows, the default read(byte[]) implementation also fills the byte array by calling read() in a loop, so in essence this method does not use a memory buffer either. (Subclasses are free to override it with something more efficient.) To use a memory buffer to improve read efficiency, we should wrap the stream in a BufferedInputStream.
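The following sketch tries to make both points observable. It is an illustration under assumptions, not a benchmark: SingleByteSource and RequestCounter are hypothetical helper classes written for this example, and in-memory data stands in for a disk file. SingleByteSource implements only read(), so the inherited read(byte[]) falls back to the loop shown above; RequestCounter counts how many read requests reach the wrapped stream with and without a BufferedInputStream in front of it.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferingDemo {
    // Hypothetical source that implements only read() and counts the calls.
    static class SingleByteSource extends InputStream {
        private int remaining;
        int readCalls = 0;

        SingleByteSource(int size) { this.remaining = size; }

        @Override
        public int read() {
            readCalls++;
            if (remaining == 0) return -1;   // end of stream
            remaining--;
            return 0;
        }
    }

    // Hypothetical wrapper that counts every read request reaching the wrapped stream.
    static class RequestCounter extends FilterInputStream {
        int requests = 0;

        RequestCounter(InputStream in) { super(in); }

        @Override
        public int read() throws IOException {
            requests++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            requests++;
            return super.read(b, off, len);
        }
    }

    // Reads the stream to the end, one byte at a time.
    static void drain(InputStream in) throws IOException {
        while (in.read() != -1) { }
    }

    public static void main(String[] args) throws IOException {
        SingleByteSource source = new SingleByteSource(100);
        source.read(new byte[10]);
        System.out.println(source.readCalls);      // 10: one read() call per byte

        RequestCounter direct = new RequestCounter(new ByteArrayInputStream(new byte[100]));
        drain(direct);
        System.out.println(direct.requests);       // 101: 100 bytes plus the final -1

        RequestCounter underlying = new RequestCounter(new ByteArrayInputStream(new byte[100]));
        drain(new BufferedInputStream(underlying));
        System.out.println(underlying.requests);   // far fewer than 101
    }
}
```

With the buffer in place, the byte-at-a-time loop is served from memory, and the underlying stream is only asked for data when the buffer runs empty.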

3. Character Streams

The basic unit that a character stream processes in Java is the Unicode code unit (2 bytes in size), so character streams are typically used for text data. A Unicode code unit is a 16-bit value in the range 0x0000 to 0xFFFF. Most values in this range correspond to a character (supplementary characters take a pair of code units), and Java's String type encodes characters according to Unicode rules and keeps them in memory in that form. Data stored on disk, by contrast, is usually encoded in one of many different encodings, and the same characters have different binary representations under different encodings. A character stream therefore works like this:

Output character stream: converts the sequence of characters to be written (in fact a sequence of Unicode code units) into a sequence of bytes in the specified encoding, then writes those bytes to the file.
Input character stream: decodes the sequence of bytes being read into the corresponding sequence of characters (in fact a sequence of Unicode code units) so that it can be held in memory.

Here is a demo to deepen our understanding of this process. The sample code is as follows:

import java.io.FileWriter;
import java.io.IOException;


public class FileWriterDemo {
    public static void main(String[] args) {
        // try-with-resources closes the writer even if write() throws
        try (FileWriter fileWriter = new FileWriter("Demo.txt")) {
            // No encoding is specified here, so the default character encoding is used
            fileWriter.write("demo");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

In the code above, we use FileWriter to write the four characters "demo" to Demo.txt, then view the contents of Demo.txt with the hexadecimal editor WinHex.

The hex view shows that "demo" was encoded as "64 65 6D 6F", even though the code does not specify an encoding explicitly. When no encoding is specified, FileWriter uses the platform's default character encoding to encode the characters being written.
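To avoid depending on the default encoding, the encoding can be passed explicitly. The sketch below is one way to do that, assuming an OutputStreamWriter over an in-memory ByteArrayOutputStream rather than a file so that the resulting bytes can be printed directly; the class name EncodingDemo and the method encodeToHex are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Writes text through a character stream with the given encoding
    // and returns the resulting bytes as space-separated hex.
    static String encodeToHex(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } // close() flushes the encoded bytes into the byte array

        StringBuilder hex = new StringBuilder();
        for (byte b : bytes.toByteArray()) {
            if (hex.length() > 0) hex.append(' ');
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(encodeToHex("demo", StandardCharsets.UTF_8));     // 64 65 6D 6F
        System.out.println(encodeToHex("demo", StandardCharsets.UTF_16BE));  // 00 64 00 65 00 6D 00 6F
    }
}
```

The same four characters produce different byte sequences under the two encodings, which is exactly the conversion a character stream performs on output.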

Because a character stream has to convert the Unicode code unit sequence into a byte sequence in the target encoding, it uses a memory buffer to hold the converted bytes and writes them to the disk file after the conversion is done.
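A small sketch of this buffering, assuming an OutputStreamWriter over an in-memory ByteArrayOutputStream in place of a disk file; the names FlushDemo and sizesAroundFlush are made up for this illustration. It records how many bytes have reached the underlying stream before and after flush() is called. The value before the flush depends on the writer's internal buffer, so the code does not assume a particular number for it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class FlushDemo {
    // Returns {bytes in the sink before flush(), bytes in the sink after flush()}.
    static int[] sizesAroundFlush(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8);
        writer.write(text);
        int before = sink.size();   // encoded bytes may still sit in the writer's buffer
        writer.flush();             // pushes the buffered bytes to the underlying stream
        int after = sink.size();
        writer.close();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesAroundFlush("demo");
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

This is also why a character stream should be flushed or closed before the program relies on the file's contents.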

4. The difference between character streams and byte streams

From the description above, the main differences between byte streams and character streams are as follows:

The basic unit a byte stream operates on is the byte; the basic unit a character stream operates on is the Unicode code unit.
Byte streams do not use a buffer by default; character streams use a buffer.
Byte streams are typically used for binary data and can in fact handle any type of data, but they do not support reading or writing Unicode code units directly; character streams are typically used for text data and support reading and writing Unicode code units.
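The difference in units can be seen by reading the same data both ways. The sketch below assumes UTF-8 encoded input held in memory; the class name UnitDemo and the methods countBytes and countChars are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class UnitDemo {
    // Counts the units returned by a byte stream: one per byte.
    static int countBytes(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int n = 0;
            while (in.read() != -1) n++;
            return n;
        }
    }

    // Counts the units returned by a character stream: one per code unit.
    static int countChars(byte[] data) throws IOException {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int n = 0;
            while (reader.read() != -1) n++;
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        // "h\u00e9llo" contains the character U+00E9, which takes two bytes in UTF-8
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(data));   // 6
        System.out.println(countChars(data));   // 5
    }
}
```

The byte stream sees six bytes, while the character stream decodes them into five code units.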

That is my understanding of character streams and byte streams in Java. If anything is unclear or inaccurate, corrections are welcome. Thank you.
