Reading and writing files in Java: solving the garbled-text problem

Source: Internet
Author: User

When reading a file stream you will often run into garbled text. There is more than one possible cause; this article focuses on garbling caused by the file's encoding. First, be clear about what text files and binary files are and how they differ.

Text files are character-encoded files; common encodings include ASCII, Unicode, and ANSI. Binary files are value-encoded files: you decide what each value means (this can be thought of as a custom encoding defined by your application).

A text file therefore follows a published character encoding. Some of these are fixed-width (ASCII uses one byte per character), while others, such as UTF-8, are variable-width. A binary file can be thought of as variable-length in a different sense: because it encodes values, how many bits represent one value is entirely up to you.
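To make the fixed-width versus variable-width distinction concrete, here is a minimal sketch that counts how many bytes a few characters occupy in UTF-8 and UTF-16BE. The class name EncodingLengths and the helper byteLength are illustrative names, not part of the original article.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLengths {

    /** Returns the number of bytes the text occupies in the given charset. */
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding:
        // U+0041 (A), U+00E9 (e with acute accent), U+4E2D (a CJK character).
        String[] samples = {"A", "\u00e9", "\u4e2d"};
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + " UTF-8=" + byteLength(s, StandardCharsets.UTF_8)
                    + " UTF-16BE=" + byteLength(s, StandardCharsets.UTF_16BE));
        }
    }
}
```

UTF_16BE is used rather than UTF_16 because encoding with UTF_16 prepends a byte order mark, which would inflate every count by two bytes.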

For binary files you should not go through a String. Building a String from bytes decodes them with a character encoding (the system default when none is given), and the file's custom encoding will conflict with that standard encoding. Binary files should therefore be read, processed, and written only through byte streams.
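The following sketch illustrates why binary data should not pass through a String: bytes that are not valid in the chosen charset are replaced during decoding, so encoding the text again does not give back the original bytes. The class BinaryRoundTrip and the sample bytes are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryRoundTrip {

    /** Decodes the bytes as UTF-8 text, then encodes that text back to bytes. */
    static byte[] throughString(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Arbitrary binary content; 0x89 and 0xFF are not valid UTF-8 on their own.
        byte[] binary = {(byte) 0x89, 0x50, 0x4E, 0x47, (byte) 0xFF, 0x00};
        byte[] result = throughString(binary);
        // Invalid bytes were replaced with U+FFFD while decoding, so the data changed.
        System.out.println("unchanged: " + Arrays.equals(binary, result)); // prints "unchanged: false"
    }
}
```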

For a text file the encoding is fixed. As long as you determine the file's encoding before reading, read the bytes, and then construct the String with that encoding, the resulting text will not be garbled. You can run the same encoding detection on a binary file, but the result is unreliable, so binary files cannot be handled the same way.
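As a small illustration of this point, the sketch below decodes the same bytes twice, once with the charset they were written in and once with a different one. The class DecodeDemo and its decode helper are illustrative names; ISO-8859-1 stands in for "any wrong charset".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    /** Builds a String from raw bytes using an explicitly chosen charset. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two CJK characters
        byte[] stored = original.getBytes(StandardCharsets.UTF_8); // bytes as they would sit in a file

        String right = decode(stored, StandardCharsets.UTF_8);
        String wrong = decode(stored, StandardCharsets.ISO_8859_1);

        System.out.println("matching charset restores the text: " + original.equals(right));
        System.out.println("wrong charset restores the text: " + original.equals(wrong));
    }
}
```

With the wrong charset each of the six stored bytes becomes its own character, which is exactly the garbled output the article is about.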

Here's how:

1) Detect the encoding of the text file

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;

    public class FileEncode {

        /**
         * Guesses the encoding of a text file from its BOM or, when there is no BOM,
         * from its byte patterns. Returns the placeholder "ASci" (not a real charset
         * name) when nothing else is detected; callers must map it to a real charset.
         */
        public static String getFileEncode(String path) {
            String charset = "ASci";
            byte[] first3Bytes = new byte[3];
            try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(path))) {
                boolean checked = false;
                // The original passed 0; the read limit must cover the 3 bytes read before reset().
                bis.mark(3);
                int read = bis.read(first3Bytes, 0, 3);
                if (read == -1)
                    return charset;
                if (first3Bytes[0] == (byte) 0xFF && first3Bytes[1] == (byte) 0xFE) {
                    charset = "Unicode"; // UTF-16LE
                    checked = true;
                } else if (first3Bytes[0] == (byte) 0xFE && first3Bytes[1] == (byte) 0xFF) {
                    charset = "Unicode"; // UTF-16BE
                    checked = true;
                } else if (first3Bytes[0] == (byte) 0xEF && first3Bytes[1] == (byte) 0xBB
                        && first3Bytes[2] == (byte) 0xBF) {
                    charset = "UTF8";
                    checked = true;
                }
                bis.reset();
                if (!checked) {
                    // No BOM: scan for the first multi-byte sequence and see whether it looks like UTF-8.
                    int loc = 0;
                    while ((read = bis.read()) != -1) {
                        loc++;
                        if (read >= 0xF0)
                            break;
                        if (0x80 <= read && read <= 0xBF) // a lone byte in 0x80-0xBF is also treated as GBK
                            break;
                        if (0xC0 <= read && read <= 0xDF) {
                            read = bis.read();
                            if (0x80 <= read && read <= 0xBF)
                                // Two-byte sequence (0xC0-0xDF)(0x80-0xBF); may also be valid GBK.
                                continue;
                            else
                                break;
                        } else if (0xE0 <= read && read <= 0xEF) { // may also be wrong, but the odds are small
                            read = bis.read();
                            if (0x80 <= read && read <= 0xBF) {
                                read = bis.read();
                                if (0x80 <= read && read <= 0xBF) {
                                    charset = "UTF-8";
                                    break;
                                } else
                                    break;
                            } else
                                break;
                        }
                    }
                    // TextLogger.getLogger().info(loc + " " + Integer.toHexString(read));
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            return charset;
        }

        private static String getEncode(int flag1, int flag2, int flag3) {
            String encode = "";
            // A txt file may start with a few extra bytes (a BOM): FF FE (Unicode),
            // FE FF (Unicode big endian), or EF BB BF (UTF-8).
            if (flag1 == 255 && flag2 == 254) {
                encode = "Unicode";
            } else if (flag1 == 254 && flag2 == 255) {
                encode = "UTF-16";
            } else if (flag1 == 239 && flag2 == 187 && flag3 == 191) {
                encode = "UTF8";
            } else {
                encode = "ASci"; // ASCII
            }
            return encode;
        }
    }
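The byte-scanning loop above decides on the first three-byte sequence it meets, so it can misjudge files. An alternative that is not from the original article is to let the JDK's strict decoder validate the whole buffer: if every byte sequence is well-formed UTF-8, the file is very likely UTF-8. This is a minimal sketch; the class Utf8Check and the method isValidUtf8 are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    /** Returns true when the bytes form well-formed UTF-8 from start to end. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u4e2d".getBytes(StandardCharsets.UTF_8); // E4 B8 AD
        byte[] gbk = {(byte) 0xD6, (byte) 0xD0};                  // the same character encoded in GBK
        System.out.println(isValidUtf8(utf8)); // prints "true"
        System.out.println(isValidUtf8(gbk));  // prints "false"
    }
}
```

Pure ASCII input also passes this check, so a caller still has to decide what "valid UTF-8 but only ASCII" should map to.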

2) Read the file using the detected encoding

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;

    /**
     * Reads the contents of the file at the given path. Because a String carries the
     * result, only text files can be read correctly (without garbling) this way.
     * This is the safe method.
     */
    public static String readFile(String path) {
        String data = null;
        // Check whether the file exists.
        File file = new File(path);
        if (!file.exists()) {
            return data;
        }
        // Detect the file's encoding.
        String code = FileEncode.getFileEncode(path);
        InputStreamReader isr = null;
        try {
            // Decode the file according to its encoding.
            if ("ASci".equals(code)) {
                // GBK is used here instead of the environment's encoding because the
                // environment default encoding is not necessarily the operating system's.
                // code = System.getProperty("file.encoding");
                code = "GBK";
            }
            isr = new InputStreamReader(new FileInputStream(file), code);
            // Read the file contents.
            int length = -1;
            char[] buffer = new char[1024];
            StringBuffer sb = new StringBuffer();
            // The assignment needs its own parentheses; otherwise != binds first.
            while ((length = isr.read(buffer, 0, 1024)) != -1) {
                sb.append(buffer, 0, length);
            }
            data = sb.toString();
        } catch (Exception e) {
            e.printStackTrace();
            // log is the enclosing class's logger; its declaration is not shown in the source.
            log.info("getFile IO Exception:" + e.getMessage());
        } finally {
            try {
                if (isr != null) {
                    isr.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
                log.info("getFile IO Exception:" + e.getMessage());
            }
        }
        return data;
    }

3) Write the file using the specified encoding

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStreamWriter;

    /**
     * Saves content to the given path using the given encoding. Because a String
     * carries the content, only text can be written correctly (without garbling)
     * this way. This is the safe method.
     *
     * @param data the byte data to write to the file
     * @param path the file path, including the file name
     * @param code the name of the encoding to use
     * @return true when writing completes
     */
    public static boolean writeFile(byte data[], String path, String code) {
        boolean flag = true;
        OutputStreamWriter osw = null;
        try {
            File file = new File(path);
            if (!file.exists()) {
                // Create missing parent directories; a bare file name has no parent.
                File parent = file.getParentFile();
                if (parent != null && !parent.exists()) {
                    parent.mkdirs();
                }
            }
            if ("ASci".equals(code)) {
                code = "GBK";
            }
            osw = new OutputStreamWriter(new FileOutputStream(path), code);
            osw.write(new String(data, code));
            osw.flush();
        } catch (Exception e) {
            e.printStackTrace();
            // log is the enclosing class's logger; its declaration is not shown in the source.
            log.info("toFile IO Exception:" + e.getMessage());
            flag = false;
        } finally {
            try {
                if (osw != null) {
                    osw.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
                log.info("toFile IO Exception:" + e.getMessage());
                flag = false;
            }
        }
        return flag;
    }
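The read and write steps only work together when both sides use the same charset. The sketch below writes text to a temporary file and reads it back, passing the charset explicitly on both sides. It is an illustration under assumptions, not the article's own code: the class TextRoundTrip and its methods are made-up names, and it uses try-with-resources and java.nio.file.Files for brevity.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TextRoundTrip {

    /** Writes the text to the file, encoding it with the given charset. */
    static void write(Path path, String text, Charset charset) throws IOException {
        try (Writer w = new OutputStreamWriter(Files.newOutputStream(path), charset)) {
            w.write(text);
        }
    }

    /** Reads the whole file, decoding it with the given charset. */
    static String read(Path path, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024];
        try (Reader r = new InputStreamReader(Files.newInputStream(path), charset)) {
            int n;
            while ((n = r.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    /** Writes the text to a temporary file and reads it back with the same charset. */
    static String roundTrip(String text, Charset charset) {
        try {
            Path tmp = Files.createTempFile("roundtrip", ".txt");
            try {
                write(tmp, text, charset);
                return read(tmp, charset);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 \u4e2d\u6587";
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_8)));
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_16LE)));
    }
}
```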

4) For small binary files, such as Word documents, read and write the file with the following methods

    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    /**
     * Reads the file at the given path into a byte array. This method can be used
     * for content that is not text.
     *
     * @param path the file path, including the file name
     * @return the file contents as a byte array
     */
    public static byte[] getFile(String path) throws IOException {
        // The original sized the array with available() and called read() once;
        // neither is guaranteed to cover the whole file, so read in a loop instead.
        try (FileInputStream stream = new FileInputStream(path);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = stream.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    /**
     * Writes the bytes to the given file. This method can be used for files that
     * are not text.
     *
     * @param data the byte data to write to the file
     * @param path the file path, including the file name
     * @return true when writing completes
     * @throws Exception if the file cannot be written
     */
    public static boolean toFile(byte data[], String path) throws Exception {
        try (FileOutputStream out = new FileOutputStream(path)) {
            out.write(data);
            out.flush();
        }
        return true;
    }
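On Java 7 and later, the same byte-level read and write can be done with java.nio.file.Files, which handles the read loop internally. This is an alternative to the stream code above rather than the article's method; the class BinaryFiles and its method names are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class BinaryFiles {

    /** Reads the whole file into a byte array without any character decoding. */
    static byte[] readAll(Path path) throws IOException {
        return Files.readAllBytes(path);
    }

    /** Writes the bytes to the file exactly as given, replacing existing content. */
    static void writeAll(Path path, byte[] data) throws IOException {
        Files.write(path, data);
    }

    /** Writes the bytes to a temporary file and reads them back. */
    static byte[] roundTrip(byte[] data) {
        try {
            Path tmp = Files.createTempFile("binary", ".bin");
            try {
                writeAll(tmp, data);
                return readAll(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0x89, 0x50, 0x4E, 0x47, (byte) 0xFF, 0x00};
        // No String is involved, so every byte survives the trip.
        System.out.println("unchanged: " + Arrays.equals(data, roundTrip(data))); // prints "unchanged: true"
    }
}
```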

