Garbled Chinese characters when a CSV file is opened in Excel by default / BOM (byte order mark)

Source: Internet
Author: User
Tags: UltraEdit, Zen Cart

Data extracted from the database is saved as a CSV text file with encoding = 'utf8'.

When the file is opened in Excel, the Chinese characters are garbled by default.

1. Use a suitable version of Excel, for example the Chinese edition of Excel 2003 (I have not tried this).

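2. Open the file in EditPlus, save it as Unicode, and then open it in Excel. The Chinese characters now display correctly, but everything lands in a single column (Excel apparently treats a Unicode text file as tab-delimited rather than comma-delimited). Use Data --> Text to Columns to split it.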

3. When writing the CSV file, write the UTF-8 signature (the BOM bytes EF BB BF) at the very start of the file. I have not tried this, but it should work.

4. References:

http://www.java2000.net/viewthread.jsp?tid=5380

http://www.java2000.net/viewthread.jsp?tid=7378
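Method 3 is the easiest one to automate. Below is a minimal Java sketch under some assumptions: the class name Utf8BomCsv, the output file export.csv and the sample rows are hypothetical, and fields are joined naively (no quoting), so it only suits data without commas, quotes or line breaks. It prepends U+FEFF, which UTF-8 encodes as EF BB BF; whether a particular Excel version then recognizes the file as UTF-8 is not verified here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class Utf8BomCsv {
    // U+FEFF, the byte order mark; UTF-8 encodes it as EF BB BF.
    private static final char BOM = '\uFEFF';

    /** Builds the CSV text with a leading BOM and returns it encoded as UTF-8. */
    public static byte[] encode(List<String[]> rows) {
        StringBuilder sb = new StringBuilder().append(BOM);
        for (String[] row : rows) {
            // Naive join: assumes no field contains a comma, a quote or a line break.
            sb.append(String.join(",", row)).append("\r\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Writes the encoded CSV to the given path, replacing any existing file. */
    public static void write(Path path, List<String[]> rows) throws IOException {
        Files.write(path, encode(rows));
    }

    public static void main(String[] args) throws IOException {
        List<String[]> rows = Arrays.asList(
                new String[] {"id", "name"},
                new String[] {"1", "\u4f60\u597d"}); // two Chinese characters ("hello")
        write(Paths.get("export.csv"), rows); // hypothetical output file
    }
}
```

Encoding into a byte array first keeps the BOM logic testable without touching the file system.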

 

 

========== The two references above are reproduced below: ==========



UTF-8 file signature: EF BB BF

Unicode signature BOM (byte order mark)

Reference:

http://blog.csdn.net/thimin/archive/2007/08/03/1724393.aspx

I recently tested a UTF-8 encoded Chinese Zen Cart website and ran into a strange problem. The text displayed on the web page was normal, but when I viewed the source in IE (which opens it in Notepad), the characters were garbled. Firefox did not have this problem. After searching online and testing many times, I traced it to the Unicode signature of UTF-8 files, the BOM (byte order mark).

The BOM (byte order mark) is a marker used by the UTF encoding schemes. In UTF-16 it appears as FF FE (little-endian; FE FF in big-endian), and in UTF-8 it becomes EF BB BF. In UTF-8 the mark is optional, because UTF-8 has no byte order to signal; it can, however, be used to detect whether a byte stream is UTF-8 encoded. Microsoft software performs this detection, but some software does not and treats the mark as ordinary characters.
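Java's standard UTF-8 decoder is one example of software that does not treat the mark specially: it decodes EF BB BF into the character U+FEFF at the start of the string. A small sketch (class and method names are illustrative) showing that behavior next to a manual skip of the mark:

```java
import java.nio.charset.StandardCharsets;

public class BomAsCharacter {
    /** Decodes UTF-8 bytes as a BOM-unaware reader would: the BOM survives as U+FEFF. */
    public static String decodeNaively(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes, skipping a leading EF BB BF if present. */
    public static String decodeSkippingBom(byte[] bytes) {
        int offset = 0;
        if (bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF) {
            offset = 3;
        }
        return new String(bytes, offset, bytes.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', ',', 'b'};
        System.out.println(decodeNaively(data).length());     // 4: U+FEFF plus "a,b"
        System.out.println(decodeSkippingBom(data).length()); // 3: just "a,b"
    }
}
```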

 

Microsoft adds the three bytes EF BB BF to the front of UTF-8 text files, and Windows Notepad and other programs rely on these three bytes to decide whether a text file is ASCII or UTF-8. However, this is only a Microsoft convention; UTF-8 text files on other platforms do not carry this mark.

In other words, a UTF-8 file may or may not have a BOM. How can you tell? There are three methods:
1. Open the file in UltraEdit-32, switch to hexadecimal edit mode, and check whether the file starts with EF BB BF.
2. Open it in Dreamweaver and check in the page properties whether the Unicode Signature (BOM) option is ticked.
3. Open it in Windows Notepad and choose "Save As". The default encoding shown is either UTF-8 or ANSI; ANSI means the file has no BOM.
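Method 1 (inspecting the first three bytes) is also easy to do in code, for example to check many template files at once. A minimal Java sketch; the class name BomCheck is illustrative:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BomCheck {
    /** True if the array starts with EF BB BF, the UTF-8 signature. */
    public static boolean startsWithUtf8Bom(byte[] head) {
        return head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
    }

    /** Reads at most the first three bytes of a file and checks them for the UTF-8 signature. */
    public static boolean hasUtf8Bom(Path file) throws IOException {
        byte[] head = new byte[3];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until 3 bytes or end of file.
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        }
        return n == head.length && startsWithUtf8Bom(head);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BomCheck html_header.php other.php ...
        for (String arg : args) {
            System.out.println(arg + ": " + (hasUtf8Bom(Paths.get(arg)) ? "BOM" : "no BOM"));
        }
    }
}
```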

 

I checked html_header.php among the Zen Cart template files and found that it did not contain a BOM. After saving it with a BOM in UltraEdit-32 and uploading the new html_header.php, everything worked normally.

Note: when ConvertZ converts a GB2312 file to UTF-8, its default setting is not to add a BOM. Without the BOM, the garbled characters described above may appear. With the BOM, on the other hand, be careful when PHP includes the file: PHP outputs the bytes EF BB BF as part of the page, and output sent this early can cause program errors (for example, later header() calls fail because output has already started). One solution is to save all include files as ANSI, while the main file can stay UTF-8. To remove a BOM, open the file in UltraEdit, switch to hexadecimal edit mode, and replace each of the first three bytes, EF BB BF (that's the damn thing), with 20 (a space). Save the file (make sure the automatic backup function is disabled when saving), switch back to the normal edit mode, and delete the three leading spaces.
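The UltraEdit procedure above amounts to deleting the first three bytes when they are EF BB BF. For many include files, a small program can do the same. This Java sketch is only an illustration (the class name BomStripper is hypothetical); it reads each file fully into memory and overwrites it in place, so keep backups.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class BomStripper {
    /** Returns the data without a leading EF BB BF, or the same array if there is none. */
    public static byte[] stripUtf8Bom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    /** Rewrites the file without its BOM; returns true if a BOM was removed. */
    public static boolean stripFile(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        byte[] stripped = stripUtf8Bom(data);
        if (stripped == data) {
            return false; // no BOM, leave the file untouched
        }
        Files.write(file, stripped);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BomStripper includes/a.php includes/b.php ...  (overwrites in place)
        for (String arg : args) {
            if (stripFile(Paths.get(arg))) {
                System.out.println("removed BOM: " + arg);
            }
        }
    }
}
```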

 

I also picked up a little encoding knowledge along the way. A so-called "Unicode" file is actually UTF-16; it just uses the same code values as Unicode. Conceptually, however, Unicode and UTF are two different things: Unicode is the scheme for representing characters in memory, while UTF is the scheme for storing and transmitting Unicode. UTF-16 comes in two byte orders: low byte first (LE) and high byte first (BE). The official UTF encodings also include UTF-32, which is likewise split into LE and BE. A UTF encoding not officially part of Unicode is UTF-7, used mainly for mail transmission. The single-byte part of UTF-8 is compatible with ASCII (the lower half of ISO-8859-1). UTF-8 came about mainly because some old systems and library functions cannot handle UTF-16 correctly, and for English characters it also saves storage space (at the cost of wasting space on non-English characters). English (ASCII) characters take one byte in both UTF-8 and ISO-8859-1; other characters take two or three bytes in UTF-8 (four for characters outside the Basic Multilingual Plane).
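The byte counts and the LE/BE distinction above are easy to check. A small Java sketch (class name illustrative) that prints how sample characters are encoded: for example, U+4E2D is E4 B8 AD in UTF-8, 4E 2D in UTF-16BE and 2D 4E in UTF-16LE, while e-acute is one byte (E9) in ISO-8859-1 but two (C3 A9) in UTF-8. Java's UTF_16BE and UTF_16LE charsets write no BOM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedSizes {
    /** Encodes the text and returns the bytes as space-separated hex, e.g. "E4 B8 AD". */
    public static String encodeHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // 'A' (ASCII), e-acute (Latin-1), and the Chinese character U+4E2D.
        // ISO-8859-1 cannot encode U+4E2D, so getBytes substitutes '?' (3F) for it.
        String[] samples = {"A", "\u00E9", "\u4E2D"};
        Charset[] charsets = {StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8,
                StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE};
        for (String s : samples) {
            for (Charset cs : charsets) {
                System.out.printf("U+%04X %-10s %s%n", s.codePointAt(0), cs.name(), encodeHex(s, cs));
            }
        }
    }
}
```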

 

----------------------------------------

For work I have to create files in UTF-16 format. I have been searching for the past few days, but there does not seem to be anything on this topic, either in foreign or in Chinese sources.

Code:

import java.io.*;

// In Java, "UTF-16" means big-endian output preceded by a BOM (FE FF)
try (Writer fos = new OutputStreamWriter(new FileOutputStream(new File("C:\\2.csv")), "UTF-16")) {
    fos.write("Hello");
}

After the file is generated, I opened it on Windows. EditPlus shows garbled characters. Notepad opens it without garbled characters, and when saving, the encoding shown is Unicode big endian. WordPad shows garbled characters, and Excel is still garbled as well.
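This matches what the java.nio.charset.Charset documentation says about the "UTF-16" charset: when encoding, it uses big-endian byte order and writes a big-endian BOM (FE FF), which is why Notepad reports "Unicode big endian". This sketch (class name illustrative) captures in memory the bytes the code above would write to the file:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf16DefaultBom {
    /** Writes text through an OutputStreamWriter using "UTF-16" and returns the resulting bytes. */
    public static byte[] encode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_16)) {
            w.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encode("Hello")) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        // Prints: FE FF 00 48 00 65 00 6C 00 6C 00 6F
        System.out.println(hex.toString().trim());
    }
}
```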

 

Having no other option, I used the following code to create a UTF-8 file instead:

import java.io.*;

// Java's UTF-8 encoder writes no BOM: the file starts directly with the text bytes
try (Writer fos = new OutputStreamWriter(new FileOutputStream(new File("C:\\2.csv")), "UTF-8")) {
    fos.write("Hello");
}

Has anyone run into this problem? Please take a look.

A: This is a simple problem; why make it so complicated? The original poster's problem is: save Chinese characters in UTF-16 format so that on Windows, WordPad, Excel and Word can all open the file normally without garbled characters (in particular, Excel must be able to open the .csv file for import). That's easy. Use my code and you are guaranteed 100% success:


import java.io.*;

try (Writer fos = new OutputStreamWriter(new FileOutputStream(new File("C:\\2.csv")), "UTF-16LE")) {
    fos.write(0xFEFF); // BOM: U+FEFF, written little-endian as the bytes FF FE
    fos.write("Hello");
}
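The answer's approach can be packaged for multi-column data. A hedged Java sketch follows; the class name, file name and rows are hypothetical. One assumption goes beyond the answer: since method 2 at the top of this article found that a Unicode file opened in Excel put everything in a single column, the sketch separates fields with tabs rather than commas, on the guess that Excel treats such a file as tab-delimited.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class Utf16LeCsv {
    /** Encodes rows as UTF-16LE with a leading BOM; fields are tab-separated (an assumption). */
    public static byte[] encode(List<String[]> rows) {
        // U+FEFF written in little-endian order becomes the bytes FF FE.
        StringBuilder sb = new StringBuilder().append('\uFEFF');
        for (String[] row : rows) {
            // Naive join: assumes no field contains a tab, a quote or a line break.
            sb.append(String.join("\t", row)).append("\r\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_16LE);
    }

    /** Writes the encoded rows to the given path, replacing any existing file. */
    public static void write(Path path, List<String[]> rows) throws IOException {
        Files.write(path, encode(rows));
    }

    public static void main(String[] args) throws IOException {
        List<String[]> rows = Arrays.asList(
                new String[] {"id", "name"},
                new String[] {"1", "\u4f60\u597d"}); // two Chinese characters ("hello")
        write(Paths.get("export.csv"), rows); // hypothetical output file
    }
}
```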

 

A:

The poster has been told to write a few bytes identifying the encoding at the start of the file. Some Windows programs recognize them, but keep this in mind the next time your own program processes such a file: the first few bytes have a special meaning.

Below is some background on the BOM:

In UCS encoding there is a character called "ZERO WIDTH NO-BREAK SPACE", whose code is FEFF. FFFE is a character that does not exist in UCS, so it should never appear in actual transmission. The UCS specification recommends transmitting the character "ZERO WIDTH NO-BREAK SPACE" before transmitting a byte stream. Then, if the receiver receives FEFF, the byte stream is big-endian; if it receives FFFE, the byte stream is little-endian. For this reason the character "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.
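The detection rule just described, extended with the UTF-8 signature discussed next, takes only a few comparisons. A minimal Java sketch (class name illustrative); it deliberately ignores UTF-32, whose little-endian BOM FF FE 00 00 would be misread here as UTF-16LE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    /** Returns the charset named by a leading BOM, or null if the bytes start with no known BOM. */
    public static Charset detect(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE; // FEFF read as-is: big-endian
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE; // FFFE is a noncharacter, so the bytes must be swapped
        }
        return null; // no BOM: the encoding has to be guessed or known in advance
    }

    public static void main(String[] args) {
        System.out.println(detect(new byte[] {(byte) 0xFE, (byte) 0xFF, 0x00, 0x48})); // UTF-16BE
        System.out.println(detect(new byte[] {(byte) 0xFF, (byte) 0xFE, 0x48, 0x00})); // UTF-16LE
        System.out.println(detect(new byte[] {0x48, 0x69}));                           // null
    }
}
```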

UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to indicate the encoding. The UTF-8 encoding of the character "ZERO WIDTH NO-BREAK SPACE" is EF BB BF, so if a receiver gets a byte stream beginning with EF BB BF, it knows the stream is UTF-8 encoded.
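Where EF BB BF comes from: code points U+0800 to U+FFFF use the three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx. FEFF split into 4 + 6 + 6 bits is 1111 | 111011 | 111111, giving 11101111 (EF), 10111011 (BB) and 10111111 (BF). A small Java sketch of that arithmetic (class name illustrative), cross-checked against the JDK encoder:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8OfFeff {
    /** Encodes a code point in U+0800..U+FFFF (excluding surrogates) with the three-byte UTF-8 pattern. */
    public static int[] encodeThreeByte(int cp) {
        if (cp < 0x0800 || cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not a three-byte UTF-8 code point: " + cp);
        }
        return new int[] {
            0xE0 | (cp >> 12),         // 1110xxxx: top 4 bits
            0x80 | ((cp >> 6) & 0x3F), // 10xxxxxx: middle 6 bits
            0x80 | (cp & 0x3F)         // 10xxxxxx: low 6 bits
        };
    }

    public static void main(String[] args) {
        int[] manual = encodeThreeByte(0xFEFF);
        for (int b : manual) {
            System.out.printf("%02X ", b);
        }
        System.out.println(); // the line above prints EF BB BF
        // Cross-check against the JDK's own UTF-8 encoder.
        byte[] library = "\uFEFF".getBytes(StandardCharsets.UTF_8);
        int[] libraryUnsigned = {library[0] & 0xFF, library[1] & 0xFF, library[2] & 0xFF};
        System.out.println(Arrays.equals(manual, libraryUnsigned)); // true
    }
}
```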

Windows uses the BOM to mark the encoding of text files.

Personally, I do not recommend this, because it brings endless trouble.
