VC Programming: Reading and Writing Unicode Files in an ANSI Project
I had not realized that differences in file encoding could cause so many problems. Before starting, I searched through a lot of material, and I have collected many earlier authors' posts on my blog. My thanks to them!
I will not go into the principles of ANSI and Unicode encoding here; I will focus on how to read and write such files.
First, determine which character set your project uses. The default is ANSI. Reading and writing files works quite differently depending on the character set. Here I only cover the ANSI environment; how to read and write in a Unicode environment is the next thing to explore. (I did not understand this at first; after reading a lot of code, I found that my own experiments had been wrong.)
With the ANSI character set, CString and similar classes are single-byte (char-based) versions, so be careful. The Unicode files to be read are double-byte (UTF-16) files and must be converted. In an ANSI project, open the Unicode file in binary mode, detect line breaks yourself, and convert each line to ANSI. When writing Unicode, convert the characters to Unicode before writing them, and write the Unicode byte-order mark (BOM) at the start of the file.
The following code reads the file:
CFile mFile(unicodeFilePath, CFile::modeRead);
BYTE head[2];
BOOL isUnicode = FALSE;
mFile.Read(head, 2);
// FF FE = little-endian BOM, FE FF = big-endian BOM
if ((head[0] == 0xFF && head[1] == 0xFE) || (head[0] == 0xFE && head[1] == 0xFF))
{
    // AfxMessageBox(_T("File is Unicode!"));
    isUnicode = TRUE;
}
if (isUnicode) mFile.Seek(2, CFile::begin); // skip the BOM; the loop below assumes little-endian (FF FE) data
wchar_t wch;
wchar_t wstr[300];
int i = 0;
CString strValue;
while (mFile.Read(&wch, 2) == 2)
{
    if (wch == 0x000D) // end of line (CR)
    {
        // Change to ANSI: a DBCS character can take 2 bytes, plus 1 byte for the terminator
        int nLen = i;
        char* buf = new char[2 * nLen + 1];
        int nBytes = WideCharToMultiByte(CP_ACP, 0, wstr, nLen, buf, 2 * nLen, NULL, NULL);
        buf[nBytes] = 0; // important: terminate at the converted length, otherwise assertions can fail
        strValue = buf;
        delete[] buf;
        // ... use strValue (one line of text) here ...
        mFile.Seek(2, CFile::current); // skip the following LF (0x000A)
        i = 0;
    }
    else if (i < 300)
    {
        wstr[i++] = wch; // characters beyond the buffer size are dropped
    }
}
// Note: a last line without a trailing CR LF is not converted by this loop.
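Outside MFC, the same read logic can be sketched in standard C++ as a function that splits a UTF-16LE byte buffer into lines. This is a minimal sketch, not the code above: it assumes the BOM has already been checked and skipped and that the data is little-endian, and `splitUtf16LeLines` is a hypothetical name. It accepts CR LF, a lone CR or a lone LF as a line end, keeps a final line that has no line break, and has no fixed line-length limit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split UTF-16LE bytes (BOM already removed) into lines.
// CR LF, a lone CR, or a lone LF all end a line; the terminators are dropped.
std::vector<std::u16string> splitUtf16LeLines(const std::vector<unsigned char>& bytes)
{
    std::vector<std::u16string> lines;
    std::u16string current;
    bool pendingCr = false;  // true right after a CR, so a following LF is skipped
    // Walk the buffer two bytes at a time; an odd trailing byte is ignored.
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
    {
        // Little-endian: low byte first, high byte second.
        char16_t ch = static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8));
        if (pendingCr && ch == u'\n')
        {
            pendingCr = false;  // second half of CR LF: line already ended
            continue;
        }
        pendingCr = false;
        if (ch == u'\r' || ch == u'\n')
        {
            lines.push_back(current);
            current.clear();
            pendingCr = (ch == u'\r');
        }
        else
        {
            current.push_back(ch);
        }
    }
    // Keep a final line that has no terminator.
    if (!current.empty())
        lines.push_back(current);
    return lines;
}
```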
The following code writes the file:
CStdioFile transFile;
transFile.Open(strUnicodeSavePath, CFile::modeCreate | CFile::modeWrite | CFile::typeBinary);
WORD wSignature = 0xFEFF;
transFile.Write(&wSignature, 2); // Unicode BOM, stored as FF FE on little-endian Windows
char* pszAnsi = new char[strValue.GetLength() + 1];
_tcscpy(pszAnsi, strValue); // in an ANSI build, TCHAR is char
WCHAR* szwBuffer = new WCHAR[strValue.GetLength() + 1];
// Write to file (only if the conversion succeeded)
if (MultiByteToWideChar(CP_ACP, 0, pszAnsi, -1, szwBuffer, strValue.GetLength() + 1) > 0)
    transFile.Write(szwBuffer, lstrlenW(szwBuffer) * sizeof(WCHAR));
delete[] pszAnsi;
delete[] szwBuffer;
transFile.Close();
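The bytes that the write code puts on disk can also be sketched portably. The sketch below assumes the text is already UTF-16 (the job `MultiByteToWideChar` does above) and emits each code unit low byte first, so the result is UTF-16LE even on a big-endian host; writing a `WORD` directly relies on the machine being little-endian. `encodeUtf16LeWithBom` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: encode UTF-16 text as a little-endian byte stream,
// preceded by the BOM U+FEFF (which appears on disk as FF FE).
std::vector<unsigned char> encodeUtf16LeWithBom(const std::u16string& text)
{
    std::vector<unsigned char> out;
    out.reserve(2 + 2 * text.size());
    // BOM first, low byte then high byte.
    out.push_back(0xFF);
    out.push_back(0xFE);
    for (char16_t ch : text)
    {
        // Emit low byte then high byte regardless of the host's byte order.
        out.push_back(static_cast<unsigned char>(ch & 0xFF));
        out.push_back(static_cast<unsigned char>((ch >> 8) & 0xFF));
    }
    return out;
}
```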
Of course, you can also set your project to use the Unicode character set. Reading ANSI files in a Unicode project is troublesome too: when a file is read into a CString, each single-byte ANSI character has to be converted to a double-byte character, and you have to handle that yourself. I will explore this and write it up later.
This article is from the CSDN blog; please cite the source when reposting: http://blog.csdn.net/Augusdi/archive/2009/10/15/4677528.aspx
====================================================================================================
Extending the CStdioFile class to read Unicode text files
Recently I used CStdioFile to read an SQL script file, and during debugging the program could not read the file.
Looking at the file's format, I found it was Unicode: when an SQL script is exported, Unicode is the default option. To support both ANSI and Unicode, I found a CStdioFileEx class on the CodeProject site. While debugging it, however, I found that the Unicode build of the executable ran without errors, but the non-Unicode build failed, because that part of the code did not handle reaching the end of the file while reading. After my modification, CStdioFileEx works normally and automatically recognizes ANSI and Unicode formats when reading text files.
The header file is as follows:
#pragma once
#include <afx.h>

#define nUNICODE_BOM 0xFEFF // Unicode "byte order mark" which goes at start of file
#define sNEWLINE _T("\r\n") // New line characters
#define sDEFAULT_UNICODE_FILLER_CHAR "#" // Filler char used when no conversion from Unicode to local code page is possible

class CStdioFileEx : public CStdioFile
{
public:
    CStdioFileEx();
    CStdioFileEx(LPCTSTR lpszFileName, UINT nOpenFlags);
    virtual BOOL Open(LPCTSTR lpszFileName, UINT nOpenFlags, CFileException* pError = NULL);
    virtual BOOL ReadString(CString& rString);
    virtual void WriteString(LPCTSTR lpsz);
    BOOL IsFileUnicodeText() { return m_bIsUnicodeText; }
    unsigned long GetCharCount();
    // Additional flag to allow Unicode text writing
    static const UINT modeWriteUnicode;
    // Static utility functions
    //--------------------------------------------------------------------------------------------
    //
    // CStdioFileEx::GetUnicodeStringFromMultiByteString()
    //
    //--------------------------------------------------------------------------------------------
    // Returns: BOOL
    // Parameters: char* szMultiByteString (in) Multibyte input string
    //             wchar_t* szUnicodeString (out) Unicode output string
    //             short nUnicodeBufferSize (in) Size of Unicode output buffer, in wchar_t units
    //             UINT nCodePage (in) Code page used to perform conversion
    //             Default = -1 (get local code page).
    //
    // Purpose: Gets a Unicode string from a multibyte string.
    // Notes: None.
    // Exceptions: None.
    //
    static BOOL GetUnicodeStringFromMultiByteString(char* szMultiByteString, wchar_t* szUnicodeString,
        short nUnicodeBufferSize, UINT nCodePage = (UINT)-1);
    //--------------------------------------------------------------------------------------------
    //
    // CStdioFileEx::GetMultiByteStringFromUnicodeString()
    //
    //--------------------------------------------------------------------------------------------
    // Returns: BOOL
    // Parameters: wchar_t* szUnicodeString (in) Unicode input string
    //             char* szMultiByteString (out) Multibyte output string
    //             short nMultiByteBufferSize (in) Multibyte buffer size, in bytes
    //             UINT nCodePage (in) Code page used to perform conversion
    //             Default = -1 (get local code page).
    //
    // Purpose: Gets a multibyte string from a Unicode string.
    // Notes: None.
    // Exceptions: None.
    //
    static BOOL GetMultiByteStringFromUnicodeString(wchar_t* szUnicodeString, char* szMultiByteString,
        short nMultiByteBufferSize, UINT nCodePage = (UINT)-1);
    //--------------------------------------------------------------------------------------------
    //
    // CStdioFileEx::IsFileUnicode()
    //
    //--------------------------------------------------------------------------------------------
    // Returns: BOOL
    // Parameters: const CString& sFilePath
    //
    // Purpose: Determines whether a file is Unicode by reading the first character and detecting
    //          whether it's the Unicode byte-order mark.
    // Notes: None.
    // Exceptions: None.
    //
    static BOOL IsFileUnicode(const CString& sFilePath);
protected:
    UINT ProcessFlags(const CString& sFilePath, UINT& nOpenFlags);
    BOOL m_bIsUnicodeText;
    UINT m_nFlags;
};
The implementation file is as follows:
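/*static*/ const UINT CStdioFileEx::modeWriteUnicode = 0x20000; // Add this flag to write in Unicode
// Note: in newer MFC versions 0x20000 may coincide with CFile::osWriteThrough; check afx.h and pick an unused bit if so.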
CStdioFileEx::CStdioFileEx() : CStdioFile()
{
    m_bIsUnicodeText = FALSE;
    m_nFlags = 0;
}
CStdioFileEx::CStdioFileEx(LPCTSTR lpszFileName, UINT nOpenFlags)
    : CStdioFile(lpszFileName, ProcessFlags(lpszFileName, nOpenFlags))
{
}
BOOL CStdioFileEx::Open(LPCTSTR lpszFileName, UINT nOpenFlags, CFileException* pError /*= NULL*/)
{
    // Process any Unicode stuff
    ProcessFlags(lpszFileName, nOpenFlags);
    return CStdioFile::Open(lpszFileName, nOpenFlags, pError);
}
BOOL CStdioFileEx::ReadString(CString& rString)
{
    const int nMAX_LINE_CHARS = 4096;
    BOOL bReadData = FALSE;
    int nLen = 0;
    wchar_t* pszUnicodeString = NULL;
    char* pszMultiByteString = NULL;
    // If at position 0, discard byte-order mark before reading
    if (m_pStream && GetPosition() == 0 && m_bIsUnicodeText)
    {
        wchar_t cDummy;
        // Read(&cDummy, sizeof(_TCHAR));
        Read(&cDummy, sizeof(wchar_t));
    }
    // If compiled for Unicode
#ifdef _UNICODE
    // Do standard stuff -- both ANSI and Unicode cases seem to work OK
    bReadData = CStdioFile::ReadString(rString);
#else
    if (!m_bIsUnicodeText)
    {
        // Do standard stuff -- read ANSI in ANSI
        bReadData = CStdioFile::ReadString(rString);
    }
    else
    {
        pszUnicodeString = new wchar_t[nMAX_LINE_CHARS];
        // A double-byte (DBCS) character can need two bytes after conversion
        pszMultiByteString = new char[2 * nMAX_LINE_CHARS];
        // Read as Unicode, convert to ANSI; fgetws() returns NULL at end of file
        if (fgetws(pszUnicodeString, nMAX_LINE_CHARS, m_pStream) == NULL)
        {
            bReadData = FALSE;
        }
        else
        {
            bReadData = TRUE;
            if (GetMultiByteStringFromUnicodeString(pszUnicodeString, pszMultiByteString, 2 * nMAX_LINE_CHARS))
            {
                rString = (CString)pszMultiByteString;
            }
        }
        // Free the buffers on every path, including end of file
        delete[] pszUnicodeString;
        delete[] pszMultiByteString;
    }
#endif
    // Then remove end-of-line character if in Unicode text mode
    if (bReadData)
    {
        // Copied from filetxt.cpp but adapted to Unicode and then adapted for end-of-line being just '\r'.
        nLen = rString.GetLength();
        if (nLen > 1 && rString.Mid(nLen - 2) == sNEWLINE)
        {
            rString = rString.Left(nLen - 2);
        }
        else if (nLen != 0 && (rString.GetAt(nLen - 1) == _T('\r') || rString.GetAt(nLen - 1) == _T('\n')))
        {
            rString = rString.Left(nLen - 1);
        }
    }
    return bReadData;
}
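The end-of-line trimming at the end of `ReadString()` can be tried in isolation. `fgetws()` keeps the line terminator, and because a Unicode file is opened in binary mode, no CR LF translation happens, so a line may end in `\r\n`. Below is a standard C++ sketch of the same rule on `std::string` instead of `CString`; `trimLineEnding` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the end-of-line handling in ReadString():
// strip one trailing "\r\n", or else one trailing '\r' or '\n'.
std::string trimLineEnding(std::string line)
{
    const std::size_t n = line.size();
    if (n >= 2 && line.compare(n - 2, 2, "\r\n") == 0)
        line.resize(n - 2);
    else if (n >= 1 && (line[n - 1] == '\r' || line[n - 1] == '\n'))
        line.resize(n - 1);
    return line;
}
```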
//--------------------------------------------------------------------------------------------
//
// CStdioFileEx::WriteString()
//
//--------------------------------------------------------------------------------------------
// Returns: void
// Parameters: LPCTSTR lpsz
//
// Purpose: Writes a string to the file either in Unicode or in multibyte, depending on whether
//          the caller specified the CStdioFileEx::modeWriteUnicode flag. Override of base class function.
// Notes: If writing in Unicode we need to:
//        a) write the byte-order mark at the beginning of the file
//        b) write all strings in byte mode
//        - If we were compiled in Unicode, we need to convert Unicode to multibyte if
//          we want to write in multibyte.
//        - If we were compiled in multibyte, we need to convert multibyte to Unicode if
//          we want to write in Unicode.
// Exceptions: None.
//
void CStdioFileEx::WriteString(LPCTSTR lpsz)
{
    // If writing Unicode and at the start of the file, need to write byte mark
    if (m_nFlags & CStdioFileEx::modeWriteUnicode)
    {
        // If at position 0, write byte-order mark before writing anything else
        if (!m_pStream || GetPosition() == 0)
        {
            wchar_t cBOM = (wchar_t)nUNICODE_BOM;
            CFile::Write(&cBOM, sizeof(wchar_t));
        }
    }
    // If compiled in Unicode...
#ifdef _UNICODE
    // If writing Unicode, no conversion needed
    if (m_nFlags & CStdioFileEx::modeWriteUnicode)
    {
        // Write in byte mode
        CFile::Write(lpsz, (UINT)(lstrlen(lpsz) * sizeof(wchar_t)));
    }
    // Else if we don't want to write Unicode, need to convert
    else
    {
        int nChars = lstrlen(lpsz) + 1; // +1 for the terminating null
        int nBufferSize = nChars * 2;   // a DBCS character can need two bytes
        wchar_t* pszUnicodeString = new wchar_t[nChars];
        char* pszMultiByteString = new char[nBufferSize];
        // Copy string to Unicode buffer
        lstrcpy(pszUnicodeString, lpsz);
        // Get multibyte string
        if (GetMultiByteStringFromUnicodeString(pszUnicodeString, pszMultiByteString, (short)nBufferSize, GetACP()))
        {
            // Do standard write: write the converted byte length, not the character count
            CFile::Write((const void*)pszMultiByteString, (UINT)strlen(pszMultiByteString));
        }
        delete[] pszUnicodeString;
        delete[] pszMultiByteString;
    }
    // Else if *not* compiled in Unicode
#else
    // If writing Unicode, need to convert
    if (m_nFlags & CStdioFileEx::modeWriteUnicode)
    {
        int nChars = lstrlen(lpsz) + 1; // +1 for the terminating null
        wchar_t* pszUnicodeString = new wchar_t[nChars];
        char* pszMultiByteString = new char[nChars];
        // Copy string to multibyte buffer
        lstrcpy(pszMultiByteString, lpsz);
        // The output buffer size is given in wchar_t units, not bytes
        if (GetUnicodeStringFromMultiByteString(pszMultiByteString, pszUnicodeString, (short)nChars, GetACP()))
        {
            // Write in byte mode; a DBCS string yields fewer wide characters than bytes
            CFile::Write(pszUnicodeString, (UINT)(wcslen(pszUnicodeString) * sizeof(wchar_t)));
        }
        else
        {
            ASSERT(FALSE);
        }
        delete[] pszUnicodeString;
        delete[] pszMultiByteString;
    }
    // Else if we don't want to write Unicode, no conversion needed
    else
    {
        // Do standard stuff
        CStdioFile::WriteString(lpsz);
    }
#endif
}
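`WriteString()` writes the BOM only when the file position is 0, so repeated calls add it once. The same idea in standard C++, assuming any binary `std::ostream` that supports `tellp()` (for example a `std::ofstream` opened with `std::ios::binary`); `writeUtf16Le` is a hypothetical name.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: append UTF-16LE text to a binary stream, writing the
// BOM first only when nothing has been written yet (position 0), the same
// check WriteString() makes with GetPosition() == 0.
void writeUtf16Le(std::ostream& out, const std::u16string& text)
{
    if (out.tellp() == std::streampos(0))
    {
        out.put(static_cast<char>(0xFF));
        out.put(static_cast<char>(0xFE));
    }
    for (char16_t ch : text)
    {
        out.put(static_cast<char>(ch & 0xFF));        // low byte
        out.put(static_cast<char>((ch >> 8) & 0xFF)); // high byte
    }
}
```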
UINT CStdioFileEx::ProcessFlags(const CString& sFilePath, UINT& nOpenFlags)
{
    m_bIsUnicodeText = FALSE;
    // If we have modeWriteUnicode we must have modeWrite or modeReadWrite as well
#ifdef _DEBUG
    if (nOpenFlags & CStdioFileEx::modeWriteUnicode)
    {
        ASSERT((nOpenFlags & CFile::modeWrite) || (nOpenFlags & CFile::modeReadWrite));
    }
#endif
    // If reading in text mode and not creating... (test the flags passed in; m_nFlags is only set below)
    if ((nOpenFlags & CFile::typeText) && !(nOpenFlags & CFile::modeCreate) && !(nOpenFlags & CFile::modeWrite))
    {
        m_bIsUnicodeText = IsFileUnicode(sFilePath);
        // If it's Unicode, switch to binary mode
        if (m_bIsUnicodeText)
        {
            nOpenFlags &= ~CFile::typeText;
            nOpenFlags |= CFile::typeBinary;
        }
    }
    m_nFlags = nOpenFlags;
    return nOpenFlags;
}
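The flag switch in `ProcessFlags()` is plain bit manipulation. In the sketch below the constants are hypothetical stand-ins whose values are assumed to mirror `CFile::modeWrite`, `modeCreate`, `typeText` and `typeBinary`; check `afx.h` before relying on them. Clearing the text bit with `&= ~` states the intent more directly than a `^=` toggle, which only clears the bit because it is known to be set.

```cpp
#include <cassert>

// Hypothetical flag values for illustration only; assumed to mirror CFile's
// modeWrite, modeCreate, typeText and typeBinary. Check afx.h.
enum OpenFlags : unsigned
{
    kModeWrite  = 0x0001,
    kModeCreate = 0x1000,
    kTypeText   = 0x4000,
    kTypeBinary = 0x8000,
};

// Mirror of the switch in ProcessFlags(): when an existing file is opened for
// reading in text mode and it is Unicode, replace the text flag with the
// binary flag so the C runtime applies no CR LF translation to UTF-16 bytes.
unsigned adjustFlagsForUnicode(unsigned flags, bool fileIsUnicode)
{
    const bool readingText = (flags & kTypeText) != 0
                          && (flags & kModeCreate) == 0
                          && (flags & kModeWrite) == 0;
    if (readingText && fileIsUnicode)
    {
        flags &= ~static_cast<unsigned>(kTypeText);
        flags |= kTypeBinary;
    }
    return flags;
}
```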
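//--------------------------------------------------------------------------------------------
// CStdioFileEx::IsFileUnicode()
//--------------------------------------------------------------------------------------------
// Returns: BOOL
// Parameters: const CString& sFilePath
// Purpose: Determines whether a file is Unicode by reading the first character and detecting
//          whether it's the Unicode byte-order mark.
// Notes: None.
// Exceptions: None.
//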
/*static*/ BOOL CStdioFileEx::IsFileUnicode(const CString& sFilePath)
{
    CFile file;
    BOOL bIsUnicode = FALSE;
    wchar_t cFirstChar;
    CFileException exFile;
    // Open file in binary mode and read first character
    if (file.Open(sFilePath, CFile::typeBinary | CFile::modeRead, &exFile))
    {
        // If byte is Unicode byte-order marker, let's say it's Unicode
        if (file.Read(&cFirstChar, sizeof(wchar_t)) == sizeof(wchar_t) && cFirstChar == (wchar_t)nUNICODE_BOM)
        {
            bIsUnicode = TRUE;
        }
        file.Close();
    }
    else
    {
        // Handle error here if you like
    }
    return bIsUnicode;
}
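`IsFileUnicode()` compares the first `wchar_t` of the file with `nUNICODE_BOM` (0xFEFF). On little-endian Windows that matches the bytes FF FE, so only UTF-16LE files are detected; a big-endian file starting with FE FF reads as 0xFFFE and is treated as ANSI. A byte-wise sketch in standard C++ that does not depend on the host byte order (`startsWithUtf16LeBom` is a hypothetical name):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical portable counterpart of IsFileUnicode(): true when the stream
// starts with the UTF-16LE byte-order mark FF FE. Comparing individual bytes
// avoids depending on the host's byte order.
bool startsWithUtf16LeBom(std::istream& in)
{
    char bytes[2] = {0, 0};
    if (!in.read(bytes, 2))
        return false;  // fewer than two bytes: cannot be a BOM
    return static_cast<unsigned char>(bytes[0]) == 0xFF
        && static_cast<unsigned char>(bytes[1]) == 0xFE;
}
```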
unsigned long CStdioFileEx::GetCharCount()
{
    int nCharSize;
    unsigned long nByteCount, nCharCount = 0;
    if (m_pStream)
    {
        // Get size of chars in file
        nCharSize = m_bIsUnicodeText ? sizeof(wchar_t) : sizeof(char);
        // If Unicode, remove byte order mark from count
        nByteCount = (unsigned long)GetLength();
        if (m_bIsUnicodeText)
        {
            nByteCount = nByteCount - sizeof(wchar_t);
        }
        // Calc chars
        nCharCount = (nByteCount / nCharSize);
    }
    return nCharCount;
}
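The arithmetic in `GetCharCount()` reduces to one formula: for a Unicode file, subtract the 2-byte BOM and divide by 2; for an ANSI file, count bytes. The standalone sketch below (`charCount` is hypothetical) adds a guard against files shorter than the BOM. The Unicode result counts UTF-16 code units, so a character outside the Basic Multilingual Plane counts as two, and the ANSI result counts bytes, so a double-byte character counts as two as well.

```cpp
#include <cassert>

// Hypothetical counterpart of GetCharCount(): Unicode files use 2 bytes per
// code unit after a 2-byte BOM; ANSI files are counted one byte per character.
unsigned long charCount(unsigned long byteCount, bool isUnicode)
{
    if (!isUnicode)
        return byteCount;
    if (byteCount < 2)
        return 0;  // guard: no room even for the BOM, avoid unsigned wrap-around
    return (byteCount - 2) / 2;
}
```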
//--------------------------------------------------------------------------------------------
//
// CStdioFileEx::GetUnicodeStringFromMultiByteString()
//
//--------------------------------------------------------------------------------------------
// Returns: BOOL
// Parameters: char* szMultiByteString (in) Multibyte input string
//             wchar_t* szUnicodeString (out) Unicode output string
//             short nUnicodeBufferSize (in) Size of Unicode output buffer, in wchar_t units
//             UINT nCodePage (in) Code page used to perform conversion
//             Default = -1 (get local code page).
//
// Purpose: Gets a Unicode string from a multibyte string.
// Notes: None.
// Exceptions: None.
//
BOOL CStdioFileEx::GetUnicodeStringFromMultiByteString(char* szMultiByteString, wchar_t* szUnicodeString, short nUnicodeBufferSize, UINT nCodePage)
{
    BOOL bOK = TRUE;
    int nReturn = 0;
    CString sErrorMsg;
    if (szUnicodeString && szMultiByteString)
    {
        // If no code page specified, take default for system
        if (nCodePage == (UINT)-1)
        {
            nCodePage = GetACP();
        }
        try
        {
            nReturn = MultiByteToWideChar(nCodePage, MB_PRECOMPOSED, szMultiByteString, -1, szUnicodeString, nUnicodeBufferSize);
            if (nReturn == 0)
            {
                bOK = FALSE;
            }
        }
        catch (...)
        {
            bOK = FALSE;
        }
    }
    else
    {
        bOK = FALSE;
    }
    ASSERT(bOK);
    return bOK;
}
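To see the contract of `GetUnicodeStringFromMultiByteString()` without the Win32 API, the sketch below assumes the single-byte code page is ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point. That is an illustrative assumption only: real ANSI code pages such as 1252 or 936 need `MultiByteToWideChar`. Like the call above with a length of -1, it takes a null-terminated input, counts the output buffer in wide characters including the terminator, and fails if the buffer is too small. `widenLatin1` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: widen a null-terminated Latin-1 string into dst.
// dstChars is the size of dst in char16_t units, including the terminator.
// Returns false (and writes nothing) if an argument is null or dst is too small.
bool widenLatin1(const char* src, char16_t* dst, std::size_t dstChars)
{
    if (src == nullptr || dst == nullptr)
        return false;
    const std::size_t len = std::strlen(src);
    if (len + 1 > dstChars)
        return false;
    for (std::size_t i = 0; i < len; ++i)
        dst[i] = static_cast<unsigned char>(src[i]);  // Latin-1: byte value == code point
    dst[len] = u'\0';
    return true;
}
```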
//--------------------------------------------------------------------------------------------
//
// CStdioFileEx::GetMultiByteStringFromUnicodeString()
//
//--------------------------------------------------------------------------------------------
// Returns: BOOL
// Parameters: wchar_t* szUnicodeString (in) Unicode input string
//             char* szMultiByteString (out) Multibyte output string
//             short nMultiByteBufferSize (in) Multibyte buffer size, in bytes
//             UINT nCodePage (in) Code page used to perform conversion
//             Default = -1 (get local code page).
//
// Purpose: Gets a multibyte string from a Unicode string.
// Notes: None.
// Exceptions: None.
//
BOOL CStdioFileEx::GetMultiByteStringFromUnicodeString(wchar_t* szUnicodeString, char* szMultiByteString,
    short nMultiByteBufferSize, UINT nCodePage)
{
    BOOL bUsedDefChar = FALSE;
    BOOL bGotIt = FALSE;
    if (szUnicodeString && szMultiByteString)
    {
        // If no code page specified, take default for system
        if (nCodePage == (UINT)-1)
        {
            nCodePage = GetACP();
        }
        try
        {
            // A non-zero return value (bytes written) means success
            bGotIt = WideCharToMultiByte(nCodePage, WC_COMPOSITECHECK | WC_SEPCHARS,
                szUnicodeString, -1, szMultiByteString, nMultiByteBufferSize, sDEFAULT_UNICODE_FILLER_CHAR, &bUsedDefChar) != 0;
        }
        catch (...)
        {
            TRACE(_T("Controlled exception in WideCharToMultiByte!\n"));
        }
    }
    return bGotIt;
}
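The last two arguments passed to `WideCharToMultiByte` above, `sDEFAULT_UNICODE_FILLER_CHAR` and `&bUsedDefChar`, make the API substitute `#` for characters the code page cannot represent and report whether that happened. A portable sketch of the same behavior, again assuming Latin-1 as the target code page so only U+0000 to U+00FF survive; `narrowLatin1` is hypothetical, and each code unit above 0xFF (including each half of a surrogate pair) becomes one filler.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: narrow UTF-16 text to Latin-1. Code units above 0xFF
// cannot be represented and are replaced by the filler character; usedFiller
// reports whether any substitution happened.
std::string narrowLatin1(const std::u16string& text, char filler, bool& usedFiller)
{
    std::string out;
    out.reserve(text.size());
    usedFiller = false;
    for (char16_t ch : text)
    {
        if (ch <= 0xFF)
        {
            out.push_back(static_cast<char>(ch));
        }
        else
        {
            out.push_back(filler);
            usedFiller = true;
        }
    }
    return out;
}
```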
This article is from the CSDN blog; please cite the source when reposting: http://blog.csdn.net/Augusdi/archive/2009/10/15/4677520.aspx