I recently ran into a problem when using ifstream to open files whose paths contain Chinese characters. Searching online showed that this is a common problem. Many people offer solutions in their blog posts, but most of those posts are reposts, and few explain the cause clearly and completely. I therefore think a detailed analysis of the cause is worthwhile, and I hope it helps others who hit the same problem.
First, a simple example that reproduces the problem:
(1) On the project's "Property Pages" dialog in VS2008, select "Configuration Properties" --> "General". The current character set is shown as "Multi-Byte Character Set", that is, the program uses the multi-byte character set.
(2) Next, a simple piece of code that opens a TXT file with ifstream:
#include "stdafx.h"
#include <fstream>
#include <iostream>
using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    // The file name contains Chinese characters.
    ifstream infile("D://测试.txt");
    if (infile.is_open())
    {
        cout << "Open success!";
    }
    else
    {
        cout << "Open fail!";
    }
    return 0;
}
(3) Result: the program prints "Open fail!" (the file could not be opened).
As the setting options show, the project character set can be either "Multi-Byte Character Set" or "Unicode Character Set". The former means the program uses ANSI encoding, and the latter means it uses Unicode encoding.
So what are the differences between the two encoding methods?
(1) Traditionally, programs used ANSI encoding. In ANSI encoding, English characters take 1 byte, while the characters of some other languages (such as Chinese and Japanese) cannot be represented in a single byte, so ANSI uses multiple bytes for them (a Chinese character takes 2 bytes, for example in GBK).
(2) Unicode includes UTF-8, UTF-16, UTF-32 and other encoding schemes (Windows currently uses UTF-16). In UTF-16, characters in the Basic Multilingual Plane, whether English letters or common Chinese characters, take 2 bytes, and characters outside that range are represented by a surrogate pair (4 bytes).
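As a rough illustration of the UTF-16 rule in (2), the following sketch encodes a single code point into 16-bit units. The function name encode_utf16 is made up for this example and is not part of any library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp) {
    // Valid code points end at U+10FFFF and exclude the surrogate range itself.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::vector<std::uint16_t> units;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit (2 bytes).
        units.push_back(static_cast<std::uint16_t>(cp));
    } else {
        // Everything else: a surrogate pair, two 16-bit units (4 bytes).
        std::uint32_t v = cp - 0x10000;  // 20 significant bits remain
        units.push_back(static_cast<std::uint16_t>(0xD800 + (v >> 10)));
        units.push_back(static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)));
    }
    return units;
}
```

With this sketch, encode_utf16(0x4E2D) (the character 中) yields the single unit 0x4E2D, while encode_utf16(0x20000) yields the pair 0xD840, 0xDC00.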
Unicode has many advantages over ANSI (what are the advantages?). Microsoft strongly advocates Unicode encoding, and its newer systems use it internally. Therefore, even if our own program uses ANSI encoding, the system converts the strings to Unicode before processing them.
Next, ifstream itself. When the open method of ifstream is called, the implementation internally calls mbstowcs_s to convert the file name (mbstowcs_s converts a multi-byte string into a wide-character string). Note that the result of this function depends on the program's locale settings (what are locale settings?). The locale can be set with the setlocale function. For example, setlocale(LC_ALL, "Chinese") sets the program's language to Chinese. When a program starts, LC_ALL is set to "C" by default. mbstowcs_s converts a Chinese string to the correct wide characters only when LC_ALL is set to "Chinese". Otherwise (when LC_ALL is "C"), each Chinese character is treated as two single-byte characters, and each of those is converted to a wide character separately. The result is obviously wrong! This is why ifstream fails to open a file whose path contains Chinese characters: the path in the example is converted to an incorrect wide-character path, so the file cannot be opened.
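To make the failure concrete, here is a small simulation of the byte-by-byte widening described above. It does not call mbstowcs_s. The helper widen_bytewise is a hypothetical stand-in that assumes the conversion under the "C" locale behaves as the paragraph says.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the conversion under the "C" locale:
// every byte is widened on its own, so multi-byte sequences are not recognised.
std::wstring widen_bytewise(const std::string& bytes) {
    std::wstring out;
    out.reserve(bytes.size());
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Go through unsigned char so bytes above 0x7F do not sign-extend.
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(bytes[i])));
    }
    return out;
}
```

In GBK the character 中 is the two bytes 0xD6 0xD0. widen_bytewise("\xD6\xD0") returns two wide characters, 0x00D6 and 0x00D0, instead of the single wide character 0x4E2D, so the resulting file name no longer matches the file on disk.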
The solution is as follows:
/********************************************************************
created: 2008/05/10
created: 10:5:2008 23:56
filename: k:/sj/fstreamTest/fstreamTest/main.cpp
file path: k:/sj/fstreamTest/fstreamTest
file base: main
file ext: cpp
author: Gohan
*********************************************************************/
#include <tchar.h>
#include <clocale>
#include <fstream>
#include <iostream>
#include <locale>
using namespace std;

int main()
{
    /************************************************************************/
    /* Method 1: use the _TEXT() macro to make the string constant a TCHAR* */
    /* Personally, I prefer this method                                     */
    /************************************************************************/
    fstream file;
    file.open(_TEXT("C://测试.txt")); // path containing Chinese characters
    cout << file.rdbuf();
    file.close();
    file.clear(); // reset error flags before reusing the stream

    /************************************************************************/
    /* Method 2: use the static method of the STL locale class to set the   */
    /* global locale.                                                       */
    /* After this method is used, cout may fail to output Chinese text.     */
    /* A workaround I found: do not output Chinese text with cout or wcout  */
    /* before the locale is restored. Otherwise cout and wcout cannot       */
    /* output Chinese text even after the locale is restored.               */
    /************************************************************************/
    locale::global(locale(""));   // set the global locale to the operating system default
    file.open("C://测试2.txt");   // the file can now be opened
    locale::global(locale("C"));  // restore the global locale
    cout << file.rdbuf();
    file.close();
    file.clear();

    /************************************************************************/
    /* Method 3: use the C function setlocale. cout has the same problem    */
    /* with Chinese output, and the workaround is the same as above.        */
    /************************************************************************/
    setlocale(LC_ALL, "Chinese-simplified"); // switch to the Chinese locale
    file.open("C://测试3.txt");              // the file can now be opened
    setlocale(LC_ALL, "C");                  // restore
    cout << file.rdbuf();
    file.close();
    return 0;
}
See blog: http://www.cppblog.com/gohan/archive/2008/05/11/49488.html
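Methods 2 and 3 both follow a "set the locale, open the file, restore the locale" pattern. Below is a minimal sketch of wrapping that pattern in a guard object so that the restore step cannot be forgotten. ScopedCLocale and open_with_system_locale are hypothetical names, not part of the original code or of any library, and the sketch only covers the C locale changed by setlocale (method 3), not locale::global.

```cpp
#include <clocale>
#include <fstream>
#include <string>

// Hypothetical RAII guard: switches the C locale and restores the old one.
class ScopedCLocale {
public:
    explicit ScopedCLocale(const char* name) {
        const char* current = std::setlocale(LC_ALL, NULL);  // query only
        if (current) saved_ = current;  // copy, the returned buffer may be reused
        std::setlocale(LC_ALL, name);   // if this fails, the locale is unchanged
    }
    ~ScopedCLocale() {
        if (!saved_.empty()) std::setlocale(LC_ALL, saved_.c_str());
    }
private:
    ScopedCLocale(const ScopedCLocale&);             // non-copyable
    ScopedCLocale& operator=(const ScopedCLocale&);
    std::string saved_;
};

// Hypothetical helper: open `path` while the system default locale is active.
bool open_with_system_locale(std::ifstream& in, const char* path) {
    ScopedCLocale guard("");  // "" selects the user's default locale
    in.clear();               // drop error flags left by an earlier failed open
    in.open(path);
    return in.is_open();
}                             // guard restores the previous locale here
```

Because the locale is restored as soon as the helper returns, the note about not printing Chinese text with cout while the locale is changed is easier to follow: no output happens inside the guarded scope.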
Windows advocates Unicode encoding. We therefore recommend using the Unicode character set when writing programs. This avoids character set conversion problems and improves efficiency (as mentioned above, Windows converts ANSI strings to Unicode before processing them, and these conversions cost extra time).
In the example program, you can set the project character set to Unicode and wrap the string in _T(). Once the character set is Unicode, the string is automatically represented with wide characters. For example, ifstream infile(_T("D://测试.txt")) does not fail to open the file.
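The idea behind _T() can be sketched with a pair of stand-in macros. MY_T and my_tchar below are illustration-only names that imitate the generic-text mapping in <tchar.h>, which switches on the _UNICODE symbol defined by the "Unicode Character Set" project setting. The real header is more elaborate.

```cpp
#include <cstddef>

// Hypothetical re-creation of the generic-text idea from <tchar.h>.
#ifdef _UNICODE
    typedef wchar_t my_tchar;
    #define MY_T(x) L##x   // "abc" becomes L"abc", a wide string literal
#else
    typedef char my_tchar;
    #define MY_T(x) x      // the literal stays a narrow string
#endif

// Size in bytes of one character of a MY_T literal in the current build.
inline std::size_t my_tchar_size() {
    return sizeof(MY_T("x")[0]);
}
```

In a multi-byte build the literal stays a char string, which is why wrapping the path in _T() only helps after the project character set has been switched to Unicode.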
Note: the questions in parentheses (set in red in the original post) are left as extension topics. Consult the relevant material for details.