Analysis of Chinese Path Support in the C# Version of the GDAL Library (Continued)

Source: Internet
Author: User

In the previous post I described a problem in the C# version of the GDAL library. The main symptom is: "If the file name contains an even number of Chinese characters, nothing goes wrong, and both reading and creating work normally. If the file name contains an odd number of Chinese characters, both reading and creating report an error."

To track this down, I looked closely (reading plus trial and error) at the default encoding used for the string type in a C# program. First, the following code checks what that default encoding is.

    using System;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            string s = "我";

            // Bytes and length in the default encoding
            byte[] bDefault = Encoding.Default.GetBytes(s);
            Console.WriteLine(bDefault.Length);
            foreach (byte b in bDefault) { Console.WriteLine(b); }

            // Bytes and length in Unicode (UTF-16)
            byte[] bUnicode = Encoding.Unicode.GetBytes(s);
            Console.WriteLine(bUnicode.Length);
            foreach (byte b in bUnicode) { Console.WriteLine(b); }

            // Bytes and length in UTF-8
            byte[] bUtf8 = Encoding.UTF8.GetBytes(s);
            Console.WriteLine(bUtf8.Length);
            foreach (byte b in bUtf8) { Console.WriteLine(b); }

            // Bytes and length in code page 936 (GB2312)
            byte[] b936 = Encoding.GetEncoding(936).GetBytes(s);
            Console.WriteLine(b936.Length);
            foreach (byte b in b936) { Console.WriteLine(b); }
        }
    }

I ran the code above on a 64-bit Chinese Windows XP system and a 64-bit English Windows 7 system. The values in the four byte arrays can be inspected in the debugger, as shown in the figure. The upper part is displayed in decimal and the lower part in hexadecimal.

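We can clearly see that the default encoding for Chinese characters in C# (Encoding.Default) is GB2312 (code page 936), and the result was the same on both machines. Strictly speaking, Encoding.Default on the .NET Framework follows the system's ANSI code page (the "language for non-Unicode programs" setting) rather than the display language of the OS, which would explain why the Chinese and English systems gave identical results.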
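The byte values that such a check prints can also be reproduced outside .NET. The following is a minimal sketch in Python rather than C#, chosen only so that it runs anywhere. It assumes Python's `gbk` codec as a stand-in for code page 936 and `utf-16-le` for Encoding.Unicode; the helper name `encoded_bytes` is made up for illustration.

```python
def encoded_bytes(text):
    """Return the bytes of `text` under the three encodings compared above."""
    return {
        "cp936": text.encode("gbk"),          # stand-in for Encoding.GetEncoding(936)
        "unicode": text.encode("utf-16-le"),  # Encoding.Unicode is little-endian UTF-16
        "utf8": text.encode("utf-8"),         # Encoding.UTF8
    }


if __name__ == "__main__":
    for name, data in encoded_bytes("我").items():
        # Length first, then the values in decimal and in hexadecimal
        print(name, len(data), list(data), data.hex())
```

For "我" this gives two bytes (0xCE, 0xD2) in code page 936 and three bytes (0xE6, 0x88, 0x91) in UTF-8, so a default encoding of 936 shows up as a two-byte result.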

Now that we know the default encoding for Chinese characters, let's return to yesterday's problem and see what the string actually turns into after it goes through System.Text.Encoding.Default.GetString(System.Text.Encoding.UTF8.GetBytes(utf8_path)). The following snippet tests this:

    using System;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            string s = "我";

            // The conversion under test: UTF-8 bytes read back as the default encoding
            string strTemp = System.Text.Encoding.Default.GetString(
                System.Text.Encoding.UTF8.GetBytes(s));

            // Bytes of the converted string in the default encoding
            byte[] bDefault = Encoding.Default.GetBytes(strTemp);
            Console.WriteLine(bDefault.Length);
            foreach (byte b in bDefault) { Console.WriteLine(b); }

            Console.WriteLine(strTemp);
        }
    }

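Watching the values in the debugger, we find that the first two bytes produced by the code above are the same as in the UTF-8 encoding, but the third has become 63, the ASCII code of the question mark "?". This is what one would expect: the UTF-8 bytes of "我" are (0xE6, 0x88, 0x91); read as GB2312, the first two bytes pair up into one double-byte character, and the leftover 0x91 has no partner, so it is replaced by "?". The system still treats the string as GB2312-encoded, so garbled characters appear, as shown in the figure.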
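The same corruption can be reproduced byte for byte without .NET. This is a sketch in Python, not the C# code itself: it assumes the `gbk` codec behaves like Encoding.Default on a code page 936 system, and `default_roundtrip` is a hypothetical helper name. Python marks the undecodable byte with U+FFFD where .NET inserts "?", but the bytes that come out at the end are the same.

```python
def default_roundtrip(text, ansi="gbk"):
    """Mimic Default.GetString(UTF8.GetBytes(text)) followed by Default.GetBytes(...).

    `ansi` stands in for Encoding.Default (assumed to be code page 936).
    """
    utf8_bytes = text.encode("utf-8")
    # Misread the UTF-8 bytes as the ANSI code page; a dangling byte becomes U+FFFD
    misread = utf8_bytes.decode(ansi, errors="replace")
    # Re-encode for the native call; the unencodable marker becomes "?"
    return misread.encode(ansi, errors="replace")


if __name__ == "__main__":
    print(list(default_roundtrip("我")))  # [230, 136, 63]
```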

Next, let's look at the bytes that arrive on the C++ side after the string has been wrapped by SWIG and passed into the GDAL library. Using the cross-language debugging method from the previous two posts, open the string "我" from above directly with Ogr.Open. Then set a breakpoint in the function VSIVirtualHandle *VSIWin32FilesystemHandler::Open(const char *pszFilename, const char *pszAccess) in the C++ source file gdal-1.10.0\port\cpl_vsil_win32.cpp and inspect the incoming string, as shown in the figure:

The converted string and its bytes are as follows:

Comparing this figure with the C# bytes above reveals the problem. The bDefault bytes in C# are (230, 136, 63), or (0xE6, 0x88, 0x3F) in hexadecimal, which matches the bytes received in the C++ library (pszFilename). This means that after SWIG wraps the string and passes it into the C++ library, the encoding is unchanged and still wrong. In other words, the conversion System.Text.Encoding.Default.GetString(System.Text.Encoding.UTF8.GetBytes(utf8_path)) is what causes the encoding error. The fix is to modify this code so that it does not transcode, or to change Default to UTF8.
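The odd/even symptom and the effect of the proposed change can both be checked with a short sketch. It is written in Python for portability, with `gbk` assumed to stand in for Encoding.Default; the function names are invented for this illustration. Each of the Chinese characters used here takes three bytes in UTF-8, so an even number of them gives an even number of bytes that can pair up into double-byte characters, while an odd number leaves one byte over.

```python
def survives_ansi_misread(text, ansi="gbk"):
    """True if the UTF-8 bytes of `text` come back unchanged after being
    misread as `ansi` and re-encoded (the Default.GetString path)."""
    utf8_bytes = text.encode("utf-8")
    roundtrip = utf8_bytes.decode(ansi, errors="replace").encode(ansi, errors="replace")
    return roundtrip == utf8_bytes


def utf8_identity(text):
    """The proposed fix: UTF8.GetString(UTF8.GetBytes(text)) leaves the text as it was."""
    return text.encode("utf-8").decode("utf-8")


if __name__ == "__main__":
    print(survives_ansi_misread("我们"))  # two characters, six bytes
    print(survives_ansi_misread("我"))    # one character, three bytes
    print(utf8_identity("我") == "我")
```

With two characters the bytes survive the misreading by accident; with one they do not. Replacing Default with UTF8 removes the lossy step altogether.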

Following this idea, I changed every occurrence of System.Text.Encoding.Default.GetString(System.Text.Encoding.UTF8.GetBytes(utf8_path)) under the swig\csharp directory to System.Text.Encoding.UTF8.GetString(System.Text.Encoding.UTF8.GetBytes(utf8_path)).
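A replacement like this can be scripted instead of done by hand. The sketch below is one possible way, not the procedure used for this post: `patch_tree` is a hypothetical helper, it visits every file under the directory it is given, and it works on raw bytes so that line endings and any non-ASCII content are left exactly as they were.

```python
from pathlib import Path

OLD = b"System.Text.Encoding.Default.GetString(System.Text.Encoding.UTF8.GetBytes(utf8_path))"
NEW = b"System.Text.Encoding.UTF8.GetString(System.Text.Encoding.UTF8.GetBytes(utf8_path))"


def patch_tree(root):
    """Replace OLD with NEW in every file under `root`; return the files that changed."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if OLD in data:
            # Byte-level replace keeps the rest of the file untouched
            path.write_bytes(data.replace(OLD, NEW))
            changed.append(path)
    return changed
```

Calling `patch_tree` on a copy of the swig\csharp directory first, and reviewing the returned list, is a cheap way to confirm that only the expected files were touched.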

Not many files need to be modified. There are four in total, as shown in the figure:

You can replace them in one batch. After modifying and recompiling the C# library of GDAL, add the eight compiled DLL files to the project and debug again following the steps above. Step into the C++ code and watch how the values change before and after the encoding conversion, as shown in the figure.

We can see that the bytes of the incoming string pszFilename are now (0xCE, 0xD2). This is the default C# encoding, that is, GB2312, not UTF-8. We can therefore set GDAL_FILENAME_IS_UTF8 = NO to read the data.
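Why the option matters for these two bytes can be shown with a simplified model. This is not GDAL's code: `interpret_filename` is a hypothetical Python function that only decodes the path bytes the way the setting says they should be read, with `gbk` assumed as the ANSI code page.

```python
def interpret_filename(raw, filename_is_utf8):
    """Decode raw path bytes as UTF-8 when `filename_is_utf8` is true,
    otherwise as the ANSI code page (gbk is assumed here)."""
    return raw.decode("utf-8" if filename_is_utf8 else "gbk")


if __name__ == "__main__":
    raw = bytes([0xCE, 0xD2])
    print(interpret_filename(raw, filename_is_utf8=False))
    try:
        interpret_filename(raw, filename_is_utf8=True)
    except UnicodeDecodeError as exc:
        # 0xCE 0xD2 is not a valid UTF-8 sequence
        print("not valid UTF-8:", exc.reason)
```

Read as GB2312, the bytes give back "我"; read as UTF-8 they are invalid, which is consistent with having to tell GDAL that the file name is not UTF-8.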

Next, we take a path that previously could not be opened and test it with GDAL_FILENAME_IS_UTF8 set to NO. As the debugging session shows, the SHP file that could not be opened before now opens normally.

The console outputs the following information:

After testing, the modified library supports both Chinese and English paths. The test environments were a 64-bit Chinese Windows XP system and a 64-bit English Windows 7 system.

I have packaged the eight DLL files of the modified C# version and uploaded them to CSDN resources and the QQ group for sharing. Simply replace the original DLL files from the previous gdal110 version. CSDN: http://download.csdn.net/detail/liminlu0314/5809463
