How to read Chinese paths when using the GDAL library in C#

Source: Internet
Author: User

1. Basic description

Newer GDAL versions add support for UTF-8 paths. This reportedly started with version 1.8; I have not verified the exact version, but it certainly holds for later ones. A configuration item called GDAL_FILENAME_IS_UTF8 was added, and in C# you can set it to YES or NO with the following statements. The default value is YES.

    Gdal.SetConfigOption("GDAL_FILENAME_IS_UTF8", "YES");
    Gdal.SetConfigOption("GDAL_FILENAME_IS_UTF8", "NO");

When this value is YES (the default), GDAL assumes the input path string is UTF-8 encoded and tries to convert it to UCS-2. The Chinese paths we normally pass in are not UTF-8, so the path comes out garbled and the file cannot be opened. For details, see: Analysis and solution of the problem that GDAL 1.8.0 cannot open Chinese paths, http://blog.csdn.net/liminlu0314/article/details/6610069

2. The solution in C++

See the article above and use either of its first two solutions: set GDAL_FILENAME_IS_UTF8 to NO, and Chinese paths can be read.

3. Problems in C# (versions 1.8 and later)

The problem in C# differs from the one in C++. After compiling GDAL and referencing its C# DLLs, you do not need to set GDAL_FILENAME_IS_UTF8 to NO to read files under a Chinese path (in C#, setting it to NO actually causes an error, for the reason analyzed below). In most cases reading works correctly. It fails only in one situation: when the path contains a run of an odd number of consecutive Chinese characters that is followed by a character other than "\". In the examples below, the English words stand in for the Chinese characters of the original paths.

    C:\test path\aa.img          even number of Chinese characters: opens normally
    C:\test folder\aa.img        odd number of Chinese characters, followed by "\": opens normally
    C:\test folder 1\aa.img      odd number of Chinese characters, not followed by "\": cannot be opened
    C:\test path\test file.img   odd number of Chinese characters, not followed by "\": cannot be opened, error

4. Why the path can be read normally in most cases

As the article above explains, when GDAL_FILENAME_IS_UTF8 is YES (which is the case when the GDAL library is used normally from C#), GDAL performs an encoding conversion. So why can C# read Chinese paths in most cases?

Open the GDAL source code and find the \swig\csharp folder. It holds the source for the eight C# reference DLLs, such as gdal_csharp.dll. Open \swig\csharp\gdal\Gdal.cs and find the function public static Dataset Open(string utf8_path, Access eAccess). Its content is as follows:

    public static Dataset Open(string utf8_path, Access eAccess) {
        IntPtr cPtr = GdalPINVOKE.Open(System.Text.Encoding.Default.GetString(System.Text.Encoding.UTF8.GetBytes(utf8_path)), (int)eAccess);
        Dataset ret = (cPtr == IntPtr.Zero) ? null : new Dataset(cPtr, true, ThisOwn_true());
        if (GdalPINVOKE.SWIGPendingException.Pending) throw GdalPINVOKE.SWIGPendingException.Retrieve();
        return ret;
    }

The path (string utf8_path) is re-encoded after it is passed in, by this expression:

    System.Text.Encoding.Default.GetString(System.Text.Encoding.UTF8.GetBytes(utf8_path))

The result is then handed to the actual processing function written in C++. The same conversion appears in many other places under \swig\csharp, and it is the reason Chinese paths can be read when GDAL is used from C#. In other words, when GDAL is called from C#, the path string is first converted to UTF-8 on the C# side, and the C++ side then converts the UTF-8 to UCS-2, which makes reading work (dizzying...).

5. Why is there a problem with an odd number of Chinese characters?

Strictly speaking, this is not a GDAL bug but a problem with the encoding conversion in C#. See: Analysis of Chinese path support in the C# version of the GDAL library, http://www.cfanz.cn/index.php?c=article&a=read&id=103228

That article analyzes the issue carefully and its experiments are rigorous. To summarize: the conversion GDAL performs in its C# code,

    System.Text.Encoding.Default.GetString(System.Text.Encoding.UTF8.GetBytes(utf8_path))

first converts the string to a UTF-8 encoded byte[] and then parses those bytes as a string in the Default encoding (on a Chinese system, generally GB2312). When a run of an odd number of Chinese characters is encountered, one byte of information is lost. The path passed to the C++ side of GDAL is therefore wrong, and of course the file cannot be opened.

(Note: strictly speaking, this is not C#'s fault either. Because the two encodings follow different rules, this round trip is inherently risky and in many cases cannot be reversed. C# is not to blame.)

6. Finding a solution under C#

The article above analyzes the problem carefully but, unfortunately, does not offer a simple solution, so I had to look for one myself.

First, the simplest approach: before each open, analyze the path against the rule described above to determine whether an error will occur, and if so, prompt the user. This works, but it does not feel reliable.

Second, is there a way to perform the encoding conversion in C# without losing bytes? Unfortunately, I could not find one.

Third, since C++ can skip these conversions entirely, why can't C#? This led to the solution below, which proved effective in a simple test, with no related problems found.

7. Final solution

Modify the files under \swig\csharp and remove all of the encoding conversions from the C# code. They are mainly concentrated in these files:

    \swig\csharp\gdal\Gdal.cs
    \swig\csharp\gdal\Driver.cs
    \swig\csharp\ogr\Ogr.cs
    \swig\csharp\ogr\Driver.cs

Replace every occurrence of

    System.Text.Encoding.Default.GetString(System.Text.Encoding.UTF8.GetBytes(utf8_path))

with utf8_path and recompile. gdal1x.dll does not need to be recompiled; only the csharp-related DLLs do. The path string is then passed through directly, without conversion. As in C++, however, you must now set GDAL_FILENAME_IS_UTF8 to NO in your program; otherwise an error occurs when reading.
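The byte loss described in section 5, and the pre-check idea from section 6, can be sketched without GDAL at all. The following is a minimal illustration, not code from GDAL or from the cited articles. The class name PathRoundTrip, the method SurvivesRoundTrip, and the sample paths are hypothetical. It assumes a modern .NET runtime (where code page 936 has to be registered through CodePagesEncodingProvider), and it assumes that encoding the misread string back with the same ANSI encoding roughly mirrors what the native call receives.

```csharp
using System;
using System.Linq;
using System.Text;

public static class PathRoundTrip
{
    // Returns true when the UTF-8 bytes of `path` are unchanged after being
    // decoded with `ansi` and encoded again with the same encoding.
    public static bool SurvivesRoundTrip(string path, Encoding ansi)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(path);

        // Same shape as the conversion in Gdal.cs, with `ansi` standing in
        // for Encoding.Default on a Chinese Windows system (assumption).
        string misread = ansi.GetString(utf8);

        // Assumption: the string is handed to native code as ANSI bytes,
        // so re-encoding approximates what the C++ side sees.
        byte[] back = ansi.GetBytes(misread);

        return utf8.SequenceEqual(back);
    }

    public static void Main()
    {
        // Needed on .NET Core / .NET 5+ to make code page 936 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding gbk = Encoding.GetEncoding(936);

        // Hypothetical sample paths; they are not the ones from the article.
        string[] samples = { @"C:\测试路径\aa.img", @"C:\测试文件夹1\aa.img" };
        foreach (string p in samples)
        {
            Console.WriteLine($"{p} -> survives round trip: {SurvivesRoundTrip(p, gbk)}");
        }
    }
}
```

A check like this could serve as the pre-check from the first approach in section 6: if SurvivesRoundTrip returns false for a path, the unmodified bindings would be expected to hand GDAL a damaged path. It reports the problem but does not fix it, which is why the article goes on to patch the bindings instead.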
