Go deep into Microsoft.VisualBasic.Strings.StrConv: Simplified/Traditional Chinese conversion
Yesterday I ran into another Traditional/Simplified Chinese conversion requirement. I have handled this problem before. In the past we met it with self-built Big5-to-GB and GB-to-Big5 lookup tables (b52gb and gb2b5), and in VB6 the StrConv function could do the job. In .NET, Microsoft.VisualBasic.dll also provides a Strings.StrConv method, used much like the original VB6 function. However, when I used StrConv yesterday, I unexpectedly ran into some strange behavior. Let's take a look at it and make a note of it!
First, let's look at the signature of the Strings.StrConv method:
public static string StrConv(string str, VbStrConv Conversion, [Optional, DefaultParameterValue(0)] int LocaleID)
The third parameter differs slightly from the MSDN documentation. The signature above was extracted with Reflector, and that version is the one this article works from. Let's look at some examples:
using Microsoft.VisualBasic;

string a1 = Strings.StrConv("书樂う?", VbStrConv.TraditionalChinese, 0x0404); // a1 = "?樂??"
string a2 = Strings.StrConv("书樂う?", VbStrConv.SimplifiedChinese, 0x0404);  // a2 = "????"
string b1 = Strings.StrConv("书樂う?", VbStrConv.TraditionalChinese, 0x0804); // b1 = "書樂う?"
string b2 = Strings.StrConv("书樂う?", VbStrConv.SimplifiedChinese, 0x0804);  // b2 = "书乐う?"
string c1 = Strings.StrConv("书樂う?", VbStrConv.TraditionalChinese, 0x0412); // c1 = "?樂う?"
string c2 = Strings.StrConv("书樂う?", VbStrConv.SimplifiedChinese, 0x0412);  // c2 = "??う?"
string d1 = Strings.StrConv("书樂う?", VbStrConv.TraditionalChinese, 0x0009); // d1 = "書樂う?"
string d2 = Strings.StrConv("书樂う?", VbStrConv.SimplifiedChinese, 0x0009);  // d2 = "书乐う?"
The eight examples above all share the same first argument, which mixes Simplified Chinese (书), Traditional Chinese (樂), Japanese (う) and Korean text. The second argument selects either Traditional or Simplified conversion. The third argument is the LocaleID: zh-TW (0x0404), zh-CN (0x0804), ko-KR (0x0412) or en (0x0009).

Everything in the results hinges on the third parameter, LocaleID. Let's start by assuming that LocaleID identifies the character set of the source string, and check each result against that assumption:
- A1: The first step should convert "书樂う?" into the character set of zh-TW (0x0404), giving "?樂??". Applying VbStrConv.TraditionalChinese then leaves "?樂??". Correct!
- A2: The first step is the same as above, so VbStrConv.SimplifiedChinese should then produce "?乐??". Instead, A2 returned "????". Questionable!
- B1: The first step should convert "书樂う?" into the character set of zh-CN, giving "书樂う?", because the Simplified Chinese character set also contains the Traditional character "樂". VbStrConv.TraditionalChinese then turns it into "書樂う?". Correct!
- B2: Correct!
- C1: I don't know much about the Korean character set, but if it does not contain "书", this is correct!
- C2: Following on from C1, the expected result is "?乐う?", but the actual result is "??う?". Questionable!
- D1: Success!!! How can that be? This is not what I expected at all: all four characters came through intact!!!
- D2: This result is just as puzzling!!!
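A1, B1 and C1 all assume a "first step": squeezing the source string into the character set implied by the LocaleID. That step can be reproduced on its own with an ordinary encoding round trip.

This is only an illustrative sketch, under these assumptions:

- The names `CodePageRoundTrip` and `RoundTrip` are hypothetical.
- It targets .NET Core 3.0 or later, where `CodePagesEncodingProvider` must be registered before legacy code pages can be used.
- zh-TW corresponds to code page 950 (Big5) and zh-CN to 936 (GBK).

```csharp
using System.Text;

// Hypothetical helper illustrating the assumed "convert to the locale's character set" step.
public static class CodePageRoundTrip
{
    // On .NET Core / .NET 5+, legacy code pages (950, 936, 949, 1252, ...) are only
    // available after registering this provider.
    static CodePageRoundTrip() =>
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Encodes the text into the given ANSI code page and decodes it back.
    // Characters the code page cannot represent are replaced with '?'.
    public static string RoundTrip(string text, int codePage)
    {
        Encoding enc = Encoding.GetEncoding(
            codePage,
            EncoderFallback.ReplacementFallback,  // default replacement string is "?"
            DecoderFallback.ReplacementFallback);
        return enc.GetString(enc.GetBytes(text));
    }
}
```

Under these assumptions, `CodePageRoundTrip.RoundTrip("书樂", 950)` should lose the Simplified "书", which Big5 does not contain, and keep "樂". `RoundTrip("书樂", 936)` should return both characters unchanged, because GBK covers both forms. This mirrors the reasoning the author applied to A1 and B1.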
What exactly is going on here? Is it a bug? Or is there some other explanation? To find out, I finally brought out another tool, Reflector, and used it to look inside Microsoft.VisualBasic.dll and learn its secrets!
First, let's look at a small part of the decompiled StrConv method. We haven't reached the key part yet, so only its last line is excerpted:
Let's follow the call into vbLCMapString and look at just its lower half:
The code with the orange background is related to encoding. The other two highlighted calls are the Win32 API functions that do the actual mapping:

- The ANSI form has the "A" suffix and takes byte[] arguments.
- The form without a suffix takes string arguments.
Now the answer emerges. The results did not match expectations because of the round trip through Encoding.GetBytes() and Encoding.GetString(). If that step could be skipped, and UnsafeNativeMethods.LCMapString called directly as the code's other branch does, none of these problems would occur.

So how do we avoid the code path we don't want? Look at the Encoding.IsSingleByte check in the code! That is exactly why D1 and D2 looked so puzzling. The encoding for en is single-byte, so the method skips the Unicode-to-MBCS round trip entirely and calls the Unicode LCMapString directly, which yields the perfect result. That completes the analysis!
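Put together, the branch described above looks roughly like the C# paraphrase below. It is a sketch reconstructed from this article's description, not the actual Microsoft.VisualBasic source:

- The names `VbLCMapStringSketch`, `AnsiEncodingFor`, `UsesAnsiRoundTrip` and `Map` are hypothetical.
- `Map` only works on Windows, since it P/Invokes kernel32.
- The code-page lookup assumes .NET Core 3.0 or later.

```csharp
using System.ComponentModel;
using System.Globalization;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical paraphrase of the vbLCMapString logic described in this article.
public static class VbLCMapStringSketch
{
    // Needed on .NET Core / .NET 5+ to obtain code pages such as 950, 936, 949 and 1252.
    static VbLCMapStringSketch() =>
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // ANSI form ("A" suffix): operates on byte strings.
    [DllImport("kernel32.dll", ExactSpelling = true, SetLastError = true)]
    private static extern int LCMapStringA(int Locale, int dwMapFlags, byte[] lpSrcStr, int cchSrc,
                                           [Out] byte[] lpDestStr, int cchDest);

    // Unicode form ("W" suffix): operates on UTF-16 strings.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = true)]
    private static extern int LCMapStringW(int Locale, int dwMapFlags, string lpSrcStr, int cchSrc,
                                           [Out] char[] lpDestStr, int cchDest);

    // The ANSI code page associated with the LocaleID (e.g. 950 for zh-TW, 1252 for en).
    private static Encoding AnsiEncodingFor(int lcid) =>
        Encoding.GetEncoding(new CultureInfo(lcid).TextInfo.ANSICodePage);

    // The deciding check: only multi-byte code pages take the GetBytes/GetString detour.
    public static bool UsesAnsiRoundTrip(int lcid) => !AnsiEncodingFor(lcid).IsSingleByte;

    // Windows only: performs the mapping along whichever path the check selects.
    public static string Map(int lcid, int flags, string source)
    {
        if (string.IsNullOrEmpty(source)) return source;
        Encoding enc = AnsiEncodingFor(lcid);
        if (!enc.IsSingleByte)
        {
            // Characters the code page cannot represent are already replaced after GetBytes.
            byte[] src = enc.GetBytes(source);
            int size = LCMapStringA(lcid, flags, src, src.Length, null, 0);
            if (size == 0) throw new Win32Exception(Marshal.GetLastWin32Error());
            byte[] dest = new byte[size];
            size = LCMapStringA(lcid, flags, src, src.Length, dest, size);
            return enc.GetString(dest, 0, size);
        }
        // Single-byte code page: map the UTF-16 string directly, nothing is lost.
        int count = LCMapStringW(lcid, flags, source, source.Length, null, 0);
        if (count == 0) throw new Win32Exception(Marshal.GetLastWin32Error());
        char[] buffer = new char[count];
        count = LCMapStringW(lcid, flags, source, source.Length, buffer, count);
        return new string(buffer, 0, count);
    }
}
```

Under these assumptions, `UsesAnsiRoundTrip` should return true for 0x0404, 0x0804 and 0x0412, because code pages 950, 936 and 949 are multi-byte. It should return false for 0x0009, whose code page 1252 is single-byte. This matches the observation that only D1 and D2 escaped the lossy round trip.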
I now understand the whole mechanism. Still, I think I would understand it even better by learning more about the magical Win32 API LCMapString itself, so let's look at what it does! Hmm~~ where's the key point? For the purposes of this article, only the second parameter, dwMapFlags, deserves our attention. Open the MSDN documentation and find the LCMapString topic in the index, and we see the following:
Starting with Windows NT 4.0, Microsoft already built Traditional/Simplified conversion into a system function for programmers (sigh! Why didn't I know this?!). To map to Simplified characters (LCMAP_SIMPLIFIED_CHINESE) or Traditional characters (LCMAP_TRADITIONAL_CHINESE), you only need to pass the corresponding flag. That's all there is to it!
Conclusion
Your requirements may be the same as mine:

- you just want to convert text between Traditional and Simplified Chinese, not to Big5 or GB;
- Unicode is used throughout the whole process;
- you don't want other, non-Chinese text to be damaged.

If so, the conclusion is to call VB's Strings.StrConv the way D1 and D2 do in this article. Pass 0x0009, or the LocaleID of another single-byte character set, as the third argument!!!
You may not want to reference Microsoft.VisualBasic.dll (don't ask why; it's just personal preference) but still want the same effect. That is also easy. Try the following sample code!!!
using System.ComponentModel;
using System.Runtime.InteropServices;

public static class ChineseStringUtility
{
    internal const int LOCALE_SYSTEM_DEFAULT = 0x0800;
    internal const int LCMAP_SIMPLIFIED_CHINESE = 0x02000000;
    internal const int LCMAP_TRADITIONAL_CHINESE = 0x04000000;

    // Unicode (W) LCMapString: works on UTF-16 directly, no ANSI code-page round trip.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    internal static extern int LCMapString(int Locale, int dwMapFlags, string lpSrcStr, int cchSrc,
                                           [Out] char[] lpDestStr, int cchDest);

    public static string ToSimplified(string source) => Map(source, LCMAP_SIMPLIFIED_CHINESE);

    public static string ToTraditional(string source) => Map(source, LCMAP_TRADITIONAL_CHINESE);

    private static string Map(string source, int flags)
    {
        if (string.IsNullOrEmpty(source)) return source;
        // Calling with cchDest = 0 returns the required buffer size in characters.
        int size = LCMapString(LOCALE_SYSTEM_DEFAULT, flags, source, source.Length, null, 0);
        if (size == 0) throw new Win32Exception(Marshal.GetLastWin32Error());
        char[] target = new char[size];
        size = LCMapString(LOCALE_SYSTEM_DEFAULT, flags, source, source.Length, target, size);
        if (size == 0) throw new Win32Exception(Marshal.GetLastWin32Error());
        return new string(target, 0, size);
    }
}
[Reposted from CSDN] Go deep into Microsoft.VisualBasic.Strings.StrConv: Simplified/Traditional Chinese conversion