Recently I wanted to use wxErlang in Erlang R14B, but I ran into trouble with Unicode. Unexpectedly, I quickly figured out the cause of the problem and found a way around it.
Erlang has been able to process Unicode since R13A. After several intermediate releases, right up to the current R14B03, Erlang can only parse a multi-byte UTF-8 sequence into a single two-byte Unicode character inside the werl.exe shell. In R13A and R13B, when erlc.exe compiles non-ASCII characters, the UTF-8 it generates has many problems, to the point that one suspects it is not real UTF-8.
I found this while testing R13B, and on that basis I wrote the earlier articles on converting simplified Chinese GB2312 characters to Unicode in Erlang. In the past two days I have found more conclusive evidence.
Open the wxErlang example sudoku_gui.erl that ships with R13B and find the first line of the create_window() function:
    Frame = wxFrame:new(wx:null(), -1, "Sudoku", []),
Replace it with the following:
    Bin2 = unicode:characters_to_binary("Chinese Window", utf8),
    L2 = unicode:characters_to_list(Bin2),
    gb2u:start(),
    U = gb2u:get_unicode(L2, []),
    Frame = wxFrame:new(wx:null(), -1, U, []),
Here L2 should be UTF-8, but it is actually GB2312. The gb2u.erl module looks up the Unicode code point from the GB2312 bytes.
The current R14B has completely corrected this problem: when a module is compiled, non-ASCII data is automatically compiled to UTF-8, even if the unicode library functions are not called explicitly. So the gb2u.erl I wrote works only with R13A and R13B. It is useless in other versions, because it cannot find the corresponding Unicode code point from UTF-8 input.
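To make the difference concrete, here is a minimal Python 3 sketch that shows the same two characters as GB2312 bytes, as UTF-8 bytes, and as Unicode code points. The sample string "中文" is my own choice for illustration and is not taken from the original code.

```python
# Illustrative only: the sample text is an assumption, not from the original post.
text = "中文"

gb2312_bytes = list(text.encode("gb2312"))   # 2 bytes per Chinese character
utf8_bytes = list(text.encode("utf-8"))      # 3 bytes per Chinese character
code_points = [ord(ch) for ch in text]       # one integer per character

print("GB2312:", gb2312_bytes)
print("UTF-8: ", utf8_bytes)
print("Unicode code points:", code_points)
```

A lookup table keyed on the first form never matches input that arrives in the second form, which is why a GB2312-keyed module stops working once the compiler emits UTF-8.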
What should we do? After repeatedly looking at the format of UTF-8 data in Erlang and Python, I finally saw the pattern: when GB2312 Chinese characters are converted to UTF-8, the vast majority become 3 bytes. So I made a slight change to the gb2u.erl approach and generated a new module with the following Python program.
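def all_gb2312():
    # Erlang source header. utf82unicode/2 copies ASCII through unchanged and
    # looks up every 3-byte UTF-8 sequence in the process dictionary,
    # which start/0 fills in with put/2.
    s = '''-module(utf82u).
-export([start/0, utf82unicode/2]).
utf82unicode([], U) ->
    lists:reverse(U);
utf82unicode([A | T], U) when A < 128 ->
    utf82unicode(T, [A | U]);
utf82unicode([A, B, C | Z], U) ->
    H = get({A, B, C}),
    utf82unicode(Z, [H | U]).
start() ->\n'''
    with open('utf82u.erl', 'w') as f:
        f.write(s)
        # GB2312 lead bytes 161..247; rows 170..175 are unused.
        for row in range(161, 248):
            if row > 169 and row < 176:
                continue
            for col in range(161, 255):
                w = bytes([row, col])
                try:
                    uc = w.decode('gb2312')
                    utf8 = uc.encode('utf8')
                except UnicodeError:
                    continue  # unassigned GB2312 position
                if len(utf8) != 3:
                    continue  # the Erlang lookup only handles 3-byte sequences
                s = '    put({%s, %s, %s}, %s),\n' % (
                    utf8[0], utf8[1], utf8[2], ord(uc))
                f.write(s)
        # Terminate the start/0 clause so the generated module compiles.
        f.write('    ok.\n')

all_gb2312()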
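The observation that GB2312 Chinese characters take 3 bytes in UTF-8 can be checked with a short tally. This is a minimal Python 3 sketch. The helper name utf8_lengths and the split into symbol rows (0xA1 to 0xA9) and Chinese character rows (0xB0 to 0xF7) are my own choices for illustration.

```python
from collections import Counter

def utf8_lengths(first_row, last_row):
    """Count UTF-8 byte lengths for GB2312 characters in the given row range."""
    counts = Counter()
    for row in range(first_row, last_row + 1):
        for col in range(0xA1, 0xFF):
            try:
                ch = bytes([row, col]).decode("gb2312")
            except UnicodeDecodeError:
                continue  # unassigned GB2312 position
            counts[len(ch.encode("utf-8"))] += 1
    return counts

symbols = utf8_lengths(0xA1, 0xA9)   # symbols and non-Chinese letters
hanzi = utf8_lengths(0xB0, 0xF7)     # Chinese characters
print("symbol rows:", dict(symbols))
print("hanzi rows: ", dict(hanzi))
```

The Chinese character rows should all fall in the 3-byte bucket, while the symbol rows also contain 2-byte characters such as Greek and Cyrillic letters. Those are the cases a lookup keyed only on 3-byte sequences does not cover.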
There is a specific algorithm for converting between UTF-8 and Unicode, and it can be found online. I looked at it and felt it was not too difficult to understand, but I would have had to spend some time thinking it through. That is tedious: since there is a simple way, why bother?
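For reference, the bit-level conversion is short. The following is a minimal Python 3 sketch of the standard UTF-8 decoding rule, not the method used in this article. The function name utf8_to_code_points is hypothetical, and the sketch does not reject overlong encodings or surrogate code points.

```python
def utf8_to_code_points(data):
    """Decode a sequence of UTF-8 byte values (ints 0..255) into code points."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                 # 0xxxxxxx: ASCII, 1 byte
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:        # 110xxxxx: 2-byte sequence
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:       # 1110xxxx: 3-byte sequence
            cp, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:      # 11110xxx: 4-byte sequence
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError("invalid lead byte at index %d" % i)
        if i + extra >= len(data):
            raise ValueError("truncated sequence at index %d" % i)
        for cont in data[i + 1:i + 1 + extra]:
            if (cont & 0xC0) != 0x80:   # continuation bytes are 10xxxxxx
                raise ValueError("invalid continuation byte")
            cp = (cp << 6) | (cont & 0x3F)
        out.append(cp)
        i += extra + 1
    return out

# 0xE4 0xB8 0xAD is the UTF-8 form of U+4E2D.
print([hex(cp) for cp in utf8_to_code_points([0xE4, 0xB8, 0xAD])])  # ['0x4e2d']
```

Unlike a lookup table, this handles 2-byte and 4-byte sequences as well, at the cost of the bit manipulation.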
I have tested the Erlang module generated by the Python program above on R14B.
Open the wxErlang example sudoku_gui.erl that ships with R14B and find the first line of the create_window() function:
    Frame = wxFrame:new(wx:null(), -1, "Sudoku", []),
Replace it with the following:
create_window() ->
    Bin2 = unicode:characters_to_binary("Chinese Window", utf8),
    L2 = unicode:characters_to_list(Bin2),
    utf82u:start(),
    U = utf82u:utf82unicode(L2, []),
    Frame = wxFrame:new(wx:null(), -1, U, []),
After the program runs, the title "Chinese Window" is displayed correctly: the UTF-8 has indeed become Unicode.
However, GS, Erlang's built-in GUI toolkit, can produce UTF-8 but cannot display Unicode, so the method above does not work with it.
In short, Erlang can generate Unicode in the werl shell; outside of it, Erlang can only generate UTF-8. Erlang is mainly used for server development, especially HTTP and web servers, and UTF-8 is widely used in web pages. For local applications tied to a particular locale, however, Unicode matters. For now, Erlang stops at UTF-8 and does not go the rest of the way to Unicode.
In addition, I would like to summarize two unexpected gains from studying Erlang and pondering Unicode.
1. Discovering a flaw in Erlang's dictionary functions
A major feature of Erlang is its strict "privatization" of program memory: there are no global variables and no global memory buffers.
However, I found that before R13B, the memory managed by the dictionary functions get/1 and put/2 reached beyond the program that used it and was globally available at the Erlang shell level.
At the time I did not realize this was a bug, so I never reported it officially. But Erlang corrected the problem after the blog article "how to convert Erlang to Unicode in simplified Chinese characters gb2312 (3)" was published.
2. Discovering the UTF-8 flaw in Erlang R13B
This article has already described it in detail, so I will not repeat it here. Later versions have corrected it.
Note: The Erlang module utf82u.erl generated with Python has been uploaded to "my resources".