Recently a game product at my company ran into font problems, and I wanted Simplified/Traditional Chinese conversion to happen automatically to reduce the workload.
So I tried some of the conversion functions built into Windows, but found that a large number of characters came out wrong or could not be converted at all (a test with iconv showed the same failures).
I am recording here a short tutorial on the OpenCC library, which is called from C++ to do the character conversion.
Note: OpenCC is not an iconv-like library. It only converts between Simplified and Traditional Chinese, so do not use it in the scenarios where you would use iconv; keep the distinction in mind.
Introduction to OpenCC:
Open Chinese Convert (OpenCC, 開放中文轉換) is an open-source project for conversion between Traditional Chinese and Simplified Chinese, supporting character-level conversion, phrase-level conversion, variant conversion, and regional idioms among Mainland China, Taiwan, and Hong Kong.
Features
- Strictly distinguishes "one Simplified to many Traditional" from "one Simplified to many variants".
- Fully compatible with variant characters, which can be substituted dynamically.
- Strictly proofreads the one-Simplified-to-many-Traditional entries, following the principle "split where possible rather than merge".
- Supports variant-character and regional vocabulary conversion for mainland China, Taiwan, and Hong Kong, such as 裏/裡 ("inside") and 鼠標/滑鼠 ("mouse").
- The dictionaries and the library code are completely separate, so the dictionaries can be freely modified, imported, and extended.
- Supports C, C++, Python, PHP, Java, Ruby, Node.js, and Android.
- Compatible with Windows, Linux, and Mac.
As the introduction shows, OpenCC is a fairly complete Simplified/Traditional conversion library. Note that OpenCC only handles Simplified/Traditional Chinese conversion and does not apply to the character sets of other languages; for those, use iconv.
An evaluation of OpenCC: http://linux-wiki.cn/wiki/zh-hans/%E7%AE%80%E7%B9%81%E8%BD%AC%E6%8D%A2
Online Simplified/Traditional conversion test: http://opencc.byvoid.com/
This article covers development on the Windows platform; for Linux and other platforms, please refer to the official documentation.
1. Installing and compiling OpenCC
OpenCC download: https://github.com/BYVoid/OpenCC
The latest version at the time of writing is 1.0.4.
After downloading and extracting the source, and after installing CMake, run the following commands in the source directory:
cmake -H. -Bbuild -G"Visual Studio " -DCMAKE_INSTALL_PREFIX="path/to/install"
cmake --build build --config Release --target install
Note that builds made with VS2013 and later are not compatible with XP; if you need XP support, change the settings in the generated .sln.
Pay particular attention to the install path: with a relative path it is easy to end up with a duplicated path that cannot be compiled.
If you only need the development library, it is enough to run:
cmake -H. -Bbuild -G"Visual Studio " -DCMAKE_INSTALL_PREFIX="path/to/install"
This generates the build directory; open the .sln file inside it.
Note that OpenCC uses many C++11 features, so it is hard to compile with anything older than VS2013. If you call the DLL from an older version, be careful not to use the sample code commonly published online; I describe how to do the conversion below.
The next step is to build OpenCC and collect the output files (the DLL, the include headers, and so on) for integration. This is simple and not covered separately.
In version 1.0.4 the opencc_phrase_extract project does not compile, and there is a corresponding issue on GitHub. Deleting that project is enough and does not affect usage, so it can be ignored.
2. Using OpenCC in code
After completing the steps above, we can use OpenCC in our own code to convert Simplified Chinese to Traditional Chinese.
One special note: OpenCC is only a UTF-8-based Simplified/Traditional conversion library and does not do the kind of encoding conversion that iconv does, so the code below makes heavy use of Boost.Locale. If you are not comfortable with that, you can use iconv instead.
What the configuration files mean (excerpted from the official GitHub repository):
Preset configuration files
- s2t.json: Simplified Chinese to Traditional Chinese
- t2s.json: Traditional Chinese to Simplified Chinese
- s2tw.json: Simplified Chinese to Traditional Chinese (Taiwan standard)
- tw2s.json: Traditional Chinese (Taiwan standard) to Simplified Chinese
- s2hk.json: Simplified Chinese to Traditional Chinese (Hong Kong standard)
- hk2s.json: Traditional Chinese (Hong Kong standard) to Simplified Chinese
- s2twp.json: Simplified Chinese to Traditional Chinese (Taiwan standard), with Taiwanese idioms
- tw2sp.json: Traditional Chinese (Taiwan standard) to Simplified Chinese, with Mainland Chinese idioms
- t2tw.json: Traditional Chinese (OpenCC standard) to Taiwan standard
- t2hk.json: Traditional Chinese (OpenCC standard) to Hong Kong standard
For ordinary Simplified/Traditional conversion I recommend the s2t or t2s configuration file. For chat and similar content, s2tw or tw2s is generally the better choice. For the rest, test them yourself before choosing.
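The choice of preset can be wrapped in a small lookup so the rest of the program never hard-codes file names. This is only a minimal sketch: pickConfig and the region codes "cn", "t", "tw" and "hk" are hypothetical names made up for illustration and are not part of OpenCC; only the file names come from the preset list.

```cpp
#include <string>

// Hypothetical helper (not part of OpenCC): maps a pair of region codes to a
// preset configuration file name.
//   "cn" = Simplified, "t" = Traditional (OpenCC standard),
//   "tw" = Taiwan standard, "hk" = Hong Kong standard.
// regionalIdioms selects the presets that also convert regional vocabulary.
// Returns an empty string when no preset covers the pair.
std::string pickConfig(const std::string& from, const std::string& to,
                       bool regionalIdioms = false) {
    if (from == "cn" && to == "t")  return "s2t.json";
    if (from == "t"  && to == "cn") return "t2s.json";
    if (from == "cn" && to == "tw") return regionalIdioms ? "s2twp.json" : "s2tw.json";
    if (from == "tw" && to == "cn") return regionalIdioms ? "tw2sp.json" : "tw2s.json";
    if (from == "cn" && to == "hk") return "s2hk.json";
    if (from == "hk" && to == "cn") return "hk2s.json";
    if (from == "t"  && to == "tw") return "t2tw.json";
    if (from == "t"  && to == "hk") return "t2hk.json";
    return "";
}
```

A caller would check for the empty string before passing the name on, since not every pair of regions has a preset.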
The content of a configuration file is very simple: it points to the corresponding .ocd files. Take s2t.json as an example:
{
  "name": "Simplified Chinese to Traditional Chinese",
  "segmentation": {
    "type": "mmseg",
    "dict": {
      "type": "ocd",
      "file": "STPhrases.ocd"
    }
  },
  "conversion_chain": [{
    "dict": {
      "type": "group",
      "dicts": [{
        "type": "ocd",
        "file": "STPhrases.ocd"
      }, {
        "type": "ocd",
        "file": "STCharacters.ocd"
      }]
    }
  }]
}
The file references several .ocd files, but you may find that you have none on your side. That is because the .ocd files have to be generated with OpenCC's own tools, and they end up in the build\data directory rather than in data\dictionary, so look there. If you do not want to generate them, you can use the .txt files instead: the .ocd format only exists to speed up loading. If the speed difference does not matter to you, I recommend the .txt files, which are in the data\dictionary directory. In that case the .ocd entries in the configuration file must be changed to .txt, for example:
{
  "name": "Simplified Chinese to Traditional Chinese",
  "segmentation": {
    "type": "mmseg",
    "dict": {
      "type": "text",
      "file": "STPhrases.txt"
    }
  },
  "conversion_chain": [{
    "dict": {
      "type": "group",
      "dicts": [{
        "type": "text",
        "file": "STPhrases.txt"
      }, {
        "type": "text",
        "file": "STCharacters.txt"
      }]
    }
  }]
}
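If several preset files need this change, the edit can be scripted. The following is a minimal sketch of one way to do it with plain string replacement; useTextDictionaries and replaceAll are hypothetical helpers, and the code assumes that the quoted word "ocd" appears only as a dictionary type and that ".ocd" appears only as a file extension, which holds for the preset shown above.

```cpp
#include <string>

// Replaces every occurrence of `from` in `text` with `to`.
static void replaceAll(std::string& text, const std::string& from,
                       const std::string& to) {
    if (from.empty()) return;
    std::string::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
}

// Hypothetical helper: rewrites a config's JSON text so that it references
// the plain-text dictionaries instead of the .ocd ones.
// This is a text substitution, not a JSON parser.
std::string useTextDictionaries(std::string config) {
    replaceAll(config, "\"ocd\"", "\"text\"");  // dictionary type
    replaceAll(config, ".ocd\"", ".txt\"");     // dictionary file extension
    return config;
}
```

Reading the preset from disk and writing the result back is left out; the function only transforms the text it is given.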
With the configuration file done, we can start writing our own code. Note that if you call opencc.dll from an older compiler such as VS2005, you will find that many tutorials on the web do not work, because they trigger a bad_alloc exception. The cause is an incompatibility between compiler versions: in my tests, calls made through the standard C++ interface failed every time (if you know a better way, please contact me). My workaround is to call the C functions that OpenCC provides directly.
For example, here is code that converts GBK text to BIG5:
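// Requires opencc.h, <boost/locale.hpp> and <string>.
// lc is assumed to be: namespace lc = boost::locale::conv;
opencc_t gS2TWHwd = NULL;
if (gS2TWHwd == NULL)
    gS2TWHwd = opencc_open("s2t.json");
// Step 1: convert the GBK input to UTF-8
std::string szConvert = lc::to_utf<char>(szGBK, "GBK");
// Step 2: convert Simplified to Traditional; the returned buffer must be freed
char* szResult = opencc_convert_utf8(gS2TWHwd, szConvert.c_str(), szConvert.size());
if (szResult != NULL) {
    szConvert = szResult;
    opencc_convert_utf8_free(szResult);
}
// Step 3: convert UTF-8 to BIG5, the local character set
szConvert = lc::from_utf(szConvert, "BIG5");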
As you can see, I use the plain C functions, so there is no compatibility problem between older and newer versions of VC++.
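One way to keep the C calls and their cleanup in one place is to route them through a small table of function pointers, which is also the shape you get when loading opencc.dll at run time. The sketch below is an illustration under assumptions, not OpenCC's own API: OpenccApi and convertUtf8 are hypothetical names, and the pointer signatures mirror opencc_open, opencc_convert_utf8, opencc_convert_utf8_free and opencc_close as declared in opencc.h to the best of my knowledge, so check them against the header from your build.

```cpp
#include <cstddef>
#include <string>

// Function pointers mirroring the OpenCC C API (opencc_t is a void*).
// Assumption: these match the opencc.h of your build.
struct OpenccApi {
    void* (*open)(const char* configFile);                          // opencc_open
    char* (*convert)(void* handle, const char* in, std::size_t n);  // opencc_convert_utf8
    void  (*freeText)(char* text);                                  // opencc_convert_utf8_free
    int   (*close)(void* handle);                                   // opencc_close
};

// Converts UTF-8 text using the given config file. Returns false on failure.
// Only plain C types cross the DLL boundary; std::string stays on our side.
bool convertUtf8(const OpenccApi& api, const char* configFile,
                 const std::string& input, std::string& output) {
    void* handle = api.open(configFile);
    // opencc_open reports failure as (opencc_t)-1.
    void* failed = reinterpret_cast<void*>(static_cast<std::ptrdiff_t>(-1));
    if (handle == NULL || handle == failed) {
        return false;
    }
    char* converted = api.convert(handle, input.c_str(), input.size());
    bool ok = (converted != NULL);
    if (ok) {
        output.assign(converted);  // copy before releasing the library's buffer
        api.freeText(converted);
    }
    api.close(handle);
    return ok;
}
```

When linking against the import library, the table can be filled directly with the four OpenCC functions. Opening a handle on every call keeps the sketch short; a real program would more likely keep the handle open, as the global in the code above does.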
3. Publishing OpenCC with your program
Once the code above is written, the next step is to publish OpenCC together with your program so that it runs properly.
Publishing is very simple. For example, when using s2t.json and t2s.json, put the configuration files in the same directory as the program's executable, and put the .ocd or .txt files in that same directory.
Note that you only need to publish the files you actually use; shipping files that are never used only increases the size of the release.
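To work out which dictionary files a given configuration actually needs, you can scan its text for "file" entries. The sketch below does this with a plain text search; referencedDictionaries is a hypothetical helper, and it assumes the file names contain no escaped quotes. Values are returned exactly as written in the JSON, without decoding escape sequences.

```cpp
#include <set>
#include <string>

// Hypothetical helper: collects the value of every "file" key found in a
// config's JSON text, i.e. the dictionary files that must ship with it.
// This is a plain text scan, not a JSON parser.
std::set<std::string> referencedDictionaries(const std::string& config) {
    std::set<std::string> files;
    const std::string key = "\"file\"";
    std::string::size_type pos = 0;
    while ((pos = config.find(key, pos)) != std::string::npos) {
        pos += key.size();
        // Opening quote of the value (skips the colon and any whitespace).
        std::string::size_type open = config.find('"', pos);
        if (open == std::string::npos) break;
        // Closing quote of the value.
        std::string::size_type close = config.find('"', open + 1);
        if (close == std::string::npos) break;
        files.insert(config.substr(open + 1, close - open - 1));
        pos = close + 1;
    }
    return files;
}
```

Running this over each configuration you ship and taking the union of the results gives the list of dictionary files to include; everything else can be left out of the release.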
Using a custom directory:
For example, to put all the configuration files in a lang directory, load the configuration file like this:
gS2TWHwd = opencc_open("lang\\s2t.json");
The configuration file also has to be changed to:
{
  "name": "Simplified Chinese to Traditional Chinese",
  "segmentation": {
    "type": "mmseg",
    "dict": {
      "type": "text",
      "file": "lang\\STPhrases.txt"
    }
  },
  "conversion_chain": [{
    "dict": {
      "type": "group",
      "dicts": [{
        "type": "text",
        "file": "lang\\STPhrases.txt"
      }, {
        "type": "text",
        "file": "lang\\STCharacters.txt"
      }]
    }
  }]
}
After this change the data can be loaded from another location, though you should of course test it yourself. All the paths above are relative, so your program has to manage its working directory accordingly, or errors are likely.
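One way to reduce the dependence on the working directory is to build the configuration path from the location of the executable. The following is a minimal sketch; configPathNextToExe is a hypothetical helper, and the executable path is assumed to come from elsewhere (on Windows, for example, from GetModuleFileName).

```cpp
#include <string>

// Hypothetical helper: builds the full path of a config file from the full
// path of the executable, so that loading the config does not depend on the
// current working directory. Accepts both '\\' and '/' as separators.
std::string configPathNextToExe(const std::string& exePath,
                                const std::string& relative) {
    std::string::size_type slash = exePath.find_last_of("\\/");
    if (slash == std::string::npos) {
        return relative;  // no directory part: fall back to the relative path
    }
    // Keep the trailing separator of the executable's directory.
    return exePath.substr(0, slash + 1) + relative;
}
```

This only covers the path passed to opencc_open. The dictionary paths written inside the JSON file are still relative as shown above, so they need the same care, or the working directory still has to be set correctly.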
[Original] Using the OpenCC library for Simplified/Traditional conversion (C++ code)