There are currently several tasks to do:
- Sort out the one-to-many simplified-to-traditional and traditional-to-simplified Chinese characters (Wikipedia seems to have a complete list, which I have already collected).
- Build a conversion correction table for each of these characters (ConvertZ comes with one, as does Wikipedia, but the vocabulary is still too small).
- Collect test texts for simplified/traditional conversion (no complete test set has been found so far).
- Develop the conversion program.
The first item is basically complete and the fourth can only be done by me, but the second can be completed through collaboration. I hope more people will take part in this work, so that simplified/traditional conversion can be done better.
My current design goal is to make simplified/traditional Chinese character conversion as accurate as possible with as little effort as possible, without handling the conversion of terms and vocabulary (for example, converting one region's word for "program" into another region's word for "program").
The simplified/traditional correction word-list files are formatted as follows:
- Each line has three columns, separated by tabs.
- The first column holds the head character to be converted, and the second column holds its traditional (or simplified) counterpart.
- If the first column is empty, the second column holds an entry in which the corresponding traditional (or simplified) character is used.
- An entry may be written in either simplified or traditional form.
- In special cases, you can prefix an entry with "=" to match the entry string strictly.
- If a matched entry has a specific traditional (or simplified) form, you can write the converted form in the third column. The third column is usually unnecessary; it is generally needed only for the other characters of the entry. If the converted form is the same as the second column, put "=" in the third column.
- When a one-to-many character matches no specific entry, the first corresponding character listed in the correction word-list file is used.
- Longer entries take priority.
- Anything after "#" or ";" is a comment.
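To make the format rules above concrete, here is a minimal parsing sketch in Python. It is only one reading of the rules, not the actual program: the names `Entry`, `strip_comment` and `parse_table` are hypothetical, the sample lines are made up, and it assumes that a line with a non-empty first column starts a new group of entries and that "=" in the third column means "same as the entry text".

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    head: str                 # character being converted (from the header line)
    target: str               # its replacement (from the header line)
    text: str                 # entry string from column 2, without any "=" prefix
    strict: bool              # True if column 2 started with "="
    converted: Optional[str]  # explicit converted form from column 3, if given


def strip_comment(line: str) -> str:
    """Drop everything from the first "#" or ";" onward, plus the line ending."""
    return re.split(r"[#;]", line, maxsplit=1)[0].rstrip("\r\n")


def parse_table(lines):
    """Return (defaults, entries) for an iterable of raw lines."""
    defaults = {}   # head character -> first target listed for it
    entries = []
    head = target = None
    for raw in lines:
        line = strip_comment(raw)
        if not line.strip():
            continue                      # blank or comment-only line
        cols = [c.strip() for c in line.split("\t")]
        cols += [""] * (3 - len(cols))    # pad to three columns
        first, second, third = cols[:3]
        if first:                         # header line: head character and target
            head, target = first, second
            defaults.setdefault(head, target)
            continue
        if head is None or not second:
            continue                      # entry with no preceding header: skipped
        strict = second.startswith("=")
        text = second[1:] if strict else second
        # Assumption: "=" in column 3 means "same as the entry text".
        converted = text if third == "=" else (third or None)
        entries.append(Entry(head, target, text, strict, converted))
    return defaults, entries


# Made-up sample data, not taken from the author's word list.
SAMPLE = [
    "云\t雲\t# first mapping listed for this character, so it is the default",
    "\t白云",
    "云\t云",
    "\t=诗云\t詩云",
    "\t#disabled entry",
    "; whole-line comment",
]
defaults, entries = parse_table(SAMPLE)
print(defaults)
for entry in entries:
    print(entry)
```

Keeping only the first target seen for each head character is how this sketch models the "first corresponding character" rule; the `strict` flag is recorded but not interpreted, since the exact meaning of strict matching is not spelled out here.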
An example of a simplified/traditional conversion correction word list:
- When one of the following entries appears, "~" is converted to "understand".
- When "understanding" appears, it is converted to "understand".
- Same as the previous line: the program treats this line as equivalent to the previous one, because correction words can be either traditional or simplified.
- `# understand` and "understand": temporarily retained. Because the second column starts with "#", this line is a comment.
- Because the conversion of "" to "" is itself a simplified/traditional case, "" should be specified in the third column.
- "To look at the mountains and see them as fire": this is equivalent to "fire-like", and the program automatically converts the corresponding simplified/traditional character "view".
- By default, "ten thousand" is converted to its traditional form. In the following cases "ten thousand" is kept unchanged.
- "The number of 'ten thousand' is less than ten thousand RMB": because the conversion of "yu" to "yu" is also a simplified/traditional case, the third column specifies the converted form.
- By default, "cloud" is converted to its traditional form. When it means "say", it remains unchanged.
- The character "poetry" is converted automatically by the program, while "cloud" remains unchanged.
- Cloud = cloud =
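The matching behaviour these rules imply (longer entries first, default mapping otherwise) can be sketched roughly as follows. This is an illustration under stated assumptions, not the real conversion program: the tables `BASE`, `DEFAULTS` and `ENTRIES` are made up for the demo, all function names are hypothetical, and strict matching with "=" is left out.

```python
# Made-up tables for illustration only; they are not the author's data.
BASE = {"诗": "詩"}              # ordinary one-to-one characters
DEFAULTS = {"云": "雲"}          # one-to-many characters: first listed mapping
ENTRIES = [                      # (head, target, entry text, explicit form or None)
    ("云", "云", "诗云", None),
    ("云", "云", "人云亦云", None),
]


def convert_char(ch):
    """Convert one character with the default tables; unknown characters pass through."""
    return DEFAULTS.get(ch, BASE.get(ch, ch))


def entry_output(head, target, text, explicit):
    """Converted form of an entry: the explicit form if given, else built per character."""
    if explicit is not None:
        return explicit
    return "".join(target if ch == head else convert_char(ch) for ch in text)


RULES = {text: entry_output(head, target, text, explicit)
         for head, target, text, explicit in ENTRIES}
MAX_LEN = max(map(len, RULES), default=0)


def convert(text):
    """Scan left to right, trying the longest possible entry at each position."""
    out, i = [], 0
    while i < len(text):
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += n
                break
        else:                       # no entry starts here: use the default mapping
            out.append(convert_char(text[i]))
            i += 1
    return "".join(out)


print(convert("诗云白云"))  # 詩云白雲
print(convert("人云亦云"))  # 人云亦云
```

In the first call the entry keeps the character unchanged where it means "say", while the same character in the unlisted word falls back to the default mapping. A greedy longest-match scan is only one simple way to give longer entries priority; the real program may resolve overlaps differently.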