C#: quick access to mnemonic codes

Source: Internet
Author: User

What this article implements:

1. Fast lookup of the mnemonic codes of Chinese characters, at the cost of extra memory.

2. For pinyin codes, support for polyphonic characters (characters with more than one reading) and an explicit way to request surname readings.

Many ways of obtaining mnemonic codes in C# can be found on the Internet. Most of them, however, convert by searching, for each code, a string of the Chinese characters that map to it. The drawbacks are clear:

1. The data is effectively kept in a database or otherwise outside the program;

2. Locating a character by string search is not necessarily fast.

In a linear table, reading the element at a known position is the fastest lookup there is. For example, in a linear table of 10,000 elements, any element, say the 9,000th, can be reached directly from the start of the table, no matter which one is wanted. An array is a simple and effective linear table: all that is needed is to store each item at the right index when the data is loaded.

No database is needed: plain resource files hold the tables, which keeps configuration simple, and they are enough to complete the conversion from Chinese characters to mnemonic codes. For pinyin mnemonic codes, two further things must be implemented: 1. matching phrases that contain polyphonic characters; 2. special handling for surnames.

Wubi (five-stroke), four-corner and other kinds of codes are just as easy to support. The approach does cost memory, but trading space for time is well worth it for the performance gained.

Storing the data in an array

An array is a linear table. The scheme is:

1. Each item is a structure that holds the Chinese character, its pinyin code, its polyphonic phrases, and its Wubi code (or other codes);

2. Each Chinese character is converted directly to its numeric character code (its Unicode value, since a C# char is a 16-bit UTF-16 code unit), and that number is the array index;

3. To convert a text string, take each Chinese character, compute its number, go straight to that position in the array, and read the code.
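The three steps above can be sketched in a few lines. This is only an illustration: the names CharIndexDemo, Item, BuildTable and PinyinOf are hypothetical and do not appear in the article, and the two table entries are sample data rather than a real dictionary.

```csharp
using System;

public static class CharIndexDemo
{
    // Hypothetical item type: one slot per possible 16-bit char value.
    public struct Item
    {
        public string Pinyin; // pinyin mnemonic code
        public string Wubi;   // Wubi mnemonic code
    }

    public const int TableSize = 65536; // one entry for every char value

    public static Item[] BuildTable()
    {
        var table = new Item[TableSize];
        // Sample entries only; a real table would be loaded from a file.
        table['中'] = new Item { Pinyin = "Z", Wubi = "K" };
        table['国'] = new Item { Pinyin = "G", Wubi = "L" };
        return table;
    }

    // Direct lookup: the character's code is the array index.
    // Characters without an entry yield an empty string.
    public static string PinyinOf(Item[] table, char c)
    {
        return table[c].Pinyin ?? "";
    }
}
```

A char converts implicitly to int in C#, so table[c] needs no cast; the explicit (int) cast used in the article does the same thing.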

We therefore define the following structure to store the mnemonic codes of one Chinese character (supporting more encodings only requires extending the structure):

private struct ItemWord {
    public int numFlag;      // greater than 0 if the character is polyphonic
    public char strWord;     // the character itself
    public string strSpell1; // pinyin code
    public string strSpell2; // Wubi code
    public string strExt;    // phrases kept for a polyphonic character
    public string strName;   // code used when the character is a surname
}

Loading the mnemonic code dictionary

Since pinyin codes (including polyphonic characters) and Wubi codes are both supported, the code tables are kept under d: myword for now, for convenience. Loading the pinyin codes is used as the example below. First, every item of the ItemWord array is initialized. GBK contains roughly 20,000 Chinese characters, but a double-byte (16-bit) code can take up to 65,536 (64 K) values, so the array is defined with 64 K entries:

private static readonly int INTMAX = 65536;

private ItemWord[] stWord = new ItemWord[INTMAX];

The pinyin file hzpy1.txt is loaded in javasfun_loadword as follows:

while ((strInput = srFile.ReadLine()) != null) {
    chrWord = strInput.ToCharArray();
    numIndex = (int)chrWord[0];                          // character code = array index
    stWord[numIndex].strWord = chrWord[0];
    stWord[numIndex].strSpell1 = chrWord[2].ToString();  // the code letter sits at position 2
}
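For readers who want to try the loading step in isolation, here is a minimal self-contained sketch. It assumes, based only on the indexes used in the loop above, that each line holds the character at position 0, a one-character separator at position 1, and the code letter at position 2. The names PinyinLoader and Load are hypothetical, and a plain string array stands in for the ItemWord array.

```csharp
using System;
using System.IO;

public static class PinyinLoader
{
    // Reads lines assumed to look like "<character><separator><code letter>"
    // and returns a table indexed by character code.
    public static string[] Load(TextReader reader)
    {
        var codes = new string[65536];
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length < 3) continue;       // skip blank or malformed lines
            codes[line[0]] = line[2].ToString(); // character code is the index
        }
        return codes;
    }
}
```

Taking a TextReader keeps the sketch independent of the file location: a StreamReader opened on the dictionary file and a StringReader over test data both work.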

Note: the key step is to convert the line that was read into a char array and then use (int)chrWord[0] as the array index (this is much simpler than in C); the fields of the item at that index are then filled in. Wubi codes are read in the same way, as are any other codes, so this part is very simple. The storage of pinyin codes needs more explanation, though. Most Chinese characters have a single reading, and only a small minority are polyphonic. To keep the structure simple, the polyphonic phrases are stored in the structure of the character they belong to:

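numIndex = (int)chrWord[0];
strInput = strInput.Substring(2);   // drop the leading character and its separator
// The search strings of the two Replace calls are empty in the source text
// (String.Replace rejects an empty search string); given the result format,
// the second call evidently turns the separator before the code into "/".
strInput = strInput.Replace("", "");
strInput = strInput.Replace("", "/");
stWord[numIndex].strExt = stWord[numIndex].strExt + strInput + "|";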

The final content of strExt has the format "|Zeng Ge/Z|". The advantage of this format is explained below, in the discussion of polyphonic codes.
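The construction of this string can be sketched as follows. The input layout is an assumption: each line is taken to be "character, phrase, code" separated by spaces, for example "曾 曾哥 Z" (the phrase written "Zeng Ge" above). The names PolyphoneIndex and Append are hypothetical, and the sketch splits the line instead of using the article's Replace calls.

```csharp
using System;

public static class PolyphoneIndex
{
    // Assumed input line: "<character> <phrase> <code>", e.g. "曾 曾哥 Z".
    // Appends "phrase/code|" to ext; an empty ext is started with "|",
    // so that every phrase ends up enclosed as "|phrase/code|".
    public static string Append(string ext, string line)
    {
        if (string.IsNullOrEmpty(ext)) ext = "|";
        string[] parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 3) return ext; // ignore malformed lines
        return ext + parts[1] + "/" + parts[2] + "|";
    }
}
```

Calling Append once per dictionary line for the same character accumulates all of its phrases in one string, which is then stored in that character's strExt field.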

From Chinese characters to mnemonic codes

Take Wubi mnemonic codes as the example. For an input string of Chinese characters, we take each character, convert it to its character code, and read the corresponding mnemonic code directly. Because the code is fetched by array index and the data is already in an in-memory array, the conversion is fast:

chrWord = strChinese.ToCharArray();
numCount = strChinese.Length;
for (i = 0; i < numCount; i++) {
    numIndex = (int)chrWord[i];                        // character code = array index
    strSpell = strSpell + stWord[numIndex].strSpell2;  // append the Wubi code
}
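The same loop, written as a self-contained function, might look like the sketch below. The names CodeConverter and ToCodes are hypothetical, a plain string array indexed by character code stands in for the ItemWord array, and the choice to skip characters that have no entry is an assumption, not something the article states.

```csharp
using System;
using System.Text;

public static class CodeConverter
{
    // codes: table indexed by character code; null means "no entry".
    public static string ToCodes(string text, string[] codes)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            string code = codes[c];           // direct lookup by index
            if (code != null) sb.Append(code); // characters without a code are skipped
        }
        return sb.ToString();
    }
}
```

A StringBuilder is used here because repeated string concatenation in a loop allocates a new string on every pass; for short inputs the difference is negligible.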

Pinyin codes raise two main problems: surnames and polyphonic characters. Polyphonic characters are handled by phrase. First, the character has to be marked as polyphonic, which is what the numFlag field of the structure is for. Second, the dictionary defines a line such as "Zeng Ge Z" for each phrase. The function funFindMulti then locates the code of a polyphonic character:

strWord = "|" + keys + ch2 + "/";
numPos = stWord[numIndex].strExt.IndexOf(strWord);
if (numPos >= 0) return stWord[numIndex].strExt.Substring(numPos + strWord.Length, 1);
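A self-contained version of this lookup is sketched below. It is not the article's funFindMulti: the names PolyphoneLookup and FindMulti, the parameter list, and the empty-string result for "not found" are assumptions made for the example. The phrase string is expected in the "|phrase/code|" format described earlier.

```csharp
using System;

public static class PolyphoneLookup
{
    // ext: "|phrase/code|phrase/code|".
    // Returns the one-letter code stored for the two-character phrase
    // ch1 + ch2, or "" when the phrase is not listed.
    public static string FindMulti(string ext, char ch1, char ch2)
    {
        if (string.IsNullOrEmpty(ext)) return "";
        string key = "|" + ch1 + ch2 + "/";
        int pos = ext.IndexOf(key, StringComparison.Ordinal);
        int codeAt = pos + key.Length;
        if (pos < 0 || codeAt >= ext.Length) return "";
        return ext.Substring(codeAt, 1);
    }
}
```

StringComparison.Ordinal is passed because IndexOf(string) is culture-sensitive by default, while an exact character-by-character match is what the "|phrase/" key calls for.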

In other words, when we reach the character "Zeng", we first see that it is polyphonic, and then look up the code stored for the phrase "Zeng Ge".

The search key has the form "|phrase/". Even when a character has several phrases, the leading "|" and the trailing "/" let the key match at exactly one position, and the character that follows the match, here "Z", is the code. So for the sample phrase (given here in English as "confidence in the letter"), the result is "XZGDZX".

Surnames are special, but in real applications we know whether the current content is a name, so the surname handling is requested explicitly. Within a name, the surname reading applies only to the first character. It is therefore enough to add a name switch to the call parameters (the variable IsName below; the code is written compactly to save space in the article):

 

 if (stWord[numIndex].numFlag > 0 && !IsName){

if (i > 0) strThis = funFindMulti(numIndex, chrWord[i - 1], chrWord[i]);

if (strThis.Length == 0 && i < numCount -1 ) strThis = funFindMulti(numIndex, chrWord[i], chrWord[i + 1]);

}

if (IsName && i == 0
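The listing breaks off at the surname check. The sketch below shows one way the whole per-character decision could fit together. It is an assumption-laden illustration, not the article's code: the names PinyinCoder, Item, ToCode and NameCode are hypothetical, and the body of the surname branch (use the surname code for the first character when the name switch is on) is inferred from the description above.

```csharp
using System;
using System.Text;

public static class PinyinCoder
{
    // Hypothetical item, mirroring the fields the article describes.
    public class Item
    {
        public string Code = "";     // default pinyin code
        public string Ext = "";      // "|phrase/code|..."; empty if not polyphonic
        public string NameCode = ""; // code when used as a surname; empty if none
    }

    static string FindMulti(string ext, char a, char b)
    {
        string key = "|" + a + b + "/";
        int pos = ext.IndexOf(key, StringComparison.Ordinal);
        int codeAt = pos + key.Length;
        return pos >= 0 && codeAt < ext.Length ? ext.Substring(codeAt, 1) : "";
    }

    public static string ToCode(string text, Item[] table, bool isName)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < text.Length; i++)
        {
            Item item = table[text[i]];
            if (item == null) continue; // no entry for this character
            string code = "";
            if (item.Ext.Length > 0 && !isName)
            {
                // Try the phrase formed with the previous character, then with the next.
                if (i > 0) code = FindMulti(item.Ext, text[i - 1], text[i]);
                if (code.Length == 0 && i < text.Length - 1)
                    code = FindMulti(item.Ext, text[i], text[i + 1]);
            }
            // Assumed surname rule: only the first character of a name.
            if (isName && i == 0 && item.NameCode.Length > 0) code = item.NameCode;
            if (code.Length == 0) code = item.Code; // fall back to the default reading
            sb.Append(code);
        }
        return sb.ToString();
    }
}
```

With an entry for 曾 (Zeng) whose default code is "C", phrase string "|曾哥/Z|" and surname code "Z", the phrase 曾哥 converts to "ZG" through the phrase lookup, while any text passed with the name switch on takes "Z" for its first character regardless of the phrases.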