Android Input Method 02: OpenWnn source code analysis 06 - candidate word generation

Source: Internet
Author: User

This article introduces how the OpenWnn input method generates candidate words during input. Because this series only studies the front-end Java code, only the corresponding interfaces are covered. In practice, candidate words mainly come from the backend (the part written in C). The language models behind the input method are not covered here.

The OpenWnn source code circulating on the Internet does not include the built backend (the C code compiled into .so files), so it cannot be built directly into a runnable APK. The source code with the C code already compiled is available at: http://download.csdn.net/detail/xianming01/4258456.

I recently saw my articles reposted on the Internet without the source being credited. Since this is a series, readers who come across a single reposted article may not be able to follow it. So here is a marker: if you are reading a reposted copy, you can visit my blog at http://blog.csdn.net/xianming01.

1. Sources of candidate words

Candidate word sources are divided into two types: those that require complex transformation and those that do not. They are defined as follows:

Complex transformation required: the language model of the input method is needed, and the transformation is performed by the input method backend.

No complex transformation required: neither the language model nor the input method backend is needed, and the front-end Java code alone performs the transformation.

Next we will introduce these separately. This part involves many classes, listed as follows:

  • Simple transformation
LetterConverter.java, Romkan.java, RomkanFullKatakana.java, RomkanHalfKatakana.java, KanaConverter.java
  • Complex transformation
WnnEngine.java, OpenWnnEngineJAJP.java, OpenWnnClauseConverterJAJP.java, OpenWnnDictionaryImpl.java, OpenWnnDictionaryImplJni.java
  • Auxiliary class
ComposingText.java

2. ComposingText

This is a key class that appears in much of the code, so let's introduce it first. It represents the string currently being edited, that is, the underlined part in the input box. The variables in the class are defined as follows:

/**
 * The container class of composing string.
 *
 * This interface is for the class includes information about the
 * input string, the converted string and its decoration.
 * {@link LetterConverter} and {@link WnnEngine} get the input string from it, and
 * store the converted string into it.
 *
 * @author Copyright (C) 2009 OMRON SOFTWARE CO., LTD.  All Rights Reserved.
 */
public class ComposingText {
    /**
     * Text layer 0.
     * <br>
     * This text layer holds key strokes.<br>
     * (ex) Romaji in Japanese.  Parts of Hangul in Korean.
     */
    public static final int LAYER0  = 0;
    /**
     * Text layer 1.
     * <br>
     * This text layer holds the result of the letter converter.<br>
     * (ex) Hiragana in Japanese. Pinyin in Chinese. Hangul in Korean.
     */
    public static final int LAYER1  = 1;
    /**
     * Text layer 2.
     * <br>
     * This text layer holds the result of the consecutive clause converter.<br>
     * (ex) the result of Kana-to-Kanji conversion in Japanese,
     *      Pinyin-to-Kanji conversion in Chinese, Hangul-to-Hanja conversion in Korean language.
     */
    public static final int LAYER2  = 2;
    /** Maximum number of layers */
    public static final int MAX_LAYER = 3;

    /** Composing text's layer data */
    protected ArrayList<StrSegment>[] mStringLayer;
    /** Cursor position */
    protected int[] mCursor;

Here the text is divided into three layers. Take Japanese as an example: the first layer (LAYER0) holds the keys you typed, the second layer (LAYER1) holds the kana, and the third layer (LAYER2) holds the kanji. The third layer is derived from the second, and the second from the first. For example, of the two screenshots above, the first displays the content of the second layer and the second displays the content of the third layer (after tapping "Convert").

In addition, this class is also used to edit clauses (text sections), because you can press the left or right arrow key to select different parts of the whole string. I have not yet fully understood the specific code for this part.
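
To make the three layers more concrete, here is a minimal sketch. It is not the OpenWnn implementation: the class LayerSketch and its methods setLayer, layerText and moveCursor are hypothetical, and plain strings stand in for StrSegment. Only the constants LAYER0, LAYER1, LAYER2 and MAX_LAYER mirror the code quoted above. The idea is simply that each layer keeps its own list of segments and its own cursor.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for ComposingText; not the OpenWnn class. */
class LayerSketch {
    static final int LAYER0 = 0;    // key strokes (e.g. romaji)
    static final int LAYER1 = 1;    // letter converter output (e.g. hiragana)
    static final int LAYER2 = 2;    // clause converter output (e.g. kanji)
    static final int MAX_LAYER = 3;

    private final List<List<String>> layers = new ArrayList<>();
    private final int[] cursor = new int[MAX_LAYER];

    LayerSketch() {
        for (int i = 0; i < MAX_LAYER; i++) {
            layers.add(new ArrayList<>());
        }
    }

    /** Replaces the segments of one layer and moves its cursor to the end. */
    void setLayer(int layer, List<String> segments) {
        layers.get(layer).clear();
        layers.get(layer).addAll(segments);
        cursor[layer] = segments.size();
    }

    /** Joins all segments of a layer into one display string. */
    String layerText(int layer) {
        return String.join("", layers.get(layer));
    }

    /** Moves the cursor by delta segments, clamped to the layer bounds. */
    int moveCursor(int layer, int delta) {
        int size = layers.get(layer).size();
        cursor[layer] = Math.max(0, Math.min(size, cursor[layer] + delta));
        return cursor[layer];
    }

    public static void main(String[] args) {
        LayerSketch text = new LayerSketch();
        text.setLayer(LAYER0, List.of("ka", "wa", "i", "i"));
        text.setLayer(LAYER1, List.of("か", "わ", "い", "い"));
        text.setLayer(LAYER2, List.of("可愛い"));
        for (int layer = 0; layer < MAX_LAYER; layer++) {
            System.out.println("layer " + layer + ": " + text.layerText(layer));
        }
    }
}
```

In this sketch the left and right arrow keys would map to moveCursor with a delta of -1 or +1. How the real class maps a cursor position in one layer to the other layers is not shown here.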

3. Complex Transformation

Candidate words that need complex transformation come from the input method engine. Let's look at the engine interface for text transformation (how candidate words are obtained from the input). The file is WnnEngine.java:

/**
 * The interface of the text converter accessed from OpenWnn.
 * <br>
 * The realization class of this interface should be an singleton class.
 *
 * @author Copyright (C) 2009, OMRON SOFTWARE CO., LTD.  All Rights Reserved.
 */
public interface WnnEngine {
    /*
     * DEFINITION OF CONSTANTS
     */
    /** The identifier of the learning dictionary */
    public static final int DICTIONARY_TYPE_LEARN = 1;
    /** The identifier of the user dictionary */
    public static final int DICTIONARY_TYPE_USER  = 2;

    /*
     * DEFINITION OF METHODS
     */
    /**
     * Initialize parameters.
     */
    public void init();

    /**
     * Close the converter.
     * <br>
     *
     * OpenWnn calls this method when it is destroyed.
     */
    public void close();

    /**
     * Predict words/phrases.
     * <br>
     * @param text   The input string
     * @param minLen The minimum length of a word to predict (0  : no limit)
     * @param maxLen The maximum length of a word to predict (-1 : no limit)
     * @return Plus value if there are candidates; 0 if there is no candidate; minus value if a error occurs.
     */
    public int predict(ComposingText text, int minLen, int maxLen);

    /**
     * Convert a string.
     * <br>
     * This method is used to consecutive/single clause convert in
     * Japanese, Pinyin to Kanji convert in Chinese, Hangul to Hanja
     * convert in Korean, etc.
     *
     * The result of conversion is set into the layer 2 in the {@link ComposingText}.
     * To get other candidates of each clause, call {@link #makeCandidateListOf(int)}.
     *
     * @param text   The input string
     * @return Plus value if there are candidates; 0 if there is no candidate; minus value if a error occurs.
     */
    public int convert(ComposingText text);

    /**
     * Search words from the dictionaries.
     * <br>
     * @param key  The search key (stroke)
     * @return Plus value if there are candidates; 0 if there is no candidate; minus value if a error occurs.
     */
    public int searchWords(String key);

    /**
     * Search words from the dictionaries.
     * <br>
     * @param word  A word to search
     * @return Plus value if there are candidates; 0 if there is no candidate; minus value if a error occurs.
     */
    public int searchWords(WnnWord word);

    /**
     * Get a candidate.
     * <br>
     * After {@link #predict(ComposingText, int, int)} or {@link #makeCandidateListOf(int)} or
     * {@code searchWords()}, call this method to get the
     * results.  This method will return a candidate in decreasing
     * frequency order for {@link #predict(ComposingText, int, int)} and
     * {@link #makeCandidateListOf(int)}, in increasing character code order for
     * {@code searchWords()}.
     *
     * @return The candidate; {@code null} if there is no more candidate.
     */
    public WnnWord getNextCandidate();
}

The code quoted above lists only the interfaces that act as sources of candidate words. The other interfaces play a supporting role and are omitted. From the above, we can see that candidate words come from the following channels:

  • Prediction
  • Clause (text section) transformation (single-clause or multi-clause transformation, the latter also known as whole-sentence transformation)
  • Search from the dictionary (prediction is actually based on this)
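
The Javadoc above describes a two-step protocol: call predict(), convert() or searchWords() first, then call getNextCandidate() repeatedly until it returns null. The sketch below illustrates that calling pattern only. MiniEngine and FakeEngine are hypothetical, cut-down stand-ins that use String where the real interface uses ComposingText and WnnWord, and the in-memory prefix search is an assumption made just so the example can run.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class CandidateLoopSketch {
    /** Hypothetical cut-down version of the WnnEngine contract. */
    interface MiniEngine {
        int predict(String input);    // positive: candidates exist, 0: none
        String getNextCandidate();    // null when there are no more candidates
    }

    /** In-memory stand-in for the real backend (assumption, not OpenWnn code). */
    static class FakeEngine implements MiniEngine {
        private final List<String> words;
        private Iterator<String> results = new ArrayList<String>().iterator();

        FakeEngine(List<String> words) {
            this.words = words;
        }

        @Override
        public int predict(String input) {
            List<String> hits = new ArrayList<>();
            for (String w : words) {
                if (w.startsWith(input)) {
                    hits.add(w);
                }
            }
            results = hits.iterator();
            return hits.size();
        }

        @Override
        public String getNextCandidate() {
            return results.hasNext() ? results.next() : null;
        }
    }

    /** Drains the engine: one predict() call, then getNextCandidate() until null. */
    static List<String> collect(MiniEngine engine, String input) {
        List<String> out = new ArrayList<>();
        if (engine.predict(input) <= 0) {
            return out;
        }
        String candidate;
        while ((candidate = engine.getNextCandidate()) != null) {
            out.add(candidate);
        }
        return out;
    }

    public static void main(String[] args) {
        MiniEngine engine = new FakeEngine(List.of("input", "index", "output"));
        System.out.println(collect(engine, "in"));
    }
}
```

The pull-style getNextCandidate() means the caller decides how many candidates to fetch, which suits a candidate list that only shows a few entries at a time.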

3.2 Prediction (dictionary search)

Searching the dictionary is what it sounds like. However, it raises a problem: the dictionary format. My guess is that the OpenWnn dictionary is first written as a text file and then converted into its current form with some tool, so the dictionary cannot be read directly. Recently our company acquired a Japanese input method based on OpenWnn, and we now need to replace the original open source backend with our own. During this work we found that we could not understand the dictionary format, which held up a lot of the work.

I have not read the code that searches the dictionary, but presumably it searches according to the dictionary format, and the implementation should be relatively simple.
Let's take a look at the prediction code:

public int predict(ComposingText text, int minLen, int maxLen) {
    clearCandidates();
    if (text == null) { return 0; }

    /* set mInputHiragana and mInputRomaji */
    int len = setSearchKey(text, maxLen);

    /* set dictionaries by the length of input */
    setDictionaryForPrediction(len);

    /* search dictionaries */
    mDictionaryJP.setInUseState( true );

    if (len == 0) {
        /* search by previously selected word */
        return mDictionaryJP.searchWord(WnnDictionary.SEARCH_LINK, WnnDictionary.ORDER_BY_FREQUENCY,
                                        mInputHiragana, mPreviousWord);
    } else {
        if (mExactMatchMode) {
            /* exact matching */
            mDictionaryJP.searchWord(WnnDictionary.SEARCH_EXACT, WnnDictionary.ORDER_BY_FREQUENCY,
                                     mInputHiragana);
        } else {
            /* prefix matching */
            mDictionaryJP.searchWord(WnnDictionary.SEARCH_PREFIX, WnnDictionary.ORDER_BY_FREQUENCY,
                                     mInputHiragana);
        }
        return 1;
    }
}

Prediction is really just the result of a dictionary search. There are two modes: exact match and prefix match. To use a pinyin analogy for exact match: if you enter "zhongguo", you get the dictionary entries whose reading is exactly "zhongguo", such as "China" and other words with the same reading. For prefix match, if you enter "zhongguo", the search also finds entries such as "Chinese people" and "People's Republic of China". Note also that when the input is empty (len == 0), the code searches with SEARCH_LINK using the previously selected word.
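
The difference between the two modes can be sketched with a sorted map from reading to words. This is only an illustration: the class MatchModeSketch, its methods and the toy entries are hypothetical, and results come back in key order rather than the frequency order (ORDER_BY_FREQUENCY) that the real dictionary search is asked for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MatchModeSketch {
    /** reading -> words with that reading; a toy dictionary, not the OpenWnn format. */
    private final TreeMap<String, List<String>> dict = new TreeMap<>();

    void add(String reading, String word) {
        dict.computeIfAbsent(reading, k -> new ArrayList<>()).add(word);
    }

    /** Exact match: only entries whose reading equals the key. */
    List<String> searchExact(String reading) {
        return new ArrayList<>(dict.getOrDefault(reading, List.of()));
    }

    /** Prefix match: every entry whose reading starts with the key. */
    List<String> searchPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        // In a sorted map, all keys sharing a prefix are contiguous,
        // starting at the first key that is >= the prefix.
        for (Map.Entry<String, List<String>> e : dict.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            out.addAll(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        MatchModeSketch d = new MatchModeSketch();
        d.add("zhongguo", "China");
        d.add("zhongguoren", "Chinese people");
        d.add("zhongwen", "Chinese language");
        System.out.println("exact:  " + d.searchExact("zhongguo"));
        System.out.println("prefix: " + d.searchPrefix("zhongguo"));
    }
}
```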

3.3 Clause (text section) transformation

Clause transformation comes in two kinds: single-clause and multi-clause. In single-clause transformation, for example, you enter the reading of "cute" and it is converted into the word "cute". Multi-clause transformation handles two or more single clauses entered consecutively. Each single clause has many possible results, and the most suitable combination is selected as the combined result.

This part depends heavily on the language, which I do not understand, so I will not go into it further.
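
Even without the language details, the idea of "selecting the most suitable combination" can be sketched in a few lines. Everything here is an assumption made for illustration: the Candidate class, the integer scores and the Connection bonus are hypothetical and say nothing about how the OpenWnn backend actually scores clauses. The sketch only shows that the best pair is not necessarily made of the best candidate of each clause once adjacent candidates influence each other.

```java
import java.util.List;

class ClauseComboSketch {
    /** One candidate for a clause with a hypothetical score (higher is better). */
    static final class Candidate {
        final String text;
        final int score;

        Candidate(String text, int score) {
            this.text = text;
            this.score = score;
        }
    }

    /** Hypothetical bonus for how well two adjacent candidates fit together. */
    interface Connection {
        int bonus(Candidate left, Candidate right);
    }

    /** Tries every pair and returns the text of the highest scoring combination. */
    static String best(List<Candidate> first, List<Candidate> second, Connection conn) {
        String bestText = null;
        int bestScore = Integer.MIN_VALUE;
        for (Candidate a : first) {
            for (Candidate b : second) {
                int total = a.score + b.score + conn.bonus(a, b);
                if (total > bestScore) {
                    bestScore = total;
                    bestText = a.text + b.text;
                }
            }
        }
        return bestText;  // null if either clause has no candidates
    }

    public static void main(String[] args) {
        List<Candidate> first = List.of(new Candidate("A1", 5), new Candidate("A2", 4));
        List<Candidate> second = List.of(new Candidate("B1", 3), new Candidate("B2", 3));
        // Assume A2 and B2 fit together especially well.
        Connection conn = (l, r) -> (l.text.equals("A2") && r.text.equals("B2")) ? 3 : 0;
        System.out.println(best(first, second, conn));
    }
}
```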

4. Simple Transformation

This transformation is completed directly in the front-end Java code. It consists of two parts: 1. romaji input; 2. conversion to alphanumerics and full-width/half-width katakana.

4.1 Romaji input

Romaji input means typing Latin letters, from which the input method generates the corresponding kana (similar to pinyin input in Chinese). For example, if you enter "kawai", the displayed result is "かわい", which involves converting "ka" to "か". This input mode is normally used with a full keyboard. However, I found that the compiled OpenWnn source code does not have a full keyboard, so this function cannot be used. I downloaded Simeji, a Japanese input method developed on the basis of OpenWnn, which does have a full keyboard (shown in the original screenshot).

This part is completed in these four classes: LetterConverter.java, Romkan.java, RomkanFullKatakana.java, RomkanHalfKatakana.java

In fact, if you read the source code, you will find that it is based on lookups in a HashMap. For example, the content of romkanTable in Romkan.java is:

put("la", "ぁ");
put("xa", "ぁ");
put("a", "あ");
put("li", "ぃ");
put("lyi", "ぃ");
put("xi", "ぃ");
put("xyi", "ぃ");
put("i", "い");
put("yi", "い");

That is why, if you enter "li", you get "ぃ".
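
A table lookup like this can be driven by a greedy longest-match loop, sketched below. The class RomkanSketch and its convert method are hypothetical and are not the logic of Romkan.java, which converts while the user types. Only the table entries are taken from the excerpt above. Letters with no entry are copied through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

class RomkanSketch {
    /** Subset of the romaji -> kana entries quoted above. */
    private static final Map<String, String> TABLE = new HashMap<>();
    /** Longest key in TABLE ("lyi", "xyi"). */
    private static final int MAX_KEY_LEN = 3;

    static {
        TABLE.put("la", "ぁ");
        TABLE.put("xa", "ぁ");
        TABLE.put("a", "あ");
        TABLE.put("li", "ぃ");
        TABLE.put("lyi", "ぃ");
        TABLE.put("xi", "ぃ");
        TABLE.put("xyi", "ぃ");
        TABLE.put("i", "い");
        TABLE.put("yi", "い");
    }

    /** Greedy longest-match conversion; unmatched characters are kept as typed. */
    static String convert(String romaji) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < romaji.length()) {
            boolean matched = false;
            int maxLen = Math.min(MAX_KEY_LEN, romaji.length() - i);
            for (int len = maxLen; len > 0; len--) {
                String kana = TABLE.get(romaji.substring(i, i + len));
                if (kana != null) {
                    out.append(kana);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append(romaji.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("li"));   // small i, from the "li" entry
        System.out.println(convert("ai"));   // "a" then "i", two separate lookups
        System.out.println(convert("lyi"));  // the three-letter key wins over shorter ones
    }
}
```

Trying the longest key first matters: with "lyi", a shortest-first scan would stop at "l" (no entry), copy it through, and then convert "yi" on its own.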

4.2 Alphanumeric and full-width/half-width katakana transformation

These conversions are done in the class KanaConverter.java. I will not describe the specific transformation process of this class. Instead I will show several of its HashMaps, and after reading them you will understand it. The content of mHanSuujiMap looks like this:

put( "あ", "1");
put( "い", "11");
put( "う", "111");
put( "え", "1111");
put( "お", "11111");
put( "か", "2");
put( "き", "22");
put( "く", "222");
put( "け", "2222");
put( "こ", "22222");

mZenSuujiMap:

put( "あ", "１");
put( "い", "１１");
put( "う", "１１１");

mHanKataMap:

put( "あ", "ｱ");
put( "い", "ｲ");
put( "う", "ｳ");
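
A conversion driven by one of these maps can be as simple as a character-by-character lookup, as in the sketch below. KanaMapSketch and its convert method are hypothetical and do not reproduce KanaConverter.java. The entries are a subset of the mHanSuujiMap excerpt above, where each kana appears to map to the digits of the key presses that produce it on a 12-key keypad (あ is one press of key 1, か one press of key 2).

```java
import java.util.HashMap;
import java.util.Map;

class KanaMapSketch {
    /** Subset of the hiragana -> half-width digit entries quoted above. */
    static final Map<String, String> HAN_SUUJI = new HashMap<>();

    static {
        HAN_SUUJI.put("あ", "1");
        HAN_SUUJI.put("い", "11");
        HAN_SUUJI.put("う", "111");
        HAN_SUUJI.put("か", "2");
        HAN_SUUJI.put("き", "22");
    }

    /**
     * Looks up each character of the input in the given map.
     * Returns null if any character has no entry, so the caller
     * can skip this candidate.
     */
    static String convert(String input, Map<String, String> map) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            String mapped = map.get(input.substring(i, i + 1));
            if (mapped == null) {
                return null;
            }
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("あか", HAN_SUUJI));
        System.out.println(convert("いき", HAN_SUUJI));
    }
}
```

Passing the map as a parameter means the same loop could serve the digit maps and the katakana maps, each producing one extra candidate for the same kana input.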

Then, in actual use, for example, you enter some text and press the "English" (alphanumeric/kana) key. The following result is displayed:


From this figure you can see what those HashMaps are for.

5. Conclusion

After the introduction above, you should have some understanding of how the OpenWnn Japanese input method generates candidate words. Two questions remain:
  • We now know these interfaces, but how are they used to produce the contents of the candidates view? This will be covered in a later article;
  • How the Java code calls the backend C code. JNI will be covered later.