3. Locale Database (Locale DB)
The third part of internationalization is a well-developed locale database (Locale DB).
Language, character set, and cultural conventions together form the locale, the local environment in which software runs. A locale is an execution environment abstracted from localization features; it comprises the language, the region, and the character set. A locale name has the form zh_CN.GBK, indicating the Chinese language (zh), the region China (CN), and the character set GBK.
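The three-part locale name described above can be taken apart mechanically. The following is a minimal sketch, not part of any standard library: the names locale_parts and parse_locale_name are hypothetical, and modifiers such as a trailing "@euro" are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: the three parts of a locale name. */
struct locale_parts {
    char language[16];
    char territory[16];
    char codeset[32];
};

/* Copy at most cap-1 bytes of src[0..len) into dst and NUL-terminate. */
static void copy_part(char *dst, size_t cap, const char *src, size_t len)
{
    if (len >= cap)
        len = cap - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split "language_TERRITORY.codeset". Missing parts become empty strings.
   Returns 0 on success, -1 if the name has no language part. */
int parse_locale_name(const char *name, struct locale_parts *out)
{
    assert(name != NULL && out != NULL);

    const char *dot = strchr(name, '.');
    size_t base_len = dot ? (size_t)(dot - name) : strlen(name);
    const char *us = memchr(name, '_', base_len);
    size_t lang_len = us ? (size_t)(us - name) : base_len;

    if (lang_len == 0)
        return -1;

    copy_part(out->language, sizeof out->language, name, lang_len);
    if (us)
        copy_part(out->territory, sizeof out->territory,
                  us + 1, base_len - lang_len - 1);
    else
        out->territory[0] = '\0';
    if (dot)
        copy_part(out->codeset, sizeof out->codeset, dot + 1, strlen(dot + 1));
    else
        out->codeset[0] = '\0';
    return 0;
}
```

Called with "zh_CN.GBK", the sketch fills in the language "zh", the territory "CN", and the codeset "GBK".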
A locale can be set through a group of shell environment variables: LANG, LC_ALL, LC_CTYPE, LC_COLLATE, LC_TIME, LC_MONETARY, LC_NUMERIC, and LC_MESSAGES. Applications can also query and set it with the setlocale() function provided by the international C language standard. All or part of the locale can be set at run time.
The setlocale() function gives application developers a tool for setting all or part (called a category) of the localization environment. The syntax of the setlocale() function is:
#include <locale.h>
char *setlocale(int category, const char *locale);
Here, category is the name of one of five categories:
LC_CTYPE (character classification and case conversion);
LC_COLLATE (string comparison and sorting);
LC_TIME (date and time formats; for example, China uses year, month, day; the USA uses month/day/year; and the UK uses day/month/year);
LC_MONETARY (currency format);
LC_NUMERIC (number format).
In addition, the special value LC_ALL makes setlocale() set all categories at once.
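As an illustration of setting and querying individual categories, here is a minimal sketch built only on the standard setlocale() call. The wrapper names try_set_category and query_category are hypothetical, and which locale names are installed depends on the system; only "C" is guaranteed to exist.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Try to switch one category to `wanted`. Returns 1 on success and 0 if
   setlocale() rejects the request (it then returns NULL and leaves the
   current setting unchanged). */
int try_set_category(int category, const char *wanted)
{
    return setlocale(category, wanted) != NULL;
}

/* Copy the current setting of one category into buf. Passing NULL as the
   locale argument turns setlocale() into a pure query.
   Returns 1 on success, 0 if the name does not fit into buf. */
int query_category(int category, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    const char *cur = setlocale(category, NULL);
    if (cur == NULL || strlen(cur) >= cap)
        return 0;
    strcpy(buf, cur);
    return 1;
}
```

A program would typically call try_set_category(LC_ALL, "") once at startup to adopt whatever the environment variables request, then override single categories, for example try_set_category(LC_TIME, "zh_CN.GBK"), and fall back gracefully when the call returns 0.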
Besides the locale services above, the locale database also includes the message service and code set conversion.
4. Input and Output Services (I/O Services)
Input methods fall into two classes: pattern recognition and encoding. Pattern recognition includes voice input, optical character recognition (OCR), and similar methods. The encoding class mainly comprises input methods based on pinyin, strokes, and radicals.
GUI internationalization and Chinese localization include:
Creating the methods and interfaces for attaching a Chinese input module;
Creating a variety of Chinese font libraries: dot-matrix (bitmap), vector-outline, and curve-outline;
Creating graphics-based Chinese character information processing functions.
The graphical interface is handled at three levels:
The X font server layer (X Font Server) handles dot-matrix, vector-outline, and curve-outline text;
The library function layer (Libraries) provides functions related to the X font server, such as loading fonts onto the server and loading, querying, and releasing fonts;
The command layer (Commands) includes tool commands and conversion commands.
GUI Chinese localization provides the following applications: a Chinese character creation tool, a Chinese character icon editor, a Chinese text editing tool, and a Chinese graphics printing tool.
Linux input standards are limited to keyboard input. X11R6 has the XIM (X Input Method) standard, a communication protocol between applications and input methods. There is currently no input method standard for character terminals.
Localization
Localization (L10n: the first and last letters, with the 10 letters between them counted) is the conversion to the operating environment of a specific local language.
Localization mainly comprises the implementation of the code system, the national feature files, and the input and output services.
1. The code system is the character set in use. In China these are currently GB2312, GBK, GB18030, and GB13000.
2. The national feature files are the local environment content of the locale.
3. Input and output services are closely tied to the internationalized input and output services.
Input methods can be divided into an input management layer, a front-end processing layer, an input unit layer, an input unit algorithm layer, and an auxiliary area processing layer.
The input management layer uses the locale environment to find the required input method module in a specified directory, starts it or loads it into the application, and fills in the relevant entry table, thereby connecting the application to the input method.
The front-end processing layer manages the input units, interprets special keys, switches between input units, and stores the results of input unit processing in the corresponding buffers, from which the application retrieves them or the display functions of the auxiliary area processing layer are called.
The input unit layer interprets each key event according to the rules of the input unit, decides the corresponding action, builds an input string that meets the unit's requirements, calls the functions of the input unit algorithm layer to perform dictionary search or code conversion, and returns the results to the front-end processing layer.
The input unit algorithm layer performs a dictionary search on the input code string passed down by the upper-layer function and returns the result to it.
The auxiliary area processing layer provides the user interface of the input method. It manages the processing functions for the real-time status area, the pre-edit area, and the word-building area.
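The division of labor between three of these layers can be sketched in a few lines. This is a toy model under stated assumptions, not the design of any real input method: all names (algo_lookup, input_unit, front_end, front_end_key) are hypothetical, the dictionary holds two pinyin entries, the space key is assumed to be the commit key, and the committed text is stored as UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Input unit algorithm layer: dictionary search on a code string.
   The two-entry pinyin table is purely illustrative. */
struct dict_entry { const char *code; const char *text; };

static const struct dict_entry dict[] = {
    { "zhong", "\xe4\xb8\xad" },   /* U+4E2D, UTF-8 encoded */
    { "guo",   "\xe5\x9b\xbd" },   /* U+56FD, UTF-8 encoded */
};

const char *algo_lookup(const char *code)
{
    for (size_t i = 0; i < sizeof dict / sizeof dict[0]; i++)
        if (strcmp(dict[i].code, code) == 0)
            return dict[i].text;
    return NULL;
}

/* Input unit layer: collects key events into a code string. */
struct input_unit {
    char code[16];
    size_t len;
};

/* Front-end processing layer: owns the unit and the result buffer. */
struct front_end {
    struct input_unit unit;
    char result[64];
};

void front_end_init(struct front_end *fe)
{
    memset(fe, 0, sizeof *fe);
}

/* Feed one key. Letters extend the code string; space is the special key
   that commits. Returns 1 if text was appended to the result buffer. */
int front_end_key(struct front_end *fe, char key)
{
    assert(fe != NULL);
    struct input_unit *u = &fe->unit;

    if (key >= 'a' && key <= 'z') {
        if (u->len + 1 < sizeof u->code) {
            u->code[u->len++] = key;
            u->code[u->len] = '\0';
        }
        return 0;
    }
    if (key == ' ') {
        const char *hit = algo_lookup(u->code);
        int committed = 0;
        if (hit && strlen(fe->result) + strlen(hit) < sizeof fe->result) {
            strcat(fe->result, hit);
            committed = 1;
        }
        u->len = 0;            /* start a fresh code string */
        u->code[0] = '\0';
        return committed;
    }
    return 0;                  /* other keys are ignored in this sketch */
}
```

Feeding the keys of "zhong guo " one at a time leaves the two committed characters in the result buffer, where an application would pick them up. The input management layer and the auxiliary area display are left out of the sketch.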
3. Modify the Linux kernel to fully support the UCS.
Internationalization exists to solve the problem of localization, and it is built up from localization work. Localization should follow internationalization, and it should follow globalization completely. Localization then only needs to solve input methods and add fonts. Standards organizations should try to move the content required for localization into internationalization and follow internationalization as far as possible. The purpose of localization work is to make localization unnecessary; that is the point at which localization work is complete.
ISO 10646 is a complete solution to the "code barrier", the "millennium bug" of software encoding. Fully implementing ISO 10646 means abandoning the current national character set standards, including China's GB and the United States' ASCII. To implement the UCS fully, it must be supported in operating systems, high-level programming languages, supporting software, application software, communication protocols, and networks. Some hardware devices, such as terminals and printers, also need to support the UCS, because hardware built around ASCII treats some UCS code values as control characters. Once all of this is done, the entire system is fully internationalized, and localization work consists only of providing input methods and extended fonts. To implement ISO 10646 thoroughly, we should start with the operating system, and in China we should start with Linux.
Seven years have passed since ISO 10646 was published in 1993, yet little progress has been made. Complete internationalization can only be reached one step at a time. The first step was to define the UCS. The next is to support the UCS fully in the operating system, and then for all software and hardware to support it one by one, because the operating system is the foundation of all software. The current internationalization/localization of Linux remains essentially a "Chinese platform": because text is still processed byte by byte, it is a Chinese processing environment layered on top of the operating system. Only when character-based processing and wide character support are implemented in the operating system, and ASCII and the various "national standards" are discarded, can the problem be solved completely. The future of the Chinese platform is not bright. Were there not many Chinese platforms on Windows? They were all later defeated by Microsoft's localization in the core.
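The gap between byte-based and character-based processing is easy to demonstrate. The sketch below is an illustration only: it assumes the text is valid UTF-8 (one of the UCS transformation formats), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte-based view: what strlen() and other byte-oriented code see. */
size_t utf8_byte_count(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Character-based view: count only the bytes that start a character.
   In UTF-8, continuation bytes have the bit pattern 10xxxxxx.
   Assumes the input is valid UTF-8. */
size_t utf8_char_count(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For a string holding two Chinese characters followed by "ok", the byte view reports 8 while the character view reports 4. Code that truncates, indexes, or measures such a string by bytes can cut a character in half, which is why the article argues for character-based processing in the operating system itself.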
We used to build Chinese platforms because we had neither our own operating system nor the source code, so we could do nothing else. It was an expedient born of necessity. Now Linux gives us an opportunity: its open source code has given China's software industry a golden opportunity. Linux itself must also keep innovating and evolving. We cannot always trail behind foreign developers, applying "patches" and achieving nothing of our own. We also need to improve Linux, to show Chinese talent and to contribute to Linux.
There are two main improvements to make to Linux. The first is to fix Linux security vulnerabilities. At the Linux World Conference held in Beijing at the end of August, Sun Yufang, deputy director of the Institute of Software of the Chinese Academy of Sciences, said that the institute and Hongqi had done a great deal of work on security vulnerabilities in Linux and that the source code would be released soon. The second is to transform Linux according to the requirements of the UCS so that it supports the UCS fully. Transforming Linux in the core to fully support the UCS is the ultimate path for Chinese Linux and the correct development path for Linux. If the Linux core is not modified to support the UCS, Linux too will be hit by economic globalization and the international trend of software, and an operating system that fully supports the UCS will take over the position Linux holds today. Thoroughly transforming Linux from the core to support the UCS is therefore also a matter of great importance to the healthy development of Linux.
Given economic globalization, the international trend of software, and the general direction set by the UCS, pursuing nationalization and localization in isolation may yield half the result for twice the effort, or may even be counterproductive.
Fully implementing the UCS not only drives the transformation of operating systems but also affects communication programs and many applications.
Communication protocols should also support the UCS. The national standard "Information Technology Internet Chinese Specification-Email transmission format" uses the UCS as the common information interchange code. When a user sends information over the Internet, a conversion module between the local character set encoding and the UCS converts it to UCS-2 or one of its transformation formats, UTF-7 or UTF-8, for transmission. The receiver can then easily convert the UCS-2 to its local character set encoding. This spares the receiver from having to identify which encoding the sender used, and from maintaining many conversion methods for information sent in different encodings; a single conversion from UCS-2 to the local code suffices. If both sender and receiver use the UCS, conversion can be skipped on both ends. This again shows the great benefit of fully supporting the UCS.
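The conversion between UCS-2 and its UTF-8 transformation format is a small bit-shuffling exercise. The following is a minimal sketch with hypothetical function names; it handles one code value at a time, covers only the 16-bit range of UCS-2 (at most three UTF-8 bytes), and does not treat the surrogate range specially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one UCS-2 code value as UTF-8. Writes 1 to 3 bytes into out,
   which must have room for 3, and returns the number of bytes written. */
size_t ucs2_to_utf8(uint16_t c, unsigned char *out)
{
    assert(out != NULL);
    if (c < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (c >> 12));   /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}

/* Decode one UTF-8 sequence of at most 3 bytes into a UCS-2 code value.
   Returns the number of bytes consumed, or 0 if the input is truncated,
   malformed, overlong, or outside the 16-bit range. */
size_t utf8_to_ucs2(const unsigned char *in, size_t len, uint16_t *c)
{
    assert(in != NULL && c != NULL);
    if (len >= 1 && in[0] < 0x80) {
        *c = in[0];
        return 1;
    }
    if (len >= 2 && (in[0] & 0xE0) == 0xC0 && (in[1] & 0xC0) == 0x80) {
        uint16_t v = (uint16_t)(((in[0] & 0x1F) << 6) | (in[1] & 0x3F));
        if (v < 0x80)
            return 0;                     /* overlong form */
        *c = v;
        return 2;
    }
    if (len >= 3 && (in[0] & 0xF0) == 0xE0 &&
        (in[1] & 0xC0) == 0x80 && (in[2] & 0xC0) == 0x80) {
        uint16_t v = (uint16_t)(((in[0] & 0x0F) << 12) |
                                ((in[1] & 0x3F) << 6) | (in[2] & 0x3F));
        if (v < 0x800)
            return 0;                     /* overlong form */
        *c = v;
        return 3;
    }
    return 0;
}
```

A sender would run ucs2_to_utf8 over each code value before transmission and the receiver would run utf8_to_ucs2 over the incoming bytes; ASCII text passes through unchanged, which is what makes UTF-8 convenient on byte-oriented channels.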
4. Fully supporting the UCS brings huge benefits to the Chinese software industry.
Using the UCS can also greatly reduce the work of localizing software. Software designed for Japanese or Korean can easily be turned into a Chinese version; only the input method and the menus need to change. The UCS also gives Chinese software a convenient way to go global: software developed specifically for Chinese can easily be ported to an internationally usable version, because the core text-processing code does not need to be changed.
Using the UCS will also greatly reduce the cost of software localization, lower the price users pay for software, and widen the range of software they can choose from.
With localization costs greatly reduced, software developers can invest more manpower and money in developing new products, a product can be rolled out worldwide more easily, and users get the latest versions of new software sooner. At that point we can say that national software developed by China itself is also universal international software.
After Linux and the related application software fully support the UCS, China can establish standards requiring similar software entering China to support the UCS, so as to promote support by international software companies.
As for terminals, printers, and other hardware, China can already manufacture them, at a quality no worse than foreign products. Supporting the UCS in hardware devices is not a big problem. China can take the lead in implementing complete UCS support in terminals, printers, and other hardware, and it can also develop standards to protect domestic products and promote international support in similar hardware.
Once Linux has been transformed from the core to fully support the UCS, the next step is to develop application software that fully supports the UCS. However, to remain compatible with existing application software and hardware, the operating system must also provide a conversion mechanism that maps the UCS codes used in its core to byte-based local character set encodings.
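Such a conversion mechanism is, at its simplest, a table lookup. The sketch below is a toy under stated assumptions: the names cp_row, gb_rows, and ucs_to_local are hypothetical, the table carries only three GB2312 entries, and unmapped characters are replaced with '?'. A real table would cover the whole character set and would also need the reverse direction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a conversion table: a UCS value and its two-byte local
   encoding. Three GB2312 rows only, for illustration. */
struct cp_row { uint16_t ucs; unsigned char local[2]; };

static const struct cp_row gb_rows[] = {
    { 0x4E2D, { 0xD6, 0xD0 } },
    { 0x56FD, { 0xB9, 0xFA } },
    { 0x6587, { 0xCE, 0xC4 } },
};

/* Convert n UCS values to the byte-based local encoding. ASCII maps to
   itself, table hits emit two bytes, anything else becomes '?'.
   Returns the number of bytes written, or 0 if out is too small. */
size_t ucs_to_local(const uint16_t *in, size_t n,
                    unsigned char *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        const struct cp_row *hit = NULL;
        for (size_t r = 0; r < sizeof gb_rows / sizeof gb_rows[0]; r++) {
            if (gb_rows[r].ucs == in[i]) {
                hit = &gb_rows[r];
                break;
            }
        }
        size_t need = hit ? 2 : 1;
        if (w + need > cap)
            return 0;
        if (hit) {
            out[w++] = hit->local[0];
            out[w++] = hit->local[1];
        } else {
            out[w++] = in[i] < 0x80 ? (unsigned char)in[i] : (unsigned char)'?';
        }
    }
    return w;
}
```

The core would keep text as UCS values and call a routine of this kind only at the boundary to a legacy application or device, so that byte-based programs continue to see the local encoding they expect.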
People's objections to Microsoft are mainly that its monopoly hinders technological development, that its product prices in China are too high, that its source code is not open, and that there is a "backdoor". Still, Microsoft's Windows and Office lead in market share and do have much worth learning from, such as being easy to learn and use, and supporting the UCS-2 code system fully from the core. Microsoft should be acknowledged as ahead in this respect. The Windows NT/95/98/2000 core fully supports the UCS-2 code system, giving users of different languages a unified operating system core based on UCS-2, and the application suite Office 97 is also based on UCS-2. Windows NT/95/98/2000 provides code pages (codepage) to keep UCS-2 compatible with existing applications by converting to and from local encodings such as GB/GBK, Big5, and JIS. Can Linux do the same? It would fully support ISO 10646/GB13000 in the operating system and, through a mechanism similar to code pages, stay compatible with current applications that use GB. Linux should learn from Windows here, and one hopes that Linux can do better.
XML is a new generation of Web language intended to succeed HTML, which is currently popular on the Web. Many Web-based programs abroad are built on XML; Microsoft's .NET, for example, is based on XML. The character set used by XML is ISO 10646; XML processors must accept its UTF-8 and UTF-16 encodings, and UTF-8 is assumed when no encoding is declared.