Displaying Chinese Characters in the Linux Kernel

Before explaining the technical details of displaying Chinese characters in the Linux kernel, it is necessary to describe how the original Linux console works. This article mainly concerns the implementation of terminals and the framebuffer in Linux.

Console
The console we normally see in Linux is made up of several devices, /dev/ttyN: tty0 denotes the current console (by default also /dev/console), while tty1, tty2, and so on are separate virtual terminals (virtual consoles). The hotkey Alt+Fn switches between these virtual terminals. All of these tty devices are implemented by linux/drivers/char/console.c and vt.c. console.c draws characters on the screen; vt.c manages the virtual terminals and supplies console.c with the content to draw. The content drawn by console.c is kept in a separate buffer for each virtual terminal; vt.c manages the array of these buffers and switches between them, selecting which buffer is active. The virtual terminal you see on screen corresponds to the active buffer. console.c also receives the data written to the terminal and places it in the buffer.
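The division of labour described above can be modelled in a few lines of user-space C. This is a simplified sketch, not kernel code: the names `struct vt_buffer`, `vt_put_char()`, `vt_switch()` and `visible_cell()`, the number of terminals, and the missing scrolling logic are all hypothetical. It only shows that every virtual terminal owns its own buffer and that switching merely changes which buffer is displayed.

```c
#include <assert.h>

#define NR_VT 6          /* hypothetical number of virtual terminals */
#define COLS  80
#define ROWS  25

/* One screen buffer per virtual terminal (hypothetical simplification). */
struct vt_buffer {
    unsigned short cells[ROWS * COLS]; /* low byte: character, high byte: attribute */
    int x, y;                          /* cursor position */
};

static struct vt_buffer vts[NR_VT];
static int active_vt = 0;              /* index of the buffer shown on screen */

/* Store a character in the buffer of terminal `vt`, visible or not. */
void vt_put_char(int vt, unsigned char ch, unsigned char attr)
{
    assert(vt >= 0 && vt < NR_VT);
    struct vt_buffer *b = &vts[vt];
    b->cells[b->y * COLS + b->x] = (unsigned short)((attr << 8) | ch);
    if (++b->x == COLS) {              /* simple wrap; real consoles scroll */
        b->x = 0;
        b->y = (b->y + 1) % ROWS;
    }
}

/* Switching (Alt+Fn) only changes which buffer is active; contents are kept. */
void vt_switch(int vt)
{
    assert(vt >= 0 && vt < NR_VT);
    active_vt = vt;
}

/* What the display driver would draw at (row, col) right now. */
unsigned short visible_cell(int row, int col)
{
    return vts[active_vt].cells[row * COLS + col];
}
```

Writing 'A' to terminal 1 while terminal 0 is active leaves the visible screen unchanged until `vt_switch(1)` is called.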

Framebuffer
The framebuffer is a device that abstracts video memory. Programs can operate on video memory directly by reading and writing the device, in an abstract and uniform way, without caring about the physical location of the video memory, the paging mechanism, or other details; the framebuffer driver takes care of all of these.

The framebuffer source files are in the linux/drivers/video/ directory. The generic console layer built on top of the framebuffer is fbcon.c; the same directory also contains the drivers for the various graphics cards. When the framebuffer is used, Linux puts the video card in graphics mode.
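Because the framebuffer exposes video memory as a flat byte array, drawing a pixel comes down to computing a byte offset. The sketch below assumes a linear, packed-pixel layout; the parameter names `line_length` (bytes per scan line) and `bits_per_pixel` mirror fields of the kernel's `fb_fix_screeninfo` and `fb_var_screeninfo` structures, but the helpers `fb_pixel_offset()` and `fb_put_pixel8()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of pixel (x, y) in a linear framebuffer.
 * Assumes packed pixels whose size is a whole number of bytes (8, 16, 24, 32 bpp). */
size_t fb_pixel_offset(unsigned x, unsigned y,
                       unsigned line_length, unsigned bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);
    return (size_t)y * line_length + (size_t)x * (bits_per_pixel / 8);
}

/* Set one pixel in 8-bpp (256-color) mode, the case handled by fbcon_cfb8_*. */
void fb_put_pixel8(unsigned char *fb, unsigned x, unsigned y,
                   unsigned line_length, unsigned char color)
{
    fb[fb_pixel_offset(x, y, line_length, 8)] = color;
}
```

With the real device, `fb` would typically come from `mmap()` on /dev/fb0 and `line_length` from the `FBIOGET_FSCREENINFO` ioctl; here any byte array can stand in for video memory.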

Example
A simple example illustrates how a character is displayed. Assume that the following program runs in virtual terminal 1 (/dev/tty1):

    #include <stdio.h>

    int main(void)
    {
        puts("hello, world.");   /* puts() appends the '\n' itself */
        return 0;
    }

puts() calls write(2) on the default output file (/dev/tty1). After passing through the generic tty layer, the system call reaches the kernel function con_write() in console.c, and con_write() eventually calls do_con_write(). In do_con_write(), the string "hello, world.\n" is placed in the buffer corresponding to tty1.

do_con_write() also handles control characters and the cursor position. Let's look at the declaration of do_con_write():

    static int do_con_write(struct tty_struct *tty, int from_user,
                            const unsigned char *buf, int count)

Here tty points to a tty_struct structure, which stores all the information about this tty (see linux/include/linux/tty.h). tty_struct describes the attributes shared by all ttys at the generic (higher) level, such as width and height.

do_con_write() uses the driver_data field of tty_struct.

driver_data is a pointer to a vt_struct, which contains the console number of the tty. Inside the kernel, console numbers start at 0, so for tty1 this number is 0. The private data of each virtual terminal is kept in vc_cons, an array of struct vc.

    static int do_con_write(struct tty_struct *tty, int from_user,
                            const unsigned char *buf, int count)
    {
        /* driver_data points to the vt_struct of this tty */
        struct vt_struct *vt = (struct vt_struct *)tty->driver_data;
        /* ... */
        currcons = vt->vc_num;   /* console number: 0 for tty1 */
        /* ... */
    }

The private data of the current virtual terminal is reached through the vc_cons[currcons].d pointer, which points to a struct vc_data holding the cursor position of the virtual terminal, the start address of its buffer, the buffer size, and so on.
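The chain of pointers that do_con_write() follows can be illustrated with stand-in structures. These definitions are hypothetical and keep only a handful of fields; the real kernel structures are much larger. The helper `tty_to_vc_data()` is likewise illustrative.

```c
#include <assert.h>

#define MAX_NR_CONSOLES 63

/* Heavily simplified stand-ins for the kernel structures (hypothetical). */
struct vc_data {
    unsigned int    vc_x, vc_y;          /* cursor position */
    unsigned short *vc_screenbuf;        /* start of this terminal's buffer */
    unsigned int    vc_screenbuf_size;   /* buffer size in bytes */
};
struct vc         { struct vc_data *d; };
struct vt_struct  { int vc_num; };
struct tty_struct { void *driver_data; };

struct vc vc_cons[MAX_NR_CONSOLES];

/* Follow tty->driver_data -> vt_struct -> vc_cons[currcons].d,
 * the path used to reach the per-terminal data. */
struct vc_data *tty_to_vc_data(struct tty_struct *tty)
{
    struct vt_struct *vt = (struct vt_struct *)tty->driver_data;
    int currcons = vt->vc_num;
    assert(currcons >= 0 && currcons < MAX_NR_CONSOLES);
    return vc_cons[currcons].d;
}
```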

Every character in "hello, world.\n" must be converted by conv_uni_to_pc() into an 8-bit display character. The purpose of this conversion is to let users of different languages map 16-bit Unicode codes onto an 8-bit display character set. At present it mainly serves European languages: the result of the mapping is 8 bits and does not cover double-byte ranges.

The mapping between Unicode and display characters can be customized. The default mapping table maps the bytes of Chinese characters to other characters, which is not what we want. We therefore have two options:

1. Skip the conv_uni_to_pc() conversion.

2. Load a mapping suited to double-byte processing, that is, a one-to-one (identity) mapping for all non-control characters. A custom Unicode table that provides this mapping is direct.uni.

To load a Unicode mapping table into the console, or to save the current one, use the external command loadunimap.
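The effect of option 2 can be seen with a toy translation table. This is not the kernel's conv_uni_to_pc(): it assumes a hypothetical 256-entry table, where a table with no entries above 0x7F sends those bytes to a replacement glyph ('?'), while a direct.uni-style identity table leaves every byte unchanged, so GB2312 bytes such as 0xC4 0xE3 reach the buffer intact. All function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 8-bit translation table: table[byte] = glyph index. */
typedef unsigned char xlat_table[256];

/* direct.uni-style identity mapping: every byte maps to itself. */
void build_direct_table(xlat_table t)
{
    for (int i = 0; i < 256; i++)
        t[i] = (unsigned char)i;
}

/* Stand-in for a table without entries above 0x7F:
 * such bytes fall back to a replacement glyph ('?'). */
void build_ascii_only_table(xlat_table t)
{
    for (int i = 0; i < 256; i++)
        t[i] = (unsigned char)(i < 0x80 ? i : '?');
}

/* Translate `len` bytes in place, as the console does before storing them. */
void translate(const xlat_table t, unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = t[buf[i]];
}
```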

After the conv_uni_to_pc() conversion, the characters of "hello, world.\n" are stored in tty1's buffer. do_con_write() then calls the lower-level driver to output the buffer contents to the monitor (which is equivalent to copying the buffer contents into VGA video memory):

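    sw->con_putcs(vc_cons[currcons].d, (u16 *)draw_from,
                  (u16 *)draw_to - (u16 *)draw_from, y, draw_x);

The arguments are the terminal's vc_data, the first cell to draw, the number of cells, and the screen row and column where drawing starts.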

A lower-level driver is needed because there are different display devices, and each of them accesses its video memory in a different way.

The sw->con_putcs() call above ends up in fbcon_putcs() in fbcon.c: con_putcs is a function pointer, and in framebuffer mode it points to fbcon_putcs(). In other words, do_con_write() draws characters through fbcon_putcs(), which in turn calls the putcs routine for the current pixel format. In 256-color mode, for example, that routine is:

    void fbcon_cfb8_putcs(struct vc_data *conp, struct display *p,
                          const unsigned short *s, int count, int yy, int xx);

Displaying Chinese
Suppose we try to output a Chinese sentence: puts("你好\n"); (你好 means "hello"; its GB2312 internal code is 0xC4, 0xE3, 0xBA, 0xC3). What happens? 你好 will certainly not appear on the screen, because the kernel contains no Chinese font; trying to display Chinese without one is like trying to cook without rice.

In fbcon_cfb8_putcs(), the function responsible for character display, the original code reads the characters to be displayed from the virtual terminal buffer one 16-bit word at a time (the low byte is the character code, the high 8 bits are the character attribute) and draws one glyph per word. Because Chinese characters use a double-byte encoding, this cannot display them: xxxx_putcs() can only draw VGA characters.
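The buffer layout described here can be captured with a few helpers. This sketch assumes, as stated above, that a cell is a 16-bit value with the character code in the low byte and the attribute in the high byte; the names `make_cell()`, `cell_char()`, `cell_attr()` and `draw_cells_naive()` are hypothetical. It also shows why the original loop fails: the two bytes of 你 (0xC4 0xE3) end up in two separate cells, and a loop that draws one glyph per cell never looks at them together.

```c
#include <assert.h>

typedef unsigned short cell_t;   /* one screen position in the console buffer */

cell_t make_cell(unsigned char ch, unsigned char attr)
{
    return (cell_t)((attr << 8) | ch);
}

unsigned char cell_char(cell_t c) { return (unsigned char)(c & 0xFF); }
unsigned char cell_attr(cell_t c) { return (unsigned char)(c >> 8); }

/* Model of the original drawing loop: one cell in, one 8x16 glyph index out.
 * A double-byte character therefore becomes two unrelated VGA glyphs. */
void draw_cells_naive(const cell_t *s, int count, unsigned char *glyphs_out)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++)
        glyphs_out[i] = cell_char(s[i]);
}
```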

Problems to be Solved:

First, make sure that the Unicode-to-PC (uni→pc) conversion in do_con_write() does not change the original encoding. A very direct way to do this is to load a custom Unicode mapping table with loadunimap direct.uni, or to make direct.uni the kernel's default mapping table.

Our first attempt was as follows.

First, load a Chinese font into the kernel. Then modify fbcon_cfb8_putcs() so that it reads two words at a time and checks whether their two low bytes form a Chinese character. If they do, compute the character's offset in the Chinese font and draw it as a 16x16 glyph.
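The check and the offset computation in this first attempt can be sketched as follows. The article does not say which font file was used; this sketch assumes a common HZK16-style GB2312 font, in which each of the 94x94 code positions has a 16x16 bitmap of 32 bytes (2 bytes per row), ordered by row and then column. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define GB_GLYPH_BYTES 32   /* 16x16 bitmap, 2 bytes per row */

/* Both bytes of an EUC-CN (GB2312) character lie in 0xA1..0xFE. */
int is_gb2312_pair(unsigned char hi, unsigned char lo)
{
    return hi >= 0xA1 && hi <= 0xFE && lo >= 0xA1 && lo <= 0xFE;
}

/* Byte offset of the glyph in an HZK16-style font:
 * ((row - 1) * 94 + (col - 1)) * 32, with row = hi - 0xA0 and col = lo - 0xA0. */
size_t gb2312_font_offset(unsigned char hi, unsigned char lo)
{
    assert(is_gb2312_pair(hi, lo));
    return ((size_t)(hi - 0xA1) * 94 + (size_t)(lo - 0xA1)) * GB_GLYPH_BYTES;
}
```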

The test results show that:

1. Chinese characters can be output, but many things are still unsatisfactory. For example, if a string of Chinese characters is output starting from half of a Chinese character, everything after that half is garbled. This is the half-character problem.

2. Moving the cursor corrupts the display of Chinese characters: characters that the cursor passes over become garbled. This is because cursor updates are done through the xxxx_putc() function.

xxxx_putc() is similar to xxxx_putcs(), but it redraws a single character rather than a string, so its character argument is an integer instead of the address of a string. For example, fbcon_cfb8_putc() is declared as follows:

    void fbcon_cfb8_putc(struct vc_data *conp, struct display *p,
                         int c, int yy, int xx);

The next attempt modified both xxxx_putcs() and xxxx_putc(). To solve the half-character problem, before each output we scan from the start of the current screen row to determine whether the character to be output falls on the second half of a Chinese character. If it does, we adjust accordingly, that is, move back one byte and start the output from there.
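The row scan can be sketched as below. Scanning from column 0 and pairing bytes left to right is what determines where character boundaries lie, because a byte such as 0xC4 can be either a first or a second half. The sketch assumes the GB2312 byte-range test from the first attempt and a hypothetical helper name; it returns the column at which output must actually start.

```c
#include <assert.h>

/* Return the column at which drawing should start so that column `x`
 * is not drawn as the second half of a double-byte character.
 * `row` holds the character bytes (low bytes of the cells) of one screen row. */
int adjust_start_column(const unsigned char *row, int cols, int x)
{
    assert(x >= 0 && x < cols);
    int i = 0;
    while (i < x) {
        /* A lead byte followed by a valid second byte forms one 2-column character. */
        if (row[i] >= 0xA1 && row[i] <= 0xFE && i + 1 < cols &&
            row[i + 1] >= 0xA1 && row[i + 1] <= 0xFE)
            i += 2;
        else
            i += 1;
    }
    /* If the scan jumped past x, x is a second half: start one byte earlier. */
    return (i == x) ? x : x - 1;
}
```

For the row `a 你 好 b` (bytes `'a' C4 E3 BA C3 'b'`), a request to draw from column 2 or 4 is moved back to column 1 or 3.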
This solution has one difficulty: xxxx_putc() takes an integer rather than a buffer address as its argument, so it cannot look at the adjacent characters to decide whether the character is part of a Chinese character.

The solution is to use the cursor position parameters (yy, xx) of xxxx_putc() to work out where the character lies in the buffer. A minor issue remains: in a Linux virtual terminal the user may scroll back (Shift+PageUp), so the cursor's y coordinate no longer matches the line of the corresponding character in the buffer. The fix is to take the scroll state of the screen into account as well.
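A hypothetical model of this correction: assume that after Shift+PageUp the top `scrolled` rows of the screen show the last `scrolled` lines of a history buffer, and the remaining rows show the live buffer shifted down. Screen row `yy` then maps either into the history or into the live buffer. The structure, names, and layout are assumptions made for illustration, not the fbcon data structures.

```c
#include <assert.h>

struct cell_ref {
    int in_history;  /* 1: position is in the scrollback (history) buffer */
    int index;       /* cell index inside the chosen buffer */
};

/* Map screen position (yy, xx) to a buffer cell when the view is scrolled
 * back by `scrolled` lines (0 = not scrolled).
 * `history_rows` is the number of rows currently stored in the history buffer. */
struct cell_ref screen_to_buffer(int yy, int xx, int cols,
                                 int scrolled, int history_rows)
{
    struct cell_ref r;
    assert(scrolled >= 0 && scrolled <= history_rows);
    if (yy < scrolled) {
        /* Top rows show the most recent `scrolled` history rows. */
        r.in_history = 1;
        r.index = (history_rows - scrolled + yy) * cols + xx;
    } else {
        /* The rest of the screen shows the live buffer, shifted down. */
        r.in_history = 0;
        r.index = (yy - scrolled) * cols + xx;
    }
    return r;
}
```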

This takes us a step further, to a noticeably better version. But problems remain. When you run turbonetcfg, the border characters of its menus are displayed as Chinese characters. This is because the border characters are extended characters that use the 8th bit, so they are taken for Chinese. For example, the single-line horizontal border character '─' has the code 0xC4, so a horizontal line is a run of 0xC4 bytes, and 0xC4C4 is a valid Chinese character; as a result the horizontal line is replaced by a row of Chinese characters. This is very hard to solve, because there are many kinds of box-drawing characters and the vertical border characters can be combined with the characters that follow them in many ways, so it is difficult to decide whether the character at a given position is a box-drawing character. In theory, whatever exclusion algorithm is used, there will be misjudgments: the ambiguity is inherent, and the buffer alone does not provide enough information to decide whether the current character is a box-drawing character or a Chinese character.
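The ambiguity can be demonstrated directly. In code page 437, the usual VGA font layout, byte 0xC4 is the single horizontal line '─', so a horizontal border is a run of 0xC4 bytes; yet every pair 0xC4 0xC4 also passes the GB2312 range test. The sketch below (hypothetical function names) shows that a detector looking only at bytes reports the same result for a border and for real Chinese text.

```c
#include <assert.h>
#include <stddef.h>

/* Same byte test as in the first attempt: both bytes in 0xA1..0xFE. */
static int looks_like_gb2312(unsigned char hi, unsigned char lo)
{
    return hi >= 0xA1 && hi <= 0xFE && lo >= 0xA1 && lo <= 0xFE;
}

/* Count how many double-byte characters a byte-only detector "finds". */
int count_detected_hanzi(const unsigned char *s, size_t len)
{
    int n = 0;
    size_t i = 0;
    while (i + 1 < len) {
        if (looks_like_gb2312(s[i], s[i + 1])) {
            n++;
            i += 2;
        } else {
            i++;
        }
    }
    return n;
}
```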

On the one hand we looked for better exclusion and combination algorithms; on the other, we tried other approaches. To solve the problem fundamentally we must use additional information: judging from the characters in the buffer alone is not enough.

After some effort, we found that to use extended characters on UNIX, a program must first output an escape sequence to switch the current character set. An escape sequence is a control command that begins with the control character ESC. Terminal control on UNIX virtual terminals, including moving the cursor, scrolling, deleting, and switching character sets, is done with such commands. That is, before outputting a string of box-drawing characters, a program usually outputs a specific escape sequence. console.c has variables that record the character-set state according to these escape sequences. Using the information in these variables, box-drawing characters and Chinese characters can be distinguished cleanly.
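A much-reduced model of this bookkeeping is sketched below. In the Linux console, `ESC ( 0` selects the DEC line-drawing set for G0, `ESC ( B` selects the default set again, `ESC )` does the same for G1, and SO/SI switch between G1 and G0. The state machine here is hypothetical and far simpler than console.c's: other escape sequences and control characters are not modelled. It only shows how the character-set state lets the same byte be classified as a line-drawing byte or as text.

```c
#include <assert.h>

/* Hypothetical, much-reduced model of the character-set state. */
struct cs_state {
    int g_is_graphics[2];  /* is G0 / G1 the line-drawing set? */
    int active;            /* 0 = G0, 1 = G1 */
    int esc;               /* 0 = normal, 1 = after ESC, 2/3 = after ESC ( / ESC ) */
};

/* Feed one byte. Returns 1 if the byte is drawn from the line-drawing set,
 * 0 if it is ordinary text, -1 if it was consumed as a control byte. */
int cs_feed(struct cs_state *st, unsigned char b)
{
    if (st->esc == 1) {                   /* byte after ESC */
        st->esc = (b == '(') ? 2 : (b == ')') ? 3 : 0;
        return -1;
    }
    if (st->esc >= 2) {                   /* designator after ESC ( or ESC ) */
        st->g_is_graphics[st->esc - 2] = (b == '0');
        st->esc = 0;
        return -1;
    }
    switch (b) {
    case 0x1B: st->esc = 1;    return -1; /* ESC */
    case 0x0E: st->active = 1; return -1; /* SO: use G1 */
    case 0x0F: st->active = 0; return -1; /* SI: use G0 */
    default:   return st->g_is_graphics[st->active];
    }
}
```

After `ESC ( 0`, the byte 'q' (a horizontal line in the DEC set) is classified as line drawing; after `ESC ( B`, the byte 0xC4 is classified as text and may take part in a Chinese character.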

Guided by this idea, we designed a new solution and obtained another modified version.

In this version, when turbonetcfg is first drawn, box-drawing characters and Chinese characters are clearly separated and displayed correctly. However, a new problem appears: when turbonetcfg is redrawn (for example after switching virtual terminals or moving the mouse cursor), the box-drawing characters turn into Chinese characters again. Redrawing relies entirely on the buffer, and at that point the variables that record the character-set state no longer reflect the state that was in effect when each character was written. The problem persists, and we are back where we started. :( It seems the final solution must store the character-set state together with each character in the buffer. Let's look at the structure of the buffer.

Each character occupies 16 bits in the buffer. The low 8 bits hold the character code and are fully used; the high 8 bits hold the foreground and background color attributes, leaving no spare space. So a new buffer has to be allocated. For consistency, we decided to add a buffer of the same size after the original buffer to store information about Chinese characters.

Some readers may ask: we only need one extra bit per character to indicate whether it is part of a Chinese character, so why allocate a second buffer as large as the original? Isn't that wasteful?

Let's set this question aside and answer it later.

In fact, if we add a bit to indicate whether the current character is the left half or the right half of a Chinese character, there is no need to scan the whole screen row, and the programming becomes simpler. But readers may still ask: even then, aren't 8 bits enough? Why use 16 bits?

Our approach is to store the code of the other half of the Chinese character in the low 8 bits, and to use 2 of the high 8 bits for the left-half/right-half information mentioned above. The remaining 6 bits of the high byte identify the encoding, GB or another double-byte encoding such as BIG5, Japanese, or Korean, so characters of several double-byte languages can be displayed on the same screen without interfering with one another. In addition, a second buffer with the same layout as the first makes address calculation simpler.
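One possible packing of an entry in the attached buffer is sketched below. The article fixes only the field widths (8 bits for the other half, 2 bits of position information, 6 bits for the encoding); the bit positions, the encoding id values, and the helper names are assumptions made for illustration. The `aux_of()` helper reflects the fact that the attached buffer has the same size as the main one and follows it directly, so a cell's entry is found at a fixed distance.

```c
#include <assert.h>

typedef unsigned short aux_t;   /* one entry of the attached buffer */

enum half { HALF_NONE = 0, HALF_LEFT = 1, HALF_RIGHT = 2 };  /* 2-bit field */

/* Assumed bit layout:
 *   bits 15-14: which half of a double-byte character this cell is
 *   bits 13-8 : encoding id (0..63, e.g. 1 = GB, 2 = BIG5; hypothetical)
 *   bits 7-0  : code of the other half of the character */
aux_t make_aux(enum half h, unsigned enc, unsigned char other)
{
    assert(enc < 64);
    return (aux_t)(((unsigned)h << 14) | (enc << 8) | other);
}

enum half     aux_half(aux_t a)  { return (enum half)(a >> 14); }
unsigned      aux_enc(aux_t a)   { return (a >> 8) & 0x3F; }
unsigned char aux_other(aux_t a) { return (unsigned char)(a & 0xFF); }

/* The attached buffer directly follows the main buffer of `nr_cells` cells,
 * so the entry for a cell is `nr_cells` entries further on. */
aux_t *aux_of(unsigned short *cell, unsigned nr_cells)
{
    return cell + nr_cells;
}
```

With both halves recorded, a redraw of either cell of 你 can recover the full code (0xC4 0xE3) without scanning the row, and a box-drawing byte written in the line-drawing character set simply gets `HALF_NONE`.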

This answers the two questions above.

At this point we have a solution that thoroughly solves the interference between Chinese and box-drawing characters as well as the half-character problems during refreshing and redrawing. What remains is the actual implementation.

However, because there are many framebuffer drivers, modifying xxxx_putc() and xxxx_putcs() in every driver is no small task. Moreover, every modified driver must then be tested, which is troublesome; for graphics cards with hardware acceleration, modification and testing are even harder.

So, is there a way to avoid modifying the graphics card drivers?

After some effort, we found that before calling xxxx_putcs() or xxxx_putc() to output a Chinese character, we can change the VGA font pointer so that it points to the glyph of the desired character in the Chinese font; the Chinese character is then output as two VGA ASCII characters. In other words, the kernel holds two fonts, the original VGA font and the Chinese font. When we need to output a Chinese character, we point the VGA font pointer to the corresponding position in the Chinese font; after the character has been output, we restore the pointer to the original VGA font.
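The sketch below simulates this trick in user space. Everything in it is hypothetical: the global `vga_font`, the stand-in driver `driver_putcs()` (which simply copies glyph bitmaps so the result can be checked), and the assumption that the driver looks up glyph `c` at `vga_font + c * 16`. It also assumes that a row-major 16x16 glyph is first rearranged into a left and a right 8x16 half (`split_glyph16()`), and that the two cells are given the codes 0 and 1 so that they pick up those halves; the article does not say how its font was arranged.

```c
#include <assert.h>
#include <string.h>

#define VGA_GLYPH_BYTES 16   /* 8x16 VGA glyph: 1 byte per row */

/* Hypothetical global read by the (unmodified) low-level driver. */
const unsigned char *vga_font;

/* Stand-in for the driver: "draws" cell i by copying glyph chars[i]
 * from vga_font into out, so the result can be inspected. */
void driver_putcs(const unsigned char *chars, int count, unsigned char *out)
{
    for (int i = 0; i < count; i++)
        memcpy(out + i * VGA_GLYPH_BYTES,
               vga_font + chars[i] * VGA_GLYPH_BYTES, VGA_GLYPH_BYTES);
}

/* Rearrange a row-major 16x16 glyph (2 bytes per row) into two consecutive
 * 8x16 glyphs: the left half first, then the right half. */
void split_glyph16(const unsigned char g[32], unsigned char out[32])
{
    for (int r = 0; r < 16; r++) {
        out[r]      = g[2 * r];      /* left 8 pixels of row r */
        out[16 + r] = g[2 * r + 1];  /* right 8 pixels of row r */
    }
}

/* Wrapper: swap the font pointer, draw the character as two cells with
 * codes 0 and 1, then restore the VGA font. */
void put_hanzi(const unsigned char split[32], unsigned char *out)
{
    static const unsigned char two_cells[2] = {0, 1};
    const unsigned char *saved = vga_font;
    vga_font = split;            /* glyph 0 = left half, glyph 1 = right half */
    driver_putcs(two_cells, 2, out);
    vga_font = saved;            /* later ASCII characters use the VGA font again */
}
```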

This way we only need to modify fbcon.c and console.c. console.c maintains the double buffer and stores each character's information in the attached buffer; fbcon.c uses that additional information to adjust the VGA font pointer and call the underlying display driver.

A few notes:

1. Because of screen redrawing and other reasons, the underlying drivers' xxxx_putc() and xxxx_putcs() are called from several places. We wrote two wrapper functions, one for each of these calls, which replace the font, call xxxx_putcs() or xxxx_putc(), and then restore the font.

2. To see Chinese characters in Shift+PageUp (scrollback) mode, further changes are needed.

Linux virtual terminals can show information that has scrolled off the screen: the user scrolls up with a hotkey (Shift+PageUp). The active virtual terminal has a shared buffer (softback) that stores the lines scrolled off the screen. When you switch virtual terminals, the shared buffer is cleared and taken over by the new virtual terminal. When you scroll up, the content of the shared buffer is displayed. Therefore, to see Chinese characters when scrolling up, the shared buffer must also be doubled so that no information is lost: when lines scroll off the screen into the shared buffer, their additional information must be copied there too. This requires fbcon.c to know about the additional information in the shared buffer.

Of course, there is also a lazy approach: do not allow the user to scroll up, which avoids handling the shared buffer altogether.

3. Put the different encodings (GB, BIG5, Japanese, and Korean) into separate modules that can be loaded dynamically, so that adding a new encoding does not require recompiling the kernel.
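The extra work required by note 2 can be sketched as follows. This is a hypothetical model, not fbcon's softback code: the scrollback is a ring of lines with a parallel ring for the attached information, and the names `struct softback`, `softback_push()` and `softback_clear()` are illustrative. The point is that every line saved into the scrollback carries its attached entries with it, and clearing on a terminal switch resets both.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COLS 80

/* Hypothetical scrollback ring that mirrors the double buffer:
 * for every line of cells it keeps the matching line of attached info. */
struct softback {
    unsigned short *cells;  /* nr_lines * COLS character/attribute cells */
    unsigned short *aux;    /* nr_lines * COLS attached (double-byte) entries */
    int nr_lines;
    int head;               /* next line to overwrite */
    int used;               /* number of valid lines */
};

/* Save a line that scrolls off the top of the screen.
 * Copying only `cells` would lose the Chinese-character information. */
void softback_push(struct softback *sb,
                   const unsigned short *line_cells, const unsigned short *line_aux)
{
    assert(sb->nr_lines > 0);
    memcpy(sb->cells + sb->head * COLS, line_cells, COLS * sizeof *line_cells);
    memcpy(sb->aux + sb->head * COLS, line_aux, COLS * sizeof *line_aux);
    sb->head = (sb->head + 1) % sb->nr_lines;
    if (sb->used < sb->nr_lines)
        sb->used++;
}

/* On a virtual-terminal switch, both halves must be cleared. */
void softback_clear(struct softback *sb)
{
    memset(sb->cells, 0, (size_t)sb->nr_lines * COLS * sizeof *sb->cells);
    memset(sb->aux, 0, (size_t)sb->nr_lines * COLS * sizeof *sb->aux);
    sb->head = sb->used = 0;
}
```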

Summary
Through this exploration of the Linux kernel, we found that its current design does not fully consider the display of double-byte-encoded characters. We found a way to display Chinese characters in the kernel and implemented this scheme.

In accordance with the kernel's GPL license, we have also published the source code implementing this technique; these changes are, of course, also under the GPL. If this helps fellow kernel researchers and makes the kernel a little less mysterious, that will be our greatest reward.

For the kernel and for Chinese support, however, this is only an attempt and far from the end. These changes are something of a hack and are unlikely to be merged into the official kernel. We are still actively exploring ways to solve this problem, and we believe the final result must come from the joint efforts of the Linux community in China and abroad. We welcome everyone to discuss this issue with us.