UTF-8 Resources

Source: Internet
Author: User
1. FAQ about UTF-8 and Unicode
This article covers what you need to know to use Unicode/UTF-8 on POSIX systems (Linux, UNIX). Over the next few years, Unicode is expected to come very close to replacing the ASCII and Latin-1 encodings. It not only lets you process text in practically any language in use on Earth, but also provides a comprehensive set of mathematical and technical symbols, which simplifies the exchange of scientific information.
The UTF-8 encoding provides a simple, backward-compatible way for operating systems that are built entirely around ASCII, such as UNIX, to use Unicode. UTF-8 is the way Unicode is used on UNIX, Linux, and similar systems. It is time you understood it.
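To make the backward compatibility concrete, here is a minimal sketch in C of how one Unicode code point is turned into UTF-8 bytes. The function name utf8_encode is an assumption made for illustration and is not part of any library mentioned here. Code points below 0x80 come out as the same single byte they have in ASCII, which is why ASCII-only text is already valid UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes into out and returns the number of bytes written,
 * or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* ASCII range: one unchanged byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx followed by three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, utf8_encode(0x41, buf) stores the single byte 0x41, while utf8_encode(0x4E2D, buf) stores the three bytes E4 B8 AD.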

2. GBK Encoding
An overview of Chinese character encodings: gb2312, GBK, gb18030, big5
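As a rough illustration of how gb2312 and GBK relate, the sketch below checks whether two bytes fall inside the structural double-byte range of each encoding. The function names are assumptions made for this example. The ranges are the commonly documented ones; not every pair inside them is an assigned character, and gb18030's four-byte forms and big5 are not covered.

```c
#include <stdbool.h>

/* gb2312 (in its usual EUC-CN form): lead byte 0xA1..0xF7, trail byte 0xA1..0xFE. */
bool is_gb2312_pair(unsigned char lead, unsigned char trail)
{
    return lead >= 0xA1 && lead <= 0xF7 &&
           trail >= 0xA1 && trail <= 0xFE;
}

/* GBK: lead byte 0x81..0xFE, trail byte 0x40..0xFE except 0x7F.
 * Every pair accepted by is_gb2312_pair is also accepted here. */
bool is_gbk_pair(unsigned char lead, unsigned char trail)
{
    return lead >= 0x81 && lead <= 0xFE &&
           trail >= 0x40 && trail <= 0xFE && trail != 0x7F;
}
```

A pair such as 0xD5 0xFD passes both checks, while 0x81 0x40 passes only the GBK check, which reflects GBK being an extension of gb2312.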

3. Miscellaneous
1) Difference between UTF-8 and gb2312
2) UTF-8, GBK, gb2312 encoding rules and detection
3) Common character set encodings in detail: ASCII, gb2312, GBK, gb18030, Unicode, UTF-8
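One common detection heuristic, sketched here in C under the assumption that the input is either UTF-8 or a legacy double-byte encoding such as gb2312, is to test whether the bytes form well-formed UTF-8. Text in gb2312 or GBK that contains Chinese characters usually fails this test, so a pass is a strong hint of UTF-8. The function name is_valid_utf8 is hypothetical, and the check is a heuristic, not a guarantee.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if s[0..len) is well-formed UTF-8: correct lead and
 * continuation bytes, no overlong forms, no surrogates, nothing above U+10FFFF. */
bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;            /* number of continuation bytes expected */
        unsigned long cp, min;  /* decoded value and smallest legal value */

        if (c < 0x80) { i++; continue; }   /* plain ASCII byte */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return false;                 /* stray continuation or invalid lead */

        if (len - i - 1 < need)
            return false;                  /* sequence is cut off */

        for (size_t k = 1; k <= need; k++) {
            unsigned char t = s[i + k];
            if ((t & 0xC0) != 0x80)
                return false;              /* not a 10xxxxxx continuation byte */
            cp = (cp << 6) | (t & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;                  /* overlong, out of range, or surrogate */

        i += need + 1;
    }
    return true;
}
```

Pure ASCII input passes as well, which is harmless because ASCII bytes mean the same thing in UTF-8, gb2312, and GBK.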

4. Submitting forms on web pages (Stone's article)
http://www.blogjava.net/emu/articles/31756.html

5. Linux Unicode Programming
How to add and use Unicode in Linux programs to support foreign languages

6. Conversion between UTF-8 and gb2312 (Windows)
Many developers run into character encoding problems, and they are a real headache. Because these errors stay latent, you need some development experience in this area to recognize them. They show up especially often when processing XML documents. I once worked on a system whose server program was written in Java and whose client was written in VC, with all interaction protocols written in XML. The data received during communication turned out to be wrong, which was puzzling. I captured the traffic with a network packet capture tool and found that the XML header on the Java side was <?xml version="1.0" encoding="UTF-8"?>, whereas the default on the VC side was gb2312, so the Chinese character data came out wrong. There are very few articles on this topic, so I will introduce a conversion program I wrote for such problems. The program is very simple, so I hope more experienced readers will bear with me.
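A receiver can avoid this kind of mismatch by reading the encoding named in the XML declaration instead of assuming a default. The following C sketch is only an illustration: the function name xml_declared_encoding is hypothetical, and it assumes the document arrives as an ASCII-compatible, NUL-terminated byte string (no byte order mark, not UTF-16).

```c
#include <stddef.h>
#include <string.h>

/* Skip spaces, tabs, and line breaks. */
static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        p++;
    return p;
}

/* Copy the encoding named in the XML declaration into out.
 * Returns 0 on success, or -1 if there is no declaration, no encoding
 * attribute inside it, or out is too small. */
int xml_declared_encoding(const char *xml, char *out, size_t outsz)
{
    const char *end, *p, *q;
    char quote;
    size_t n;

    if (strncmp(xml, "<?xml", 5) != 0)
        return -1;
    end = strstr(xml, "?>");              /* end of the declaration */
    if (end == NULL)
        return -1;
    p = strstr(xml, "encoding");
    if (p == NULL || p > end)             /* must lie inside the declaration */
        return -1;
    p = skip_ws(p + strlen("encoding"));
    if (*p != '=')
        return -1;
    p = skip_ws(p + 1);
    quote = *p;                           /* value may use " or ' */
    if (quote != '"' && quote != '\'')
        return -1;
    p++;
    q = strchr(p, quote);                 /* closing quote */
    if (q == NULL || q > end)
        return -1;
    n = (size_t)(q - p);
    if (n + 1 > outsz)
        return -1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}
```

For the header quoted above, the function stores "UTF-8" in out, and the caller can then choose the matching conversion before interpreting the Chinese text.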

7. Converting gb2312 to UTF-8 (Linux command line)
I had always wanted to migrate my Linux box from the zh_CN.gb2312 locale to zh_CN.UTF-8, but was held back because the files used in a large number of experiments were all gb2312 encoded. The migration is now nearly done, because I needed to add Chinese support for UTF-8 encoding to a tool. Some of the Chinese-related problems I ran into during the migration, and my personal solutions, are listed here as a reference for anyone with the same needs.
Note: most of the tools mentioned below will overwrite your original files, and a conversion may introduce errors or deviations. If you are not an experienced Linux user, be sure to back up your data before you start. It is also strongly recommended that you read a tool's manual carefully before using it ("man program-name").

8. Converting UTF-8 to gb2312 with PHP
Encoding problems come up often in PHP programs, especially when processing RSS or other XML files. These files are generally UTF-8 encoded, but many websites require gb2312, and getting the UTF-8 content to display correctly is often a headache.
The mb_convert_encoding() function in PHP's mbstring module converts between character encodings, but it does not provide gb2312 conversion.
However, the following functions provide this conversion.

9. Windows Core Programming, Chapter 2: Unicode Programming under Windows
As Microsoft's Windows operating system becomes increasingly popular around the world, targeting different international markets has become an increasingly important issue for software developers. It was once commonplace for the U.S. version of a piece of software to reach the market as much as six months ahead of the international versions. However, growing international support in the operating system makes it easier to produce applications for international markets, which shortens the interval between the U.S. version and the international versions.
Windows has always provided support to help software developers localize their applications. An application can obtain country-specific information from various functions and can examine the Control Panel settings to determine the user's preferences. Windows even supports different fonts to meet application needs.
This chapter is placed at the beginning of the book because Unicode is considered a basic step in developing any application. Unicode issues come up in every chapter of the book, and all the sample applications in the book are "implemented using Unicode". If you are developing applications for Microsoft Windows 2000 or Microsoft Windows CE, you should develop with Unicode. If you are developing an application for Microsoft Windows 98, you have some decisions to make. This chapter also covers issues related to Windows 98.

10. C Programming in Linux
For encoding conversion on Linux, you can either program against the iconv function family or use the iconv command. The command works on files: it converts a specified file from one encoding to another.
I. Using the iconv function family for encoding conversion
The iconv function family is declared in the header file iconv.h, which must be included before use.
#include <iconv.h>
The iconv function family has three functions, with the following prototypes:
(1) iconv_t iconv_open(const char *tocode, const char *fromcode);
This function specifies which two encodings to convert between: tocode is the target encoding and fromcode is the source encoding. It returns a conversion handle that is used by the following two functions.
(2) size_t iconv(iconv_t cd, char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft);
This function reads characters from *inbuf and writes the converted characters to *outbuf. *inbytesleft records the number of input bytes that have not yet been converted, and *outbytesleft records the space remaining in the output buffer, in bytes.
(3) int iconv_close(iconv_t cd);
This function closes the conversion handle and releases its resources.
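The way iconv() updates its pointer and counter arguments is easier to see in a small wrapper. The sketch below is one possible arrangement and is not taken from the original article: the helper name convert_bytes and its parameter list are assumptions. It reports how many output bytes were produced and passes the errno value back to the caller, so the common failures (E2BIG when the output buffer is full, EILSEQ for an invalid input sequence, EINVAL for an incomplete one) can be told apart.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert in[0..inlen) from encoding `from` to
 * encoding `to`, writing at most outcap bytes to out.
 * Stores the number of bytes written in *written.
 * Returns 0 on success or the errno value of the failing call. */
int convert_bytes(const char *to, const char *from,
                  const char *in, size_t inlen,
                  char *out, size_t outcap, size_t *written)
{
    iconv_t cd;
    char *src = (char *)in;   /* iconv() takes char **, but only reads the input */
    char *dst = out;
    size_t inleft = inlen;    /* decreases as input bytes are consumed */
    size_t outleft = outcap;  /* decreases as output bytes are produced */
    int err = 0;

    *written = 0;
    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return errno;         /* typically EINVAL: conversion not available */

    if (iconv(cd, &src, &inleft, &dst, &outleft) == (size_t)-1)
        err = errno;
    else if (iconv(cd, NULL, NULL, &dst, &outleft) == (size_t)-1)
        err = errno;          /* flushing a stateful encoding failed */

    *written = outcap - outleft;
    iconv_close(cd);
    return err;
}
```

A call such as convert_bytes("UTF-8", "gb2312", in, inlen, out, sizeof out, &n) would convert gb2312 input to UTF-8, provided the iconv implementation on the system supports both names.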
Example 1: A conversion example program implemented in C

/* f.c: code conversion example in C */
#include <iconv.h>
#include <stdio.h>
#include <string.h>

#define OUTLEN 255

/* Code conversion: convert from one encoding to another.
   Returns 0 on success and -1 on failure. */
int code_convert(const char *from_charset, const char *to_charset,
                 char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    iconv_t cd;
    char *pin = inbuf;
    char *pout = outbuf;
    size_t rc;

    if (outlen == 0) return -1;
    cd = iconv_open(to_charset, from_charset);
    if (cd == (iconv_t)-1) return -1;   /* iconv_open signals failure with (iconv_t)-1 */
    memset(outbuf, 0, outlen);
    outlen--;                           /* keep the last byte 0 so the result is NUL-terminated */
    rc = iconv(cd, &pin, &inlen, &pout, &outlen);
    iconv_close(cd);                    /* release the handle on success and on failure */
    return rc == (size_t)-1 ? -1 : 0;
}

/* Convert UTF-8 to gb2312 */
int u2g(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    return code_convert("UTF-8", "gb2312", inbuf, inlen, outbuf, outlen);
}

/* Convert gb2312 to UTF-8 */
int g2u(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    return code_convert("gb2312", "UTF-8", inbuf, inlen, outbuf, outlen);
}

int main(void)
{
    /* The sample text 正在安装 ("installing"), written as hex escapes so the
       bytes do not depend on the encoding of this source file. */
    char in_utf8[] = "\xE6\xAD\xA3\xE5\x9C\xA8\xE5\xAE\x89\xE8\xA3\x85";
    char in_gb2312[] = "\xD5\xFD\xD4\xDA\xB0\xB2\xD7\xB0";
    char out[OUTLEN];
    int rc;

    /* Convert UTF-8 to gb2312 */
    rc = u2g(in_utf8, strlen(in_utf8), out, OUTLEN);
    printf("UTF-8 --> gb2312 rc = %d, out = %s\n", rc, out);
    /* Convert gb2312 to UTF-8 */
    rc = g2u(in_gb2312, strlen(in_gb2312), out, OUTLEN);
    printf("gb2312 --> UTF-8 rc = %d, out = %s\n", rc, out);
    return 0;
}

Example 2: A conversion example program in C++

/* f.cpp: code conversion example in C++ */
#include <iconv.h>
#include <cstring>
#include <iostream>

#define OUTLEN 255

using namespace std;

// Code conversion class
class CodeConverter {
private:
    iconv_t cd;
public:
    // Constructor
    CodeConverter(const char *from_charset, const char *to_charset) {
        cd = iconv_open(to_charset, from_charset);
    }

    // Not copyable: a copy would close the same handle twice
    CodeConverter(const CodeConverter &) = delete;
    CodeConverter &operator=(const CodeConverter &) = delete;

    // Destructor
    ~CodeConverter() {
        if (cd != (iconv_t)-1) iconv_close(cd);
    }

    // Convert inbuf into outbuf; returns 0 on success and -1 on failure
    int convert(char *inbuf, size_t inlen, char *outbuf, size_t outlen) {
        char *pin = inbuf;
        char *pout = outbuf;

        if (cd == (iconv_t)-1 || outlen == 0) return -1;
        memset(outbuf, 0, outlen);
        outlen--;   // keep the last byte 0 so the output stays NUL-terminated
        return iconv(cd, &pin, &inlen, &pout, &outlen) == (size_t)-1 ? -1 : 0;
    }
};

int main()
{
    // The sample text 正在安装 ("installing"), written as hex escapes so the
    // bytes do not depend on the encoding of this source file.
    char in_utf8[] = "\xE6\xAD\xA3\xE5\x9C\xA8\xE5\xAE\x89\xE8\xA3\x85";
    char in_gb2312[] = "\xD5\xFD\xD4\xDA\xB0\xB2\xD7\xB0";
    char out[OUTLEN];

    // UTF-8 --> gb2312
    CodeConverter cc("UTF-8", "gb2312");
    cc.convert(in_utf8, strlen(in_utf8), out, OUTLEN);
    cout << "UTF-8 --> gb2312 in=" << in_utf8 << ", out=" << out << endl;

    // gb2312 --> UTF-8
    CodeConverter cc2("gb2312", "UTF-8");
    cc2.convert(in_gb2312, strlen(in_gb2312), out, OUTLEN);
    cout << "gb2312 --> UTF-8 in=" << in_gb2312 << ", out=" << out << endl;
    return 0;
}
