[Abstract]
ASCII and Unicode are two common character encodings. Because they represent characters differently, a program must handle them differently.
Based on the author's own development experience, this article analyzes in detail how ASCII- and Unicode-encoded messages are written to files, as a reference for developers of related software.
[Key Words]
ASCII; Unicode; C language; encoding; development
I. Introduction to ASCII and Unicode encoding
1. ASCII encoding
ASCII is a computer character encoding based on the Latin alphabet. It uses 7-bit or 8-bit binary numbers to represent 128 or 256 possible characters.
Standard ASCII uses 7-bit binary numbers to represent all uppercase and lowercase letters, the digits 0 to 9, punctuation marks, and the special control characters used in American English.
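The 7-bit limit can be checked directly: a byte belongs to standard ASCII only if its highest bit is clear. The sketch below is a minimal illustration; the helper name is_7bit_ascii is hypothetical and does not come from the author's module.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the article): returns 1 if every byte in
 * buf lies in the standard 7-bit ASCII range 0x00..0x7F, otherwise 0.
 * An empty buffer is treated as valid. */
int is_7bit_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] & 0x80) {    /* high bit set: outside standard ASCII */
            return 0;
        }
    }
    return 1;
}
```

A caller could run this over a message before treating it as plain ASCII; bytes in the range 0x80 to 0xFF belong to an 8-bit extension rather than to the standard set.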
2. Unicode encoding
Unicode is an international standard. The two-byte form discussed here, UCS-2, stores every character in two bytes, so its byte representation is not compatible with ASCII. At present, UCS-2 is widely used.
For characters in the ASCII range, the Unicode code value is the same as the ASCII value. For example, the Unicode code of the letter "a" is 0x0061 (decimal 97), and the ASCII code of "a" is 0x61 (decimal 97).
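To make the difference concrete, the sketch below widens one ASCII character to a two-byte UCS-2 code unit. The numeric value stays the same (0x61 for "a"), but the stored bytes gain a zero high byte whose position depends on the byte order. The function name ascii_to_ucs2 and its big_endian flag are hypothetical, chosen only for this illustration.

```c
#include <assert.h>

/* Hypothetical helper (not from the article): widen one 7-bit ASCII
 * character to a 2-byte UCS-2 code unit.
 * big_endian != 0 -> high byte first; big_endian == 0 -> low byte first. */
void ascii_to_ucs2(unsigned char ch, int big_endian, unsigned char out[2])
{
    unsigned char hi = 0x00;    /* ASCII code values have a zero high byte */
    unsigned char lo = ch;

    assert(ch < 0x80);          /* only standard ASCII is handled here */
    if (big_endian) {
        out[0] = hi;
        out[1] = lo;
    } else {
        out[0] = lo;
        out[1] = hi;
    }
}
```

For "a", this yields the bytes 61 00 in low-byte-first order and 00 61 in high-byte-first order, which is why a reader expecting one byte per character cannot consume UCS-2 data unchanged.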
II. Writing ASCII- and Unicode-encoded messages to files
1. Requirement Description
To meet the requirements of a certain version, a module must write the text messages sent by another module to a file. The message encoding is ASCII or Unicode.
2. Writing the two types of encoded messages to files
(1) Processing of ASCII-encoded messages
For ASCII-encoded messages, the message content can be written to the file directly, without any additional processing.
(2) Processing of Unicode-encoded messages
For Unicode-encoded messages, the two bytes "fffe" (little-endian mode) or "feff" (big-endian mode) must be placed at the head of the file to be written, followed by the message content. The byte order must be agreed upon with the sending module; a setting in this module's configuration file controls whether the messages are treated as little-endian or big-endian.
Big-endian mode stores the high-order byte of the data at the low memory address and the low-order byte at the high memory address. Little-endian mode stores the high-order byte at the high memory address and the low-order byte at the low memory address.
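The two byte orders can be illustrated with a small sketch that stores a 16-bit value in an explicitly chosen order and probes the order used by the host. Storing the value 0xFEFF produces exactly the two file headers described above. The names host_is_little_endian and store_u16 are hypothetical and are not part of the author's module.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise,
 * by looking at the byte stored at the lowest address of a 16-bit value. */
int host_is_little_endian(void)
{
    uint16_t probe = 0x0102;
    unsigned char first;

    memcpy(&first, &probe, 1);  /* byte at the lowest memory address */
    return first == 0x02;       /* low-order byte first: little-endian */
}

/* Hypothetical helper: store a 16-bit value in an explicit byte order,
 * independent of the host. big_endian != 0 puts the high-order byte at
 * the lower address. */
void store_u16(uint16_t value, int big_endian, unsigned char out[2])
{
    unsigned char hi = (unsigned char)(value >> 8);
    unsigned char lo = (unsigned char)(value & 0xFF);

    assert(out != NULL);
    out[0] = big_endian ? hi : lo;
    out[1] = big_endian ? lo : hi;
}
```

Calling store_u16(0xFEFF, 0, out) gives the bytes ff fe, and store_u16(0xFEFF, 1, out) gives fe ff. Writing bytes explicitly in this way keeps the file layout independent of the machine that runs the program.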
III. C program implementation
Based on the requirements and analysis in Part II, the program framework is as follows:
#include <fcntl.h>     // open, O_RDWR, O_CREAT
#include <string.h>    // memcpy
#include <sys/stat.h>  // S_IRWXU, S_IRWXG, S_IRWXO
#include <unistd.h>    // lseek, write, close
......
char szfilecontent[1024] = {0};
char szfilename[1024] = {0};
int ifilesize = 0;
unsigned char a = 0xff;
unsigned char b = 0xfe;
int fd = -1; // file descriptor; -1 means "not open"

if (imsgfmt == 1) // Unicode encoding format
{
    // Two bytes are reserved for the header, so the message must fit in the rest
    if (imsglength < 0 || imsglength > (int)sizeof(szfilecontent) - 2)
    {
        return err_general;
    }
    if (gconfig.iusebigendianorlittleendian == 0) // little-endian mode: ff fe
    {
        szfilecontent[0] = (char)a;
        szfilecontent[1] = (char)b;
    }
    else if (gconfig.iusebigendianorlittleendian == 1) // big-endian mode: fe ff
    {
        szfilecontent[0] = (char)b;
        szfilecontent[1] = (char)a;
    }
    // Copy the message content szmsgcontent to szfilecontent. Note that + 2 is required.
    memcpy(szfilecontent + 2, szmsgcontent, imsglength);
}
else // ASCII encoding format
{
    // The message must fit in the buffer
    if (imsglength < 0 || imsglength > (int)sizeof(szfilecontent))
    {
        return err_general;
    }
    memcpy(szfilecontent, szmsgcontent, imsglength); // copy directly
}

// Write the content to the file
// open() returns -1 on failure; the assignment is parenthesized so that fd
// receives the descriptor rather than the result of the comparison
if ((fd = open(szfilename, O_RDWR | O_CREAT, S_IRWXU | S_IRWXG | S_IRWXO)) < 0)
{
    writelogex(log_error, ("Exec open failed. filename = %s", szfilename));
    return err_general;
}

// Save the content
lseek(fd, 0, SEEK_SET);
if (imsgfmt == 1) // Unicode encoding format
{
    // Because two new bytes are added to the file header, add 2 to the original length
    ifilesize = imsglength + 2;
    if (write(fd, szfilecontent, ifilesize) != ifilesize) // write the file
    {
        writelogex(log_error, ("Exec write failed. filename = %s", szfilename));
        close(fd);
        fd = -1;
        return err_general;
    }
}
else // ASCII encoding format
{
    ifilesize = imsglength; // the length remains unchanged
    if (write(fd, szfilecontent, ifilesize) != ifilesize) // write the file
    {
        writelogex(log_error, ("Exec write failed. filename = %s", szfilename));
        close(fd);
        fd = -1;
        return err_general;
    }
}

// The file is successfully written.
writelogex(log_info, ("Exec write successfully. filename = %s", szfilename));
close(fd);
fd = -1;
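One way to check the result is to read the first two bytes of the written file back and see which header, if any, is present. The following is only a sketch under that assumption; detect_bom and the enum names are hypothetical and are not part of the author's module, and only the two-byte headers discussed in this article are recognized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this illustration. */
enum text_encoding {
    ENC_NO_HEADER = 0,  /* no two-byte header found, e.g. plain ASCII */
    ENC_UCS2_LE   = 1,  /* header bytes ff fe (little-endian mode) */
    ENC_UCS2_BE   = 2   /* header bytes fe ff (big-endian mode) */
};

/* Hypothetical helper: inspect the first two bytes of a buffer read from
 * the file and report which header they form. */
enum text_encoding detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE) {
            return ENC_UCS2_LE;
        }
        if (buf[0] == 0xFE && buf[1] == 0xFF) {
            return ENC_UCS2_BE;
        }
    }
    return ENC_NO_HEADER;
}
```

A file written through the Unicode branch in little-endian mode should report ENC_UCS2_LE, while a file written through the ASCII branch should normally report ENC_NO_HEADER.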
IV. Summary
This article analyzes the ASCII and Unicode character encodings and uses C code to demonstrate the whole process of writing messages in the two encodings to files. Because character encoding formats are diverse, files must be written according to the characteristics of each encoding. The article serves as a reference for related software projects that write files in different encoding formats.
(My microblog: http://weibo.com/zhouzxi?topnav=1&wvr=5, No.: 245924426, welcome!)
Analysis of ASCII and Unicode-encoded message writing files