Analysis of Writing ASCII- and Unicode-Encoded Messages to Files

Source: Internet
Author: User
Tags: control characters

[Abstract]

ASCII and Unicode are two common character encodings. They represent characters differently, so a program must distinguish between them.

Based on the author's practical development experience, this article analyzes in detail the process of writing ASCII- and Unicode-encoded messages to files, as a reference for the development of related software.

[Key Words]

ASCII; Unicode; C language; encoding; development

 

I. Introduction to ASCII and Unicode encoding

1. ASCII Encoding

ASCII is a character encoding system based on the Latin alphabet. It uses 7-bit or 8-bit binary numbers to represent 128 or 256 possible characters.

Standard ASCII uses 7-bit binary numbers to represent the uppercase and lowercase letters, the digits 0 to 9, punctuation marks, and the special control characters used in American English.
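As a small illustration of the 7-bit range, the following sketch checks whether a buffer contains only standard ASCII bytes. The function name `is_standard_ascii` is hypothetical and is not part of the program discussed in this article.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte lies in the 7-bit
 * standard ASCII range (0x00 to 0x7F), otherwise 0. */
int is_standard_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] > 0x7F) {
            return 0;   /* high bit set: not standard 7-bit ASCII */
        }
    }
    return 1;
}
```

An empty buffer is reported as ASCII, since it contains no byte outside the range.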

 

2. Unicode encoding Overview

Unicode is an international standard. This article considers its two-byte encoding form, whose byte representation is incompatible with ASCII. At present, UCS-2 is widely used.

For characters in the ASCII range, however, the Unicode code value is the same as the ASCII code. For example, the Unicode code of the letter "a" is 0x0061 (decimal 97), and the ASCII code of "a" is 0x61 (decimal 97).
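The relationship between the two code values can be sketched as a simple widening step. The helper `ascii_to_ucs2` below is an illustrative assumption, not code from the original module; it only handles bytes in the 7-bit ASCII range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen 7-bit ASCII bytes into 16-bit UCS-2 code
 * units. Returns the number of units written, or 0 if a byte is outside
 * the ASCII range or dst is too small. */
size_t ascii_to_ucs2(const char *src, size_t len, uint16_t *dst, size_t cap)
{
    size_t i;

    if (len > cap) {
        return 0;
    }
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c > 0x7F) {
            return 0;           /* not representable as 7-bit ASCII */
        }
        dst[i] = (uint16_t)c;   /* 'a' (0x61) becomes 0x0061 */
    }
    return len;
}
```

The numeric value is unchanged by the conversion; only the storage width grows from one byte to two.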

 

II. Writing ASCII- and Unicode-encoded messages to files

1. Requirement Description

To meet the requirements of a particular release, a module must write text messages received from another module to a file. The messages are encoded in either ASCII or Unicode.

 

2. Writing the two types of encoded messages to files

(1) Processing of ASCII encoded messages

For ASCII-encoded messages, the message content can be written to the file directly, without any additional processing.

 

(2) Processing of Unicode-encoded messages

For Unicode-encoded messages, the bytes FF FE (little-endian mode) or FE FF (big-endian mode) must be added at the start of the file to be written, with the message content appended after them. The byte order must be agreed upon with the sending module. A setting in this module's configuration file controls whether incoming messages are treated as little-endian or big-endian.
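The header step described above can be sketched as a bounds-checked helper. The name `prepend_bom` and its parameters are assumptions made for illustration; the original module builds the buffer inline.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write the 2-byte header followed by the message
 * into out. Returns the total number of bytes placed in out, or 0 if
 * out (capacity cap) is too small. */
size_t prepend_bom(unsigned char *out, size_t cap,
                   const unsigned char *msg, size_t len, int big_endian)
{
    if (cap < 2 || len > cap - 2) {
        return 0;               /* header plus message would overflow */
    }
    if (big_endian) {
        out[0] = 0xFE;          /* big-endian header: FE FF */
        out[1] = 0xFF;
    } else {
        out[0] = 0xFF;          /* little-endian header: FF FE */
        out[1] = 0xFE;
    }
    memcpy(out + 2, msg, len);  /* message follows the header */
    return len + 2;
}
```

Checking the capacity before copying guards against messages that are longer than the destination buffer minus the two header bytes.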

Big-endian mode stores the high-order byte of the data at the low memory address and the low-order byte at the high memory address. Little-endian mode is the opposite: the high-order byte is stored at the high memory address and the low-order byte at the low memory address.
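To make the two byte orders concrete, the sketch below stores a single 16-bit value in each order. The function names `store_u16_be` and `store_u16_le` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: big-endian, high-order byte at the low address. */
void store_u16_be(unsigned char out[2], uint16_t v)
{
    out[0] = (unsigned char)(v >> 8);    /* high-order byte first */
    out[1] = (unsigned char)(v & 0xFF);  /* low-order byte second */
}

/* Hypothetical helper: little-endian, low-order byte at the low address. */
void store_u16_le(unsigned char out[2], uint16_t v)
{
    out[0] = (unsigned char)(v & 0xFF);  /* low-order byte first */
    out[1] = (unsigned char)(v >> 8);    /* high-order byte second */
}
```

For the letter "a" (0x0061), the big-endian bytes are 00 61 and the little-endian bytes are 61 00.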

 

III. C program implementation

Based on the requirements and analysis in the second part, the program framework is as follows:

......

char szFileContent[1024] = {0};
char szFileName[1024]    = {0};
int  iFileSize = 0;
unsigned char a = 0xff;
unsigned char b = 0xfe;
int  fd = -1;   // file handle (-1 means not open)

if (iMsgFmt == 1)   // Unicode encoding format
{
    if (gConfig.iUseBigEndianOrLittleEndian == 0)        // little-endian mode: FF FE
    {
        szFileContent[0] = (char)a;
        szFileContent[1] = (char)b;
    }
    else if (gConfig.iUseBigEndianOrLittleEndian == 1)   // big-endian mode: FE FF
    {
        szFileContent[0] = (char)b;
        szFileContent[1] = (char)a;
    }

    // Copy the message content szMsgContent into szFileContent after the
    // two header bytes (hence + 2). iMsgLength + 2 must not exceed
    // sizeof(szFileContent).
    memcpy(szFileContent + 2, szMsgContent, iMsgLength);
}
else   // ASCII encoding format
{
    memcpy(szFileContent, szMsgContent, iMsgLength);   // copy directly
}

// Open the file. The assignment is parenthesized because the comparison
// operator binds more tightly than "="; open() returns -1 on failure.
if ((fd = open(szFileName, O_RDWR | O_CREAT, S_IRWXU | S_IRWXG | S_IRWXO)) < 0)
{
    WriteLogEx(LOG_ERROR, ("Exec open failed. filename = %s", szFileName));
    return ERR_GENERAL;
}

// Save the content, starting at the beginning of the file
lseek(fd, 0, SEEK_SET);

if (iMsgFmt == 1)   // Unicode encoding format
{
    // Two bytes were added at the start of the file, so add 2 to the original length
    iFileSize = iMsgLength + 2;

    if (write(fd, szFileContent, iFileSize) != iFileSize)   // write the file
    {
        WriteLogEx(LOG_ERROR, ("Exec write failed. filename = %s", szFileName));
        close(fd);
        fd = -1;
        return ERR_GENERAL;
    }
}
else   // ASCII encoding format
{
    iFileSize = iMsgLength;   // the length remains unchanged

    if (write(fd, szFileContent, iFileSize) != iFileSize)   // write the file
    {
        WriteLogEx(LOG_ERROR, ("Exec write failed. filename = %s", szFileName));
        close(fd);
        fd = -1;
        return ERR_GENERAL;
    }
}

// The file was written successfully.
WriteLogEx(LOG_INFO, ("Exec write successfully. filename = %s", szFileName));
close(fd);
fd = -1;
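For readers who want something they can compile on its own, the following is a minimal sketch of the same flow as a standalone function. It is not the author's module: the names `write_all` and `write_message_file`, the 0644 file mode, and the use of O_TRUNC (so that stale bytes from an older, longer file do not remain) are all assumptions made for this example. It writes the header and the message in two calls rather than copying them into one buffer.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes, retrying after short
 * writes and EINTR. Returns 0 on success, -1 on error. */
static int write_all(int fd, const unsigned char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted before writing: retry */
            }
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical function: write msg to path, preceded by the 2-byte
 * header when is_unicode is nonzero. Returns 0 on success, -1 on error. */
int write_message_file(const char *path, const unsigned char *msg, size_t len,
                       int is_unicode, int big_endian)
{
    static const unsigned char bom_le[2] = { 0xFF, 0xFE };
    static const unsigned char bom_be[2] = { 0xFE, 0xFF };
    int rc = 0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) {
        return -1;
    }
    if (is_unicode) {
        rc = write_all(fd, big_endian ? bom_be : bom_le, 2);
    }
    if (rc == 0) {
        rc = write_all(fd, msg, len);
    }
    if (close(fd) != 0) {
        rc = -1;
    }
    return rc;
}
```

A call such as `write_message_file("msg.txt", msg, len, 1, 0)` would write a little-endian Unicode message; passing 0 for `is_unicode` writes the ASCII bytes unchanged.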

 

IV. Summary

This article analyzes ASCII and Unicode character encoding and uses C code to demonstrate the complete process of writing messages in the two encodings to files. Because character encoding formats are diverse, files must be written according to the characteristics of each encoding. This article is intended as a reference for software projects that need to write files in different encoding formats.



(My microblog: http://weibo.com/zhouzxi?topnav=1&wvr=5, No.: 245924426. Welcome!)
