Unicode conversion to Chinese characters

Source: Internet
Author: User


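+ (NSString *)replaceUnicode:(NSString *)unicodeStr
{
    // Old-style (OpenStep) property lists write Unicode escapes as \UXXXX, so turn \u into \U.
    NSString *tempStr1 = [unicodeStr stringByReplacingOccurrencesOfString:@"\\u" withString:@"\\U"];
    // Escape embedded double quotes so the text can be wrapped in quotes.
    NSString *tempStr2 = [tempStr1 stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
    // A double-quoted string is itself a valid old-style property list.
    NSString *tempStr3 = [[@"\"" stringByAppendingString:tempStr2] stringByAppendingString:@"\""];
    NSData *tempData = [tempStr3 dataUsingEncoding:NSUTF8StringEncoding];
    // Let the property-list parser decode the \U escapes
    // (replaces the deprecated propertyListFromData:mutabilityOption:format:errorDescription:).
    NSString *returnStr = [NSPropertyListSerialization propertyListWithData:tempData
                                                                    options:NSPropertyListImmutable
                                                                     format:NULL
                                                                      error:NULL];
    return [returnStr stringByReplacingOccurrencesOfString:@"\\r\\n" withString:@"\n"];
}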
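
The method above lets the property-list parser do the decoding. For comparison, here is a minimal sketch of the same job done by hand in plain C (which also compiles as Objective-C). It assumes NUL-terminated UTF-8 strings and a caller-supplied output buffer; the names decode_u_escapes, parse_u_escape, put_utf8 and hex_value are hypothetical, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* If s starts with \uXXXX (exactly four hex digits), stores XXXX in *cp and returns 1. */
static int parse_u_escape(const char *s, uint32_t *cp)
{
    uint32_t v = 0;
    if (s[0] != '\\' || s[1] != 'u') return 0;
    for (int i = 2; i < 6; i++) {
        int d = hex_value(s[i]);
        if (d < 0) return 0;              /* also stops at the terminating NUL */
        v = v * 16 + (uint32_t)d;
    }
    *cp = v;
    return 1;
}

/* Writes code point cp as 1 to 4 UTF-8 bytes and returns the byte count. */
static size_t put_utf8(uint32_t cp, char *out)
{
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Copies src to dst, turning each \uXXXX escape (or \uD8xx\uDCxx surrogate pair)
   into UTF-8; other text is copied unchanged. Lone surrogates are not rejected.
   dst needs strlen(src) + 1 bytes, since an escape never grows when decoded. */
void decode_u_escapes(const char *src, char *dst)
{
    uint32_t cp, lo;
    while (*src) {
        if (parse_u_escape(src, &cp)) {
            src += 6;
            if (cp >= 0xD800 && cp <= 0xDBFF && parse_u_escape(src, &lo) &&
                lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                src += 6;
            }
            dst += put_utf8(cp, dst);
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

For example, decoding the escaped text \u4e2d\u56fd yields the UTF-8 bytes of 中国.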



Conversion between Chinese characters and percent-encoded UTF-8

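// Percent-encoded UTF-8 -> Chinese characters: strA is @"中国" ("China").
NSString *strA = [@"%E4%B8%AD%E5%9B%BD" stringByRemovingPercentEncoding];
// Chinese characters -> percent-encoded UTF-8: strB is @"%E4%B8%AD%E5%9B%BD".
NSString *strB = [@"中国" stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLQueryAllowedCharacterSet]];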
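
At the byte level, percent decoding turns each %XX back into one byte, and the resulting bytes are then read as UTF-8. A minimal plain-C sketch of that step, assuming a NUL-terminated input; percent_decode and hex_digit are hypothetical names, and unlike HTML form decoding this sketch leaves '+' alone instead of turning it into a space.

```c
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Replaces each %XX in src with the byte 0xXX and copies everything else.
   dst needs room for strlen(src) + 1 bytes. Returns 0 on success, or -1 on a
   malformed escape (dst is then left incomplete). */
int percent_decode(const char *src, char *dst)
{
    while (*src) {
        if (*src == '%') {
            int hi = hex_digit(src[1]);
            int lo = (hi < 0) ? -1 : hex_digit(src[2]);  /* never reads past a NUL */
            if (lo < 0) return -1;
            *dst++ = (char)(hi * 16 + lo);
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return 0;
}
```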

NSString to percent-escaped UTF-8

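NSString *strings = [NSString stringWithFormat:@"abc"];
NSLog(@"strings : %@", strings);

// Declared in CoreFoundation's CFURL.h (deprecated since iOS 9 / OS X 10.11):
// CF_EXPORT CFStringRef CFURLCreateStringByAddingPercentEscapes(CFAllocatorRef allocator,
//     CFStringRef originalString, CFStringRef charactersToLeaveUnescaped,
//     CFStringRef legalURLCharactersToBeEscaped, CFStringEncoding encoding);

// A "Create" function: CFBridgingRelease hands ownership of the result to ARC.
NSString *encodedValue = CFBridgingRelease(CFURLCreateStringByAddingPercentEscapes(
    kCFAllocatorDefault, (__bridge CFStringRef)strings,
    NULL,                                          // no extra characters left unescaped
    (__bridge CFStringRef)@"!*'();:@&=+$,/?%#[]",  // escape these reserved characters too
    kCFStringEncodingUTF8));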
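
The call above encodes the string as UTF-8 and writes each byte that must be escaped as %XX. A stricter plain-C sketch of the same idea: it escapes every byte outside the RFC 3986 "unreserved" set (letters, digits, - . _ ~), which covers the reserved characters passed to the call above as well as every non-ASCII byte. The function names are hypothetical, and the input is assumed to already be UTF-8.

```c
#include <string.h>

/* RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~". */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
}

/* Copies the UTF-8 string src to dst, writing every byte that is not unreserved
   as %XX (uppercase hex). dst needs room for 3 * strlen(src) + 1 bytes. */
void percent_encode(const char *src, char *dst)
{
    static const char hex[] = "0123456789ABCDEF";
    for (const unsigned char *p = (const unsigned char *)src; *p; p++) {
        if (is_unreserved(*p)) {
            *dst++ = (char)*p;
        } else {
            *dst++ = '%';
            *dst++ = hex[*p >> 4];
            *dst++ = hex[*p & 0x0F];
        }
    }
    *dst = '\0';
}
```

Escaping everything outside a small allow-list is the safer default: a character that did not strictly need escaping costs three bytes, while one that was wrongly left unescaped can change how the URL is parsed.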

Conversion from ISO-8859-1 text (numeric character references) to Unicode

+ (NSString *)changeISO88591StringToUnicodeString:(NSString *)iso88591String
{
    // Convenience constructors return autoreleased objects (works with or without ARC).
    NSMutableString *srcString = [NSMutableString stringWithString:iso88591String];
    // "&amp;#x4E2D;" -> "&#x4E2D;"
    [srcString replaceOccurrencesOfString:@"&amp;" withString:@"&" options:NSLiteralSearch range:NSMakeRange(0, [srcString length])];
    // "&#x4E2D;" -> "4E2D;"
    [srcString replaceOccurrencesOfString:@"&#x" withString:@"" options:NSLiteralSearch range:NSMakeRange(0, [srcString length])];
    NSMutableString *desString = [NSMutableString string];
    // Assumes the input holds only hex references, so each ";" ends one code;
    // the empty component after the final ";" is skipped.
    NSArray *arr = [srcString componentsSeparatedByString:@";"];
    for (NSUInteger i = 0; i + 1 < [arr count]; i++) {
        NSString *v = [arr objectAtIndex:i];
        // Parse the hex digits (stands in for the StringUtil helper, which is not shown).
        unsigned long value = strtoul([v UTF8String], NULL, 16);
        // %C appends one UTF-16 code unit, so only characters up to U+FFFF are handled.
        [desString appendFormat:@"%C", (unichar)value];
    }
    return desString;
}
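
The method above assumes the string consists of nothing but &#x...; references and handles only characters up to U+FFFF. As a hedged alternative, here is a plain-C sketch (hypothetical names, UTF-8 output assumed) that decodes both the hexadecimal &#xXXXX; and decimal &#DDDDD; forms anywhere in the text, including characters above U+FFFF, and copies everything else unchanged. If the input was escaped twice (&amp;#x4E2D;), replace &amp; with & first, as the Objective-C method does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of digit c in base 10 or 16, or -1 if c is not such a digit. */
static int digit_value(char c, int base)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (base == 16 && c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (base == 16 && c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Writes code point cp as 1 to 4 UTF-8 bytes and returns the byte count. */
static size_t put_utf8(uint32_t cp, char *out)
{
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Replaces each &#xHHHH; or &#DDDD; reference in src with the character's UTF-8
   bytes; malformed references (and &#0; or surrogates) are copied unchanged.
   dst needs strlen(src) + 1 bytes: a reference is longer than its UTF-8 form. */
void decode_numeric_refs(const char *src, char *dst)
{
    while (*src) {
        if (src[0] == '&' && src[1] == '#') {
            int base = (src[2] == 'x' || src[2] == 'X') ? 16 : 10;
            const char *p = src + (base == 16 ? 3 : 2);
            uint32_t cp = 0;
            int d;
            /* Stop once the value exceeds U+10FFFF, so cp cannot overflow. */
            while (cp <= 0x10FFFF && (d = digit_value(*p, base)) >= 0) {
                cp = cp * (uint32_t)base + (uint32_t)d;
                p++;
            }
            if (*p == ';' && cp > 0 && cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF)) {
                dst += put_utf8(cp, dst);
                src = p + 1;
                continue;
            }
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

For example, &#x4E2D;&#x56FD; becomes the UTF-8 bytes of 中国, and the decimal form wyj&#347;cie becomes wyjście.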


Q: Is there a standard method to package a Unicode character so it fits an 8-Bit ASCII stream?

A: There are three or four options for making Unicode fit into an 8-bit format.

A) Use UTF-8. This preserves ASCII, but not Latin-1, because the characters > 127 are different from Latin-1. UTF-8 uses the bytes in the ASCII range only for ASCII characters. Therefore, it works well in any environment where ASCII characters have a significance as syntax characters, e.g. file name syntaxes, markup languages, etc., but where all other characters may use arbitrary bytes.
Example: "Latin Small Letter s with Acute" (015B) would be encoded as two bytes: C5 9B.

B) Use Java or C style escapes, of the form \uXXXX or \xXXXX. This format is not standard for text files, but well defined in the framework of the languages in question, primarily for source files.
Example: The Polish word "wyjście" with character "Latin Small Letter s with Acute" (015B) in the middle (ś is one character) would look like: "wyj\u015Bcie".

C) Use the &#xXXXX; or &#DDDDD; numeric character escapes as in HTML or XML. Again, these are not standard for plain text files, but well defined within the framework of these markup languages.
Example: "wyjście" would look like "wyj&#x015B;cie".

D) Use SCSU. This format compresses Unicode into 8-bit format, preserving most of ASCII, but using some of the control codes as commands for the decoder. However, while ASCII text will look like ASCII text after being encoded in SCSU, other characters may occasionally be encoded with the same byte values, making SCSU unsuitable for 8-bit channels that blindly interpret any of the bytes as ASCII characters.
Example: "<SC2>wyjÛcie" where <SC2> indicates the byte 0x12 and "Û" corresponds to byte 0xDB. [AF] & [KW]
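
Options A to C are easy to see in code. The sketch below, in plain C (also valid Objective-C), encodes a single code point as UTF-8 (option A) and rewrites a UTF-8 string with every non-ASCII character escaped either Java-style (option B) or as a hexadecimal numeric character reference (option C). SCSU (option D) is a stateful compression scheme and is left out. All function and enum names are hypothetical, and the NCR output does not escape markup characters such as & or <, which real HTML or XML output would also need.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Option A: encodes cp as UTF-8 into out; returns the byte count,
   or 0 for surrogates and values above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Decodes one UTF-8 sequence at s into *cp; returns its length, or 0 if malformed.
   Overlong forms are not rejected in this sketch. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    size_t len;
    uint32_t v;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { v = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { v = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { v = s[0] & 0x07; len = 4; }
    else return 0;
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* also catches a premature NUL */
        v = (v << 6) | (s[i] & 0x3F);
    }
    if (v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF)) return 0;
    *cp = v;
    return len;
}

enum escape_style { ESCAPE_JAVA, ESCAPE_NCR };

/* Options B and C: copies ASCII unchanged and escapes every other character as
   \uXXXX (a UTF-16 surrogate pair above U+FFFF, as Java does) or as &#xXXXX;.
   dst needs 4 * strlen(src) + 1 bytes. Returns 0, or -1 on malformed UTF-8. */
int escape_non_ascii(const char *src, char *dst, enum escape_style style)
{
    const unsigned char *p = (const unsigned char *)src;
    while (*p) {
        uint32_t cp;
        size_t n = utf8_decode(p, &cp);
        if (n == 0) return -1;
        p += n;
        if (cp < 0x80)
            *dst++ = (char)cp;
        else if (style == ESCAPE_NCR)
            dst += sprintf(dst, "&#x%04X;", (unsigned)cp);
        else if (cp < 0x10000)
            dst += sprintf(dst, "\\u%04X", (unsigned)cp);
        else
            dst += sprintf(dst, "\\u%04X\\u%04X",
                           (unsigned)(0xD800 + ((cp - 0x10000) >> 10)),
                           (unsigned)(0xDC00 + ((cp - 0x10000) & 0x3FF)));
    }
    *dst = '\0';
    return 0;
}
```

With the FAQ's example, utf8_encode(0x015B, ...) yields the two bytes C5 9B, and escaping the UTF-8 text "wyjście" gives wyj\u015Bcie with ESCAPE_JAVA and wyj&#x015B;cie with ESCAPE_NCR.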


As option C says, this is a nonstandard but widely used practice; you could say it also works as a knock-off ("shanzhai") encoding :-)

So the encoding process is:

String -> Unicode code points -> &#xXXXX; or &#DDDDD;

Decoding reverses these steps.

http://unicode.org/faq/utf_bom.html#General
