Converting Unicode Escapes to Chinese Characters

Source: Internet
Uploaded by: User


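    // Converts text containing \uXXXX escapes (for example @"\\u4e2d\\u56fd") into the characters they denote.
    // Trick: wrap the text in double quotes and let the old-style property list parser decode the \U escapes.
    + (NSString *)replaceUnicode:(NSString *)unicodeStr {
        // The property list parser expects an uppercase \U.
        NSString *tempStr1 = [unicodeStr stringByReplacingOccurrencesOfString:@"\\u" withString:@"\\U"];
        // Escape embedded double quotes so they do not end the quoted string early.
        NSString *tempStr2 = [tempStr1 stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
        NSString *tempStr3 = [[@"\"" stringByAppendingString:tempStr2] stringByAppendingString:@"\""];
        NSData *tempData = [tempStr3 dataUsingEncoding:NSUTF8StringEncoding];
        // propertyListFromData:mutabilityOption:format:errorDescription: is deprecated; this is its replacement.
        NSString *returnStr = [NSPropertyListSerialization propertyListWithData:tempData options:NSPropertyListImmutable format:NULL error:NULL];
        // Turn literal "\r\n" sequences into real newlines.
        return [returnStr stringByReplacingOccurrencesOfString:@"\\r\\n" withString:@"\n"];
    }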

 


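Converting between Chinese characters and percent-encoded UTF-8

    // Both methods are deprecated since iOS 9; the newer equivalents are
    // stringByRemovingPercentEncoding and stringByAddingPercentEncodingWithAllowedCharacters:.
    // Decode percent-escaped UTF-8 bytes (this input decodes to 中国).
    NSString *strA = [@"%E4%B8%AD%E5%9B%BD" stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
    // Percent-escape the UTF-8 bytes of a string.
    NSString *strB = [@"中國" stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];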

Converting an NSString to percent-encoded UTF-8

    NSString *strings = [NSString stringWithFormat:@"abc"];
    NSLog(@"strings : %@", strings);

    // Declaration from CFURL.h, for reference:
    // CF_EXPORT CFStringRef CFURLCreateStringByAddingPercentEscapes(CFAllocatorRef allocator, CFStringRef originalString, CFStringRef charactersToLeaveUnescaped, CFStringRef legalURLCharactersToBeEscaped, CFStringEncoding encoding);

    // A Create function returns a +1 reference, so ownership is transferred to ARC to avoid a leak.
    NSString *encodedValue = (__bridge_transfer NSString *)CFURLCreateStringByAddingPercentEscapes(NULL, (__bridge CFStringRef)strings, NULL, (__bridge CFStringRef)@"!*'();:@&=+$,/?%#[]", kCFStringEncodingUTF8);

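Converting ISO 8859-1 to Unicode

    // Decodes a run of hexadecimal numeric character references (for example "&#x4E2D;&#x56FD;") into an NSString.
    // [StringUtil changeHexStringToDecimal:] is a helper from the original project and is not shown here.
    + (NSString *)changeISO88591StringToUnicodeString:(NSString *)iso88591String {
        NSMutableString *srcString = [NSMutableString stringWithString:iso88591String];
        // Undo double escaping first: "&amp;#x4E2D;" becomes "&#x4E2D;".
        [srcString replaceOccurrencesOfString:@"&amp;" withString:@"&" options:NSLiteralSearch range:NSMakeRange(0, [srcString length])];
        // Strip the "&#x" prefixes so that only "4E2D;56FD;" remains.
        [srcString replaceOccurrencesOfString:@"&#x" withString:@"" options:NSLiteralSearch range:NSMakeRange(0, [srcString length])];
        NSMutableString *desString = [NSMutableString string];
        NSArray *arr = [srcString componentsSeparatedByString:@";"];
        // The last component is whatever follows the final ";" and is skipped.
        for (NSUInteger i = 0; i + 1 < [arr count]; i++) {
            NSString *v = [arr objectAtIndex:i];
            int value = [StringUtil changeHexStringToDecimal:v];
            // Append the value as one UTF-16 code unit (covers U+0000 to U+FFFF only).
            // The original built a 2-byte C string, which breaks whenever either byte is zero.
            unichar ch = (unichar)value;
            [desString appendString:[NSString stringWithCharacters:&ch length:1]];
        }
        return desString;
    }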


Q: Is there a standard method to package a Unicode character so it fits an 8-Bit ASCII stream?

A: There are three or four options for making Unicode fit into an 8-bit format.

a) Use UTF-8. This preserves ASCII, but not Latin-1, because the characters >127 are different from Latin-1. UTF-8 uses the bytes in the ASCII range only for ASCII characters. Therefore, it works well in any environment where ASCII characters have a significance as syntax characters, e.g. file name syntaxes, markup languages, etc., but where all other characters may use arbitrary bytes.
Example: “Latin Small Letter s with Acute” (015B) would be encoded as two bytes: C5 9B.

b) Use Java or C style escapes, of the form \uXXXX or \xXXXX. This format is not standard for text files, but well defined in the framework of the languages in question, primarily for source files.
Example: The Polish word “wyjście” with character “Latin Small Letter s with Acute” (015B) in the middle (ś is one character) would look like: “wyj\u015Bcie”.

c) Use the &#xXXXX; or &#DDDDD; numeric character escapes as in HTML or XML. Again, these are not standard for plain text files, but well defined within the framework of these markup languages.
Example: “wyjście” would look like “wyj&#x015B;cie”

d) Use SCSU. This format compresses Unicode into 8-bit format, preserving most of ASCII, but using some of the control codes as commands for the decoder. However, while ASCII text will look like ASCII text after being encoded in SCSU, other characters may occasionally be encoded with the same byte values, making SCSU unsuitable for 8-bit channels that blindly interpret any of the bytes as ASCII characters.
Example: “<SC2> wyjÛcie” where <SC2> indicates the byte 0x12 and “Û” corresponds to byte 0xDB. [AF] & [KW]
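
To make the byte arithmetic behind option a) concrete, here is a minimal sketch of UTF-8 encoding for a single code point. It is written in plain C, so it can also be compiled as part of an Objective-C source file. The function name `utf8_encode` is an illustrative assumption and does not come from the FAQ or from any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper (hypothetical name): encode one Unicode code point
 * as UTF-8. Writes up to 4 bytes into out and returns the byte count,
 * or 0 for surrogates (U+D800..U+DFFF) and values above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {                      /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                     /* surrogates are not characters */
        }
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling `utf8_encode(0x015B, buf)` returns 2 and fills `buf` with C5 9B, matching the FAQ example for “Latin Small Letter s with Acute”.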


As option c) describes, this is a “non-standard” but widely used practice; you could even call it a knock-off encoding :-)

So the encoding process is:

string -> Unicode code points -> &#xXXXX; or &#DDDDD;

Decoding simply reverses these steps.
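
The two directions can be sketched for a single code point as follows. This is plain C (usable from an Objective-C file) and only an illustration under stated assumptions: the names `ncr_encode`, `ncr_decode`, and `digit_value` are hypothetical, and the decoder handles one reference at the start of the input rather than a whole document.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format one code point as a hexadecimal numeric
 * character reference, e.g. U+015B -> "&#x015B;". Returns the number of
 * characters written (without the NUL), or -1 if out is too small. */
int ncr_encode(uint32_t cp, char *out, size_t out_size) {
    int n = snprintf(out, out_size, "&#x%04X;", (unsigned)cp);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}

/* Value of digit c in the given base (10 or 16), or -1 if c is not a digit. */
static int digit_value(char c, unsigned base) {
    int d;
    if (c >= '0' && c <= '9') d = c - '0';
    else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
    else return -1;
    return (unsigned)d < base ? d : -1;
}

/* Hypothetical helper: parse one reference at the start of s, either
 * "&#xHHHH;" (hex) or "&#DDDDD;" (decimal). On success stores the code
 * point in *cp and returns the number of characters consumed; returns 0
 * if s does not start with a well-formed reference. */
size_t ncr_decode(const char *s, uint32_t *cp) {
    if (s[0] != '&' || s[1] != '#') return 0;
    size_t i = 2;
    unsigned base = 10;
    if (s[i] == 'x' || s[i] == 'X') {
        base = 16;
        i++;
    }
    size_t first_digit = i;
    uint32_t value = 0;
    for (;; i++) {
        int d = digit_value(s[i], base);
        if (d < 0) break;                    /* stops at ';' or the NUL */
        value = value * base + (uint32_t)d;
        if (value > 0x10FFFF) return 0;      /* beyond the Unicode range */
    }
    if (i == first_digit || s[i] != ';') return 0;
    *cp = value;
    return i + 1;                            /* include the ';' */
}
```

For example, `ncr_encode(0x015B, buf, sizeof buf)` produces `&#x015B;`, and `ncr_decode("&#347;", &cp)` consumes 6 characters and yields the same code point, 0x015B. A full decoder would walk the input, copy ordinary text through unchanged, and convert each decoded code point to the target encoding.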

http://unicode.org/faq/utf_bom.html#General
