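```objc
// NSString -> NSData
NSString *str = @"Jesfds";
NSData *data = [str dataUsingEncoding:NSUTF8StringEncoding];

// NSData -> NSString
NSString *result = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];

// NSData -> char * (read-only pointer to the raw bytes)
const char *test = [data bytes];

// char * -> NSData
// len: number of bytes (its value is missing in the original snippet)
Byte *tempData = malloc(sizeof(Byte) * len);
// (fill tempData with len bytes here)
NSData *content = [NSData dataWithBytes:tempData length:len];
free(tempData); // dataWithBytes:length: copies the bytes, so the buffer can be freed
```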
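One caveat about the NSData -> char * step: `[data bytes]` returns a pointer to raw bytes, and those bytes are not guaranteed to end with a NUL terminator, so handing the pointer to C string functions such as `strlen` or `printf("%s", ...)` may read past the end of the buffer. A minimal sketch in plain C (also valid inside an Objective-C file) that copies the bytes into a terminated buffer; the helper name `copy_bytes_as_cstring` is hypothetical, not part of Foundation:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy len raw bytes (for example what [data bytes]
   and [data length] return) into a new NUL-terminated C string.
   The caller must free() the result. Returns NULL if malloc fails. */
char *copy_bytes_as_cstring(const void *bytes, size_t len)
{
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    if (len > 0)
        memcpy(s, bytes, len); /* skip memcpy for empty data, where bytes may be NULL */
    s[len] = '\0';             /* add the terminator the raw bytes do not carry */
    return s;
}
```

From Objective-C it could be called as `copy_bytes_as_cstring([data bytes], [data length])`. Note that a byte value of 0 inside the data will still cut the resulting C string short.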
When NSData is converted to NSString with methods such as initWithData:encoding:, nil is returned if the data contains bytes that are not valid in the given encoding.
---------- The SDK documentation reads as follows ----------
- (instancetype)initWithData:(NSData *)data encoding:(NSStringEncoding)encoding;
Return Value
An NSString object initialized by converting the bytes in data into Unicode characters using encoding. The returned object may be different from the original receiver. Returns nil if the initialization fails for some reason (for example, if data does not represent valid data for encoding).
-----------------------------
In many cases this is not what we want. For example, when fetching a web page's source for analysis, the page may be UTF-8 encoded yet contain a few bytes that are not valid UTF-8. We would still like the conversion to NSString to succeed, with those illegal bytes discarded (or replaced).
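According to the UTF-8 format standard (the table shows the original definition from RFC 2279, which allows up to 6 bytes; RFC 3629 later limited UTF-8 to at most 4 bytes, U+0000 to U+10FFFF):

| Unicode/UCS-4 range (hex) | Significant bits | UTF-8 byte pattern | Bytes | Byte value range (hex) |
|---|---|---|---|---|
| 0000 0000 ~ 0000 007F | 0~7 | 0xxxxxxx | 1 | 00 ~ 7F |
| 0000 0080 ~ 0000 07FF | 8~11 | 110xxxxx 10xxxxxx | 2 | C0 80 ~ DF BF |
| 0000 0800 ~ 0000 FFFF | 12~16 | 1110xxxx 10xxxxxx 10xxxxxx | 3 | E0 80 80 ~ EF BF BF |
| 0001 0000 ~ 001F FFFF | 17~21 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | 4 | F0 80 80 80 ~ F7 BF BF BF |
| 0020 0000 ~ 03FF FFFF | 22~26 | 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx | 5 | F8 80 80 80 80 ~ FB BF BF BF BF |
| 0400 0000 ~ 7FFF FFFF | 27~31 | 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx | 6 | FC 80 80 80 80 80 ~ FD BF BF BF BF BF |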
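To make the bit layout in the table concrete, here is a small sketch in plain C (also valid in an Objective-C file) that encodes one code point by following the table row by row: pick the row from the code point's range, then fill each 10xxxxxx continuation byte with 6 bits, and put the remaining high bits into the lead byte. The function name `utf8_encode` is hypothetical. It deliberately includes the 5- and 6-byte rows, which RFC 3629 no longer allows, so it mirrors the table rather than modern UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode code point cp into out (room for 6 bytes)
   following the table above. Returns the number of bytes written,
   or 0 if cp is larger than 0x7FFFFFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    size_t n;                /* total bytes in the sequence */
    unsigned char lead_mark; /* fixed high bits of the lead byte */

    if (cp <= 0x7F) {
        out[0] = (unsigned char)cp; /* 0xxxxxxx */
        return 1;
    } else if (cp <= 0x7FF) {
        n = 2; lead_mark = 0xC0;    /* 110xxxxx */
    } else if (cp <= 0xFFFF) {
        n = 3; lead_mark = 0xE0;    /* 1110xxxx */
    } else if (cp <= 0x1FFFFF) {
        n = 4; lead_mark = 0xF0;    /* 11110xxx */
    } else if (cp <= 0x3FFFFFF) {
        n = 5; lead_mark = 0xF8;    /* 111110xx */
    } else if (cp <= 0x7FFFFFFF) {
        n = 6; lead_mark = 0xFC;    /* 1111110x */
    } else {
        return 0;
    }

    /* Continuation bytes 10xxxxxx, filled from the last one backwards */
    for (size_t i = n - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    out[0] = (unsigned char)(lead_mark | cp); /* leftover high bits go in the lead byte */
    return n;
}
```

For example, U+4E2D (中) falls in the 3-byte row and comes out as E4 B8 AD, and U+00E9 (é) falls in the 2-byte row and comes out as C3 A9.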
If a byte is less than 0x80, it is a complete single-byte character.
If a byte is in the range 0xC0 to 0xDF, it starts a 2-byte UTF-8 character (the first byte begins with 110, the second with 10).
If a byte is in the range 0xE0 to 0xEF, it starts a 3-byte UTF-8 character (the first byte begins with 1110, the second and third with 10).
And so on. Any byte that does not fit these rules is illegal, and it is enough to replace that byte.
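The rules above boil down to classifying each lead byte. A minimal C sketch (the name `utf8_sequence_length` is hypothetical) that returns how many bytes a character starting with byte `b` should occupy, using the byte ranges from the table, including the old 5- and 6-byte forms. A return value of 0 means the byte can never start a character: it is either a 10xxxxxx continuation byte or 0xFE/0xFF.

```c
/* Hypothetical helper: expected UTF-8 sequence length for lead byte b,
   following the table above; 0 means b cannot start a character. */
int utf8_sequence_length(unsigned char b)
{
    if (b < 0x80) return 1; /* 0xxxxxxx: single-byte character */
    if (b < 0xC0) return 0; /* 10xxxxxx: continuation byte only */
    if (b < 0xE0) return 2; /* 110xxxxx */
    if (b < 0xF0) return 3; /* 1110xxxx */
    if (b < 0xF8) return 4; /* 11110xxx */
    if (b < 0xFC) return 5; /* 111110xx (not allowed by RFC 3629) */
    if (b < 0xFE) return 6; /* 1111110x (not allowed by RFC 3629) */
    return 0;               /* 0xFE and 0xFF never appear in UTF-8 */
}
```

A checker then only has to verify that the next `length - 1` bytes all match 10xxxxxx.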
The implementation is as follows (it works but is not rigorous; it is recommended to optimize it before using it in a project):
```objc
#import <Foundation/Foundation.h>

// Replace non-UTF-8 bytes.
// Note: if a multi-byte sequence is broken (for example, the second byte of a
// 3-byte sequence is wrong), only the first byte is replaced (it is treated as
// a bad lead byte); the remaining bytes are then checked again one by one.
// Not rigorous: overlong forms (e.g. C0 80) and surrogates (e.g. ED A0 80) are not detected.
// Instance method: place it inside an @implementation block.
- (NSData *)replaceNoUtf8:(NSData *)data
{
    const uint8_t replacement = 'A'; // every illegal byte is replaced with 'A'
    NSMutableData *md = [NSMutableData dataWithData:data];
    uint8_t *bytes = [md mutableBytes];
    NSUInteger length = [md length];
    NSUInteger loc = 0;
    while (loc < length)
    {
        uint8_t lead = bytes[loc];
        NSUInteger need; // number of 10xxxxxx continuation bytes expected
        if ((lead & 0x80) == 0x00)
            need = 0; // 0xxxxxxx: single-byte character
        else if ((lead & 0xE0) == 0xC0)
            need = 1; // 110xxxxx
        else if ((lead & 0xF0) == 0xE0)
            need = 2; // 1110xxxx
        else if ((lead & 0xF8) == 0xF0)
            need = 3; // 11110xxx, e.g. emoji
        else
        {
            // Stray continuation byte, or a 5/6-byte lead byte (not allowed by RFC 3629)
            bytes[loc] = replacement;
            loc++;
            continue;
        }
        // Count the continuation bytes that are present, without reading past the end
        NSUInteger i = 1;
        while (i <= need && loc + i < length && (bytes[loc + i] & 0xC0) == 0x80)
            i++;
        if (i > need)
        {
            loc += need + 1; // complete sequence: keep it and skip over it
        }
        else
        {
            bytes[loc] = replacement; // broken sequence: replace only the lead byte
            loc++;
        }
    }
    return md;
}
```
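The method above only checks bit patterns, which is why it is described as not rigorous. As a hedged sketch of one possible optimization, here is a plain C variant (callable from Objective-C, for example on `[md mutableBytes]` and `[md length]`) that follows the well-formed byte-sequence ranges of RFC 3629, so it also replaces overlong forms such as C0 80, UTF-16 surrogates such as ED A0 80, and code points above U+10FFFF. The name `replace_invalid_utf8` and the `'?'` replacement in the tests are hypothetical; whether NSString's decoder rejects exactly the same sequences is an assumption, not something the original verifies.

```c
#include <stddef.h>

/* Hypothetical helper: in place, replace every byte that does not belong to a
   well-formed UTF-8 sequence (RFC 3629 ranges) with `replacement`.
   Returns the number of bytes replaced; the buffer length never changes. */
size_t replace_invalid_utf8(unsigned char *buf, size_t len, unsigned char replacement)
{
    size_t replaced = 0;
    size_t loc = 0;
    while (loc < len) {
        unsigned char b = buf[loc];
        size_t need;                          /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of the 2nd byte */

        if (b < 0x80) {                       /* ASCII */
            loc++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            need = 1;
        } else if (b >= 0xE0 && b <= 0xEF) {
            need = 2;
            if (b == 0xE0) lo = 0xA0;         /* reject overlong 3-byte forms */
            else if (b == 0xED) hi = 0x9F;    /* reject surrogates D800-DFFF */
        } else if (b >= 0xF0 && b <= 0xF4) {
            need = 3;
            if (b == 0xF0) lo = 0x90;         /* reject overlong 4-byte forms */
            else if (b == 0xF4) hi = 0x8F;    /* reject code points above U+10FFFF */
        } else {
            /* C0, C1, F5-FF, or a stray continuation byte */
            buf[loc++] = replacement;
            replaced++;
            continue;
        }

        size_t i = 1;
        while (i <= need && loc + i < len) {
            unsigned char c = buf[loc + i];
            unsigned char min = (i == 1) ? lo : 0x80;
            unsigned char max = (i == 1) ? hi : 0xBF;
            if (c < min || c > max)
                break;
            i++;
        }
        if (i > need) {
            loc += need + 1;                  /* well-formed: keep the sequence */
        } else {
            buf[loc++] = replacement;         /* broken: replace only the lead byte */
            replaced++;
        }
    }
    return replaced;
}
```

Like the original method, it replaces only the lead byte of a broken sequence and then re-examines the bytes that follow, so a truncated E4 B8 becomes two replacement bytes.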
In most cases the repaired NSData can then be converted to NSString without returning nil.
* If the data uses an encoding other than UTF-8, adapt the check to the rules of that encoding yourself.
Solving the problem of NSData-to-NSString conversion returning nil