Solving the problem of NSData-to-NSString conversion returning nil

Source: Internet
Author: User

// NSString to NSData
NSString *str = @"Jesfds";
NSData *data = [str dataUsingEncoding:NSUTF8StringEncoding];

// NSData to NSString
NSString *result = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];

// NSData to char *
const char *test = [data bytes];

// Raw bytes to NSData; `length` stands for the buffer size in bytes
Byte *tempData = malloc(sizeof(Byte) * length);
NSData *content = [NSData dataWithBytes:tempData length:length];

When NSData is converted to NSString with a method such as initWithData:encoding:, nil is returned if the NSData contains bytes that are not valid in the specified encoding.

---------- The SDK documentation reads as follows ----------

- (instancetype)initWithData:(NSData *)data encoding:(NSStringEncoding)encoding;

Return Value

An NSString object initialized by converting the bytes in data into Unicode characters using encoding. The returned object may be different from the original receiver. Returns nil if the initialization fails for some reason (for example if data does not represent valid data for encoding).

-----------------------------

In many cases this is not the result we want. For example, when fetching the source of a web page for analysis, if the page is UTF-8 encoded but contains a few isolated non-UTF-8 bytes, we would still like the conversion to NSString to succeed, with the illegal characters discarded (or replaced).

According to the UTF-8 encoding standard:

Unicode/UCS-4 (hex)     Bits    UTF-8 bit pattern                                        Bytes  Byte range (hex)
0000 ~ 007F             0~7     0xxxxxxx                                                 1      0x - 7x
0080 ~ 07FF             8~11    110xxxxx 10xxxxxx                                        2      Cx 8x - Dx Bx
0800 ~ FFFF             12~16   1110xxxx 10xxxxxx 10xxxxxx                               3      Ex 8x 8x - Ex Bx Bx
1 0000 ~ 1F FFFF        17~21   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx                      4      F0 8x 8x 8x - F7 Bx Bx Bx
20 0000 ~ 3FF FFFF      22~26   111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx             5      F8 8x 8x 8x 8x - FB Bx Bx Bx Bx
400 0000 ~ 7FFF FFFF    27~31   1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx    6      FC 8x 8x 8x 8x 8x - FD Bx Bx Bx Bx Bx

Note: the current standard (RFC 3629) limits UTF-8 to at most 4 bytes, covering code points up to U+10FFFF; the 5-byte and 6-byte forms belong to the original definition.
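To make the table concrete, here is a minimal sketch in plain C (which also compiles as Objective-C) that encodes a single code point using the first four rows. The function name utf8_encode is a hypothetical helper introduced for illustration, not part of any SDK, and the sketch assumes the 4-byte limit of current UTF-8 rather than the historical 6-byte forms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes 1 to 4 bytes into out and returns how many were written.
 * Returns 0 for surrogates (D800-DFFF) and for values above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                                   /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                                  /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                                   /* surrogates are not characters */
        }
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                               /* 11110xxx + three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U+4E2D falls in the 0800 ~ FFFF row, so it becomes the three bytes E4 B8 AD.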


If a byte is less than 0x80, it is a complete single-byte character;

If it is at least 0xC0 and less than 0xE0, it starts a 2-byte UTF-8 character (the first byte begins with 110, the second with 10);

If it is at least 0xE0 and less than 0xF0, it starts a 3-byte UTF-8 character (the first byte begins with 1110, the second and the third each begin with 10);

And so on. A byte sequence that does not satisfy these rules is an illegal character, and all we have to do is replace it.
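The rules above can be sketched as a small structural check in plain C. The names utf8_lead_length, utf8_is_continuation and utf8_is_well_formed are hypothetical helpers introduced here for illustration. The sketch only checks the bit patterns described above, for sequences of up to 4 bytes; it does not reject overlong encodings or surrogates, so it is looser than a full validator.

```c
#include <stddef.h>

/* Sequence length announced by a lead byte: 1, 2, 3 or 4.
 * Returns 0 if the byte cannot start a sequence
 * (a continuation byte 10xxxxxx, or 0xF8 and above). */
static size_t utf8_lead_length(unsigned char b)
{
    if (b < 0x80) return 1;             /* 0xxxxxxx */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 0;
}

/* A continuation byte has the form 10xxxxxx. */
static int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Returns 1 if all n bytes at p follow the structural rules, 0 otherwise. */
static int utf8_is_well_formed(const unsigned char *p, size_t n)
{
    size_t i = 0;
    while (i < n) {
        size_t len = utf8_lead_length(p[i]);
        if (len == 0 || len > n - i) {
            return 0;                   /* bad lead byte or truncated sequence */
        }
        for (size_t k = 1; k < len; k++) {
            if (!utf8_is_continuation(p[i + k])) {
                return 0;               /* expected 10xxxxxx */
            }
        }
        i += len;
    }
    return 1;
}
```

A check like this could be run before the conversion to decide whether any repair is needed at all.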

The implementation is as follows (it works but is not rigorous; optimizing it is recommended before using it in a project):

// Replace non-UTF-8 bytes.
// Note: if the second byte of a three-byte UTF-8 sequence is wrong, only the first byte is
// replaced (it is assumed to be a corrupt three-byte lead byte); the remaining two bytes
// are then checked on their own.
// This version recognizes sequences of 1 to 3 bytes; longer sequences are treated as illegal.
- (NSData *)replaceNoUtf8:(NSData *)data
{
    char aa[] = {'A', 'A', 'A', 'A', 'A', 'A'}; // a UTF-8 character is at most 6 bytes long; this method only uses the first element
    NSMutableData *md = [NSMutableData dataWithData:data];
    NSUInteger length = [md length];
    NSUInteger loc = 0;
    while (loc < length)
    {
        unsigned char buffer;
        [md getBytes:&buffer range:NSMakeRange(loc, 1)];
        if ((buffer & 0x80) == 0)
        {
            // Single-byte character (0xxxxxxx)
            loc++;
            continue;
        }
        else if ((buffer & 0xE0) == 0xC0)
        {
            // Lead byte of a 2-byte sequence: one continuation byte (10xxxxxx) must follow.
            // The bounds check avoids reading past the end of truncated data.
            if (loc + 1 < length)
            {
                [md getBytes:&buffer range:NSMakeRange(loc + 1, 1)];
                if ((buffer & 0xC0) == 0x80)
                {
                    loc += 2;
                    continue;
                }
            }
        }
        else if ((buffer & 0xF0) == 0xE0)
        {
            // Lead byte of a 3-byte sequence: two continuation bytes must follow.
            if (loc + 2 < length)
            {
                unsigned char tail[2];
                [md getBytes:tail range:NSMakeRange(loc + 1, 2)];
                if ((tail[0] & 0xC0) == 0x80 && (tail[1] & 0xC0) == 0x80)
                {
                    loc += 3;
                    continue;
                }
            }
        }
        // Illegal character: replace this one byte with 'A'
        [md replaceBytesInRange:NSMakeRange(loc, 1) withBytes:aa length:1];
        loc++;
    }
    return md;
}
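As one possible direction for the optimization mentioned above, here is a sketch in plain C that works directly on a byte buffer and also accepts 4-byte sequences, which the method above treats as illegal. The function utf8_sanitize and its replacement parameter are hypothetical names introduced for illustration. It keeps the same policy as the article (replace only the offending byte, then re-examine the bytes after it) and, like the original, checks bit patterns only, so overlong encodings and surrogates still pass.

```c
#include <stddef.h>

/* Hypothetical helper: overwrite every byte that is not part of a
 * structurally valid 1- to 4-byte UTF-8 sequence with `replacement`.
 * Works in place and returns the number of bytes replaced. */
static size_t utf8_sanitize(unsigned char *buf, size_t n, unsigned char replacement)
{
    size_t replaced = 0;
    size_t i = 0;

    while (i < n) {
        unsigned char b = buf[i];
        size_t len;

        /* Expected sequence length, taken from the lead byte. */
        if (b < 0x80)                len = 1;
        else if ((b & 0xE0) == 0xC0) len = 2;
        else if ((b & 0xF0) == 0xE0) len = 3;
        else if ((b & 0xF8) == 0xF0) len = 4;
        else                         len = 0;   /* stray continuation or 0xF8+ */

        /* The sequence must fit in the buffer and every following
         * byte must be a continuation byte (10xxxxxx). */
        int ok = (len != 0 && len <= n - i);
        for (size_t k = 1; ok && k < len; k++) {
            ok = (buf[i + k] & 0xC0) == 0x80;
        }

        if (ok) {
            i += len;
        } else {
            buf[i] = replacement;               /* replace just this byte */
            replaced++;
            i++;
        }
    }
    return replaced;
}
```

With an NSMutableData named md, the call would presumably look like utf8_sanitize([md mutableBytes], [md length], 'A'), which avoids one getBytes:range: call per byte.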

After this replacement, the NSData can be converted to NSString correctly.

* If the encoding is not UTF-8, apply the same approach yourself according to the rules of the corresponding encoding.
