Unicode
Unicode is an international standard for encoding, representing, and processing text. It lets you represent almost any character from any language in a standardized form, and read and write those characters to and from an external source such as a text file or a web page.
Swift's String and Character types are fully Unicode-compliant. They support several different Unicode encodings, as described below.
Unicode Terminology
Every character in Unicode can be represented by one or more Unicode scalars. A Unicode scalar is a unique 21-bit number (and name) for a character or modifier, such as U+0061 for LATIN SMALL LETTER A ("a"), or U+1F425 for FRONT-FACING BABY CHICK ("🐥").
When a Unicode string is written to a text file or some other storage, its Unicode scalars are encoded in one of several Unicode-defined formats. Each format encodes the string in small chunks known as code units. These formats include UTF-8 (which encodes a string as 8-bit code units) and UTF-16 (which encodes a string as 16-bit code units).
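As a rough illustration of these terms, the sketch below takes the two scalars mentioned above (U+0061 and U+1F425) and reports each one's numeric value and how many code units it needs in UTF-8 and in UTF-16. It assumes Swift 5; the helper `codeUnitCounts(_:)` is a hypothetical name, while `value`, `utf8`, and `utf16` are standard-library properties.

```swift
// Hypothetical helper: counts the code units one scalar needs in each encoding form.
func codeUnitCounts(_ scalar: Unicode.Scalar) -> (utf8: Int, utf16: Int) {
    let text = String(Character(scalar))
    return (text.utf8.count, text.utf16.count)
}

let latinSmallA: Unicode.Scalar = "\u{0061}"   // "a"
let babyChick: Unicode.Scalar = "\u{1F425}"    // "🐥"

for scalar in [latinSmallA, babyChick] {
    let counts = codeUnitCounts(scalar)
    print(scalar.value, counts.utf8, counts.utf16)
}
// Prints "97 1 1" and then "128037 4 2"
```

A scalar below U+0080 fits in a single code unit in both forms, while a scalar above U+FFFF needs four UTF-8 code units and two UTF-16 code units.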
Unicode Representations of Strings
Swift provides several different ways to access the Unicode representations of a string.
You can iterate over a string with a for-in statement to access its individual Character values.
Alternatively, you can access a String value in one of three other Unicode-compliant representations:
- A collection of UTF-8 code units (accessed with the string's utf8 property)
- A collection of UTF-16 code units (accessed with the string's utf16 property)
- A collection of 21-bit Unicode scalar values, equivalent to the string's UTF-32 encoding form (accessed with the string's unicodeScalars property)
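One quick way to see that these views differ is to compare their element counts for the same string. This is a minimal sketch assuming Swift 5; `sample` and the count names are illustrative only.

```swift
let sample = "Dog!🐶"

// The same text, counted in four different units.
let characterCount = sample.count               // Character values
let utf8Count = sample.utf8.count               // UTF-8 code units
let utf16Count = sample.utf16.count             // UTF-16 code units
let scalarCount = sample.unicodeScalars.count   // Unicode scalars

print(characterCount, utf8Count, utf16Count, scalarCount)
// Prints "5 8 6 5"
```

The character and scalar counts match here only because each character in this string is a single scalar; that does not hold for every string.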
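Each example below shows a different representation of the following string, which is made up of the characters D, o, g, and ! followed by the 🐶 character (DOG FACE, Unicode scalar U+1F436):
let dogString = "Dog!🐶"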
UTF-8
You can access a UTF-8 representation of a String by iterating over its utf8 property. This property is of type String.UTF8View, which is a collection of unsigned 8-bit (UInt8) values, one for each byte in the string's UTF-8 representation:
for codeUnit in dogString.utf8 {
    print("\(codeUnit) ", terminator: "")
}
print("")
// Prints "68 111 103 33 240 159 144 182 "
In the example above, the first four decimal codeUnit values (68, 111, 103, 33) represent the characters D, o, g, and !, whose UTF-8 representation is the same as their ASCII representation. The last four codeUnit values (240, 159, 144, 182) are the four-byte UTF-8 representation of the DOG FACE character.
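To see where the bytes 240 159 144 182 come from, the sketch below encodes a scalar into UTF-8 by hand, following the bit layout of the UTF-8 encoding form (one to four bytes depending on the scalar's range), and compares the result with the utf8 view. The function `utf8Bytes(of:)` is a hypothetical name used for illustration; in real code you would simply read the utf8 view.

```swift
// Hypothetical helper: hand-encodes one scalar as UTF-8 using the standard bit layout.
func utf8Bytes(of scalar: Unicode.Scalar) -> [UInt8] {
    let v = scalar.value
    switch v {
    case 0..<0x80:          // 0xxxxxxx
        return [UInt8(v)]
    case 0x80..<0x800:      // 110xxxxx 10xxxxxx
        return [UInt8(0xC0 | (v >> 6)),
                UInt8(0x80 | (v & 0x3F))]
    case 0x800..<0x10000:   // 1110xxxx 10xxxxxx 10xxxxxx
        return [UInt8(0xE0 | (v >> 12)),
                UInt8(0x80 | ((v >> 6) & 0x3F)),
                UInt8(0x80 | (v & 0x3F))]
    default:                // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return [UInt8(0xF0 | (v >> 18)),
                UInt8(0x80 | ((v >> 12) & 0x3F)),
                UInt8(0x80 | ((v >> 6) & 0x3F)),
                UInt8(0x80 | (v & 0x3F))]
    }
}

let dogFace: Unicode.Scalar = "\u{1F436}"
print(utf8Bytes(of: dogFace))                      // Prints "[240, 159, 144, 182]"
print(Array(String(Character(dogFace)).utf8))      // Prints "[240, 159, 144, 182]"
```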
UTF-16
You can access a UTF-16 representation of a String by iterating over its utf16 property. This property is of type String.UTF16View, which is a collection of unsigned 16-bit (UInt16) values, one for each 16-bit code unit in the string's UTF-16 representation:
for codeUnit in dogString.utf16 {
    print("\(codeUnit) ", terminator: "")
}
print("")
// Prints "68 111 103 33 55357 56374 "
Again, the first four codeUnit values (68, 111, 103, 33) represent the characters D, o, g, and !, and their UTF-16 code units have the same values as in the string's UTF-8 representation.
The fifth and sixth codeUnit values (55357 and 56374) are a UTF-16 surrogate pair representation of the DOG FACE character: a high-surrogate value of U+D83D (decimal 55357) and a low-surrogate value of U+DC36 (decimal 56374).
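The surrogate values follow from simple arithmetic: subtract 0x10000 from the scalar value, then put the top 10 bits of the resulting 20-bit number into a high surrogate (0xD800 + bits) and the bottom 10 bits into a low surrogate (0xDC00 + bits). The sketch below assumes Swift 5; `surrogatePair(for:)` is a hypothetical helper name.

```swift
// Hypothetical helper: splits a scalar above U+FFFF into a UTF-16 surrogate pair.
func surrogatePair(for scalar: Unicode.Scalar) -> (high: UInt16, low: UInt16)? {
    let value = scalar.value
    guard value >= 0x10000 else { return nil }  // BMP scalars need only one code unit
    let offset = value - 0x10000                 // a 20-bit number
    let high = UInt16(0xD800 + (offset >> 10))   // top 10 bits
    let low = UInt16(0xDC00 + (offset & 0x3FF))  // bottom 10 bits
    return (high, low)
}

let dogFace: Unicode.Scalar = "\u{1F436}"
if let pair = surrogatePair(for: dogFace) {
    print(pair.high, pair.low)                       // Prints "55357 56374"
    print(Array(String(Character(dogFace)).utf16))   // Prints "[55357, 56374]"
}
```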
Unicode Scalars
You can access a Unicode scalar representation of a String by iterating over its unicodeScalars property. This property is of type String.UnicodeScalarView, which is a collection of Unicode.Scalar values.
Each Unicode.Scalar has a value property that returns the scalar's 21-bit value, stored in a UInt32:
for scalar in dogString.unicodeScalars {
    print("\(scalar.value) ", terminator: "")
}
print("")
// Prints "68 111 103 33 128054 "
The value properties of the first four Unicode.Scalar values (68, 111, 103, 33) once again represent the characters D, o, g, and !. The value property of the fifth and final Unicode.Scalar, 128054, is the decimal equivalent of hexadecimal 1F436, which is the Unicode scalar U+1F436 for the DOG FACE character.
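Because scalars are conventionally written in U+XXXX hexadecimal notation, it can help to format the value property that way. A minimal sketch assuming Swift 5; `codePointLabel(_:)` is a hypothetical helper that pads to at least four uppercase hex digits.

```swift
// Hypothetical helper: formats a scalar's value as U+XXXX
// (uppercase hex, padded to at least four digits).
func codePointLabel(_ scalar: Unicode.Scalar) -> String {
    let hex = String(scalar.value, radix: 16, uppercase: true)
    let padding = String(repeating: "0", count: max(0, 4 - hex.count))
    return "U+" + padding + hex
}

let text = "Dog!🐶"
let labels = text.unicodeScalars.map(codePointLabel)
print(labels.joined(separator: " "))
// Prints "U+0044 U+006F U+0067 U+0021 U+1F436"
```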
As an alternative to reading their value properties, you can use each Unicode.Scalar value to construct a new String value, for example with string interpolation:
for scalar in dogString.unicodeScalars {
    print("\(scalar)")
}
// D
// o
// g
// !
// 🐶
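Going the other way, a String can be assembled from raw scalar values. The sketch below assumes Swift 5: it appends each value to a String.UnicodeScalarView and relies on the failable Unicode.Scalar initializer to skip invalid values such as lone surrogates. The `values` array is illustrative data, not output from any particular API.

```swift
// Illustrative scalar values (for example, decoded from a file or message).
let values: [UInt32] = [68, 111, 103, 33, 128054]

var scalars = String.UnicodeScalarView()
for value in values {
    // Unicode.Scalar(_:) is failable: it returns nil for surrogates and out-of-range values.
    if let scalar = Unicode.Scalar(value) {
        scalars.append(scalar)
    }
}

let rebuilt = String(scalars)
print(rebuilt)                                 // Prints "Dog!🐶"
print(Unicode.Scalar(UInt32(0xD83D)) == nil)   // Prints "true": a lone surrogate is not a scalar
```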