Binary string in Erlang (2)

Source: Internet
Author: User

After learning about the basic features of Erlang, I moved on to the basic syntax. This part took some effort, because Erlang is so different from the languages I already know (C, C++, Java, C#) and introduces many new ideas. Here I record one of Erlang's most distinctive language elements: the binary string.

A binary string is a sequence of unsigned 8-bit bytes used to store and process blocks of data (usually data read from a file or received over a network protocol). A bit string is a generalized binary string whose length does not have to be a whole multiple of 8 bits. For example, one and a half bytes make 12 bits in total. There is little difference between the two, except that bit strings allow some handy things that were not possible before. Because the syntax is the same, people rarely say "bit string" unless they need to emphasize the flexible length. The basic syntax of a binary string is as follows:

<<0,1, 2, ..., 255>>

That is, a comma-separated sequence of integers enclosed in <<...>>. Each integer is in the range 0 to 255. <<>> denotes the empty binary string. We can also use strings to construct a binary string, such as:

<<"hello", 32, "dude">>

This is the same as writing out the 8-bit character codes (ASCII/Latin-1) of the characters in the strings. The notation is therefore limited to 8-bit characters, but it is often useful for text-based protocols. Both examples above create binary strings whose length is a whole multiple of 8 bits. Next we look at more advanced ways of constructing bit strings.
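The basics above can be checked with a small module. This is only a sketch: the module name binary_basics and the function demo/0 are made up for illustration, and the 12-bit value uses the segment sizes that are introduced in the next section.

```erlang
-module(binary_basics).
-export([demo/0]).

%% Builds a few binary strings and returns some facts about them.
demo() ->
    Greeting = <<"hello", 32, "dude">>,   % 32 is the character code of a space
    Half = <<1:4, 2:8>>,                  % 4 + 8 = 12 bits, so not a whole number of bytes
    #{same_as_literal   => Greeting =:= <<"hello dude">>,
      greeting_bytes    => byte_size(Greeting),
      empty_bytes       => byte_size(<<>>),
      half_bits         => bit_size(Half),
      half_is_binary    => is_binary(Half),
      half_is_bitstring => is_bitstring(Half)}.
```

Every binary string is also a bit string, but the 12-bit value is a bit string only, which is why is_binary/1 returns false for it.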

Constructing bit strings

With the bit syntax, we can construct bit strings with exactly the sizes and layouts we want; conversely, it can also be used to match and extract segments from a bit string (for example, binary data read from a file or socket). A bit string is written as:

<<Segment1, ..., SegmentN>>

There can be zero or more segment specifiers between the double less-than and double greater-than signs. The length of the bit string in bits is the sum of the lengths of its segments. A segment specifier takes one of the following forms:

Data
Data:Size
Data/TypeSpecifiers
Data:Size/TypeSpecifiers

Data must be an integer, a floating-point number, or another bit string. We can specify the segment size as a whole number of units, and we can specify the segment type, which determines how the data is interpreted and how it is encoded or decoded. For example, the simple binary string <<1,2,3>> has three segments. The data in each segment is an integer, and no segment has a size or type specifier. In this case the type defaults to integer, the default size for an integer is 8, and the unit for the integer type is 1 bit, so each segment is encoded as an 8-bit unsigned byte. Similarly, <<"ABC">> is shorthand for <<$A,$B,$C>>, that is, a sequence of 8-bit integer character codes (Latin-1). If an integer needs more bits than its segment has room for, it is truncated to fit, so <<254,255,256,257>> becomes <<254,255,0,1>>.
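The defaults and the truncation rule can be seen side by side in a short sketch. The module name segment_demo and the function demo/0 are hypothetical, chosen only for this illustration.

```erlang
-module(segment_demo).
-export([demo/0]).

%% Returns {StringShorthandHolds, Truncated, Nibbles, Wide}.
demo() ->
    %% A string literal is shorthand for its 8-bit character codes.
    Abc = <<"ABC">> =:= <<$A, $B, $C>>,
    %% Integers that do not fit in 8 bits keep only their low 8 bits.
    Truncated = <<254, 255, 256, 257>>,
    %% Explicit sizes: two 4-bit segments fill exactly one byte (16#12 = 18).
    Nibbles = <<1:4, 2:4>>,
    %% A 16-bit segment stores 258 (16#0102) as two bytes, high byte first.
    Wide = <<258:16>>,
    {Abc, Truncated, Nibbles, Wide}.
```

Calling segment_demo:demo() returns {true, <<254,255,0,1>>, <<18>>, <<1,2>>}.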

The type of a segment is whatever we specify; it does not depend on the type of the data. For example, we cannot concatenate two bit strings like this:

B1 = <<1,2>>,
B2 = <<3,4>>,
<<B1,B2>>.

This fails at run time, because by default B1 and B2 are assumed to be integers. If you declare B1 and B2 as bit strings, however, it works:

<<B1/bits,B2/bits>>

This gives the expected result:

<<1,2,3,4>>
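Both behaviours can be wrapped in functions so they are easy to try. This is a minimal sketch: the module concat_demo and the functions concat/2 and try_untyped/2 are hypothetical names, and the second function assumes the failure is reported as a badarg error.

```erlang
-module(concat_demo).
-export([concat/2, try_untyped/2]).

%% Joins two bit strings by declaring both segments as bits.
concat(B1, B2) when is_bitstring(B1), is_bitstring(B2) ->
    <<B1/bits, B2/bits>>.

%% Without a type, each segment is assumed to be an integer, so passing
%% bit strings raises an error, which is caught and reported here.
try_untyped(B1, B2) ->
    try <<B1, B2>>
    catch error:badarg -> {error, badarg}
    end.
```

Because the segments are declared as bits rather than bytes, concat/2 also works for operands whose length is not a multiple of 8.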

The TypeSpecifiers part (after the /) controls the details of how a segment is encoded and decoded. It consists of one or more atoms separated by hyphens (-), such as integer-unsigned-big. The order of the atoms does not matter. The currently available specifiers are:

  • integer, float, binary, bytes, bitstring, bits, utf8, utf16, utf32
  • signed, unsigned
  • big, little, native

These specifiers can be combined in various ways, but at most one from each group may appear. bits is an alias for bitstring, and bytes is an alias for binary. For the integer, float, and bitstring types the size unit is 1 bit, while for binary it is 8 bits (one byte).
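A few of these specifiers in action, as a sketch rather than a complete tour. The module name typespec_demo and the map keys are invented for this example.

```erlang
-module(typespec_demo).
-export([demo/0]).

%% Returns a map of small encodings and decodings, one per specifier shown.
demo() ->
    %% Signedness matters when decoding: the same byte can be read two ways.
    <<Signed:8/signed>> = <<255>>,
    <<Unsigned:8/unsigned>> = <<255>>,
    %% For binary segments the size is counted in bytes, not bits.
    <<Head:2/binary, Rest/binary>> = <<1, 2, 3, 4>>,
    #{big      => <<1:16/big>>,          % most significant byte first
      little   => <<1:16/little>>,       % least significant byte first
      negative => <<(-1):8/signed>>,     % two's complement encoding of -1
      signed   => Signed,
      unsigned => Unsigned,
      float    => <<1.0:32/float>>,      % IEEE 754 single precision
      utf8     => <<16#E9/utf8>>,        % code point U+00E9 as two bytes
      head     => Head,
      rest     => Rest}.
```

The map returned by typespec_demo:demo() shows, for instance, that big and little differ only in byte order (<<0,1>> versus <<1,0>>) and that the byte 255 decodes as -1 when read as signed.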

Pattern matching with the bit syntax

We can also use the bit syntax to take data apart at the bit level. Compared with shifting and masking by hand, the bit syntax makes it much more convenient to parse all kinds of odd file formats and protocol data. The following shows how a pattern in a function clause head can parse the contents of an IPv4 packet header:

ipv4(<<Version:4, IHL:4, TOS:8, TotalLength:16,
       Identification:16, Flags:3, FragOffset:13,
       TimeToLive:8, Protocol:8, Checksum:16,
       SourceAddress:32, DestinationAddress:32,
       OptionAndPadding:((IHL-5)*32)/bits,
       RemainingData/bytes >>) when Version =:= 4 ->
    ...

As long as the incoming packet is large enough to match and the Version field is 4, the packet is decoded into the corresponding variables. Most of them are decoded as integers; the exceptions are OptionAndPadding (a bit string whose length depends on the previously decoded IHL field) and RemainingData, which holds all the data that follows the header. Extracting one binary string from another in this way does not copy the data, so the operation is cheap.
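To try the pattern end to end, here is a sketch that wraps it in a module and feeds it a hand-built packet. The module name ipv4_demo, the functions parse/1 and sample/0, the map keys, and the sample field values are all assumptions made for illustration; the checksum in the sample is simply left as 0 and is not verified. The computed segment size ((IHL-5)*32) is assumed to need a reasonably recent Erlang/OTP release (OTP 23 or later, as far as I know).

```erlang
-module(ipv4_demo).
-export([parse/1, sample/0]).

%% Decodes an IPv4 packet into a map; anything that does not match
%% the header layout (or is not version 4) yields the atom error.
parse(<<Version:4, IHL:4, _TOS:8, TotalLength:16,
        _Identification:16, _Flags:3, _FragOffset:13,
        TimeToLive:8, Protocol:8, _Checksum:16,
        SourceAddress:32, DestinationAddress:32,
        OptionAndPadding:((IHL-5)*32)/bits,
        RemainingData/bytes>>) when Version =:= 4 ->
    #{ihl          => IHL,
      total_length => TotalLength,
      ttl          => TimeToLive,
      protocol     => Protocol,
      source       => SourceAddress,
      destination  => DestinationAddress,
      options      => OptionAndPadding,
      data         => RemainingData};
parse(_Other) ->
    error.

%% A made-up 24-byte packet: a 20-byte header without options plus 4 bytes of payload.
sample() ->
    <<4:4, 5:4, 0:8, 24:16,      % version 4, IHL 5, TOS 0, total length 24
      1:16, 0:3, 0:13,           % identification, flags, fragment offset
      64:8, 17:8, 0:16,          % TTL 64, protocol 17, checksum left as 0
      192, 168, 0, 1,            % source address 192.168.0.1
      10, 0, 0, 1,               % destination address 10.0.0.1
      "data">>.                  % payload
```

With IHL equal to 5 the options segment has size 0, so it matches the empty bit string <<>>, and the payload comes back as <<"data">>.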

Bit string comprehensions

List comprehensions, a concept found in many functional programming languages, have been extended in Erlang to work with the bit syntax. A bit string comprehension looks much like a list comprehension, but is enclosed in <<...>> instead of [...]. Take a list of small integers as an example, all between 0 and 7. You can pack them into a bit string using three bits per number, as shown below:

<< <<X:3>> ||  X <- [1,2,3,4,5,6,7] >>

The shell prints the resulting bit string as <<41,203,23:5>>. Note that its total length is 8 + 8 + 5 = 21 bits, which is what we expect given that the input list contains 7 elements of 3 bits each. We can also decode such a bit string with a bit string comprehension. For decoding, however, the generator must use <= instead of <-: <= extracts segments from a bit string, whereas <- only picks elements from a list:

<< <<X:8>> || <<X:3>> <= <<41,203,23:5>> >>

The resulting binary string is <<1,2,3,4,5,6,7>>, so we have successfully converted the three-bit integer format to an eight-bit integer format. But what if we want the result as a list rather than a bit string? Use a bit string generator inside a list comprehension:

[X ||  <<X:3>>  <=  <<41,203,23:5>>]

The generated list is [1, 2, 3, 4, 5, 6, 7].
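The three expressions above can be collected into one module to check the round trip. The module name pack_demo and the function names pack/1, widen/1, and unpack/1 are hypothetical.

```erlang
-module(pack_demo).
-export([pack/1, widen/1, unpack/1]).

%% Packs integers in the range 0..7 into three bits each.
pack(Ints) ->
    << <<X:3>> || X <- Ints >>.

%% Re-encodes each 3-bit value as a full byte.
widen(Packed) ->
    << <<X:8>> || <<X:3>> <= Packed >>.

%% Decodes a packed bit string back into a list of integers.
unpack(Packed) ->
    [X || <<X:3>> <= Packed].
```

For any list of integers between 0 and 7, pack_demo:unpack(pack_demo:pack(List)) returns the original list.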
