Both arithmetic coding and run-length encoding are lossless compression methods.
Arithmetic Coding
Arithmetic coding is a lossless data compression method and a form of entropy coding. Other entropy coding methods usually split the input message into symbols and encode each symbol separately. Arithmetic coding instead encodes the entire input message as a single number: a fraction n that satisfies 0.0 ≤ n < 1.0.
Arithmetic coding uses two basic parameters: the probability of each symbol and its encoding interval. The probabilities of the source symbols determine the compression efficiency, and they also determine the interval assigned to each source symbol during coding. These intervals all lie between 0 and 1.
The idea of the arithmetic coding algorithm is as follows:
(1) Sort the set of source symbols in descending order of probability, and set [0, 1) as the current analysis interval. Divide the current analysis interval into sub-intervals proportional to the symbol probabilities, in that sorted order.
(2) Read the input message sequence and take the current message symbol (on the first pass, this is the first symbol of the message). Find the sub-interval that belongs to the current symbol within the current analysis interval, and make this sub-interval the new current analysis interval. Take the starting point (left endpoint) of the current analysis interval as the encoding output number. Then advance the message pointer to the next symbol.
(3) Divide the new current analysis interval into proportional sub-intervals again, using the same probability order of the source symbols. Repeat step (2) until the whole input message sequence has been read.
(4) The final encoding output number is the encoded data.
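The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production encoder: it assumes a small hypothetical probability table (`probs`), uses exact fractions from the standard library to sidestep precision issues, and returns the final interval rather than a bit string. The function names `build_intervals` and `arithmetic_encode` are made up for this example.

```python
from fractions import Fraction

def build_intervals(probs):
    """Give each symbol its [low, high) share of [0, 1), most probable first."""
    intervals = {}
    low = Fraction(0)
    # Step (1): sort by descending probability (ties keep their original order).
    for sym, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        intervals[sym] = (low, low + p)
        low += p
    return intervals

def arithmetic_encode(message, probs):
    """Return the final [low, high) interval that identifies `message`."""
    intervals = build_intervals(probs)
    low, high = Fraction(0), Fraction(1)   # current analysis interval
    for sym in message:                    # steps (2) and (3)
        width = high - low
        sym_low, sym_high = intervals[sym]
        high = low + width * sym_high      # uses the old `low`
        low = low + width * sym_low
    return low, high

# Hypothetical source model, chosen only for illustration.
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
low, high = arithmetic_encode("abc", probs)
print(low, high)  # 11/32 3/8
```

Any number in the returned interval, for example its left endpoint 11/32, identifies the message "abc" under this model, provided the decoder also knows the message length or an end-of-message symbol.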
Pay attention to the following issues in arithmetic coding:
(1) Because a real computer cannot work with unlimited precision, overflow is a major problem in the computation. Most machines provide 16-bit, 32-bit, or 64-bit arithmetic, so the interval can be rescaled proportionally during coding to stay within that precision.
(2) The arithmetic encoder generates only one codeword for the entire message. This codeword is a real number in the interval [0, 1), so the decoder cannot start decoding until it has received all the bits that represent this number.
(3) Arithmetic coding is very sensitive to errors. If a single error occurs, the entire message may be decoded incorrectly.
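The error sensitivity in point (3) can be seen with a small decoding sketch. It assumes a fixed, hypothetical interval table (`INTERVALS`) and a message length known to the decoder; the function name `decode` is an illustration, not part of any library.

```python
from fractions import Fraction

# Hypothetical fixed model: symbol -> [low, high) share of [0, 1).
INTERVALS = {
    "a": (Fraction(0), Fraction(1, 2)),
    "b": (Fraction(1, 2), Fraction(3, 4)),
    "c": (Fraction(3, 4), Fraction(1)),
}

def decode(code, length):
    """Recover `length` symbols from a code value in [0, 1)."""
    out = []
    for _ in range(length):
        for sym, (lo, hi) in INTERVALS.items():
            if lo <= code < hi:
                out.append(sym)
                # Rescale so the chosen sub-interval becomes [0, 1) again.
                code = (code - lo) / (hi - lo)
                break
    return "".join(out)

code = Fraction(11, 32)                   # a valid code for "abc" in this model
print(decode(code, 3))                    # abc
print(decode(code + Fraction(1, 32), 3))  # aca
```

Changing the code value by only 1/32 alters two of the three decoded symbols, because every later symbol is derived from the same number.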
Arithmetic coding can be static or adaptive. In static arithmetic coding, the probabilities of the source symbols are fixed. In adaptive (dynamic) arithmetic coding, the probabilities are updated during encoding according to how often each symbol has appeared so far. The process of estimating the source symbol probabilities during encoding is called modeling. Adaptive arithmetic coding was developed because it is difficult, and often impractical, to know the exact source probabilities beforehand. With inaccurate probabilities, an arithmetic encoder cannot be expected to reach its maximum efficiency when compressing a message, so the most effective approach is to estimate the probabilities during coding. Dynamic modeling therefore becomes the key factor in the compression efficiency of the encoder.
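A rough sketch of the adaptive idea follows. It is one simple modeling choice among many, assumed here for illustration: every symbol starts with a count of 1, and the count of a symbol is incremented right after that symbol is encoded. A decoder would have to apply the same updates in the same order. The name `adaptive_encode` is hypothetical.

```python
from fractions import Fraction

def adaptive_encode(message, alphabet):
    """Return the final [low, high) interval using counts updated on the fly."""
    counts = {s: 1 for s in alphabet}      # assumption: all counts start at 1
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        total = sum(counts.values())
        # Cumulative count of the symbols that precede `sym` in the fixed order.
        cum = 0
        for s in alphabet:
            if s == sym:
                break
            cum += counts[s]
        width = high - low
        high = low + width * Fraction(cum + counts[sym], total)
        low = low + width * Fraction(cum, total)
        counts[sym] += 1                   # the model learns from this symbol
    return low, high

low, high = adaptive_encode("aaaa", "ab")
print(low, high)  # 0 1/5
```

For the message "aaaa" the final interval has width 1/5, whereas a static model with fixed probability 1/2 per symbol would end with width 1/16. A wider final interval can be identified with fewer bits, which is the gain the adaptive model is after.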
Run-Length Encoding
Run-length encoding (RLE) is a statistical encoding and a lossless compression method. It is effective for binary images.
The basic principle of run-length encoding is to replace a sequence of consecutive symbols that share the same value (such a sequence is called a "run") with one symbol value and the length of the run, so that the encoded data is shorter than the original data.
Example: 5555557777733322221111111
Run-length code: (5, 6) (7, 5) (3, 3) (2, 4) (1, 7). Each pair gives a symbol value followed by its run length. The run-length code uses far fewer symbols than the original string: 5 pairs instead of 25 digits.
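The example can be reproduced with a short Python sketch. The function name `rle_encode` is an illustrative choice; the grouping itself relies on `itertools.groupby` from the standard library, which groups consecutive equal items.

```python
from itertools import groupby

def rle_encode(data):
    """Collapse each run of equal symbols into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(data)]

print(rle_encode("5555557777733322221111111"))
# [('5', 6), ('7', 5), ('3', 3), ('2', 4), ('1', 7)]
```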
When encoding image data, pixels with the same gray value arranged along a given direction can be treated as a run of consecutive symbols. Replacing each run with a value and a length can greatly reduce the data volume.
There are two types of run-length encoding: fixed-length run-length encoding and variable-length run-length encoding.
Run-length encoding is a sequential, exact encoding. If one symbol is corrupted during transmission, the rest of the encoded sequence is affected, and the run-length code can no longer be restored to the original data.
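A small decoding sketch illustrates this sensitivity. It assumes the code is stored as a list of (value, run_length) pairs, and the name `rle_decode` is hypothetical. Corrupting a single run length changes the decoded length and shifts every later run.

```python
def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original string."""
    return "".join(value * count for value, count in pairs)

pairs = [("5", 6), ("7", 5), ("3", 3), ("2", 4), ("1", 7)]
original = rle_decode(pairs)

# Simulate a transmission error: the second run length arrives as 4, not 5.
corrupted = rle_decode([("5", 6), ("7", 4)] + pairs[2:])

print(original)                        # 5555557777733322221111111
print(len(original), len(corrupted))   # 25 24
print(original == corrupted)           # False
```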
The compression ratio achieved by run-length encoding depends on the characteristics of the image. The larger the blocks of identical color, and therefore the fewer such blocks there are, the higher the compression ratio. Conversely, an image with many small blocks yields a lower compression ratio.
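The dependence on image content can be illustrated with a rough estimate. The sketch below assumes, purely for simplicity, that each stored value and each stored run length costs the same as one original pixel; the helper name `rle_ratio` and the two sample rows are made up for this example.

```python
from itertools import groupby

def rle_ratio(pixels):
    """Original pixel count divided by the number of stored values and lengths."""
    runs = [(value, len(list(group))) for value, group in groupby(pixels)]
    return len(pixels) / (2 * len(runs))

flat_row = [0] * 32 + [255] * 32   # two large uniform blocks
noisy_row = [0, 255] * 32          # the value changes at every pixel

print(rle_ratio(flat_row))   # 16.0
print(rle_ratio(noisy_row))  # 0.5
```

Under this cost assumption, the row with two large blocks shrinks by a factor of 16, while the row that alternates at every pixel doubles in size.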