Original problem
In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}; both compress the string into 14 bits. Another set of codes can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct, since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.
Input specification
Each input file contains one test case. For each case, the first line gives an integer N (2 ≤ N ≤ 63), followed by a line that contains all the N distinct characters and their frequencies in the following format:
c[1] f[1] c[2] f[2] ... c[N] f[N]
where c[i] is a character chosen from {'0'-'9', 'a'-'z', 'A'-'Z', '_'}, and f[i] is the frequency of c[i] and is an integer no more than 1000.
The next line gives a positive integer M, followed by M student submissions. Each student submission consists of N lines, each in the format:
c[i] code[i]
where c[i] is the i-th character and code[i] is a non-empty string of '0's and '1's.
Output specification
For each test case, print in each line either "Yes" if the student's submission is correct, or "No" if not.
Note: the optimal solution is not necessarily generated by the Huffman algorithm. Any prefix code whose total code length is optimal is considered correct.
Sample Input
7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11
Sample Output
Yes
Yes
No
No
Problem summary
Given the character frequencies, decide whether each submitted set of codes is as good as a Huffman code.
There are two key points: the codes must not be ambiguous (no code may be a prefix of another), and the total encoded length must be the shortest possible.
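The second key point needs the total length of a submission, which is the sum of f[i] times the length of code[i]. A minimal sketch of that sum follows; the function name encodedLength and the two parallel vectors are illustrative choices, not part of the original solution.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Total number of bits a submission spends on the text:
// the sum over all characters of frequency * code length.
// Assumption: freq[i] and codes[i] describe the same character.
long long encodedLength(const std::vector<int>& freq,
                        const std::vector<std::string>& codes) {
    long long total = 0;
    for (std::size_t i = 0; i < freq.size() && i < codes.size(); ++i) {
        total += static_cast<long long>(freq[i]) *
                 static_cast<long long>(codes[i].size());
    }
    return total;
}
```

For the example in the problem statement, the frequencies 4, 2, 1, 1 with the codes 0, 10, 110, 111 give 4*1 + 2*2 + 1*3 + 1*3 = 14 bits.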
Computing the Huffman encoded length from the frequencies
The frequencies alone determine a unique total Huffman encoded length, and this length is also the optimal one.
Following the Huffman algorithm, repeatedly take out the two smallest weights and merge them; the sum of all the merged weights is this shortest length.
You can use insertion sort here, or speed it up with a priority queue.
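The merging step can be sketched with a min-heap as below. This is only one possible way to write it; the function name optimalLength is made up for this illustration, and the input is assumed to hold at least one frequency.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Optimal total encoded length for the given frequencies.
// Every merge of the two smallest weights adds their sum to the total,
// because each merge pushes all symbols below it one level deeper.
long long optimalLength(const std::vector<int>& freq) {
    std::priority_queue<long long, std::vector<long long>,
                        std::greater<long long>> heap;  // min-heap
    for (int f : freq) {
        heap.push(f);
    }
    long long total = 0;
    while (heap.size() > 1) {
        long long a = heap.top(); heap.pop();
        long long b = heap.top(); heap.pop();
        total += a + b;     // cost contributed by this merge
        heap.push(a + b);   // the merged node goes back into the heap
    }
    return total;
}
```

With the sample frequencies 1 1 1 3 3 6 6 the merges cost 2, 3, 6, 9, 12 and 21, so the optimal length is 53. A submission whose own total differs from this value can be rejected without looking at its prefixes.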
Checking for coding ambiguity
Build a trie (dictionary tree) from the submitted codes. Each node of the trie stores:
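bool isVisited;   // whether the node has been visited
bool isMarked;    // whether the node is marked as the end of a code
Trie *next[2];    // pointers to the next-level nodes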
When inserting a code string, set the following flags along the way:
Whenever a node is passed through, set isVisited = true;
Whenever the end of a code is reached, set isMarked = true;
That is:
- A node that already has isVisited set cannot be the end of a new code, otherwise the new code is a prefix of an existing code.
- When a node with isMarked set is reached, stop, because an existing code is then a prefix of the new code.
This ensures that every code ends at a leaf node.
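The two rules above can be sketched as follows. This is an illustration under assumptions rather than the linked implementation: the helper names insertCode and isPrefixFree are invented here, and the isVisited check is made against the flag's value from before the current insertion, since the insertion itself also sets it.

```cpp
#include <string>
#include <vector>

struct Trie {
    bool isVisited = false;              // some code has passed through here
    bool isMarked = false;               // some code ends here
    Trie* next[2] = {nullptr, nullptr};  // children for '0' and '1'
    ~Trie() { delete next[0]; delete next[1]; }
};

// Inserts one code. Returns false if it conflicts with an earlier code.
bool insertCode(Trie* root, const std::string& code) {
    if (code.empty()) return false;      // codes are assumed non-empty
    Trie* node = root;
    bool endWasVisited = false;
    for (char bit : code) {
        if (bit != '0' && bit != '1') return false;
        int b = bit - '0';
        if (!node->next[b]) node->next[b] = new Trie();
        node = node->next[b];
        if (node->isMarked) return false;  // an earlier code is a prefix of this one
        endWasVisited = node->isVisited;   // flag as it was before this insertion
        node->isVisited = true;
    }
    if (endWasVisited) return false;       // this code is a prefix of an earlier one
    node->isMarked = true;
    return true;
}

// True if no code in the list is a prefix of (or equal to) another.
bool isPrefixFree(const std::vector<std::string>& codes) {
    Trie root;
    for (const std::string& code : codes) {
        if (!insertCode(&root, code)) return false;
    }
    return true;
}
```

On the incorrect example from the problem statement, {0, 01, 011, 001}, the check fails as soon as 01 is inserted, because it walks through the node already marked by 0.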
Code implementation
PTA Huffman Codes with Trie Tree
PTA Huffman Codes