Reference: "Introduction to Data Compression" (4th edition), page 30.
3. Given the alphabet $A = \{a_1, a_2, a_3, a_4\}$, find the first-order entropy in the following cases:
(a) $P(a_1) = P(a_2) = P(a_3) = P(a_4) = 1/4$
(b) $P(a_1) = 1/2$, $P(a_2) = 1/4$, $P(a_3) = P(a_4) = 1/8$
(c) $P(a_1) = 0.505$, $P(a_2) = 1/4$, $P(a_3) = 1/4$, $P(a_4) = 0.12$
Solution:
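The first-order entropy of a source with $M$ symbols is $H = -\sum_{i=1}^{M} p_i \log_2 p_i$, so:
(a) $H = 4 \cdot \tfrac{1}{4} \cdot \left(-\log_2 \tfrac{1}{4}\right) = 2$ bits.
(b) $H = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{4}\log_2\tfrac{1}{4} - 2 \cdot \tfrac{1}{8}\log_2\tfrac{1}{8} = 1.75$ bits.
(c) $H = -0.505\log_2 0.505 - 2 \cdot \tfrac{1}{4}\log_2\tfrac{1}{4} - 0.12\log_2 0.12 \approx 0.498 + 1 + 0.367 \approx 1.86$ bits.
Note that the four probabilities listed in (c) sum to 1.125 rather than 1, so they do not form a valid distribution. If the third probability is taken as 1/8 instead of 1/4, which makes the sum exactly 1, the result becomes $H \approx 0.498 + 0.5 + 0.375 + 0.367 \approx 1.74$ bits.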
5. Consider the following sequence:
ATGCTTAACGTGCTTAACCTGAAGCTTCCGCTGAAGAACCTG
CTGAACCCGCTTAAGCTGAACCTTCTGAAGCTTAACCTGCTT
(a) Estimate the symbol probabilities from this sequence and compute the first-, second-, third-, and fourth-order entropy of the sequence.
Solution:
The sequence contains 84 letters: A appears 21 times, T 23 times, G 16 times, and C 24 times.
The estimated probability of each letter is therefore:
$P(A) = 21/84 = 1/4$, $P(C) = 24/84 = 2/7$, $P(G) = 16/84 = 4/21$, $P(T) = 23/84$.
The first-order entropy is then:
$H = -\tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{2}{7}\log_2\tfrac{2}{7} - \tfrac{4}{21}\log_2\tfrac{4}{21} - \tfrac{23}{84}\log_2\tfrac{23}{84} \approx 0.500 + 0.516 + 0.456 + 0.512 \approx 1.98$ bits.
7. (a) Write a program that randomly selects letters from the 26-letter alphabet {a, b, ..., z} to form 100 four-letter words. How many of these words are meaningful?
Answer:
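#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    int a, i, j;
    char m[100][5];                 /* 100 words: 4 letters plus the terminating '\0' */

    srand((unsigned)time(NULL));    /* seed the generator so each run gives different words */
    for (i = 0; i < 100; i++)
    {
        for (j = 0; j < 4; j++)
        {
            a = rand() % 26;        /* random index in 0..25 */
            m[i][j] = a + 'a';      /* map the index to a letter 'a'..'z' */
        }
        m[i][4] = '\0';             /* terminate the string */
        printf("%d:%s\t", i + 1, m[i]);
    }
    return 0;
}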
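The program only prints the words; deciding which ones are meaningful is left to the reader. One way to automate that step is to look each word up in a list of known English words. The sketch below assumes such a list is already available in memory; the function name `is_word` and the parameters `dict` and `dict_n` are hypothetical and not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if w appears in the word list dict (dict_n entries), else 0.
   A plain linear search is used, which is adequate for 100 lookups. */
int is_word(const char *w, const char *const *dict, size_t dict_n)
{
    assert(w != NULL && dict != NULL);
    for (size_t k = 0; k < dict_n; k++) {
        if (strcmp(w, dict[k]) == 0)
            return 1;
    }
    return 0;
}
```

Inside the generation loop, a counter could then be incremented whenever `is_word(m[i], dict, dict_n)` returns 1. The count depends entirely on which word list is used, so no particular figure is claimed here.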