As shown in Table 1 and Table 2 below, the vocal test set contains TV shows, stories, storytelling, crosstalk, audio radio, etc. The music test set contains Chinese dance, R&B, Western ballads, Chinese rock, rap, 1614 mixed album, and so on.
On the Table 1 test set, the average recognition rate is 89.31% for vocals and 95.95% for music. The results for crosstalk and story are poorer: the crosstalk recordings in the test set were made in poor conditions, with a noisy environment and a confused spectrum, while the story recordings have a prominent musical background accompaniment.
Table 1. Automatic classification results for music and vocals
| Category | Correct number | Total number | Accuracy rate |
|---|---|---|---|
| Television | 59 | 66 | 0.893939 |
| Story | 69 | 88 | 0.784091 |
| Storytelling | 181 | 186 | 0.973118 |
| Crosstalk | 73 | 96 | 0.760417 |
| Audio Radio | 40 | 43 | 0.930233 |
| 1800 Sound | 96 | 101 | 0.950495 |
| Total | 518 | 580 | 0.893103 |
| | | | |
| 1800 Music | 1614 | 1690 | 0.95503 |
| Chinese dance | 54 | 54 | 1 |
| R&B | 44 | 44 | 1 |
| Western Ballads | 80 | 80 | 1 |
| Chinese Rock | 50 | 52 | 0.961538 |
| Rap | 104 | 108 | 0.962963 |
| Total | 1946 | 2028 | 0.959566 |
| Reasonable threshold value | 0.5 | | |
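
Table 1 lists a reasonable threshold value of 0.5 but does not spell out how it is applied. As a minimal sketch, assuming the classifier emits a music score in [0, 1] and that a score at or above the threshold is labeled music (both are assumptions, as are the names `THRESHOLD` and `classify`), the decision could look like this:

```python
# Hypothetical decision rule; the source only reports the threshold value itself.
THRESHOLD = 0.5  # "reasonable threshold value" from the table


def classify(music_score: float, threshold: float = THRESHOLD) -> str:
    """Label a clip as 'music' or 'vocal' from an assumed music score in [0, 1]."""
    if not 0.0 <= music_score <= 1.0:
        raise ValueError("music_score must lie in [0, 1]")
    # Assumption: ties at the threshold are assigned to the music class.
    return "music" if music_score >= threshold else "vocal"


if __name__ == "__main__":
    for score in (0.2, 0.5, 0.8):
        print(score, classify(score))
```

Raising the threshold in such a scheme would label fewer clips as music and more as vocals, so the two accuracy columns would be expected to trade off against each other.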
Table 2. Automatic classification results with the 2-layer filter
| Category | Correct number | Total number | Accuracy rate |
|---|---|---|---|
| Television | 62 | 66 | 0.939394 |
| Story | 72 | 88 | 0.818182 |
| Storytelling | 182 | 186 | 0.978495 |
| Crosstalk | 77 | 96 | 0.802083 |
| Audio Radio | 41 | 43 | 0.953488 |
| 1800 Sound | 96 | 101 | 0.950495 |
| Total | 530 | 580 | 0.913793 |
| | | | |
| 1800 Music | 1592 | 1688 | 0.943128 |
| Chinese dance | 54 | 54 | 1 |
| R&B | 43 | 44 | 0.977273 |
| Western Ballads | 80 | 80 | 1 |
| Chinese Rock | 49 | 52 | 0.942308 |
| Rap | 104 | 108 | 0.962963 |
| Total | 1922 | 2026 | 0.948667 |
| Reasonable threshold value | 0.5 | | |
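
The accuracy rates in Table 2 are consistent with dividing the correct count by the number of test items in each category, and the Total row is consistent with pooling the counts across categories rather than averaging the per-category rates. The sketch below reproduces that arithmetic for the vocal rows; the names `VOCAL_COUNTS`, `accuracy`, and `pooled_accuracy` are illustrative and do not come from the source.

```python
# (correct, total) per vocal category, copied from the vocal rows of Table 2.
VOCAL_COUNTS = {
    "Television": (62, 66),
    "Story": (72, 88),
    "Storytelling": (182, 186),
    "Crosstalk": (77, 96),
    "Audio Radio": (41, 43),
    "1800 Sound": (96, 101),
}


def accuracy(correct: int, total: int) -> float:
    """Fraction of test items classified correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def pooled_accuracy(counts: dict[str, tuple[int, int]]) -> float:
    """Accuracy over all categories combined (sum of correct / sum of totals)."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return accuracy(correct, total)


if __name__ == "__main__":
    for name, (correct, total) in VOCAL_COUNTS.items():
        print(f"{name}: {accuracy(correct, total):.6f}")
    print(f"Total: {pooled_accuracy(VOCAL_COUNTS):.6f}")
```

With these counts the pooled rate is 530/580, about 0.913793, which matches the Total row for vocals.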
Summary of automatic discrimination of music and vocals