Analysis of WAVE File Header Formats for Speech Encoding

Source: Internet
Author: User
As one of the audio file formats used in multimedia, the WAVE file format is built on RIFF, the Resource Interchange File Format. The first four bytes of every WAVE file are "RIFF". Using the WAVE file header appropriately can make speech decoding more efficient.
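The signature check described above can be sketched in a few lines of Python. The function name `is_riff_wave` is an illustrative choice, not something defined by the format, and the sketch assumes the header bytes have already been read into memory.

```python
def is_riff_wave(header: bytes) -> bool:
    """Return True if the buffer starts with "RIFF" and carries the "WAVE" form type.

    Bytes 0-3 hold the RIFF signature, bytes 4-7 the length field,
    and bytes 8-11 the form type.
    """
    return (
        len(header) >= 12
        and header[0:4] == b"RIFF"
        and header[8:12] == b"WAVE"
    )
```

The comparison is case-sensitive, so a buffer starting with lowercase "riff" is rejected.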

Generally speaking, speech encoding compresses a linear PCM speech signal, sampled at 8 kHz and quantized to 16 bits, into a speech signal in another format. Decoding converts the signal in that other format back into 8 kHz, 16-bit linear PCM. This conversion is usually complex and time-consuming. If the matching WAVE file header is instead prepended directly to the encoded speech data, the conversion is not needed: the speech can be decoded with the Sound Recorder application provided by Microsoft.

The following tables (Table 1 through Table 7) give the WAVE file header format for each speech encoding so that the formats can be compared.

Table 1. WAVE file header format for 8 kHz, 16-bit linear PCM speech (44 bytes in total)

Offset      Bytes  Data type  Content                             Header definition
00h         4      char       "RIFF"                              char riff_id[4] = "RIFF"
04h         4      long int   Total file length - 8               long int size0 = total length - 8
08h         8      char       "WAVEfmt "                          char wave_fmt[8]
10h         4      long int   10 00 00 00h (PCM)                  long int size1 = 0x10
14h         2      int        01 00h                              int fmttag = 0x01
16h         2      int        Number of channels                  int channel = 1 or 2
18h         4      long int   Sampling rate                       long int samplespersec
1Ch         4      long int   Bytes played per second             long int bytepersec
20h         2      int        Bytes per sample                    int blockalign = channels * quantization bits / 8
22h         2      int        Quantization bits                   int bitpersamples = 8 or 16
24h         4      char       "data"                              char data_id = "data"
28h         4      long int   Number of sample data bytes         long int size2 = file length - 44
2Ch to end  -      char       Sample data
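As a minimal sketch of how the 44-byte layout in Table 1 could be produced, the following Python function packs the fields in order with the standard `struct` module. The function name `pcm_wave_header` and its parameter names are illustrative assumptions. The field order, sizes, and values follow the table, and the fields are written little-endian, matching the byte order of the values shown (for example 10 00 00 00h for 0x10).

```python
import struct


def pcm_wave_header(data_bytes: int, sample_rate: int = 8000,
                    channels: int = 1, bits_per_sample: int = 16) -> bytes:
    """Build a 44-byte linear PCM WAVE header for `data_bytes` of sample data."""
    block_align = channels * bits_per_sample // 8   # bytes per sample (offset 20h)
    byte_per_sec = sample_rate * block_align        # bytes played per second (1Ch)
    return struct.pack(
        "<4sI8sIHHIIHH4sI",
        b"RIFF", 44 + data_bytes - 8,   # size0 = total file length - 8
        b"WAVEfmt ", 0x10,              # size1 = 0x10 for PCM
        0x01, channels,                 # fmttag = 0x01, channel count
        sample_rate, byte_per_sec,
        block_align, bits_per_sample,
        b"data", data_bytes,            # size2 = file length - 44
    )
```

Writing `pcm_wave_header(len(pcm)) + pcm` to a file, where `pcm` holds the raw sample bytes, would then give a complete WAVE file under these assumptions.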

Table 2. WAVE file header format for 8 kHz, 8-bit A-law PCM speech (58 bytes in total)

Offset      Bytes  Data type  Content                             Header definition
00h         4      char       "RIFF"                              char riff_id[4] = "RIFF"
04h         4      long int   Total file length - 8               long int size0 = total length - 8
08h         8      char       "WAVEfmt "                          char wave_fmt[8]
10h         4      long int   12 00 00 00h (A-law)                long int size1 = 0x12
14h         2      int        06 00h                              int fmttag = 0x06
16h         2      int        Number of channels                  int channel = 1 or 2
18h         4      long int   Sampling rate                       long int samplespersec
1Ch         4      long int   Bytes played per second             long int bytepersec
20h         2      int        Bytes per sample                    int blockalign = 0x01
22h         4      long int   Quantization bits                   long int bitpersamples = 8
26h         4      char       "fact"                              char wave_fact = "fact"
2Ah         8      char       04 00 00 00 00 53 07 00h (fixed)    char temp
32h         4      char       "data"                              char wave_data = "data"
36h         4      long int   Number of sample data bytes         long int size2 = file length - 58
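The 58-byte layout can be sketched the same way. The function below is a hypothetical helper (`g711_wave_header` is not a name from the source) that follows the table field by field. Two points are assumptions beyond the table. First, the 4-byte field at 22h is written as a 16-bit quantization value followed by a 16-bit zero, which gives the same bytes as the little-endian `long int` value 8. Second, the last four bytes of the 8-byte field at 2Ah are filled with the sample count, which is what a "fact" chunk conventionally stores. Read as a little-endian integer, the table's bytes 00 53 07 00h are 480000, which would correspond to 60 seconds at 8 kHz.

```python
import struct


def g711_wave_header(data_bytes: int, fmt_tag: int = 0x06,
                     sample_rate: int = 8000, channels: int = 1) -> bytes:
    """Build a 58-byte WAVE header for 8-bit companded PCM.

    fmt_tag 0x06 selects A-law; 0x07 selects the mu-law layout of Table 3.
    """
    block_align = channels                  # one 8-bit code per channel
    byte_per_sec = sample_rate * block_align
    num_samples = data_bytes // block_align  # assumption: fact chunk holds the sample count
    return struct.pack(
        "<4sI8sIHHIIHHH4sII4sI",
        b"RIFF", 58 + data_bytes - 8,   # size0 = total file length - 8
        b"WAVEfmt ", 0x12,              # size1 = 0x12
        fmt_tag, channels,
        sample_rate, byte_per_sec,
        block_align, 8, 0,              # 22h: quantization bits = 8, then 2 zero bytes
        b"fact", 4, num_samples,        # 26h and 2Ah
        b"data", data_bytes,            # size2 = file length - 58
    )
```

Calling `g711_wave_header(480000)` reproduces the bytes listed at offset 2Ah in the table, which is consistent with the sample-count reading but does not prove it.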

Table 3. WAVE file header format for 8 kHz, 8-bit μ-law PCM speech (58 bytes in total)

Offset      Bytes  Data type  Content                             Header definition
00h         4      char       "RIFF"                              char riff_id[4] = "RIFF"
04h         4      long int   Total file length - 8               long int size0 = total length - 8
08h         8      char       "WAVEfmt "                          char wave_fmt[8]
10h         4      long int   12 00 00 00h (μ-law)                long int size1 = 0x12
14h         2      int        07 00h                              int fmttag = 0x07
16h         2      int        Number of channels                  int channel = 1 or 2
18h         4      long int   Sampling rate                       long int samplespersec
1Ch         4      long int   Bytes played per second             long int bytepersec
20h         2      int        Bytes per sample                    int blockalign = 0x01
22h         4      long int   Quantization bits                   long int bitpersamples = 8
26h         4      char       "fact"                              char wave_fact = "fact"
2Ah         8      char       04 00 00 00 00 53 07 00h (fixed)    char temp
32h         4      char       "data"                              char wave_data = "data"
36h         4      long int   Number of sample data bytes         long int size2 = file length - 58

Table 4. WAVE file header format after ADPCM speech encoding (90 bytes in total)

Offset      Bytes  Data type  Content                             Header definition
00h         4      char       "RIFF"                              char riff_id[4] = "RIFF"
04h         4      long int   Total file length - 8               long int size0 = total length - 8
08h         8      char       "WAVEfmt "                          char wave_fmt[8]
10h         4      long int   32 00 00 00h (ADPCM)                long int size1 = 0x32
14h         2      int        02 00h                              int fmttag = 0x02
16h         2      int        Number of channels                  int channel = 1 or 2
18h         4      long int   Sampling rate                       long int samplespersec
1Ch         4      long int   Bytes played per second             long int bytepersec
20h         2      int        Bytes per sample                    int blockalign = channels * quantization bits / 8
22h         2      int        Quantization bits                   int bitpersamples = 4
24h         34     char       Fixed bytes                         char temp1
46h         4      char       "fact"                              char wave_fact = "fact"
4Ah         8      char       04 00 00 00 04 93 06 00h (fixed)    char temp2
52h         4      char       "data"                              char wave_data = "data"
56h         4      long int   Number of sample data bytes         long int size2 = file length - 90
5Ah to end  -      -          Sample data
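The offsets in Table 4 follow from the chunk sizes: the format data starts at 14h and is 32h bytes long, so "fact" lands at 14h + 32h = 46h, and the sample data starts at 5Ah (90). A small chunk walker makes this arithmetic checkable. This is a sketch under stated assumptions: `riff_chunks` is a hypothetical helper, the 90-byte test header is synthetic with zero placeholders instead of the real fixed bytes, and the walker treats "WAVE" as a 4-byte form type followed by a "fmt " chunk ID, whereas the table lists both as one 8-byte field at 08h.

```python
import struct


def riff_chunks(buf: bytes):
    """Yield (chunk_id, header_offset, payload_size) for each chunk after "WAVE"."""
    pos = 12                              # skip "RIFF", size0, and "WAVE"
    while pos + 8 <= len(buf):
        chunk_id, size = struct.unpack_from("<4sI", buf, pos)
        yield chunk_id, pos, size
        pos += 8 + size + (size & 1)      # RIFF chunks are padded to an even length


# Synthetic header shaped like Table 4; payload bytes are zero placeholders.
header = (
    b"RIFF" + struct.pack("<I", 90 - 8) + b"WAVE"
    + b"fmt " + struct.pack("<I", 0x32) + bytes(0x32)
    + b"fact" + struct.pack("<I", 4) + bytes(4)
    + b"data" + struct.pack("<I", 0)
)
layout = [(cid, hex(off), size) for cid, off, size in riff_chunks(header)]
```

With this synthetic input, `layout` lists the "fmt " chunk at 0xc, "fact" at 0x46, and "data" at 0x52, which agrees with the offsets in the table.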

Table 5. WAVE file header format after GSM speech encoding (60 bytes in total)

Offset      Bytes  Data type  Content                             Header definition
00h         4      char       "RIFF"                              char riff_id[4] = "RIFF"
04h         4      long int   Total file length - 8               long int size0 = total length - 8
08h         8      char       "WAVEfmt "                          char wave_fmt[8]
10h         4      long int   14 00 00 00h (GSM)                  long int size1 = 0x14
14h         2      int        31 00h                              int fmttag = 0x31
16h         2      int        Number of channels                  int channel = 1 or 2
18h         4      long int   Sampling rate                       long int samplespersec
1Ch         4      long int   Bytes played per second             long int bytepersec
20h         8      char       400000002004001h (fixed)            char temp1
28h         8      char       66616374040020.h (fixed)            char temp2
30h         4      char       40 E2 05 00h (fixed)                char temp3
34h         4      char       "data"                              char wave_data = "data"
38h         4      long int   Number of sample data bytes         long int size2 = file length - 60
3Ch to end  -      -          Sample data

Table 6. WAVE file header format after SBC speech encoding (58 bytes in total)

Offset      Bytes  Data type  Content                             Header definition
00h         4      char       "RIFF"                              char riff_id[4] = "RIFF"
04h         4      long int   Total file length - 8               long int size0 = total length - 8
08h         8      char       "WAVEfmt "                          char wave_fmt[8]
10h         4      long int   12 00 00 00h (SBC)                  long int size1 = 0x12
14h         2      int        71 00h                              int fmttag = 0x71
16h         2      int        Number of channels                  int channel = 1 or 2
18h         4      long int   Sampling rate                       long int samplespersec
1Ch         4      long int   Bytes played per second             long int bytepersec
20h         2      int        Bytes per sample                    int blockalign = 0x25
22h         4      long int   Quantization bits                   long int bitpersamples = 16
26h         4      char       "fact"                              char wave_fact = "fact"
2Ah         8      char       040010976280400h (fixed)            char temp
32h         4      char       "data"                              char wave_data = "data"
36h         4      long int   Number of sample data bytes         long int size2 = file length - 58

Table 7. WAVE file header format after CELP speech encoding (58 bytes in total)

Offset      Bytes  Data type  Content                             Header definition
00h         4      char       "RIFF"                              char riff_id[4] = "RIFF"
04h         4      long int   Total file length - 8               long int size0 = total length - 8
08h         8      char       "WAVEfmt "                          char wave_fmt[8]
10h         4      long int   12 00 00 00h (CELP)                 long int size1 = 0x12
14h         2      int        70 00h                              int fmttag = 0x70
16h         2      int        Number of channels                  int channel = 1 or 2
18h         4      long int   Sampling rate                       long int samplespersec
1Ch         4      long int   Bytes played per second             long int bytepersec
20h         2      int        Bytes per sample                    int blockalign = 0x0c
22h         4      long int   Quantization bits                   long int bitpersamples = 16
26h         4      char       "fact"                              char wave_fact = "fact"
2Ah         8      char       040011660520700h (fixed)            char temp
32h         4      char       "data"                              char wave_data = "data"
36h         4      long int   Number of sample data bytes         long int size2 = file length - 58
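Taken together, the seven tables differ mainly in the fmttag value at offset 14h and in the total header length. The following sketch collects those two values into a lookup table and reads the tag from a header. The names `HEADER_BY_TAG` and `describe_header` are illustrative assumptions, and the lengths are the totals stated in the table titles.

```python
import struct

# fmttag -> (encoding label, header length in bytes), as listed in Tables 1 to 7
HEADER_BY_TAG = {
    0x01: ("linear PCM", 44),
    0x06: ("A-law PCM", 58),
    0x07: ("mu-law PCM", 58),
    0x02: ("ADPCM", 90),
    0x31: ("GSM", 60),
    0x71: ("SBC", 58),
    0x70: ("CELP", 58),
}


def describe_header(buf: bytes):
    """Return (label, header_length) for the fmttag stored at offset 14h."""
    if buf[0:4] != b"RIFF" or buf[8:16] != b"WAVEfmt ":
        raise ValueError("not a WAVE header in the layout described here")
    (tag,) = struct.unpack_from("<H", buf, 0x14)
    return HEADER_BY_TAG.get(tag, ("unknown", None))
```

A tag that is not in the tables yields `("unknown", None)` instead of raising an error, so a caller can decide how to handle other encodings.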
