The WAVE file is one of the audio waveform formats used in multimedia and follows the RIFF (Resource Interchange File Format) standard. The first four bytes of every WAVE file are "RIFF". A WAVE file consists of a file header and a data body. The header is divided into two parts: the RIFF/WAVE file identifier segment and the audio format description segment. The content and layout of each part are described below.
Two audio configurations are common: single-channel (11.025 kHz sampling rate, 8-bit samples) and dual-channel (44.1 kHz sampling rate, 16-bit samples). The sampling rate is the number of times the audio signal is sampled per unit time during analog-to-digital conversion. A sample value is the integral value of the analog audio signal within one sampling period.
For 8-bit single-channel files, each sample is a single byte (00h-FFh). For 8-bit two-channel stereo files, each sample occupies 16 bits: one byte for the left channel followed by one byte for the right channel.
The data block of a WAVE file contains samples in Pulse-Code Modulation (PCM) format. A WAVE file is organized as a sequence of samples. In a dual-channel WAVE file, channel 0 is the left channel and channel 1 is the right channel. In a multi-channel WAVE file, the samples of the individual channels are interleaved.
Apart from the header described above, the data block holds the raw sound samples. Although a WAVE file can carry compressed audio, it generally uses an uncompressed format. At a 44.1 kHz sampling rate, 16-bit resolution, and two channels, WAVE can store audio with high quality requirements; CD audio uses the same parameters, which will be familiar to audio professionals and music enthusiasts. However, such files are also very large. Taking 44.1 kHz, 16-bit, dual-channel data as an example, one minute of audio occupies 44100 * 2 bytes * 2 channels * 60 s / 1024 / 1024 = 10.09 MB. The format is therefore poorly suited to online transmission.
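The size estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name `pcm_size_bytes` is made up for this illustration and is not part of any library.

```python
def pcm_size_bytes(sample_rate, bits_per_sample, channels, seconds):
    """Size in bytes of uncompressed PCM sample data (header not included)."""
    return sample_rate * (bits_per_sample // 8) * channels * seconds

# One minute of 44.1 kHz, 16-bit, dual-channel audio
size = pcm_size_bytes(44100, 16, 2, 60)
print(size)                          # 10584000
print(round(size / 1024 / 1024, 2))  # 10.09
```

The same function gives the size for any other combination of sampling rate, resolution, and channel count.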
The following is a detailed breakdown of the WAVE file format.
| Endian | Field name | Size (bytes) | Description |
|---|---|---|---|
| Big | ChunkID | 4 | File header ID, the four letters "RIFF" |
| Little | ChunkSize | 4 | Size of the rest of the file, excluding the ID and size fields themselves |
| Big | Format | 4 | The four letters "WAVE" |
| Big | Subchunk1ID | 4 | Format description chunk; this field is "fmt " (with a trailing space) |
| Little | Subchunk1Size | 4 | Size of this chunk, excluding the ID and size fields |
| Little | AudioFormat | 2 | Audio format code |
| Little | NumChannels | 2 | Number of channels |
| Little | SampleRate | 4 | Sampling rate |
| Little | ByteRate | 4 | Byte rate: the number of bytes required per second |
| Little | BlockAlign | 2 | Data block alignment unit |
| Little | BitsPerSample | 2 | Resolution of the analog-to-digital conversion during sampling |
| Big | Subchunk2ID | 4 | Actual sound data chunk; this field is "data" |
| Little | Subchunk2Size | 4 | Size of this chunk, excluding the ID and size fields |
| Little | Data | N | Audio sample data |
The following is a detailed description of each field:
| Field name | Size | Description |
|---|---|---|
| ChunkID | 4 bytes | The letters "RIFF" in ASCII (0x52494646). |
| ChunkSize | 4 bytes | 36 + Subchunk2Size, or 4 + (8 + Subchunk1Size) + (8 + Subchunk2Size). This is the size of the rest of the file, excluding ChunkID and ChunkSize. |
| Format | 4 bytes | The letters "WAVE" in ASCII (0x57415645). |
| Subchunk1ID | 4 bytes | Start of a new chunk (the format description chunk). The letters "fmt " in ASCII, with a trailing space (0x666d7420). |
| Subchunk1Size | 4 bytes | Size of the rest of this chunk (16 for PCM). |
| AudioFormat | 2 bytes | PCM = 1 (linear quantization). Any other value indicates some form of compression. |
| NumChannels | 2 bytes | 1 = single channel, 2 = dual channel. |
| SampleRate | 4 bytes | Sampling rate, such as 8000 or 44100. |
| ByteRate | 4 bytes | Equal to SampleRate * NumChannels * BitsPerSample / 8. |
| BlockAlign | 2 bytes | Equal to NumChannels * BitsPerSample / 8. |
| BitsPerSample | 2 bytes | Sample resolution, that is, the number of bits per sample, usually 8 or 16. |
| Subchunk2ID | 4 bytes | Start of a new chunk (the actual sound data). The letters "data" in ASCII (0x64617461). |
| Subchunk2Size | 4 bytes | Size of the data, that is, of the sample data that follows. |
| Data | N bytes | The actual sound data. |
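To make the derived fields concrete, here is a minimal sketch that packs a 44-byte header with Python's standard `struct` module. It assumes the simplest case described above: PCM (AudioFormat = 1), a 16-byte format chunk, and no extra chunks before "data". The function name `build_pcm_header` is hypothetical and chosen only for this example.

```python
import struct

def build_pcm_header(num_channels, sample_rate, bits_per_sample, data_size):
    """Return a 44-byte header for an uncompressed PCM WAVE file (assumed layout)."""
    block_align = num_channels * bits_per_sample // 8
    byte_rate = sample_rate * num_channels * bits_per_sample // 8
    chunk_size = 36 + data_size  # 4 + (8 + 16) + (8 + data_size)
    # "<" selects little-endian numbers without padding; the 4-character IDs
    # are written as plain ASCII bytes in reading order.
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", chunk_size, b"WAVE",
        b"fmt ", 16, 1, num_channels, sample_rate,
        byte_rate, block_align, bits_per_sample,
        b"data", data_size,
    )

header = build_pcm_header(2, 22050, 16, 2048)
print(len(header))   # 44
print(header.hex())
```

Only four values are supplied by the caller; ChunkSize, ByteRate, and BlockAlign follow from the formulas in the table.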
The samples in the data block are laid out as follows, depending on the number of channels and the number of bits per sample (each column represents 8 bits):
1. 8-bit single channel:

| Sample 1 | Sample 2 |
|---|---|
| Data 1 | Data 2 |

2. 8-bit dual channel:

| Sample 1 | | Sample 2 | |
|---|---|---|---|
| Channel 1 data 1 | Channel 2 data 1 | Channel 1 data 2 | Channel 2 data 2 |

3. 16-bit single channel:

| Sample 1 | | Sample 2 | |
|---|---|---|---|
| Data 1 low byte | Data 1 high byte | Data 2 low byte | Data 2 high byte |

4. 16-bit dual channel:

| Sample 1 | | | |
|---|---|---|---|
| Channel 1 data 1 low byte | Channel 1 data 1 high byte | Channel 2 data 1 low byte | Channel 2 data 1 high byte |
| Sample 2 | | | |
| Channel 1 data 2 low byte | Channel 1 data 2 high byte | Channel 2 data 2 low byte | Channel 2 data 2 high byte |
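The layouts above can be read back into per-channel sample lists with a short routine. The sketch below assumes the usual PCM convention that 8-bit samples are unsigned and 16-bit samples are signed little-endian; `split_channels` is a hypothetical helper written for this illustration.

```python
import struct

def split_channels(data, num_channels, bits_per_sample):
    """De-interleave raw PCM bytes into one list of samples per channel."""
    if bits_per_sample == 8:
        code = "B"  # assumed: 8-bit samples are unsigned (0..255)
    elif bits_per_sample == 16:
        code = "h"  # assumed: 16-bit samples are signed, low byte first
    else:
        raise ValueError("this sketch handles only 8- and 16-bit PCM")
    bytes_per_sample = bits_per_sample // 8
    count = len(data) // bytes_per_sample
    count -= count % num_channels  # ignore a trailing incomplete sample frame
    samples = struct.unpack("<%d%s" % (count, code),
                            data[:count * bytes_per_sample])
    # Channels alternate: ch0, ch1, ch0, ch1, ...
    return [list(samples[ch::num_channels]) for ch in range(num_channels)]

raw = bytes.fromhex("00 00 00 00 24 17 1e f3 3c 13 3c 14")
left, right = split_channels(raw, 2, 16)
print(left)   # [0, 5924, 4924]
print(right)  # [0, -3298, 5180]
```

Slicing with a step of `num_channels` is what undoes the interleaving, so the same function covers the single-channel case with a step of 1.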
The following is an example of a sound file:
52 49 46 46 24 08 00 00 57 41 56 45 66 6d 74 20 10 00 00 00 01 00 02 00 22 56 00 00 88 58 01 00 04 00 10 00 64 61 74 61 00 08 00 00 00 00 00 00 24 17 1e f3 3c 13 3c 14 16 f9 18 f9 34 e7 23 a6 3c f2 24 f2 11 ce 1a 0d
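Decoded field by field, the first 44 bytes describe uncompressed PCM audio (AudioFormat 1) with two channels, a 22050 Hz sampling rate, and 16 bits per sample, which gives ByteRate = 88200 and BlockAlign = 4. Subchunk2Size is 2048 and ChunkSize is 2084 = 36 + 2048. The bytes after the header are the first sample values.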
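The header of the example bytes above can also be decoded mechanically. The following sketch unpacks the first 44 bytes with Python's `struct` module, assuming the plain PCM layout from the field tables; the names `FIELDS` and `parse_header` are introduced here for illustration only.

```python
import struct

example = bytes.fromhex(
    "52 49 46 46 24 08 00 00 57 41 56 45 66 6d 74 20 "
    "10 00 00 00 01 00 02 00 22 56 00 00 88 58 01 00 "
    "04 00 10 00 64 61 74 61 00 08 00 00"
)

FIELDS = ("ChunkID", "ChunkSize", "Format", "Subchunk1ID", "Subchunk1Size",
          "AudioFormat", "NumChannels", "SampleRate", "ByteRate",
          "BlockAlign", "BitsPerSample", "Subchunk2ID", "Subchunk2Size")

def parse_header(buf):
    """Unpack the 44-byte PCM header into a field-name -> value dict."""
    values = struct.unpack("<4sI4s4sIHHIIHH4sI", buf[:44])
    return dict(zip(FIELDS, values))

hdr = parse_header(example)
for name, value in hdr.items():
    print(name, value)
```

Real files may contain additional chunks between the format chunk and the data chunk, so a robust reader would walk the chunks by ID and size instead of assuming fixed offsets.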
References:
- http://www.diybl.com/course/3_program/game/200798/70450.html
- https://ccrma.stanford.edu/courses/422/projects/WaveFormat/