This article consists of three parts: the first gives background on audio file types and the motivation for this article, the second analyzes MATLAB's wavread() function, and the third gives a C++ implementation of that function.
1 Background
1.1 Motivation
1) All WAV audio processing starts by parsing the WAV file into an array of samples, on which subsequent processing (FFT, etc.) is performed.
2) MATLAB has a very convenient built-in function, wavread('test.wav'), whose input is a WAV file and whose output is an array, as described in Chapter 2.
3) A typical C++ routine reads the data in the raw format described in Section 1.2; however, whatever the format, the data can be converted from one representation to the other.
In view of this, this article shows how to implement MATLAB's wavread function completely in C++ with identical output. Along the way, we also come to understand the nature of the data stored in the file and how its different representations are related.
1.2 Audio Type
RIFF stands for Resource Interchange File Format. It is the file structure followed by most multimedia files in the Windows environment. The type of data a RIFF file contains is identified by its file extension. Data that can be stored in RIFF files includes: Audio Video Interleaved data (.AVI), waveform data (.WAV), bitmap data (.RDI), MIDI data (.RMI), palettes (.PAL), multimedia movies (.RMN), animated cursors (.ANI), and other RIFF files (.BND).
A chunk is the basic unit that makes up a RIFF file. Its basic structure is as follows:
struct chunk {
    u32 id;      // 4 ASCII characters identifying the data in the chunk, e.g. "RIFF", "LIST", "fmt ", "data", "WAVE", "AVI "
    u32 size;    // length in bytes of dat[]; the id and size fields themselves are not included
    u8  dat[];   // chunk contents (size bytes), word-aligned: if size is odd, one null pad byte is appended
};               // u32 / u8: unsigned 32-bit / 8-bit integers
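Every chunk starts with the same 8-byte header, so a reader can walk a WAV file chunk by chunk without knowing every chunk type in advance.

The sketch below assumes the following:
- The whole file is already in a byte buffer.
- The 32-bit size field is little-endian, as RIFF specifies.
- `ChunkInfo`, `read_u32_le` and `list_chunks` are hypothetical names for illustration, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One chunk header as stored in a RIFF file.
struct ChunkInfo {
    std::string id;      // 4-character chunk ID, e.g. "fmt " or "data"
    std::uint32_t size;  // length of the data field only (no ID, size or pad byte)
    std::size_t offset;  // position of the data field within the buffer
};

// Read a little-endian 32-bit unsigned integer starting at p.
std::uint32_t read_u32_le(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// List the sub-chunks that follow the 12-byte "RIFF" <size> "WAVE" header.
std::vector<ChunkInfo> list_chunks(const std::vector<std::uint8_t>& buf) {
    assert(buf.size() >= 12 && "buffer too small for a RIFF header");
    std::vector<ChunkInfo> chunks;
    std::size_t pos = 12;  // skip ChunkID, ChunkSize and Format
    while (pos + 8 <= buf.size()) {
        ChunkInfo c;
        c.id.assign(reinterpret_cast<const char*>(&buf[pos]), 4);
        c.size = read_u32_le(&buf[pos + 4]);
        c.offset = pos + 8;
        chunks.push_back(c);
        // Skip the data field, plus one pad byte when the size is odd.
        pos = c.offset + c.size + (c.size & 1u);
    }
    return chunks;
}
```

The pad-byte step matters: without it, an odd-sized chunk would shift every following chunk header by one byte.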
1.3 WAV audio files
WAVE is one of the waveform audio file formats used in multimedia, and it follows the RIFF (Resource Interchange File Format) standard. The first four bytes of every WAVE file are "RIFF". A WAVE file consists of two main parts: the file header and the data body. The file header is further divided into the RIFF/WAVE identification section and the sound data format description section. The contents and layout of a WAVE file are shown later in this article.
Two common types of sound file are mono (11.025 kHz sample rate, 8-bit samples) and stereo (44.1 kHz sample rate, 16-bit samples). The sample rate is the number of times per unit time that the sound signal is sampled during analog-to-digital conversion. The sample value is the integral of the analog sound signal over each sampling period.
For 8-bit mono files, each sample is an unsigned 8-bit value (00H-FFH). For 8-bit stereo files, each sample frame is 16 bits: the low (first) byte holds the left channel and the high (second) byte holds the right channel. For 16-bit files, each channel's sample is a 16-bit signed integer.
The data chunk of a WAVE file holds samples in pulse-code modulation (PCM) format, and the data is organized by sample. In a stereo WAVE file, channel 0 is the left channel and channel 1 is the right channel. In a multichannel WAVE file, the samples of the different channels are interleaved.
Apart from the short header at the start of the file, which describes how the data is organized, the rest of a WAVE file is raw sound sample data. WAVE files can be compressed, but the uncompressed format is generally used. With a 44.1 kHz sample rate, 16-bit resolution and two channels (the same parameters used by audio CDs), WAVE can store very high-quality sound, and sound engineers and music enthusiasts know the format well. The drawback is file size. For 44.1 kHz, 16-bit stereo data, one minute of sound takes 44100 × 2 bytes × 2 channels × 60 s / 1024 / 1024 ≈ 10.09 MB, which makes WAVE unsuitable for transmission over the network.
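The one-minute storage figure can be checked with a few lines of arithmetic. This is a minimal sketch with the following assumptions:
- The audio is uncompressed PCM.
- Each sample occupies a whole number of bytes.
- The helper names `pcm_bytes_per_second` and `pcm_megabytes` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of uncompressed PCM audio per second; equals the header's ByteRate field.
std::uint64_t pcm_bytes_per_second(std::uint32_t sample_rate, std::uint16_t channels,
                                   std::uint16_t bits_per_sample) {
    assert(bits_per_sample % 8 == 0 && "assumes whole bytes per sample");
    return static_cast<std::uint64_t>(sample_rate) * channels * (bits_per_sample / 8);
}

// Storage needed for `seconds` of audio, in MB of 1024 * 1024 bytes as in the text.
double pcm_megabytes(std::uint32_t sample_rate, std::uint16_t channels,
                     std::uint16_t bits_per_sample, double seconds) {
    return static_cast<double>(pcm_bytes_per_second(sample_rate, channels, bits_per_sample)) *
           seconds / (1024.0 * 1024.0);
}
```

For example, pcm_bytes_per_second(44100, 2, 16) is 176400. pcm_megabytes(44100, 2, 16, 60) comes to about 10.09, matching the figure above.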
Below we analyze the format of WAVE files in detail:
Endian | Field name | Size (bytes) | Description
Big | ChunkID | 4 | File identifier, normally the four letters "RIFF"
Little | ChunkSize | 4 | Size of the rest of the file, excluding the ChunkID and ChunkSize fields themselves
Big | Format | 4 | Normally the four letters "WAVE"
Big | Subchunk1ID | 4 | Start of the format description chunk; normally "fmt " (with a trailing space)
Little | Subchunk1Size | 4 | Size of this chunk, excluding its ID and size fields
Little | AudioFormat | 2 | Audio format code
Little | NumChannels | 2 | Number of channels
Little | SampleRate | 4 | Sample rate (Hz)
Little | ByteRate | 4 | Byte rate: number of bytes of audio per second
Little | BlockAlign | 2 | Block alignment: number of bytes per sample frame (all channels)
Little | BitsPerSample | 2 | Bits per sample, i.e. the resolution of the analog-to-digital conversion
Big | Subchunk2ID | 4 | Start of the sound data chunk; normally "data"
Little | Subchunk2Size | 4 | Size of this chunk, excluding its ID and size fields
Little | Data | N | Audio sample data
The following is a detailed explanation of each field:
Field | Size | Description
ChunkID | 4 bytes | "RIFF" in ASCII (0x52494646)
ChunkSize | 4 bytes | 36 + Subchunk2Size for PCM, or in general 4 + (8 + Subchunk1Size) + (8 + Subchunk2Size): the size of the whole file excluding ChunkID and ChunkSize
Format | 4 bytes | "WAVE" in ASCII (0x57415645)
Subchunk1ID | 4 bytes | Start of the format information chunk: "fmt " in ASCII, ending with a space (0x666d7420)
Subchunk1Size | 4 bytes | Size of the rest of this chunk (16 for PCM)
AudioFormat | 2 bytes | PCM = 1 (i.e. linear quantization); other values indicate some form of compression
NumChannels | 2 bytes | 1 = mono, 2 = stereo
SampleRate | 4 bytes | Sample rate, e.g. 8000 or 44100
ByteRate | 4 bytes | SampleRate * NumChannels * BitsPerSample / 8
BlockAlign | 2 bytes | NumChannels * BitsPerSample / 8
BitsPerSample | 2 bytes | Sample resolution in bits, usually 8 or 16
Subchunk2ID | 4 bytes | Start of the sound data chunk: "data" in ASCII (0x64617461)
Subchunk2Size | 4 bytes | Number of bytes of sample data that follow
Data | N bytes | The actual sound data
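The sketch below reads the format fields listed above and checks ByteRate and BlockAlign against their formulas.

It assumes the canonical 44-byte header, in which the "fmt " chunk immediately follows "WAVE" at byte 12. Files that place other chunks first would need a chunk-by-chunk walk instead. The names `WavFormat`, `le16`, `le32` and `parse_canonical_fmt` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Fields of the "fmt " sub-chunk for PCM audio.
struct WavFormat {
    std::uint16_t audio_format;     // 1 = PCM
    std::uint16_t num_channels;     // 1 = mono, 2 = stereo
    std::uint32_t sample_rate;      // e.g. 8000, 44100
    std::uint32_t byte_rate;        // SampleRate * NumChannels * BitsPerSample / 8
    std::uint16_t block_align;      // NumChannels * BitsPerSample / 8
    std::uint16_t bits_per_sample;  // usually 8 or 16
};

// Little-endian readers: the first byte is the least significant.
std::uint16_t le16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
std::uint32_t le32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(le16(p)) |
           (static_cast<std::uint32_t>(le16(p + 2)) << 16);
}

// ASSUMES the canonical layout: "fmt " at byte 12, its fields starting at byte 20.
// Returns false if the header does not match or the derived fields are inconsistent.
bool parse_canonical_fmt(const std::vector<std::uint8_t>& h, WavFormat& f) {
    if (h.size() < 36) return false;
    if (std::string(h.begin(), h.begin() + 4) != "RIFF" ||
        std::string(h.begin() + 8, h.begin() + 12) != "WAVE" ||
        std::string(h.begin() + 12, h.begin() + 16) != "fmt ") return false;
    const std::uint8_t* p = h.data() + 20;  // first byte after Subchunk1Size
    f.audio_format    = le16(p);
    f.num_channels    = le16(p + 2);
    f.sample_rate     = le32(p + 4);
    f.byte_rate       = le32(p + 8);
    f.block_align     = le16(p + 12);
    f.bits_per_sample = le16(p + 14);
    // Cross-check the two derived fields against their defining formulas.
    return f.byte_rate == f.sample_rate * f.num_channels * f.bits_per_sample / 8 &&
           f.block_align == f.num_channels * f.bits_per_sample / 8;
}
```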
In the data chunk, the byte layout depends on the number of channels and the bits per sample (each cell below is one byte):
1) 8-bit mono:
Sample 1: Data 1
Sample 2: Data 2
2) 8-bit stereo:
Sample 1: Channel 1 data 1 | Channel 2 data 1
Sample 2: Channel 1 data 2 | Channel 2 data 2
3) 16-bit mono:
Sample 1: Data 1 low byte | Data 1 high byte
Sample 2: Data 2 low byte | Data 2 high byte
4) 16-bit stereo:
Sample 1: Channel 1 data 1 low byte | Channel 1 data 1 high byte | Channel 2 data 1 low byte | Channel 2 data 1 high byte
Sample 2: Channel 1 data 2 low byte | Channel 1 data 2 high byte | Channel 2 data 2 low byte | Channel 2 data 2 high byte
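For 16-bit audio, splitting the interleaved byte stream into per-channel sample arrays follows directly from layouts 3) and 4).

A minimal sketch, with these assumptions:
- The samples are little-endian 16-bit PCM.
- The buffer holds only whole sample frames.
- `deinterleave16` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Split interleaved 16-bit PCM bytes into one vector of signed samples per channel.
// Frame layout for stereo: ch1 low, ch1 high, ch2 low, ch2 high, ...
std::vector<std::vector<std::int16_t>> deinterleave16(const std::vector<std::uint8_t>& data,
                                                      unsigned num_channels) {
    assert(num_channels > 0);
    const std::size_t block_align = 2u * num_channels;  // bytes per sample frame
    assert(data.size() % block_align == 0 && "expects whole sample frames");
    std::vector<std::vector<std::int16_t>> out(num_channels);
    for (std::size_t frame = 0; frame < data.size() / block_align; ++frame) {
        for (unsigned ch = 0; ch < num_channels; ++ch) {
            const std::size_t i = frame * block_align + 2u * ch;
            const int word = data[i] | (data[i + 1] << 8);  // low byte comes first
            // Two's complement: words from 0x8000 upward are negative.
            out[ch].push_back(static_cast<std::int16_t>(word >= 0x8000 ? word - 0x10000 : word));
        }
    }
    return out;
}
```

With num_channels = 1 the same loop handles layout 3). The 8-bit layouts would need a separate path, because 8-bit samples are unsigned.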
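Let's look at a concrete example of a WAV file (in hexadecimal):
... 74 20 10 00 00 00 01 00 02 00 22 56 00 00 88 58 01 00 04 00 10 00 64 61 74 61 00 ...
The corresponding analysis is as follows (multi-byte fields are little-endian):
74 20: last two bytes of Subchunk1ID "fmt "
10 00 00 00: Subchunk1Size = 16 (PCM)
01 00: AudioFormat = 1 (PCM)
02 00: NumChannels = 2
22 56 00 00: SampleRate = 0x00005622 = 22050
88 58 01 00: ByteRate = 0x00015888 = 88200 = 22050 * 2 * 16 / 8
04 00: BlockAlign = 4 = 2 * 16 / 8
10 00: BitsPerSample = 16
64 61 74 61: Subchunk2ID = "data"
Analyzing the sample data: with 16-bit samples, each sample is one 16-bit value, written as four hex digits of the form 'FFFF'. For example, sample 3 is stored as the two bytes 3c 13. The data is little-endian (the right-hand byte is the high byte), so the value is 0x133C. Converting each hex digit to 4 bits gives 0x13 = 0001 0011 and 0x3C = 0011 1100, so the 16-bit binary number is 0001 0011 0011 1100. Its sign bit (the leftmost bit) is 0, so this sample is positive (0x133C = 4924).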
2 The wavread() function in MATLAB
1. wavread('testwav.wav')
Readers can try the output themselves. For example, for one of my sound files, 'testwav.wav', the last 10 values output are:
-0.0001 -0.0001 -0.0002 -0.0003 -0.0002 -0.0002 -0.0002 -0.0003 -0.0002 -0.0002
2. wavread('testwav.wav', 'native')
Readers can try the output themselves. The last 10 values output for my 'testwav.wav' are:
-4 -2 -8 -9 -7 -8 -8 -11 -5 -7
The outputs of 1 and 2 are related by normalization, for example -0.0002 ≈ -7/32768, where 32768 = 2^15. The divisor is 2^15 because the samples are 16-bit signed integers in the range -32768 to 32767, so dividing by 2^15 maps them into [-1, 1).
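The relationship between the two MATLAB outputs can be written directly in code.

This sketch assumes the 'native' values are already available as 16-bit integers. `normalize16` and `round_to_4_places` are hypothetical helpers, and the rounding helper only reproduces the four-decimal listing above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Default wavread output = 'native' 16-bit values divided by 2^15 = 32768.
std::vector<double> normalize16(const std::vector<std::int16_t>& native) {
    std::vector<double> out;
    out.reserve(native.size());
    for (std::int16_t v : native) out.push_back(v / 32768.0);  // range [-1, 1)
    return out;
}

// Round to four decimal places, only to compare with the listing above.
double round_to_4_places(double x) {
    return std::round(x * 10000.0) / 10000.0;
}
```

For the ten native values above, normalize16 returns -7/32768 = -0.000213623046875 for the fifth value. round_to_4_places turns that into -0.0002.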
3 C++ implementation of Readwav
With this background, we come to the main topic: how to implement MATLAB's wavread('testwav.wav') in C++ so that the output is identical.
3.1 Encoding conversion rules
First we need to understand how these two sets of data are related. This section uses the data of the testwav.wav file as an example:
(1) The last 20 bytes of raw sample data in the file's data chunk are:
FC FF FE FF F8 FF F7 FF F9 FF F8 FF F8 FF F5 FF FB FF F9 FF
(2) The last 10 values parsed by MATLAB are:
-0.0001 -0.0001 -0.0002 -0.0003 -0.0002 -0.0002 -0.0002 -0.0003 -0.0002 -0.0002
The relationship between the two sets of data is as follows. (1) holds the raw 16-bit words exactly as stored in the file, and WAV stores 16-bit samples as signed two's-complement integers. (2) is the signed value of each word, normalized.
To convert data (1) into data (2):
- Combine each byte pair into a 16-bit word.
- Convert the word to its signed value using the two's-complement rules below.
- Divide that value by 32768.
Rules for converting a 16-bit two's-complement word to a signed value:
(Binary form): if the sign bit is 0, the value is the word itself. If the sign bit is 1, the value is negative, and its magnitude is obtained by inverting all the bits and adding 1.
(Numeric form): if the sign bit is 0, the value is the word itself. If the sign bit is 1, value = word - 2^16. Tip: for hand calculation, 2^16 can be written equivalently as FFFF + 1.
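There are only 65536 possible 16-bit words, so the binary and numeric rules can be checked against each other exhaustively.

A sketch with hypothetical function names. Both functions return the signed value of a word stored in two's complement.

```cpp
#include <cassert>
#include <cstdint>

// Numeric rule: a word with the sign bit set encodes word - 2^16 (= word - 0xFFFF - 1).
int decode_numeric(std::uint16_t word) {
    return word < 0x8000 ? static_cast<int>(word) : static_cast<int>(word) - 0x10000;
}

// Binary rule: for a negative word, invert every bit and add 1 to get the magnitude.
int decode_bitwise(std::uint16_t word) {
    if ((word & 0x8000u) == 0) return word;  // sign bit 0: the value is the word itself
    const std::uint16_t magnitude = static_cast<std::uint16_t>(~word + 1);
    return -static_cast<int>(magnitude);
}
```

Looping over every word from 0x0000 to 0xFFFF and comparing the two functions confirms that the rules agree. For example, both give -7 for FFF9.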
To illustrate:
Step 1 (read 16 bits, i.e. 2 bytes, at a time): each word ranges from 0x0000 to 0xFFFF. Take F9 FF as an example. The data is little-endian, meaning the right-hand byte is the high byte, so the word is FFF9.
Step 2 (convert to a signed value): in binary, FFF9 is 1111 1111 1111 1001 (each hex digit corresponds to 4 binary digits). The sign bit is 1, so the value is negative: FFF9 - FFFF - 1 = -7.
Step 3 (normalize): -7 / 32768 = -0.000213..., which rounds to -0.0002 at four decimal places, matching the MATLAB output.
Readers can apply this method to the 3rd and 4th bytes from the right of (1) (FB FF) and check that the result matches the 2nd value from the right of (2).
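Putting the three steps together, the conversion from data (1) to data (2) can be sketched as follows.

It assumes 16-bit little-endian PCM bytes already in memory. `raw_to_normalized` is a hypothetical name, and this is not the code shared at the link below.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert raw 16-bit little-endian PCM bytes (data (1)) into normalized samples (data (2)).
std::vector<double> raw_to_normalized(const std::vector<std::uint8_t>& raw) {
    assert(raw.size() % 2 == 0 && "16-bit samples come in byte pairs");
    std::vector<double> out;
    out.reserve(raw.size() / 2);
    for (std::size_t i = 0; i + 1 < raw.size(); i += 2) {
        const int word = raw[i] | (raw[i + 1] << 8);               // step 1: low byte first
        const int value = word >= 0x8000 ? word - 0x10000 : word;  // step 2: two's complement
        out.push_back(value / 32768.0);                            // step 3: normalize
    }
    return out;
}
```

Applied to the 20 bytes in (1), multiplying each result by 32768 recovers the native values -4 -2 -8 -9 -7 -8 -8 -11 -5 -7 listed in Chapter 2.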
3.2 C++ implementation
The C++ implementation reads the raw sample data 16 bits (2 bytes) at a time. It assembles each little-endian byte pair into a 16-bit word, converts the word to its signed (two's-complement) value, and normalizes it by dividing by 32768. Take care with byte order (endianness) and sign when converting.
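A compact sketch of such a reader follows. It is not the code shared at the link below.

Scope and assumptions:
- It handles only uncompressed 16-bit PCM and throws on anything else.
- It returns samples indexed [frame][channel], mirroring the samples-by-channels matrix that wavread returns.
- `read_le`, `wavread16` and `read_file` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Read an n-byte little-endian unsigned integer at pos (bounds-checked by at()).
std::uint32_t read_le(const std::vector<std::uint8_t>& b, std::size_t pos, int n) {
    std::uint32_t v = 0;
    for (int k = n - 1; k >= 0; --k) v = (v << 8) | b.at(pos + k);
    return v;
}

// Sketch of wavread('file.wav') for 16-bit PCM: returns samples[frame][channel],
// each value divided by 32768. Walks the chunk list to find "fmt " and "data".
std::vector<std::vector<double>> wavread16(const std::vector<std::uint8_t>& b) {
    if (b.size() < 12 || std::string(b.begin(), b.begin() + 4) != "RIFF" ||
        std::string(b.begin() + 8, b.begin() + 12) != "WAVE")
        throw std::runtime_error("not a RIFF/WAVE buffer");
    unsigned channels = 0, bits = 0;
    std::size_t pos = 12;  // first sub-chunk
    while (pos + 8 <= b.size()) {
        const std::string id(b.begin() + pos, b.begin() + pos + 4);
        const std::uint32_t size = read_le(b, pos + 4, 4);
        const std::size_t body = pos + 8;
        if (id == "fmt ") {
            if (read_le(b, body, 2) != 1) throw std::runtime_error("only PCM (AudioFormat 1) handled");
            channels = read_le(b, body + 2, 2);
            bits = read_le(b, body + 14, 2);
        } else if (id == "data") {
            if (channels == 0 || bits != 16) throw std::runtime_error("need a 16-bit PCM fmt chunk first");
            const std::size_t frame_bytes = 2u * channels;
            const std::size_t n = std::min<std::size_t>(size, b.size() - body) / frame_bytes;
            std::vector<std::vector<double>> out(n, std::vector<double>(channels));
            for (std::size_t f = 0; f < n; ++f)
                for (unsigned c = 0; c < channels; ++c) {
                    const int word = static_cast<int>(read_le(b, body + f * frame_bytes + 2u * c, 2));
                    const int value = word >= 0x8000 ? word - 0x10000 : word;  // two's complement
                    out[f][c] = value / 32768.0;                               // normalize
                }
            return out;
        }
        pos = body + size + (size & 1u);  // skip data and the pad byte of odd-sized chunks
    }
    throw std::runtime_error("no data chunk found");
}

// Load a whole file into memory, e.g. wavread16(read_file("testwav.wav")).
std::vector<std::uint8_t> read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::istreambuf_iterator<char> first(in), last;
    return std::vector<std::uint8_t>(first, last);
}
```

The reader walks the chunk list rather than assuming a fixed 44-byte header. This lets it skip chunks such as LIST that some tools insert before the data chunk.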
I have shared my complete C++ code; readers can find it at: http://www.oschina.net/code/snippet_1768500_39013
[WAV audio parsing] C++ implementation of the wavread function