C + + implementation of the Wavread function of "WAV audio parsing"

Source: Internet
Author: User

This paper consists of three parts, the first part of the background-audio type and the motive of this article, the second classification than MATLAB under the Wavread () function, the third part gives the C + + implementation of the function.

A Background introduction1.1 Motive of this article

1) All WAV audio processing is based on the WAV format of the file parsing out, to parse the group to do for us to do the subsequent processing (FFT, etc.).

2) in MATLAB directly has a very useful function wavread (' test.wav '), the input is WAV audio, the output is an array, as described in Chapter two.

3) The general C + + function reads the data, the format as described in section 1.2, however, regardless of the format, the data can be converted to each other.

In view of this, this article will introduce how to fully implement the Wavread function of Matlab with C + +, the output data format is identical, in this process, we can also appreciate the nature of the data in the document, and the relationship between the transformation.

1.2 Audio Type

RIFF is all called the Resource Interchange File format (Resourcesinterchange fileformat), RIFF files are a file structure that most multimedia files in the Windows environment follow. The data type that the riff file contains is identified by the file's extension, and the data that can be stored in the riff file includes: Audio Video Interleaved format data (. AVI) waveform format data (. WAV) Bitmap Format data (. RDI) MIDI Format data (. RMI) palette format (. PAL) Multimedia Movies (. RMN) animation cursor (. ANI) other riff files (. BND).

C hunk is the basic unit that makes up the riff file , and its basic structure is as follows:

struct chunk{u32 ID;//consists of 4 ASCII characters to identify the data contained in the block.    such as: ' RIFF ', ' LIST ', ' fmt ', ' data ', ' WAV ', ' AVI ' and other u32 size;   The block size, which is the length of the data stored in the database field, and the size of the ID and Size field is not included in the value U8 Dat[size]; Block content, the data is arranged in words (word), and if the length of the data structure is odd, a null byte is added last;

1.3 WAV audio files

Wave files are one of the sound waveform file formats used in multimedia, and are standard in the format riff (Resource Interchange File format). The first four bytes of each wave file are "RIFF". Similarly, WAVE files consist of two main parts: the file header and the data body. The file header is divided into riff/wav file identification segment and sound data Format Description Section two parts. The contents and format of the wave file are shown later in this article.

There are two main types of sound files, which correspond to mono (11.025KHz sample rate, 8Bit sample value) and dual channel (44.1KHz sample rate, 16Bit sample value). The sampling rate is the number of times the sound signal is sampled in units of time during the "modulo → number" conversion process. The sampled values refer to each sampling period
The integral value of the internal sound analog signal.

For mono sound files, the sampled data is a eight-bit short integer (00H-FFH), whereas for a two-channel stereo sound file, each sampled data is a 16-bit integer (int), and the high eight-bit and low eight-bit respectively represent the left and right two channels.

WAVE file data blocks contain samples expressed in pulse-coded modulation (PCM) format. WAVE files are organized by samples. In a mono WAVE file, Channel 0 represents the left channel and Channel 1 represents the right channel. In a multichannel wave file, the sample is alternately present.

Wave files In addition to the previous small section of the file header to the data organization, the informationblock is the original sound sample data , WAVE files can be compressed, but generally use the uncompressed format. 44.1KHz sample rate, 16Bit resolution, dual channel, so wave can save very high quality sound files, CD Use this format, sound experts or music enthusiasts should be very familiar. But the size of this file is also very large, in 44.1KHz 16bit dual-channel data for example, the amount of one minute of sound data: 4100*2byte*2channel*60s/1024/1024=10.09m. So it's not appropriate to send it online.

Below we specifically analyze the format of WAVE files

Endian

Field name

Size

Big Chunkid 4 File header identification, generally is "RIFF" four letters
Little ChunkSize 4 The size of the entire data file, excluding the above ID and size itself
Big Format 4 It's usually "WAVE" four letters.
Big Subchunk1id 4 Format Description block, this field is generally "FMT"
Little Subchunk1size 4 The size of this data block, excluding the ID and size field itself
Little Audioformat 2 Format description for audio
Little Numchannels 2 Number of channels
Little Samplerate 4 Sample Rate
Little Byterate 4 Bit rate, number of bytes required per second
Little Blockalign 2 Data Block snap-in unit
Little BitsPerSample 2 Resolution of analog-to-digital conversion at sampling
Big Subchunk2id 4 Real sound data block, this field is generally "data"
Little Subchunk2size 4 The size of this data block, excluding the ID and size field itself
Little Data N Sampled data for audio

The following is a detailed explanation of each field:

Chunkid 4bytes The ASCII code represents the "RIFF". (0x52494646)
ChunkSize 4bytes 36+subchunk2size, or
4 + (8 + subchunk1size) + (8 + subchunk2size),
This is the size of the entire data block (excluding the size of Chunkid and chunksize)
Format 4bytes The ASCII code represents "WAVE". (0x57415645)
Subchunk1id New block of data (format information description block)
The ASCII code represents the "FMT"--finally a space. (0x666d7420)
Subchunk1size 4bytes The size of this block of data (for PCM, a value of 16).
Audioformat 2bytes PCM = 1 (for example, linear sampling), if it is a different value, it may be some form of compression
Numchannels 2bytes 1 = Mono | 2 = Dual Channel
Samplerate 4bytes Sample rate, such as 8000,44100 equivalent
Byterate 4bytes equals: Samplerate * numchannels * BITSPERSAMPLE/8
Blockalign 2bytes equals: Numchannels * BITSPERSAMPLE/8
BitsPerSample 2bytes Sampling resolution, that is, each sample is represented by several, usually 8bits or 16bits
Subchunk2id 4bytes New data blocks, real sound data
The ASCII code represents "data"--and finally a space. (0x64617461)
Subchunk2size 4bytes Data size, that is, the size of the sampled data followed by.
Data N bytes Real Sound data

For data blocks, depending on the number of channels and the sample rate, the layout is as follows (each column represents 8bits):

1). 8 Bit Mono:

Sampling 1 Sampling 2
Data 1 Data 2

2). 8 Bit Dual Channel

Sampling 1 Sampling 2
Channel 1 Data 1 Channel 2 Data 1 Channel 1 Data 2 Channel 2 Data 2

3). Single Bit Mono:

Sampling 1 Sampling 2
Data 1 Low byte Data 1 High-byte Data 1 Low byte Data 1 High-byte

4). Two-Bit dual channel

Sampling 1
Channel 1 data 1 Low byte Channel 1 Data 1 high-byte Channel 2 data 1 Low byte Channel 2 Data 1 high-byte
Sampling 2
Channel 1 data 2 low byte Channel 1 Data 2 high-byte Channel 2 data 2 low byte Channel 2 Data 2 high-byte

Let's look at a specific example of WAV audio file as follows: (16 binary form)

74 20 10 00 00 00 01 00 02 00 22 56 00 00 88 58 01 00 04 00 10 00 64 61 74 61 00 from xx to XX (XX) 1e 3c 3c, all of them

The corresponding analysis is as follows:

Analyze data For example: The shape of ' FFFF ' is a complete data we need. such as the SAMPLE3:3C and 13 are two numbers together is a number we need, 3c 13, but the right end is big, then 3c 13, hexadecimal number 3c bitwise conversion to 2 0011, the same 1100 bitwise conversion to 13 binary 2 0011, then the binary number of the connected 16bits is 0011 1100 0001 0011, then we can see that the sign bit is 0, that is, positive.

The Wavread () function in MATLAB
    1. Wavread (' Testwav.wav ')

      Readers try the output. For example, take one of my sound files ' testwav.wav ' and output the last 10 data as:

-0.0001-0.0001-0.0002-0.0003-0.0002-0.0002-0.0002-0.0003-0.0002-0.0002

2. Wavread (' testwav.wav ', ' native ')

Readers can try out the output. The last 10 data for my ' testwav.wav ' output are:

-4-2-8-9-7-8 -8-11-5-7

The conversion equation between the output data of 1 and 2 is:-0.0002 = -7/32768 (where 32768 = 2 ^15, or 2 to 15 power. This is normalization. Because the encoding is 16bits)

C + + implementation of three Readwav

As described above, we come to the topic, how to use C + + to implement the Wavread (' testwav.wav ') function in Matlab, and the output is consistent.

3.1 Encoding conversion rules

Before we introduce, we need to understand the relationship between these strings of data. This chapter analyses the data of the Test.wav file as an example:

(1) The data block of the wave file, which is the last 20 of the raw sampled data, is:

FC FF FE FF F8 FF F7 FF F9 ff F8 ff F8 ff F5 FF FB FF F9 FF

(2) The last 10 data parsed in MATLAB are:

-0.0001-0.0001-0.0002-0.0003-0.0002-0.0002-0.0002-0.0003-0.0002-0.0002

The two sets of data between the original code and the complement of the relationship, that is (1) is the original code and (2) is the complement.

The step of converting from data (1) to Data (2) is to convert (1) to its complement, then the complement by 32768, then get (2).

The principle of conversion between the original code and the complement:

(Conversion in 2 binary form): Wakahara code is a positive number, then the complement is its own. The Wakahara code is negative, then the complement is the sign bit, the value bit is reversed, plus 1.

(Conversion in numerical form): The Wakahara code is a positive number, then the complement is itself. Wakahara code is negative, complement = original code-2^16. Warm tip: In order to facilitate the calculation of the value of equivalent substitution 2^16 = FFFF-1.

For a better understanding, illustrate:

Step One (16 bytes per read): Because the data is data from X0000 to XFFFF. Take F9 ff For example, the right end is big, in other words, the right end is high, then it should be fff9. Step two (convert to complement): The bitwise conversion to binary form is 1111 1111 1111 1001 (1-bit 16 binary value corresponds to 4-bit binary value), the data is the original code, converted to a signed decimal form, first look at the sign bit to judge it as negative, then the complement is Fff9-ffff-1 =-7. Step Three (Normalization): Use the complement value-7 divided by 32768, take the decimal point 4 bits (rounded), then equals-0.0002, correct.

The reader can try my method to calculate the 3rd 4th number in the right of (1), whether it corresponds to the 2nd number of the right (2).

3.2 C + + implementation

Then the C + + implementation, is to read the original sample data, read 16 bytes each time, and then convert 16 bytes of 16 binary numbers into decimal numbers, and then converted to its complement, and normalized. Note the size end and symbol issues when converting.

Specific C + + code, I have shared, readers can see: http://www.oschina.net/code/snippet_1768500_39013

Reference documents

1. http://www.cnblogs.com/liyiwen/archive/2010/04/19/1715715.html

C + + implementation of the Wavread function of "WAV audio parsing"

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.