C language parsing WAV audio files

Source: Internet
Author: User
Code for this article is on GitHub: github.com/casterwx/c-wave-master

Contents
    • Objective
    • Understanding WAV Audio Files
    • What is a binary file
    • Binary format parsing of WAV
    • C language parsing WAV audio files
    • Two details
    • Summary

A computer holds many kinds of files: EXE executables, JPG images, and the TXT files and C, CPP, or PHP source files we see every day.

If you open these files in Notepad or another plain text editor, files of the first kind (EXE, JPG) show up as garbage, that is, characters no human can read, while source code and TXT files show up as readable strings.

If we classify files on this basis, the EXE and JPG files that look like an alien script are called binary files, and the readable ones are called text files.

The name "text file" is easy to accept; after all, we can read the text. But why is the garbled kind called a binary file? How does the computer make sense of that garbage and turn it into music or lifelike pictures? And can those of us who program write code to interpret the data ourselves? This article walks through the answers step by step.


We'll take a step-by-step look at a few basic C library functions and use them to parse a WAV audio file and extract its metadata (that is, the properties of the audio file). You should have basic computer fundamentals and know some C; an interest in audio or signal processing helps.

Understanding WAV Audio Files

Here is the explanation from Baidu Encyclopedia:

WAV is a sound file format developed by Microsoft that follows the RIFF (Resource Interchange File Format) specification and is used to store audio on the Windows platform. It is widely supported by Windows and its applications. The format supports several compression algorithms, such as MS ADPCM and CCITT A-law, and a range of bit depths, sampling frequencies, and channel counts. A standard WAV file uses the same parameters as CD audio, a 44.1 kHz sampling frequency and 16-bit quantization, so its sound quality is close to that of a CD. WAV files can be opened with Windows Media Player.
Sound is typically described by three parameters: quantization bit depth, sampling frequency, and the amplitude of each sample point. Bit depth is 8, 16, or 24 bits. Channels are mono or stereo: mono amplitude data forms an n x 1 matrix of sample points and stereo an n x 2 matrix. Common sampling frequencies are 11025 Hz (11 kHz), 22050 Hz (22 kHz), and 44100 Hz (44 kHz). The sound quality is excellent, but the uncompressed files are large, which is a disadvantage compared with other audio formats. The file size is calculated as: size in bytes = (sampling frequency x bit depth x channels) x time in seconds / 8 (1 byte = 8 bits). At CD quality, each minute of WAV audio takes about 10 MB, and the size does not change with the volume or clarity of the sound.
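The size formula in the quote can be checked with a few lines of C. This is only a sketch; the function name pcm_size_bytes and its parameter names are made up for illustration and are not part of the article's repository.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of uncompressed PCM audio:
 * (sampling frequency x bit depth x channels) x seconds / 8.
 * Hypothetical helper, not from the original project. */
uint64_t pcm_size_bytes(uint32_t sample_rate, uint16_t bits_per_sample,
                        uint16_t channels, uint32_t seconds)
{
    assert(bits_per_sample % 8 == 0);   /* 8, 16 or 24 bits per sample */
    return (uint64_t)sample_rate * bits_per_sample * channels * seconds / 8;
}
```

For one minute of CD-quality stereo, pcm_size_bytes(44100, 16, 2, 60) gives 10,584,000 bytes, which is where the "about 10 MB per minute" figure comes from.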

When we download songs in a music player we see parameters such as bit rate, for example 128k for normal quality and 320k for high quality, as well as lossless formats such as APE and FLAC. Audio format conversion tools expose further parameters, such as sample rate, quantization precision, and whether the audio is mono or stereo.

Most of the music we listen to today is in MP3 format, and outside of the Windows sound recorder WAV is rarely seen, so why use it as the example? Because a WAV file is essentially uncompressed raw audio and its file structure is not very complex, which makes it a good format for beginners to learn from. You can apply the same approach to other formats yourself.

What is a binary file

A binary file is, in essence, any file whose content is stored as raw bytes rather than as readable text. As noted earlier, Notepad shows such a file as garbage, so to analyze it we use a hex editor such as UltraEdit, HxD, or C32Asm. For example, I used HxD to open the Windows 7 shutdown sound (C:\Windows\Media\Windows Shutdown.wav). The left pane shows the bytes of the WAV file as hexadecimal numbers, and the right pane shows each byte as an ASCII character. Bytes such as 00 have no printable ASCII character, so the right pane shows them as dots. And what do numbers like 52 49 on the left correspond to? They are the ASCII codes of 'R' and 'I', the first letters of "RIFF". Data that looks like garbage in fact follows a specification, and once we, or an application on our computer, know that specification, we can interpret the data.
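To make the hex editor view less mysterious, here is a rough sketch of how one row of such a display could be produced. The function name hex_row and the "|" separator are assumptions chosen for this example; real hex editors format their rows in their own way.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Format up to 16 bytes like one row of a hex editor:
 * hex pairs on the left, printable ASCII (or '.') on the right.
 * out must have room for at least 66 characters. Hypothetical helper. */
void hex_row(const unsigned char *bytes, size_t n, char *out)
{
    assert(n <= 16);
    char *p = out;
    for (size_t i = 0; i < n; i++)
        p += sprintf(p, "%02X ", bytes[i]);        /* left pane */
    *p++ = '|';
    for (size_t i = 0; i < n; i++)
        *p++ = isprint(bytes[i]) ? (char)bytes[i] : '.';  /* right pane */
    *p = '\0';
}
```

Given the six bytes 52 49 46 46 00 01, the row reads "52 49 46 46 00 01 |RIFF..": the first four bytes are letters, and the last two have no printable character, so they appear as dots.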

Binary format parsing of WAV

According to various sources online, a WAV file is essentially a RIFF file, which can be viewed as a tree (a kind of data structure).

The usual layout diagram of a WAV file lists the fields from top to bottom in order of their offset from the start of the file. Each cell is one field, and the field size gives its length in bytes; from a field's offset and size we can compute the starting offset of the next field.
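The rule "next offset = current offset + field size" can be sketched in C as follows. The field sizes are taken from the struct definitions that appear later in this article; the names field_sizes and field_offset are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Field sizes in bytes of the canonical PCM WAV header, in file order:
 * ChunkID, ChunkSize, Format, Subchunk1ID, Subchunk1Size, AudioFormat,
 * NumChannels, SampleRate, ByteRate, BlockAlign, BitsPerSample,
 * Subchunk2ID, Subchunk2Size. */
static const size_t field_sizes[13] = {4, 4, 4, 4, 4, 2, 2, 4, 4, 2, 2, 4, 4};

/* Offset of field `index` = sum of the sizes of all fields before it. */
size_t field_offset(size_t index)
{
    assert(index <= 13);   /* index 13 is where the sample data begins */
    size_t offset = 0;
    for (size_t i = 0; i < index; i++)
        offset += field_sizes[i];
    return offset;
}
```

With these sizes, AudioFormat (index 5) lands at offset 20, Subchunk2ID (index 11) at offset 36, and the audio samples start at offset 44.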

Next, let's go through what each field means. Under the RIFF specification, the whole WAV file is one top-level chunk whose ChunkID is "RIFF", which explains why the hex view earlier showed the file starting with the letters RIFF. The next field, ChunkSize, is the size of everything in the chunk after this field. Seen as a tree, each child chunk (sub-chunk) is a branch. Format holds the actual data of this chunk, the constant "WAVE".

Put simply, a chunk has three parts: an identifier saying what the chunk is, a size saying how much content the chunk holds (so a program knows how far to skip to reach the next chunk), and the content itself.
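The three-part structure is what lets a program hop from chunk to chunk. Below is a minimal sketch that searches a file already loaded into memory for a sub-chunk with a given identifier. The names read_le32 and find_chunk are invented for this example. It relies on two facts about RIFF that the text above does not spell out: sizes are stored in little-endian byte order, and a chunk with an odd content size is followed by one padding byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian integer from a byte buffer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Search the sub-chunks after the 12-byte RIFF header for `id`.
 * Returns the offset of the chunk's content, or 0 if not found;
 * on success *size receives the content length. Hypothetical helper. */
size_t find_chunk(const unsigned char *buf, size_t len,
                  const char id[4], uint32_t *size)
{
    assert(buf != NULL && size != NULL);
    size_t pos = 12;                 /* skip "RIFF", ChunkSize, "WAVE" */
    while (pos + 8 <= len) {
        uint32_t n = read_le32(buf + pos + 4);   /* part 2: content size */
        if (memcmp(buf + pos, id, 4) == 0) {     /* part 1: identifier */
            *size = n;
            return pos + 8;                      /* part 3: content */
        }
        if (n > len - pos - 8)
            break;                   /* size field points past the buffer */
        pos += 8 + (size_t)n + (n & 1);  /* odd sizes carry one pad byte */
    }
    return 0;
}
```

Calling find_chunk(buf, len, "data", &size) skips over any chunks that come before the audio data, which the fixed 44-byte struct approach used later in this article cannot do.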

With the top-level chunk covered, let's look at its sub-chunks. The first sub-chunk in a WAV file has the constant Subchunk1ID "fmt " (with a trailing space), indicating that it holds the metadata of the WAV audio, that is, its format information. AudioFormat is typically 1, meaning the audio is encoded as PCM. NumChannels is the number of channels. SampleRate is the sampling rate, and ByteRate is the number of bytes per second, equal to SampleRate * NumChannels * BitsPerSample/8. BlockAlign is the size in bytes of one block, equal to NumChannels * BitsPerSample/8; what a block is and where this formula comes from will become clear when we look at the other sub-chunk. BitsPerSample is the number of bits per sample, also called quantization precision or PCM bit width.

The other sub-chunk has the constant Subchunk2ID "data" and holds the actual audio of the WAV file or, in more technical terms, the audio sample data. If the audio has two channels, the data captured at one sampling instant consists of a left-channel sample and a right-channel sample, and we call this combined unit a block. That makes the earlier formula BlockAlign = NumChannels * BitsPerSample/8 easy to understand. The division by 8 is there because a byte is 8 bits, so dividing the number of bits by 8 gives the number of bytes.
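The two derived fields can be written out as small functions. This is an illustrative sketch; the function names are hypothetical, and the formulas are the ones given in the struct comments later in this article.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one block: one sample for every channel. */
uint16_t block_align(uint16_t num_channels, uint16_t bits_per_sample)
{
    assert(bits_per_sample % 8 == 0);   /* whole bytes per sample */
    return (uint16_t)(num_channels * bits_per_sample / 8);
}

/* Bytes of audio per second: one block per sampling instant. */
uint32_t byte_rate(uint32_t sample_rate, uint16_t num_channels,
                   uint16_t bits_per_sample)
{
    return sample_rate * block_align(num_channels, bits_per_sample);
}
```

For 16-bit stereo at 44100 Hz, a block is 4 bytes (2 bytes for the left sample, 2 for the right) and the byte rate is 176400 bytes per second.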

As for the length of the audio, divide Subchunk2Size by ByteRate: the total size of the audio data divided by the number of bytes per second gives the duration in seconds.
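A small sketch of that division, using floating point so that fractions of a second are kept (integer division would truncate a 2.9 second clip to 2). The function name duration_seconds is an assumption for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Duration in seconds = size of the audio data / bytes per second. */
double duration_seconds(uint32_t subchunk2_size, uint32_t byte_rate)
{
    assert(byte_rate != 0);   /* a zero ByteRate means a broken header */
    return (double)subchunk2_size / byte_rate;
}
```

For example, 529200 bytes of data at 176400 bytes per second is 3 seconds of audio.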

C language parsing WAV audio files

After all that background, the question is how to write a program that reads the metadata described above. The basic C functions for working with binary files include fopen and fread. (We are dealing with binary file functions here, so we won't discuss fgets, which is meant for ordinary text files.)

fread reads data from a file stream. It reads up to count items of size bytes each and returns the number of items actually read, which is less than count (possibly 0) if an error occurs or the end of the file is reached first.

Its function prototype is

size_t fread(void *buffer, size_t size, size_t count, FILE *stream);
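Because fread reports how many items it read, the return value should be compared with the requested count. The sketch below asks for one item of 4 bytes and checks whether a stream begins with "RIFF". The function name starts_with_riff is made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the stream starts with the 4 bytes "RIFF", 0 otherwise.
 * Hypothetical helper. */
int starts_with_riff(FILE *stream)
{
    assert(stream != NULL);
    char id[4];
    /* One item of 4 bytes: fread returns 1 only if all 4 bytes arrive. */
    if (fread(id, sizeof id, 1, stream) != 1)
        return 0;                       /* short file or read error */
    return memcmp(id, "RIFF", 4) == 0;
}
```

Requesting a single 4-byte item (size 4, count 1) rather than four 1-byte items makes the check a simple "got it or not" test; either form is valid.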

C also has a type called struct, whose members are laid out in memory in declaration order. Since we already know the order of the fields in a WAV file and what each one means, we can define a struct in advance that mirrors the WAV file structure and declare a variable of that type in main. Passing the address of that variable as the first argument of fread fills the struct with the file's bytes in order, and we can then read the metadata directly from the struct's members. (This relies on the compiler inserting no padding between members, which holds for the layout used here on common platforms because every field is naturally aligned.)
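Reading a file straight into a struct only works if the struct's in-memory layout matches the file byte for byte. A sketch of how that assumption could be checked is shown below. WavHeader is a hypothetical flattened version of the structs used in this article, and check_layout is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat version of the article's RIFF, fmt and data structs. */
typedef struct {
    char     ChunkID[4];
    uint32_t ChunkSize;
    char     Format[4];
    char     Subchunk1ID[4];
    uint32_t Subchunk1Size;
    uint16_t AudioFormat;
    uint16_t NumChannels;
    uint32_t SampleRate;
    uint32_t ByteRate;
    uint16_t BlockAlign;
    uint16_t BitsPerSample;
    char     Subchunk2ID[4];
    uint32_t Subchunk2Size;
} WavHeader;

/* Abort if the compiler inserted padding that breaks the file layout. */
void check_layout(void)
{
    assert(sizeof(WavHeader) == 44);
    assert(offsetof(WavHeader, AudioFormat) == 20);
    assert(offsetof(WavHeader, SampleRate) == 24);
    assert(offsetof(WavHeader, Subchunk2ID) == 36);
}
```

Calling check_layout() once at program start catches a mismatched layout early. Note that the struct approach also assumes a little-endian machine, since that is the byte order WAV files use for their integer fields.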

The code is as follows:


#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include "wave.h"

int main(void)
{
    FILE *fp = NULL;
    Wav wav;
    RIFF_t riff;
    FMT_t fmt;
    Data_t data;

    /* "rb": open for reading in binary mode */
    fp = fopen("test.wav", "rb");
    if (!fp) {
        printf("can't open audio file\n");
        exit(EXIT_FAILURE);
    }

    /* read the whole header into the struct in one call */
    if (fread(&wav, 1, sizeof(wav), fp) != sizeof(wav)) {
        printf("can't read audio file header\n");
        fclose(fp);
        exit(EXIT_FAILURE);
    }
    fclose(fp);

    riff = wav.riff;
    fmt = wav.fmt;
    data = wav.data;

    /* the 4-character IDs are not NUL-terminated, so print them char by char */
    printf("ChunkID \t%c%c%c%c\n", riff.ChunkID[0], riff.ChunkID[1], riff.ChunkID[2], riff.ChunkID[3]);
    printf("ChunkSize \t%lu\n", (unsigned long)riff.ChunkSize);
    printf("Format \t\t%c%c%c%c\n", riff.Format[0], riff.Format[1], riff.Format[2], riff.Format[3]);
    printf("Subchunk1ID \t%c%c%c%c\n", fmt.Subchunk1ID[0], fmt.Subchunk1ID[1], fmt.Subchunk1ID[2], fmt.Subchunk1ID[3]);
    printf("Subchunk1Size \t%lu\n", (unsigned long)fmt.Subchunk1Size);
    printf("AudioFormat \t%d\n", fmt.AudioFormat);
    printf("NumChannels \t%d\n", fmt.NumChannels);
    printf("SampleRate \t%lu\n", (unsigned long)fmt.SampleRate);
    printf("ByteRate \t%lu\n", (unsigned long)fmt.ByteRate);
    printf("BlockAlign \t%d\n", fmt.BlockAlign);
    printf("BitsPerSample \t%d\n", fmt.BitsPerSample);
    printf("blockID \t%c%c%c%c\n", data.Subchunk2ID[0], data.Subchunk2ID[1], data.Subchunk2ID[2], data.Subchunk2ID[3]);
    printf("blockSize \t%lu\n", (unsigned long)data.Subchunk2Size);
    /* duration in whole seconds (integer division truncates) */
    printf("duration \t%lu\n", (unsigned long)(data.Subchunk2Size / fmt.ByteRate));

    return 0;
}


/* wave.h */
#include <stdint.h>

typedef struct WAV_RIFF {
    /* chunk "riff" */
    char ChunkID[4];   /* "RIFF" */
    /* sub-chunk-size */
    uint32_t ChunkSize; /* 36 + Subchunk2Size */
    /* sub-chunk-data */
    char Format[4];    /* "WAVE" */
} RIFF_t;

typedef struct WAV_FMT {
    /* sub-chunk "fmt" */
    char Subchunk1ID[4];   /* "fmt " */
    /* sub-chunk-size */
    uint32_t Subchunk1Size; /* 16 for PCM */
    /* sub-chunk-data */
    uint16_t AudioFormat;   /* PCM = 1*/
    uint16_t NumChannels;   /* Mono = 1, Stereo = 2, etc. */
    uint32_t SampleRate;    /* 8000, 44100, etc. */
    uint32_t ByteRate;  /* = SampleRate * NumChannels * BitsPerSample/8 */
    uint16_t BlockAlign;    /* = NumChannels * BitsPerSample/8 */
    uint16_t BitsPerSample; /* 8bits, 16bits, etc. */
} FMT_t;

typedef struct WAV_data {
    /* sub-chunk "data" */
    char Subchunk2ID[4];   /* "data" */
    /* sub-chunk-size */
    uint32_t Subchunk2Size; /* data size */
    /* sub-chunk-data */
//    Data_block_t block;
} Data_t;

//typedef struct WAV_data_block {
//} Data_block_t;

typedef struct WAV_fotmat {
   RIFF_t riff;
   FMT_t fmt;
   Data_t data;
} Wav;

Execution results

Two details

1. When calling fopen, the mode must be "rb": r stands for read and b for binary, that is, binary read mode. This differs from the way plain text files are usually opened.

2. The struct members use types such as uint32_t rather than the traditional int, short, and so on. This is because different compilers and platforms may allocate different amounts of memory for int. These fixed-width types come from the stdint.h header, so we need to include it at the top.
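Fixed-width types remove the dependence on the size of int, but reading into a struct still depends on the machine's byte order. As an alternative technique (not the one used in this article's code), the fields can be decoded byte by byte from the raw header. The sketch below assumes the canonical 44-byte layout; FmtInfo, le16, le32, and parse_fmt are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode little-endian integers from raw bytes, independent of the
 * host's int size and byte order. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical result type holding a few of the fmt fields. */
typedef struct {
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
} FmtInfo;

/* Pick fields out of a canonical 44-byte header by their file offsets. */
FmtInfo parse_fmt(const unsigned char header[44])
{
    assert(header != NULL);
    FmtInfo info;
    info.channels        = le16(header + 22);   /* NumChannels */
    info.sample_rate     = le32(header + 24);   /* SampleRate */
    info.bits_per_sample = le16(header + 34);   /* BitsPerSample */
    return info;
}
```

In a file, a sample rate of 44100 appears as the bytes 44 AC 00 00, which le32 turns back into 44100 on any platform.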


Summary

In fact, every kind of binary data has its own parsing specification, much like a "protocol" in computer networking. As long as we follow that specification or "protocol", we can read out the information hidden in the file.

We only read the metadata of a WAV audio file; we did not read its data chunk, the digital signal of the actual audio, because that involves digital-to-analog conversion and related topics that are beyond the scope of this article.
