Using VC++ to call the ACM audio programming interface to compress wave audio

Introduction

Audio and video are the main means by which multimedia applications present information to users. Audio and video data generally have high sampling rates, and the raw data only has practical value after compression; otherwise it not only occupies a large amount of storage space but is also inefficient to play back or transmit over a network. Digital compression encoding of audio and video is therefore widely used in multimedia applications. This article deals with audio encoding and compression.

There are many audio compression methods, for example the low-delay code-excited linear prediction (LD-CELP) coding of the ITU-T G.728 speech coding standard, the PCM (pulse code modulation) coding of the ITU-T G.711 speech coding standard, and the familiar speech coding standards used by GSM digital cellular phones. These methods differ in compression ratio and in the quality of the reconstructed sound, and their encoding formats and algorithms differ considerably. Most of the standards are complex, and it is difficult for an ordinary program to implement their compression and decompression algorithms itself. However, Windows 98, which provides strong multimedia support, introduces the ACM (Audio Compression Manager) and VCM (Video Compression Manager). They manage all of the audio and video codecs (coder-decoders, i.e. the drivers that encode and decode audio and video data) in the system. Through the programming interfaces they provide, you can call the codecs already installed in the system to compress and decompress audio data. The audio codecs shipped with Windows 98 support some early audio compression standards such as ADPCM (adaptive differential pulse code modulation), while applications such as Internet Explorer 5.0 install codecs that support newer compression standards such as MPEG Layer 3. This article describes how to program against the ACM audio compression interface; the programming tool used is Microsoft Visual C++ 6.0.

  Implementation

Although a codec can in theory be used to compress or decompress any data stream, in practice codecs are designed for specific data types in order to achieve a higher compression ratio, higher fidelity, or real-time compression performance. For example, the method that yields the best compression ratio for video will not necessarily achieve the same effect when applied to audio data.

The main purpose of compressing audio data is to reduce the amount of data needed to store a sound sequence. Less data means the sound occupies less space and can be transmitted faster over a modem connection. If the data is compressed into a common format supported by Windows, it can be played back directly without being decompressed manually: the system uses its own codecs to decompress the data during playback. Windows 98 ships with several standard codecs, such as the TrueSpeech codec from DSP Group, Inc., so any program we write for Windows 98 can use them. The codecs installed in the system are listed on the "Devices" tab of the "Multimedia" applet in the Control Panel.

A codec supports conversion from a source audio format to a target format. In practice, some codecs cannot convert the source format to the target format directly. For example, if we record PCM data at 11025 Hz, 8 bits, mono through a microphone and hand it to the system's TrueSpeech codec, the conversion fails, because this codec only accepts 8 kHz, 16-bit, mono PCM. The solution is a two-step conversion: convert the source format to an intermediate format, then convert the intermediate format to the target format. Because linear PCM is the simplest encoding and is supported by the vast majority of codecs, the intermediate format is generally chosen to be linear PCM. For example, you can first convert the original data to the intermediate PCM format supported by the TrueSpeech codec, and then convert that to the final compressed format through the TrueSpeech codec.

  Program Design and Implementation

The ACM API functions are declared in the header file msacm.h. Besides including this header in the project, ACM programming also requires the headers mmsystem.h and mmreg.h, which define the basic constants and data structures of multimedia programming. To avoid calling functions that are only provided by later ACM versions and are therefore unavailable on the user's machine, the program should call acmGetVersion to query the ACM version installed on the user's machine.
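As an illustration, the following is a minimal sketch of the required headers and a version check of the kind described above; the import libraries and the minimum version tested for here are assumptions, not taken from the article:

#include <windows.h>
#include <mmsystem.h>
#include <mmreg.h>
#include <msacm.h>
#pragma comment(lib, "msacm32.lib")   // ACM import library (assumed)
#pragma comment(lib, "winmm.lib")

BOOL CheckAcmVersion(void)
{
    DWORD dwVersion = acmGetVersion();
    // The major/minor version is packed into the high-order word:
    // major in the high byte, minor in the low byte.
    return HIWORD(dwVersion) >= 0x0300;   // require ACM 3.x or later (assumed minimum)
}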

Although the information about an audio codec can be obtained manually from the Control Panel, an application often needs to determine programmatically whether a particular codec exists in the system and obtain its parameters. The callback function find_format_enum below enumerates the audio compression formats in the system:

BOOL CALLBACK find_format_enum(HACMDRIVERID hadid, LPACMFORMATDETAILS pafd, DWORD dwInstance, DWORD fdwSupport)
{
    FIND_DRIVER_INFO* pdi = (FIND_DRIVER_INFO*)dwInstance;
    if (pafd->dwFormatTag == (DWORD)pdi->wFormatTag) {
        pdi->hadid = hadid;
        return FALSE;   // stop enumeration
    }
    return TRUE;        // continue enumeration
}
FIND_DRIVER_INFO, used in the callback function, is a custom data structure. Its two members hold the ACM driver ID handle and the format tag of the data format to be converted:
typedef struct {
    HACMDRIVERID hadid;
    WORD wFormatTag;
} FIND_DRIVER_INFO;
Now we can enumerate all the drivers in the system. The enumeration functions we call use callback functions to report the data of each driver; this is a common technique in Windows programming. To get more details about a driver's capabilities you must load it and open it by calling acmDriverOpen. Once the driver is open, you can ask it to enumerate the wave data formats it supports. There is, however, one catch: all wave format descriptions are based on the WAVEFORMATEX structure, and many formats use an extended form of this structure to store format-specific information. To enumerate all formats we need to know how much space to allocate so the driver can fill the structure with its details. Passing ACM_METRIC_MAX_SIZE_FORMAT to the acmMetrics function (which can return useful metrics about many kinds of ACM objects) yields the maximum size of the structure; the formats themselves are then enumerated with acmFormatEnum. The main code for this process is as follows:
BOOL CALLBACK find_driver_enum(HACMDRIVERID hadid, DWORD dwInstance, DWORD fdwSupport)
{
    ......
    MMRESULT mmr = acmDriverOpen(&had, hadid, 0);
    // enumerate the formats the driver supports
    ......
    mmr = acmMetrics((HACMOBJ)had, ACM_METRIC_MAX_SIZE_FORMAT, &dwSize);
    if (dwSize < sizeof(WAVEFORMATEX)) dwSize = sizeof(WAVEFORMATEX);
    WAVEFORMATEX* pwf = (WAVEFORMATEX*)malloc(dwSize);
    ......
    pwf->cbSize = LOWORD(dwSize) - sizeof(WAVEFORMATEX);
    pwf->wFormatTag = pdi->wFormatTag;   // pdi is the FIND_DRIVER_INFO* passed in via dwInstance
    ACMFORMATDETAILS fd;
    ......
    fd.cbStruct = sizeof(fd);
    fd.pwfx = pwf;
    fd.cbwfx = dwSize;
    fd.dwFormatTag = pdi->wFormatTag;
    mmr = acmFormatEnum(had, &fd, find_format_enum, (DWORD)(VOID*)pdi, 0);   // enumerate the formats
    ......
    acmDriverClose(had, 0);   // close the driver
    ......
}
To find the ACM driver corresponding to a given format, use the ACM API function acmDriverEnum to enumerate all installed audio codecs, passing the callback function find_driver_enum described above as a parameter. The callback examines each codec in turn and finally yields the handle (HACMDRIVERID) of the matching driver. In this article the whole lookup is wrapped in a helper function named find_driver, which is used below.
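The article does not list find_driver itself; the following is a minimal sketch of one plausible implementation, reusing the FIND_DRIVER_INFO structure and the find_driver_enum callback shown above:

HACMDRIVERID find_driver(WORD wFormatTag)
{
    FIND_DRIVER_INFO di;
    di.hadid = NULL;
    di.wFormatTag = wFormatTag;
    // acmDriverEnum walks every installed codec; find_driver_enum opens each
    // driver, looks for the requested format tag, and stores any match in di.
    acmDriverEnum(find_driver_enum, (DWORD)(VOID*)&di, 0);
    return di.hadid;   // NULL if no matching codec was found
}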

Before converting the original wave audio data to the intermediate PCM format, some preparation is needed: fill in a WAVEFORMATEX structure for each of the source format, the intermediate PCM format, and the final compressed format. First, fill in the WAVEFORMATEX structure describing the source data format:

WAVEFORMATEX wfSrc;
memset(&wfSrc, 0, sizeof(wfSrc));
wfSrc.cbSize = 0;
wfSrc.wFormatTag = WAVE_FORMAT_PCM;   // PCM (pulse code modulation)
wfSrc.nChannels = 1;                  // mono
wfSrc.nSamplesPerSec = 11025;         // 11.025 kHz
wfSrc.wBitsPerSample = 8;             // 8 bits per sample
wfSrc.nBlockAlign = wfSrc.nChannels * wfSrc.wBitsPerSample / 8;
wfSrc.nAvgBytesPerSec = wfSrc.nSamplesPerSec * wfSrc.nBlockAlign;
Then use the helper function find_driver described above to obtain the driver ID of the codec that handles the compression format specified by wFormatTag. Here the TrueSpeech codec that ships with Windows 98 is selected via WAVE_FORMAT_DSPGROUP_TRUESPEECH:
WORD wFormatTag = WAVE_FORMAT_DSPGROUP_TRUESPEECH;
HACMDRIVERID hadid = find_driver(wFormatTag);
With the driver selected, one WAVEFORMATEX structure is created for the compressed data format the driver ultimately produces, and another for the intermediate PCM format the driver accepts as input:
WAVEFORMATEX* pwfDrv = get_driver_format(hadid, wFormatTag);   // details of the driver's compressed format
The bits per sample of the driver's format are stored in the wBitsPerSample member of the structure pointed to by pwfDrv, and its sample rate in nSamplesPerSec. A very similar call obtains the PCM format the driver supports:
WAVEFORMATEX* pwfPCM = get_driver_format(hadid, WAVE_FORMAT_PCM);
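get_driver_format is likewise not listed in the article; the sketch below is one possible implementation, assumed here rather than taken from the source. It opens the driver, asks how large the format structure may be, and uses the ACM_FORMATENUMF_WFORMATTAG flag so that acmFormatEnum only reports formats with the requested tag; the first match is left in the buffer, which the caller must free:

static BOOL CALLBACK copy_format_enum(HACMDRIVERID hadid, LPACMFORMATDETAILS pafd, DWORD dwInstance, DWORD fdwSupport)
{
    *(BOOL*)dwInstance = TRUE;   // remember that a match was found
    return FALSE;                // pafd->pwfx already points at our buffer
}

WAVEFORMATEX* get_driver_format(HACMDRIVERID hadid, WORD wFormatTag)
{
    HACMDRIVER had = NULL;
    if (acmDriverOpen(&had, hadid, 0) != MMSYSERR_NOERROR) return NULL;

    DWORD dwSize = 0;
    acmMetrics((HACMOBJ)had, ACM_METRIC_MAX_SIZE_FORMAT, &dwSize);
    if (dwSize < sizeof(WAVEFORMATEX)) dwSize = sizeof(WAVEFORMATEX);

    WAVEFORMATEX* pwf = (WAVEFORMATEX*)malloc(dwSize);
    memset(pwf, 0, dwSize);
    pwf->cbSize = (WORD)(LOWORD(dwSize) - sizeof(WAVEFORMATEX));
    pwf->wFormatTag = wFormatTag;

    ACMFORMATDETAILS fd;
    memset(&fd, 0, sizeof(fd));
    fd.cbStruct = sizeof(fd);
    fd.pwfx = pwf;
    fd.cbwfx = dwSize;
    fd.dwFormatTag = wFormatTag;

    BOOL bFound = FALSE;
    acmFormatEnum(had, &fd, copy_format_enum, (DWORD)(VOID*)&bFound,
                  ACM_FORMATENUMF_WFORMATTAG);   // only formats carrying this tag
    acmDriverClose(had, 0);

    if (!bFound) { free(pwf); return NULL; }
    return pwf;
}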
After obtaining all the required information, the conversion itself can begin. ACM performs conversions through an object called a stream: we open the stream, hand it the source and destination formats, and ask it to convert. The first step is the conversion to the intermediate PCM format.

  Convert the wave audio to the PCM format supported by the codec

Any driver capable of converting between PCM formats can be used to convert the source wave audio to the PCM format supported by the codec. It is important to specify the ACM_STREAMOPENF_NONREALTIME flag when opening the conversion stream. If this flag is omitted, some drivers (such as the TrueSpeech codec) report error 512 ("not possible"), which means the requested conversion cannot be performed in real time; keep this in mind if you ever want to convert a large amount of data while playing it at the same time. The conversion process in outline:

mmr = acmStreamOpen(&hstr,
    NULL,       // any driver
    &wfSrc,     // source format
    pwfPCM,     // destination format
    NULL,       // no filter
    NULL,       // no callback
    0,          // no instance data
    ACM_STREAMOPENF_NONREALTIME);
The size of the output buffer is calculated from the average byte rate, with a little extra added; without this extra space the IMA ADPCM driver fails to convert. The intermediate conversion result is stored in pDst1Data:
DWORD dwSrcBytes = dwSrcSamples * wfSrc.wBitsPerSample / 8;
DWORD dwDst1Samples = dwSrcSamples * pwfPCM->nSamplesPerSec / wfSrc.nSamplesPerSec;
DWORD dwDst1Bytes = dwDst1Samples * pwfPCM->wBitsPerSample / 8;
unsigned char* pDst1Data = new unsigned char[dwDst1Bytes];
......
ACMSTREAMHEADER strhdr;               // fill in the conversion information
memset(&strhdr, 0, sizeof(strhdr));
strhdr.cbStruct = sizeof(strhdr);
strhdr.pbSrc = cpBuf;                 // the source wave audio data to convert is in cpBuf
strhdr.cbSrcLength = dwSrcBytes;
strhdr.pbDst = pDst1Data;
strhdr.cbDstLength = dwDst1Bytes;
mmr = acmStreamPrepareHeader(hstr, &strhdr, 0);
mmr = acmStreamConvert(hstr, &strhdr, 0);   // convert the data
......
acmStreamClose(hstr, 0);
When the stream is opened with NULL as the second parameter, any suitable driver may perform the conversion. The only mildly complicated part is calculating how large a buffer to allocate for the output data; since a conversion between PCM formats involves no compression or decompression, the size can be computed directly. As for the ACM API function acmStreamPrepareHeader, calling it before the conversion lets the driver arrange everything it needs, such as locking memory.
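If you prefer not to compute the destination buffer size by hand, the ACM can also be asked for a recommended size once the stream is open. This is not the method the article uses; it is a small sketch based on the acmStreamSize API, reusing the hstr and dwSrcBytes variables from the code above:

// Ask the open stream how large a destination buffer is needed for
// dwSrcBytes bytes of source data; the driver reports a worst-case size.
DWORD dwSuggestedDstBytes = 0;
mmr = acmStreamSize(hstr, dwSrcBytes, &dwSuggestedDstBytes, ACM_STREAMSIZEF_SOURCE);
if (mmr == MMSYSERR_NOERROR && dwSuggestedDstBytes > 0) {
    // allocate the destination buffer with dwSuggestedDstBytes instead
}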

  Generate the final compression format

The conversion to the final compressed format is very similar to the PCM conversion above, except that the handle of the driver to be used is supplied when the stream is opened. NULL would in fact still work here, since we already know the driver exists, but providing the handle saves the system the time it would spend searching for the driver:

mmr = acmStreamOpen(&hstr,
    had,        // driver handle
    pwfPCM,     // source format
    pwfDrv,     // destination format
    NULL,       // no filter
    NULL,       // no callback
    0,          // no instance data
    ACM_STREAMOPENF_NONREALTIME);
Calculating the buffer size for the compressed data is harder. The nAvgBytesPerSec field in the WAVEFORMATEX structure gives the average number of bytes read per second during playback, and we can use it to estimate how much space the compressed wave will need. Because the figure some drivers provide really is an average rather than a worst-case value, I add 50% extra space. This wastes a little memory in practice, but it is reliable:
DWORD dwDst2Bytes = pwfDrv->nAvgBytesPerSec * dwDst1Samples / pwfPCM->nSamplesPerSec;
dwDst2Bytes = dwDst2Bytes * 3 / 2;
unsigned char* pDst2Data = new unsigned char[dwDst2Bytes];
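The second conversion itself is elided in the article. By analogy with the PCM step above it would look roughly like the following sketch, where strhdr2 is the header whose cbDstLengthUsed is examined below:

ACMSTREAMHEADER strhdr2;
memset(&strhdr2, 0, sizeof(strhdr2));
strhdr2.cbStruct = sizeof(strhdr2);
strhdr2.pbSrc = pDst1Data;            // the intermediate PCM data is now the source
strhdr2.cbSrcLength = dwDst1Bytes;
strhdr2.pbDst = pDst2Data;
strhdr2.cbDstLength = dwDst2Bytes;
mmr = acmStreamPrepareHeader(hstr, &strhdr2, 0);
mmr = acmStreamConvert(hstr, &strhdr2, 0);   // run the compression
acmStreamUnprepareHeader(hstr, &strhdr2, 0);
acmStreamClose(hstr, 0);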
The unsigned char pointer pDst2Data holds the final compressed wave audio data; its estimated size is stored in dwDst2Bytes. Once the conversion completes, the cbDstLengthUsed field of the ACMSTREAMHEADER structure gives the number of bytes of the buffer actually used, from which the compression ratio can be calculated:
double result = (double)dwSrcBytes / (double)strhdr2.cbDstLengthUsed;
For a source signal of one second of 8 kHz, 16-bit, mono PCM audio, compressed with the TrueSpeech codec that ships with Windows 98, the compression ratio obtained is quite satisfactory.

  Summary:

This article has used the TrueSpeech codec as an example to show how to compress wave audio through the ACM audio compression programming interface. If you have a compression format of your own, you can also create and install your own codec; the approach is similar. Once the programming ideas above are understood, the corresponding decompression program can be written with only slight changes to the code. The program was compiled with Microsoft Visual C++ 6.0 under Windows 2000 Professional.
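As a hint for the decompression direction mentioned above, the essential change is simply to swap the source and destination formats when opening the stream; a minimal sketch, using the same variable names as the compression code (buffers and headers are set up in the same way):

mmr = acmStreamOpen(&hstr,
    had,        // TrueSpeech driver handle
    pwfDrv,     // source: the compressed TrueSpeech format
    pwfPCM,     // destination: the PCM format the driver supports
    NULL,       // no filter
    NULL,       // no callback
    0,          // no instance data
    ACM_STREAMOPENF_NONREALTIME);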

 
 
