Transferred from: http://shanewfx.github.io/blog/2013/08/14/caprure-audio-on-windows/
Some time ago I was given a task that required capturing the sound card's output signal so that it could be mixed with the microphone's input signal.
Before considering how to implement this, let's look at the three modes in which a computer handles sound:
1) Render mode: playing back (outputting) sound. Common APIs include PlaySound, the waveOutXxx functions, DirectSound, and so on.
2) Capture mode: recording (inputting) sound, i.e. taking audio in through the microphone. Common APIs include the waveInXxx functions.
3) Loopback mode: the one we need here, grabbing the sound that is currently being played through the speakers.
Of these three, render and capture are easy to understand and are directly supported by system APIs. Loopback is the odd one out: on XP the system does not officially support it, in part because loopback recording touches on CD copyright issues.
I had not studied audio technology before, so this time I took a little while to learn about the audio capture technologies on Windows.
The main audio processing tasks are:
Capturing microphone input
Capturing sound card output
Sending audio data to the sound card for playback
Mixing multi-channel audio inputs
1. Audio processing APIs on Windows
On Windows, the commonly used audio processing technologies are:
Wave Series API functions,
DirectSound,
Core Audio.
Of these, Core Audio is only available on Vista and later (Vista included); it is mainly intended to replace the Wave API functions and DirectSound.
Core Audio is also more powerful: it can capture the microphone, capture the sound card's output, and control playback.
The Wave API functions are mainly used to capture microphone input (the waveInXxx functions) and to control sound playback (the waveOutXxx functions).
DirectSound's capabilities are probably about the same as the Wave APIs, perhaps stronger (I haven't used DirectSound, so I'm not sure!).
For better operating system compatibility, the capture module usually captures microphone input with the waveInXxx functions;
On Windows XP there is no direct API for capturing the sound card's output, so doing this on XP is tricky. A common approach is to choose a sound card that supports stereo mix and capture through its mixing channel, but not all sound cards support mixing, so this scheme is not universal.
For a universal solution you can use a virtual sound card and obtain the output data at the driver layer, but that scheme is considerably harder to implement.
On Vista and later systems, such as Windows 7, the Core Audio API functions can capture the sound card's output directly.
As for the mixing module, there is no system API to call directly; the basic approach is to implement a custom mixing algorithm.
2. Capturing microphone input with the waveIn API functions
The API functions involved:
waveInOpen
Opens the audio capture device; on success it returns a device handle that the subsequent APIs use.
The calling module can provide a callback function to receive the captured audio data.
waveInClose
Closes the audio capture device.
After it succeeds, the device handle returned by waveInOpen is no longer valid.
waveInPrepareHeader
Prepares a buffer for the captured audio data.
waveInUnprepareHeader
Cleans up the preparation done by waveInPrepareHeader on a capture buffer.
waveInAddBuffer
Supplies a prepared audio buffer to the capture device.
waveInPrepareHeader must be called on the buffer before this API.
waveInStart
Tells the capture device to start capturing audio data.
waveInStop
Tells the capture device to stop capturing audio data.
Once the device has captured audio data, the callback function set in waveInOpen is invoked.
Its parameters include a message type, and you act according to that type.
A WIM_DATA message means new audio data has been captured, and you can process it as needed.
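The calling sequence above can be sketched as follows. This is a minimal sketch for Windows (link against winmm.lib) with error handling mostly omitted; the 44.1 kHz stereo 16-bit format and the single one-second buffer are illustrative choices, not requirements:

```cpp
// Minimal waveIn capture sketch. Format and buffer size are illustrative.
#include <windows.h>
#include <mmsystem.h>
#include <cstdio>

static DWORD g_bytesCaptured = 0;

static void CALLBACK waveInProc(HWAVEIN, UINT msg, DWORD_PTR,
                                DWORD_PTR param1, DWORD_PTR) {
    if (msg == WIM_DATA) {
        // A buffer has been filled. Real code should hand it off to
        // another thread: MSDN restricts which system calls may be
        // made from inside this callback.
        WAVEHDR* hdr = reinterpret_cast<WAVEHDR*>(param1);
        g_bytesCaptured += hdr->dwBytesRecorded;
    }
}

int main() {
    WAVEFORMATEX fmt = {};
    fmt.wFormatTag      = WAVE_FORMAT_PCM;
    fmt.nChannels       = 2;
    fmt.nSamplesPerSec  = 44100;
    fmt.wBitsPerSample  = 16;
    fmt.nBlockAlign     = fmt.nChannels * fmt.wBitsPerSample / 8;
    fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;

    HWAVEIN hwi;
    waveInOpen(&hwi, WAVE_MAPPER, &fmt,
               (DWORD_PTR)waveInProc, 0, CALLBACK_FUNCTION);

    static char buffer[44100 * 4];            // room for ~1 second
    WAVEHDR hdr = {};
    hdr.lpData         = buffer;
    hdr.dwBufferLength = sizeof(buffer);
    waveInPrepareHeader(hwi, &hdr, sizeof(WAVEHDR));
    waveInAddBuffer(hwi, &hdr, sizeof(WAVEHDR));

    waveInStart(hwi);
    Sleep(1500);                              // capture for a while
    waveInStop(hwi);

    waveInUnprepareHeader(hwi, &hdr, sizeof(WAVEHDR));
    waveInClose(hwi);
    printf("captured %lu bytes\n", g_bytesCaptured);
    return 0;
}
```

In a real capture module you would queue several buffers with waveInAddBuffer and re-add each one after processing it, so capture never stalls waiting for a free buffer.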
3. Capturing sound card output with Core Audio
The interfaces involved are:
IMMDeviceEnumerator
IMMDevice
IAudioClient
IAudioCaptureClient
Main flow:
Create a multimedia device enumerator (IMMDeviceEnumerator)
Get the sound card's device interface (IMMDevice) through the enumerator
Obtain the audio client interface (IAudioClient) through the device interface
Through IAudioClient, query the audio format of the sound card's output, initialize the client, get the size of the output buffer, and start/stop capturing the output
Through the capture client interface (IAudioCaptureClient), read the captured output data and manage its internal buffer
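The flow above maps onto the Core Audio (WASAPI) interfaces roughly like this. A minimal sketch for Vista and later; COM error handling is omitted, and the one-second buffer duration and short polling loop are illustrative:

```cpp
// Minimal WASAPI loopback capture sketch (Vista+, MSVC).
#include <windows.h>
#include <mmdeviceapi.h>
#include <audioclient.h>
#include <cstdio>

int main() {
    CoInitialize(nullptr);

    // 1. Create the multimedia device enumerator.
    IMMDeviceEnumerator* enumerator = nullptr;
    CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                     __uuidof(IMMDeviceEnumerator), (void**)&enumerator);

    // 2. Get the default *render* (playback) device: loopback captures
    //    what is being played, so we open the output endpoint.
    IMMDevice* device = nullptr;
    enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &device);

    // 3. Activate the audio client interface on that device.
    IAudioClient* client = nullptr;
    device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr,
                     (void**)&client);

    // 4. Query the output format and initialize in loopback mode.
    WAVEFORMATEX* fmt = nullptr;
    client->GetMixFormat(&fmt);
    client->Initialize(AUDCLNT_SHAREMODE_SHARED,
                       AUDCLNT_STREAMFLAGS_LOOPBACK,
                       10000000 /* 1 s, in 100-ns units */, 0, fmt, nullptr);

    // 5. Get the capture client and start pulling data.
    IAudioCaptureClient* capture = nullptr;
    client->GetService(__uuidof(IAudioCaptureClient), (void**)&capture);
    client->Start();

    for (int i = 0; i < 100; ++i) {           // poll briefly for the demo
        UINT32 frames = 0;
        capture->GetNextPacketSize(&frames);
        while (frames != 0) {
            BYTE* data = nullptr;
            DWORD flags = 0;
            capture->GetBuffer(&data, &frames, &flags, nullptr, nullptr);
            printf("got %u frames of sound card output\n", frames);
            capture->ReleaseBuffer(frames);   // hand the buffer back
            capture->GetNextPacketSize(&frames);
        }
        Sleep(10);
    }

    client->Stop();
    capture->Release(); client->Release();
    device->Release(); enumerator->Release();
    CoTaskMemFree(fmt);
    CoUninitialize();
    return 0;
}
```

Note that if nothing is playing, no packets arrive; real code usually uses an event-driven loop rather than polling.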
4. Common mixing algorithms
A mixing algorithm combines multiple audio input signals by some rule (typically summing the signals and then limiting the result) to obtain a single mixed audio stream as output.
I have worked on this part; after searching around a bit, these are the common mixing algorithms:
Sum the input signals directly and use the sum as the output.
Sum the input signals, then divide by the number of mixed channels to prevent overflow.
Sum the input signals, then clip (clamp the data between the maximum and minimum); if the sum overflows, output the maximum value.
Sum the input signals, then apply saturation processing, so that distortion only appears near the maximum.
Sum the input signals, then normalize: multiply everything by a coefficient so the amplitude stays in range.
Sum the input signals, then use an attenuation factor to limit the amplitude.
The diagram below (in the original post) shows the XP-era audio architecture; audio mixing and processing are done in the system kernel:
Under XP there is no official way to capture the sound the card is playing; generally there are only two workarounds:
one is a virtual sound card, the other is hooking the audio playback APIs (as we often find, API hooking is the method of last resort ^_^).
After Vista, however, Microsoft reworked the original media architecture and re-encapsulated it as the COM-based Core Audio APIs:
You can see that the legacy audio APIs (waveXxx, mixerXxx, and DirectSound) now sit on top of the newly encapsulated Core Audio APIs, and that these APIs work in user mode, which means sound mixing is implemented in software in user mode. This is also why, since Vista, the volume of each application can be controlled individually: each stream can belong to a different audio session. With the new Core Audio APIs, capturing the sound card's output becomes easy.