Audio Processing in Windows
In Windows, audio processing can be roughly divided into two parts: audio input and output, and ACM compression.
Windows offers APIs such as sndPlaySound and the MCI interface to play a WAV file, but that is obviously not what we need here: we must be able to process the audio data stream directly. Windows provides a series of APIs for this as well; the group of APIs whose names begin with waveIn and waveOut does exactly that.
Consider input first. The frequently used APIs are waveInOpen (open an audio input device), waveInPrepareHeader (prepare the header of an input buffer before it is passed to waveInAddBuffer), waveInAddBuffer (submit a data buffer for input), waveInStart (start recording), and waveInClose (close the audio input device). In addition, waveInOpen requires you to specify a callback function or thread, which is invoked each time a data buffer has been filled so that the data can be processed and other related work done.
First, decide which callback mechanism to use: after the audio data for one time slice has been recorded, Windows uses this callback to trigger your data processing. Functions, threads, and events are the usual choices, with functions and threads being the most convenient. With CALLBACK_FUNCTION, Windows calls your function directly; with CALLBACK_THREAD, Windows notifies the thread you specify. All of this is set in waveInOpen, whose prototype is:
MMRESULT waveInOpen(
    LPHWAVEIN      phwi,
    UINT           uDeviceID,
    LPWAVEFORMATEX pwfx,
    DWORD          dwCallback,
    DWORD          dwCallbackInstance,
    DWORD          fdwOpen
);
phwi is the address that receives the returned device handle, and uDeviceID is the ID of the audio device to open; it is generally specified as WAVE_MAPPER. dwCallback is the address of the callback function or the ID of the callback thread, fdwOpen specifies the callback type, and dwCallbackInstance is a user value passed through to the callback. The critical parameter is pwfx, which specifies the audio format with which the input device is opened. It is a WAVEFORMATEX structure:
typedef struct {
    WORD  wFormatTag;
    WORD  nChannels;
    DWORD nSamplesPerSec;
    DWORD nAvgBytesPerSec;
    WORD  nBlockAlign;
    WORD  wBitsPerSample;
    WORD  cbSize;
} WAVEFORMATEX;
If audio compression was selected when WIN9X was installed on the machine, you can specify certain compressed audio formats in wFormatTag, such as G.723.1 or DSP Group TrueSpeech. In general, however, WAVE_FORMAT_PCM, the uncompressed format, is used; for compression you can call the ACM described below after recording.
nChannels is the number of channels, which can be 1 or 2.
nSamplesPerSec is the number of samples per second; the standard values are 8000, 11025, 22050, and 44100 (I have not tried non-standard values).
nAvgBytesPerSec is the average number of bytes per second. In PCM mode it equals nChannels * nSamplesPerSec * wBitsPerSample / 8, but for compressed formats it is only an approximation, because many codecs compress one time slice at a time; G.723.1, for example, compresses in 30 ms units. So nAvgBytesPerSec is not exact, and calculations in a program should not be based on it. This matters for the compressed audio output and ACM compression discussed below.
nBlockAlign is a special value: the minimum processing unit of audio data. For uncompressed PCM it is wBitsPerSample * nChannels / 8; for compressed formats it is the minimum compression/decompression unit, e.g. for G.723.1 the size of 30 ms of compressed data (20 or 24 bytes).
wBitsPerSample is the number of bits per sample, 8 or 16.
cbSize is the number of bytes of extra format information appended after the standard WAVEFORMATEX header. Many non-PCM formats have custom parameters, which immediately follow the standard WAVEFORMATEX structure, and cbSize gives their size. For PCM it is 0, or it is simply ignored.
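As an illustration, here is how these fields might be filled in for 8 kHz, 16-bit, mono PCM; a minimal sketch, with the derived fields computed by the rules above:

#include <windows.h>
#include <mmsystem.h>   /* waveIn/waveOut APIs; link with winmm.lib */

/* Fill in a WAVEFORMATEX describing 8 kHz, 16-bit, mono PCM. */
static void MakePcmFormat(WAVEFORMATEX *wfx)
{
    ZeroMemory(wfx, sizeof(*wfx));
    wfx->wFormatTag      = WAVE_FORMAT_PCM;   /* uncompressed PCM */
    wfx->nChannels       = 1;                 /* mono */
    wfx->nSamplesPerSec  = 8000;              /* samples per second */
    wfx->wBitsPerSample  = 16;                /* bits per sample */
    wfx->nBlockAlign     = (WORD)(wfx->nChannels * wfx->wBitsPerSample / 8);
    wfx->nAvgBytesPerSec = wfx->nSamplesPerSec * wfx->nBlockAlign;
    wfx->cbSize          = 0;                 /* no extra bytes after the header for PCM */
}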
Once these parameters are specified, the audio input device can be opened. The next step is to prepare several buffers for recording; usually multiple buffers are prepared and cycled through in the callback. You also have to decide where the recorded audio data will be stored. If it goes to a temporary file, for example, the file handle must be ready. Each buffer's header must be prepared with waveInPrepareHeader, a fairly simple API; if the buffers are reused in a loop, waveInPrepareHeader only needs to be called once per buffer.
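Opening the device and preparing a ring of eight buffers might then look like the following sketch. It assumes the MakePcmFormat helper above and the WaveInProc callback shown further below; error handling and cleanup are omitted:

#include <stdlib.h>     /* malloc/free for the buffer memory */

#define NUM_BUFS 8
#define BUF_MS   125    /* audio per buffer, in milliseconds */

static HWAVEIN g_hWaveIn;
static WAVEHDR g_hdrs[NUM_BUFS];

static void CALLBACK WaveInProc(HWAVEIN, UINT, DWORD_PTR, DWORD_PTR, DWORD_PTR);

static BOOL StartRecording(void)
{
    WAVEFORMATEX wfx;
    DWORD bufBytes;
    int i;

    MakePcmFormat(&wfx);
    bufBytes = wfx.nAvgBytesPerSec * BUF_MS / 1000;

    /* Open the preferred input device with a function-type callback. */
    if (waveInOpen(&g_hWaveIn, WAVE_MAPPER, &wfx,
                   (DWORD_PTR)WaveInProc, 0, CALLBACK_FUNCTION) != MMSYSERR_NOERROR)
        return FALSE;

    /* Allocate and prepare each buffer once; they are reused in rotation. */
    for (i = 0; i < NUM_BUFS; i++) {
        ZeroMemory(&g_hdrs[i], sizeof(WAVEHDR));
        g_hdrs[i].lpData         = (LPSTR)malloc(bufBytes);
        g_hdrs[i].dwBufferLength = bufBytes;
        waveInPrepareHeader(g_hWaveIn, &g_hdrs[i], sizeof(WAVEHDR));
    }

    /* Queue the first four buffers before starting (see the next paragraph). */
    for (i = 0; i < 4; i++)
        waveInAddBuffer(g_hWaveIn, &g_hdrs[i], sizeof(WAVEHDR));

    return waveInStart(g_hWaveIn) == MMSYSERR_NOERROR;
}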
After everything is ready, call waveInAddBuffer and waveInStart to begin recording. The moment waveInStart is called, recording starts; even if all buffers are full and no new buffer has been added, recording does not stop, but the audio data in that interval is lost. When a buffer submitted with waveInAddBuffer has been filled, Windows notifies you through the callback mechanism specified in waveInOpen; to continue recording, add the next buffer. Because the processing introduces delay and audio is very sensitive to timing, several buffers should generally be queued in advance. For example, define eight buffers in total and, to be safe, ensure that at any moment at least three buffers are available to the recorder. When starting the recording, add four buffers; then in the callback, when buffer n has just been filled, call waveInAddBuffer for buffer (n + 4) % 8, so that buffers (n + 1) % 8, (n + 2) % 8, and (n + 3) % 8 remain available. This essentially guarantees that the recorded audio has no gaps.
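A function callback implementing this (n + 4) % 8 rotation could look like the sketch below; ProcessAudio is a hypothetical stand-in for whatever is done with the data. Note that the Win32 documentation restricts which system functions may safely be called from inside such a callback, which is one reason thread callbacks are often preferred in practice:

static volatile BOOL g_stopping = FALSE;

static void CALLBACK WaveInProc(HWAVEIN hwi, UINT uMsg, DWORD_PTR dwInstance,
                                DWORD_PTR dwParam1, DWORD_PTR dwParam2)
{
    WAVEHDR *hdr;
    int n;

    if (uMsg != WIM_DATA)          /* the callback also receives WIM_OPEN and WIM_CLOSE */
        return;

    hdr = (WAVEHDR *)dwParam1;     /* the buffer that has just been filled */
    n   = (int)(hdr - g_hdrs);     /* its index in the ring */

    ProcessAudio(hdr->lpData, hdr->dwBytesRecorded);  /* hypothetical handler */

    /* Re-queue buffer (n + 4) % 8 so that the next three stay available. */
    if (!g_stopping)
        waveInAddBuffer(hwi, &g_hdrs[(n + 4) % NUM_BUFS], sizeof(WAVEHDR));
}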
When you want to end the recording, it is best to call waveInReset before waveInClose to flush the buffers still waiting to be filled. In the callback, also pay attention to the message type passed in the parameters (WIM_DATA for a filled buffer, as opposed to WIM_OPEN and WIM_CLOSE).
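Continuing the sketch above, shutdown might be:

static void StopRecording(void)
{
    int i;

    g_stopping = TRUE;
    waveInReset(g_hWaveIn);        /* returns all pending buffers, marked done */
    for (i = 0; i < NUM_BUFS; i++) {
        waveInUnprepareHeader(g_hWaveIn, &g_hdrs[i], sizeof(WAVEHDR));
        free(g_hdrs[i].lpData);
    }
    waveInClose(g_hWaveIn);
}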
Audio output is comparatively simple. The corresponding APIs are waveOutOpen, waveOutPrepareHeader, waveOutWrite, and waveOutClose. If you want to output compressed audio directly, pay attention to the audio format parameters specified in waveOutOpen; you must understand the parameters of such formats and their meanings. Alternatively, the ACM (Audio Compression Manager) described below can be used to obtain the exact audio format parameters you need, and that format can be passed straight to waveOutOpen. As with audio input, waveOutPrepareHeader is required, and waveOutWrite submits an output buffer. To avoid dropouts, make sure that enough buffers are queued at any given time.
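As a minimal illustration of the output side, the following sketch plays one buffer of PCM data synchronously; pcmData and pcmBytes are assumed to hold previously recorded samples, and a real program would use a callback and multiple queued buffers instead of polling:

static void PlayBuffer(const WAVEFORMATEX *wfx, LPSTR pcmData, DWORD pcmBytes)
{
    HWAVEOUT hwo;
    WAVEHDR  hdr;

    if (waveOutOpen(&hwo, WAVE_MAPPER, wfx, 0, 0, CALLBACK_NULL) != MMSYSERR_NOERROR)
        return;

    ZeroMemory(&hdr, sizeof(hdr));
    hdr.lpData         = pcmData;
    hdr.dwBufferLength = pcmBytes;
    waveOutPrepareHeader(hwo, &hdr, sizeof(hdr));
    waveOutWrite(hwo, &hdr, sizeof(hdr));      /* queues the buffer for playback */

    while (!(hdr.dwFlags & WHDR_DONE))         /* crude wait until playback finishes */
        Sleep(10);

    waveOutUnprepareHeader(hwo, &hdr, sizeof(hdr));
    waveOutClose(hwo);
}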
If audio compression was selected among the optional components during WIN98 installation, the ACM is available on the machine. ACM is the Audio Compression Manager: WIN98 ships with several common audio compression codecs that applications can call. Through the ACM you can enumerate all the audio compression drivers on the local machine and the audio formats they support. Note, however, that it seems not every format reported by the ACM can actually be used for compression. Also, most of the compression drivers in the ACM target the voice band; if they are used to compress wider-band audio such as music, the results are poor.
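As a small illustration, the installed compression drivers can be listed roughly as follows; a sketch assuming the msacm.h header and the msacm32.lib import library, with an ANSI build assumed for the printf:

#include <stdio.h>
#include <mmreg.h>
#include <msacm.h>      /* ACM APIs; link with msacm32.lib */

/* Called once per installed ACM driver; print its descriptive name. */
static BOOL CALLBACK DriverEnumCb(HACMDRIVERID hadid, DWORD_PTR dwInstance,
                                  DWORD fdwSupport)
{
    ACMDRIVERDETAILS add;

    ZeroMemory(&add, sizeof(add));
    add.cbStruct = sizeof(add);
    if (acmDriverDetails(hadid, &add, 0) == MMSYSERR_NOERROR)
        printf("codec: %s\n", add.szLongName);
    return TRUE;               /* returning TRUE continues the enumeration */
}

static void ListAcmCodecs(void)
{
    acmDriverEnum(DriverEnumCb, 0, 0);   /* walk every installed compression driver */
}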