Principle of the Windows Recording Program

Source: Internet
Author: User

Dependencies: #pragma comment(lib, "winmm.lib")

Audio input is performed in three steps:

1. Open the device ----- waveInOpen (opens an audio input device).

2. Start recording ------ waveInStart.

3. Close the recording device ------- waveInClose. Call waveInReset first to clear the buffers still waiting for recording data.

The frequently used APIs are waveInOpen (opens an audio input device), waveInPrepareHeader (prepares the header of an input buffer before it is passed to waveInAddBuffer), waveInAddBuffer (adds a data buffer to the input queue), waveInStart (starts recording), and waveInClose (closes the audio input device). In addition, waveInOpen lets you specify a callback function or thread, which is invoked each time one data buffer has been fully recorded, so that you can process the data and perform related operations. Note the emphasis here on one data buffer at a time.

The following describes the relationship between them in detail.

1 --------------- waveInOpen

MMRESULT waveInOpen(
LPHWAVEIN phwi, // phwi receives the handle of the opened device
UINT uDeviceID, // ID of the audio device to open, generally specified as WAVE_MAPPER
LPWAVEFORMATEX pwfx, // desired audio format (see below)
DWORD_PTR dwCallback, // address of the callback function, thread ID, or window handle
DWORD_PTR dwCallbackInstance, // user data passed to the callback function or thread
DWORD fdwOpen // callback type: CALLBACK_FUNCTION, CALLBACK_THREAD, or CALLBACK_WINDOW
);
The pwfx parameter is critical: it specifies the audio format with which to open the audio input device. It is a WAVEFORMATEX structure:
typedef struct {
WORD wFormatTag;
WORD nChannels;
DWORD nSamplesPerSec;
DWORD nAvgBytesPerSec;
WORD nBlockAlign;
WORD wBitsPerSample;
WORD cbSize;
} WAVEFORMATEX;
You can specify compressed audio formats in wFormatTag, such as G.723.1, TrueSpeech, and so on. However, WAVE_FORMAT_PCM, the uncompressed audio format, is generally used; for compression, you can call the ACM mentioned below after recording.
nChannels is the number of audio channels, 1 or 2.

nSamplesPerSec is the number of samples per second; 8000, 11025, 22050, and 44100 are the standard values.

nAvgBytesPerSec is the average number of bytes per second. In PCM mode it equals nChannels * nSamplesPerSec * wBitsPerSample / 8. For other, compressed audio formats it is only an approximate figure, because many codecs compress by time slice (G.723.1, for example, compresses in 30 ms units), so calculations in the program should not rely on this value. This is important for the compressed audio output and ACM audio compression discussed below.

nBlockAlign is a special value: the minimum processing unit of audio data. For uncompressed PCM it is wBitsPerSample * nChannels / 8; for compressed formats it is the minimum compression/decompression unit, for example the size of 30 ms of G.723.1 data (20 or 24 bytes).

wBitsPerSample is the number of bits per sample, 8 or 16.

cbSize is the number of bytes appended after the standard WAVEFORMATEX header. Many non-PCM formats carry custom format parameters that immediately follow the standard WAVEFORMATEX, and their size is given in cbSize. For PCM it is 0, or is simply ignored.
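For example, a minimal sketch of filling in WAVEFORMATEX for 8 kHz, 16-bit, mono PCM and opening the device with it. The handle name hWaveIn is a placeholder, and the callback waveInProc is sketched under A) below; it requires <windows.h>, <mmsystem.h>, and the winmm.lib pragma listed at the top.

WAVEFORMATEX wfx;
ZeroMemory(&wfx, sizeof(wfx));
wfx.wFormatTag = WAVE_FORMAT_PCM; // uncompressed PCM
wfx.nChannels = 1; // mono
wfx.nSamplesPerSec = 8000; // 8 kHz
wfx.wBitsPerSample = 16; // 16 bits per sample
wfx.nBlockAlign = wfx.nChannels * wfx.wBitsPerSample / 8; // 2 bytes per sample frame
wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign; // 16000 bytes/s
wfx.cbSize = 0; // no extra format bytes for PCM

HWAVEIN hWaveIn = NULL;
MMRESULT mmr = ::waveInOpen(&hWaveIn, WAVE_MAPPER, &wfx,
    (DWORD_PTR)waveInProc, 0, CALLBACK_FUNCTION);
if (mmr != MMSYSERR_NOERROR)
    return; // handle the error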
After these parameters are specified, you can open the audio input device. The next task is to prepare several buffers for recording. Multiple buffers are usually prepared and then used cyclically from the callback. Each buffer's header must be prepared with waveInPrepareHeader. This API is relatively simple: if you reuse the buffers cyclically, calling waveInPrepareHeader once per buffer is enough. As its documentation describes, the function records the buffer's data address and data size for the system to use.
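A minimal sketch of this preparation, assuming the hWaveIn handle from above and eight half-second buffers (both the count and the size are arbitrary choices for illustration):

enum { NUM_BUFFERS = 8, BUFFER_BYTES = 8000 }; // 0.5 s each at 16000 bytes/s

char buffers[NUM_BUFFERS][BUFFER_BYTES];
WAVEHDR headers[NUM_BUFFERS];

for (int i = 0; i < NUM_BUFFERS; ++i) {
    ZeroMemory(&headers[i], sizeof(WAVEHDR));
    headers[i].lpData = buffers[i];
    headers[i].dwBufferLength = BUFFER_BYTES;
    ::waveInPrepareHeader(hWaveIn, &headers[i], sizeof(WAVEHDR)); // once per buffer
}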
A)

First, you must decide which callback method to use: after the audio data of one time slice has been recorded, Windows uses this callback to activate your data-processing code. Functions, threads, and events are generally used, and functions and threads are the most convenient. With a function callback, Windows calls your function directly; with a thread callback, Windows activates your thread. All of this is specified in waveInOpen.
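A minimal sketch of a function callback (the name waveInProc is a placeholder; the WIM_DATA message signals that one buffer has been filled):

void CALLBACK waveInProc(HWAVEIN hwi, UINT uMsg, DWORD_PTR dwInstance,
                         DWORD_PTR dwParam1, DWORD_PTR dwParam2)
{
    if (uMsg == WIM_DATA) {                 // one buffer has been filled
        WAVEHDR* pHdr = (WAVEHDR*)dwParam1; // header of the filled buffer
        // pHdr->lpData now holds pHdr->dwBytesRecorded bytes of audio;
        // process it (or hand it to another thread), then queue the next buffer.
    }
}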
B)
After everything is ready, you can call waveInAddBuffer and waveInStart to start recording. As soon as waveInStart is called, recording begins. Even if every buffer fills up and you add no new ones, recording does not stop; the audio data in between is simply lost. Whenever a buffer submitted through waveInAddBuffer has been fully recorded, Windows hands back the recorded voice data through the callback method specified in waveInOpen; if you want to continue recording, add the next buffer there. Considering that this processing has a time delay and audio is very time-sensitive,

you generally have to pre-add several buffers first. One suggested scheme: define eight buffers in total and, to be safe, make sure that at any moment at least three buffers are queued for the recorder. When starting the recording, add four buffers; then, in the callback, whenever buffer number n has just been filled, call waveInAddBuffer for buffer (n + 4) % 8, so that the three buffers (n + 1) % 8, (n + 2) % 8, and (n + 3) % 8 remain queued. This basically ensures that there are no gaps in the recorded audio. For example, add buffers 0, 1, 2, and 3 first; when 0 is filled, add 4; when 1 is filled, add 5; then 6 and 7 in the same way.
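In code, the scheme might look like this (a sketch reusing the illustrative hWaveIn and headers from above; n is the index of the buffer just reported full):

// Queue the first four buffers, then start recording.
for (int i = 0; i < 4; ++i)
    ::waveInAddBuffer(hWaveIn, &headers[i], sizeof(WAVEHDR));
::waveInStart(hWaveIn);

// Later, each time the callback reports that buffer n has been filled:
int next = (n + 4) % NUM_BUFFERS; // (n+1)%8 .. (n+3)%8 remain queued
::waveInAddBuffer(hWaveIn, &headers[next], sizeof(WAVEHDR));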

Why not do it more simply: add all eight buffers to the queue at the beginning; when a buffer is full, the callback fires; process the data, then immediately reuse that buffer by adding it back to the queue. That is simpler and clearer, as follows:

mmReturn = ::waveInPrepareHeader(m_hRecord, pHdr, sizeof(WAVEHDR)); // prepare

mmReturn = ::waveInAddBuffer(m_hRecord, pHdr, sizeof(WAVEHDR)); // add

Note that these two steps are performed in the callback function or thread.
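As a sketch, the recycling could sit directly in the WIM_DATA branch of the callback shown earlier (stillRecording is an illustrative flag; the header stays prepared, so only waveInAddBuffer is needed again):

if (uMsg == WIM_DATA) {
    WAVEHDR* pHdr = (WAVEHDR*)dwParam1;
    // consume pHdr->dwBytesRecorded bytes from pHdr->lpData, then
    // recycle the same buffer back into the queue:
    if (stillRecording)
        ::waveInAddBuffer(m_hRecord, pHdr, sizeof(WAVEHDR));
}

One caution: the waveform documentation discourages calling other waveform functions from inside a CALLBACK_FUNCTION callback, so this recycling is often done from a thread or window callback instead.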

C)

When you want to end the recording, it is best to call waveInReset before waveInClose to clear the buffers still waiting for recording data. The common problem here: waveInReset clears the waiting buffers, but what about a buffer the device is still filling? If you call waveInClose at that moment, the call will fail. One solution: once you decide to stop, no longer call waveInAddBuffer in the callback when a buffer fills; when no buffers remain in use, call waveInReset to clear the buffers still waiting for recording, and then call waveInClose.
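A sketch of that shutdown order, using the illustrative stillRecording flag and the buffers from above, with the article's m_hRecord handle (waveInUnprepareHeader releases what waveInPrepareHeader set up):

stillRecording = FALSE;   // the callback stops re-adding buffers
::waveInReset(m_hRecord); // returns every pending buffer immediately
for (int i = 0; i < NUM_BUFFERS; ++i)
    ::waveInUnprepareHeader(m_hRecord, &headers[i], sizeof(WAVEHDR));
::waveInClose(m_hRecord);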

To sum up, there are three points to pay attention to: choosing the callback method, understanding how the buffers work, and handling the end of recording correctly.
