Crossing the River to Pick Lotus: Audio and Video Capture Explained (source code included)

Source: Internet
Author: User

For OMCS, capture is the first link in the chain. Before anything else can happen, we have to obtain first-hand data from the multimedia devices. For audio, that means obtaining the audio signal from a microphone, sound card, or similar device; for video, it means obtaining image data from a display, camera, or similar device. How do we get this data? By capturing it.

Capture can be understood as two steps: sampling and assembly. This is like the story of Cao Chong weighing the elephant: first break a large whole down into samples, then put the samples back together. The idea is to use samples to approximate the whole.

(1) For video, the capture process works as follows.

Video is continuous, but we can decompose it into individual pictures, that is, image frames. If we then play these frames back in their original order and timing, we can essentially restore the original video.

Here, there are a few important quantities that we need to focus on.

A. Resolution

First, we should look at the size of each sample, because the size of a sample determines how much data it contains, and a sample with more data reflects the whole better than a sample with less. Take a sequence of pictures of a pigeon taking off: shot once at a high pixel count and once at a low pixel count, the two sequences will look clearly different when played back, because the amount of data per image frame affects the sharpness of the resulting video. What quantity expresses the size of an image frame? The resolution. The higher the resolution, the more pixels the image contains, the more data it holds, and the more faithfully it reflects the original scene.
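To make the link between resolution and data volume concrete, here is a minimal sketch that computes the size of one uncompressed frame. The function name `raw_frame_bytes` and the default of 3 bytes per pixel (24-bit RGB) are assumptions chosen for illustration; they are not taken from OMCS.

```python
def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size in bytes of one uncompressed frame (assumes 24-bit RGB by default)."""
    return width * height * bytes_per_pixel


low = raw_frame_bytes(320, 240)      # low-resolution frame
high = raw_frame_bytes(1920, 1080)   # high-resolution frame
print(low, high, high / low)         # prints: 230400 6220800 27.0
```

Under these assumptions, the 1920 x 1080 frame carries 27 times as much data as the 320 x 240 frame, which is why it can show far more detail.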

B. Frame rate

Second, the original video is continuous, while the images we capture are discrete. How widely spaced the frames are affects the resulting video: if the frames are too far apart in time, the video looks choppy; if they are close enough together, it looks smooth and natural. So the spacing between samples determines the smoothness of the video. What quantity expresses this spacing? The frame rate. For capture, the frame rate is the number of image frames captured per second; naturally, the higher the frame rate, the smoother the picture.
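The relationship between frame rate and the spacing of frames can be sketched as follows. The helper `frame_timestamps_ms` is a hypothetical name used only for this illustration; it lists the moments at which frames would be captured at a constant frame rate.

```python
from typing import List


def frame_timestamps_ms(frame_rate: float, duration_s: float) -> List[float]:
    """Capture times in milliseconds for a constant frame rate."""
    interval_ms = 1000.0 / frame_rate        # time between two adjacent frames
    count = int(frame_rate * duration_s)     # number of frames in the period
    return [i * interval_ms for i in range(count)]


print(frame_timestamps_ms(5, 1.0))           # prints: [0.0, 200.0, 400.0, 600.0, 800.0]
print(len(frame_timestamps_ms(25, 1.0)))     # prints: 25
```

At 5 frames per second, adjacent frames are 200 ms apart, which is visibly choppy; at 25 frames per second the gap shrinks to 40 ms.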

(2) For audio, the capture process can be described in a similar way.

As with video, capturing audio means decomposing a continuous whole into discrete samples and then, as in the story of Cao Chong weighing the elephant, assembling them back into a whole.

Similarly, there are several important quantities in the audio collection that need our attention.

A. Sample depth

As with video capture, we need to look at the amount of data in each sample. For audio, this is expressed by the sample depth, also called the bit depth: the number of bits used to store each sampled value of the sound. The sample depth affects the clarity of the resulting audio. If the number of bits per sample is too low, each value can only be rounded to one of a few levels, and the audio sounds coarse and indistinct.
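The effect of sample depth can be illustrated with a small quantization sketch. It assumes the signal is normalized to the range [-1.0, 1.0] and uses evenly spaced levels; the function name `quantize` is hypothetical, and real capture hardware may map values to levels differently.

```python
def quantize(x: float, bits: int) -> float:
    """Round x in [-1.0, 1.0] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)            # distance between adjacent levels
    index = round((x + 1.0) / step)      # nearest level, counted from -1.0
    return -1.0 + index * step


original = 0.3
for bits in (2, 8, 16):
    stored = quantize(original, bits)
    print(bits, stored, abs(stored - original))  # error shrinks as bits grow
```

With 2 bits only four values can be stored, so the rounding error is large; with 16 bits there are 65536 levels and the error becomes negligible.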

B. Sample rate

As with video capture, we also need to look at the spacing between samples. For audio, this is expressed by the sample rate, the number of samples captured per second. The sample rate affects how faithfully the resulting audio follows the original sound. If the sample rate is too low, rapid changes in the sound (its high frequencies) are lost or distorted, and the audio sounds muffled or unnatural.
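The sketch below samples a pure tone at a given sample rate and shows what happens when the rate is too low for the tone. It relies on the sampling theorem (a sample rate of fs can only represent frequencies below fs/2); the function names `sample_tone` and `nyquist_limit_hz` and the chosen frequencies are assumptions made for this illustration.

```python
import math
from typing import List


def sample_tone(freq_hz: float, sample_rate_hz: int, duration_s: float) -> List[float]:
    """Sample a sine wave of freq_hz at sample_rate_hz samples per second."""
    count = int(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]


def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


# At 8000 samples per second, a 7000 Hz tone is above the 4000 Hz limit.
# Its samples are identical (up to sign) to those of a 1000 Hz tone,
# so the original high tone cannot be recovered from them.
low_tone = sample_tone(1000, 8000, 0.01)
high_tone = sample_tone(7000, 8000, 0.01)
aliased = all(abs(h + l) < 1e-9 for h, l in zip(high_tone, low_tone))
print(nyquist_limit_hz(8000), aliased)   # prints: 4000.0 True
```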


Whether it is video capture or audio capture, in the end both are sample collection, and the purpose is to use samples to approximate the whole. The quality of the approximation is determined by two factors: the amount of data in each sample and the density of the samples. The more data each sample holds and the more densely the samples are taken, the better they represent the whole and the more faithfully they reflect the original. So we can give a general formula:

How well the samples reflect the whole = amount of data per sample × density of samples

For video capture and audio capture, this gives the following two sub-formulas:

1. Quality of the resulting video = resolution × frame rate

2. Quality of the resulting audio = sample depth × sample rate
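The formulas above are conceptual, but read literally the same products give the raw (uncompressed) data rate of a capture stream. The sketch below computes it; the function names, the 24 bits per pixel, and the channel count are assumptions added for illustration and do not describe how OMCS encodes or transmits data.

```python
def video_data_rate_bps(width: int, height: int, bits_per_pixel: int, frame_rate: int) -> int:
    """Raw video data rate in bits per second: data per frame x frames per second."""
    return width * height * bits_per_pixel * frame_rate


def audio_data_rate_bps(sample_depth_bits: int, sample_rate_hz: int, channels: int = 1) -> int:
    """Raw audio data rate in bits per second: bits per sample x samples per second."""
    return sample_depth_bits * sample_rate_hz * channels


print(video_data_rate_bps(640, 480, 24, 15))   # prints: 110592000
print(audio_data_rate_bps(16, 16000))          # prints: 256000
print(audio_data_rate_bps(16, 44100, 2))       # prints: 1411200
```

Raising either factor improves quality but also raises the amount of data to process, so in practice both are chosen as a trade-off.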

Attached: Sample demo (with recording)

