Basic audio-related knowledge

Source: Internet
Author: User

A recent project required me to work with audio, so I collected some basic audio-related knowledge from around the web and organized it as follows.

Sounds in nature are very complex, and so are their waveforms. We usually digitize them with pulse code modulation, that is, PCM encoding. PCM encoding converts a continuously varying analog signal into a digital signal in three steps: sampling, quantization, and encoding.
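
The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sine tone stands in for the analog signal, and the function name `pcm_encode`, the 8 kHz sample rate, and the 16-bit little-endian layout are choices made for this example, not something the text prescribes.

```python
import math
import struct

def pcm_encode(freq_hz, duration_s, sample_rate=8000):
    """Sample, quantize, and encode a sine tone as signed 16-bit PCM bytes."""
    max_amp = 2 ** 15 - 1                      # largest positive 16-bit level
    n_samples = round(sample_rate * duration_s)
    frames = bytearray()
    for n in range(n_samples):
        # 1. Sampling: evaluate the "analog" signal at discrete instants.
        value = math.sin(2 * math.pi * freq_hz * n / sample_rate)
        # 2. Quantization: map the real value in [-1, 1] to an integer level.
        level = round(value * max_amp)
        # 3. Encoding: store the level as binary data (little-endian int16).
        frames += struct.pack("<h", level)
    return bytes(frames)

pcm = pcm_encode(440, 0.01)
print(len(pcm))  # 80 samples * 2 bytes each = 160
```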

Sampling (sample)

A digital audio system reproduces the original sound (an analog signal) by converting its waveform into a series of binary data. The device that performs this step is an analog-to-digital converter (A/D converter, or ADC). Each measurement of the waveform is called a sample. A series of samples, connected together, describes a sound wave. The number of samples taken per second is called the sampling frequency or sample rate, measured in hertz (Hz). The higher the sampling frequency, the higher the sound frequency that can be described. The sample rate determines the range of sound frequencies (equivalent to pitch) that the digital waveform can represent. The frequency range represented by a waveform is often called its bandwidth. Audio sampling is best understood in two parts: the number of bits per sample and the sampling frequency.

Sample bits (sampling precision)

Sound files on a computer are represented by the digits 0 and 1, so recording on a computer is essentially converting an analog sound signal into a digital signal. Conversely, playback restores the digital signal to an analog sound signal for output. The number of sample bits can be understood as the resolution with which the sound card processes sound. The higher the value, the higher the resolution, and the more realistic the recorded and played sound. The bit depth of a sound card is the number of bits of digital signal it uses when capturing and playing sound, and it objectively reflects how accurately the digital signal describes the input sound. 8 bits can represent 2^8 = 256 levels; 16 bits can represent 2^16 = 65,536 (64K) levels.
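
As a quick check of these numbers, the sketch below computes the number of levels and the size of one quantization step for a few bit depths. The helper names and the normalized full-scale range of -1.0 to +1.0 are assumptions made for this example.

```python
def quantization_levels(bits):
    """Number of distinct values a sample of the given bit depth can take."""
    return 2 ** bits

def step_size(bits, full_scale=2.0):
    """Width of one level, assuming a signal range of -1.0..+1.0 (full_scale=2.0)."""
    return full_scale / quantization_levels(bits)

for bits in (8, 16, 24):
    print(bits, quantization_levels(bits), step_size(bits))
```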

Sampling frequency (sample rate)

The sampling frequency is the number of times per second that a recording device samples the sound signal; the higher the sampling frequency, the more faithfully the sound is restored. Today's mainstream sound cards generally offer three sampling frequencies: 22.05 kHz, 44.1 kHz (44100 Hz), and 48 kHz. 22.05 kHz only reaches FM broadcast quality, 44.1 kHz is the theoretical limit of CD quality, and 48 kHz is slightly more accurate. The human ear can hardly distinguish sampling frequencies above 48 kHz, so they are of little practical value on a computer.
A 5 kHz sample rate only reaches the quality of human speech.
An 11 kHz sample rate is the minimum standard for playing short sound clips, and is one quarter of CD quality.
A 22 kHz sample rate reaches half of CD quality; most websites currently use this sample rate.
A 44 kHz sample rate is standard CD quality and gives a very good listening experience.

Number of channels (channel)

Audio is divided into mono and stereo, and there can of course be more channels. In general, more channels give a better spatial effect: two channels carry sound only from the left and right, while four channels also carry sound from the front and back.

Bit rate (bitrate)

Also called the code rate. For an encoding format, it indicates the amount of audio data per second after compression encoding. For uncompressed PCM the formula is: bit rate = sample rate × sample precision × number of channels. The unit is kbps, where k is 1000.
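
Because the bit rate is the amount of data per second, it also gives a rough file size for a compressed stream. The sketch below is an estimate only: it ignores headers and tags, and the function name and the 4-minute, 128 kbps figures are made up for the example.

```python
def encoded_size_bytes(bitrate_kbps, duration_s):
    """Approximate size of an encoded stream; k = 1000 and 8 bits per byte."""
    return bitrate_kbps * 1000 * duration_s / 8

size = encoded_size_bytes(128, 4 * 60)   # a 4-minute track at 128 kbps
print(size)                              # 3840000.0 bytes, about 3.84 MB
```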

VBR, ABR, CBR

VBR (Variable Bitrate): dynamic bit rate. There is no fixed bitrate; the compression software decides on the fly which bitrate to use based on the audio data. The approach was developed by Xing: the complex parts of a song are encoded at a high bitrate and the simple parts at a low bitrate. The idea is good, but unfortunately the VBR algorithm in the Xing encoder was poor, and its sound quality fell far short of CBR. Fortunately, LAME greatly improved the VBR algorithm, making it the best encoding mode for MP3. This mode puts quality first while still keeping file size in check, and it is the recommended encoding mode.

ABR (Average Bitrate): average bit rate, an interpolated variant of VBR. LAME created this mode to address CBR's poor quality-to-size ratio and VBR's unpredictable file size. ABR is also called "Safe VBR": within the specified average bitrate, it works in segments of 50 frames (30 frames is about 1 second), using a relatively low bitrate for low and insensitive frequencies and a higher bitrate for high frequencies and passages with large dynamics. For example, when a WAV file is encoded with 192 kbps ABR, LAME encodes 85% of the file at 192 kbps and dynamically optimizes the remaining 15%: complex parts are encoded above 192 kbps and simple parts below 192 kbps. Compared with 192 kbps CBR, a 192 kbps ABR file is almost the same size, but the sound quality is noticeably better. ABR encoding is 2 to 3 times as fast as VBR encoding, and in the 128-256 kbps range its quality is better than CBR. It can be used as a compromise between VBR and CBR.

CBR (Constant Bitrate): constant bit rate, meaning the file uses one bitrate from beginning to end. Compared with VBR and ABR, it produces much larger files, but the sound quality is not noticeably better.
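
To make the difference concrete, the sketch below compares a CBR stream with a VBR stream that has the same average bitrate. It is a simplified model, not how LAME works: the per-frame bitrates are hypothetical, frame padding and headers are ignored, and it relies only on the fact that an MPEG-1 Layer III frame carries 1152 samples.

```python
SAMPLES_PER_FRAME = 1152   # samples in one MPEG-1 Layer III frame
SAMPLE_RATE = 44100        # Hz, assumed for this example

def average_bitrate_kbps(frame_bitrates_kbps):
    """Mean bitrate over all frames (every frame lasts the same time)."""
    return sum(frame_bitrates_kbps) / len(frame_bitrates_kbps)

def stream_size_bytes(frame_bitrates_kbps):
    """Approximate stream size, ignoring padding and any file header."""
    frame_seconds = SAMPLES_PER_FRAME / SAMPLE_RATE
    return sum(kbps * 1000 * frame_seconds / 8 for kbps in frame_bitrates_kbps)

cbr = [192] * 6                        # same bitrate for every frame
vbr = [96, 128, 320, 256, 160, 192]    # hypothetical: more bits where the music is complex

print(average_bitrate_kbps(cbr), average_bitrate_kbps(vbr))
print(stream_size_bytes(cbr), stream_size_bytes(vbr))
```

Both streams average 192 kbps and end up essentially the same size; the VBR stream simply spends its bits unevenly.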

Lossy and Lossless

Because of sampling and quantization, audio encoding can only approach the natural signal ever more closely; at least with current technology it cannot be exactly the same. This is because signals in nature are continuous, while the values produced by audio encoding are discrete. In that sense every digital audio coding scheme is lossy, which means no encoded audio can completely restore the sound of nature.

In computer applications, PCM encoding achieves the highest level of fidelity. It is widely used for archiving material and for music listening, including CDs, DVDs, and WAV files. As a result, PCM is conventionally treated as lossless encoding, but this does not mean PCM guarantees an absolutely faithful signal; PCM can only get as close as possible.

We habitually classify MP3 as lossy audio coding; this is relative to PCM coding.

The point of stressing that lossy and lossless are relative is to show that truly lossless encoding is very difficult, even impossible. Just as when we express pi in decimals, no matter how many decimal places we use, the value can only get closer and closer to pi and never truly equals it.
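
The same idea can be checked numerically: more bits shrink the quantization error, but it never reaches zero. This is a minimal sketch that assumes a single sine cycle as the test signal and simple rounding as the quantizer; the function name is chosen for the example.

```python
import math

def max_quantization_error(bits, n_points=1000):
    """Worst-case rounding error when one sine cycle is quantized to `bits` bits."""
    levels = 2 ** (bits - 1) - 1          # largest positive integer level
    worst = 0.0
    for n in range(n_points):
        x = math.sin(2 * math.pi * n / n_points)
        q = round(x * levels) / levels    # quantize, then map back to [-1, 1]
        worst = max(worst, abs(x - q))
    return worst

for bits in (8, 16, 24):
    print(bits, max_quantization_error(bits))
```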

Why use audio compression technology

Calculating the bitrate of a PCM audio stream is easy: sample rate × sample size × number of channels, in bps. For a PCM-encoded WAV file with a 44.1 kHz sample rate, 16-bit sample size, and two channels, the bitrate is 44.1k × 16 × 2 = 1411.2 kbps. The "128K" in what we call a 128K MP3 is the same kind of parameter as this 1411.2 kbps. It is also called the data bandwidth, the same concept as the bandwidth of an ADSL line. Dividing the bitrate by 8 gives the data rate of this WAV, which is 176.4 KB/s. In other words, storing one second of two-channel, 16-bit, 44.1 kHz PCM audio takes 176.4 KB, and one minute takes about 10.58 MB (roughly 10.1 MiB). That is unacceptable to most users, especially those who like listening to music on a computer. There are only two ways to reduce disk usage: lower the sampling parameters or compress. Lowering the parameters is undesirable, so experts have developed a variety of compression schemes. Because they target different uses and markets, the various audio compression codecs differ in the sound quality and compression ratio they achieve, as discussed later in this article. One thing is certain: they all compress the data.
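
The arithmetic above can be reproduced with a short sketch. The function name is an assumption for the example; the 128 kbps MP3 is used only as a comparison point.

```python
def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of an uncompressed PCM stream in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

bps = pcm_bitrate_bps(44100, 16, 2)        # 1411200 bps = 1411.2 kbps
bytes_per_second = bps // 8                # 176400 bytes = 176.4 KB
bytes_per_minute = bytes_per_second * 60   # 10584000 bytes, about 10.58 MB
ratio_vs_128k_mp3 = bps / 128_000          # roughly 11:1

print(bps, bytes_per_second, bytes_per_minute, ratio_vs_128k_mp3)
```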

The relationship between frequency and sampling rate

The sample rate is the number of times the original signal is sampled per second. The audio files we commonly see have a sample rate of 44.1 kHz; what does that mean? Suppose we have two sine wave signals, 20 Hz and 20 kHz, each one second long, corresponding to the lowest and highest frequencies we can hear, and we sample each of them at 40 kHz. What do we get? The 20 Hz signal is sampled 40k/20 = 2000 times per cycle, while the 20 kHz signal is sampled only 2 times per cycle. Clearly, at the same sample rate, low frequencies are recorded in far more detail than high frequencies. That is why some audio enthusiasts accuse CDs of having an unnatural "digital sound": the CD's 44.1 kHz sampling does not guarantee that high-frequency signals are recorded well. Recording high frequencies better would seem to call for a higher sample rate, so some people use a 48 kHz sample rate when ripping CD tracks. This is not advisable! It does nothing for the sound quality. For ripping software, keeping the CD's own 44.1 kHz sample rate is one of the best guarantees of quality, and raising it is not an improvement. A higher sample rate only helps when sampling an analog signal; if the signal being sampled is already digital, do not try to raise the sample rate.
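
The samples-per-cycle figures are simply the sample rate divided by the signal frequency, as this small sketch shows (the function name is chosen for the example):

```python
def samples_per_cycle(sample_rate_hz, signal_hz):
    """How many samples land on one period of a tone."""
    return sample_rate_hz / signal_hz

print(samples_per_cycle(40_000, 20))       # 2000.0 samples per cycle
print(samples_per_cycle(40_000, 20_000))   # 2.0 samples per cycle
print(samples_per_cycle(44_100, 20_000))   # a little over 2 at the CD sample rate
```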

PCM encoding

PCM is the abbreviation for Pulse Code Modulation. We described the general PCM workflow earlier. We do not need to care how the final PCM code is computed; we only need to know the advantages and disadvantages of a PCM-encoded audio stream. The biggest advantage of PCM encoding is good sound quality; the biggest drawback is large size. The common Audio CD uses PCM encoding, and one disc can hold only 72 minutes of music.

WAVE

This is an old audio file format developed by Microsoft. WAV is a file format that complies with the RIFF (Resource Interchange File Format) specification. Every WAV file has a header that holds the encoding parameters of the audio stream. WAV does not mandate a particular audio encoding: besides PCM, almost any codec that supports the ACM specification can encode a WAV audio stream. Many people are not familiar with this idea, so let us use AVI as an illustration, because AVI and WAV have very similar file structures; AVI simply has an additional video stream. We come across many kinds of AVI files and often need to install a decoder to watch some of them. DivX, which we encounter often, is a video codec; an AVI file can compress its video stream with DivX, and of course with other codecs as well. Similarly, a WAV file can compress its audio stream with a variety of audio codecs. The WAV files we usually see contain a PCM-encoded audio stream, but that does not mean WAV can only use PCM. MP3 encoding can also be used in WAV, and, as with AVI, once the corresponding decoder is installed you can play these WAV files.
On Windows, PCM-encoded WAV is the best-supported audio format. All audio software handles it perfectly, and because it meets high sound quality requirements, WAV is also the preferred format for music editing and is well suited to storing music material. For this reason, PCM-encoded WAV is often used as an intermediate format when converting between other encodings, for example from MP3 to WMA.
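
As a small illustration of the header, the sketch below writes one second of silent PCM audio into an in-memory WAV file with Python's standard `wave` module and reads the encoding parameters back. The in-memory buffer and the silent payload are choices made for this example.

```python
import io
import wave

# Write one second of 16-bit stereo silence at 44.1 kHz into memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)                           # bytes per sample (16 bits)
    w.setframerate(44100)
    w.writeframes(b"\x00" * (44100 * 2 * 2))    # frames * channels * bytes

data = buf.getvalue()
print(data[:4], data[8:12])                     # RIFF container, WAVE form type

# Read the stream parameters back from the header.
buf.seek(0)
with wave.open(buf, "rb") as r:
    channels = r.getnchannels()
    bits = r.getsampwidth() * 8
    rate = r.getframerate()
    seconds = r.getnframes() / rate

print(channels, bits, rate, seconds, rate * bits * channels)
```

Note that the `wave` module only handles PCM-encoded WAV; files whose stream uses another codec need other tools.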

MP3 encoding

MP3 is the most popular audio compression format and is widely accepted. MP3-related software keeps appearing, and more and more hardware supports MP3: many of the VCD/DVD players on sale can play MP3, and there are plenty of portable MP3 players as well. A few major players in the music industry strongly dislike this open format, but they cannot stop it from surviving and spreading. MP3 has been in development for 10 years. The name is short for MPEG (Moving Picture Experts Group) Audio Layer-3. It is a coding scheme derived from MPEG-1 and was developed successfully in 1993 by the German Fraunhofer IIS research institute and Thomson. MP3 can reach an impressive compression ratio of 12:1 while keeping basically acceptable sound quality. In the days when hard disk space was precious, MP3 was quickly accepted by users, and as networks spread it was adopted by hundreds of millions of people. When it was first released, MP3 encoding technology was very immature. Because research on sound and human hearing was lacking, almost all early MP3 encoders encoded in a crude way, and the sound quality suffered badly. With the continuous introduction of new techniques, MP3 encoding has improved step by step, including two major technical improvements.
VBR: MP3 files have an interesting feature: they can be played while they are being read, which matches the most basic characteristic of streaming media. The player does not need to read the whole file in advance; it reads whatever part is being played, and it can play even when part of the file is damaged. An MP3 file can have a file header, but the header is not essential. Because of this, each frame of an MP3 file can have its own bitrate without needing a special decoding scheme. This is the basis of VBR (Variable Bitrate, dynamic data rate), which lets every segment or even every frame of an MP3 file have its own bitrate. The benefit is that file size is kept as small as possible while quality is preserved. The advantage is obvious, but the technique is hard to use well, because the encoder has to know how to allocate bitrate to each segment. For encoders that lack waveform analysis, VBR exists in name only. Indeed, VBR did not look dazzling when it first appeared.

Through long-term acoustic research, experts found that the human ear exhibits a masking effect. A sound signal is an energy wave travelling through air or another medium. The ear's most direct response to sound energy is the perceived volume, which we call loudness and express in decibels (dB). Even at the same loudness, sounds of different frequencies are perceived as having different volumes. The ear is most sensitive around 4000 Hz; whether the frequency rises or falls from there, the sound seems quieter even though the loudness is unchanged. Once the loudness drops below a certain level the ear can no longer hear the sound, and that threshold is different for each frequency.

Plotted against frequency, this threshold forms a roughly V-shaped curve. Above 15000 Hz the ear perceives sounds as very quiet, and many people with less acute hearing simply cannot hear 20000 Hz at all, however loud it is. When the ear hears two sounds of different frequency and different loudness at the same time, the quieter one is ignored. For example, during the day we do not notice the cooling fan in the computer, but at night it becomes a source of noise. Using this principle, an encoder can filter out many inaudible sounds, which simplifies the information and increases the compression ratio without a noticeable loss of sound quality. This kind of masking is called simultaneous masking. When sound A is masked by sound B, the masking is stronger if A lies within a certain frequency range centered on B; this range is called the critical bandwidth. The critical bandwidth differs from frequency to frequency, and the higher the frequency, the wider the critical bandwidth.

Frequency (Hz)   Critical bandwidth (Hz)   Frequency (Hz)   Critical bandwidth (Hz)
50               80                        1850             280
150              100                       2150             320
350              100                       2500             380
450              110                       3400             550
570              120                       4000             700
700              140                       4800             900
840              150                       5800             1100
1000             160                       7000             1300
1170             190                       8500             1800
1370             210                       10500            2500
1600             240                       13500            3500
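
The table can be turned into a simple lookup. The sketch below is a deliberately crude model, not what a real MP3 encoder does: it picks the nearest listed center frequency and treats two tones as sharing a critical band when they are within half the bandwidth of each other. Both helper functions and that half-bandwidth rule are assumptions made for this example.

```python
# (center frequency in Hz, critical bandwidth in Hz), copied from the table above
CRITICAL_BANDS = [
    (50, 80), (150, 100), (350, 100), (450, 110), (570, 120), (700, 140),
    (840, 150), (1000, 160), (1170, 190), (1370, 210), (1600, 240),
    (1850, 280), (2150, 320), (2500, 380), (3400, 550), (4000, 700),
    (4800, 900), (5800, 1100), (7000, 1300), (8500, 1800), (10500, 2500),
    (13500, 3500),
]

def critical_bandwidth(freq_hz):
    """Bandwidth of the table entry whose center is closest to freq_hz."""
    _, width = min(CRITICAL_BANDS, key=lambda band: abs(band[0] - freq_hz))
    return width

def in_same_critical_band(masker_hz, other_hz):
    """Crude check: is other_hz within half a critical bandwidth of the masker?"""
    return abs(other_hz - masker_hz) <= critical_bandwidth(masker_hz) / 2

print(critical_bandwidth(1000))            # 160
print(in_same_critical_band(1000, 1050))   # True: 50 Hz apart, band is 160 Hz wide
print(in_same_critical_band(1000, 1200))   # False: 200 Hz apart
```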

Based on this effect, experts designed a psychoacoustic model of human hearing and introduced it into MP3 encoding, which brought a revolution in sound quality. MP3 had long carried a reputation for poor sound, but that stigma has gradually been washed away. At the same time VBR, which had previously been overlooked, began to shine: combined with the psychoacoustic model, it became very attractive and powerful.

For a long time many people have had a poor impression of MP3, and many believe WMA has better sound quality than MP3. This is not correct. At high bitrates, a properly encoded MP3 is much better than WMA and can come very close to CD quality. On hardware that is not very good, few people can tell an MP3 from a CD. This is not a myth: in the past you could easily tell an MP3 from a CD, but now you cannot be sure of telling them apart correctly. MP3 is an excellent encoding that was simply underrated before.
