Audio (4): Audio compression (using Speex, and an introduction to Opus)

Last Update:2016-10-26 Source: Internet

Author: User


Copyright notice: This is an original article; reprinting without permission is not allowed.

Blog Address: http://blog.csdn.net/kevindgk

GitHub Address: https://github.com/KevinDGK/MyAudioDemo

I. Introduction
II. LAN voice configuration
III. Speex
- 3.1 Introduction
- 3.2 Technical features
- 3.3 Development: voice compression
- 3.4 Related calculations
IV. Opus: the Swiss Army knife of audio codecs
- 4.1 Introduction
- 4.2 Technology
- 4.3 Developing plugins
- 4.4 Version information
  - libopus 1.1.3 (stable release)
- 4.5 Comparison
- 4.6 Module API documentation
- Opus Encoder
  - Type definition
  - Method
  - Detailed description
  - Type definition documentation
  - Method documentation
- Opus Decoder
V. Summary
Contact information

I. Introduction

Suppose we need real-time voice over a LAN, with UDP as the transport-layer protocol. If the audio stream recorded by AudioRecord is sent directly to the other end for playback, the sound quality is very poor and the audio is intermittent. The following calculation shows why:

Sampling frequency: Fm = 44.1 kHz

Quantization bit depth: 16 bit

Channel configuration: 2 (stereo)

Then the bitrate is V = 44.1 kHz × 16 bit × 2 = 1411.2 kbps = 176.4 KB/s, i.e. about 176.4 KB must be transferred every second.

If each audio frame lasts 20 ms, each audio packet has size = 176.4 KB/s × 0.02 s = 3.528 KB.

In practice, each read fetches the data of one audio frame, and we can round its size up to 3600 bytes.

Therefore about 176.4 / 3.6 = 49 packets are sent per second, each 3.6 KB in size.

Once the data headers are taken into account, measurements show about 45 packets sent per second and a transmission rate of approximately 180 KB per second.
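The arithmetic above can be sketched in C as follows. This is only an illustration, not code from the demo: the function names pcm_bytes_per_second and pcm_frame_bytes are hypothetical, and 1 KB is taken as 1000 bytes, as in the text.

```c
#include <assert.h>

/* Raw PCM data rate in bytes per second:
 * sample rate x bytes per sample x number of channels. */
long pcm_bytes_per_second(long sample_rate_hz, int bits_per_sample, int channels)
{
    assert(bits_per_sample > 0 && bits_per_sample % 8 == 0);
    assert(sample_rate_hz > 0 && channels > 0);
    return sample_rate_hz * (bits_per_sample / 8) * channels;
}

/* Size in bytes of one uncompressed frame lasting frame_ms milliseconds. */
long pcm_frame_bytes(long sample_rate_hz, int bits_per_sample, int channels, int frame_ms)
{
    assert(frame_ms > 0);
    return pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels) * frame_ms / 1000;
}
```

For the 44.1 kHz, 16-bit, stereo settings, pcm_bytes_per_second(44100, 16, 2) gives 176400 and pcm_frame_bytes(44100, 16, 2, 20) gives 3528, matching the figures in the text.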

Mobile phones are usually connected over WiFi, so this data rate requires good network quality and hardware, weak channel interference, and not too many connected devices. As soon as conditions are less than ideal, the packet loss rate becomes very high and the delay very large, which cannot meet the needs of communication. In this case we need speech compression, noise reduction, and other processing.

II. LAN voice configuration

If only voice is transmitted, a high sampling frequency is not needed: 8 kHz sampling with a single channel is enough.

    private int DEFAULT_SAMPLERATEINHZ = 8000;                                // Sampling frequency
    private int DEFAULT_AUDIOFORMAT = AudioFormat.ENCODING_PCM_16BIT;         // Data format
    private int DEFAULT_STREAMTYPE = AudioManager.STREAM_MUSIC;               // Audio type
    private int DEFAULT_CHANNELCONFIG_OUT = AudioFormat.CHANNEL_OUT_MONO;     // Channel configuration (output)
    private int DEFAULT_MODE = AudioTrack.MODE_STREAM;                        // Output mode
    private int DEFAULT_CHANNELCONFIG_IN = AudioFormat.CHANNEL_IN_MONO;       // Channel configuration (input)
    private int DEFAULT_AUDIOSOURCE = MediaRecorder.AudioSource.MIC;          // Audio source

Sampling frequency: 8 kHz, which captures voice information fairly completely, although high-frequency content is lost;
Data format: 16 bit, which represents the amplitude of the sound in finer detail;
Channel configuration: mono input and output, which works on all models. A small number of phones do not support two-channel (stereo) audio; on those, stereo output is heard only in the left earphone;
Audio type: music stream
Output mode: audio stream
Audio source: microphone

III. Speex

Official website

3.1 Introduction

Speex is a free, open-source, patent-free audio compression format designed mainly for speech. The Speex project aims to lower the barrier of entry for voice applications by providing a free alternative to expensive proprietary speech codecs. Compared with other codecs, Speex is also well suited to network applications, where it has its own advantages. Speex is part of the GNU Project and is available under the revised BSD license.

3.2 Technical Features

Speex is based on CELP and is designed to compress speech at bitrates from 2 to 44 kbps. Its features are:

Narrowband (8 kHz), wideband (16 kHz), and ultra-wideband (32 kHz) compression in the same bitstream
Intensity stereo encoding
Packet loss concealment
Variable bitrate (VBR)
Voice activity detection (VAD)
Discontinuous transmission (DTX)
Fixed-point arithmetic
Acoustic echo cancellation (AEC)
Noise suppression

3.3 Development: voice compression

Because the low-level speech compression code is written in C/C++, we Android developers need to use the NDK for JNI development. If you are not familiar with this, you can refer to the author's post "JNI (1): A simple development process in Android Studio". With some basic knowledge of C, you can then do simple JNI development. The author has also written many comments in this demo, in the hope that they help.

GitHub address for this project

To integrate the code into your own project:

Step 1: Copy the entire jni directory of the demo into your project's main directory.

Step 2: Modify the method names in Speex_jni.cpp in the copied jni folder.

The method name has the form Java_PackageName_ClassName_methodName(), with the parts joined by single underscores; see the demo.

Step 3: Add the Gradle configuration.

Copy the area marked with the red box into your project.

Step 4: Build the project to generate the .so file.

Select Build -> Make Project, then locate the .so library and copy it into your own libs directory.

Step 5: Use it in the program.

Copy the Speex codec utility class into your own project and use it as normal; see the demo for detailed usage.

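private Speex speex;                     // Speex audio codec
speex = new Speex();                     // Create a Speex codec instance
speex.open(4);                           // Open the codec with compression quality 4
// Voice compression: encode the recorded audio data.
// (The remaining arguments of this call are cut off in the source; see the demo.)
speex.encode(recordData, 0, ...
int decode = speex.decode(audioData, decodedShorts, audioData.length);  // Decode the audio data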
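The naming rule from Step 2 can be illustrated with a small C helper that assembles the native function name from its parts. This is a sketch, not part of the demo: the names append and jni_function_name and the package com.example.audio are hypothetical. It also applies the JNI escaping rule that an underscore inside a name is written as _1, which matters only if your package, class, or method name itself contains an underscore.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends src to the NUL-terminated string in dst (capacity cap).
 * With mangle != 0, '.' becomes '_' and a literal '_' becomes "_1".
 * Returns 0 on success, -1 if dst is too small. */
static int append(char *dst, size_t cap, const char *src, int mangle)
{
    size_t len = strlen(dst);
    for (; *src != '\0'; ++src) {
        const char *piece = NULL;
        if (mangle && *src == '_')
            piece = "_1";
        else if (mangle && *src == '.')
            piece = "_";
        size_t need = piece ? strlen(piece) : 1;
        if (len + need + 1 > cap)
            return -1;
        if (piece)
            memcpy(dst + len, piece, need);
        else
            dst[len] = *src;
        len += need;
        dst[len] = '\0';
    }
    return 0;
}

/* Builds "Java_<package>_<class>_<method>" into out. Returns 0 on success. */
int jni_function_name(char *out, size_t cap, const char *package,
                      const char *class_name, const char *method)
{
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    if (append(out, cap, "Java_", 0) != 0) return -1;
    if (append(out, cap, package, 1) != 0) return -1;
    if (append(out, cap, "_", 0) != 0) return -1;
    if (append(out, cap, class_name, 1) != 0) return -1;
    if (append(out, cap, "_", 0) != 0) return -1;
    return append(out, cap, method, 1);
}
```

With the hypothetical package com.example.audio, class Speex, and method encode, the helper produces Java_com_example_audio_Speex_encode, which is the name the C function must carry after you change the package in Step 2.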

3.4 Related calculations

Sampling frequency: Fm = 8 kHz

Quantization bit depth: 16 bit

Channel configuration: 1

Then the bitrate is V = 8 kHz × 16 bit × 1 = 128 kbps = 16 KB/s, i.e. about 16 KB must be transferred every second.

If each audio frame lasts 20 ms, the audio data per frame has size = 16 KB/s × 0.02 s = 320 bytes, that is, 160 16-bit samples.

With the compression quality set to 4, each frame of audio data is only 20 bytes after compression, so the compression ratio is 320:20, i.e. 16:1. At 1/0.02 = 50 packets per second, the audio data alone consumes 1 KB/s; even with some data headers added, the rate can basically be kept at around 5 KB/s!
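The packet rate and on-the-wire data rate can be estimated with a short C sketch. The function names are hypothetical, and the header size is an assumption you supply: the article does not say which headers make up its 5 KB/s figure. As one reference point, an IPv4 header without options plus a UDP header is 28 bytes per packet.

```c
#include <assert.h>

/* Packets sent per second when each packet carries one frame of frame_ms milliseconds. */
int packets_per_second(int frame_ms)
{
    assert(frame_ms > 0);
    return 1000 / frame_ms;
}

/* Bytes per second on the wire:
 * (payload + assumed per-packet header) x packets per second. */
long stream_bytes_per_second(int payload_bytes, int header_bytes, int frame_ms)
{
    assert(payload_bytes >= 0 && header_bytes >= 0);
    return (long)(payload_bytes + header_bytes) * packets_per_second(frame_ms);
}
```

Here stream_bytes_per_second(20, 0, 20) gives 1000 (the 1 KB/s of pure audio data), and stream_bytes_per_second(20, 28, 20) gives 2400 when only the assumed IPv4 and UDP headers are counted; any application-level header adds to that.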

Going from 176.4 KB/s to 16 KB/s and then compressing to 1 KB/s is enough to meet basic LAN voice transmission needs. With headphones on, the voice call is basically perfect. Without headphones, echo from the microphone and the environment grows stronger as the call goes on and seriously degrades call quality, so echo processing is needed. That is covered separately in another blog post.

Although Speex is excellent, it has been dethroned by newer technology: even its official website states that Speex has been superseded by Opus, and that Opus performs much better than Speex. Next, the author gives a preliminary introduction to the features and API of Opus.

IV. Opus: the Swiss Army knife of audio codecs

4.1 Introduction

Opus is a fully open, free, and highly versatile audio codec. It is unmatched for interactive speech and music transmission over the Internet, but is also intended for storage and streaming applications. It is standardized by the Internet Engineering Task Force (IETF) as RFC 6716, and it combines technology from Skype's SILK codec and Xiph.Org's CELT codec. It is known as the Swiss Army knife of audio codecs (from the official video).

Official website: http://opus-codec.org/

4.2 Technology

Opus can handle a wide range of audio applications, including IP telephony, video conferencing, in-game chats, and even remote live music performances. It can be applied from low bit-rate narrowband voice to very high-quality stereo music. Technical Features:

Bitrates from 6 kb/s to 510 kb/s
Sampling rates from 8 kHz (narrowband) to 48 kHz (fullband)
Frame sizes from 2.5 ms to 60 ms
Support for both constant bitrate (CBR) and variable bitrate (VBR)
Audio bandwidth from narrowband to fullband
Support for speech and music
Support for mono and stereo
Support for up to 255 channels (multistream frames)
Dynamically adjustable bitrate, audio bandwidth, and frame size
Good loss robustness and packet loss concealment (PLC)
Floating-point and fixed-point implementations

You can read the complete specification, including the reference implementation, in RFC 6716. An up-to-date version of the Opus standard is also available on the download page.

libopus is the reference implementation of the Opus codec, and you can develop against this code.

4.3 Developing plugins

Mozilla supports Opus in Firefox, and a dedicated set of Opus tools is also available: opus-tools provides command-line programs for encoding, inspecting, and decoding .opus files. These can be used for voice-related work in HTML. There is no native Android version, so you need to download what you need from the official homepage yourself.

Having translated this far, the author has still found nothing Android-related, which is rather disheartening!

4.4 Version Information

Although Opus is now standardized by the IETF, its implementation will continue to improve. All future versions will still fully conform to the IETF specification. Information on the latest development version can be found on the development page.

libopus 1.1.3 (stable release)

opus-1.1.3.tar.gz

4.5 Comparison

As the author once said: without comparison, there is no harm.

Quality and bit rate

[Figure: quality as a function of bitrate for different codecs. Bandwidth labels in the figure: narrowband, wideband, super-wideband, fullband, fullband stereo.]

As you can see from the graph, the advantages of Opus are very obvious. Compared with Speex in particular, it covers a wider range of bitrates and bandwidths.

Bit Rate/latency comparison

As you can see, Opus has less latency at any bit rate.

Listening tests

Opus has been tested several times, but only the tests conducted on the final bitstream are listed below. While they should give a good idea of the quality of Opus at the time it was standardized, newer and further-tuned encoders are expected to achieve even better quality.

Test results

Audio test Cases

4.6 Module API documentation

At the time of writing, the latest version is the 1.1.3 stable release, so it is the API of this version that is translated here.

4.6.1 Opus Encoder

Opus Encoder

Type definition

typedef struct OpusEncoder OpusEncoder    The Opus encoder; it contains the complete state of the encoder.

Method

- int opus_encoder_get_size(int channels): Gets the size of an OpusEncoder structure.
- OpusEncoder *opus_encoder_create(opus_int32 Fs, int channels, int application, int *error): Allocates and initializes an encoder state.
- int opus_encoder_init(OpusEncoder *st, opus_int32 Fs, int channels, int application): Initializes a previously allocated encoder state. The memory pointed to by st must be at least the size returned by opus_encoder_get_size().
- opus_int32 opus_encode(OpusEncoder *st, const opus_int16 *pcm, int frame_size, unsigned char *data, opus_int32 max_data_bytes): Encodes an Opus frame.
- opus_int32 opus_encode_float(OpusEncoder *st, const float *pcm, int frame_size, unsigned char *data, opus_int32 max_data_bytes): Encodes an Opus frame from floating-point input.
- void opus_encoder_destroy(OpusEncoder *st): Frees an OpusEncoder allocated by opus_encoder_create().
- int opus_encoder_ctl(OpusEncoder *st, int request, ...): Performs a CTL function on an Opus encoder.

Detailed description

From the official documentation and a basic knowledge of C, we know that OpusEncoder *enc is a pointer to an Opus encoder structure; it points to the encoder's memory, which contains the entire state of the encoder. C has no concept of classes and objects, but it has structs, which can be used to emulate Java classes, so an instance of a struct can be compared to an object. From here on, I refer to enc (the pointer) as the encoder object, since that is the more familiar way to put it.

Because Opus is a stateful codec, the encoding process starts with creating an encoder state:

int error;
OpusEncoder *enc;
enc = opus_encoder_create(Fs, channels, application, &error);   // Create the encoder

From this point, enc can be used to encode an audio stream. An encoder state must not be used for more than one audio stream at the same time, and it must not be re-initialized for each frame.

While opus_encoder_create() allocates the memory for the encoder state itself, it is also possible to initialize pre-allocated memory:

int size;
int error;
OpusEncoder *enc;
size = opus_encoder_get_size(channels);                     // Get the minimum memory required
enc = malloc(size);                                         // Allocate the memory
error = opus_encoder_init(enc, Fs, channels, application);  // Initialize the memory

opus_encoder_get_size() returns the memory size required by the encoder object. Note that future versions of the code may change this size, so do not build any logic on assumptions about the value it returns; a later version could break your code.

The encoder state is always contiguous in memory, so a shallow copy, for example with memcpy(), is sufficient to copy it.

Use the opus_encoder_ctl() interface to change encoder settings. All settings already default to the recommended values, so change them only when necessary. The settings you are most likely to want to change are:

opus_encoder_ctl(enc, OPUS_SET_BITRATE(bitrate));
opus_encoder_ctl(enc, OPUS_SET_COMPLEXITY(complexity));
opus_encoder_ctl(enc, OPUS_SET_SIGNAL(signal_type));

bitrate: the bitrate, in b/s
complexity: the complexity, from 1 to 10, where 1 is the lowest and 10 the highest
signal_type: the signal type; one of OPUS_AUTO (default), OPUS_SIGNAL_VOICE, or OPUS_SIGNAL_MUSIC

See the encoder-related CTLs for the complete list of parameters; most of them can be set or changed at any time during an audio stream.

To encode a frame, opus_encode() or opus_encode_float() must be called with exactly one frame (2.5, 5, 10, 20, 40, or 60 ms) of audio data:

len = opus_encode(enc, audio_frame, frame_size, packet, max_packet);

audio_frame: the audio data, in opus_int16 format
frame_size: the duration of the frame, in samples (per channel)
packet: the byte array to which the compressed data is written
max_packet: the maximum number of bytes that can be written to the byte array; 4000 bytes is the recommended size. Do not use max_packet to control the variable bitrate; use the OPUS_SET_BITRATE CTL instead.

opus_encode() and opus_encode_float() return the number of bytes of encoded audio data actually written to the packet. The return value can be negative, which indicates an encoding error. If the return value is 2 bytes or less, the packet does not need to be sent.
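The handling of the return value can be sketched in C as a small decision helper. The enum and function names are hypothetical and not part of libopus; the sketch assumes, as libopus does, that error codes are negative, and otherwise only encodes the rule stated above so that the sending loop stays readable.

```c
/* What the sender should do with the result of one encode call. */
typedef enum {
    PACKET_ERROR,  /* negative return value: encoding failed */
    PACKET_SKIP,   /* 2 bytes or fewer: nothing that needs to be transmitted */
    PACKET_SEND    /* a normal packet of len bytes */
} packet_action;

/* Maps the value returned by the encode call to an action. */
packet_action classify_encoded_length(int len)
{
    if (len < 0)
        return PACKET_ERROR;
    if (len <= 2)
        return PACKET_SKIP;
    return PACKET_SEND;
}
```

In a sending loop you would call the encoder, pass its return value to classify_encoded_length(), and hand the packet to the UDP socket only when the result is PACKET_SEND.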

Once the encoder object is not needed, it can be destroyed:

opus_encoder_destroy(enc);

If the encoder object was initialized with opus_encoder_init() rather than created with opus_encoder_create(), no further action is required, other than possibly freeing the memory that was manually allocated for it.

Type definition documentation

typedef struct OpusEncoder OpusEncoder    The encoder structure; it contains the complete state of an encoder. It is position independent and can be freely copied.

Method documentation

opus_int32 opus_encode(OpusEncoder *st, const opus_int16 *pcm, int frame_size, unsigned char *data, opus_int32 max_data_bytes)

Parameters:

Parameter	Type	In/Out	Description
st	OpusEncoder*	In	Encoder object
pcm	const opus_int16*	In	The input signal (interleaved if there are two channels). Its length is frame_size × channels × sizeof(opus_int16), that is, samples per channel × channels × 2 bytes.
frame_size	int	In	Number of samples per channel in the input signal. This must be an Opus frame size for the encoder's sampling rate. For example, at a sampling rate of 48 kHz the permitted values are 120, 240, 480, 960, 1920, and 2880. Passing in audio that lasts less than 10 ms (480 samples at 48 kHz) prevents the encoder from using the LPC or hybrid modes.
data	unsigned char*	Out	The output payload; it must have room for at least max_data_bytes bytes.
max_data_bytes	opus_int32	In	Size of the memory allocated for the output payload. It may be used to impose an upper limit on the instantaneous bitrate, but should not be used as the only bitrate control.
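The layout of the pcm buffer described in the table can be sketched in C. The function names are hypothetical, and int16_t from stdint.h stands in for opus_int16 here on the assumption that both are 16-bit signed integers; the sketch also assumes the usual left-then-right channel order.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for one input frame: frame_size x channels x sizeof(int16_t). */
size_t pcm_buffer_bytes(int frame_size, int channels)
{
    return (size_t)frame_size * (size_t)channels * sizeof(int16_t);
}

/* Interleaves separate left and right buffers as L0 R0 L1 R1 ... into pcm.
 * pcm must have room for 2 * frame_size samples. */
void interleave_stereo(const int16_t *left, const int16_t *right,
                       int frame_size, int16_t *pcm)
{
    for (int i = 0; i < frame_size; ++i) {
        pcm[2 * i] = left[i];       /* even positions: left channel */
        pcm[2 * i + 1] = right[i];  /* odd positions: right channel */
    }
}
```

For a 20 ms stereo frame at 48 kHz (frame_size = 960), pcm_buffer_bytes(960, 2) gives 3840 bytes. With mono input, as in the LAN voice configuration earlier, no interleaving is needed and the buffer is simply frame_size samples long.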

Regarding the relationship between the sampling rate and the number of samples: as mentioned above, opus_encode() or opus_encode_float() must be called with exactly one frame (2.5, 5, 10, 20, 40, or 60 ms) of audio data. If the sampling frequency is 48 kHz, then:

∵ sampling frequency Fm = 48 kHz

∴ sampling interval T = 1/Fm = 1/48000 s = 1/48 ms

∴ when T0 = 2.5 ms, N = T0/T = 2.5/(1/48) = 120,

when T0 = 5 ms, N = T0/T = 5/(1/48) = 240,

when T0 = 10 ms, N = T0/T = 10/(1/48) = 480,

when T0 = 20 ms, N = T0/T = 20/(1/48) = 960,

when T0 = 40 ms, N = T0/T = 40/(1/48) = 1920,

when T0 = 60 ms, N = T0/T = 60/(1/48) = 2880.

That is, when Fm = 48 kHz:

Sampling time (ms)	2.5	5	10	20	40	60
Number of samples	120	240	480	960	1920	2880
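The same relationship can be computed for any sampling rate with a short C sketch. The function names are hypothetical; durations are passed in microseconds so that 2.5 ms can be expressed without floating point.

```c
#include <assert.h>

/* Samples per channel in a frame lasting frame_us microseconds at sample_rate_hz. */
int frame_samples(int sample_rate_hz, int frame_us)
{
    assert(sample_rate_hz > 0 && frame_us > 0);
    /* Widen before multiplying: 48000 * 60000 does not fit in 32 bits. */
    return (int)((long long)sample_rate_hz * frame_us / 1000000);
}

/* Returns 1 if frame_size matches one of the six frame durations
 * (2.5, 5, 10, 20, 40, 60 ms) at the given sampling rate, else 0. */
int is_valid_frame_size(int sample_rate_hz, int frame_size)
{
    static const int durations_us[] = {2500, 5000, 10000, 20000, 40000, 60000};
    for (int i = 0; i < 6; ++i) {
        if (frame_samples(sample_rate_hz, durations_us[i]) == frame_size)
            return 1;
    }
    return 0;
}
```

At 48 kHz this reproduces the values 120 to 2880; at the 8 kHz used for LAN voice earlier, a 20 ms frame is frame_samples(8000, 20000) = 160 samples.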


4.6.2 Opus Decoder

The author has run out of energy to translate any further...

In fact, with the introduction above, readers should already have a general understanding. Below is a Chinese translation of the documentation for you to browse. Since I have only skimmed it myself so far, I cannot vouch for the correctness of its later chapters.

Opus official documentation (Chinese version)

V. Summary

Comparing Speex and Opus from the point of view of integration and use, the two are very similar, and their APIs are called in much the same way. Opus, however, is more powerful and has a richer API, which also indirectly makes development harder: you need a fairly solid foundation in C to customize it to your own needs.

The road of programming is long. First program for a living, then go beyond programming!

Contact information

Email: [Email protected]


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
