WebRTC Speech Processing

Cross-platform WebRTC

WebRTC is Google's open-source, plugin-free real-time audio/video communication technology, divided into web development and native development; it is currently supported by Chrome, Firefox, Android, iOS, Opera, and Edge, making it a truly cross-platform, plugin-free real-time communication technology. Video applications are generally built at the web level. This article is mainly about the code architecture of WebRTC's native-layer speech processing and a walkthrough of a native-layer speech algorithm test program.

Some speech algorithms in the native layer can be used for pre-processing in speech recognition.

Installation and compilation of the native layer

See http://blog.csdn.net/shichaog/article/details/50246155. To get the test program, you need to install and compile the native-layer code (the source tree is large, so the download may be slow).

If you only want to use the native layer's speech processing algorithms (AEC, AECM, AGC, NS, VAD, etc.), you can instead install the following codebase. A few points to note: this codebase targets Linux systems, so if you are considering Android, look elsewhere. The code does not include an audioproc processing example, so you need to write your own test program; you can use CMake and refer to the audioproc source program (a minimal sketch follows the repository URL below). This code is not kept in sync with the latest WebRTC; lagging half a year behind is quite normal.

git://anongit.freedesktop.org/pulseaudio/webrtc-audio-processing
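Since this library ships no example, here is a minimal sketch of what a self-written test program might look like. It assumes the pre-2017 AudioProcessing interface discussed later in this article; the include paths and AudioFrame field names vary between releases, so treat them as assumptions to be checked against the headers your build actually installs.

// Minimal sketch of a standalone test program (assumed header paths and
// pre-2017 API; verify against your installed headers).
#include <cassert>
#include <cstdint>
#include <cstring>

#include <webrtc/modules/audio_processing/include/audio_processing.h>  // assumed path
#include <webrtc/modules/interface/module_common_types.h>              // AudioFrame, assumed path

int main() {
  // All components start out disabled; enabling one triggers its allocation.
  webrtc::AudioProcessing* apm = webrtc::AudioProcessing::Create();
  apm->high_pass_filter()->Enable(true);
  apm->noise_suppression()->set_level(webrtc::NoiseSuppression::kHigh);
  apm->noise_suppression()->Enable(true);

  // APM consumes exactly 10 ms per call: 160 samples at 16 kHz, mono.
  webrtc::AudioFrame frame;
  frame.sample_rate_hz_ = 16000;
  frame.num_channels_ = 1;
  frame.samples_per_channel_ = 160;
  std::memset(frame.data_, 0, sizeof(int16_t) * 160);  // silence as dummy input

  // A real test would loop over a PCM file 160 samples at a time.
  assert(apm->ProcessStream(&frame) == webrtc::AudioProcessing::kNoError);

  delete apm;
  return 0;
}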
Native-layer speech algorithm test program

First, the audio-algorithm-related code is in the modules/audio_processing directory.


After completing the steps in the link above, you will find an executable test file, audioproc, which can be used to test the relevant algorithms. The file is located at:

webrtc-checkout/src/out/debug$ ./audioproc

Running it with the --help option produces the following output:

gsc@gsc-250:~/webrtc-checkout/src/out/debug$ ./audioproc --help
Usage: process_test [options] [-pb PROTOBUF_FILE]
  [-ir REVERSE_FILE] [-i PRIMARY_FILE] [-o OUT_FILE]

process_test is a test application for AudioProcessing.

When a protobuf debug file is available, specify it with -pb. Alternately,
when -ir or -i is used, the specified files will be processed directly in
a simulation mode. Otherwise the full set of legacy test files is expected
to be present in the working directory. OUT_FILE should be specified
without extension to support both raw and wav output.

Options
General configuration (only used for the simulation mode):
  -fs SAMPLE_RATE_HZ
  -ch CHANNELS_IN CHANNELS_OUT
  -rch REVERSE_CHANNELS

Component configuration:
All components are disabled by default. Each block below begins with a
flag to enable the component with default settings. The subsequent flags
in the block are used to provide configuration settings.

  -aec     Echo cancellation
  --drift_compensation
  --no_drift_compensation
  --no_echo_metrics
  --no_delay_logging
  --aec_suppression_level LEVEL  [0 - 2]
  --extended_filter
  --no_reported_delay

  -aecm    Echo control mobile
  --aecm_echo_path_in_file FILE
  --aecm_echo_path_out_file FILE
  --no_comfort_noise
  --routing_mode MODE  [0 - 4]

  -agc     Gain control
  --analog
  --adaptive_digital
  --fixed_digital
  --target_level LEVEL
  --compression_gain GAIN
  --limiter
  --no_limiter

  -hpf     High pass filter

  -ns      Noise suppression
  --ns_low
  --ns_moderate
  --ns_high
  --ns_very_high
  --ns_prob_file FILE

  -vad     Voice activity detection
  --vad_out_file FILE

  -expns   Experimental noise suppression

Level metrics (enabled by default)
  --no_level_metrics

Modifiers:
  --noasm            Disable SSE optimization.
  --add_delay DELAY  Add DELAY ms to input value.
  --delay DELAY      Override input delay with DELAY ms.
  --perf             Measure performance.
  --quiet            Suppress text output.
  --no_progress      Suppress progress.
  --raw_output       Raw output instead of wav file.
  --debug_file FILE  Dump a debug recording.
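As a usage example, a plausible simulation-mode invocation (using only flags from the help text above; the file names are hypothetical placeholders) would be:

./audioproc -fs 16000 -i near.pcm -ir far.pcm -o out -aec -ns -agc --perf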

Speech algorithm compilation relationships

In this directory there is a BUILD.gn file, which specifies the build rules, the targets, and the source files that generate each target. Here you can see how audioproc is generated and which other test programs are available in the directory. Some of the targets in the file are listed below:

rtc_static_library: compiles and generates a static library.
rtc_executable: generates an executable program, indicating the produced binary.
The executable programs and the source files they depend on are as follows (see the BUILD.gn sketch after this list):
audioproc: test/process_test.cc

unpack_aecdump: test/unpack.cc

audioproc_f: test/aec_dump_based_simulator.cc; test/audio_processing_simulator.cc; test/audioproc_float.cc; test/wav_based_simulator.cc

transient_suppression_test: transient/transient_suppression_test.cc

nonlinear_beamformer_test: beamformer/nonlinear_beamformer_test.cc

intelligibility_proc: intelligibility/test/intelligibility_proc.cc
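For orientation, the audioproc target in BUILD.gn looks roughly like the following abridged sketch; the real target lists more sources, dependencies, and build conditions, so only the shape is shown here:

rtc_executable("audioproc") {
  testonly = true
  sources = [ "test/process_test.cc" ]
  deps = [
    ":audio_processing",
    # ... test-support and protobuf dependencies ...
  ]
}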

This article is not intended to cover the specific algorithms but to walk through the code, so let's look at process_test.cc to see the most important piece, the APM (Audio Processing Module). Note that process_test.cc is not present in the latest WebRTC.

1141 }  // namespace
1142 }  // namespace webrtc
1143
1144 int main(int argc, char* argv[]) {
1145   webrtc::void_main(argc, argv);
1146
1147   // Optional, but removes memory leak noise from Valgrind.
1148   google::protobuf::ShutdownProtobufLibrary();
1149   return 0;
1150 }

This global main function calls the void_main function in the webrtc namespace.

Starting at line 48 of this .cc file:

48 namespace webrtc {
...
143 // void function for gtest.
144 void void_main(int argc, char* argv[]) {
...
      // Create the key object.
155   std::unique_ptr<AudioProcessing> apm(AudioProcessing::Create());
...
183   AudioProcessing::Config apm_config;
...
      // Enable AEC mode.
227   } else if (strcmp(argv[i], "-aec") == 0) {
228     ASSERT_EQ(apm->kNoError, apm->echo_cancellation()->Enable(true));
229     ASSERT_EQ(apm->kNoError,
230               apm->echo_cancellation()->enable_metrics(true));
231     ASSERT_EQ(apm->kNoError,
232               apm->echo_cancellation()->enable_delay_logging(true));
...
      // The echo suppression factor, with three modes: low, moderate and high.
258   } else if (strcmp(argv[i], "--aec_suppression_level") == 0) {
259     i++;
260     ASSERT_LT(i, argc) << "Specify level after --aec_suppression_level";
261     int suppression_level;
262     ASSERT_EQ(1, sscanf(argv[i], "%d", &suppression_level));
263     ASSERT_EQ(apm->kNoError,
264               apm->echo_cancellation()->set_suppression_level(
265                   static_cast<webrtc::EchoCancellation::SuppressionLevel>(
266                       suppression_level)));
...
458   apm->ApplyConfig(apm_config);
...
      // Far-end processing: echo cancellation works against this signal, and
      // delay estimation also lives here; skipped for now.
691   ASSERT_EQ(apm->kNoError,
692             apm->ProcessReverseStream(&far_frame));
...
      // For a smart speaker, the near-end signal is the user speaking to the
      // device, and the far-end signal is the audio the speaker itself plays.
      // For VoIP, the near-end signal is the local voice captured by the
      // computer, and the far-end signal is the voice arriving from the other
      // end of the call.
767   err = apm->ProcessStream(&near_frame);
...
1159 }  // namespace
1160 }  // namespace webrtc
As you can see, void_main does the real work here; it processes a PB-format file. PB is Google's Protocol Buffers format, and the PB-related parts can be skipped. The naming of the other variables is self-explanatory.
The APM module created on line 155 can also be created in a clearer way:

    webrtc::AudioProcessing* apm = webrtc::AudioProcessing::Create();
As you can see above, AudioProcessing in the webrtc namespace is the key class.

The AudioProcessing class

This class is defined in include/audio_processing.h, the header that applications include (the algorithm implementations are compiled into a library). It runs to nearly 300 lines, but since this class is the bridge between the algorithms and the application it is very important, so it is covered in full here; the algorithm implementations it calls are not expanded. In the latest WebRTC, this class inherits from rtc::RefCountInterface.
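To give a feel for that bridge role, here is an abridged sketch of the interface. Method names follow the era of the code discussed here, but the real header declares many more methods and overloads and derives from rtc::RefCountInterface, so treat this as a simplified outline rather than the actual declaration:

// Abridged sketch of the AudioProcessing interface (simplified).
// Each component below has its own full declaration in the real header.
class EchoCancellation;
class EchoControlMobile;
class GainControl;
class HighPassFilter;
class NoiseSuppression;
class VoiceDetection;
class AudioFrame;

class AudioProcessing {
 public:
  struct Config { /* per-component toggles */ };

  static AudioProcessing* Create();
  virtual ~AudioProcessing() {}

  virtual void ApplyConfig(const Config& config) = 0;

  virtual int ProcessStream(AudioFrame* frame) = 0;         // near-end path
  virtual int ProcessReverseStream(AudioFrame* frame) = 0;  // far-end path
  virtual int set_stream_delay_ms(int delay) = 0;

  // One accessor per algorithm component:
  virtual EchoCancellation* echo_cancellation() const = 0;
  virtual EchoControlMobile* echo_control_mobile() const = 0;
  virtual GainControl* gain_control() const = 0;
  virtual HighPassFilter* high_pass_filter() const = 0;
  virtual NoiseSuppression* noise_suppression() const = 0;
  virtual VoiceDetection* voice_detection() const = 0;
};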

This class includes the components of real-time speech processing and works frame by frame. The primary stream (the frame parameter of ProcessStream()) is run through all enabled components; ProcessStream() actually processes the near-end signal, while the ProcessReverseStream() method processes the far-end signal frame by frame. This module is usually placed in the HAL or below the application layer (it really has no direct relationship with the application; if it must interact with the application, it is best done through a socket or some other IPC mechanism rather than by embedding it in application code).

All components are disabled at creation time and run with default settings until tuned; enabling a component triggers its memory allocation and initialization.

Lock-free thread safety relies on the following conditions: stream getters and setters must be called from the same thread as ProcessStream() and must not be used across multiple threads, and parameter getters and setters must not be called concurrently.

APM accepts only 10 ms chunks of data. The int16 interface takes interleaved samples, while the float interface takes deinterleaved data (one buffer per channel).
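A short sketch of the difference, assuming the StreamConfig-based float overload of this era and the same assumed header paths as earlier (check your header for the exact signatures):

#include <cstring>

#include <webrtc/modules/audio_processing/include/audio_processing.h>  // assumed path
#include <webrtc/modules/interface/module_common_types.h>              // AudioFrame, assumed path

void feed_frames(webrtc::AudioProcessing* apm) {
  // int16 interface: one AudioFrame whose data_ buffer is interleaved,
  // i.e. for stereo the samples run L0 R0 L1 R1 ...
  webrtc::AudioFrame frame;
  frame.sample_rate_hz_ = 16000;
  frame.num_channels_ = 2;
  frame.samples_per_channel_ = 160;  // 10 ms at 16 kHz
  std::memset(frame.data_, 0, sizeof(frame.data_));
  apm->ProcessStream(&frame);

  // float interface: deinterleaved, one separate buffer per channel.
  float left[160] = {0};
  float right[160] = {0};
  float* channels[] = {left, right};
  apm->ProcessStream(channels,
                     webrtc::StreamConfig(16000, 2),  // input: 16 kHz stereo
                     webrtc::StreamConfig(16000, 2),  // output format
                     channels);                       // processed in place
}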

An example of use is:

   197 // Usage example, omitting error checking:
   198 // AudioProcessing* apm = AudioProcessing::Create(0);
   199 //
   200 // AudioProcessing::Config config;
   201 // config.level_controller.enabled = true;
   202 // apm->ApplyConfig(config)
   203 //
   204 // apm->high_pass_filter()->Enable(true);
   205 //
   206 // apm->echo_cancellation()->enable_drift_compensation(false);
   207 // apm->echo_cancellation()->Enable(true);
   208 //
   209 // apm->noise_reduction()->set_level(kHighSuppression);
   210 // apm->noise_reduction()->Enable(true);
   211 //
   212 // apm->gain_control()->set_analog_level_limits(0, 255);
   213 // apm->gain_control()->set_mode(kAdaptiveAnalog);
   214 // apm->gain_control()->Enable(true);
   215 //
   216 // apm->voice_detection()->Enable(true);
   217 //
   218 // // Start a voice call...
   219 //
   220 // // ... Render frame arrives bound for the audio HAL ...
   221 // apm->ProcessReverseStream(render_frame);
   222 //
   223 // // ... Capture frame arrives from the audio HAL ...
   224 // // Call required set_stream_ functions.
   225 // apm->set_stream_delay_ms(delay_ms);
   226 // apm->gain_control()->set_stream_analog_level(analog_level);
   227 //
   228 // apm->ProcessStream(capture_frame);
   229 //
   230 // // Call required stream_ functions.
   231 // analog_level = apm->gain_control()->stream_analog_level();
   232 // has_voice = apm->stream_has_voice();
   233 //
   234 // // Repeat render and capture processing for the duration of the call...
   235 // // Start a new call...
   236 // apm->Initialize();
   237 //
   238 // // Close the application...
   239 // delete apm;

Its creation methods are located in the audio_processing_impl.cc file:

AudioProcessing* AudioProcessing::Create() {
  Config config;
  return Create(config, nullptr);
}

AudioProcessing* AudioProcessing::Create(const Config& config) {
  return Create(config, nullptr);
}

AudioProcessing* AudioProcessing::Create(const Config& config,
                                         Beamformer<float>* beamformer) {
  AudioProcessingImpl* apm = new AudioProcessingImpl(config, beamformer);
  if (apm->Initialize() != kNoError) {
    delete apm;
    apm = NULL;
  }

  return apm;
}

A class containing pure virtual functions cannot be instantiated (with new), so there must be a derived class that overrides those pure virtual functions. That derived class is AudioProcessingImpl, which inherits from the base class AudioProcessing. To see a specific implementation, look at that file and the methods defined in it.


The design is quite clever: each algorithm module is wrapped in its own class, those classes are instantiated inside the APM, and each algorithm is switched on by calling the corresponding class's Enable() method. This is why there are a number of files ending in _impl.cc (such as noise_suppression_impl.cc): they encapsulate the core algorithms and are instantiated and used by the APM module. The code is simple, efficient, and extensible.
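A miniature of this pattern, purely illustrative and not actual WebRTC code, might look like this:

// Pure-virtual interface that applications see.
class NoiseSuppression {
 public:
  virtual ~NoiseSuppression() {}
  virtual int Enable(bool enable) = 0;
};

// Encapsulation of the core algorithm (the *_impl.cc role).
class NoiseSuppressionImpl : public NoiseSuppression {
 public:
  int Enable(bool enable) override {
    enabled_ = enable;  // A real impl would allocate and initialize state here.
    return 0;
  }

 private:
  bool enabled_ = false;
};

// The APM owns one instance per algorithm and hands out the interface,
// so new components can be added without touching the application API.
class AudioProcessingImplSketch {
 public:
  NoiseSuppression* noise_suppression() { return &ns_; }

 private:
  NoiseSuppressionImpl ns_;
};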

To summarize, compile and test as follows:

python webrtc/build/gyp_webrtc
ninja -C out/debug
./nonlinear_beamformer_test -i out2.wav -mic_positions "1.0 1.0 1.0 2.0 2.2 2.2 1.1 1.2 0" out.wav

As of 2017-06-30, only audioproc_f can be used; its source file is:
webrtc/modules/audio_processing/test/audioproc_float.cc
