In-depth OSS (OpenSoundSystem) development

This article explores several aspects of OSS (Open Sound System) development in depth: the latency of audio playback, with a quantitative analysis of different buffer configurations; non-blocking writes; and direct application access to the driver's DMA buffer. These are practical problems that developers encounter once they go beyond the basics of OSS. For example, a game for the Linux platform must keep playback latency low so that game sounds play as soon as possible and, when necessary, stay synchronized with the screen.
  
In discussing these topics, we not only describe how to use each feature, but also analyze its internal workings in terms of the driver implementation, to deepen the reader's understanding.
  
To establish a common baseline for reading this article, we first briefly review some OSS basics.
  
   1. Introduction to OSS
The OSS hierarchy is very simple: applications access the OSS driver through its API (declared in <sys/soundcard.h>), and the OSS driver in turn controls the sound card hardware.
    
Sound cards provide two basic devices: the mixer and the CODEC (ADC/DAC). The mixer controls input volume; its device file is /dev/mixer. The CODEC implements recording (converting analog signals to digital) and playback (converting digital signals to analog); its device file is /dev/dsp.
  
The general process for developing an OSS application is:
1) include the OSS header file: #include <sys/soundcard.h>
2) open the device file and obtain a file descriptor
3) use ioctl to set device parameters and control device features
4) for recording, read from the device (read)
5) for playback, write to the device (write)
6) close the opened device.
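The steps above can be sketched in C. This is a minimal sketch, not a complete player: the device path /dev/dsp follows the article, the parameter values (16-bit stereo at 44.1 kHz) match the test setup used later, and the helper name frame_bytes is our own.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>
#include <unistd.h>

/* bytes occupied by one sample frame (all channels together) */
static int frame_bytes(int channels, int bits_per_sample) {
    return channels * (bits_per_sample / 8);
}

/* Open /dev/dsp for playback and configure 16-bit stereo at 44.1 kHz.
   Returns the file descriptor, or -1 on failure. */
static int open_dsp_for_playback(void) {
    int fd = open("/dev/dsp", O_WRONLY);
    if (fd == -1)
        return -1;

    int format = AFMT_S16_LE;   /* 16-bit little-endian samples */
    int channels = 2;           /* stereo                       */
    int rate = 44100;           /* 44.1 kHz sampling rate       */
    if (ioctl(fd, SNDCTL_DSP_SETFMT, &format) == -1 ||
        ioctl(fd, SNDCTL_DSP_CHANNELS, &channels) == -1 ||
        ioctl(fd, SNDCTL_DSP_SPEED, &rate) == -1) {
        close(fd);
        return -1;
    }
    return fd;   /* the caller then write()s PCM data and close()s */
}
```

After a successful open, each write() of PCM data corresponds to step 5 above; close() is step 6.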
  
   2. Performance Analysis of buffer settings
Buffer sizing inside the driver involves a trade-off. The sound card driver maintains an internal buffer, the DMA buffer, to prevent dropouts and ensure smooth playback. During playback, the application first writes audio data from its own buffer (the APP buffer) through the driver into the DMA buffer; the DMA controller then feeds the data in the DMA buffer to the DAC (digital-to-analog converter). At times the CPU is busy, for example reading data from disk or redrawing the screen, and has no time to put new audio data into the DMA buffer. With no new data reaching the DAC, playback is interrupted and the sound stutters. To prevent this, the DMA buffer must be large enough that the DAC always has data to play. However, a larger DMA buffer takes longer to fill from the APP buffer, which increases playback latency. That is a problem for latency-sensitive applications, such as audio programs that interact with users.
  
This conflict can be addressed from two directions. On the driver side, multi-buffering divides one large DMA buffer into several small buffers of equal size, called fragments. At startup, playback can begin as soon as the first two fragments are filled. The total buffer size can then be grown by increasing the number of fragments, while each individual fragment stays small enough not to hurt latency. Multi-buffering in audio drivers generally relies on the scatter-gather capability of the underlying DMA controller.
  
On the application side, the program can instruct the driver to choose a buffer size that keeps latency as low as possible without dropouts. In particular, after an application maps the driver's buffers into its own address space with mmap, it handles those buffers in its own way (not necessarily the way the driver expects); in that case the application usually sets the size of the driver's internal buffers according to its own needs.
  
In the OSS ioctl interface, SNDCTL_DSP_SETFRAGMENT sets the driver's internal buffer layout. Typical usage:
  
int param;
param = (0x0004 << 16) + 0x000a;
if (ioctl(audio_fd, SNDCTL_DSP_SETFRAGMENT, &param) == -1) {
    /* ... error handling ... */
}
  
The param argument has two parts. The low 16 bits encode the fragment size as a power of two: 0x000a means a fragment size of 2^0xa, i.e. 1024 bytes. The high 16 bits give the number of fragments: 0x0004 means four fragments. After assembling the argument, pass it to ioctl with the SNDCTL_DSP_SETFRAGMENT command to adjust the driver's buffers.
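The bit packing described above can be captured in a small helper; make_fragment_param is a name of our own, not part of the OSS API:

```c
/* Build the SNDCTL_DSP_SETFRAGMENT argument:
   high 16 bits = number of fragments,
   low 16 bits  = log2 of the fragment size in bytes. */
static int make_fragment_param(int num_fragments, int log2_size) {
    return (num_fragments << 16) | log2_size;
}
```

For example, make_fragment_param(4, 10) yields 0x0004000a: four fragments of 2^10 = 1024 bytes each.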
  
To show audio developers how buffer configuration affects playback performance, we test the relationship between the two. The following sections describe the test environment, including the test method and the meaning of the results; the test is then run in two configurations and the results are explained.
  
   Test Environment
The test is performed on a PC; the specific test environment is listed in the following table.
  
The test software (latencytest) consists of two parts: an audio playback test program and a system load simulation program. (Note: latencytest is primarily aimed at measuring kernel latency, but here it serves as a tool for comparing buffer configurations.)
  
The workflow of the audio playback test program is shown in the following code. To guarantee scheduling priority for audio playback, the program uses the SCHED_FIFO scheduling policy (set via sched_setscheduler()).
  
while (1)
{
    time1 = my_gettime();
    /* consume some CPU time in an empty loop */
    time2 = my_gettime();
    write(audio_fd, playbuffer, fragmentsize);
    time3 = my_gettime();
}
  
my_gettime returns the current time; recording the time before and after each operation gives that operation's duration. audio_fd is the file descriptor of the opened audio device; playbuffer is the application buffer holding the audio data, i.e. the APP buffer; fragmentsize is the size of one fragment, so each write passes exactly one fragment to the driver. The empty loop simulates CPU load during audio playback; a typical example is a synthesizer generating a waveform in real time before each write. The time consumed by the empty loop is set to 80% of one fragment's playback latency.
  
The related indicators are calculated as follows:
1) playback latency of one fragment (fragm. latency) = fragment size / (sampling rate * 2 * 2). Taking the test configuration with a fragment size of 512 bytes as an example, one fragment's latency = 512 / (44100 * 2 * 2) = 2.90 ms [44100 is the 44.1 kHz sampling rate, the first 2 is the two stereo channels, and the second 2 is the 2 bytes of a 16-bit sample].
2) transmission latency of one fragment = the time to copy one fragment from the APP buffer to the DMA buffer.
3) time3 - time1 = duration of one loop iteration = CPU time consumed by the empty loop + transmission latency of one fragment.
4) time2 - time1 = actual CPU time consumed by the empty loop (cpu latency).
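Formula 1 above can be written as a small function; fragment_latency_ms is a hypothetical name of our own:

```c
/* Playback latency of one fragment in milliseconds:
   fragment size / (rate * channels * bytes per sample). */
static double fragment_latency_ms(int fragment_bytes, int rate,
                                  int channels, int bytes_per_sample) {
    return 1000.0 * fragment_bytes /
           ((double)rate * channels * bytes_per_sample);
}
```

For 512-byte fragments at 44.1 kHz, stereo, 16-bit, this gives 512 / 176400 s, about 2.90 ms, matching the figure above.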
  
To simulate realistic operating conditions, a system load runs while the test program plays audio. Five load scenarios are used in turn:
1) high-intensity graphics output (using x11perf to simulate a large number of BitBlt operations)
2) high-intensity access to the /proc file system (using top with a 0.01-second update interval)
3) high-intensity disk writing (writing a large file to the hard disk)
4) high-intensity disk copying (copying one file to another)
5) high-intensity disk reading (reading a large file from the hard disk)
  
Test results are given for each system load scenario and displayed as graphs; the meaning of the marks in the graphs is explained under the performance analysis below.
  
   Performance analysis
Next, we compare the performance of two buffer configurations:
1) Case 1: fragment size 512 bytes, 2 fragments. Test result 1 (2x512.html)
2) Case 2: fragment size 2048 bytes, 4 fragments. Test result 2 (4x2048.html)
  
To understand the test results, you need to know what each mark in the result graphs means:
1) Red line: playback latency of all buffers. Playback latency of all buffers = fragment latency x number of fragments. In case 1, the latency of all buffers is 2.90 ms x 2 = 5.8 ms.
2) White line: the actual scheduling delay, i.e. the time of one loop iteration (time3 - time1). If the white line crosses the red line, the audio in all buffers finished playing before the application managed to supply new data; sound is lost, and the overrun count increases by 1.
3) Green line: the CPU time of one empty loop (time2 - time1 above). Its nominal value is fragm. latency x 80%. Because the playback process uses the SCHED_FIFO scheduling policy, a longer green line indicates bus contention, or that the system spent a long time inside the kernel.
4) Yellow line: the playback latency of one fragment. The white line should stay close to the yellow line.
5) white between +/-1 ms: the fraction of actual scheduling delays falling within fragm. latency +/-1 ms.
6) white between +/-2 ms: the fraction of actual scheduling delays falling within fragm. latency +/-2 ms.
7) green between +/-0.2 ms: the fraction of empty-loop times falling within the nominal value +/-0.2 ms.
8) green between +/-0.1 ms: the fraction of empty-loop times falling within the nominal value +/-0.1 ms.
  
In the first case, the buffer is very small: each fragment is only 512 bytes, and the total buffer is 2 x 512 = 1024 bytes, enough for only 5.8 ms of playback. As the OSS documentation notes, since Unix is a multitasking operating system in which many processes share the CPU, the playback program must choose a buffer configuration large enough that no underrun occurs while the CPU is serving other processes (during which no new audio data can reach the sound card). If the application cannot supply audio data as fast as the sound card plays it, playback pauses or clicks. For this reason, setting the fragment size below 256 bytes is not recommended. The test results show that underruns do occur under system load, especially while writing to the hard disk: 14 in total (overruns = 14).
  
Of course, audio programs with strict real-time requirements want a smaller buffer, since only a small buffer keeps latency low. The underruns seen in the test results above are therefore not caused entirely by the small buffer. In fact, the Linux kernel is not preemptible, so we cannot know Lin