Linphone Learning: OSS


1. Introduction to OSS

The OSS architecture is simple: applications access the OSS driver through its API (defined in <sys/soundcard.h>), and the driver in turn controls the sound card, as shown below:

[Figure: OSS structure]

A sound card has two basic devices: the mixer and the codec (ADC/DAC). The mixer controls volume; its device file is /dev/mixer. The codec implements recording (converting analog signals to digital) and playback (converting digital signals to analog); its device file is /dev/dsp.
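
As a small illustration of the mixer device, the sketch below reads the current playback volume from /dev/mixer. SOUND_MIXER_READ_VOLUME is a standard OSS mixer ioctl; everything else (the device path, the output format) is just for demonstration.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/mixer", O_RDONLY);
    int vol;

    if (fd < 0) {
        perror("open /dev/mixer");
        return 1;
    }
    if (ioctl(fd, SOUND_MIXER_READ_VOLUME, &vol) == -1) {
        perror("SOUND_MIXER_READ_VOLUME");
        close(fd);
        return 1;
    }
    /* OSS packs the stereo volume as two 0-100 values:
       low byte = left channel, next byte = right channel */
    printf("volume: left=%d right=%d\n", vol & 0xff, (vol >> 8) & 0xff);
    close(fd);
    return 0;
}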

The general process for developing an OSS application is:

1) Include the OSS header file: #include <sys/soundcard.h>
2) Open the device file, obtaining a file descriptor
3) Use ioctl to set device parameters and control device features
4) For recording, read from the device (read)
5) For playback, write to the device (write)
6) Close the device
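
A minimal playback sketch following these steps is given below. It is an illustration rather than code from the article: error handling is trimmed, and the sample format (16-bit little-endian, stereo, 44.1 kHz) and the raw-PCM input file sound.raw are assumptions.

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>       /* 1) the OSS header */
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/dsp", O_WRONLY);          /* 2) open the device */
    int fmt = AFMT_S16_LE, channels = 2, rate = 44100;
    char buf[4096];
    ssize_t n;
    int in;

    /* 3) set device parameters: sample format, channels, sampling rate */
    ioctl(fd, SNDCTL_DSP_SETFMT, &fmt);
    ioctl(fd, SNDCTL_DSP_CHANNELS, &channels);
    ioctl(fd, SNDCTL_DSP_SPEED, &rate);

    in = open("sound.raw", O_RDONLY);             /* hypothetical raw PCM file */
    while ((n = read(in, buf, sizeof(buf))) > 0)
        write(fd, buf, n);                        /* 5) write to play */

    close(in);
    close(fd);                                    /* 6) close the device */
    return 0;
}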

2. Performance Analysis of Buffer Settings

There is an inherent conflict in sizing the driver's internal buffer. The sound card driver maintains an internal buffer, the DMA buffer, to prevent jitter and ensure playback quality. During playback, the application first writes audio data from its own buffer (the app buffer) through the driver into the DMA buffer; the DMA controller then feeds the data in the DMA buffer to the DAC (digital-to-analog converter). At times the CPU is busy with other work, such as reading from disk or redrawing the screen, and has no time to put new audio data into the DMA buffer. When the DAC receives no new audio data, playback is interrupted and the sound stutters. To prevent this, the DMA buffer must be large enough that the DAC always has data to play. However, a larger DMA buffer takes longer to fill from the app buffer, which increases playback latency. This is a problem for latency-sensitive applications, such as audio applications that interact with users.

The conflict can be addressed from two directions. On the driver side, multi-buffering divides one large DMA buffer into multiple small buffers of equal size, called fragments; playback starts as soon as the first two fragments are filled. The total buffer size can then be increased by increasing the number of fragments, while each fragment stays small enough not to hurt latency. The multi-buffering mechanism in audio drivers typically relies on the scatter-gather capability of the underlying DMA controller.

On the application side, the application can instruct the driver to choose a buffer size that keeps latency as small as possible without jitter. In particular, after an application maps the driver's buffers into its own address space with mmap, it manages those buffers in its own way (not necessarily the same way as the driver), and in that case it usually sets the driver's buffer size according to its own needs.

In the OSS ioctl interface, SNDCTL_DSP_SETFRAGMENT sets the driver's internal buffer size. It is used as follows:

int param;
param = (0x0004 << 16) + 0x000a;
if (ioctl(audio_fd, SNDCTL_DSP_SETFRAGMENT, &param) == -1) {
    /* ... error handling ... */
}

The param argument has two parts. The low 16 bits encode the fragment size: 0x000a means the fragment size is 2^0xa, that is, 1024 bytes. The high 16 bits give the number of fragments: 0x0004 means four fragments. Passing this value to ioctl with the SNDCTL_DSP_SETFRAGMENT command adjusts the buffers inside the driver.
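
For convenience, the two fields can be packed by a small helper such as the one below; make_frag_param is a hypothetical name, not part of the OSS API.

/* pack the SNDCTL_DSP_SETFRAGMENT argument:
   high 16 bits = number of fragments, low 16 bits = log2(fragment size) */
static int make_frag_param(int num_fragments, int frag_bytes)
{
    int power = 0;
    while ((1 << power) < frag_bytes)
        power++;                       /* find log2 of the fragment size */
    return (num_fragments << 16) | power;
}

/* make_frag_param(4, 1024) yields 0x0004000a, the value used above */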

To show audio developers how buffer configuration affects playback performance, we test the relationship between the two. The test environment is described first, including the test method and the meaning of the results; the test is then run in two configurations and the results are explained.

Test Environment

The test is performed on a PC. For the specific test environment, see the following table.

Item         Parameter
CPU          PII 800
Memory       256 MB SDRAM
Hard disk    Seagate 80 GB, UDMA
Graphics     TNT2 M64, 16 MB
Sound card   integrated on the motherboard (44.1 kHz, stereo, 16-bit mode)
Kernel       Linux 2.4.20 (Red Hat 9.0)

The test software, latencytest, consists of two parts: an audio playback test program and a set of system load simulation programs. (Note: latencytest is mainly intended to test kernel latency, but here it serves as a tool for comparing buffer configurations.)

The workflow of the audio playback test program is shown in the following code. To guarantee scheduling priority for audio playback, the test program uses the SCHED_FIFO scheduling policy (set through sched_setscheduler()).

while (1) {
    time1 = my_gettime();
    /* consume a certain amount of CPU time in a busy loop */
    time2 = my_gettime();
    write(audio_fd, playbuffer, fragmentsize);
    time3 = my_gettime();
}

my_gettime() returns the current time; recording the time before and after each operation gives that operation's duration. audio_fd is the file descriptor of the opened audio device; playbuffer is the application's buffer for audio data, that is, the app buffer; fragmentsize is the size of one fragment, so each write pushes one fragment to the driver. The busy loop simulates the CPU load of computing audio data during playback; a typical example is a synthesizer generating a waveform in real time before each write. The time consumed by the busy loop is set to 80% of the playback latency of one fragment.
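
latencytest's own helpers are not reproduced in this article; a plausible sketch of my_gettime() and of the SCHED_FIFO setup mentioned above might look like this (the priority value of 50 is purely illustrative):

#include <sched.h>
#include <stdio.h>
#include <sys/time.h>

/* current time in microseconds */
static long long my_gettime(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000 + tv.tv_usec;
}

/* request real-time FIFO scheduling for the player (requires root) */
static void set_realtime(void)
{
    struct sched_param sp;
    sp.sched_priority = 50;            /* illustrative priority */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
        perror("sched_setscheduler");
}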

The related metrics are computed as follows:

1) Fragment playback latency (fragm. latency) = fragment size / (sampling rate × channels × bytes per sample). For example, with a fragment size of 512 bytes, the latency of one fragment is 512 / (44100 × 2 × 2) = 2.90 ms (44100 is the 44.1 kHz sampling frequency, the first 2 is the two stereo channels, and the second 2 is the two bytes of a 16-bit sample).
2) Transmission latency of one fragment = the time to copy one fragment from the app buffer to the DMA buffer.
3) time3 - time1 = duration of one loop iteration = CPU time consumed by the busy loop + transmission latency of one fragment.
4) time2 - time1 = actual CPU time consumed by the busy loop (CPU latency).
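
The fragment latency formula in 1) is easy to check in code; the function name is illustrative:

/* fragment playback latency in milliseconds:
   bytes / (rate * channels * bytes per sample) */
static double fragment_latency_ms(int frag_bytes, int rate,
                                  int channels, int bytes_per_sample)
{
    return 1000.0 * frag_bytes /
           ((double)rate * channels * bytes_per_sample);
}

/* fragment_latency_ms(512, 44100, 2, 2) == 2.90, as computed above */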

To simulate realistic operating conditions, a system load runs while the test program plays audio. Five load scenarios are used in turn:

1) High-intensity graphics output (x11perf simulating a large number of BitBlt operations)
2) High-intensity access to the /proc file system (top with a 0.01-second update interval)
3) High-intensity disk writes (writing a large file to the hard disk)
4) High-intensity disk copying (copying one file to another)
5) High-intensity disk reads (reading a large file from the hard disk)

Test results are reported for each system load scenario and displayed as graphs; the meaning of each element in the graphs is explained under Performance Analysis below.

Performance Analysis

Next, we compare the performance of two buffer configurations:

1) Case 1: fragment size 512 bytes, 2 fragments. Test result 1 (2x512.html)
2) Case 2: fragment size 2048 bytes, 4 fragments. Test result 2 (4x2048.html)

To read the results, you need to know what each mark in the graphs means:

1) Red line: the playback latency of the whole buffer. Total buffer latency = fragment latency × number of fragments. In the first test case, the total buffer latency is 2.90 ms × 2 = 5.8 ms.
2) White line: the actual scheduling delay, that is, the time of one loop iteration (time3 - time1). If the white line crosses the red line, the audio in the whole buffer finished playing before the application could supply new data, so sound is lost and the overruns counter increases by 1.
3) Green line: the time the CPU spends in the busy loop (time2 - time1 above). Its nominal value is fragm. latency × 80%. Because the player runs under SCHED_FIFO, a longer green line indicates bus contention, or long stretches spent inside the kernel.
4) Yellow line: the playback latency of one fragment. The white line should stay close to the yellow line.
5) White between +/-1 ms: the fraction of actual scheduling delays falling within fragm. latency +/-1 ms.
6) White between +/-2 ms: the fraction of actual scheduling delays falling within fragm. latency +/-2 ms.
7) Green between +/-0.2 ms: the fraction of busy-loop delays within +/-0.2 ms of the nominal value.
8) Green between +/-0.1 ms: the fraction of busy-loop delays within +/-0.1 ms of the nominal value.

In the first case the buffer is very small: each fragment is only 512 bytes, and the total buffer is 2 x 512 = 1024 bytes, enough for only 5.8 ms of playback. As the OSS documentation notes, since UNIX is a multitasking system in which many processes share the CPU, the player must choose a buffer configuration large enough that no underrun occurs while the CPU serves other processes (during which no new audio data reaches the sound card). If the application cannot supply audio data as fast as the sound card plays it, playback pauses or clicks; for this reason, fragment sizes below 256 bytes are not recommended. The test results show underruns under every type of system load, worst of all during disk writes, with 14 in total (overruns = 14).

Of course, audio applications with strict real-time requirements still want small buffers, since only small buffers give low latency. The underruns seen above are not caused by the small buffer alone. Because the Linux 2.4 kernel is not preemptible, the time spent inside the kernel is unbounded, so no process can be scheduled at a guaranteed rate, even under the SCHED_FIFO policy. In this respect, multimedia applications such as audio players place higher demands on the operating system kernel. For the Linux kernel, smaller scheduling latency can be achieved with specialized patches (the low-latency patches), and the new 2.6 kernel is expected to perform better.

In the second case the buffer is much larger: 4 x 2048 = 8192 bytes in total, enough for 0.046 seconds of playback. The test graphs are satisfactory: even under heavy system load the playback deadline is essentially always met, and no underrun occurs.

This does not mean that larger buffers are always better. Choosing an even larger buffer produces correspondingly larger latency, which is unacceptable for audio streams with strict real-time requirements; indeed, the results show that the latency jitter of the second configuration is much larger than that of the first. In practice the driver picks a default buffer configuration suited to the hardware, so a player usually does not need to change it; but when latency matters, tuning the configuration can yield better playback.

3. Non-blocking Write

If the player writes faster than the DAC plays, the DMA buffer fills up with audio data. A write issued when no DMA fragment is free blocks until one becomes free, so the application's progress becomes tied to the playback speed, and different playback speeds produce different rates of progress. Sometimes we do not want write to block, and for that we need to know how full the DMA buffer is:

audio_buf_info info;

for (;;) {
    /* ask OSS whether there is free space in the buffer */
    if (ioctl(dsp, SNDCTL_DSP_GETOSPACE, &info) != 0) {
        perror("unable to query buffer space");
        close(dsp);
        return 1;
    }
    /* any empty fragments? */
    if (info.fragments > 0)
        break;
    /* no free fragment yet; sleep briefly */
    usleep(100);
}

This code repeatedly queries the driver for a free fragment (SNDCTL_DSP_GETOSPACE). If there is none, it sleeps (usleep(100)), during which the application can do other work, such as updating the display or handling network traffic. Once a fragment is free (info.fragments > 0), the loop exits and a non-blocking write can follow.
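
Once the loop exits, at least one fragment is free, so writing at most info.fragments * info.fragsize bytes is guaranteed not to block. A short continuation of the loop above (playbuffer and bytes_ready are illustrative names):

/* write no more than the free space OSS just reported */
int writable = info.fragments * info.fragsize;
if (writable > bytes_ready)
    writable = bytes_ready;
write(dsp, playbuffer, writable);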

4. Direct Access to the DMA Buffer (mmap)

Besides relying on the operating system kernel for better scheduling, an audio player can adopt techniques of its own to improve real-time performance. One of them is bypassing the app buffer and accessing the DMA buffer directly through mmap.

Normally an application outputs audio data by calling write, which costs a buffer copy from user space to kernel space. The mmap system call avoids this by granting direct access to the DMA buffer, which the DMA controller continuously scans, sending its contents to the DAC. This is much like how a GUI works with video memory: the GUI maps the framebuffer into its own address space with mmap and then manipulates the video memory directly. The DMA buffer is, in effect, the sound card's framebuffer.

The best way to understand the mmap method is through a real example, code 1 (list1.c).

The code is extensively commented, so only a few notes are given here.

The samples parameter of the playerdma function points to the buffer holding the audio data; the rate/bits/channels parameters specify the sampling rate, bits per sample, and number of channels.

After opening /dev/dsp, the driver is configured according to the rate/bits/channels parameters. Note that these are requests, not guarantees: the driver chooses values according to its own capabilities, so after configuration the program queries again to obtain the parameter values actually in effect.

Before using mmap, the code checks that the driver supports this mode (SNDCTL_DSP_GETCAPS). SNDCTL_DSP_GETOSPACE reports the fragment size and count chosen by the driver, from which the total DMA buffer size dmabuffer_size is computed.

mmap then maps the dmabuffer_size bytes of DMA buffer into the calling process's address space; the DMA buffer's start address within the process is dmabuffer, and from then on the buffer is accessed directly through that pointer. The mmap parameters deserve some explanation.

The audio driver keeps separate buffers for playback and recording, and mmap cannot map both sets at once. Which set is mapped depends on the prot parameter of mmap: PROT_READ selects the input (recording) buffers, PROT_WRITE the output (playback) buffers. The code uses PROT_WRITE | PROT_READ, which also selects the output buffers. (The PROT_READ is required on BSD systems: with PROT_WRITE alone, every access to the buffer causes a segmentation/bus error.)

Once the DMA buffer is mapped, the driver can no longer be driven through the read/write interface; playback is started only by enabling the DAC trigger with SNDCTL_DSP_SETTRIGGER, and the trigger must first be cleared before it can be set.
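
Since list1.c itself is not reproduced on this page, the condensed sketch below shows the sequence just described: the capability check, the buffer query, the mapping, and the trigger. fd is assumed to be the opened /dev/dsp descriptor, and error handling is omitted.

#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/soundcard.h>

int caps, enable;
audio_buf_info info;
size_t dmabuffer_size;
unsigned char *dmabuffer;

ioctl(fd, SNDCTL_DSP_GETCAPS, &caps);        /* does the driver support mmap? */
if (!(caps & DSP_CAP_MMAP)) {
    /* no mmap support: fall back to the traditional write() path */
}

ioctl(fd, SNDCTL_DSP_GETOSPACE, &info);      /* fragment size and count */
dmabuffer_size = (size_t)info.fragstotal * info.fragsize;

dmabuffer = mmap(NULL, dmabuffer_size, PROT_WRITE | PROT_READ,
                 MAP_FILE | MAP_SHARED, fd, 0);

enable = 0;                                  /* the trigger must be cleared first */
ioctl(fd, SNDCTL_DSP_SETTRIGGER, &enable);
enable = PCM_ENABLE_OUTPUT;
ioctl(fd, SNDCTL_DSP_SETTRIGGER, &enable);   /* the DAC starts scanning now */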

Once DMA starts, it scans the DMA buffer repeatedly, and we always want new data ready ahead of it so that playback never breaks. The playerdma function therefore splits the mapped DMA buffer into two halves with a boundary between them: while the DMA scans one half, the program fills the other, and whenever the DMA crosses the boundary, the half just left behind is refilled.
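
One way to learn which half the DMA is currently scanning is the SNDCTL_DSP_GETOPTR ioctl, which reports the DMA position within the mapped buffer; the fill() helper below, which generates new samples, is hypothetical:

/* refill whichever half of the mapped buffer the DMA is not scanning */
struct count_info ci;
size_t half = dmabuffer_size / 2;

ioctl(fd, SNDCTL_DSP_GETOPTR, &ci);          /* ci.ptr = current DMA offset */
if ((size_t)ci.ptr < half)
    fill(dmabuffer + half, half);            /* DMA in first half: refill second */
else
    fill(dmabuffer, half);                   /* DMA in second half: refill first */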

The drawback of mmap is that not every sound card driver supports it, so an application must be able to fall back to the traditional write method when it is unavailable.

Finally, to understand thoroughly how mmap is implemented, we take one sound card driver as an example and look at its internal mmap implementation, code 2 (list2.c).

audio_mmap() is the function implementing the mmap interface. It first selects the appropriate buffers (input or output) from the prot flags of the mmap call (vma->vm_flags). vma->vm_end - vma->vm_start is the size of the address range being mapped into the application's space; it must equal the DMA buffer size (s->fragsize * s->nbfrags). If the DMA buffer has not been created yet, audio_setup_buf(s) creates it. Then, for each fragment, starting from the mapping's start address (vma->vm_start), the function establishes the correspondence between the fragment's physical address and the mapped virtual address (remap_page_range). Finally, the mmap flag is set (s->mapped = 1).
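
list2.c is likewise not reproduced on this page. A condensed sketch of the flow just described, in the style of 2.4-era drivers, is shown below; the state types (audio_state_t, audio_stream_t) and most field names are assumptions modeled on the text, and details vary from driver to driver.

/* sketch only: types and fields follow the description above, not a real driver */
static int audio_mmap(struct file *file, struct vm_area_struct *vma)
{
    audio_state_t *state = file->private_data;   /* hypothetical driver state */
    audio_stream_t *s;
    unsigned long start = vma->vm_start;
    unsigned long size = vma->vm_end - vma->vm_start;
    int i;

    /* choose output or input buffers from the mmap prot flags */
    s = (vma->vm_flags & VM_WRITE) ? state->output_stream
                                   : state->input_stream;

    if (size != (unsigned long)s->fragsize * s->nbfrags)
        return -EINVAL;                 /* must match the DMA buffer size */
    if (!s->buffers && audio_setup_buf(s))
        return -ENOMEM;                 /* create the DMA buffer on demand */

    for (i = 0; i < s->nbfrags; i++) {  /* map fragment by fragment */
        if (remap_page_range(start, s->buffers[i].dma_addr,
                             s->fragsize, vma->vm_page_prot))
            return -EAGAIN;
        start += s->fragsize;
    }
    s->mapped = 1;                      /* record that the buffer is mapped */
    return 0;
}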

5. Conclusion

Beyond the issues discussed above, audio application development faces many other practical problems, such as mixing multiple audio streams and handling the various audio file formats.

The OSS audio interface lived in the Linux kernel for many years. Owing to its architectural limitations, Linux 2.6 introduces an entirely new audio system and interface, ALSA (Advanced Linux Sound Architecture), which offers many improvements over OSS, including a fully thread-safe and SMP-safe design, a modular layout, and support for multiple sound cards. To remain compatible with the OSS interface, ALSA also provides an OSS emulation layer, so the large body of applications written against OSS continues to work on ALSA systems.
